Teaching computers what’s placed in their environment
What are Apple, Facebook, and Snap planning to make using “contextually aware systems”?
Apple has won a patent for a system that takes point cloud data from the environment and creates 3D reconstructions. Basically, whatever the device is, there are sensors that capture spatial point data, and then a method to turn that data into virtual objects for applications. From the patent abstract:
A voxel feature learning network receives a raw point cloud and converts the point cloud into a sparse 4D tensor comprising three-dimensional coordinates (e.g. X, Y, and Z) for each voxel of a plurality of voxels and a fourth voxel feature dimension for each non-empty voxel. In some embodiments, convolutional mid layers further transform the 4D tensor into a high-dimensional volumetric representation of the point cloud. In some embodiments, a region proposal network identifies 3D bounding boxes of objects in the point cloud based on the high-dimensional volumetric representation. In some embodiments, the feature learning network and the region proposal network are trained end-to-end using training data comprising known ground truth bounding boxes, without requiring human intervention.
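To make the first step of that abstract a bit more concrete, here is a minimal sketch of what "voxelizing" a point cloud can look like. It is not Apple's method: the patent learns the per-voxel feature with a network, while this toy version just hand-computes a point count and a centroid as stand-ins. The names `voxelize` and `voxel_size` are my own assumptions for illustration.

```python
from collections import defaultdict
from math import floor


def voxelize(points, voxel_size=0.5):
    """Group raw (x, y, z) points into a sparse voxel grid.

    Returns a dict keyed by integer voxel coordinates. Only non-empty
    voxels get an entry, which is what makes the representation sparse.
    """
    buckets = defaultdict(list)
    for x, y, z in points:
        # Integer index of the voxel this point falls into.
        key = (floor(x / voxel_size), floor(y / voxel_size), floor(z / voxel_size))
        buckets[key].append((x, y, z))

    grid = {}
    for key, pts in buckets.items():
        n = len(pts)
        # Hand-computed stand-in for the learned "voxel feature" (assumption).
        centroid = tuple(sum(axis) / n for axis in zip(*pts))
        grid[key] = {"count": n, "centroid": centroid}
    return grid


# Three made-up points: two close together, one farther along the X axis.
points = [(0.1, 0.2, 0.0), (0.3, 0.4, 0.1), (2.6, 0.1, 0.0)]
grid = voxelize(points)
for voxel, feature in sorted(grid.items()):
    print(voxel, feature["count"])
```

The two nearby points land in the same voxel and the third gets its own, so the grid holds two entries instead of every possible cell. Each entry is three coordinates plus a feature, which is roughly the shape of the "sparse 4D tensor" the abstract describes.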
Going deeper on the applications of this, from Mathew Olson in The Information:
Apple’s patent claims the company’s system can provide more precision than past technologies in detecting 3D objects. Using a LiDAR scanner, it builds a 3D representation of the world out of voxels — a grid of blocks — which provides an idea of what space is empty and filled with objects. Further computer vision processing can identify whether a blob of voxels represents a particular object, like a car or a person.
This type of technology is obviously helpful for making self-driving cars safe for pedestrians and drivers alike. But such improvements could also trickle down to much smaller AR devices over time, allowing applications that more accurately identify users’ surroundings and believably place AR objects in them. As we’ve reported, Apple is looking to include LiDAR in its upcoming mixed reality headset.
Many of the popular consumer AR applications we see today are entertainment, like face effects on IG or Snap. This patent moves further toward the “assistant” model, where an AR app builds up some context about the environment and the time in order to provide some value. This is the same AR strategy coming out of Facebook:
We’re talking about a contextually aware system that has a sense that if you say “Where [are] my keys,” and you’re about to leave the house, you’re probably looking for your car keys. But if you’re about to come into the house, you’re probably looking for your house keys.
It’s really a question of artificial intelligence and sensors and awareness. The more context a machine has, the more efficient I can be with my intentions.
Gathering more data and boiling it down into insights using ML is nothing new for business applications. But who will be served by that data or those insights? I can imagine a situation where that info is sold to give me commercial recommendations. Once you can identify the objects around me, they become additional real estate for ads, or additional info for targeting. That feels more likely to happen with Facebook than Apple. Well, to an extent this is happening at Snap:
Snap is planning a bigger push into online shopping with a new feature in the Snapchat app that will recommend clothes users can buy based on photos they upload to the messaging app… To aid the development of the feature, Snap last fall purchased Craze, the New York–based startup behind shopping app Screenshop, said the two people with knowledge of the matter. Screenshop uses software to detect clothing and furniture in screenshots people upload from their phones. It then makes it easy to buy exactly those items or ones similar to them on the internet.
On a lighter note, and thinking more creatively: if the environment around you were recorded spatially and indexed, how might you use that day to day? What would you do with that information? Is that even useful?
The inspo that comes to mind here is the beeping sound you hear when you get close to something while reversing a car. Sensors around the car create some representation of the car in space, and the audio cues help me navigate my environment where precision is necessary. That’s valuable right in context, without the need to sell that information or place ads in it.
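As a rough illustration of how little it takes to turn spatial data into something useful in the moment, here is a toy sketch of that reversing beep. The function name `beep_interval` and every threshold in it are made-up assumptions, not how any real car does it. The idea is only that the gap between beeps shrinks as the obstacle gets closer.

```python
def beep_interval(distance_m, max_range_m=1.5, solid_tone_m=0.3):
    """Map obstacle distance (meters) to the pause between beeps (seconds).

    Returns None when nothing is in range, 0.0 for a continuous tone,
    and otherwise a pause that shrinks linearly as the obstacle nears.
    All thresholds are hypothetical values chosen for illustration.
    """
    if distance_m > max_range_m:
        return None  # Too far away: stay silent.
    if distance_m <= solid_tone_m:
        return 0.0  # Very close: continuous tone.
    # Scale from a 0.1 s pause (near) up to a 1.0 s pause (far).
    frac = (distance_m - solid_tone_m) / (max_range_m - solid_tone_m)
    return 0.1 + 0.9 * frac


for d in (2.0, 1.5, 0.9, 0.2):
    print(d, beep_interval(d))
```

No identification of objects, no history, and no profile of the driver is needed here. The distance reading alone carries all the value, and it can be thrown away the moment the car stops.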