To improve self-driving algorithms, add the concept of empty space

Author: EIS | Release Date: Jul 1, 2020


Teaching self-driving artificial intelligence about the empty space around it could improve the recognition of surrounding objects, according to researchers at Carnegie Mellon University, who combine the merits of classic mapping and deep learning for 3D object detection.

According to the university, when tested against a standard benchmark, the new method outperformed the previous top-performing technique – improving detection by 10.7% for cars, 5.3% for pedestrians, 7.4% for trucks, 18.4% for buses and 16.7% for articulated lorries.

The technique follows from the observation that data from a vehicle's lidar is 2.5D rather than 3D: its point cloud represents only the surfaces around the vehicle that the laser hits. This is less than ideal if the object-identification algorithm compares that data with true 3D representations in a library, according to the university.
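To see why a lidar sweep is '2.5D', it helps to remember that a sweep is really a range image: one distance reading per beam direction. The toy sketch below (a hypothetical single scan line in 2D, not the researchers' code) projects such a range image to points, making it clear that each ray contributes only the first surface it hits and nothing behind it.

```python
import numpy as np

def range_image_to_points(ranges, fov=(-np.pi, np.pi)):
    """Project a 1D range image (one depth per beam) to 2D points.

    Each beam angle gets exactly one (x, y) point: the first surface
    the laser hit. Anything behind that surface is simply unmeasured,
    which is what makes the resulting point cloud 2.5D rather than 3D.
    """
    angles = np.linspace(fov[0], fov[1], len(ranges), endpoint=False)
    return np.stack([ranges * np.cos(angles),
                     ranges * np.sin(angles)], axis=1)
```

A sweep of N beams always yields exactly N points, however complex the scene, because occluded surfaces never appear in the data.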

“Perception systems need to know their unknowns,” said researcher Peiyun Hu, from CMU’s robotics institute.

Referring to the diagram above, Hu told Electronics Weekly: “Intuitively, red objects that do not touch the green free space are occluded from the sensor’s line of sight. By providing an algorithm with the green free space region, the algorithm can better reason about what parts of the scene are empty [green], visible [outer boundary of the green] or occluded [everything else].”

The CMU team borrowed the idea of free space and occupied space from a mapping technique known as ‘occupancy mapping’. “Current deep learning based algorithms for lidar 3D object detection do not explicitly reason about such cues,” said Hu.

In a paper being presented at the (now virtual) Computer Vision and Pattern Recognition (CVPR) conference this week, the CMU researchers argue that representing 2.5D data as collections of points destroys hidden information about free space, and that ‘3D raycasting’ can recover this information in a way that can be incorporated into batch-based gradient learning.

“We describe a simple approach to augmenting voxel-based networks with visibility: we add a voxelised visibility map as an additional input stream,” according to a published abstract of the paper. “In addition, we show that visibility can be combined with two crucial modifications common to state-of-the-art 3D detectors: synthetic data augmentation of virtual objects and temporal aggregation of lidar sweeps over multiple time frames.”
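Feeding a visibility map in as "an additional input stream" can be as simple as concatenating it onto the detector's per-voxel features as one more channel. The sketch below shows that fusion step in a 2D simplification (the function name and shapes are illustrative assumptions, not the paper's implementation).

```python
import numpy as np

def fuse_visibility(point_features, visibility):
    """Concatenate a visibility map onto per-voxel point features.

    point_features: (C, X, Y) array from a voxel-based encoder
    visibility:     (X, Y) map, e.g. 0 = occluded/unknown, 1 = observed free
    Returns a (C + 1, X, Y) array that downstream convolutional
    layers consume exactly like any other feature channel.
    """
    return np.concatenate([point_features, visibility[None]], axis=0)
```

Because the extra stream is just another channel, the rest of the detector is untouched, which is why the authors can describe the augmentation as simple.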

On an example system, the added visibility computation takes 24ms to run, comfortably within the 100ms a lidar sweep takes, so it can keep pace with the sensor in real time.