AI reconstructs realistic 3D face from phone camera video

Author: EIS Release Date: Apr 15, 2020


An accurate 3D face model that “doesn’t look creepy” can be reconstructed from smartphone video if AI is allowed to post-process the point cloud, according to Carnegie Mellon University. No laser scanner is required.

“Building a 3D reconstruction of the face has been an open problem in computer vision and graphics because people are very sensitive to the look of facial features,” said Simon Lucey of the CMU Robotics Institute. “Even slight anomalies in the reconstructions can make the end result look unrealistic.”

The first step in the process is standard, beginning with videoing the face for 15-20 seconds, in this case using the slow-motion setting of an iPhone X.

“The high frame rate of slow motion is one of the key things for our method because it generates a dense point cloud,” said Lucey.
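To see why the slow-motion setting matters, a quick back-of-the-envelope count helps: the iPhone X records slow motion at 240 frames per second (Apple's published spec) versus 30 fps for normal video, so a 15-second clip hands the reconstruction pipeline roughly eight times as many frames. The snippet below is just this arithmetic; the frame rates are the phone's specifications, not figures from the paper.

```python
# iPhone X slow-motion records at 240 fps; normal video at 30 fps.
# The 15-second capture duration is taken from the article.
fps_slow, fps_normal = 240, 30
seconds = 15

frames_slow = seconds * fps_slow
frames_normal = seconds * fps_normal

print(frames_slow)    # 3600 frames for SLAM to triangulate from
print(frames_normal)  # 450 frames at the normal setting
```

More frames mean more viewpoints of each surface point, which is what makes the resulting point cloud dense.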

Visual simultaneous localisation and mapping (SLAM) is then used to triangulate points on the face surface while also calculating the path that the camera swept.
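The triangulation step at the heart of visual SLAM can be sketched in a few lines. The example below is a minimal linear (DLT) two-view triangulation in NumPy, not the CMU pipeline: given two camera projection matrices and the same surface point observed in both images, the 3D point falls out as the null vector of a small linear system. The synthetic cameras and test point are invented for illustration.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.

    P1, P2 : 3x4 camera projection matrices (known from SLAM's pose tracking).
    x1, x2 : 2D image observations of the same surface point.
    """
    # Each observation contributes two linear constraints on the
    # homogeneous 3D point X; stack them and solve A @ X = 0.
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                # null vector = point, up to scale
    return X[:3] / X[3]       # de-homogenise

def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Two synthetic cameras: one at the origin, one translated along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X_true = np.array([0.5, 0.2, 4.0])
X_hat = triangulate(P1, P2, project(P1, X_true), project(P2, X_true))
print(np.allclose(X_hat, X_true))  # True
```

A SLAM system runs this kind of triangulation over thousands of tracked feature points across the whole sweep, which is what turns the high-frame-rate video into a dense point cloud.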

At this point, the basic geometry of the face has been established (right), but there will be gaps. If these gaps are filled poorly the results will be disconcerting.

In their second step, the researchers fill in those gaps using deep learning algorithms and ‘non-rigid mesh refinement’.

“Deep learning has a tendency to memorise solutions,” said Lucey, which works against efforts to include distinguishing details of the face. “If you use these algorithms just to find the landmarks, you can use classical methods to fill in the gaps much more easily.”
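The division of labour Lucey describes — a network finds reliable landmarks, classical methods interpolate between them — can be illustrated with a deliberately simplified toy. The snippet below uses a 1D depth profile as a stand-in for the face mesh and plain linear interpolation as the "classical" gap filler; the actual paper uses non-rigid mesh refinement in 3D, and the depth values here are invented.

```python
import numpy as np

# Hypothetical depth profile across a face, with NaN gaps where
# SLAM produced no reliable points (e.g. textureless skin regions).
depth = np.array([4.0, np.nan, np.nan, 3.6, 3.5, np.nan, 3.7, np.nan, 4.0])

known = ~np.isnan(depth)          # samples anchored by detected landmarks
idx = np.arange(len(depth))

# Classical gap-filling: interpolate between the trusted samples
# instead of asking a network to hallucinate the missing surface.
filled = depth.copy()
filled[~known] = np.interp(idx[~known], idx[known], depth[known])
print(filled)
```

The point of the design is that the learned component only has to be right about a few distinctive landmarks, while the interpolation preserves the person-specific measurements instead of regressing toward a memorised average face.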

On a phone, the necessary processing takes 30-40 minutes. “The team’s experiments show that their method can achieve sub-millimetre accuracy, outperforming other camera-based processes,” according to the university.

The work was presented at the IEEE Winter Conference on Applications of Computer Vision (WACV) in Colorado.