We present a novel method to obtain fine-scale detail in 3D reconstructions generated with RGB-D cameras or other commodity scanning devices. Since the depth data of these sensors is noisy, truncated signed distance fields are typically used to regularize out this noise, which unfortunately over-smooths the results. In our approach, we leverage RGB data to refine these reconstructions through inverse shading cues, as color input is typically of much higher resolution than the depth data. As a result, we obtain reconstructions with high geometric detail, far beyond the depth resolution of the camera itself, as well as highly accurate surface albedo, at high computational efficiency. Our core contribution is shading-based refinement directly on the implicit surface representation, which is generated from globally aligned RGB-D images. We formulate the inverse shading problem on the volumetric distance field and present a novel objective function that jointly optimizes for fine-scale surface geometry and spatially varying surface reflectance. In addition, we solve for incident illumination, allowing application in general and unconstrained environments. To enable the efficient reconstruction of sub-millimeter detail, we store and process our surface using a sparse voxel hashing scheme that we augment with a grid hierarchy. A tailored GPU-based Gauss-Newton solver enables us to refine large shape models to previously unseen resolution within only a few seconds. Non-linear shape optimization directly on the implicit shape model allows for highly efficient parallelization and enables much higher reconstruction detail. Our method is versatile and can be combined with a variety of scanning approaches based on implicit surfaces.
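For context, the following is a minimal sketch of the standard weighted TSDF fusion (in the style of Curless and Levoy) that such reconstruction pipelines build on; it is not the paper's code, and the function, variable names, and simple per-observation weighting are our own illustration. It shows why per-frame depth noise averages out, while detail below the voxel resolution is smoothed away:

    import numpy as np

    def fuse_frame(tsdf, weight, voxel_centers, depth, K, world_to_cam, trunc=0.01):
        """One step of standard weighted TSDF fusion: project each voxel
        center into the depth map, compute a truncated signed distance
        along the viewing direction, and fold it into a running per-voxel
        average. Averaging over frames suppresses sensor noise, but it
        also smooths away detail finer than the voxel size."""
        pts = np.c_[voxel_centers, np.ones(len(voxel_centers))]  # homogeneous coords
        cam = (world_to_cam @ pts.T).T[:, :3]                    # voxels in camera space
        z = cam[:, 2]
        idx = np.where(z > 1e-6)[0]                              # voxels in front of camera
        u = np.round(cam[idx, 0] / z[idx] * K[0, 0] + K[0, 2]).astype(int)
        v = np.round(cam[idx, 1] / z[idx] * K[1, 1] + K[1, 2]).astype(int)
        h, w = depth.shape
        inb = (u >= 0) & (u < w) & (v >= 0) & (v < h)            # inside the image
        idx, u, v = idx[inb], u[inb], v[inb]
        d = depth[v, u]                                          # observed depth (meters)
        seen = np.isfinite(d) & (d > 0)
        idx, d = idx[seen], d[seen]
        sdf = np.clip(d - z[idx], -trunc, trunc)                 # truncated signed distance
        tsdf[idx] = (tsdf[idx] * weight[idx] + sdf) / (weight[idx] + 1.0)
        weight[idx] += 1.0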
We provide a dataset containing RGB-D data of a variety of objects, for the purpose of shading-based refinement. The RGB-D data contains sequences both captured with a PrimeSense sensor and generated via multi-view stereo. Each sequence contains color and depth images, along with the camera trajectory. Additionally, we provide meshes extracted from the original TSDFs and the refined TSDFs. Please cite the respective publication when using this data.
For each scene, we provide a zip file containing a sequence of tracked RGB-D camera frames. We use the VoxelHashing framework for initial camera tracking and reconstruction. Several sequences of larger objects, where denoted, use bundle-adjusted camera trajectories (please refer to the paper for further detail). Each sequence contains the following files (see the loading sketch after this list):
Color frames (frame-XXXXXX.color.png): RGB, 24-bit, PNG
Depth frames (frame-XXXXXX.depth.png): depth (mm), 16-bit, PNG (invalid depth is set to 0)
Camera poses (frame-XXXXXX.pose.txt): camera-to-world transformation
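As an illustration, here is a minimal Python sketch for loading one frame. The filename pattern, depth units, and invalid-depth convention follow the list above; the choice of numpy and imageio, and the assumption that the pose file stores a whitespace-separated 4x4 camera-to-world matrix, are our own and should be verified against your download:

    import numpy as np
    import imageio.v2 as iio  # any 16-bit-aware PNG reader works

    def load_frame(seq_dir, idx):
        """Load one RGB-D frame using the naming scheme listed above."""
        stem = "%s/frame-%06d" % (seq_dir, idx)
        color = iio.imread(stem + ".color.png")                  # H x W x 3, uint8
        depth = iio.imread(stem + ".depth.png").astype(np.float32)
        depth /= 1000.0                                          # millimeters -> meters
        depth[depth == 0.0] = np.nan                             # 0 marks invalid depth
        # Assumption: the pose file holds a 4x4 camera-to-world matrix.
        pose = np.loadtxt(stem + ".pose.txt").reshape(4, 4)
        return color, depth, pose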
Camera Calibration: The color and depth camera intrinsics for each sequence are provided in colorIntrinsics.txt and depthIntrinsics.txt. Note that these are the sensor's default values; we did not perform any calibration ourselves.
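For completeness, a sketch of back-projecting a depth image into camera space with these intrinsics. We assume a standard pinhole model in which the matrix read from depthIntrinsics.txt carries fx, fy at positions (0,0), (1,1) and cx, cy at (0,2), (1,2); this layout is an assumption on our part, so check it against the files:

    import numpy as np

    def backproject(depth, K):
        """Lift a metric depth image to 3D points in the depth camera frame,
        using pinhole intrinsics fx, fy, cx, cy taken from K."""
        fx, fy = K[0, 0], K[1, 1]
        cx, cy = K[0, 2], K[1, 2]
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        x = (u - cx) / fx * depth
        y = (v - cy) / fy * depth
        pts = np.stack([x, y, depth], axis=-1)    # H x W x 3 camera-space points
        return pts[np.isfinite(depth)]            # drop invalid pixels

    # Usage (assumes the file holds a whitespace-separated matrix):
    # K = np.loadtxt("depthIntrinsics.txt")
    # points = backproject(depth, K)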