
Enhanced 3D Mapping with an RGB-D Sensor
via Integration of Depth Measurements
and Image Sequences
Bo Wu, Xuming Ge, Linfu Xie, and Wu Chen
Abstract
State-of-the-art visual simultaneous localization and mapping (SLAM) techniques greatly facilitate three-dimensional (3D) mapping and modeling with the use of low-cost red-green-blue-depth (RGB-D) sensors. However, the mapping range of such sensors is limited owing to the working range of the infrared (IR) camera, which provides depth measurements only at close distances, and the practicability of such sensors in 3D mapping is therefore limited. To address this limitation, we present a solution for enhanced 3D mapping using a low-cost RGB-D sensor. We carry out state-of-the-art visual SLAM to obtain 3D point clouds within the mapping range of the RGB-D sensor and implement an improved structure-from-motion (SfM) method on the collected RGB image sequences, with additional constraints from the depth information, to produce image-based 3D point clouds. We then develop a feature-based scale-adaptive registration to merge the two sets of point clouds and generate enhanced and extended 3D mapping results. We examine the proposed method at two challenging test sites. At both sites, the coverage of the generated 3D models increases by more than 50% with the proposed solution. Moreover, the proposed solution achieves a geometric accuracy of about 1% over a measurement range of about 20 m. These positive experimental results demonstrate not only the feasibility and practicality of the proposed solution but also its potential.
Introduction
Red-green-blue-depth (RGB-D) sensors (such as the Kinect or the Structure sensor) have remarkable advantages, such as mobility and low cost, and have been used extensively in three-dimensional (3D) mapping and visual simultaneous localization and mapping (SLAM) (Dryanovski et al. 2013). An RGB-D sensor captures RGB images with an RGB camera and pixel-wise depth information with an infrared (IR) camera together with an IR projector; it can thus produce textured 3D point clouds in the object coordinate system via transformation between the RGB camera coordinate system and the IR camera coordinate system (Tang et al. 2016).
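To make this concrete, the per-pixel computation behind such textured point clouds can be sketched as follows. This is a minimal illustration, not code from the paper: it assumes pinhole intrinsics for both cameras and a known rotation and translation between the two camera frames (here applied from the IR frame to the RGB frame to fetch colors), and all variable names are our own.

```python
import numpy as np

def depth_to_textured_points(depth, rgb, K_ir, K_rgb, R, t):
    """Back-project an IR depth map to 3D points and sample colors
    from the RGB image (hypothetical helper for illustration).

    depth : (H, W) metric depth in meters (0 = no measurement)
    rgb   : (H2, W2, 3) color image
    K_ir, K_rgb : 3x3 pinhole intrinsic matrices
    R, t  : rotation (3x3) and translation (3,) from the IR camera
            frame to the RGB camera frame
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0
    z = depth[valid]
    # Back-project valid pixels through the IR intrinsics.
    x = (u[valid] - K_ir[0, 2]) * z / K_ir[0, 0]
    y = (v[valid] - K_ir[1, 2]) * z / K_ir[1, 1]
    pts_ir = np.stack([x, y, z], axis=1)   # 3D points in the IR frame
    pts_rgb = pts_ir @ R.T + t             # same points in the RGB frame
    # Project into the RGB image to pick up a color for each point.
    proj = pts_rgb @ K_rgb.T
    uv = proj[:, :2] / proj[:, 2:3]
    cols = np.clip(uv[:, 0].astype(int), 0, rgb.shape[1] - 1)
    rows = np.clip(uv[:, 1].astype(int), 0, rgb.shape[0] - 1)
    return pts_rgb, rgb[rows, cols]
```

In practice, lens distortion and occlusions must also be handled; libraries such as Open3D provide equivalent RGB-D-to-point-cloud routines.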
RGB-D sensors have thus been used for 3D mapping and modeling in both indoor and outdoor environments in recent years (Henry et al. 2010; Kerl et al. 2013; Tang et al. 2016). Although RGB-D sensors can be used to build 3D models of unprecedented richness, they have drawbacks that limit their application in practice: they measure depth only up to a limited distance (typically less than 5 m), and the depth values they provide are much noisier than those of traditional laser scanners. Moreover, the field of view of a depth camera is far more constrained than that of the traditional laser scanners typically used for 3D mapping and modeling.
The above-mentioned drawbacks of RGB-D sensors narrow the scope of their application in 3D mapping and modeling (Ye and Wu 2018). For example, when it is impossible to get close enough to scan targets in the working environment, an RGB-D sensor cannot provide desirable results. Other solutions are therefore needed to improve the feasibility and practicality of such sensors. This paper introduces structure-from-motion (SfM) and multi-view stereo (MVS) methods to generate 3D point clouds at relatively distant ranges from RGB image sequences, which supplement the 3D models obtained from the RGB-D sensor within its mapping range.
The proposed solution has three main advantages. (1) It requires no additional effort: although SfM and MVS are performed offline, the RGB image sequences can be collected simultaneously with the online SLAM. (2) It provides an improved SfM constrained by the additional depth information. (3) It uses a scale-adaptive registration to fuse the multisensor point clouds.
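The third point deserves a brief illustration: an SfM reconstruction is determined only up to an unknown global scale, whereas the RGB-D point cloud is metric, so any registration between the two must recover scale along with rotation and translation. The sketch below shows one standard way to estimate such a similarity transform from matched 3D feature points, the closed-form solution of Umeyama (1991); it illustrates the kind of computation involved and is not the authors' exact registration algorithm.

```python
import numpy as np

def similarity_transform(src, dst):
    """Closed-form estimate of scale s, rotation R, and translation t
    minimizing sum ||dst_i - (s * R @ src_i + t)||^2 (Umeyama 1991).

    src : (N, 3) matched feature points from the (scale-free) SfM cloud
    dst : (N, 3) corresponding points from the metric RGB-D cloud
    """
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_s, dst - mu_d
    cov = dst_c.T @ src_c / len(src)          # 3x3 cross-covariance
    U, S, Vt = np.linalg.svd(cov)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    R = U @ D @ Vt                            # optimal rotation
    var_src = (src_c ** 2).sum() / len(src)   # variance of the source cloud
    s = np.trace(np.diag(S) @ D) / var_src    # optimal scale
    t = mu_d - s * R @ mu_s
    return s, R, t
```

Given the transform, each SfM point p is mapped into the metric frame as s * R @ p + t, after which the two clouds can be merged.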
The remainder of this paper is organized as follows. The section "Related Work" briefly reviews works related to RGB-D SLAM and SfM. The section "Enhanced 3D Mapping by Integrating Depth Measurements and Image Sequences" describes the proposed solution in detail. The "Experimental Evaluation" section presents two challenging test cases that demonstrate the positive properties of the proposed system. The final section presents concluding remarks.
Related Work
The advent of RGB-D sensors has led to a great deal of progress in SLAM. Typically, a visual SLAM system consists of a camera-tracking frontend that uses visual odometry (Engel et al. 2014; Kerl et al. 2013) and a backend that generates and maintains a map of key-frames and reduces global drift via loop-closure detection and map optimization (Mur-Artal and Tardós 2017; Gao et al. 2018). Most state-of-the-art methods estimate the six degrees of freedom between adjacent frames from sparsely selected visual features to represent the camera motion (Liu et al. 2018). The two main classical approaches are to minimize the photometric error between consecutive stereo pairs (Comport et al. 2007) and to minimize the geometric error between 3D points (e.g., the iterative closest point [ICP] algorithm (Besl and McKay 1992) and the three-dimensional normal-distributions transform (Magnusson 2009)). Tykkälä et al. (2011) and Whelan et al. (2013) used the minimization of both photometric and geometric error to estimate camera motion.
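As a concrete illustration of the geometric-error approach, a bare-bones point-to-point ICP loop alternates nearest-neighbor matching with a closed-form rigid alignment, as in the following sketch (the structure and names are ours for illustration; production systems add outlier rejection, convergence tests, and often point-to-plane residuals).

```python
import numpy as np
from scipy.spatial import cKDTree

def icp_point_to_point(src, dst, iters=30):
    """Minimal point-to-point ICP (after Besl and McKay 1992).

    src, dst : (N, 3) and (M, 3) point clouds; returns src rigidly
    aligned to dst. Illustrative sketch only.
    """
    tree = cKDTree(dst)                    # spatial index for matching
    cur = src.copy()
    for _ in range(iters):
        _, idx = tree.query(cur)           # nearest neighbor in dst per point
        matched = dst[idx]
        mu_s, mu_d = cur.mean(axis=0), matched.mean(axis=0)
        H = (cur - mu_s).T @ (matched - mu_d)
        U, _, Vt = np.linalg.svd(H)
        if np.linalg.det(Vt.T @ U.T) < 0:  # guard against reflections
            Vt[-1] *= -1
        R = Vt.T @ U.T                     # optimal rotation (Kabsch/SVD solution)
        t = mu_d - R @ mu_s
        cur = cur @ R.T + t                # apply the rigid update
    return cur
```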
Newcombe et al. (2011) proposed an incremental strategy to register RGB-D frames. In recent years, visual-inertial SLAM has been increasingly used (Hesch et al. 2014), and these
The Hong Kong Polytechnic University, Department of Land Surveying & Geo-Informatics.
Photogrammetric Engineering & Remote Sensing
Vol. 85, No. 9, September 2019, pp. 633–642.
0099-1112/19/633–642
© 2019 American Society for Photogrammetry and Remote Sensing
doi: 10.14358/PERS.85.9.633