Evaluation of Close-Range Stereo Matching Algorithms Using Stereoscopic Measurements
Dongjoe Shin, Yu Tao, and Jan-Peter Muller
Abstract
The performance of binocular stereo reconstruction is highly dependent on the quality of the stereo matching result. In order to evaluate the performance of different stereo matchers, several quality metrics have been developed based on quantifying error statistics with respect to a set of independent measurements usually referred to as ground truth data. However, such data are frequently not available, particularly in practical applications or planetary data processing. To address this, we propose a ground-truth-independent evaluation protocol based on manual measurements. A stereo visualization tool has been specifically developed to evaluate the quality of the computed correspondences. We compare the quality of disparity maps calculated from three stereo matching algorithms, including one developed based on a variation of GOTCHA, which has been used in planetary robotic rover image reconstruction at UCL-MSSL (Otto and Chau, 1989). From our evaluation tests with the image pairs from Mars Exploration Rover (MER) Pancam and the field data collected in PRoViScout 2012, it has been found that all three processing pipelines used in our test (NASA-JPL, JR, UCL-MSSL) trade off matching accuracy and completeness differently. NASA-JPL's stereo pipeline produces the most accurate but less complete disparity map, while JR's pipeline performs best in terms of reconstruction completeness.
Introduction
Stereo matching has long been a fundamental and challenging research topic in computer vision. A large number of fully automated stereo matching algorithms have been developed since the earliest approach by Hannah (Hannah, 1974), followed in the 1990s by variations of local algorithms that rely on computing correlations between local image patches. Follow-on optimisation and statistical machine learning techniques, including dynamic programming (Birchfield and Tomasi, 1998), Markov random fields (Geman and Geman, 1984), graph cuts (Boykov et al., 2001), belief propagation (Sun et al., 2003), semi-global matching (Hirschmüller, 2008), and seed-growing algorithms (Lhuillier and Quan, 2002), have been shown to produce high-quality disparity maps, but it has become increasingly difficult to evaluate the many matching algorithms developed for different purposes.
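To illustrate the local, correlation-based family of matchers mentioned above, the following minimal Python sketch scores candidate disparities along a rectified epipolar line using normalised cross-correlation (NCC) over square patches, with a winner-takes-all decision. It is a sketch of the general technique only; the function names and parameters are illustrative, not taken from any of the pipelines evaluated in this paper.

```python
import numpy as np

def ncc(a, b, eps=1e-8):
    """Normalised cross-correlation between two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def match_pixel(left, right, y, x, half=5, max_disp=64):
    """Winner-takes-all disparity for pixel (y, x) of a rectified pair.

    Assumes (y, x) lies far enough from the image border for a full
    (2*half+1) x (2*half+1) patch to be extracted.
    """
    ref = left[y - half:y + half + 1, x - half:x + half + 1]
    best_d, best_score = 0, -1.0
    for d in range(min(max_disp, x - half) + 1):
        cand = right[y - half:y + half + 1, x - d - half:x - d + half + 1]
        score = ncc(ref, cand)
        if score > best_score:
            best_d, best_score = d, score
    return best_d
```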
To the best of our knowledge, the Middlebury test is the most influential work on recent stereo evaluation (Scharstein and Szeliski, 2002). In this test, the authors propose a comprehensive taxonomy of stereo algorithms and a C++ test bed for the quantitative evaluation of dense two-frame stereo correspondence algorithms. The Middlebury test essentially performs an evaluation based on error metrics computed from sparse "ground truth" point pairs, or by synthesizing a warped image from pre-computed dense disparity maps. The reference data therefore plays an important role in the evaluation process.
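The core Middlebury-style error statistic is, in essence, the fraction of pixels whose computed disparity deviates from the reference by more than a threshold. A minimal sketch of that idea follows; the default threshold and the masking convention are assumptions for illustration, not the exact Middlebury implementation.

```python
import numpy as np

def bad_pixel_rate(computed, reference, valid_mask, threshold=1.0):
    """Percentage of valid pixels whose absolute disparity error exceeds
    `threshold` (the 'bad pixel' statistic used in Middlebury-style tests)."""
    err = np.abs(computed - reference)
    bad = (err > threshold) & valid_mask
    return 100.0 * bad.sum() / valid_mask.sum()
```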
When stereo algorithms were not yet strong enough to process complicated scenes, the 3D geometry of the reference data did not need to be complex, but it did need to be dense enough to evaluate the sparse sets of point correspondences produced by the algorithms under test. For this reason, Scharstein et al. configured a test scene from a set of slanted 2D planes. Since the 2D homography of a planar object can be defined by four point correspondences, this approach can produce a virtually complete disparity map between two images from only a few manual correspondences (Scharstein et al., 2001), as sketched below. However, as stereo algorithms have evolved, such simple geometry is no longer able to differentiate advanced algorithms, and more complex geometry at higher pixel resolution is required.
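The plane-based construction works because four correspondences fix the homography H mapping left-image pixels of the plane to right-image pixels, after which disparity follows at every pixel by applying H. A minimal sketch of this, using OpenCV's findHomography and perspectiveTransform with hypothetical, hand-clicked point coordinates, might look like:

```python
import numpy as np
import cv2

# Four manually clicked correspondences on a planar surface;
# the coordinates below are placeholders, not measured data.
pts_left = np.array([[10, 20], [300, 25], [295, 240], [15, 235]], dtype=np.float64)
pts_right = np.array([[5, 21], [280, 24], [276, 241], [9, 236]], dtype=np.float64)

H, _ = cv2.findHomography(pts_left, pts_right)  # 3x3 plane-induced homography

def dense_disparity(h, w):
    """Horizontal disparity at every pixel of a rectified pair, obtained by
    mapping each left-image pixel through H and differencing x-coordinates."""
    xs, ys = np.meshgrid(np.arange(w, dtype=np.float64),
                         np.arange(h, dtype=np.float64))
    pts = np.stack([xs.ravel(), ys.ravel()], axis=1).reshape(-1, 1, 2)
    mapped = cv2.perspectiveTransform(pts, H).reshape(h, w, 2)
    return xs - mapped[..., 0]
```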
Synthetic images can be an option for improving scene complexity (Morales and Klette, 2011), but they are generally insufficient to reproduce practical scenes affected by a range of noise and varied lighting conditions. Alternatively, an active 3D sensor can be used to produce reference data. For example, a special structured light system was developed for the 2003 Middlebury test, in which one or two projectors were used with a translating camera to create a dense reference disparity map for a stereo pair (Scharstein and Szeliski, 2003). This approach is particularly useful because it gives control over the spatial resolution of the disparity map while achieving higher depth accuracy. However, structured light is better suited to capturing small objects in a controlled indoor environment. Geiger et al. also pointed out this limitation, noting that algorithms ranked highly on the Middlebury reference data can fall below average when tested against images captured outside the laboratory (Geiger et al., 2012).
Creating reference data for multi-view stereo algorithms can be even more challenging. In addition to classic stereo matching, estimating the external transforms between image pairs and locating the position of a camera within a previously reconstructed scene are other imperative features of a multi-view stereo algorithm (e.g., visual odometry or SLAM). The reference data should therefore be registered with correct positional information, which normally requires combining multiple heterogeneous sensors and more complicated calibration steps. For example, the Middlebury test images for multi-view stereo algorithms were obtained using a robotic arm that can move over the surface of a one-meter-radius sphere with high precision (Seitz et al., 2006). In addition, to improve the accuracy
Dongjoe Shin is with the Visual Computing Group, School of Creative Technologies, University of Portsmouth; and the Imaging Group, Mullard Space Science Laboratory, Department of Space & Climate Physics, University College London.
Yu Tao and Jan-Peter Muller are with the Imaging Group, Mullard Space Science Laboratory, Department of Space & Climate Physics, University College London.