PE&RS March 2018 Full

Evaluation of Close-Range Stereo Matching Algorithms Using Stereoscopic Measurements

Dongjoe Shin, Yu Tao, and Jan-Peter Muller

Abstract

The performance of binocular stereo reconstruction is highly dependent on the quality of the stereo matching result. In order to evaluate the performance of different stereo matchers, several quality metrics have been developed based on quantifying error statistics with respect to a set of independent measurements usually referred to as ground truth data. However, such data are frequently not available, particularly in practical applications or planetary data processing. To address this, we propose a ground truth independent evaluation protocol based on manual measurements. A stereo visualization tool has been specifically developed to evaluate the quality of the computed correspondences. We compare the quality of disparity maps calculated from three stereo matching algorithms, developed based on a variation of GOTCHA, which has been used in planetary robotic rover image reconstruction at UCL-MSSL (Otto and Chau, 1989). From our evaluation tests with the image pairs from the Mars Exploration Rover (MER) Pancam and the field data collected in PRoViScout 2012, it has been found that all three processing pipelines used in our test (NASA-JPL, JR, UCL-MSSL) trade off matching accuracy and completeness differently. NASA-JPL’s stereo pipeline produces the most accurate but a less complete disparity map, while JR’s pipeline performs best in terms of reconstruction completeness.

Introduction

Stereo matching has long been a fundamental and challenging research topic in computer vision. A large number of fully automated stereo matching algorithms have been developed since the earliest approach by Hannah (Hannah, 1974) and the further variations of local algorithms, which rely on the computation of correlations of local patches, developed in the 1990s. Follow-on optimisation and statistical machine learning techniques, including dynamic programming (Birchfield and Tomasi, 1998), Markov random fields (Geman, 1984), graph cuts (Boykov, 2001), belief propagation (Sun et al., 2003), semi-global matching (Hirschmuller, 2008), and seed-growing algorithms (Lhuillier and Quan, 2002), have been shown to produce high quality disparity maps, but it is becoming difficult to evaluate the various matching algorithms developed for different purposes.

To the best of our knowledge, the Middlebury test is the most influential work on recent stereo evaluation (Scharstein and Szeliski, 2002). In this test, the authors propose a new taxonomy of comprehensive stereo algorithms and a C++ test bed for the quantitative evaluation of dense two-frame stereo correspondence algorithms. The Middlebury test performs an evaluation based on error metrics computed from sparse “ground truth” point pairs or by synthesizing a warped image from pre-computed dense disparity maps. Therefore, the reference data plays an important role in the evaluation process.
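As an illustration of how such reference-based error metrics are typically computed, the following minimal Python sketch compares a computed disparity map against reference disparities. It reports a root-mean-square error and a bad-pixel fraction, plus a simple completeness ratio. The function name `disparity_errors`, the use of `None` for missing values, the 1-pixel default threshold, and the completeness definition are assumptions made for this sketch; they are not taken from the Middlebury test bed or from any of the pipelines evaluated here.

```python
import math


def disparity_errors(computed, reference, bad_threshold=1.0):
    """Compare computed disparities against reference disparities.

    Both inputs are flat sequences of floats; None marks a pixel with no
    value (an assumption of this sketch). Returns a tuple
    (rms_error, bad_pixel_fraction, completeness).
    """
    squared_sum = 0.0
    bad = 0
    compared = 0      # pixels with both a computed and a reference value
    with_reference = 0  # pixels where reference data exists

    for d_computed, d_reference in zip(computed, reference):
        if d_reference is None:
            continue  # cannot judge a pixel without reference data
        with_reference += 1
        if d_computed is None:
            continue  # unmatched pixel: lowers completeness only
        error = abs(d_computed - d_reference)
        squared_sum += error * error
        if error > bad_threshold:
            bad += 1
        compared += 1

    if compared == 0:
        raise ValueError("no pixel has both a computed and a reference value")

    rms_error = math.sqrt(squared_sum / compared)
    bad_pixel_fraction = bad / compared
    completeness = compared / with_reference
    return rms_error, bad_pixel_fraction, completeness
```

Keeping accuracy (RMS error, bad-pixel fraction) separate from completeness makes it visible when a matcher achieves low error only by leaving many pixels unmatched.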

When stereo algorithms were not yet strong enough to process complicated scenes, the 3D geometry of the reference data did not need to be complex, but it needed to be dense enough to evaluate the sparse set of point correspondences produced by the test algorithms. For this reason, Scharstein et al. configured a test scene with a set of slanted 2D planes. Since the 2D homography of a planar object can be defined by four point correspondences, this approach can produce a virtually complete disparity map of two images from a few manual correspondences (Scharstein et al., 2001). However, as stereo algorithms evolve, simple geometry is no longer able to differentiate advanced algorithms, and more complex geometry at higher pixel resolution is needed.
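The planar case can be made concrete with a short sketch. The Python code below estimates a homography from exactly four manual point correspondences and evaluates the disparity it implies at any pixel on the plane. It is an illustrative reconstruction of the general idea, not the procedure used by Scharstein et al.; the function names, the normalisation h33 = 1, and the disparity convention d = x_left - x_right for a rectified pair are all assumptions of this sketch.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate correspondences (e.g., collinear points)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography_from_four_points(left_pts, right_pts):
    """Return the 3x3 homography (h33 fixed to 1) mapping left to right points.

    Each correspondence (x, y) -> (u, v) contributes two linear equations:
      h11 x + h12 y + h13 - u (h31 x + h32 y) = u
      h21 x + h22 y + h23 - v (h31 x + h32 y) = v
    """
    a, b = [], []
    for (x, y), (u, v) in zip(left_pts, right_pts):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def plane_disparity(h, x, y):
    """Horizontal disparity x_left - x_right implied by h at left pixel (x, y)."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    u = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    return x - u
```

Evaluating `plane_disparity` at every pixel inside the planar region gives a dense reference disparity map from only four manual measurements per plane, which is why a scene built from planes is inexpensive to annotate.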

Synthetic images can be an option to improve the scene complexity (Morales and Klette, 2011), but they are generally insufficient to reproduce practical scenes affected by a range of noise and various lighting conditions. Alternatively, an active 3D sensor can be used to produce reference data. For example, a special structured light system was developed for the 2003 Middlebury test, where one or two projectors are used with a translating camera to create a dense reference disparity map for a stereo pair (Scharstein and Szeliski, 2003). This approach is particularly useful as it gives control over the spatial resolution of a disparity map with higher depth accuracy. However, a structured light system is more suitable for capturing small objects in a controlled indoor environment. Geiger et al. also pointed out this limitation, mentioning that algorithms ranking highly on the Middlebury reference data can fall below average when they are tested against images from outside the laboratory (Geiger et al., 2012).

Creating reference data for multi-view stereo algorithms could be even more challenging. In addition to classic stereo matching, estimating external transforms between image pairs and locating the position of a camera in a previously reconstructed scene are other imperative features of a multi-view stereo algorithm (e.g., visual odometry or SLAM). Therefore, the reference data should be registered with correct positional information. This normally requires combining multiple heterogeneous sensors and more complicated calibration steps. For example, the Middlebury test images for multi-view stereo algorithms were obtained using a robotic arm that can move on the surface of a one-meter radius sphere with high precision (Seitz, 2006). In addition, to improve the accuracy

Dongjoe Shin is with the Visual Computing Group, School of Creative Technologies, University of Portsmouth; and the Imaging Group, Mullard Space Science Laboratory, Department of Space & Climate Physics, University College London (dongjoe.shin@port.ac.uk).

Yu Tao and Jan-Peter Muller are with the Imaging Group, Mullard Space Science Laboratory, Department of Space & Climate Physics, University College London.

Photogrammetric Engineering & Remote Sensing

Vol. 84, No. 3, March 2018, pp. 159–167.

0099-1112/17/159–167


doi: 10.14358/PERS.84.3.159
