An Improved Method of Refining Relative
Orientation in Global Structure from Motion
with a Focus on Repetitive Structure
and Very Short Baselines
X. Wang and C. Heipke
Abstract
Recently, global structure from motion has successfully gained
many followers, mainly because of its computational speed. Most
of these global methods take the parameters of relative
orientation (ROs) as input and then perform averaging operations.
Therefore, eliminating incorrect ROs is of great significance for
improving the robustness of global structure from motion. In this
article, we propose a method to eliminate wrong ROs which have
resulted from repetitive structure and very short baselines. We
present two corresponding criteria that indicate the quality of
ROs. Repetitive structure is detected based on counts of
conjugate points of the various image pairs, while very short
baselines are found by inspecting the intersection angles of
corresponding image rays. By analyzing these two criteria, we
detect and eliminate incorrect ROs. As correct ROs of image pairs
with a longer baseline nearly parallel to both viewing directions
can be valuable, a method to identify and keep these ROs is also
part of our approach. We demonstrate the new method on various
data sets, including public benchmarks as well as close-range
images and images from unmanned aerial vehicles, by inserting our
refined ROs into a global structure-from-motion pipeline. The
experiments show that compared to other methods, we can generate
the best results.
Introduction
In recent years, structure from motion (SfM) has seen
impressive development in the fields of computer vision and
photogrammetry (Agarwal et al. 2009; Wang, Rottensteiner and
Heipke 2019a, 2019b). So-called incremental SfM has received a
notable amount of attention, demonstrated by, for example, the
success of the software packages Bundler (Snavely 2008),
VisualSFM (Wu 2011, 2013), and COLMAP (Schonberger 2016;
Schonberger and Frahm 2016). The general idea is that one good
initial image pair, which normally has enough correspondences
with reasonably large intersection angles, is first selected to
do stereo reconstruction. Additional images are sequentially
added based on some criteria to extend the photogrammetric block,
and bundle adjustment is repetitively used to refine the results.
As Jiang et al. (2013), Wang et al. (2018), and Wang et al.
(2019a) have demonstrated, this approach is impeded by a long
computational time and artifacts such as visual drift. To
overcome these drawbacks, Martinec and Pajdla (2007),
Arie-Nachimson et al. (2012), Jiang et al. (2013), Moulon,
Monasse, and Marlet (2013), Wilson and Snavely (2014), Cui et al.
(2015), and Wang et al. (2019a) have presented global solutions.
Global SfM is typically separated into two steps, global rotation
averaging (Hartley and Zisserman 2003; Govindu 2001, 2004;
Chatterjee and Govindu 2013; Wilson, Bindel and Snavely 2016;
Reich and Heipke 2015; Reich, Yang and Heipke 2017) and global
translation estimation (Cui and Tan 2015; Wang et al. 2019a,
2019b). The exterior orientation parameters of all available
images are first simultaneously estimated, followed by only one
final bundle adjustment. There are also approaches which solve
both sets of parameters, rotations and translations together, in
one step by using an algebraic characterization of the so-called
multiview essential matrix (Kasten et al. 2019). Compared to
incremental SfM, global SfM is more sensitive to outliers in the
relative orientations (ROs) between image pairs (Cui and Tan
2015; Wang et al. 2019b).
Many outliers in ROs can be avoided by using the five-point
algorithm combined with RANSAC (Fischler and Bolles 1981; Nister
2004) for computing the parameters of relative orientation.
However, some incorrect ROs typically remain undetected, mainly
due to two reasons: repetitive structure (RS) and critical
configurations stemming from very short baselines (VSB).
Repetitive structure is a characteristic of a single image
and describes the fact that many parts of the image look
similar. Typically, the reason is that the 3D structure of the
scene is repetitive (this is why we speak about repetitive
structure and not repetitive texture, as texture refers to the 2D
image). Hence, when features are extracted, the descriptors are
rather similar. Matching images with repetitive structure leads
to many ambiguous point pairs and many outliers. In our context,
an incorrect image pair due to repetitive structure
is a nonoverlapping image pair for which incorrect
conjugate point pairs were extracted due to these ambiguities.
Such nonoverlapping but nevertheless similar-looking images
can stem from, for example, a set of façade images when the
façade is somewhat symmetric. If enough such incorrect point
pairs are extracted, it is possible that the five-point algorithm
will not be able to detect the error, and incorrect RO parameters
will be derived.
A critical configuration with a very short baseline results
from improper image acquisition planning, for example when
images are taken in different directions but from basically
the same projection center. In addition, crowd-sourced data
sets such as images available on the Internet are widely used
nowadays. These data sets may contain pairs with critical
configurations as well.
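
The abstract states that very short baselines are found by inspecting the intersection angles of corresponding image rays. The following is only a minimal sketch of that idea, not the criterion used in this article: it assumes the convention x2 = R x1 + t for the relative orientation, takes ray directions already expressed in each camera's own frame, and uses a hypothetical threshold (min_median_deg) on the median angle. All function names are illustrative.

```python
import math
from statistics import median


def normalize(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def rotate_transposed(R, v):
    """Compute R^T v for a 3x3 rotation matrix given as nested lists."""
    return [sum(R[r][c] * v[r] for r in range(3)) for c in range(3)]


def intersection_angles_deg(rays1, rays2, R):
    """Angle (degrees) between each pair of corresponding rays.

    rays1 are directions in the frame of camera 1, rays2 in the frame of
    camera 2. Assumed convention: x2 = R x1 + t, so R^T maps directions
    from camera 2 back into the frame of camera 1.
    """
    angles = []
    for d1, d2 in zip(rays1, rays2):
        a = normalize(d1)
        b = normalize(rotate_transposed(R, d2))
        cos_angle = sum(x * y for x, y in zip(a, b))
        cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
        angles.append(math.degrees(math.acos(cos_angle)))
    return angles


def looks_like_very_short_baseline(rays1, rays2, R, min_median_deg=1.0):
    """Flag a pair whose median intersection angle is below a threshold.

    min_median_deg is a hypothetical value chosen for illustration only.
    """
    return median(intersection_angles_deg(rays1, rays2, R)) < min_median_deg


def rays_from(points, center):
    """Synthetic-data helper: rays from a projection center to 3D points,
    for a camera whose axes are aligned with the object frame."""
    return [[p[i] - center[i] for i in range(3)] for p in points]


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    points = [(0.0, 0.0, 10.0), (1.0, 0.0, 10.0), (-1.0, 1.0, 10.0)]
    rays1 = rays_from(points, (0.0, 0.0, 0.0))
    for baseline in (0.001, 2.0):
        rays2 = rays_from(points, (baseline, 0.0, 0.0))
        flagged = looks_like_very_short_baseline(rays1, rays2, identity)
        print(f"baseline {baseline}: flagged = {flagged}")
```

In this synthetic setup the pair with nearly coincident projection centers is flagged, because its corresponding rays are almost parallel, while the pair with the larger baseline is not. The median is used here so that a few mismatched rays do not dominate the decision. That is a design choice of this sketch, not something taken from the article.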
X. Wang and C. Heipke are with the Institute of Photogrammetry
and GeoInformation, Leibniz Universität Hannover, Hannover
D-30167, Germany.
Photogrammetric Engineering & Remote Sensing
Vol. 86, No. 5, May 2020, pp. 299–315.
0099-1112/20/299–315
© 2020 American Society for Photogrammetry
and Remote Sensing
doi: 10.14358/PERS.86.5.299