
to a threshold (Moravec, 1981). The Förstner detector extracts interest points by searching for optimal windows using the auto-correlation matrix and optimizes the point locations based on a differential edge intersection approach (Förstner, 1986). The Harris detector improves on the Moravec detector by using an auto-correlation matrix to ensure better repeatability (Harris and Stephens, 1988). Traditional software packages equipped with these detectors generally used the normalized correlation coefficient as the matching metric. Various constraints have been employed to improve matching performance, such as the epipolar constraint (Gruen and Baltsavias, 1988), the triangulation constraint (Wu et al., 2012), and redundant information from multiple images (Agouris and Schenk, 1992; Zhu et al., 2010). The point matches can be refined to sub-pixel level by using least-squares matching methods (Gruen, 1985 and 2012).
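As a minimal illustration of this classical pipeline, the sketch below (Python with NumPy, an implementation choice not specified in the paper) matches a window around a reference point against candidate windows along the same row of a search image using the normalized correlation coefficient; the window size and search range are hypothetical parameters.

```python
import numpy as np

def ncc(a, b):
    """Normalized correlation coefficient between two equally sized windows."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return (a * b).sum() / denom if denom > 0 else 0.0

def match_point(ref, search, row, col, win=5, max_shift=20):
    """Find the column offset along the same row that maximizes the NCC score."""
    template = ref[row - win:row + win + 1, col - win:col + win + 1]
    best_score, best_shift = -1.0, 0
    for d in range(-max_shift, max_shift + 1):
        c = col + d
        if c - win < 0 or c + win + 1 > search.shape[1]:
            continue
        window = search[row - win:row + win + 1, c - win:c + win + 1]
        score = ncc(template, window)
        if score > best_score:
            best_score, best_shift = score, d
    return best_shift, best_score
```

In practice, the epipolar and triangulation constraints mentioned above would restrict the search range, and a least-squares matching step would refine the integer shift to sub-pixel accuracy.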
Since the milestone work of the scale invariant feature transform (SIFT) algorithm (Lowe, 1999), the use of invariant features has offered another paradigm for sparse point matching. SIFT combines a scale-invariant interest point detector and a descriptor based on the gradient distribution in the detected local regions (Lowe, 2004). SURF, which was inspired by the SIFT algorithm (Bay et al., 2008), incorporates a Hessian matrix approximation to speed up matching. Other fast corner detectors, such as SUSAN (Smith and Brady, 1997) and FAST (Rosten and Drummond, 2006), the latter of which uses machine learning to accelerate detection, have also been used widely for point matching. Despite the subsequent introduction of many new developments, SIFT has proven remarkably successful, and most subsequent improvements to it were designed for efficiency.
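For reference, a minimal sparse-matching sketch in the SIFT paradigm is given below, assuming an OpenCV build in which SIFT is available (cv2.SIFT_create); the ratio-test threshold of 0.75 follows Lowe (2004), and the image file names are placeholders.

```python
import cv2

# Load a stereo pair in grayscale (placeholder file names).
img1 = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Detect scale-invariant keypoints and compute gradient-based descriptors.
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Match descriptors and keep candidates that pass Lowe's ratio test.
matcher = cv2.BFMatcher(cv2.NORM_L2)
good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
        if m.distance < 0.75 * n.distance]
print(f"{len(good)} putative sparse matches")
```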
In the past, a large number of dense matching methods have been developed to suit different contexts. For dense matching, disparity computation is the most important step. Methods used to calculate disparity can be classified into three main categories: local methods, global methods, and methods based on a combination of local and global factors.
Local methods simply select the disparity with the minimum cost value; i.e., a local "winner-take-all" optimization. These local methods feature different strategies, such as block matching and gradient-based optimization. Block matching methods calculate disparity by seeking the highest correlation between the reference image and the matching image within a region around the pixel of interest. The most significant drawback of block matching methods is their sensitivity to locally ambiguous regions in images. Gradient-based methods determine small local disparities between two images by formulating a differential equation that relates motion and image brightness. Notably, local methods share a common limitation: the uniqueness of matches is enforced only for the reference image, whereas points in the search image might be matched to multiple points (Stentoumis et al., 2014).
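A minimal winner-take-all sketch of local block matching is given below (Python with NumPy and SciPy, with hypothetical window size and disparity range); each pixel independently receives the disparity with the lowest sum-of-absolute-differences cost, which illustrates both the simplicity of local methods and the absence of any uniqueness enforcement in the search image.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def wta_block_matching(left, right, max_disp=64, win=4):
    """Winner-take-all local matching with a sum-of-absolute-differences cost."""
    h, w = left.shape
    left = left.astype(np.float32)
    right = right.astype(np.float32)
    cost = np.full((h, w, max_disp + 1), np.inf, dtype=np.float32)
    for d in range(max_disp + 1):
        # Pixel-wise absolute difference between the reference image and the
        # matching image shifted by the candidate disparity d (valid columns only).
        diff = np.abs(left[:, d:] - right[:, :w - d])
        # Aggregate the cost over a (2*win+1)^2 block around each pixel.
        cost[:, d:, d] = uniform_filter(diff, size=2 * win + 1)
    # Local "winner-take-all": each pixel keeps the disparity of minimum cost.
    return np.argmin(cost, axis=2)
```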
In contrast to local methods, global methods are formulated in an energy-minimization framework (Geiger et al., 2010; Shaobo et al., 2012; Blumenthal-Barby and Eisert, 2014). These methods aim to identify a disparity function that minimizes a global energy. Global methods incorporate different strategies, including dynamic programming, belief propagation, and graph cuts. Dynamic programming approaches (Van Meerbergen et al., 2002; Veksler, 2005) compute a minimum-cost path through the matrix of all pairwise matching costs between two corresponding scanlines. Belief propagation approaches (Sun et al., 2003; Yang et al., 2009) attempt to solve the maximum a posteriori Markov Random Field problem using global optimization. The graph cut method (Boykov et al., 2001) is based on the max-flow algorithm from graph theory.
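To make the dynamic programming idea concrete, the sketch below (Python/NumPy, an illustrative formulation rather than any specific cited method) fills the table for the minimum-cost alignment of two corresponding scanlines, where the occlusion penalty is a hypothetical parameter; backtracking through the table would yield the per-pixel disparities.

```python
import numpy as np

def scanline_dp(left_row, right_row, occlusion_cost=10.0):
    """Minimum-cost alignment of two scanlines by dynamic programming."""
    n, m = len(left_row), len(right_row)
    # Pairwise matching cost between every pixel pair on the two scanlines.
    cost = np.abs(left_row[:, None].astype(np.float32) -
                  right_row[None, :].astype(np.float32))
    dp = np.zeros((n + 1, m + 1), dtype=np.float32)
    dp[:, 0] = np.arange(n + 1) * occlusion_cost
    dp[0, :] = np.arange(m + 1) * occlusion_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i, j] = min(dp[i - 1, j - 1] + cost[i - 1, j - 1],  # match
                           dp[i - 1, j] + occlusion_cost,          # pixel occluded in right
                           dp[i, j - 1] + occlusion_cost)          # pixel occluded in left
    return dp  # dp[n, m] is the minimum total cost along this scanline
```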
Local methods are considered more straightforward and efficient for real-time applications and large dataset manipulation. However, they are inferior to global methods in terms of accuracy and robustness. The elaborate models used by global methods to describe the matching process tend to increase computational loads. Therefore, combinations of local and global methods, which retain the strengths while discarding the weaknesses of each technique, are now considered the most successful option for image matching. Semi-global matching (SGM) (Hirschmuller, 2011) is the most widely used of these combination methods.
SGM was initially proposed to handle complex image matching and uses mutual information (MI) as the cost function (Viola and Wells III, 1997). MI depends on the individual and joint entropies of the two images. SGM uses a global cost function with a smoothness constraint that penalizes discontinuities. In contrast to the original version, the latest cost function separates the penalty into two parts: one for disparity changes of a single pixel, and one for larger discontinuities (Hirschmuller, 2011), as shown in Equation 1.
E(D) = \sum_{p} \left( C(p, D_p) + \sum_{q \in N_p} P_1 \, T\left[ \left| D_p - D_q \right| = 1 \right] + \sum_{q \in N_p} P_2 \, T\left[ \left| D_p - D_q \right| > 1 \right] \right)  (1)

where C(p, D_p) is the matching cost of pixel p at disparity D_p, N_p is the neighborhood of p, T[\cdot] equals 1 when its condition holds and 0 otherwise, and P_1 and P_2 are the penalties for disparity discontinuities of one pixel and of more than one pixel, respectively.
This cost function performs well in flat or radiometrically uniform areas, or even in areas with lower building densities. However, the cost terms C(p, D_p) may decrease while the penalty terms \sum_{q \in N_p} P_2 \, T\left[ \left| D_p - D_q \right| > 1 \right] increase in areas with high densities of buildings. As a result, the matching performance in these areas may not be as good as expected (with reference to the experimental analysis in this paper).
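As a sketch of how Equation 1 is evaluated, the Python/NumPy snippet below computes E(D) for a given disparity image D and a precomputed cost volume C, using a 4-connected neighborhood and hypothetical penalty values; it illustrates the energy term only, not the semi-global path-wise aggregation that SGM uses to approximately minimize it.

```python
import numpy as np

def sgm_energy(cost_volume, disparity, p1=1.0, p2=8.0):
    """Evaluate Equation 1: data cost plus P1/P2 smoothness penalties.

    cost_volume : array of shape (H, W, D) holding C(p, d) for every pixel
                  and candidate disparity (e.g., MI-based costs in SGM).
    disparity   : integer disparity image of shape (H, W).
    """
    h, w = disparity.shape
    rows, cols = np.mgrid[0:h, 0:w]
    # Data term: C(p, D_p) summed over all pixels p.
    data_term = cost_volume[rows, cols, disparity].sum()

    # Smoothness term over a 4-connected neighborhood N_p. Each neighboring
    # pair is counted once here; summing over both directions, as Equation 1
    # does, would simply double this term.
    smooth_term = 0.0
    for axis in (0, 1):
        diff = np.abs(np.diff(disparity.astype(np.int32), axis=axis))
        smooth_term += p1 * np.count_nonzero(diff == 1)  # P1 for |D_p - D_q| = 1
        smooth_term += p2 * np.count_nonzero(diff > 1)   # P2 for |D_p - D_q| > 1
    return data_term + smooth_term
```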
In the image matching methods described above, matching difficulties in urban areas, which are mainly caused by occlusions from buildings, have rarely been considered. In most image matching methods, areas of occlusion are addressed by detection after matching. Some research studies have compensated for occlusion problems by using hard constraints, such as the uniqueness constraint (Kolmogorov and Zabih, 2001) and the ordering constraint (Silva and Santos-Victor, 2000). The uniqueness constraint enforces one-to-one mapping between pixels in two images, while the ordering constraint preserves order along scanlines. However, the abilities of these constraints to solve occlusion issues are limited. The computer vision community has also used image segmentation to estimate disparities in close-range images (Bleyer and Gelautz, 2005; Bleyer et al., 2011). The resulting disparity maps are normally interpolated from the matching results and are relatively smooth. However, images of urban areas are very complicated. Disparities vary drastically between buildings, and may not be smooth even within a single building. Therefore, a direct disparity interpolation is not rigorous.
This paper proposes an integrated image matching and segmentation approach. By integrating image matching with image segmentation, objects in images with significantly different disparities are handled separately. The image segmentation results are used for occlusion filtering and similarity measurements to provide better matching results for 3D surface reconstruction in urban areas.
Integrated Image Matching and Segmentation Approach
The integrated image matching and segmentation approach, named SATM+, is based on our existing self-adaptive triangulation-constrained matching (SATM) framework (Wu et al., 2011, 2012). In the SATM framework, the first step is to identify several robust seed points on stereo images, which are used to