4FP-Structure: A Robust Local Region Feature Descriptor
Jiayuan Li, Qingwu Hu, and Mingyao Ai
School of Remote Sensing and Information Engineering, Wuhan University, Wuhan, China
Abstract
Establishing reliable correspondences between images of the same scene remains challenging due to repetitive texture and unknown distortion. In this paper, we propose a region-matching method that simultaneously filters false matches and maximizes good correspondences between images, even those with irregular distortion. First, a novel region descriptor, represented by a structure formed by four feature points (4FP-Structure), is presented to simplify matching under severe deformation. Furthermore, an expansion stage based on this special 4FP-Structure is adopted to detect and select as many correspondences with high location accuracy as possible under a local affine-transformation constraint. Extensive experiments on both rigid and non-rigid image datasets demonstrate that the proposed algorithm achieves a very high correctness rate and significantly outperforms other state-of-the-art methods.
Introduction
As a basic step for many remote sensing and computer vision applications, such as image registration (Brown and Lowe, 2003), structure from motion (Snavely et al., 2006), and simultaneous localization and mapping (SLAM) (Montemerlo et al., 2002), automatic image matching has been well studied in recent years. Current feature matching algorithms (Bay et al., 2008; Ke and Sukthankar, 2004; Lourenço et al., 2012; Lowe, 2004; Rublee et al., 2011; Tola et al., 2010) typically consist of three major stages: keypoint detection, keypoint description, and keypoint matching. In the first stage, salient and stable interest points are extracted. These keypoints are then described based on their photometric neighborhoods using properties such as local gradients. In the third stage, the distances between the descriptor vectors are computed to identify reliable correspondences. Among these methods, the most famous is the scale-invariant feature transform (SIFT) (Lowe, 2004), owing to its robustness to changes in image scale, rotation, illumination, and viewpoint.
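To make this three-stage pipeline concrete, the following minimal sketch runs it with OpenCV's SIFT implementation; the image paths and the 0.75 ratio threshold are illustrative assumptions, not values from this paper.

```python
# Minimal sketch of the detect / describe / match pipeline using OpenCV.
import cv2

img1 = cv2.imread("left.jpg", cv2.IMREAD_GRAYSCALE)   # assumed inputs
img2 = cv2.imread("right.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
# Stages 1 and 2: keypoint detection and description in one call.
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Stage 3: match descriptors by Euclidean distance, keeping a match only
# when the nearest neighbor is clearly better than the second nearest
# (Lowe's ratio test).
matcher = cv2.BFMatcher(cv2.NORM_L2)
knn = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in knn if m.distance < 0.75 * n.distance]
print(f"{len(good)} putative correspondences")
```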
For rigid scenes, such a framework can achieve remarkable results: point correspondences can be produced with a high correctness rate. Although there are some false matches caused by ambiguities that arise from poor or repetitive texture, a postprocessing step such as RANSAC (Fischler and Bolles, 1981) or graph matching (Conte et al., 2004) can be adopted. The RANSAC algorithm is a robust technique for model fitting with noise and outliers, and it has been widely used in computer vision and machine learning. The basic idea of RANSAC is simple but effective: first, randomly select a subset of correspondences to compute a candidate fundamental or homography matrix, because perspective images satisfy the epipolar or homography constraint. Then, count the number of correspondences that support this transformation model. If the number is sufficiently large, the transformation matrix can be considered a good solution; the matches that support it are accepted as inliers, while the others are discarded as outliers. RANSAC, however, works well only if two prerequisites are satisfied. The first is a sufficiently high inlier rate: Liu and Marlet (2012) report that RANSAC-like methods (Chum and Matas, 2005b; Chum et al., 2003; Torr and Zisserman, 2000) may fail or become very time-consuming when the inlier rate is less than 50 percent, because the number of required iterations grows enormously as the inlier rate shrinks. The second is the transformation model: a putative model must be given in advance, and the inlier set should satisfy this model well.
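As an illustration of the procedure just described, here is a bare-bones RANSAC sketch for a homography (not the authors' code); the 3-pixel error threshold and 0.99 confidence level are illustrative assumptions. It also makes the low-inlier-rate failure mode visible: for inlier rate w, roughly log(1 - conf)/log(1 - w^4) random draws are needed, which explodes as w drops.

```python
import numpy as np
import cv2

def ransac_homography(pts1, pts2, thresh=3.0, conf=0.99, max_iters=2000):
    """pts1, pts2: Nx2 float arrays of putative correspondences."""
    n = len(pts1)
    best_inliers, best_H = np.zeros(n, dtype=bool), None
    iters, k = max_iters, 0
    while k < iters:
        # 1. Randomly select a minimal subset (4 correspondences) and
        #    fit a candidate homography to it.
        idx = np.random.choice(n, 4, replace=False)
        H = cv2.getPerspectiveTransform(pts1[idx].astype(np.float32),
                                        pts2[idx].astype(np.float32))
        # 2. Count the correspondences that support this model.
        proj = cv2.perspectiveTransform(
            pts1.reshape(-1, 1, 2).astype(np.float32), H).reshape(-1, 2)
        inliers = np.linalg.norm(proj - pts2, axis=1) < thresh
        # 3. Keep the best model and adapt the iteration count; the
        #    required number of draws grows explosively once the inlier
        #    rate w falls below about 0.5.
        if inliers.sum() > best_inliers.sum():
            best_inliers, best_H = inliers, H
            w = inliers.mean()
            if 0.0 < w < 1.0:
                iters = min(max_iters,
                            int(np.log(1.0 - conf) / np.log(1.0 - w**4)) + 1)
        k += 1
    return best_H, best_inliers
```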
Graph matching (Cho and Lee, 2012; Conte et al., 2004; Duchenne et al., 2011) is another powerful and general tool for feature matching. It represents scene images as graphs built on feature points, and correct correspondences can be extracted by solving a global optimization function that minimizes the structural distortions between graphs (Cho and Lee, 2012). Unlike the RANSAC algorithm, which only uses rigid geometric constraints, graph matching can also be applied to non-rigid scenes. However, current methods still assume that the inlier rate is relatively high, and the large number of outliers arising from strong distortion may make them impractical. For instance, Duchenne et al. (2011) show that if the outlier rate exceeds 70 percent, the performance of graph matching drops severely. Another problem of graph matching is that it is NP-hard, so the computational costs in time and memory limit the permissible sizes of the input graphs.
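To sketch this formulation, the snippet below scores candidate matches by pairwise geometric consistency and relaxes the resulting quadratic objective max x'Mx with power iteration, in the spirit of spectral matching (Leordeanu and Hebert, 2005). This is one representative solver, not the specific algorithms of the works cited above; `cands`, `p1`, `p2`, and the bandwidth `sigma` are illustrative assumptions.

```python
import numpy as np

def spectral_match(cands, p1, p2, sigma=10.0, iters=50):
    """cands: list of (i, j) candidate matches; p1, p2: point coordinates."""
    m = len(cands)
    M = np.zeros((m, m))
    # Pairwise affinity: two candidate matches agree when they preserve
    # the distance between their points; conflicting assignments score 0.
    for a, (i1, j1) in enumerate(cands):
        for b, (i2, j2) in enumerate(cands):
            if a == b or i1 == i2 or j1 == j2:
                continue
            d1 = np.linalg.norm(p1[i1] - p1[i2])
            d2 = np.linalg.norm(p2[j1] - p2[j2])
            M[a, b] = np.exp(-(d1 - d2) ** 2 / sigma ** 2)
    # Power iteration for the principal eigenvector of M; its large
    # entries indicate mutually consistent (likely correct) matches.
    x = np.ones(m) / np.sqrt(m)
    for _ in range(iters):
        x = M @ x
        x /= np.linalg.norm(x)
    return x  # per-candidate confidence scores
```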
In this paper, we focus on feature matching for non-rigid scenes, e.g., fisheye images. A fisheye lens has a large field of view (FOV), which is needed for many vision tasks in photogrammetry and computer vision. For instance, five fisheye images are sufficient for 360° panoramic stitching, whereas nine perspective images are needed; self-driving vehicles (Geiger et al., 2012) need a large FOV to accurately sense the environment and plan their route. However, fisheye images have an inherent drawback: their distortion is severe. Because of this, SIFT (Lowe, 2004) usually cannot work well, and the outlier rate may be very high (above 50 percent). In addition, a fisheye image no longer satisfies the homography constraint, and it has its own epipolar geometry that can be applied only if calibration information is provided. These issues make feature matching challenging, as the prerequisites of RANSAC and graph matching are not well satisfied.
To accurately distinguish inliers from outliers in both rigid and non-rigid images, a region-matching method is proposed. We first define a 4FP-Structure, formed by four neighboring feature points, to represent a local region. Using local regions instead of individual feature points for matching has two advantages: (a) the 4FP-Structure is a 4-node graph that is able to resist the distortion within a small region, and it contains four feature points that can restrain each other to