pixels in our experiments. Table 9 shows the basic information of the keypoints using different feature detection algorithms, including BRISK and SIFT, in the three image pairs.
The number of correspondences of the three image pairs using different combinations of detectors and descriptors is shown in Table 10. Although all detection algorithms can find correspondences, the proposed PPD descriptor finds more matches than SIFT, BRISK, and TFeat. Figures 17, 18, and 19 show the correspondences in oblique image pairs using different combinations of handcrafted feature detection algorithms and learned descriptors.
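Pairing a handcrafted detector with a learned descriptor amounts to detecting keypoints first and then describing a fixed-size patch cropped around each one. A minimal sketch of the patch-cropping step in plain NumPy; the 32 × 32 patch size and the border-rejection rule are assumptions for illustration, not values taken from this article:

```python
import numpy as np

def extract_patch(image, x, y, size=32):
    """Crop a size x size patch centered on keypoint (x, y).

    Keypoints whose support region leaves the image are rejected
    (returns None), mirroring the usual practice when preparing
    patches for a learned descriptor network.
    """
    h, w = image.shape[:2]
    half = size // 2
    x, y = int(round(x)), int(round(y))
    if x - half < 0 or y - half < 0 or x + half > w or y + half > h:
        return None
    return image[y - half:y + half, x - half:x + half]

# A keypoint well inside the image yields a full patch;
# one near the border is discarded.
img = np.random.rand(120, 160).astype(np.float32)
patch = extract_patch(img, 80.3, 60.7)
assert patch.shape == (32, 32)
assert extract_patch(img, 5.0, 5.0) is None
```

Each surviving patch is then fed to the descriptor (handcrafted or learned), so any detector can be combined with any descriptor.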
Comparison of the Algorithm Efficiency
The testing experiments for the deep learning–based descriptors are carried out on a GPU with the following configuration: an NVIDIA GTX 1060 with 6 GB of memory. The nonlearning descriptors are generated on a CPU with the following configuration: an Intel Core i7-7700HQ 2.8 GHz processor. Since the learned deep descriptors are based on Python, the default OpenCV implementations of the traditional handcrafted SIFT and BRISK descriptors in Python are selected for comparison. As shown in Table 11, the binary BRISK descriptor is the fastest among all descriptors and needs about half the time of the classical floating-point SIFT descriptor. The learning-based descriptors, both TFeat and the proposed PPD, generate 128-dimensional floating-point descriptors like SIFT. TFeat is slightly slower than the SIFT descriptor, and the proposed PPD is about 3.3 times slower than SIFT.
Table 11. Time consumption of extracting a descriptor from a patch using different algorithms.

            BRISK    SIFT     TFeat    Proposed
Time (ms)   0.019    0.036    0.045    0.119
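A per-patch timing comparison like Table 11 can be obtained by averaging wall-clock time over many patches. A minimal sketch with a stand-in descriptor function; a real measurement would substitute the OpenCV SIFT/BRISK extractors and the learned networks, so every function name here is illustrative only:

```python
import time
import numpy as np

def l2_normalized_descriptor(patch):
    # Stand-in for a real descriptor: flatten the patch and
    # L2-normalize it, since SIFT-like and learned descriptors
    # are typically compared as unit-length vectors.
    v = patch.astype(np.float32).ravel()
    return v / (np.linalg.norm(v) + 1e-8)

def time_per_patch(descriptor_fn, patches):
    # Average extraction time in milliseconds per patch.
    t0 = time.perf_counter()
    for p in patches:
        descriptor_fn(p)
    return (time.perf_counter() - t0) * 1000.0 / len(patches)

patches = [np.random.rand(32, 32) for _ in range(1000)]
ms = time_per_patch(l2_normalized_descriptor, patches)
assert ms > 0.0
```

Averaging over a large batch, as above, is what makes sub-millisecond figures such as those in Table 11 measurable at all.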
Discussion
From the extensive experiments, the proposed learning-based deep descriptor PPD was proven to be an effective approach to extracting descriptors from a given image patch. Specifically, several practical observations from the experiments are summarized as follows:
1. The patch descriptor extracted by the proposed learning-based deep descriptor PPD is superior to other state-of-the-art learning-based descriptors as well as the handcrafted feature descriptors on the Brown data set. Except for the HardNet descriptor, the proposed PPD descriptor obtains higher performance on the HPatches benchmark data set when those learning-based descriptors are trained on the Liberty subset of the Brown data set. The experiments on real remote sensing data sets show that the proposed PPD can also be applied to feature matching in remote sensing image pairs, and it can find more correct correspondences than classical handcrafted feature descriptors, such as BRISK, SIFT, ORB, SURF, and AKAZE. In addition, the descriptors extracted by the proposed PPD are more distinctive, so the correspondences contain fewer incorrect matches when compared to the handcrafted SIFT descriptor. Table 12 shows the correct ratios of the correspondences using the SIFT and PPD descriptors, and Figure 20 indicates that the proposed PPD yields fewer incorrect matches than the SIFT descriptor. This is because the learned descriptor PPD has a stronger ability to distinguish the nonmatching patches from all patches.
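The pipeline implied here is nearest-neighbour descriptor matching followed by RANSAC filtering; Table 12's "correct ratio" is then inliers over total matches. A sketch of the matching step with Lowe's ratio test in NumPy; the 0.8 threshold is an assumption rather than a value from this article, and the RANSAC stage is omitted:

```python
import numpy as np

def match_descriptors(d1, d2, ratio=0.8):
    """Nearest-neighbour matching with Lowe's ratio test.

    d1, d2: (N, 128) L2-normalized descriptor arrays. A pair
    (i, j) is kept only when the best match is clearly closer
    than the second best, which suppresses ambiguous matches.
    """
    # Pairwise Euclidean distances, shape (len(d1), len(d2)).
    dists = np.linalg.norm(d1[:, None, :] - d2[None, :, :], axis=2)
    order = np.argsort(dists, axis=1)
    best, second = order[:, 0], order[:, 1]
    rows = np.arange(len(d1))
    keep = dists[rows, best] < ratio * dists[rows, second]
    return [(int(i), int(best[i])) for i in np.where(keep)[0]]

# Matching a descriptor set against itself: each descriptor
# should match its own copy.
rng = np.random.default_rng(0)
d = rng.normal(size=(50, 128)).astype(np.float32)
d /= np.linalg.norm(d, axis=1, keepdims=True)
matches = match_descriptors(d, d)
assert all(i == j for i, j in matches)
```

A more distinctive descriptor widens the gap between the best and second-best distances, so more of its tentative matches survive both the ratio test and the subsequent RANSAC check.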
2. The learned descriptor PPD is more robust to viewpoint variation than the classical descriptors BRISK and SIFT. Figure 11 illustrates that the learned deep descriptor is superior to handcrafted features on v_graffiti under viewpoint changes using either SIFT or BRISK interest points. This finding suggests that the learned descriptor may have the potential to solve the oblique image matching problem, which is a bottleneck in 3D city modeling in photogrammetry due to the nonzero pitch and roll angles of the platform. The experiments on i_leuven reveal that the proposed PPD exceeds the SIFT descriptor but is worse than the BRISK descriptor. Since SIFT is a widely used feature detection and matching algorithm in the field of photogrammetry, the results on remote sensing data, including satellite and oblique image pairs, suggest that it is possible to use the proposed descriptor as a replacement for the SIFT descriptor in aerial images captured under different illumination and viewpoint conditions. Note that the learned descriptor PPD used in this article is trained on only one data set, Liberty. The effectiveness of the PPD descriptor could increase significantly if the deep descriptor were trained on a data set with more diversity, including viewpoint and illumination variations.
3. We observed that the proposed deep descriptor PPD is about 3.3 times slower than the classical SIFT descriptor. This is an inevitable limitation of learning-based descriptors. One way to decrease the matching time is to use a very shallow network to learn the descriptor for an image patch, as adopted in TFeat. However, the performance of TFeat, based on a shallow network, is clearly worse than that of PPD, which uses deeper layers. It is well known that the speed of extracting a deep descriptor using a
Table 12. The correct ratio of the matching results for the three image pairs using SIFT and the proposed PPD descriptor. Total: the number of correspondences before RANSAC. Correct: the number of correspondences after RANSAC. Ratio: the ratio between Correct and Total.

             Pair1                   Pair2                   Pair3
       Total  Correct  Ratio   Total  Correct  Ratio   Total  Correct  Ratio
PPD     2660     2274  0.855    2402     1983  0.826    6145     5721  0.931
SIFT    2083     1296  0.622    1834     1166  0.636    5098     3749  0.735
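The Ratio column in Table 12 is simply Correct divided by Total, and the published values can be checked directly:

```python
# Table 12 entries as (Total, Correct) per image pair; the Ratio
# column should equal Correct / Total rounded to three decimals.
table12 = {
    "PPD":  [(2660, 2274), (2402, 1983), (6145, 5721)],
    "SIFT": [(2083, 1296), (1834, 1166), (5098, 3749)],
}
ratios = {k: [round(c / t, 3) for t, c in v] for k, v in table12.items()}
assert ratios["PPD"] == [0.855, 0.826, 0.931]
assert ratios["SIFT"] == [0.622, 0.636, 0.735]
```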
Figure 20. Feature point matching results of image pair Pair1 using SIFT and the proposed PPD descriptor.
684    September 2019    PHOTOGRAMMETRIC ENGINEERING & REMOTE SENSING