terrain; however, an aerial image task poses two difficulties (Song et al. 2019). The first is multi-temporal image matching: the radiometric values of a pixel can change greatly and unexpectedly with weather and sun conditions, for example when one image is taken in the daytime and the other at night. Registration of multi-temporal images is generally considered difficult because of this scene variation; that is, illumination variation occurs when images are obtained at different times (Zhao and Goshtasby 2016). The second difficulty of point matching in aerial images results from viewpoint variation. An unmanned aerial vehicle (UAV) may rotate during flight, so the on-board camera does not always remain parallel to the ground; that is, viewpoint variation occurs between adjacent images. Registration becomes even more challenging when the images are taken in urban areas, where buildings exhibit different appearances depending on the camera angle. The learned descriptors PPD and TFeat, along with the classical handcrafted descriptors SIFT and BRISK, are used to evaluate image matching performance between images under viewpoint and illumination variation.
Close-range images: Two image pairs from the HPatches data set, i_leuven and v_graffiti, under illumination and viewpoint change respectively, are used as the experimental data. Table 6 shows their basic information, including the change type of each pair and the number of interest points extracted from each pair using the BRISK and SIFT detectors.
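To make the interest-point counts in Table 6 concrete, the following is a minimal sketch of how such counts can be obtained with OpenCV in Python. The file names follow the HPatches folder layout but are assumptions; the paper does not state the exact paths used.

    import cv2

    # Placeholder paths for the two HPatches image pairs (assumption:
    # the actual files used in the experiment are not given in the text).
    images = ["i_leuven/1.ppm", "i_leuven/2.ppm",
              "v_graffiti/1.ppm", "v_graffiti/2.ppm"]

    brisk = cv2.BRISK_create()  # handcrafted binary detector/descriptor
    sift = cv2.SIFT_create()    # handcrafted gradient-based detector/descriptor

    for path in images:
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        print(path,
              "BRISK:", len(brisk.detect(img, None)),
              "SIFT:", len(sift.detect(img, None)))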
The number of correspondences after outlier removal using RANSAC for different combinations of detectors and descriptors is shown in Figure 11. For the image pair v_graffiti with rotation variation, the learned deep descriptors, including TFeat and the proposed PPD, clearly outperform the traditional handcrafted descriptors such as SIFT and BRISK.
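For reference, the following is a minimal sketch of the detect-describe-match-RANSAC pipeline that produces such correspondence counts, written in Python with OpenCV. It pairs each classical detector with its own descriptor; combining a detector with a learned descriptor such as TFeat or PPD would replace the description step with network inference (see the sketch after Table 5). The ratio-test threshold (0.8) and the RANSAC reprojection error (3 pixels) are illustrative assumptions, not values reported here.

    import cv2
    import numpy as np

    def count_ransac_inliers(img1, img2, detector, norm=cv2.NORM_HAMMING):
        # Detect interest points and compute descriptors.
        kp1, des1 = detector.detectAndCompute(img1, None)
        kp2, des2 = detector.detectAndCompute(img2, None)

        # Nearest-neighbor matching with Lowe's ratio test.
        matcher = cv2.BFMatcher(norm)
        matches = [m for m, n in matcher.knnMatch(des1, des2, k=2)
                   if m.distance < 0.8 * n.distance]

        # Outlier removal: fit a homography with RANSAC and keep inliers.
        src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
        dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
        _, mask = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
        return int(mask.sum())  # correspondences surviving RANSAC

    img1 = cv2.imread("v_graffiti/1.ppm", cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread("v_graffiti/2.ppm", cv2.IMREAD_GRAYSCALE)
    print("BRISK:", count_ransac_inliers(img1, img2, cv2.BRISK_create()))
    print("SIFT:", count_ransac_inliers(img1, img2, cv2.SIFT_create(),
                                        norm=cv2.NORM_L2))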
Figure 8. Feature point matching results of image pair Pair1 after outlier removal using RANSAC through different combinations of detectors and descriptors. The first row of images uses the BRISK detector to detect interest points and (a) BRISK, (b) TFeat, and (c) the proposed PPD to obtain the descriptors. Analogously, the second to fifth rows show the correspondences using the SIFT, ORB, SURF, and AKAZE detection algorithms.
Figure 9. Feature point matching results of image pair Pair2 after outlier removal using RANSAC through different combinations of detectors and descriptors.
Figure 10. Feature point matching results of image pair Pair3 after outlier removal using RANSAC through different combinations of detectors and descriptors.
Table 5. The number of correspondences of the three image pairs using different combinations of detectors and descriptors. PPD is the proposed learned deep descriptor. Bold numbers indicate that the corresponding descriptors have the top performance.

    Combination    Pair1   Pair2   Pair3
    BRISK          2658    2047    10 361
    BRISK+TFeat    2717    2366     4081
    BRISK+PPD      5112    4081    12 477
    ORB            3017    1690    12 236
    ORB+TFeat      3095    1978    11 301
    ORB+PPD        4031    2266    13 819
    AKAZE          2030    1727     4432
    AKAZE+TFeat    1952    1605     5080
    AKAZE+PPD      3051    2522     5856
    SIFT           1296    1166     3749
    SIFT+TFeat     1493    1205     4377
    SIFT+PPD       2274    1943     5721
    SURF           1223    1205     4999
    SURF+TFeat      734     745     3984
    SURF+PPD       1688    1450     5471
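The detector+learned-descriptor combinations in Table 5 replace the handcrafted description step with network inference on patches cropped around the detected keypoints. The following is a minimal sketch of that patch-extraction step, assuming a hypothetical describe_patches function that stands in for TFeat or PPD inference; the 32 x 32 patch size and the scale factor are illustrative assumptions, not values from the text.

    import cv2
    import numpy as np

    def extract_patches(img, keypoints, patch_size=32, scale=1.0):
        # Crop one normalized patch per keypoint for a learned descriptor.
        # Patch size and scale are assumptions for illustration.
        patches = []
        for kp in keypoints:
            # Rotate and scale a window around the keypoint so the patch
            # is aligned with the keypoint's orientation and size.
            m = cv2.getRotationMatrix2D(kp.pt, kp.angle,
                                        patch_size / (scale * kp.size))
            m[0, 2] += patch_size / 2 - kp.pt[0]
            m[1, 2] += patch_size / 2 - kp.pt[1]
            patches.append(cv2.warpAffine(img, m, (patch_size, patch_size)))
        return np.stack(patches)

    img = cv2.imread("pair1_left.png", cv2.IMREAD_GRAYSCALE)  # placeholder
    kps = cv2.BRISK_create().detect(img, None)
    patches = extract_patches(img, kps)
    # describe_patches is a hypothetical stand-in for TFeat/PPD inference:
    # descriptors = describe_patches(patches)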