convolutional network is largely dependent on the hardware
configuration. As hardware technology develops, deep
learning–based methods will likely take less time to extract
the deep descriptor.
Conclusions
In this article, we present a pyramid convolutional triplet
neural network and a novel distance-based loss function for
learning patch descriptors. First, a hard mining strategy
selects, for a given patch, the positive patch and the hardest
negative patch to form a triplet sample. Second, a pyramid
network is applied to the first convolutional layer to
incorporate the global context of the image patch. Finally, we
design a new distance loss function that does not require the
margin of the triplet network to be set manually, which avoids
the scale problem that commonly arises in triplet networks.
Experiments demonstrate that the proposed deep descriptor PPD
is the most effective descriptor [...] features and the
learning-based deep [...] benchmark. Except for HardNet, the
proposed [...] top performance among the three tasks of patch
verification, image matching, and patch retrieval [...] and
Tough modes on the HPatches benchmark data set. Three real
aerial image pairs are used to demonstrate that the proposed
PPD finds more correct correspondences than the BRISK, SIFT,
ORB, SURF, and AKAZE descriptors when the interest points are
detected by one of those feature detectors. In addition, the
proposed learning-based PPD is more robust and effective not
only for ordinary aerial image pairs but also for image pairs
with viewpoint and illumination variation.
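The hard mining step summarized above can be illustrated with a minimal sketch. This section does not specify the exact procedure, so the code assumes an in-batch strategy in the spirit of HardNet (Mishchuk et al. 2017): for each matching pair, the hardest negative is taken to be the closest non-matching descriptor in the batch. The function names and the toy descriptors are hypothetical and are not taken from the PPD implementation.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mine_hardest_triplets(anchors, positives):
    """Form one triplet per matching pair (anchors[i], positives[i]).

    Assumption: negatives are drawn from the positives of the *other*
    pairs in the batch, and the hardest negative is the one closest to
    the anchor. Returns tuples (anchor_idx, negative_idx, d_pos, d_neg).
    """
    if len(anchors) != len(positives) or len(anchors) < 2:
        raise ValueError("need at least two matching pairs of equal count")
    triplets = []
    for i, anchor in enumerate(anchors):
        d_pos = l2_distance(anchor, positives[i])
        # Search every non-matching descriptor for the smallest distance.
        d_neg, j_neg = min(
            (l2_distance(anchor, positives[j]), j)
            for j in range(len(positives))
            if j != i
        )
        triplets.append((i, j_neg, d_pos, d_neg))
    return triplets


# Toy 2-D "descriptors", for illustration only.
anchors = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
positives = [[0.0, 0.1], [1.0, 0.2], [5.0, 5.1]]
for i, j, d_pos, d_neg in mine_hardest_triplets(anchors, positives):
    print(f"anchor {i}: hardest negative {j}, "
          f"d_pos={d_pos:.3f}, d_neg={d_neg:.3f}")
```

A triplet loss would then compare d_pos with d_neg for each mined triplet. The margin-free distance loss proposed in this article is not reproduced here.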
Acknowledgments
We thank Professor Chen Feng of New York University (NYU) and
NYU for providing high-performance computing resources. Jie Wan
thanks the Chinese Scholarship Council for scholarship support.
Conflicts of Interest: The authors declare no conflict of interest.
Funding: This work was supported by the National Key Research
and Development of China (2017YF0503004) and the National
Natural Science Foundation of China under grant 41571432.
References
Balntas, V., K. Lenc, A. Vedaldi and K. Mikolajczyk. 2017. HPatches: A benchmark and evaluation of handcrafted and learned local descriptors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, held in Honolulu, Hawaii.
Balntas, V., E. Riba, D. Ponsa and K. Mikolajczyk. 2016. Learning local feature descriptors with triplets and shallow convolutional neural networks. In Proceedings of the British Machine Vision Conference, held in York, UK.
Balntas, V., L. Tang and K. Mikolajczyk. 2015. BOLD: Binary online learned descriptor for efficient image matching. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, held in Boston, Mass.
Bay, H., A. Ess, T. Tuytelaars and L. Van Gool. 2008. Speeded-up robust features (SURF). Computer Vision and Image Understanding 110 (3):346–359.
Bentoutou, Y., N. Taleb, K. Kpalma and J. Ronsin. 2005. An automatic image registration for applications in remote sensing. IEEE Transactions on Geoscience and Remote Sensing 43 (9):2127–2137.
Brown, M., G. Hua and S. Winder. 2011. Discriminative learning of local image descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence 33 (1):43–57.
Calonder, M., V. Lepetit, M. Ozuysal, T. Trzcinski, C. Strecha and P. Fua. 2012. BRIEF: Computing a local binary descriptor very fast. IEEE Transactions on Pattern Analysis and Machine Intelligence 34 (7):1281–1298.
Cheng, G., J. Han and X. Lu. 2017. Remote sensing image scene classification: Benchmark and state of the art. Proceedings of the IEEE 105 (10):1865–1883.
Dong, J. and S. Soatto. 2015. Domain-size pooling in local descriptors: DSP-SIFT. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, held in Boston, Mass.
Dufournaud, Y., C. Schmid and R. Horaud. 2004. Image matching with scale adjustment. Computer Vision and Image Understanding 93 (2):175–194.
Fischer, P., A. Dosovitskiy and T. Brox. 2014. Descriptor matching with convolutional neural networks: A comparison to SIFT. Available from arXiv:1405.5769.
Glorot, X., A. Bordes and Y. Bengio. 2011. Deep sparse rectifier neural networks. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, held in Fort Lauderdale, Fla.
Han, X., T. Leung, Y. Jia, R. Sukthankar and A. C. Berg. 2015. MatchNet: Unifying feature and metric learning for patch-based matching. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, held in Boston, Mass.
Hoffer, E. and N. Ailon. 2015. Deep metric learning using triplet network. In International Workshop on Similarity-Based Pattern Recognition, held in Copenhagen, Denmark: Springer.
Ioffe, S. and C. Szegedy. 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. Available from arXiv:1502.03167.
Ke, Y. and R. Sukthankar. 2004. PCA-SIFT: A more distinctive representation for local image descriptors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, held in Washington, D.C.
Kumar, B., G. Carneiro and I. Reid. 2016. Learning local image descriptors with deep siamese and triplet convolutional networks by minimising global loss functions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, held in Las Vegas, Nev.
LeCun, Y., L. Bottou, Y. Bengio and P. Haffner. 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE 86 (11):2278–2324.
Leutenegger, S., M. Chli and R. Y. Siegwart. 2011. BRISK: Binary robust invariant scalable keypoints. In Proceedings of the IEEE International Conference on Computer Vision, held in Barcelona, Spain.
Lowe, D. G. 2004. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 60 (2):91–110.
Mishchuk, A., D. Mishkin, F. Radenovic and J. Matas. 2017. Working hard to know your neighbor’s margins: Local descriptor learning loss. In Advances in Neural Information Processing Systems, held in Long Beach, Calif.
Mitra, R., J. Zhang, S. Narayan, S. Ahmed, S. Chandran and A. Jain. 2017. Improved descriptors for patch matching and reconstruction. In Proceedings of the IEEE International Conference on Computer Vision (ICCV) Workshop, held in Venice, Italy.
Mur-Artal, R. and J. D. Tardós. 2017. ORB-SLAM2: An open-source SLAM system for monocular, stereo, and RGB-D cameras. IEEE Transactions on Robotics 33 (5):1255–1262.
Paszke, A., S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga and A. Lerer. 2017. Automatic differentiation in PyTorch. In Advances in Neural Information Processing Systems Workshop.
Schönberger, J. L. and J.-M. Frahm. 2016. Structure-from-motion revisited. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, held in Las Vegas, Nev.
Simo-Serra, E., C. Torras and F. Moreno-Noguer. 2015. DaLI: Deformation and light invariant descriptor. International Journal of Computer Vision 115 (2):136–154.
PHOTOGRAMMETRIC ENGINEERING & REMOTE SENSING
September 2019
685