September 2019 Full

convolutional network is largely dependent on the hardware configuration. With the development of hardware technology, it is possible that deep learning–based methods will take less time to extract the deep descriptor.

Conclusions

In this article, we present a pyramid convolutional triplet neural network and a novel distance-based loss function to learn the patch descriptor. First, a hard mining strategy selects the hardest negative patch and the positive patch to form a triplet sample for a given patch. Second, a pyramid network is applied to the first convolutional layer to incorporate the global context of the image patch. Finally, we design a new distance loss function that does not require the margin of the triplet network to be set manually, which can avoid the scale problem that commonly occurs in triplet networks.

Experiments demonstrate that the proposed deep descriptor PPD is the most effective descriptor among all handcrafted features and learning-based deep descriptors on the Brown benchmark. With the exception of HardNet, the proposed PPD achieves the top performance on the three tasks (patch verification, image matching, and patch retrieval) in the Easy, Hard, and Tough modes of the HPatches benchmark data set. Three real aerial image pairs are used to demonstrate that the proposed PPD can find more correct correspondences than the BRISK, SIFT, ORB, SURF, and AKAZE descriptors when the interest points are detected by one of those feature detectors. In addition, the proposed learning-based PPD is more robust and effective not only on ordinary aerial image pairs but also on image pairs with viewpoint and illumination variation.
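The hard mining step summarized above can be illustrated with a minimal sketch. It is not the implementation used in this article: it assumes that descriptors are plain lists of floats, that row i of `anchors` matches row i of `positives`, and that the hardest negative for an anchor is the closest non-matching descriptor among the other positives in the batch. The function names `l2_distance` and `mine_hardest_triplets` are hypothetical, and the margin-free distance loss is deliberately not reproduced here.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mine_hardest_triplets(anchors, positives):
    """Return one (anchor, positive, negative) index triple per anchor.

    Assumption: anchors[i] and positives[i] describe the same patch, so
    every positives[j] with j != i is a candidate negative for anchor i.
    The hardest negative is the candidate closest to the anchor.
    """
    triplets = []
    for i, anchor in enumerate(anchors):
        best_j, best_d = None, math.inf
        for j, candidate in enumerate(positives):
            if j == i:
                continue  # skip the true match
            d = l2_distance(anchor, candidate)
            if d < best_d:
                best_j, best_d = j, d
        triplets.append((i, i, best_j))
    return triplets


if __name__ == "__main__":
    # Toy 2-D descriptors, chosen only for illustration.
    anchors = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
    positives = [[0.1, 0.0], [1.1, 0.0], [5.0, 5.1]]
    print(mine_hardest_triplets(anchors, positives))
```

In this toy batch, the first two patches lie close together, so each is selected as the hardest negative of the other. Such near-duplicate negatives are the cases that a hard mining strategy is meant to surface during training.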

Acknowledgments

We thank Professor Chen Feng of New York University (NYU) and NYU for providing high-performance computing resources. Jie Wan thanks the China Scholarship Council for its scholarship support.

Conflicts of Interest: The authors declare no conflict of interest.

Funding: This work was supported by the National Key Research and Development Program of China (2017YF0503004) and the National Natural Science Foundation of China under grant 41571432.

References

Balntas, V., K. Lenc, A. Vedaldi and K. Mikolajczyk. 2017. HPatches: A benchmark and evaluation of handcrafted and learned local descriptors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, held in Honolulu, Hawaii.

Balntas, V., E. Riba, D. Ponsa and K. Mikolajczyk. 2016. Learning local feature descriptors with triplets and shallow convolutional neural networks. In Proceedings of the British Machine Vision Conference, held in York, UK.

Balntas, V., L. Tang and K. Mikolajczyk. 2015. Bold-binary online learned descriptor for efficient image matching. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, held in Boston, Mass.

Bay, H., A. Ess, T. Tuytelaars and L. Van Gool. 2008. Speeded-up robust features (SURF). Computer Vision and Image Understanding 110 (3):346–359.

Bentoutou, Y., N. Taleb, K. Kpalma and J. Ronsin. 2005. An automatic image registration for applications in remote sensing. IEEE Transactions on Geoscience and Remote Sensing 43 (9):2127–2137.

Brown, M., G. Hua and S. Winder. 2011. Discriminative learning of local image descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence 33 (1):43–57.

Calonder, M., V. Lepetit, M. Ozuysal, T. Trzcinski, C. Strecha and P. Fua. 2012. BRIEF: Computing a local binary descriptor very fast. IEEE Transactions on Pattern Analysis and Machine Intelligence 34 (7):1281–1298.

Cheng, G., J. Han and X. Lu. 2017. Remote sensing image scene classification: Benchmark and state of the art. Proceedings of the IEEE 105 (10):1865–1883.

Dong, J. and S. Soatto. 2015. Domain-size pooling in local descriptors: DSP-SIFT. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, held in Boston, Mass.

Dufournaud, Y., C. Schmid and R. Horaud. 2004. Image matching with scale adjustment. Computer Vision and Image Understanding 93 (2):175–194.

Fischer, P., A. Dosovitskiy and T. Brox. 2014. Descriptor matching with convolutional neural networks: A comparison to SIFT. Available from arXiv:1405.5769.

Glorot, X., A. Bordes and Y. Bengio. 2011. Deep sparse rectifier neural networks. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, held in Fort Lauderdale, Fla.

Han, X., T. Leung, Y. Jia, R. Sukthankar and A. C. Berg. 2015. MatchNet: Unifying feature and metric learning for patch-based matching. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, held in Boston, Mass.

Hoffer, E. and N. Ailon. 2015. Deep metric learning using triplet network. In International Workshop on Similarity-Based Pattern Recognition, held in Copenhagen, Denmark: Springer.

Ioffe, S. and C. Szegedy. 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. Available from arXiv:.03167.

Ke, Y. and R. Sukthankar. 2004. PCA-SIFT: A more distinctive representation for local image descriptors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, held in Washington, D.C.

Kumar, B., G. Carneiro and I. Reid. 2016. Learning local image descriptors with deep siamese and triplet convolutional networks by minimising global loss functions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, held in Las Vegas, Nev.

LeCun, Y., L. Bottou, Y. Bengio and P. Haffner. 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE 86 (11):2278–2324.

Leutenegger, S., M. Chli and R. Y. Siegwart. 2011. BRISK: Binary robust invariant scalable keypoints. In Proceedings of the IEEE International Conference on Computer Vision, held in Colorado Springs, Colorado.

Lowe, D. G. 2004. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 60 (2):91–110.

Mishchuk, A., D. Mishkin, F. Radenovic and J. Matas. 2017. Working hard to know your neighbor’s margins: Local descriptor learning loss. In Advances in Neural Information Processing Systems, held in Long Beach, Calif.

Mitra, R., J. Zhang, S. Narayan, S. Ahmed, S. Chandran and A. Jain. 2017. Improved descriptors for patch matching and reconstruction. In Proceedings of the IEEE International Conference on Computer Vision (ICCV) Workshop, held in Venice, Italy.

Mur-Artal, R. and J. D. Tardós. 2017. ORB-SLAM2: An open-source SLAM system for monocular, stereo, and RGB-D cameras. IEEE Transactions on Robotics 33 (5):1255–1262.

Paszke, A., S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga and A. Lerer. 2017. Automatic differentiation in PyTorch. In Advances in Neural Information Processing Systems Workshop.

Schonberger, J. L. and J.-M. Frahm. 2016. Structure-from-motion revisited. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, held in Las Vegas, Nev.

Simo-Serra, E., C. Torras and F. Moreno-Noguer. 2015. DaLI: Deformation and light invariant descriptor. International Journal of Computer Vision 115 (2):136–154.

PHOTOGRAMMETRIC ENGINEERING & REMOTE SENSING

September 2019

685
