
TFeat (Balntas et al. 2016) is a shallow CNN that uses triplets to learn the local feature descriptor; it shows that ratio-loss-based methods are more suitable for patch-pair classification and that margin-loss-based methods work better in nearest-neighbor matching applications. The DeepCD network learns a floating descriptor and a complementary binary descriptor simultaneously by designing a data-dependent modulation layer and optimizing a joint loss function (Yang et al. 2017). Tian et al. (2017) propose a different patch-descriptor learning architecture and use several strategies in the training phase, including progressive sampling, the relative minimal distance of matching pairs, information supervision from the intermediate layers, and compactness of the learned descriptor. Mishchuk et al. (2017) propose a novel loss that maximizes the distance between the positive and the closest negative sample in a batch, and demonstrate state-of-the-art performance when adopting the same network structure as L2-Net.
Following the second line of strategies, we learn robust and discriminative descriptors, by adopting a triplet CNN structure and a corresponding loss function, that can replace the descriptors of handcrafted features such as SIFT.
Proposed Method
Given a set of $N$ training samples $I = (I_1, \ldots, I_N)$, $I_i \in \mathbb{R}^{m \times m}$, with a spatial resolution of $m \times m$ pixels, the proposed approach generates a discriminative descriptor $d_i$ ($d_i \in \mathbb{R}^{n}$) for each image patch $I_i$ using a pyramid triplet CNN.
Sampling Strategy
Sampling is important for training a neural network because many negative samples contribute nothing to the optimization and bias the learning step. Hence, many researchers focus on hard-mining techniques (i.e., selecting the proper negative samples). Our sampling strategy is the same as HardNet (Mishchuk et al. 2017), which selects the most challenging negative image patch for every anchor image patch. The descriptor $d_i$ of every image patch $I_i$ is obtained through the network. $L_2$ normalization is applied to every descriptor $d_i$: $\|d_i\| = 1$, $\forall i$. Thus, the distance between two descriptors $d_i$ and $d_j$ is computed via

$$\mathrm{dist}(d_i, d_j) = \sqrt{2 - 2\, d_i^{\top} d_j} \qquad (4)$$
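For concreteness, the following is a minimal sketch of Equation 4 (the paper does not specify an implementation framework; PyTorch and the function name pair_distance are our assumptions). For unit-norm descriptors, the Euclidean distance reduces to a function of the dot product:

```python
import torch
import torch.nn.functional as F

def pair_distance(di: torch.Tensor, dj: torch.Tensor) -> torch.Tensor:
    """Equation 4 for L2-normalized descriptors.

    For unit vectors, ||di - dj||_2 = sqrt(2 - 2 * di . dj).
    """
    di = F.normalize(di, p=2, dim=-1)  # enforce |d_i| = 1
    dj = F.normalize(dj, p=2, dim=-1)
    dot = (di * dj).sum(dim=-1)
    # clamp guards against tiny negative values from rounding
    return torch.sqrt((2.0 - 2.0 * dot).clamp(min=0.0))
```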
Suppose there are $N$ matching pairs $\{(d_1^a, d_1^p), \ldots, (d_N^a, d_N^p)\}$ in a batch. Then the distance matrix $D$ for the batch can be obtained by

$$D = \begin{pmatrix}
\mathrm{dist}(d_1^a, d_1^p) & \cdots & \mathrm{dist}(d_1^a, d_i^p) & \cdots & \mathrm{dist}(d_1^a, d_N^p) \\
\mathrm{dist}(d_2^a, d_1^p) & \cdots & \mathrm{dist}(d_2^a, d_i^p) & \cdots & \mathrm{dist}(d_2^a, d_N^p) \\
\vdots & & \vdots & & \vdots \\
\mathrm{dist}(d_N^a, d_1^p) & \cdots & \mathrm{dist}(d_N^a, d_i^p) & \cdots & \mathrm{dist}(d_N^a, d_N^p)
\end{pmatrix} \qquad (5)$$
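Because all descriptors are unit-normalized, Equation 5 can be computed for the whole batch with a single matrix multiplication. A sketch under the same PyTorch assumption as above (distance_matrix is our name, not from the paper):

```python
import torch
import torch.nn.functional as F

def distance_matrix(da: torch.Tensor, dp: torch.Tensor) -> torch.Tensor:
    """Equation 5: D[i, j] = dist(d_i^a, d_j^p) for a batch of N pairs."""
    da = F.normalize(da, p=2, dim=1)  # (N, n) anchor descriptors
    dp = F.normalize(dp, p=2, dim=1)  # (N, n) positive descriptors
    dots = da @ dp.t()                # all N x N dot products at once
    return torch.sqrt((2.0 - 2.0 * dots).clamp(min=0.0))
```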
The elements on the diagonal of the matrix represent the distances between the descriptors of the corresponding matching pairs $(I_i^a, I_i^p)$. For each anchor patch $I_i^a$, the closest negative patch $I_{j^*}^p$ corresponds to the smallest element in the $i$-th row of $D$, that is, $j^* = \arg\min_j \mathrm{dist}(d_i^a, d_j^p)$, $j \in [1, N]$, $j \neq i$. Similarly, the closest nonmatching image patch $I_{k^*}^a$ for the image patch $I_i^p$ corresponds to the smallest element in the $i$-th column of the distance matrix $D$, that is, $k^* = \arg\min_k \mathrm{dist}(d_k^a, d_i^p)$, $k \in [1, N]$, $k \neq i$. Thus, a triplet sample is defined via Equation 6:
$$\mathrm{triplet}\,(I_i^a, I_i^p) = \begin{cases}
(I_i^a,\, I_i^p,\, I_{j^*}^p), & \mathrm{dist}(d_i^a, d_{j^*}^p) \le \mathrm{dist}(d_{k^*}^a, d_i^p) \\
(I_i^a,\, I_i^p,\, I_{k^*}^a), & \mathrm{dist}(d_i^a, d_{j^*}^p) > \mathrm{dist}(d_{k^*}^a, d_i^p)
\end{cases} \qquad (6)$$
In this way, the triplet sample, consisting of the matching pair $(I_i^a, I_i^p)$ and the hardest negative sample, whose descriptor is the closest to one patch of the matching pair, is used to train the proposed network.
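A hedged sketch of this mining rule, given the batch distance matrix $D$ from Equation 5 (the function name hardest_in_batch is ours); the diagonal is masked out so a matching pair is never selected as its own negative:

```python
import torch

def hardest_in_batch(D: torch.Tensor):
    """Mine the hardest negative for each matching pair (Equation 6).

    D is the N x N matrix of Equation 5; D[i, i] holds the matching-
    pair distances, so the diagonal is masked before taking minima.
    """
    n = D.size(0)
    diag = torch.eye(n, dtype=torch.bool, device=D.device)
    off = D.masked_fill(diag, float("inf"))  # never pick a matching pair

    row_min, j_star = off.min(dim=1)  # dist(d_i^a, d_{j*}^p) per row
    col_min, k_star = off.min(dim=0)  # dist(d_{k*}^a, d_i^p) per column

    pos = D.diagonal()                 # dist(d_i^a, d_i^p)
    neg = torch.min(row_min, col_min)  # closer of the two negatives (Eq. 6)
    return pos, neg, j_star, k_star
```

The mined positive and hardest-negative distances would then feed a margin-based loss such as the HardNet loss of Mishchuk et al. (2017), i.e., the batch mean of $\max(0,\, 1 + \mathrm{pos} - \mathrm{neg})$.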
Network Architecture
The schematic of the triplet network is depicted in Figure 1. The corresponding descriptors ($d^a$, $d^p$, $d^n$) for the anchor, positive, and negative patches are produced by the same CNN with shared weights.

[Figure 1. Schematic of the triplet network, where three image patches are processed by the same CNN. The details of the CNN structure are shown in Figure 2.]
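As a sketch of the shared-weight arrangement in Figure 1 (again assuming PyTorch; the actual layer configuration is given in Figure 2, so cnn below is a placeholder module, and TripletNet is our name):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TripletNet(nn.Module):
    """Three branches that share one descriptor CNN, as in Figure 1."""

    def __init__(self, cnn: nn.Module):
        super().__init__()
        # Placeholder for the pyramid CNN whose layers Figure 2 specifies.
        self.cnn = cnn

    def embed(self, patch: torch.Tensor) -> torch.Tensor:
        # Flatten the CNN output and L2-normalize, per the sampling section.
        d = self.cnn(patch).flatten(start_dim=1)
        return F.normalize(d, p=2, dim=1)

    def forward(self, anchor, positive, negative):
        # One set of weights processes all three patches.
        return self.embed(anchor), self.embed(positive), self.embed(negative)
```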