1. For scene d, a K-dimensional topic proportion θ is chosen according to a Dirichlet distribution Dir(α), where K is the number of topics.
2. For each visual word position in the scene, a topic z is first chosen from the multinomial distribution Mul(θ), and then a visual word w is chosen from p(w|z, β), a multinomial probability conditioned on topic z.
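This generative process can be illustrated with a short sketch. The vocabulary size, number of topics, word count, and parameter values below are arbitrary placeholders rather than settings from this study; the sketch only illustrates the two sampling steps above:

```python
import numpy as np

rng = np.random.default_rng(0)

K = 10          # number of topics (placeholder)
V = 500         # visual-word vocabulary size (placeholder)
N_WORDS = 200   # visual words in one scene (placeholder)

alpha = np.full(K, 0.5)                          # Dirichlet prior over topic proportions
beta = rng.dirichlet(np.full(V, 0.1), size=K)    # per-topic word distributions p(w | z, beta)

# Step 1: draw the scene-level topic proportion theta ~ Dir(alpha)
theta = rng.dirichlet(alpha)

# Step 2: for each visual-word position, draw a topic z ~ Mul(theta),
# then a visual word w ~ p(w | z, beta)
topics = rng.choice(K, size=N_WORDS, p=theta)
words = np.array([rng.choice(V, p=beta[z]) for z in topics])
```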
The above process shows that the model is controlled by α and β; thus, in the learning stage, our goal is to find the two parameters that maximize the log likelihood of the image dataset.
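Written out for reference, this objective is the standard LDA marginal log likelihood of the scene collection (D scenes, with N_d visual words in scene d):

$$\mathcal{L}(\alpha,\beta)=\sum_{d=1}^{D}\log p(\mathbf{w}_d\mid\alpha,\beta)=\sum_{d=1}^{D}\log\int p(\theta_d\mid\alpha)\prod_{n=1}^{N_d}\sum_{k=1}^{K}p(z_{dn}=k\mid\theta_d)\,p(w_{dn}\mid z_{dn}=k,\beta)\,d\theta_d$$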
It is clear that LDA is an unsupervised model, and the estimated topics are not specifically tailored for classification. To mark the category of a document directly, Jon and David (2008) developed sLDA, a supervised variant of LDA, and showed that sLDA fits the category of documents better than LDA. Since we are more concerned with the category of the scene than with its topics, the sLDA model is applied to tea garden scene detection in the proposed framework.
As described in Jon and David (2008), sLDA adds a response variable, which denotes the category of the scene (i.e., tea garden or non-tea garden in our study), to the generative process of LDA. After a scene is generated, the response variable associated with that scene is also generated. The learned model can then be used to classify unknown scenes.
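As a rough, self-contained illustration of how the response variable enters the model, the following sketch uses one binary-logistic instantiation of sLDA's generalized linear model; the coefficient vector eta and all values below are hypothetical, not parameters estimated in this study:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 10                                    # number of topics (placeholder)
topics = rng.choice(K, size=200)          # topic assignments of one scene's visual words

# sLDA draws the scene label from a generalized linear model of the empirical
# topic frequencies z_bar; eta is a hypothetical coefficient vector.
z_bar = np.bincount(topics, minlength=K) / topics.size
eta = rng.normal(size=K)
p_tea = 1.0 / (1.0 + np.exp(-eta @ z_bar))    # logistic link for the binary label
is_tea_garden = rng.random() < p_tea          # True = tea garden, False = non-tea garden
```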
Unsupervised Feature Learning
In the proposed framework, the UCNN is constructed with the plain k-means clustering method to achieve unsupervised multi-layer feature learning (Li et al., 2016). As depicted in Figure 4, the UCNN is composed of two feature extraction layers, and each layer contains three operations: convolution, local pooling, and global pooling. In the following, we take the first feature extraction layer as an example to introduce the three operations.
1. Convolution operation: The function of the convolution operation is feature mapping, which is defined under the constraint of function bases. The function bases need to be generated by an unsupervised learning algorithm, and k-means clustering is utilized in the proposed framework due to its good performance (Coates et al., 2010). As described in Li et al. (2016), unlabeled patches of dimension w × w × d are randomly sampled from the original image scenes, where w denotes the size of the receptive field and d is the number of image channels. We can then construct the feature set X = {x_1, x_2, …, x_M}, where x_i ∈ R^N (N = w × w × d) denotes the vectorized form of the i-th patch. After preprocessing by intensity normalization and zero component analysis (ZCA) whitening, the feature set X is clustered by the k-means approach, and the cluster centers form the function bases C = {c_1, c_2, …, c_K}, with c_i ∈ R^N. Once the function bases are generated, the convolution operation can be defined as follows. Let p denote the vectorized form of one sliding patch in the input image I; this patch can then be mapped onto the sparse feature vector f ∈ R^K:

   f_k = max{0, μ(z) − z_k}   (1)

   where z_k = ||p − c_k||_2, k = 1, 2, …, K, and μ(z) is the mean of the elements of z (an illustrative code sketch of this mapping and of the local pooling in Equation (2) is given at the end of this section). Through the convolution operation, we produce the feature map F of the input image I with dimension (h − w + 1) × (h − w + 1) × K, where h denotes the size of image I.
2. Local pooling operation: This pooling operation is implemented to keep slight translation and rotation invariance. Here, the local pooling operation is defined as:

   L(i/s, j/s, k) = max(F(i − s/2 : i + s/2, j − s/2 : j + s/2, k))   (2)

   where k = 1, 2, …, K and s denotes the local window size of the pooling operation.
3. Global pooling operation: The aim of the global pooling operation is to reduce the dimension of the feature. In the implementation, the output of the local pooling operation
Figure 2. Flowchart of the BOVW model.
Figure 3. Generative process of LDA.
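To make operations (1) and (2) concrete, the following sketch implements the core of the first feature-extraction layer on a single-band toy image. It assumes scikit-learn's KMeans in place of whichever plain k-means implementation Li et al. (2016) used; the patch size, number of bases, pooling window, and image are arbitrary placeholders; and the intensity normalization and ZCA whitening steps are reduced to a simple per-patch standardization:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

w, d, K, s = 5, 1, 16, 2       # receptive field, channels, number of bases, pooling window (placeholders)
h = 32                         # toy image size (placeholder)
image = rng.random((h, h, d))  # stand-in for one image scene

# --- learn the function bases C with plain k-means on randomly sampled patches ---
def sample_patches(img, n_patches=2000):
    ys = rng.integers(0, h - w + 1, n_patches)
    xs = rng.integers(0, h - w + 1, n_patches)
    patches = np.stack([img[y:y + w, x:x + w].reshape(-1) for y, x in zip(ys, xs)])
    # simplified preprocessing: per-patch standardization (stands in for the
    # intensity normalization + ZCA whitening described in the text)
    return (patches - patches.mean(axis=1, keepdims=True)) / (patches.std(axis=1, keepdims=True) + 1e-8)

X = sample_patches(image)
C = KMeans(n_clusters=K, n_init=10, random_state=0).fit(X).cluster_centers_  # shape (K, w*w*d)

# --- convolution operation, Equation (1): f_k = max{0, mu(z) - z_k}, z_k = ||p - c_k||_2 ---
out = h - w + 1
F = np.zeros((out, out, K))
for i in range(out):
    for j in range(out):
        p = image[i:i + w, j:j + w].reshape(-1)
        p = (p - p.mean()) / (p.std() + 1e-8)      # same simplified preprocessing as training
        z = np.linalg.norm(p - C, axis=1)          # distances to the K bases
        F[i, j] = np.maximum(0.0, z.mean() - z)    # sparse feature vector f in R^K

# --- local pooling operation, Equation (2): max over non-overlapping s x s windows ---
Fc = F[:(out // s) * s, :(out // s) * s]                       # crop to a multiple of s
L = Fc.reshape(out // s, s, out // s, s, K).max(axis=(1, 3))   # pooled feature map
print(F.shape, L.shape)  # (28, 28, 16) -> (14, 14, 16)
```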