is divided into four quarters, and the averaging operation
is employed for each quarter. We let g_1, g_2, g_3, g_4 denote the corresponding feature vectors of every quarter, and then the global pooling result can be expressed as G = {g_1, g_2, g_3, g_4}, where the dimension of G is 4 × K.
The multi-layer features can be obtained by repeating the
convolution, local pooling, and global pooling operations.
Specifically, taking the original image as the input of the first
layer and implementing the three operations, we can obtain
the feature extracted by the first layer, i.e., the result of the
global pooling operation. Then, taking the result of the local
pooling operation in the first layer as the input of the second
layer, we can obtain the feature of the second layer by the
three operations. The features of the higher layers can be
similarly extracted. Finally, we integrate the features of each
layer as the final multi-layer features.
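To make the layer-wise procedure concrete, the following is a minimal NumPy sketch of one possible implementation. The filter banks are assumed to have been learned from unlabeled data beforehand; the max-pooling choice, kernel shapes, and function names are illustrative assumptions rather than the authors' exact implementation.

```python
import numpy as np
from scipy.signal import convolve2d

def convolve(x, filters):
    """Convolve a (C, H, W) input with a (K, C, fh, fw) learned filter bank,
    summing over input channels -> (K, H', W') feature maps."""
    return np.stack([
        sum(convolve2d(x[c], f[c], mode="valid") for c in range(x.shape[0]))
        for f in filters
    ])

def local_pool(maps, p):
    """Max-pool each feature map over non-overlapping p x p blocks (assumed)."""
    K, H, W = maps.shape
    m = maps[:, : H // p * p, : W // p * p].reshape(K, H // p, p, W // p, p)
    return m.max(axis=(2, 4))

def global_pool(maps):
    """Split each map into four spatial quarters and average within each,
    giving G = {g_1, g_2, g_3, g_4} of dimension 4 * K."""
    K, H, W = maps.shape
    h, w = H // 2, W // 2
    quarters = (maps[:, :h, :w], maps[:, :h, w:], maps[:, h:, :w], maps[:, h:, w:])
    return np.concatenate([q.mean(axis=(1, 2)) for q in quarters])

def multilayer_features(image, filter_banks, p=2):
    """Repeat convolution -> local pooling -> global pooling per layer;
    the local-pooling output of one layer is the next layer's input, and
    the per-layer global-pooling results are concatenated at the end."""
    x, feats = image, []          # image: (bands, H, W)
    for filters in filter_banks:
        x = local_pool(convolve(x, filters), p)
        feats.append(global_pool(x))
    return np.concatenate(feats)
```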
The Proposed Tea Garden Scene Detection Framework
The proposed scene-based framework for tea garden detection
is illustrated in Figure 5. First, the whole remotely sensed
image is divided into a series of scenes, with each scene referring to an image block with a certain semantic category. Since
a tea garden does not always occupy a whole scene, we utilize overlapping regions to decrease omissions, where the size of an overlapping region is half of a scene. Then, based
on the extracted spectral and textural features, the BOVW, sLDA, and UCNN representations of each scene are computed, respectively.
respectively. Note that the dictionary in
BOVW
or the
UCNN
is trained by unlabeled data before detection. Finally,
SVM
,
which is an efficient classifier for remotely sensed image
classification (Huang and Zhang, 2013; Qin, 2015), is trained
using a small number of labeled samples to classify these
representations into tea gardens and non-tea gardens, expect
for the
sLDA
model, which can directly predict the category
of each scene after training. In addition, for the purpose of
a comparison, the scenes with spectral and spectral-textural
features are also directly classified with
SVM
.
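The scene-division step described above can be sketched as follows; the band-first array layout and function name are assumptions of this sketch, not the authors' code.

```python
import numpy as np

def split_into_scenes(image, size):
    """Divide a (bands, H, W) image into square scenes of `size` pixels.
    The stride is half a scene, so neighboring scenes overlap by 50%,
    which reduces omissions when a tea garden covers only part of a scene."""
    step = size // 2
    _, H, W = image.shape
    return [image[:, r:r + size, c:c + size]
            for r in range(0, H - size + 1, step)
            for c in range(0, W - size + 1, step)]
```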
In our work, the Gabor filter (Lee, 1996) is used to extract
the textural features due to its high efficacy in remote sensing
image texture description (Reis and Taşdemir, 2011; Wang et al., 2014). It is defined as:
G(x, y) = \frac{1}{2\pi ab}\, e^{-\pi\left(\frac{x^2}{a^2} + \frac{y^2}{b^2}\right)}\, e^{2\pi i (ux + vy)} (3)
where a, b denote the scale along x and y, respectively, and u, v are the spatial frequencies of the filter in the frequency domain. To improve the calculation efficiency, the first principal
component acquired by principal component analysis (PCA)
is employed to extract the Gabor textures. In addition, for the
sample patches in the BOVW model, we calculate the mean and standard deviation of each spectral/Gabor band as the spectral/textural feature. In the same way, we also generate the spectral and textural feature vectors for direct classification with SVM, as a benchmark.
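A small sketch of Eq. (3) as a discrete kernel follows, together with the mean/standard-deviation feature described above. The scale and frequency values are those listed in the Experimental Setup below; the kernel size and function names are assumptions of this sketch.

```python
import numpy as np

def gabor_kernel(a, b, f, theta, size=31):
    """Complex Gabor kernel of Eq. (3); the spatial frequencies are taken
    in polar form, u = f*cos(theta), v = f*sin(theta)."""
    u, v = f * np.cos(theta), f * np.sin(theta)
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    gaussian = np.exp(-np.pi * (x**2 / a**2 + y**2 / b**2)) / (2 * np.pi * a * b)
    return gaussian * np.exp(2j * np.pi * (u * x + v * y))

# Nine-filter bank: three radial frequencies x three orientations, applied
# (per the text) to the first principal component of the image.
bank = [gabor_kernel(4, 4, f, t)
        for f in (0.006, 0.02, 0.06)
        for t in (0, np.pi / 3, 2 * np.pi / 3)]

def scene_feature(bands):
    """Mean and standard deviation of each spectral/Gabor band of a scene."""
    return np.concatenate([(band.mean(), band.std()) for band in bands])
```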
Experimental Data and Setup
Datasets
The experiments were carried out on four datasets, including one remotely sensed image from the WorldView-2 (WV-2) satellite and three images from Google Earth®. The details of the four datasets are listed in Table 1, and the RGB images are shown in Figure 6(a). Dataset 1 was acquired from WV-2 and has eight spectral bands with a 2-m spatial resolution. Dataset 1 and dataset 2 were obtained at Longyan, Fujian province,
where the cultivated tea is mainly Oolong tea. The
tea in dataset 3 is Longjing tea, which is a famous green tea
mainly planted in Hangzhou, Zhejiang province. Dataset 4 covers Puer, Yunnan province, which is famous for the cultivation of Puer tea.
Experimental Setup
The aforementioned datasets with different varieties of tea
were used to test the performance of the proposed scene-
based framework. For each testing image, 10 sets of samples
were randomly chosen, with each one containing 20 tea
garden scenes and 20 non-tea garden scenes as training data
to train the linear SVM and sLDA, and 200 tea garden scenes
, and 200 tea garden scenes
and 400 non-tea garden scenes (independent of the training
scenes) as test data to assess the classification accuracy. In the
experiments, we selected the scenes with more than 90% of
the area occupied by tea garden as tea garden scenes, and the
scenes with no tea garden as non-tea garden scenes according
to Huang et al. (2015). The scenes with less than 90% tea garden coverage were not considered in the accuracy assessment.
In addition, Kappa was used as the evaluation criterion, considering the unbalanced numbers of tea garden and non-tea garden scenes in the test data.
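As a minimal sketch of this assessment, assuming scikit-learn is available (LinearSVC and cohen_kappa_score are standard calls); feature extraction and the sLDA path are omitted:

```python
from sklearn.svm import LinearSVC
from sklearn.metrics import cohen_kappa_score

def assess(X_train, y_train, X_test, y_test):
    """Train a linear SVM on the 20 + 20 labeled scenes and report Kappa on
    the 200 tea garden / 400 non-tea garden test scenes; Kappa corrects for
    the chance agreement inflated by the 1:2 class imbalance."""
    clf = LinearSVC().fit(X_train, y_train)
    return cohen_kappa_score(y_test, clf.predict(X_test))
```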
The parameters used in the four datasets are provided below:
1. Scene size: the size of each scene was 60 m × 60 m, i.e.,
30 × 30 pixels for WV-2 imagery and 120 × 120 pixels for
Google Earth imagery. The sensitivity of this parameter is
discussed in the next section.
2. Spectral bands: 8 multispectral bands for WV-2 and 3
bands (red, green, and blue) for Google Earth images were
employed to generate spectral and textural features.
3. Gabor filter: The spatial frequencies u, v are expressed in polar coordinates with radial frequency f and orientation θ. The parameters of the Gabor filter were set to a = b = 4, f = {0.006, 0.02, 0.06}, and θ = {0, π/3, 2π/3}.
4. Topic model: For the BOVW model, the size of the learned dictionary in each dataset (i.e., the number of cluster centers of k-means) was set to 100. The parameters of sLDA were determined using 5-fold cross-validation.
5. Unsupervised feature learning: In our implementation, the UCNN trained by each dataset contained two layers, with K_1 = 100 and K_2 = 300, where K_1, K_2 are the numbers of learned filters (i.e., feature dimensions) of the first and second layers, respectively.
Figure 4. Framework of unsupervised feature learning.