PE&RS June 2016 Full

that can be further transferred into meaningful information classes defined by an analyst. By further considering the image resolution, a modified version of the Anderson land cover classification scheme (Anderson et al., 1976) was adopted, which included intensive urban, extensive urban, barren land, grassland, pasture, evergreen forest, deciduous forest, mixed forest, wetland forest, and water (Table 1).

After the classification scheme was adopted, the next step was to carefully select training sites within the image subset that were representative of the land cover classes. This part of the work is critical for supervised classification, and every effort should be made to ensure that each training set is spectrally homogeneous. For land cover classes with multiple subclasses, a separate training set was collected for each subclass. This was applied to the intensive urban, pasture, and water classes. The intensive urban class includes two major subclasses: the large building subclass (e.g., commercial, transportation, industrial, and residential buildings) and the large open space subclass (e.g., parking spaces and state or interstate highways). The pasture class includes two subclasses, densely vegetated areas and sparsely vegetated areas, which differ markedly in image color and texture due to variation in human use of the land and in the underlying soil and moisture conditions. The water class includes one subclass for deep, clear water with a darker color and another for shallow water. Selecting the training samples involved extensive use of the aforementioned reference data, together with frequent use of statistical measures and visualization tools to evaluate the spectral homogeneity within each training set as well as the spectral separability across different sets. A total of 6,899 pixels were collected as training samples (see Table 1).
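The text does not name the statistical measures used to judge homogeneity and separability. As a minimal sketch only, the following assumes two common stand-ins: the per-band standard deviation within a training set (smaller suggests a more homogeneous set) and a normalized distance between the band means of two sets (larger suggests better separability). The function names and the pixel values are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def band_stats(pixels):
    """Return per-band means and sample standard deviations.

    `pixels` is a list of per-pixel tuples, one value per spectral band.
    """
    bands = list(zip(*pixels))  # regroup values band by band
    return [mean(b) for b in bands], [stdev(b) for b in bands]


def normalized_distance(set_a, set_b):
    """Per-band |mean_a - mean_b| / (std_a + std_b).

    This is an assumed, illustrative separability measure; it requires
    that the two standard deviations are not both zero in any band.
    """
    mean_a, std_a = band_stats(set_a)
    mean_b, std_b = band_stats(set_b)
    return [
        abs(ma - mb) / (sa + sb)
        for ma, mb, sa, sb in zip(mean_a, mean_b, std_a, std_b)
    ]


# Hypothetical two-band values for two water subclasses (illustration only).
deep_water = [(0.02, 0.01), (0.03, 0.01), (0.02, 0.02), (0.03, 0.02)]
shallow_water = [(0.08, 0.05), (0.09, 0.06), (0.10, 0.05), (0.09, 0.07)]

_, deep_std = band_stats(deep_water)
distances = normalized_distance(deep_water, shallow_water)
print("within-set std per band:", deep_std)
print("between-set distance per band:", distances)
```

With real data, each training set would contain all seven band values per pixel rather than the two used here.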

Random Forest Configuration and Classification

A set of random forest models was carefully constructed by manipulating three internal parameters: tree number, feature number, and random seed number. The first two parameters have been discussed in the second section. Note that “feature” is a term used in machine learning and statistics, and the total number of image bands is equivalent to the overall feature number. For hyperspectral imagery, however, a feature reduction (or data dimensionality reduction) procedure is often needed. The dataset used here includes seven spectral bands, and the total feature number f therefore equals seven. The performance of random forests was examined using tree numbers ranging from 1 to 150; this upper limit was chosen because previous studies generally agreed that the performance of random forest classification (as a special bagging method) should become stable well before reaching 150 trees (e.g., Breiman, 1996; Indurkhya and Weiss, 1998; DeFries and Chan, 2000). The random seed number is used to initialize a pseudorandom number generator. Thus, using a different seed number can lead to the generation of a different random sample series from the entire training sample for use in the bagging procedure. This can further affect the construction of individual tree models and the selection of samples for use in the out-of-bag error estimate. Here, we adopted ten different random seeds, represented by seed numbers ranging from 1 to 10, to examine the stability of random forests in image classification.
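To illustrate how the seed number drives the bagging step, the sketch below draws a single bootstrap sample from the indices of the 6,899 training pixels using Python's standard pseudorandom generator. It is a simplified illustration, not the software used in the study: `bootstrap_split` is a hypothetical helper, and a real random forest would draw a separate bootstrap sample for every tree.

```python
import random

N_TRAINING_PIXELS = 6899  # total training pixels reported in Table 1


def bootstrap_split(n_samples, seed):
    """Draw one bootstrap sample (with replacement).

    Returns the in-bag indices and the out-of-bag indices, i.e. the
    samples that were never drawn and can serve for error estimation.
    """
    rng = random.Random(seed)  # the seed initializes the generator
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    out_of_bag = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, out_of_bag


bag_a, oob_a = bootstrap_split(N_TRAINING_PIXELS, seed=1)
bag_b, oob_b = bootstrap_split(N_TRAINING_PIXELS, seed=1)
bag_c, oob_c = bootstrap_split(N_TRAINING_PIXELS, seed=2)

print(bag_a == bag_b)  # same seed reproduces the same sample series
print(bag_a == bag_c)  # a different seed yields a different series
print(len(oob_a) / N_TRAINING_PIXELS)  # out-of-bag share, expected near 1/e
```

Because both the in-bag and the out-of-bag sets change with the seed, the fitted trees and the out-of-bag error estimate can differ between seeds even when the training data are identical.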

The strategy for the random forest configuration was to vary only one of the three parameters at a time while holding the other two fixed. Note that the parameter settings also include some extreme values (such as the lower and upper limits). First, random forest classifications were implemented with the tree number ranging from 1 to 150, while fixing the number of features and the random seed number. Then, this step was repeated for one to seven features, so that the impacts of the number of trees and the number of features on the classification accuracy could be assessed. Finally, the former two steps were repeated for ten random seeds to investigate the stability of random forest classification. Therefore, a total of 10,500 (70 × 150) random forest models were constructed, where 70 is the number of feature-seed combinations (7 × 10).
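The count of 10,500 models follows from enumerating every combination of the three parameters. The following sketch builds that grid with the standard library; the dictionary keys (`seed`, `n_features`, `n_trees`) are illustrative names chosen here, not identifiers from the study's software.

```python
from itertools import product

TREE_NUMBERS = range(1, 151)   # 1 to 150 trees
FEATURE_NUMBERS = range(1, 8)  # 1 to 7 features (seven spectral bands)
SEEDS = range(1, 11)           # random seeds 1 to 10

# One entry per random forest model: trees vary fastest, then features,
# then seeds, mirroring the order of the three steps described above.
configs = [
    {"seed": seed, "n_features": n_features, "n_trees": n_trees}
    for seed, n_features, n_trees in product(SEEDS, FEATURE_NUMBERS, TREE_NUMBERS)
]

print(len(configs))  # 10 seeds * 7 feature numbers * 150 tree numbers
print(configs[0], configs[-1])
```

Each configuration would then be passed to whichever random forest implementation is in use, with the same training samples supplied every time.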

The next step was to employ each of the random forest models to classify the Landsat-8 OLI image subset into ten land-cover categories using identical training samples. Using identical training data for each classification helps avoid variations in the performance of random forests caused by non-parametric factors, and thus allows our evaluation to focus on the internal parameters, which is consistent with the overall research objective here. Note that although the same training sample set is used for each random forest model, the random resampling technique of bagging can lead to variation in the internal training samples for each individual tree and in the out-of-bag samples for error estimates. Moreover, the initial classification output includes some subclasses for the aforementioned three major land-cover categories; these were combined into their appropriate major classes prior to the thematic accuracy assessment.
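The merging of subclasses into their major classes can be sketched as a simple label lookup. The subclass label strings below are hypothetical placeholders for the six subclasses described earlier (two each for intensive urban, pasture, and water); the actual labels used in the study are not given.

```python
# Hypothetical subclass labels mapped to the major classes of Table 1.
SUBCLASS_TO_CLASS = {
    "intensive urban (large building)": "intensive urban",
    "intensive urban (large open space)": "intensive urban",
    "pasture (dense vegetation)": "pasture",
    "pasture (sparse vegetation)": "pasture",
    "water (deep)": "water",
    "water (shallow)": "water",
}


def merge_subclasses(labels):
    """Replace subclass labels with their major class.

    Labels that are not subclasses are returned unchanged.
    """
    return [SUBCLASS_TO_CLASS.get(label, label) for label in labels]


predicted = [
    "water (deep)",
    "grassland",
    "pasture (sparse vegetation)",
    "water (shallow)",
]
merged = merge_subclasses(predicted)
print(merged)
```

After this step every pixel carries one of the ten major class labels, so the accuracy assessment compares like with like.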

Table 1. Land Cover Classification Scheme, Training Sample Size, and Reference Data Size

| No. | Class Name | Description | Training sample* (in pixels) | Reference sample (in pixels) |
|---|---|---|---|---|
| 1 | Intensive urban | More than two-thirds impervious surfaces, mainly commercial, industrial, institutional constructions with large roofs, and public retail buildings. Large open spaces and large transportation facilities | 1,059 | 60 |
| 2 | Extensive urban | Residential areas with impervious surfaces less than two-thirds of the total cover, including residential developments, smaller urban service buildings, such as detached stores and restaurants, and state highways | 588 | 60 |
| 3 | Barren land | Urban areas with low percentages of constructed materials, vegetation, and low level of impervious surfaces, including bare soil lands, exposed rock, mines and quarries | 428 | 60 |
| 4 | Grassland | Herbaceous cover, trees and shrub less than 10%; Parks, lawns and golf courses | 584 | 60 |
| 5 | Pasture | Grazing area with less than 30% vegetation coverage and small amount fallow land. Pastures with more than 30% vegetation coverage mixed with bushes | 1,014 | 60 |
| 6 | Evergreen forest | Trees remain green throughout the year, wetland evergreen forests included, mainly cedar and pine trees | 522 | 60 |
| 7 | Deciduous forest | Trees lose their leaves during the dry or cold season, wetland deciduous forests included, mainly oak, maple, elm, and hickory | 545 | 60 |
| 8 | Mixed forest | Either evergreen or deciduous trees also mixed with less than 10% shrub and scrub | 572 | 60 |
| 9 | Wetland forest | Hardwood, mixed forest and shrubs, distributing along rivers and around lakes | 522 | 60 |
| 10 | Water | Deep water, such as rivers, lakes, reservoirs. Shallow water, such as pools, and ponds | 1,065 | 60 |

*Larger training samples were collected for several classes with subclasses, including the intensive urban, pasture, and water classes.
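As a quick consistency check, the per-class training sample sizes in Table 1 can be summed and compared with the total of 6,899 pixels stated in the text. The snippet below is only an arithmetic illustration; the dictionary simply restates the table values.

```python
# Training sample sizes (in pixels) as listed in Table 1.
TRAINING_PIXELS = {
    "intensive urban": 1059,
    "extensive urban": 588,
    "barren land": 428,
    "grassland": 584,
    "pasture": 1014,
    "evergreen forest": 522,
    "deciduous forest": 545,
    "mixed forest": 572,
    "wetland forest": 522,
    "water": 1065,
}
REFERENCE_PIXELS_PER_CLASS = 60  # same reference sample size for every class

total_training = sum(TRAINING_PIXELS.values())
total_reference = REFERENCE_PIXELS_PER_CLASS * len(TRAINING_PIXELS)

print(total_training)   # should match the 6,899 pixels stated in the text
print(total_reference)  # reference pixels summed over the ten classes

# The three classes with subclasses have the largest training samples.
largest_three = sorted(TRAINING_PIXELS, key=TRAINING_PIXELS.get, reverse=True)[:3]
print(largest_three)
```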

410

June 2016

PHOTOGRAMMETRIC ENGINEERING & REMOTE SENSING
