PE&RS June 2016 Full - page 410

that can be further transferred into meaningful information
classes defined by an analyst. By further considering the image
resolution, a modified version of the Anderson land cover
classification scheme (Anderson et al., 1976) was adopted,
which included intensive urban, extensive urban, barren
land, grassland, pasture, evergreen forest, deciduous forest,
mixed forest, wetland forest, and water (Table 1).
After the classification scheme was adopted, the next step
was to carefully select training sites within the image subset
that were representative of the land cover classes. This part of
the work is critical for supervised classification, and every effort
should be made to ensure that each training set is spectrally
homogeneous. For land cover classes with multiple subclasses,
a separate training set was collected for each subclass. This
was applied to the intensive urban, pasture, and water classes.
The intensive urban class includes two major subclasses, i.e.,
the large building subclass (e.g., commercial, transportation,
industrial and residential buildings) and the large open space
subclass (e.g., parking spaces and state or interstate highways).
The pasture class includes two subclasses, i.e., densely vegetat-
ed areas and sparsely vegetated areas, which are quite different
in image color and texture due to the variation in human use of
the land and the underlying soil and moisture conditions. The
water class includes a subclass for deep and clear water with
a darker color and another subclass for shallow water. Selecting
the training samples involved consulting the aforementioned
reference data and frequently using statistical measures and
visualization tools to evaluate the spectral homogeneity within
each training set as well as the spectral separability across
different sets. A total of 6,899 pixels were collected as the
training samples (see Table 1).
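The text does not name the statistical measures that were used. As one hedged illustration only, the sketch below computes a per-band coefficient of variation as a homogeneity check and the M-statistic, |mean_a - mean_b| / (sd_a + sd_b), as a single-band separability check. Both measures, the function names, and the pixel values are assumptions made for illustration and are not taken from the study.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Spread relative to the mean; smaller suggests a more homogeneous set."""
    return stdev(values) / mean(values)

def m_statistic(a, b):
    """Single-band separability: |mean_a - mean_b| / (sd_a + sd_b)."""
    return abs(mean(a) - mean(b)) / (stdev(a) + stdev(b))

# Hypothetical single-band values for two training sets (not from the paper).
deep_water = [10, 11, 10, 12, 11]
shallow_water = [30, 32, 31, 29, 33]

cv_deep = coefficient_of_variation(deep_water)
m_water = m_statistic(deep_water, shallow_water)
```

An M-statistic above 1 is commonly read as good separation in that band. A multi-band measure would be needed to judge separability across all seven bands at once.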
Random Forest Configuration and Classification
A set of random forest models were carefully constructed by
manipulating three internal parameters: tree number, feature
number, and random seed number. The first two parameters
have been discussed in the second section. Note that “feature”
is a term used in machine learning and statistics, and the total
image band number is equivalent to the overall feature num-
ber. For hyperspectral imagery, however, a feature reduction
(or data dimensionality reduction) procedure is often needed.
The dataset used here includes seven spectral bands, and the
total feature number f, therefore, equals seven. The performance
of random forests was examined with the number of random trees
ranging from 1 to 150; the upper limit was chosen because previous
studies generally agreed that the performance of random forest
classification (as a special bagging method) should become stable
well before reaching 150 random trees (e.g., Breiman, 1996;
Indurkhya and Weiss, 1998; DeFries and Chan, 2000). The random seed
number is used to initialize a pseudorandom number generator. Thus,
using a different seed number can lead to the generation of a
different random sample series from the entire training sam-
ple for use in the bagging procedure. This can further affect
the construction of individual tree models and the selection
of samples for use in the out-of-bag error estimate. Here, we
adopted ten different random seeds which are represented by
the seed number ranging from 1 to 10 to examine the stability
of random forests in image classification.
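The effect of the seed on bagging can be sketched as follows. This is a minimal illustration using Python's standard pseudorandom generator, not the authors' software: the function name is hypothetical, and a real random forest would draw one such bootstrap sample per tree.

```python
import random

def bootstrap_indices(n_samples, seed):
    """Draw n_samples indices with replacement; return (in-bag, out-of-bag)."""
    rng = random.Random(seed)  # the seed initializes the generator
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    out_of_bag = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, out_of_bag

n = 6899  # number of training pixels reported in the text

bag_1, oob_1 = bootstrap_indices(n, seed=1)
bag_1_again, _ = bootstrap_indices(n, seed=1)
bag_2, oob_2 = bootstrap_indices(n, seed=2)

# Fraction of samples left out of the bag; about 1/e (roughly 0.37) for large n.
oob_fraction = len(oob_1) / n
```

The same seed reproduces the same bootstrap sample, while a different seed yields a different in-bag series and therefore a different out-of-bag set for the error estimate.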
The strategy for the random forest configuration was to vary
only one of the three parameters at a time while holding the other
two fixed. Note that the parameter settings also include some
extreme values (such as the lower and upper limits). First, random
forest classifications were implemented with the number of random
trees ranging from 1 to 150, while fixing the number of features
and the random seed number. Then, this step was repeated for one
to seven features, so that the impacts of the number of trees and
the number of features on the classification accuracy could be
assessed. Finally, the former two steps were repeated for ten
random seeds to investigate the stability of random forest
classification. Therefore, a total of 10,500 (7 × 10 × 150) random
forest models were constructed.
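The count of models follows from enumerating every combination of the three parameters. The sketch below builds that grid; the dictionary keys are hypothetical names chosen for illustration, and no classifier is actually trained here.

```python
from itertools import product

tree_numbers = range(1, 151)   # 1 to 150 random trees
feature_numbers = range(1, 8)  # 1 to 7 features (seven spectral bands)
seeds = range(1, 11)           # random seed numbers 1 to 10

# Innermost loop varies the tree number, then features, then seeds,
# mirroring the order of the three steps described in the text.
configurations = [
    {"n_trees": t, "n_features": f, "seed": s}
    for s, f, t in product(seeds, feature_numbers, tree_numbers)
]

n_models = len(configurations)  # 10 * 7 * 150
```

If a library such as scikit-learn were used (an assumption, since the text does not name its software), these settings would correspond roughly to the `n_estimators`, `max_features`, and `random_state` arguments of `RandomForestClassifier`.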
The next step was to employ each of the random forest
models to classify the Landsat-8 OLI image subset into ten
land-cover categories using identical training samples. Using
identical training data for each classification helps avoid
variations in the performance of random forests caused by
non-parametric factors, and thus allows our evaluation to focus
on the internal parameters, which is consistent with the overall
research objective here. Note that although the same training
sample set is used for each random forest model, the random
resampling technique of bagging can lead to variation
in the internal training samples for each individual tree and
in the out-of-bag samples for error estimates. Moreover, the
initial classification output includes some subclasses for the
aforementioned three major land-cover categories, which were
further combined into their appropriate major classes prior to
the thematic accuracy assessment.
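Combining subclasses into major classes amounts to a label lookup applied to the classified output. The sketch below assumes string labels; the subclass label names are hypothetical, since the text describes the subclasses but does not list the labels used.

```python
# Hypothetical subclass labels mapped to the major classes named in the text.
SUBCLASS_TO_MAJOR = {
    "urban_large_building": "intensive urban",
    "urban_large_open_space": "intensive urban",
    "pasture_dense": "pasture",
    "pasture_sparse": "pasture",
    "water_deep": "water",
    "water_shallow": "water",
}

def merge_subclasses(labels):
    """Map subclass labels to their major class; other labels pass through."""
    return [SUBCLASS_TO_MAJOR.get(label, label) for label in labels]

# Made-up classifier output for four pixels.
predicted = ["water_deep", "grassland", "pasture_sparse", "water_shallow"]
merged = merge_subclasses(predicted)
```

After merging, the output contains only the ten major categories, so it can be compared directly with reference data labeled at that level.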
Table 1. Land Cover Classification Scheme, Training Sample Size, and Reference Data Size

| No. | Class Name | Description | Training sample* (in pixels) | Reference sample (in pixels) |
|---|---|---|---|---|
| 1 | Intensive urban | More than two-thirds impervious surfaces, mainly commercial, industrial, institutional constructions with large roofs, and public retail buildings. Large open spaces and large transportation facilities | 1,059 | 60 |
| 2 | Extensive urban | Residential areas with impervious surfaces less than two-thirds of the total cover, including residential developments, smaller urban service buildings, such as detached stores and restaurants, and state highways | 588 | 60 |
| 3 | Barren land | Urban areas with low percentages of constructed materials, vegetation, and low level of impervious surfaces, including bare soil lands, exposed rock, mines and quarries | 428 | 60 |
| 4 | Grassland | Herbaceous cover, trees and shrub less than 10%; Parks, lawns and golf courses | 584 | 60 |
| 5 | Pasture | Grazing area with less than 30% vegetation coverage and small amount fallow land. Pastures with more than 30% vegetation coverage mixed with bushes | 1,014 | 60 |
| 6 | Evergreen forest | Trees remain green throughout the year, wetland evergreen forests included, mainly cedar and pine trees | 522 | 60 |
| 7 | Deciduous forest | Trees lose their leaves during the dry or cold season, wetland deciduous forests included, mainly oak, maple, elm, and hickory | 545 | 60 |
| 8 | Mixed forest | Either evergreen or deciduous trees also mixed with less than 10% shrub and scrub | 572 | 60 |
| 9 | Wetland forest | Hardwood, mixed forest and shrubs, distributing along rivers and around lakes | 522 | 60 |
| 10 | Water | Deep water, such as rivers, lakes, reservoirs. Shallow water, such as pools, and ponds | 1,065 | 60 |
*Larger training samples were collected for several classes with subclasses, including the intensive urban, pasture, and water classes.
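As a quick consistency check, the per-class training sizes in Table 1 can be summed and compared with the total of 6,899 pixels stated in the text. The variable names below are illustrative.

```python
# Training sample sizes (in pixels) transcribed from Table 1.
training_pixels = {
    "Intensive urban": 1059,
    "Extensive urban": 588,
    "Barren land": 428,
    "Grassland": 584,
    "Pasture": 1014,
    "Evergreen forest": 522,
    "Deciduous forest": 545,
    "Mixed forest": 572,
    "Wetland forest": 522,
    "Water": 1065,
}

total_training = sum(training_pixels.values())
# Each class has 60 reference pixels in Table 1.
total_reference = 60 * len(training_pixels)
```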
PHOTOGRAMMETRIC ENGINEERING & REMOTE SENSING