
land cover categories with identical training samples. Both
the classifier’s accuracy and the resulting thematic map ac-
curacy were assessed using the OOB error estimate (Wolpert and Macready, 1999) and the error matrix analysis (Congal-
ton, 1991), respectively. Using these metrics, the sensitivity of
random forests in relation to changing algorithmic parameter
settings was further analyzed. Finally, based on this study, we
recommend several practical guidelines when parameterizing
random forests in land cover classification from remote sensor
data. The following sections will introduce the theoretical background of random forests in connection with pattern recognition, document the research procedures adopted, and
discuss the results and implications.
Basics of Random Forests
Only a basic discussion on random forests is provided here,
and readers seeking in-depth understanding of the theoreti-
cal underpinnings should refer to several seminal works
published by Leo Breiman (e.g., Breiman, 2001; Breiman and
Cutler, 2004). Random forests were inspired by some early
efforts on ensemble classification algorithms that combine
multiple unstable classifiers to create a stable and improved
classification performance by injecting randomness in the
structure of algorithms through various strategies (Criminisi et al., 2011). Breiman (1996 and 1999) proposed bagging (bootstrap aggregation), which combines classification results
from individual tree classifiers that are created with training
sample subsets randomly selected from the entire training set
with replacement. Dietterich (1998 and 2000) described a tree
randomization method that creates diverse individual trees by
randomly selecting a split from a list of the best splits at each
node for individual decision trees. Ho (1995 and 1998) pro-
posed the random subspace method which constructs mul-
tiple tree classifiers by projecting samples from the original
feature space to different subspaces created from randomly
selected feature subsets and then conducts classification in
the projected subspaces. Amit and Geman (1997) described a
method that constructs diverse individual trees by specifying
a large number of features, searching over a random selection
of these features, and obtaining the best split at each node.
These ensemble strategies were influential in the develop-
ment of random forests.
Random forests are based on multiple fully grown decision tree classifiers, each of which votes a pixel as a certain class, and the final classification label of each pixel is determined by the most popular vote. An individual unpruned tree classifier is not
preferred for classification in most cases, because it can be
overly adapted to the training samples and thus results in an
overfitting model (Ho, 1995). However, in random forests, a
fully-grown tree as an unstable classifier can reduce tree cor-
relation and hence increase randomness. Random forests com-
bine multiple individual tree-structured classifiers through
bootstrap aggregation (Breiman, 1996). For each bootstrap
iteration, each tree is constructed using a different bootstrap
sample set from the training data that usually takes about two-thirds of the entire samples; the remaining samples in each run are left out for the out-of-bag error estimate, which equals the number of incorrectly classified instances divided by the total number of out-of-bag samples (Wolpert and Macready, 1999). Using different train-
ing samples can reduce the correlation of individual trees.
Based on the Strong Law of Large Numbers (SLLN) (Feller, 1968), as the number of trees increases, a random forest model
can always converge and reduce the generalization error from
its individual tree classifier, and thus overfitting is not consid-
ered to be a problem for random forests (Breiman, 2001).
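As an illustration of this bagging and out-of-bag mechanism, the following minimal Python sketch trains each tree on a bootstrap sample and derives the OOB error from the samples each tree never saw. It assumes numpy arrays with integer-coded class labels and uses scikit-learn's DecisionTreeClassifier; the function name oob_error_estimate is hypothetical and, unlike random forests proper, the sketch does not randomize the feature subset at each node.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def oob_error_estimate(X, y, n_trees=100, seed=0):
    """Sketch of bootstrap aggregation with an out-of-bag (OOB) error estimate.

    Each tree is fit on a bootstrap sample drawn with replacement (roughly
    two-thirds of the samples end up in-bag); the left-out samples are then
    predicted by that tree. The OOB error is the fraction of samples whose
    majority OOB vote disagrees with the true (integer-coded) label.
    """
    rng = np.random.default_rng(seed)
    n, n_classes = len(y), int(y.max()) + 1
    votes = np.zeros((n, n_classes))                  # accumulated OOB votes per sample
    for _ in range(n_trees):
        in_bag = rng.integers(0, n, size=n)           # bootstrap indices, with replacement
        out_bag = np.setdiff1d(np.arange(n), in_bag)  # samples left out of this bootstrap
        tree = DecisionTreeClassifier(random_state=seed).fit(X[in_bag], y[in_bag])
        votes[out_bag, tree.predict(X[out_bag])] += 1
    scored = votes.sum(axis=1) > 0                    # samples with at least one OOB vote
    return np.mean(votes[scored].argmax(axis=1) != y[scored])
```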
At each node, randomly selected features are used to en-
hance the difference between trees. Suppose the classification task has f features; a fixed number F < f is user specified for use in the classification. F features are selected at random from the f features, and the best binary split among the F features is chosen based on an impurity criterion, such as information gain or the Gini index, to split the node into a binary partition (Breiman,
2001). Breiman (2001) summarized several characteristics of
random forests, including the capability of processing large datasets and high-dimensional data without having to use feature reduction or selection techniques, and robustness to outliers and noise in the training data. Random forests are
considered to be a straightforward, fast and accurate classifier
(Biau, 2012).
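For readers using the scikit-learn implementation (an assumption; the paper does not specify a software package), the tree number, the per-node feature number F, and the Gini impurity criterion map onto the parameters shown below; X and y are synthetic placeholders standing in for per-pixel feature vectors and integer class labels.

```python
import numpy as np
from math import sqrt
from sklearn.ensemble import RandomForestClassifier

# Placeholder training data: in practice X holds per-pixel feature vectors
# (n_samples x f spectral/ancillary features) and y the integer class labels.
rng = np.random.default_rng(0)
X = rng.random((300, 16))
y = rng.integers(0, 5, size=300)

f = X.shape[1]                      # f: total number of input features
F = int(round(sqrt(f)))             # F < f: features drawn at random at each node

rf = RandomForestClassifier(
    n_estimators=100,               # number of fully grown trees in the forest
    max_features=F,                 # F randomly selected features evaluated per split
    criterion="gini",               # Gini impurity used to choose the best binary split
    bootstrap=True,                 # each tree is trained on a bootstrap sample
    oob_score=True,                 # compute the out-of-bag accuracy after fitting
)
rf.fit(X, y)
oob_error = 1.0 - rf.oob_score_     # OOB error estimate of the classifier
```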
Over the years, various studies have been conducted to
investigate the effectiveness of random forests for remote sen-
sor image classification with different approaches in connection with environmental and urban applications. Random forests have been used to classify various types of remote sensor data such as multispectral, hyperspectral, very-high-spatial-resolution, and microwave images at the per-pixel level (e.g., Ham et al., 2005; Joelsson et al., 2005; Lawrence et al., 2006; Chan and Paelinckx, 2008; Guo et al., 2011; Naidoo et al., 2012; Rodriguez-Galiano and Chica-Olmo, 2012; Adam et al., 2014; Hayes et al., 2014). They have also been used for image classification at the subpixel level (e.g., Reschke and Hüttich, 2014) as well as at the object level (e.g., Watts et al., 2009; Stumpf and Kerle, 2011; Long et al., 2013; Puissant et al., 2014). Furthermore, random forests have been used for tasks involving multiple classifications, such as multitemporal image classification for change detection and thematic mapping with multi-sensor images (e.g., Clark et al., 2012; Khalyani et al., 2012; Rodriguez-Galiano and Chica-Olmo, 2012; Grinand et al., 2013; Ghosh et al., 2014; Zhong et al., 2014).
Despite the above progress, most of the existing studies seem to be quite arbitrary in their algorithmic parameter settings. Specific to the tree number, Leo Breiman used a rather small number (i.e., 100) in his original studies (Breiman, 1999 and
2001). However, there exists a belief that more trees could
lead to better classification outcomes, and hence large tree
numbers ranging from hundreds to thousands have been
used in the literature (e.g., Pal, 2005; Gislason et al., 2006; Lawrence et al., 2006; Zhong et al., 2014). However, such treatments may not take full advantage of random forests as a fast classifier, particularly in near real-time applications.
Random forest implementation requires larger memory storage, which is considered a major drawback when compared to support vector machines (SVM) and many other classifiers (Gislason et al., 2006; Tang, 2008). Using a large number of
trees for large datasets can significantly increase the memory
and time cost. Thus, parallelized random forest techniques have become popular to speed up the implementations with large datasets. As for the feature number, Breiman (2001) suggested using the square root of the entire feature number, and Liaw and Wiener (2002) believed that using half or twice the square root could result in a near-optimal performance of the classifier. Since the classifier's accuracy measured with the OOB error estimate generally does not change much, it may also be possible to use different feature numbers. Clearly, there is no consensus on the tree number and the feature number that should be used in order to achieve optimal performance by random forests, which justifies further research in these areas.
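A simple way to examine this sensitivity, in the spirit of the experiments described in this study, is to sweep the tree number and the per-node feature number and record the OOB error for each combination. The sketch below assumes scikit-learn and numpy arrays X and y holding training features and integer class labels (placeholders, not data from this study); the function name oob_sensitivity is hypothetical.

```python
from math import sqrt
from sklearn.ensemble import RandomForestClassifier

def oob_sensitivity(X, y, tree_numbers=(10, 50, 100, 500, 1000)):
    """Sweep tree and feature numbers, reporting the OOB error for each pair."""
    f = X.shape[1]
    root = int(round(sqrt(f)))
    # Candidate feature numbers: half, one, and two times the square root of f.
    feature_numbers = sorted({max(1, root // 2), root, min(f, 2 * root)})
    for n_trees in tree_numbers:
        for n_feats in feature_numbers:
            rf = RandomForestClassifier(
                n_estimators=n_trees, max_features=n_feats,
                bootstrap=True, oob_score=True, n_jobs=-1, random_state=0,
            ).fit(X, y)
            print(f"trees={n_trees:5d}  features={n_feats:3d}  "
                  f"OOB error={1.0 - rf.oob_score_:.4f}")
```

The same loop can be paired with an independent test set and an error matrix analysis (Congalton, 1991) to assess thematic map accuracy alongside the OOB estimate, mirroring the two evaluation metrics used in this study.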
Research Methods
The research methods adopted here included five major
components (Figure 1): (a) remote sensor data acquisition and
preprocessing, (b) land-cover classification scheme design and
training sample selection, (c) random forests configuration
and land-cover classification, (d) evaluations of the classifier’s