
land cover categories with identical training samples. Both
the classifier’s accuracy and the resulting thematic map ac-
curacy were assessed using the OOB error estimate (Wolpert and Macready, 1999) and the error matrix analysis (Congal-
ton, 1991), respectively. Using these metrics, the sensitivity of
random forests in relation to changing algorithmic parameter
settings was further analyzed. Finally, based on this study, we
recommend several practical guidelines when parameterizing
random forests in land cover classification from remote sensor
data. The following sections will introduce the theoretical background of random forests in connection with pattern recognition, document the research procedures adopted, and
discuss the results and implications.
Basics of Random Forests
Only a basic discussion on random forests is provided here,
and readers seeking in-depth understanding of the theoreti-
cal underpinnings should refer to several seminal works
published by Leo Breiman (e.g., Breiman, 2001; Breiman and
Cutler, 2004). Random forests were inspired by some early
efforts on ensemble classification algorithms that combine
multiple unstable classifiers to create a stable and improved
classification performance by injecting randomness in the
structure of algorithms through various strategies (Criminisi et al., 2011). Breiman (1996 and 1999) proposed bagging (bootstrap aggregation), which combines classification results
from individual tree classifiers that are created with training
sample subsets randomly selected from the entire training set
with replacement. Dietterich (1998 and 2000) described a tree
randomization method that creates diverse individual trees by
randomly selecting a split from a list of the best splits at each
node for individual decision trees. Ho (1995 and 1998) pro-
posed the random subspace method which constructs mul-
tiple tree classifiers by projecting samples from the original
feature space to different subspaces created from randomly
selected feature subsets and then conducts classification in
the projected subspaces. Amit and Geman (1997) described a
method that constructs diverse individual trees by specifying
a large number of features, searching over a random selection
of these features, and obtaining the best split at each node.
These ensemble strategies were influential in the develop-
ment of random forests.
Random forests are based on multiple fully grown decision tree classifiers, each of which votes a pixel as a certain class, and the final classification label of each pixel is determined by the most popular vote. An individual unpruned tree classifier is not
preferred for classification in most cases, because it can be
overly adapted to the training samples and thus results in an
overfitting model (Ho, 1995). However, in random forests, a
fully-grown tree as an unstable classifier can reduce tree cor-
relation and hence increase randomness. Random forests com-
bine multiple individual tree-structured classifiers through
bootstrap aggregation (Breiman, 1996). For each bootstrap
iteration, each tree is constructed using a different bootstrap
sample set from the training data that usually takes about two-thirds of the entire samples; the remaining samples in each run are left out for the out-of-bag error estimate, which equals the number of incorrectly classified instances divided by the total number of out-of-bag samples (Wolpert and Macready, 1999). Using different train-
ing samples can reduce the correlation of individual trees.
Based on the Strong Law of Large Numbers (SLLN) (Feller, 1968), as the number of trees increases, a random forest model
can always converge and reduce the generalization error from
its individual tree classifier, and thus overfitting is not consid-
ered to be a problem for random forests (Breiman, 2001).
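As an illustration of this bagging and out-of-bag mechanism, the following minimal Python sketch trains each tree on a bootstrap sample and derives the OOB error from the samples each tree never saw. It assumes numpy arrays with integer-coded class labels and uses scikit-learn's DecisionTreeClassifier; the function name oob_error_estimate is hypothetical and, unlike random forests proper, the sketch does not randomize the feature subset at each node.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def oob_error_estimate(X, y, n_trees=100, seed=0):
    """Sketch of bootstrap aggregation with an out-of-bag (OOB) error estimate.

    Each tree is fit on a bootstrap sample drawn with replacement (roughly
    two-thirds of the samples end up in-bag); the left-out samples are then
    predicted by that tree. The OOB error is the fraction of samples whose
    majority OOB vote disagrees with the true (integer-coded) label.
    """
    rng = np.random.default_rng(seed)
    n, n_classes = len(y), int(y.max()) + 1
    votes = np.zeros((n, n_classes))                  # accumulated OOB votes per sample
    for _ in range(n_trees):
        in_bag = rng.integers(0, n, size=n)           # bootstrap indices, with replacement
        out_bag = np.setdiff1d(np.arange(n), in_bag)  # samples left out of this bootstrap
        tree = DecisionTreeClassifier(random_state=seed).fit(X[in_bag], y[in_bag])
        votes[out_bag, tree.predict(X[out_bag])] += 1
    scored = votes.sum(axis=1) > 0                    # samples with at least one OOB vote
    return np.mean(votes[scored].argmax(axis=1) != y[scored])
```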
At each node, randomly selected features are used to en-
hance the difference between trees. Suppose the classification task has f features; a fixed number F < f is user specified for use in the classification. F features are selected at random from the f features, and the best binary split among the F features is chosen based on an impurity criterion, such as information gain or the Gini index, to split the node into a binary partition (Breiman,
2001). Breiman (2001) summarized several characteristics of
random forests, including the capability of processing large datasets and high-dimensional data without having to use feature reduction or selection techniques, and robustness to outliers and noise in the training data. Random forests are
considered to be a straightforward, fast and accurate classifier
(Biau, 2012).
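For readers using the scikit-learn implementation (an assumption; the paper does not specify a software package), the tree number, the per-node feature number F, and the Gini impurity criterion map onto the parameters shown below; X and y are synthetic placeholders standing in for per-pixel feature vectors and integer class labels.

```python
import numpy as np
from math import sqrt
from sklearn.ensemble import RandomForestClassifier

# Placeholder training data: in practice X holds per-pixel feature vectors
# (n_samples x f spectral/ancillary features) and y the integer class labels.
rng = np.random.default_rng(0)
X = rng.random((300, 16))
y = rng.integers(0, 5, size=300)

f = X.shape[1]                      # f: total number of input features
F = int(round(sqrt(f)))             # F < f: features drawn at random at each node

rf = RandomForestClassifier(
    n_estimators=100,               # number of fully grown trees in the forest
    max_features=F,                 # F randomly selected features evaluated per split
    criterion="gini",               # Gini impurity used to choose the best binary split
    bootstrap=True,                 # each tree is trained on a bootstrap sample
    oob_score=True,                 # compute the out-of-bag accuracy after fitting
)
rf.fit(X, y)
oob_error = 1.0 - rf.oob_score_     # OOB error estimate of the classifier
```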
Over the years, various studies have been conducted to
investigate the effectiveness of random forests for remote sen-
sor image classification with different approaches in connection with environmental and urban applications. Random forests have been used to classify various types of remote sensor data such as multispectral, hyperspectral, very-high-spatial-resolution, and microwave images at the per-pixel level (e.g., Ham et al., 2005; Joelsson et al., 2005; Lawrence et al., 2006; Chan and Paelinckx, 2008; Guo et al., 2011; Naidoo et al., 2012; Rodriguez-Galiano and Chica-Olmo, 2012; Adam et al., 2014; Hayes et al., 2014). They have also been used for image classification at the subpixel level (e.g., Reschke and Hüttich, 2014) as well as at the object level (e.g., Watts et al., 2009; Stumpf and Kerle, 2011; Long et al., 2013; Puissant et al., 2014). Furthermore, random forests have been used for tasks involving multiple classifications, such as multitemporal image classification for change detection and thematic mapping with multi-sensor images (e.g., Clark et al., 2012; Khalyani et al., 2012; Rodriguez-Galiano and Chica-Olmo, 2012; Grinand et al., 2013; Ghosh et al., 2014; Zhong et al., 2014).
Despite the above progress, most of the existing studies seem to be quite arbitrary in their algorithmic parameter settings. Specific to the tree number, Leo Breiman used a rather small number (i.e., 100) in his original studies (Breiman, 1999 and
2001). However, there exists a belief that more trees could
lead to better classification outcomes, and hence large tree
numbers ranging from hundreds to thousands have been
used in the literature (e.g., Pal, 2005; Gislason et al., 2006; Lawrence et al., 2006; Zhong et al., 2014). However, such treatments may not take full advantage of random forests as a fast classifier, particularly in near real-time applications.
Random forest implementation requires larger memory storage, which is considered a major drawback when compared to support vector machines (SVM) and many other classifiers (Gislason et al., 2006; Tang, 2008). Using a large number of
trees for large datasets can significantly increase the memory
and time cost. Thus, parallelized random forest techniques have become popular to speed up the implementations with large datasets. As for the feature number, Breiman (2001) suggested using the square root of the entire feature number, and Liaw and Wiener (2002) believed that using half or twice the square root could result in a near-optimal performance of the classifier. Since the classifier's accuracy measured with the OOB error estimate generally does not change much, it may also be possible to use different feature numbers. Clearly, there is no consensus on the tree number and the feature number that should be used in order to achieve optimal performance by random forests, which justifies further research in these areas.
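A simple way to examine this sensitivity, in the spirit of the experiments described in this study, is to sweep the tree number and the per-node feature number and record the OOB error for each combination. The sketch below assumes scikit-learn and numpy arrays X and y holding training features and integer class labels (placeholders, not data from this study); the function name oob_sensitivity is hypothetical.

```python
from math import sqrt
from sklearn.ensemble import RandomForestClassifier

def oob_sensitivity(X, y, tree_numbers=(10, 50, 100, 500, 1000)):
    """Sweep tree and feature numbers, reporting the OOB error for each pair."""
    f = X.shape[1]
    root = int(round(sqrt(f)))
    # Candidate feature numbers: half, one, and two times the square root of f.
    feature_numbers = sorted({max(1, root // 2), root, min(f, 2 * root)})
    for n_trees in tree_numbers:
        for n_feats in feature_numbers:
            rf = RandomForestClassifier(
                n_estimators=n_trees, max_features=n_feats,
                bootstrap=True, oob_score=True, n_jobs=-1, random_state=0,
            ).fit(X, y)
            print(f"trees={n_trees:5d}  features={n_feats:3d}  "
                  f"OOB error={1.0 - rf.oob_score_:.4f}")
```

The same loop can be paired with an independent test set and an error matrix analysis (Congalton, 1991) to assess thematic map accuracy alongside the OOB estimate, mirroring the two evaluation metrics used in this study.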
Research Methods
The research methods adopted here included five major
components (Figure 1): (a) remote sensor data acquisition and
preprocessing, (b) land-cover classification scheme design and
training sample selection, (c) random forests configuration
and land-cover classification, (d) evaluations of the classifier’s