impervious surface map. The C5 algorithm begins by dividing the
cases based on their attributes and then identifies a natural
break point in the attributes based on the class value. This
approach produces a set of decision rules that can be combined
to build a tree consisting of branches and leaves. Each branch
represents a test that is performed on the data, and leads to
either a further branch or a leaf where a decision is reached
and a class is assigned (Quinlan, 1993, 2013).
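The search for a break point on a single attribute can be illustrated with a minimal sketch. It assumes the information-gain criterion of the ID3/C4.5 family; the exact criterion and refinements used inside See5/C5 are not reproduced here. The function names and the NIR values and labels in the example are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, gain) for the best split 'value <= threshold'.

    Candidate thresholds are midpoints between consecutive distinct
    attribute values; the one with the largest information gain wins.
    """
    pairs = sorted(zip(values, labels))
    base = entropy([lab for _, lab in pairs])
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no break point between identical values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        remainder = (len(left) * entropy(left)
                     + len(right) * entropy(right)) / len(pairs)
        gain = base - remainder
        if gain > best[1]:
            best = ((lo + hi) / 2, gain)
    return best


# Hypothetical NIR values and class labels, for illustration only.
nir = [40, 55, 60, 120, 135, 150]
label = ["impervious", "impervious", "impervious",
         "pervious", "pervious", "pervious"]
threshold, gain = best_threshold(nir, label)
```

A full tree builder would repeat this search over every attribute, split the cases on the best one, and recurse on each subset until the leaves are sufficiently pure.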
A set of 300 randomly distributed points was created as the
training dataset. The attribute values of points were extracted
from multi-source data: the NAIP imagery, the texture layer, the
lidar DEM, the classified LULC map, and the road density map.
Another attribute field containing binary impervious surface
data was also added and populated by visually determining
whether each point is located on impervious surface or not.
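The construction of the training table can be sketched as follows. This is only an illustration under stated assumptions: the layers are represented as small in-memory grids rather than the actual rasters, the layer names, grid sizes, and values are hypothetical, and the helper functions are not part of the authors' workflow. The class field is left empty because in the study it was populated by visual interpretation.

```python
import random


def sample_points(n_points, n_rows, n_cols, seed=0):
    """Draw random (row, col) cell locations; duplicates are possible."""
    rng = random.Random(seed)
    return [(rng.randrange(n_rows), rng.randrange(n_cols))
            for _ in range(n_points)]


def extract_attributes(layers, points):
    """Read every layer at every point and return one record per point.

    layers: dict mapping a layer name to a 2D grid (list of rows).
    """
    records = []
    for row, col in points:
        record = {name: grid[row][col] for name, grid in layers.items()}
        record["impervious"] = None  # to be filled in by visual inspection
        records.append(record)
    return records


# Hypothetical 20 x 20 layers standing in for the real rasters.
rows, cols = 20, 20
layers = {
    "naip_nir": [[(r * cols + c) % 256 for c in range(cols)]
                 for r in range(rows)],
    "road_density": [[r + c for c in range(cols)] for r in range(rows)],
}
points = sample_points(300, rows, cols)
training = extract_attributes(layers, points)
```

Each record then holds one value per input layer plus the binary class field, which is the case-by-attribute layout that a decision tree learner expects.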
Next, the data were analyzed using the winnowing, pruning,
and boosting options in See5 to identify the settings that can
deliver the best accuracy and efficiency. Winnowing and pruning
help remove attributes and branches that do not significantly
contribute to the model, which aids in faster processing
and makes the resultant decision tree less complex, often
leading to better accuracy (Quinlan, 1993; Foody et al., 2002).
Pruning is also useful to improve trees that suffer from
overfitting (Foody et al., 2002). Boosting is used to increase the
accuracy by creating multiple models for the same problem
and using the outcomes from each model to “vote” for a result
(Freund et al., 1999). In this study, it was found that the use of
winnowing did not make a difference and the model created
without pruning was already fairly accurate. For the boosting
option, a manageable number of ten trials was evaluated, but
this did not increase accuracy enough to justify the additional
computational expense. Therefore, the basic decision tree
without the boosting option was chosen. After applying the
analysis, a text output representing the decision tree in a
pseudo-graphical way was obtained. The final decision tree
includes only bands 1 (B), 3 (R), and 4 (NIR) from the NAIP imagery
and consists of three branches and four leaves (Figure 2).
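The voting step of the boosting option described above can be sketched in a few lines. This is a simplified illustration and not See5's internal procedure: how See5 weights the individual trials is not described here, so the optional weights, the function name, and the example predictions are all assumptions.

```python
from collections import defaultdict


def boosted_vote(predictions, weights=None):
    """Combine the class predicted by each trial into one result.

    predictions: one class label per boosting trial.
    weights: optional per-trial vote weights (assumed equal if omitted).
    Ties go to the class that appeared first in the predictions.
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    if len(weights) != len(predictions):
        raise ValueError("one weight is required per prediction")
    tally = defaultdict(float)
    for label, weight in zip(predictions, weights):
        tally[label] += weight
    return max(tally, key=tally.get)


# Hypothetical predictions from ten trials for a single pixel.
trials = ["impervious"] * 6 + ["pervious"] * 4
result = boosted_vote(trials)
```

With ten trials, every pixel is classified ten times before the vote, which is where the additional computational expense comes from.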
Implementation of the Decision Tree
To implement the decision tree, a Python script was created,
which made use of the ArcGIS ArcPy module and allowed
the use of ArcGIS tools within the script. For example, this
makes it possible to utilize ArcGIS dynamic mosaic datasets
Figure 2. The final decision tree for the impervious surface classification.
Figure 3. Locations of the 300 sample units on the 2010 grayscale NAIP image of TCMA.
PHOTOGRAMMETRIC ENGINEERING & REMOTE SENSING
January 2016
65