October 2019 Layout Flipping Full - page 738

by the seasons and limited by within-class spectral variation
and between-class spectral confusion. Therefore, it is usually
necessary to manually adjust the parameters for each local
image in practical applications. Texture features reflect the
arrangement of buildings in built-up areas and are more
discriminative (Smits and Annoni 1999; Li and Narayanan 2004;
Zhong and Wang 2007; Guo et al. 2014; Li et al. 2014, 2015a,
2017a). However, built-up areas vary widely in type and style,
so texture features alone are insufficient to cover extensive
areas, which is one possible reason why many studies have
concentrated on only one small region. In addition, key point
features, such as Harris corner points (Tao et al. 2012; Li et al.
2015a; Chen et al. 2018) and SIFT (Sirmacek and Unsalan 2009),
are commonly used because of the structural characteristics of
buildings. Some applications adopt edge features (Unsalan and
Boyer 2004; Hu et al. 2013). Although these artificial features
are carefully designed, a single feature cannot handle the
complex and changeable built-up areas of a large region.
Therefore, many studies have attempted to improve the
extraction performance by combining multiple features, such
as SVM-based multikernel learning (Tao et al. 2012; Li et al.
2017a) and other strategies (Zhang et al. 2014; Hu et al. 2016;
Zhang et al. 2017b). Multi-feature fusion can improve the
algorithm performance effectively, but the artificial features
still have limited generalization ability.
The existing methods are basically combinations of artificial
features, classifiers, or segmentation algorithms. Artificial
features are one of the most important factors limiting
practicability. The choice of processing unit is also
significant. Built-up areas are regional objects that contain
many buildings as well as nonbuilding elements, such as lawns,
trees, and other greenbelts. Therefore, classifying remote
sensing images pixel by pixel is unreasonable (Pesaresi et al.
2009; Liu et al. 2014; Varshney and Rajesh 2014; Bouzekri et al.
2015; Kaimaris and Patias 2016). Some studies have used
superpixels as an alternative (Li et al. 2015b), which can
improve the speed of algorithms to some extent. For built-up
area extraction, it is more appropriate to take small blocks
produced by checkerboard partitioning as the processing unit
(Zhong and Wang 2007a, 2007b; Tao et al. 2012; Hu et al. 2013;
Li et al. 2017a). Smal
tain buildings and pixels between buildin
represent the characteristics of built-up ar
tion results conform to the practical requir
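The checkerboard partitioning discussed above can be sketched in a few lines. This is a minimal, dependency-free illustration rather than the authors' implementation: the function names, the toy image, the block size of 2, and the choice to ignore edge pixels that do not fill a whole block are all assumptions made for the example.

```python
def checkerboard_blocks(height, width, block):
    """Return (row, col, top, left) for every full block in a checkerboard grid.

    Assumption: edge pixels that do not fill a whole block are ignored.
    """
    rows, cols = height // block, width // block
    return [(r, c, r * block, c * block)
            for r in range(rows) for c in range(cols)]


def crop(image, top, left, block):
    """Cut one square block out of an image stored as a list of pixel rows."""
    return [row[left:left + block] for row in image[top:top + block]]


# Toy 4 x 6 "image" whose pixel value encodes its position (10 * y + x).
image = [[10 * y + x for x in range(6)] for y in range(4)]
grid = checkerboard_blocks(height=4, width=6, block=2)

# Index each block by its grid position so spatial layout is preserved.
blocks = {(r, c): crop(image, top, left, 2) for r, c, top, left in grid}
```

Keeping the grid position of each block is what later allows per-block results to be placed back at the correct spatial location.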
The rapid development and great success of deep learning in
recent years have made it possible to break through the limits
that artificial features place on the performance of built-up
area extraction. Li et al. (2016a) proposed a multiscale CNN
model to extract features from image blocks in high-resolution
SAR images. Wu et al. (2017) also applied a deep CNN to extract
built-up areas from SAR images. Khelifa et al. (2017) utilized
a pretrained CNN to extract features of blocks generated by
superpixels and classified them with a random forest. Li et al.
(2017b) applied unsupervised deep feature learning to extract
urban villages. These deep learning methods have made progress
in the extraction of built-up areas, but most studies treat it
as a classification task, as we did in our earlier research
(Tan et al. 2017). However, it is more appropriate to treat it
as a segmentation task that processes the image as a whole
rather than splitting it into many parts. Because segmentation
takes advantage of neighboring contextual information,
combining deep features with segmentation techniques is more
reasonable. In recent years, some scholars have used semantic
segmentation methods to extract targets from remote sensing
data. Zheng et al. (2018) proposed a semantic segmentation
neural network that combines residual learning with the
advantages of U-Net for road area extraction. Ghosh et al.
(2018) applied Dilated Stacked U-Nets to semantic segmentation
of DeepGlobe data.
In this article, motivated by the great success of FCN in the
semantic segmentation of natural images (Long et al. 2015), we
propose an algorithm that adopts FCN for built-up area
extraction. However, directly using FCN for remote sensing
image segmentation faces several challenges. First, the remote
sensing images to be processed are usually so large that direct
application of FCN is constrained by limited hardware
resources. For example, the FCN-based algorithms in the
literature (Gao et al. 2017; Chen et al. 2018) can handle only
one small area at a time.
Second, algorithm speed is important given the amount of data
to be processed in practice. In our earlier research (Tan et al.
2017), we first used LMB-CNN to extract the deep features of
each image block and then built a graph model on these features
to segment the whole image and obtain the built-up area.
However, graph cut is not a learning-based algorithm and is
computationally expensive, and its result depends mostly on the
design of the graph, so its robustness to data from different
scenarios is weak. In contrast, supervised deep learning relies
on training samples, so when the scene changes greatly, test
accuracy can be improved by retraining without changing the
structure of the algorithm. Graph cut is essentially a
segmentation process. Inspired by the success of FCN
segmentation on natural images (Chen et al. 2018), we replace
the graph cut with a designed FCN to improve generalization
ability. In addition, using FCN to segment the feature map
extracted by LMB-CNN, rather than the raw image, avoids the
challenges of applying FCN directly to remote sensing images.
Based on a constructed built-up area remote sensing image data
set with a large number of samples, we combine LMB-CNN and the
designed FCN to segment the built-up area at the block level.
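To make concrete what it means for an FCN to segment a feature map rather than the raw image, the sketch below arranges per-block feature vectors into a multi-channel map indexed by block position. It is only an illustration under stated assumptions: the LMB-CNN itself is not reproduced, the feature values are made up, and the function name and channels-first layout are hypothetical choices, not details taken from the article.

```python
def features_to_map(block_features, rows, cols):
    """Arrange per-block feature vectors into a channels x rows x cols map.

    block_features maps (row, col) grid positions to equal-length vectors.
    Assumption: channels-first layout; positions without a vector stay 0.0.
    """
    channels = len(next(iter(block_features.values())))
    fmap = [[[0.0] * cols for _ in range(rows)] for _ in range(channels)]
    for (r, c), vector in block_features.items():
        if len(vector) != channels:
            raise ValueError("all blocks must share one feature length")
        for k, value in enumerate(vector):
            fmap[k][r][c] = value
    return fmap


# Hypothetical 3-dimensional features for a 2 x 2 grid of blocks,
# standing in for the deep features a block-level CNN would produce.
features = {
    (0, 0): [0.1, 0.2, 0.3],
    (0, 1): [0.4, 0.5, 0.6],
    (1, 0): [0.7, 0.8, 0.9],
    (1, 1): [1.0, 1.1, 1.2],
}
feature_map = features_to_map(features, rows=2, cols=2)
```

Each cell of the resulting map corresponds to one image block, so the map is smaller than the original image by the block size in each dimension, which is why segmenting it is far less demanding on hardware than segmenting the full-resolution image.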
The main contributions of the article are as follows:
1. A blockwise built-up area extraction framework that can
be applied to large-scale remote sensing images is presented.
It fully utilizes the feature extraction ability of LMB-CNN
and the segmentation ability of FCN, and it also has
low time consumption.
2. A fully convolutional network is designed to segment
e maps so that several segmentation
Furthermore, the final segmentation
oting on several masks.
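The voting over several masks mentioned in the second contribution can be illustrated with a simple per-cell vote. This is a hedged sketch, not the article's rule: the function name, the strict-majority default, and the toy masks are assumptions, and how ties among an even number of masks are resolved is not stated in this section.

```python
def vote(masks, threshold=None):
    """Fuse binary block-level masks by per-cell voting.

    Assumption: a cell is labeled built-up (1) when at least `threshold`
    masks mark it; the default is a strict majority, chosen for illustration.
    """
    if threshold is None:
        threshold = len(masks) // 2 + 1
    rows, cols = len(masks[0]), len(masks[0][0])
    return [[int(sum(m[r][c] for m in masks) >= threshold)
             for c in range(cols)]
            for r in range(rows)]


# Three toy 2 x 2 preliminary masks (1 = built-up block, 0 = background).
masks = [
    [[1, 0], [1, 1]],
    [[1, 0], [0, 1]],
    [[0, 0], [1, 1]],
]
final = vote(masks)               # strict majority: at least 2 of 3
strict = vote(masks, threshold=3)  # unanimous agreement only
```

Raising the threshold trades recall for precision, since fewer cells are accepted as built-up when more masks must agree.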
The remainder of the article is organized as follows. The next
section describes the implementation details of the proposed
algorithm. The training and testing data are then introduced,
and the experimental results are presented in detail.
Finally, conclusions are summarized in the final section.
Methodology
FCN has been successful in the semantic segmentation of
natural scenes (Long et al. 2015; Zheng et al. 2015; He et al.
2017), but the images involved are usually not very large. The
proposed method attempts to apply FCN to large-scale remote
sensing images. As shown in Figure 1, the procedure can be
divided into three steps: (1) divide the image into small
blocks by checkerboard partitioning as the processing unit and
extract the deep features of each block via an LMB-CNN;
(2) rearrange the features of the blocks into multi-channel
feature maps according to spatial location, which the designed
FCN then segments into eight preliminary segmentation masks;
and (3) vote on the eight preliminary segmentation masks to
determine the final extraction result. It is worth pointing out
that an LMB-CNN is used to extract features that are used as the
PHOTOGRAMMETRIC ENGINEERING & REMOTE SENSING