October 2019 Layout Flipping Full

by the seasons and limited by within-class spectral variation

and between-class spectral confusion. Therefore, it is usually

necessary to manually adjust the parameters for each local

image in practical applications. Texture features reflect how buildings are arranged in built-up areas and are more discriminative (Smits and Annoni 1999; Li and Narayanan 2004; Zhong and Wang 2007; Guo et al. 2014; Li et al. 2014, 2015a, 2017a). Built-up areas vary widely in type and style, however, so texture features alone are insufficient to cover extensive areas, which is one of the possible reasons why many studies have concentrated on only one small

region. In addition, key point features, such as Harris corner points (Tao et al. 2012; Li et al. 2015a; Chen et al. 2018) and SIFT (Sirmacek and Unsalan 2009), are also commonly used because of the structural characteristics of buildings. Some applications adopt edge features (Unsalan and Boyer 2004; Hu et al. 2013). Although these artificial features are elaborately designed, no single feature can handle the complex and changeable built-up areas in a large region.

Therefore, many studies have attempted to improve extraction performance by combining multiple features, for example through SVM-based multikernel learning (Tao et al. 2012; Li et al. 2017a) and other strategies (Zhang et al. 2014; Hu et al. 2016; Zhang et al. 2017b). Multi-feature fusion can improve algorithm performance effectively, but the artificial features still have limited generalization ability.

The existing methods are basically a combination of artificial features, classifiers, or segmentation algorithms. Artificial features are one of the most important factors that limit practicability. The choice of processing unit is also significant. Built-up areas are regional objects that contain abundant buildings and nonbuilding elements, such as lawns, trees, and other greenbelts. Therefore, classifying remote sensing images pixel by pixel is unreasonable (Pesaresi et al. 2009; Liu et al. 2014; Varshney and Rajesh 2014; Bouzekri et al. 2015; Kaimaris and Patias 2016). Some

studies have used superpixels as an alternative (Li et al. 2015b), which can improve the speed of algorithms to some extent. For built-up area extraction, taking small blocks produced by checkerboard partitioning as the processing unit is more appropriate (Zhong and Wang 2007a, 2007b; Tao et al. 2012; Hu et al. 2013; Li et al. 2017a). Small blocks that contain buildings and the pixels between buildings can effectively represent the characteristics of built-up areas, and the extraction results conform to practical requirements.
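Checkerboard partitioning can be illustrated with a short sketch. The function name `checkerboard_blocks`, the block size, and the choice to drop incomplete blocks at the right and bottom edges are assumptions made for illustration; the cited studies may handle block size and image borders differently.

```python
import numpy as np


def checkerboard_blocks(image: np.ndarray, block: int) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping block x block tiles.

    Returns an array of shape (rows, cols, block, block, C), where
    tiles[r, q] is the tile in grid row r and grid column q.
    Assumption of this sketch: pixels that do not fill a complete
    block at the right and bottom edges are dropped.
    """
    h, w, c = image.shape
    rows, cols = h // block, w // block
    # Crop to a whole number of blocks in each direction.
    cropped = image[: rows * block, : cols * block]
    # Split each spatial axis into (grid index, offset inside the block).
    tiles = cropped.reshape(rows, block, cols, block, c)
    # Bring the two grid indices to the front.
    return tiles.transpose(0, 2, 1, 3, 4)


# Hypothetical example: a 6 x 8 image with 3 channels, cut into 2 x 2 blocks.
image = np.arange(6 * 8 * 3).reshape(6, 8, 3)
tiles = checkerboard_blocks(image, 2)
print(tiles.shape)  # (3, 4, 2, 2, 3)
```

Each tile then serves as one processing unit, so a classifier or feature extractor sees buildings together with the pixels between them rather than isolated pixels.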

The rapid development and great success of deep learning in recent years have helped overcome the limits that artificial features impose on the performance of built-up area extraction. Li et al. (2016a) proposed a multiscale CNN model to extract features from image blocks in high-resolution SAR images. Wu et al. (2017) also applied a deep CNN to extract built-up areas from SAR images. Khelifa et al. (2017) utilized a pretrained CNN to extract features of blocks generated by superpixels and classified them with a random forest. Li et al. (2017b) applied unsupervised deep feature learning to extract

urban villages. These deep learning methods have made some progress in the extraction of built-up areas, but most studies treat it as a classification task, just as we did in our earlier research (Tan et al. 2017). However, it is more appropriate to treat it as a segmentation task that processes the image as a whole rather than splitting it into many parts. Because segmentation takes advantage of neighboring contextual information, combining deep features with segmentation techniques is more reasonable. In recent years, some scholars have used semantic segmentation methods to extract targets from remote sensing data. Zheng et al. (2018) proposed a semantic segmentation neural network that combines residual learning with the advantages of U-Net for road area extraction. Ghosh et al. (2018) applied Dilated Stacked U-Nets to semantic segmentation on DeepGlobe data.

In this article, motivated by the great success of FCN in the semantic segmentation of natural images (Long et al. 2015), we propose an algorithm that adopts FCN for built-up area extraction. However, directly using FCN for remote sensing image segmentation faces several challenges. First, the remote sensing images to be processed are typically so large that the direct application of FCN is limited by hardware resources. For example, the FCN-based algorithms in the literature (Gao et al. 2017; Chen et al. 2018) can handle only one small area at a time.

Second, the algorithm speed is important when considering the amount of data to be processed in practice. In our earlier research (Tan et al. 2017), we first used LMB-CNN to extract the deep features of each image block and then built a graph model on these features to segment the whole image and obtain the built-up area. But graph cut is not a learning-based algorithm and is computationally expensive, and its performance relies mostly on the design of the graph, so its robustness to data from different scenarios is not strong. By contrast, because supervised deep learning relies on training samples, when the data scene changes greatly, we can improve test accuracy through retraining without changing the structure of the algorithm. Generally, graph cut is a segmentation process.

Inspired by the success of FCN segmentation in natural images (Chen et al. 2018), we therefore try to replace the graph cut with a designed FCN to improve the generalization ability. In addition, using FCN to segment the feature map extracted by LMB-CNN resolves the challenges of applying FCN directly to remote sensing images. Based on a constructed built-up area remote sensing image data set with a large number of samples, we combine LMB-CNN and the designed FCN to segment the built-up area at the block level.
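The idea of segmenting a block-level feature map instead of the raw image can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name `features_to_map`, the row-major ordering of blocks, and the feature dimension are hypothetical, and the random vectors merely stand in for per-block LMB-CNN features.

```python
import numpy as np


def features_to_map(block_features: np.ndarray, rows: int, cols: int) -> np.ndarray:
    """Arrange per-block feature vectors into a (rows, cols, D) feature map.

    block_features has shape (rows * cols, D) and is assumed to list the
    blocks in row-major order (left to right, then top to bottom), so the
    feature vector of the block at grid position (r, q) ends up at
    feature_map[r, q].
    """
    n, d = block_features.shape
    if n != rows * cols:
        raise ValueError(f"expected {rows * cols} blocks, got {n}")
    return block_features.reshape(rows, cols, d)


# Hypothetical example: a 3 x 4 grid of blocks with 5-dimensional features.
rng = np.random.default_rng(0)
features = rng.normal(size=(3 * 4, 5))  # stand-in for per-block deep features
feature_map = features_to_map(features, rows=3, cols=4)
print(feature_map.shape)  # (3, 4, 5)
```

Because each grid cell summarizes a whole block, the map handed to the segmentation network is much smaller than the image. With a hypothetical block size of 64 x 64 pixels, for instance, its height and width would be 1/64 of those of the original image.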

The main contributions of the article are as follows:

1. A blockwise built-up area extraction framework that can be used in large-scale remote sensing images is presented by fully utilizing the feature extraction ability of LMB-CNN and the segmentation ability of FCN. The approach also has low time consumption.

2. A fully convolutional network is designed to segment the blockwise feature maps so that several segmentation masks are obtained. Furthermore, the final segmentation result is refined by voting on several masks.
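Voting on several masks, as named in the second contribution, can be sketched as below. The voting rule is an assumption: this sketch uses a strict per-cell majority, so a tie counts as not built-up, whereas the article's actual rule may differ. The function name `vote_masks` and the parameter `min_votes` are hypothetical.

```python
from typing import Optional

import numpy as np


def vote_masks(masks: np.ndarray, min_votes: Optional[int] = None) -> np.ndarray:
    """Fuse K binary masks of shape (K, rows, cols) into one (rows, cols) mask.

    A cell is labeled built-up (1) when at least min_votes masks mark it.
    By default min_votes is a strict majority, K // 2 + 1 (an assumption
    of this sketch).
    """
    masks = np.asarray(masks).astype(bool)
    k = masks.shape[0]
    if min_votes is None:
        min_votes = k // 2 + 1
    votes = masks.sum(axis=0)  # number of masks marking each cell
    return (votes >= min_votes).astype(np.uint8)


# Hypothetical example with eight preliminary masks over a 2 x 2 grid.
masks = np.zeros((8, 2, 2), dtype=np.uint8)
masks[:5, 0, 0] = 1  # five of eight masks mark cell (0, 0)
masks[:4, 0, 1] = 1  # four of eight masks mark cell (0, 1): a tie
final = vote_masks(masks)
print(final)
```

Lowering `min_votes` makes the fused mask more inclusive, while raising it keeps only cells on which most masks agree.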

The remainder of the article is organized as follows. The next section describes the implementation details of our proposed algorithm. Next, the training and testing data are introduced, and the experimental results are shown in detail. Finally, conclusions are summarized in the final section.

Methodology

The FCN has been successful in the semantic segmentation of natural scenes (Long et al. 2015; Zheng et al. 2015; He et al. 2017), but the images it works on are usually not very large. The proposed method attempts to apply FCN to large-scale remote sensing images. As shown in Figure 1, the procedure for the proposed method can be divided into three steps: (1) divide the image into small blocks by checkerboard partitioning as the processing unit and extract the deep features of each block via an LMB-CNN; (2) rearrange the features of the blocks into multi-channel feature maps according to spatial location (the designed FCN segments the multi-channel feature maps into eight preliminary segmentation masks); and (3) vote on the eight preliminary segmentation masks to determine the final extraction result. It is worth pointing out that an LMB-CNN is used to extract features that are used as the


October 2019

PHOTOGRAMMETRIC ENGINEERING & REMOTE SENSING
