inputs of an FCN. The segmentation task is essentially the classification of each local image block, taking neighboring relationships into account, so the classification features extracted by LMB-CNN are suitable for the subsequent segmentation task.
The details of each step are described in the following sections. In addition, some simple postprocessing steps are applied to make the final extraction result more practical.
Extract Deep Features by LMB-CNN
The first step of the proposed algorithm divides the image into small blocks, which are used to extract the deep features that serve as the input of FCN. The block size is 64×64 pixels, given that the resolution of the images we process is 1 m. There are four reasons for doing this instead of combining LMB-CNN with FCN into a single end-to-end network. First, as already described, small blocks are more appropriate processing units than pixels. Second, this step addresses the heavy hardware cost of applying FCN to a large remote sensing image: more than 20 GB of storage is required for 64-channel feature maps with a size of 10 240×10 240, and actual images are usually larger. This operation reduces the input size of FCN by nearly 1000 times. Third, the selected LMB-CNN model can be trained as a classification task, while the FCN model used later is trained as a segmentation task. Because FCN segmentation uses the block-level features extracted by LMB-CNN, the two models can be combined sequentially so that both tasks are learned. Finally, the block size is set to 64 pixels because a block of this size can span two or more buildings, in line with the definition of a built-up area.
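The block division and the storage estimate above can be checked with a short sketch. It is only an illustration under stated assumptions, not the authors' implementation: the function names `block_grid`, `block_bounds`, and `feature_map_bytes` are hypothetical, feature values are assumed to be 32-bit floats, and border blocks are assumed to be clipped to the image extent.

```python
import math

BLOCK = 64  # block edge in pixels, as stated in the text (1 m resolution)


def block_grid(height, width, block=BLOCK):
    """Return (rows, cols) of the block grid covering the image.

    Assumption: partial blocks at the right/bottom border are counted.
    """
    return math.ceil(height / block), math.ceil(width / block)


def block_bounds(height, width, block=BLOCK):
    """Yield (row, col, top, left, bottom, right) for every block.

    Bounds are half-open pixel ranges clipped to the image extent.
    """
    rows, cols = block_grid(height, width, block)
    for r in range(rows):
        for c in range(cols):
            top, left = r * block, c * block
            yield (r, c, top, left,
                   min(top + block, height), min(left + block, width))


def feature_map_bytes(height, width, channels, bytes_per_value=4):
    """Bytes needed for a dense feature map (float32 assumed by default)."""
    return height * width * channels * bytes_per_value


# Example from the text: a 10 240 x 10 240 image with 64-channel feature maps.
rows, cols = block_grid(10240, 10240)
full_res_gib = feature_map_bytes(10240, 10240, 64) / 2**30
print(rows, cols, full_res_gib)
```

Under these assumptions the full-resolution feature maps occupy 25 GiB, which agrees with the "more than 20 GB" figure, and the image is covered by a 160×160 grid of blocks. The number of spatial positions therefore falls by a factor of 64 × 64 = 4096; the overall reduction in FCN input size also depends on the per-block feature dimension, which this sketch does not model.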
To extract the deep features, we select a lightweight multi-branch convolutional neural network, denoted LMB-CNN (Tan et al. 2018b), in view of the speed requirements of practical applications. The structure of LMB-CNN is as follows.
As illustrated in Figure 2, in addition to the standard convolutional layers and fully connected layers, the main body of LMB-CNN consists of three multi-branch blocks. For each multi-branch
Figure 1. Main framework of the proposed algorithm.
Figure 2. Structure of LMB-CNN. The size and quantity of the convolution kernels are shown in the figure, and batch normalization layers and the activation layers are hidden. “DS Conv” means depthwise separable convolution, and “Conv” means standard convolution.
PHOTOGRAMMETRIC ENGINEERING & REMOTE SENSING
October 2019