Building Extraction from High-Resolution
Remote Sensing Images Based on GrabCut
with Automatic Selection of Foreground and
Background Samples
Ka Zhang, Hui Chen, Wen Xiao, Yehua Sheng, Dong Su, and Pengbo Wang
Abstract
This article proposes a new building extraction method from high-resolution remote sensing images, based on GrabCut, which can automatically select foreground and background samples under the constraint of building elevation contour lines. First, the image is rotated according to the direction of pixel displacement calculated by the rational function model. Second, the Canny operator, combined with morphology and the Hough transform, is used to extract the building's elevation contour lines. Third, seed points and interesting points of the building are selected under the constraint of the contour lines and the geodesic distance, and foreground and background samples are obtained from these points. Fourth, GrabCut and geometric features are used to segment the image and extract buildings. Finally, WorldView satellite images are used to verify the proposed method. Experimental results show that the average accuracy reaches 86.34%, which is 15.12% higher than that of other building extraction methods.
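The contour-extraction step summarized above can be sketched with standard image-processing primitives. The snippet below is a minimal illustration, not the authors' implementation: it assumes OpenCV's Canny, morphological closing, and probabilistic Hough transform as stand-ins for the operators named in the abstract, and all thresholds and kernel sizes are placeholder values.

```python
import cv2
import numpy as np

def extract_elevation_contour_lines(image_bgr):
    """Sketch of the edge -> morphology -> Hough chain named in the abstract.
    Thresholds and kernel sizes are illustrative placeholders."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)

    # Canny edge detection on the (already rotated) image.
    edges = cv2.Canny(gray, 50, 150)

    # Morphological closing to bridge small gaps in the edge map.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    closed = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, kernel)

    # Probabilistic Hough transform to recover straight contour segments.
    lines = cv2.HoughLinesP(closed, rho=1, theta=np.pi / 180,
                            threshold=80, minLineLength=30, maxLineGap=5)
    return [] if lines is None else [tuple(l[0]) for l in lines]
```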
Introduction
Buildings, as an important component of the living environment, have been the focus of numerous studies, including in urban planning and construction, change detection, population-density estimation, and disaster assessment. Accurate and efficient extraction of geometric and attribute information of buildings has always been an important research topic in the field of geoinformation science (X. Huang et al. 2017).
With the advance of remote sensing technology, spatial reso-
lutions of images from very-high-resolution satellites (e.g.,
SPOT-5, WorldView-1 through WorldView-4, and QuickBird)
have reached meter level, providing more detailed spatial and
textural information (Cheng and Han 2016). Therefore, extrac-
tion of building information from high-resolution remote sens-
ing images has become a research hot spot (Cao et al. 2016).
However, accurate building extraction from high-resolution
images remains a challenge due to various factors in remote
sensing images, such as diversity of objects, complexity of
buildings, noise, occlusions, shadows, and low contrast. To make matters worse, when the viewing perspective is oblique, building elevations (façades) cover large areas in remote sensing images. When monocular optical images are used to automatically extract the top contours of buildings, these elevation areas are hard to distinguish from building tops (Cui, Yan and Reinartz 2012; J. Wang et al. 2015), yet the main goal of building extraction is to obtain a clean boundary for each building (J. Wang et al. 2015).
At present, methods based on shadows and auxiliary information are frequently used. However, when locating a building, shadow-based methods treat the elevation and the roof equally, producing inaccurate boundaries (Ok, Senaras and Yuksel 2013; Ngo, Collet and Mazet 2015; Gao et al. 2018).
Other methods based on auxiliary data such as lidar (light
detection and ranging) can distinguish the roof and elevation
well, but the cost of obtaining such data is high (Zarea and
Mohammadzadeh 2016; Fernandes and Dal Poz 2017; S. Kim
and Rhee 2018). Apart from those, deep learning-based image segmentation is a new research trend, but this kind of method requires large amounts of training data and usually does not consider images containing a large quantity of elevations of buildings (… et al. 2019; Wurm et al. 2019).
This article proposes a building extraction method that can
distinguish the roof from the elevation under the constraint of
the building’s elevation contours without any other types of
data or training data. First, building elevation contour lines are
extracted. Then foreground samples are selected under the constraint of the elevation contours, and the elevation areas are used as background samples. Finally, GrabCut is used for image segmentation, and geometric features of the segmented areas are used to accurately extract buildings from high-resolution images. The method is tested on two urban data sets (Guangdong, China, and Tripoli, Libya) using WorldView-2 and WorldView-3 satellite image configurations. All results are evaluated qualitatively and quantitatively against ground truth. The results show that building tops can be accurately distinguished from building elevations in monocular images with highly oblique viewing angles, with a high level of automation.
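As a rough illustration of the final segmentation step, the sketch below seeds GrabCut from externally supplied foreground and background sample masks and returns a binary building mask. It assumes OpenCV's grabCut in mask-initialization mode and is not the authors' code; the helper name, sample-mask inputs, and iteration count are hypothetical placeholders.

```python
import cv2
import numpy as np

def segment_building(image_bgr, fg_samples, bg_samples, iterations=5):
    """GrabCut seeded from boolean sample masks (illustrative sketch).
    image_bgr: 8-bit 3-channel image; fg_samples / bg_samples: boolean
    arrays marking foreground and background sample pixels."""
    mask = np.full(image_bgr.shape[:2], cv2.GC_PR_BGD, dtype=np.uint8)
    mask[bg_samples] = cv2.GC_BGD   # definite background (e.g., elevation samples)
    mask[fg_samples] = cv2.GC_FGD   # definite foreground (e.g., roof seed points)

    bgd_model = np.zeros((1, 65), np.float64)
    fgd_model = np.zeros((1, 65), np.float64)
    cv2.grabCut(image_bgr, mask, None, bgd_model, fgd_model,
                iterations, cv2.GC_INIT_WITH_MASK)

    # Pixels labelled definite or probable foreground form the building region.
    return np.isin(mask, (cv2.GC_FGD, cv2.GC_PR_FGD)).astype(np.uint8)
```

Geometric features of the resulting regions (e.g., area and shape) would then be used, as described above, to filter out non-building segments.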
Ka Zhang is with the Key Laboratory of Virtual Geographic
Environment, Nanjing Normal University, Ministry of
Education, Nanjing, China; the School of Geography, Nanjing
Normal University, Nanjing, China; the Jiangsu Center
for Collaborative Innovation in Geographical Information
Resource Development and Application, China; the State Key
Laboratory Cultivation Base of Geographical Environment
Evolution (Jiangsu Province), China; and the Key Laboratory
of Urban Land Resources Monitoring and Simulation, MNR,
China.
Hui Chen (co-first author), Yehua Sheng (co-first author),
Dong Su, and Pengbo Wang are with the Key Laboratory of
Virtual Geographic Environment, Nanjing Normal University,
Ministry of Education, Nanjing, China; and the School of
Geography, Nanjing Normal University, Nanjing, China.
Wen Xiao (co-first author) is with the School of Engineering,
Newcastle University, Newcastle upon Tyne, United Kingdom.
Photogrammetric Engineering & Remote Sensing
Vol. 86, No. 4, April 2020, pp. 235–245.
0099-1112/20/235–245
© 2020 American Society for Photogrammetry
and Remote Sensing
doi: 10.14358/PERS.86.4.235