DSM Generation from High Resolution Multi-View
Stereo Satellite Imagery
K. Gong and D. Fritsch
Abstract
Along with improvements in spatial resolution, multi-view
stereo satellite imagery has become a valuable data source
for digital surface model generation. In 2016, a public
multi-view stereo benchmark of commercial satellite imagery
was released by the Johns Hopkins University Applied
Physics Laboratory, USA. Motivated by this
benchmark, we propose a pipeline to process multi-view
satellite imagery into digital surface models. Input images
are selected based on view angles and capture dates. We
apply the relative bias-compensated model for orientation,
and then generate the epipolar image pairs. The images are
matched by the modified tube-based Semi-Global Matching
method (tSGM). Within the triangulation step, very dense
point clouds are produced, which are fused by a median
filter to generate the Digital Surface Model (DSM).
A comparison with the reference data shows that the fused
DSM generated by our pipeline is accurate and robust.
Introduction
Over the last decade, a number of High Resolution Satellite
(HRS) sensors have been launched by commercial companies
or space agencies, such as Sentinel-2, WorldView-3/4, and
Pléiades. The best Ground Sample Distance (GSD) of HRS
panchromatic imagery has reached the 30 cm level, which
reveals more surface features. HRS sensors cover most
regions of the Earth and collect surface information with
large footprints. Their high revisit frequency over a given
area yields a large number of image collections and makes
the acquisition of multi-view stereo (MVS) satellite imagery
feasible. As is well known, satellite data vendors provide
Rational Polynomial Coefficients (RPCs) instead of the
rigorous push-broom sensor model. Thus, data consumers can
ignore the differences between satellite sensors and process
the satellite data with a general pipeline. Because of these
benefits, MVS high resolution satellite images are useful
for global three-dimensional (3D) mapping, environmental
monitoring, urban planning, change detection, and so on.
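To illustrate why RPCs allow a sensor-independent pipeline, the following minimal sketch evaluates the generic rational polynomial model: normalized ground coordinates are mapped to normalized image coordinates by ratios of cubic polynomials with 20 coefficients each. The dictionary keys and the function names are hypothetical, and the ordering of the 20 monomials is assumed to follow the common RPC00B convention; vendor metadata should be checked before reuse.

```python
def rpc_terms(P, L, H):
    """Return the 20 cubic monomials (assumed RPC00B ordering).

    P, L, H are normalized latitude, longitude, and height.
    """
    return [
        1.0, L, P, H, L * P, L * H, P * H, L * L, P * P, H * H,
        P * L * H, L ** 3, L * P * P, L * H * H, L * L * P, P ** 3,
        P * H * H, L * L * H, P * P * H, H ** 3,
    ]


def _dot(coeffs, terms):
    """Inner product of 20 coefficients with the 20 monomials."""
    return sum(c * t for c, t in zip(coeffs, terms))


def rpc_project(lat, lon, h, rpc):
    """Project a ground point (lat, lon, h) to image (row, col).

    `rpc` is a hypothetical dict holding offsets, scales, and the four
    coefficient lists (line/sample numerator and denominator).
    """
    # Normalize ground coordinates to roughly [-1, 1].
    P = (lat - rpc["lat_off"]) / rpc["lat_scale"]
    L = (lon - rpc["lon_off"]) / rpc["lon_scale"]
    H = (h - rpc["h_off"]) / rpc["h_scale"]
    t = rpc_terms(P, L, H)
    # Ratios of cubic polynomials give normalized image coordinates.
    row_n = _dot(rpc["line_num"], t) / _dot(rpc["line_den"], t)
    col_n = _dot(rpc["samp_num"], t) / _dot(rpc["samp_den"], t)
    # De-normalize to pixel coordinates.
    row = row_n * rpc["line_scale"] + rpc["line_off"]
    col = col_n * rpc["samp_scale"] + rpc["samp_off"]
    return row, col
```

Because the same function serves any sensor whose vendor delivers RPCs, only the coefficient set changes from image to image, not the processing code.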
In 2016, a public MVS benchmark of commercial satellite
imagery was released by the Johns Hopkins University Applied
Physics Laboratory (JHU/APL), USA. The benchmark contains
50 DigitalGlobe WorldView-3 panchromatic and multispectral
images. The imagery covers an area of 100 square kilometers
close to San Fernando, Argentina, with a GSD of about 30 cm
for the nadir images. The images were captured between
November 2014 and January 2016. The benchmark also provides
a Light Detection and Ranging (LiDAR) point cloud collected
in June 2016 as the ground truth, with a nominal point
spacing of 20 cm. Digital surface models (DSMs) at 30 cm GSD
are produced from the LiDAR point cloud in order to make
equally spaced comparisons with the results generated from
WorldView-3 panchromatic imagery (Bosch et al. 2016). This
well-organized MVS high resolution satellite benchmark has
motivated us to learn and test new methods of point cloud
and DSM generation from MVS satellite data.
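As a rough illustration of how an irregular point cloud can be resampled to an equally spaced 30 cm DSM, the sketch below bins points into grid cells and keeps the highest elevation per cell. This is only one plausible gridding rule and is an assumption; the benchmark's actual rasterization procedure is not described here, and the function name and parameters are hypothetical.

```python
import math


def rasterize_dsm(points, x_min, y_min, n_cols, n_rows, gsd=0.3):
    """Grid (x, y, z) points into a DSM with cell size `gsd` (meters).

    Each cell keeps its highest z value; cells without points stay None.
    Rows increase with y here, which is a simplification of the usual
    north-up raster layout.
    """
    dsm = [[None] * n_cols for _ in range(n_rows)]
    for x, y, z in points:
        col = math.floor((x - x_min) / gsd)
        row = math.floor((y - y_min) / gsd)
        # Ignore points that fall outside the grid extent.
        if 0 <= row < n_rows and 0 <= col < n_cols:
            if dsm[row][col] is None or z > dsm[row][col]:
                dsm[row][col] = z
    return dsm
```

With a nominal point spacing of 20 cm and 30 cm cells, most cells receive at least one point, but empty cells can still occur and would need interpolation or masking in practice.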
It is well known that MVS imagery 3D reconstruction methods
can be classified into two categories. The first category
solves the multi-view triangulation problem for all images
simultaneously, which is the true multi-view method
(Furukawa and Hernandez 2015). The second category only uses
binocular stereo pairs. It processes the stereo pairs
separately and fuses the output point clouds or DSMs into a
final result (Haala 2013). Compared with the binocular
stereo strategy, the true multi-view method is more rigorous
but also more complicated. Because of the efficiency and
stable performance of the semiglobal matching (SGM)
algorithm (Hirschmüller 2008), most solutions for 3D
reconstruction from MVS satellite imagery are implemented
using binocular stereo methods (d’Angelo and Kuschk 2012;
Kuschk 2013; Qin 2017; Facciolo et al. 2017). Some
researchers have investigated and compared both kinds of
reconstruction strategies on MVS satellite images (Ozcanli
et al. 2015). In their implementation, the pair-wise
multi-view reconstruction method demonstrated better results
than the true multi-view method.
In this paper, we present a pipeline based on the binocular
stereo method for DSM generation using MVS high resolution
satellite imagery. The point clouds and DSMs, which are
generated separately from different stereo pairs, are fused
into the final DSM. The fused DSM is compared to the
reference DSM for evaluation. We conduct a qualitative
analysis by visual comparison and, for the quantitative
analysis, calculate the completeness, the median error, the
root-mean-square error (RMSE), and the error distribution.
We show that our proposed pipeline can produce accurate and
robust DSMs from MVS satellite imagery.
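The fusion and evaluation steps named above can be sketched as follows. The example is a simplified assumption, not the authors' implementation: DSMs are small nested lists with None marking no-data cells, fusion takes the per-cell median across pair-wise DSMs, and completeness is assumed to be the fraction of valid reference cells whose absolute height error is below a hypothetical threshold (1 m by default). The median error is taken over absolute errors.

```python
import math
import statistics


def fuse_median(dsms):
    """Per-cell median over equally sized DSM grids (None = no data)."""
    n_rows, n_cols = len(dsms[0]), len(dsms[0][0])
    fused = [[None] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            heights = [d[r][c] for d in dsms if d[r][c] is not None]
            if heights:
                fused[r][c] = statistics.median(heights)
    return fused


def evaluate(dsm, reference, threshold=1.0):
    """Completeness, median absolute error, and RMSE against a reference.

    Assumes at least one cell is valid in both grids.
    """
    errors = [
        abs(z - z_ref)
        for row, ref_row in zip(dsm, reference)
        for z, z_ref in zip(row, ref_row)
        if z is not None and z_ref is not None
    ]
    n_ref = sum(z_ref is not None for ref_row in reference for z_ref in ref_row)
    return {
        "completeness": sum(e < threshold for e in errors) / n_ref,
        "median_error": statistics.median(errors),
        "rmse": math.sqrt(sum(e * e for e in errors) / len(errors)),
    }


# Three pair-wise DSMs of a 1 x 3 tile; the 50.0 value mimics a mismatch.
pairwise = [[[10.0, 20.0, None]], [[10.4, 50.0, None]], [[9.8, 20.2, 5.0]]]
fused = fuse_median(pairwise)
metrics = evaluate(fused, [[10.0, 20.0, 7.0]])
```

The median suppresses the isolated 50.0 outlier in the second cell, which is the main reason a median is preferred over a mean when fusing heights from many stereo pairs.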
The contents of this paper are structured as follows:
Section “Related Work” introduces past work in this area,
whereas the methodology of the proposed pipeline is
presented in the section “Methodology”. Section “Experiments”
demonstrates the results generated from the benchmark data
and their evaluation, and in the last section we draw some
conclusions.
Related Work
High resolution satellite sensors are able to provide plenty
of imagery for a given area, but the images are usually
collected on different dates. Thus, the collected images may
have different illumination conditions and geometric
configurations, and may contain terrain changes. All of
these differences have a negative influence on the outcome
of the DSM generation. A large number of stereo images also
means that image
Institute for Photogrammetry, University of Stuttgart, 70174
Stuttgart, Germany.
Photogrammetric Engineering & Remote Sensing
Vol. 85, No. 5, May 2019, pp. 379–387.
0099-1112/18/379–387
© 2019 American Society for Photogrammetry
and Remote Sensing
doi: 10.14358/PERS.85.5.379