is based on visual inspection (Durupt and Taillandier 2006; Macay Moreia et al. 2013), geometric fidelity metrics (Kaartinen et al. 2005), or by extending standard two-dimensional (2D) object detection criteria (Karantzalos and Paragios 2010), without any semantic dimension. Only one benchmark dataset has addressed the issue (Rottensteiner et al. 2014). It remains focused on very few areas and a geometric comparison with manually extracted roof structures (Li et al. 2016; Nan and Wonka 2017; Nguatem and Mayer 2017). Consequently, it cannot be easily extended. Similar conclusions can be drawn for indoor reconstruction (Tran et al. 2019).
Positioning and Contributions
The current situation motivates the need for a well-suited quality assessment paradigm. Since building models display strong structural properties, an unconstrained evaluation based on data fidelity metrics, as in (Berger et al. 2013), is too general. The evaluation should also ignore format issues and questions of geometric consistency, as proposed in (Ledoux 2018). Although these are serious issues, and clean 3D models are usually not the norm (Biljecki et al. 2016a; Hu et al. 2018), we rule out these cases at this stage for simplicity. Instead, we target a semantic evaluation in which building semantics is taken into account through the detection and categorization of modeling errors at the facet level for each 3D [...] work is independent from the LoD and th[...] The standard criteria used in the reconst[...] L1 norm between the model and a Digital Surface Model (DSM)) will not be taken into account, as they are usually chosen as minimization targets in the modeling procedure. Thus, we define an evaluation framework that can be used for:
• Building model correction: for the automatic or interactive (Kowdle et al. 2011) refinement of building models using the detected errors.
• Change detection: modeling errors can straightforwardly stem from changes, which frequently occur in urban environments (Taneja et al. 2015). Conversely, changes can be implicitly detected from other defects (Tran et al. 2019).
• Reconstruction method selection: evaluating models from various reconstruction algorithms makes it possible to assess which methods are best suited to a specific LoD and building type.
• Crowd-sourcing evaluation (Kovashka et al. 2016): categorizing user behaviors during crowd-sourced modeling and the vandalism detection process (Neis et al. 2012).
This work proposes an adaptable and flexible framework that is agnostic to the input urban scenes and reconstruction methods. For that purpose, our contributions are three-fold:
• A new hierarchical taxonomy of errors, adapted to all LoDs and independent of the input models;
• A supervised classification formulation of the evaluation problem, which predicts all errors affecting the building model;
• A multimodal baseline of features that are extracted from the model itself as well as from Very High Resolution (VHR) external data (optical images and height data).
The next section, “Related Work,” introduces the problem of the evaluation of 3D building models and discusses existing methods. The section “Problem Formulation” details the proposed approach, while data and experiments conducted over three urban areas are presented in the section “Results.” A more comprehensive set of experiments studying the scalability of the proposed method is reported in the “Scalability Analysis” section. The same experiments are conducted at other semantic levels and reported in the section “Finesse Study.” The main conclusions are drawn in the last section.
Related Work
Quality assessment methods can be classified according to
two main criteria: reference data and output type.
Reference Data Types
Existing methods rely on two types of reference data.
[...]tted ground truth data with very high [...]e models can be obtained either from [...]ick et al. 2004; Kaartinen et al. 2005) [...]ble precision (σ(error) [...] 0.05 m), or using stereo-plotting techniques (Jaynes et al. 2003; Kaartinen et al. 2005; Zebedin et al. 2008; Zeng et al. 2014). Generally, the criterion is the root mean square error (RMSE) on the height values. Such a strategy does not scale well, does not straightforwardly bring semantics, and requires a 3D matching procedure (overlapping ratio between surfaces, minimal roof areas, integration of superstructures) that can be complex in dense urban environments.
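As an illustration of the RMSE criterion on height values, the following minimal sketch computes it for heights that are assumed to be already paired between the model and the reference, that is, after the 3D matching procedure mentioned above. The function name `height_rmse` and the sample values are hypothetical and do not come from the cited works.

```python
import math


def height_rmse(model_heights, reference_heights):
    """Root mean square error between paired height values (same unit)."""
    if len(model_heights) != len(reference_heights):
        raise ValueError("height lists must have the same length")
    if not model_heights:
        raise ValueError("at least one height pair is required")
    # Squared difference for each matched (model, reference) pair.
    squared = [(m - r) ** 2 for m, r in zip(model_heights, reference_heights)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical heights in metres: model values vs. manually plotted reference.
model = [12.0, 15.0, 9.0, 20.0]
reference = [12.0, 14.0, 9.0, 22.0]
rmse = height_rmse(model, reference)
print(f"RMSE = {rmse:.3f} m")
```

A single score of this kind summarizes geometric deviation only; it does not indicate which kind of error affects the building, which is the limitation noted above.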
Two, raw remote sensing data: models can be compared either to the source data from which they were generated or to remote sensing data of superior geometric accuracy: Light Detection and Ranging (LiDAR) point clouds, height maps (i.e., DSMs) (Akca et al. 2010; Lafarge and Mallet 2012; Li et al. 2016; Zhu et al. 2018), or multiview VHR images as in (Boudet et al. 2006; Michelin et al. 2013). Although such a strategy samples the area of interest better, it may not always be helpful. On the one hand, these data have already been exploited by the modeling methods, and such comparisons are often the basis for their fidelity criterion. On the other hand, additional remote sensing data is not easy to obtain, especially at large scales under operational constraints.
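A comparison against a height map can be sketched as a mean absolute (L1) height residual over the cells covered by both rasters. This is only an illustrative sketch: it assumes the model has already been rasterized onto the same grid as the DSM, and that cells without data are stored as `None`. The function name `mean_l1_residual` and the sample grids are hypothetical.

```python
def mean_l1_residual(model_grid, dsm_grid):
    """Mean absolute height difference over cells valid in both grids."""
    if len(model_grid) != len(dsm_grid):
        raise ValueError("grids must have the same number of rows")
    total, count = 0.0, 0
    for model_row, dsm_row in zip(model_grid, dsm_grid):
        if len(model_row) != len(dsm_row):
            raise ValueError("rows must have the same number of cells")
        for m, d in zip(model_row, dsm_row):
            # Skip cells with no data in either raster (assumed to be None).
            if m is None or d is None:
                continue
            total += abs(m - d)
            count += 1
    if count == 0:
        raise ValueError("no cell is valid in both grids")
    return total / count


# Hypothetical 2 x 3 height rasters in metres; None marks missing cells.
model_grid = [[10.0, 10.0, None],
              [10.0, 12.0, None]]
dsm_grid = [[10.5, 9.5, 3.0],
            [10.0, 14.0, None]]
residual = mean_l1_residual(model_grid, dsm_grid)
print(f"mean L1 residual = {residual:.2f} m")
```

When the DSM is the same data that drove the reconstruction, a low residual mostly reflects the fidelity criterion the modeling method already optimized, which is why such a comparison may not always be informative.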
Figure 1. (a) Our semantic evaluation framework for 3D building models. Semantic errors affecting the building are predicted
using a supervised classifier and handcrafted features. (b) In addition to the input model topological structure, features are
extracted from Very High Resolution overhead data. (c) It can be based on a comparison with the Digital Surface Model (DSM).
(d) Optical images can also be used through, for instance, local gradient extraction. (e) Several errors can be detected at the
same time, in a hierarchical manner. Fidelity errors correspond to geometrical imprecision as shown in red. On the other
hand, modeling errors denote morphological inconsistencies with the real object.
866
December 2019
PHOTOGRAMMETRIC ENGINEERING & REMOTE SENSING