
results dramatically. This is well illustrated by the low variance in Figure 14a for the three areas of interest. These results are in line with the conclusions at finesse level 3. At the higher finesse level, only BUS, among all atomic errors, was highly impacted by a change in the feature configuration. This may explain the observed low variability. We also note the prevalence of FOS errors (between 60% and 85%) and FIG errors (between 70% and 95%) in the Facet Errors family. This, added to the fact that they are, to a large extent, easily detected individually (F-score above 90% for the first and 80% for the second, see Figure 8b), helps explain why the F-score reaches at least 90% for this family (Figure 14a). As with the finesse level 3 experiments, Facet Errors yield higher prediction scores than Building Errors. Indeed, we can see a smaller discrepancy between F-scores on different scenes for Facet Errors (below 5%) than for Building Errors (15%).
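
The scene-to-scene discrepancy quoted above can be roughly cross-checked from the recall and precision values of Table 5. The sketch below is only an illustration: it assumes the F-score is the usual harmonic mean of recall and precision, and it uses the "All." feature configuration. The names `TABLE5_ALL`, `f_score` and `scene_spread` are hypothetical helpers, not part of the evaluated framework. Figure 14a reports means and standard deviations, so the values derived here need not match the plotted ones exactly.

```python
# Recall/precision pairs (in percent) copied from the "All." columns of Table 5.
TABLE5_ALL = {
    "Building Errors": {
        "Elancourt": (100.0, 85.55),
        "Nantes": (86.90, 69.23),
        "Paris-13": (97.36, 68.76),
    },
    "Facet Errors": {
        "Elancourt": (93.46, 89.16),
        "Nantes": (91.73, 94.21),
        "Paris-13": (99.03, 91.26),
    },
}


def f_score(recall, precision):
    """Harmonic mean of recall and precision (assumed F1 definition)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


def scene_spread(family):
    """Gap between the best and worst per-scene F-score of an error family."""
    scores = [f_score(rec, prec) for rec, prec in TABLE5_ALL[family].values()]
    return max(scores) - min(scores)


if __name__ == "__main__":
    for family in TABLE5_ALL:
        print(f"{family}: spread = {scene_spread(family):.2f} points")
```

Under these assumptions, the spread stays under 5 points for Facet Errors and is close to 15 points for Building Errors, which agrees with the orders of magnitude given in the text.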

The transferability study (Figure 14b) compares the F-scores with the ablation study provided in Figure 14a. Out of all 12 possible comparisons, only two exhibit a decrease in error discrimination. Both affect the Building Errors family when trained on Nantes. Facet Errors, on the other hand, confirms its transferability and stability (less than 5% of discrepancy between the two extremal values). For this reason, we skip the generalization study altogether in this section.

The representativeness study conducted for finesse level 2 results in the F-scores illustrated in Figure 14c. Family detection scores are very stable across all tested split ratios. Moreover, in contrast to atomic error results (cf. Figure 13), F-scores do not vary by more than 1% in mean and standard deviation. This shows that, at finesse level 2, error family prediction is evened out independently of the split ratio, as opposed to higher order errors. Again, it benefits from the higher heterogeneity of the training set with multiple areas.

Detection of Erroneous Models

Now, we work at finesse level 1, first on feature ablation. Since valid samples are very rare in our case, it is expected that it will be very difficult to detect these instances. Consequently, in Table 6, we choose to report the number of correctly detected Valid buildings instead of computing the precision score in percentage.

At this level, even more than at the error family semantic degree, feature configurations have virtually no impact on test results: Elancourt was the only exception when image features are added to geometric ones. Furthermore, we confirm expectations as, at most, only 1 out of 57 (resp. 0 out of 55 and 3 out of 21) valid instances are detected for Elancourt (resp. Nantes and Paris-13). As a consequence, we do not report the rest of the previously conducted experiments for this finesse level. Indeed, it is senseless to compare detection transferability, generalization or representativeness if we hardly detect them at all on the same training scene.
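
To make the scale of this imbalance concrete, the counts of Table 6 can be turned into detection rates. This is a minimal sketch, assuming each "Valid" entry reads as (correctly detected Valid buildings) / (total Valid buildings in the test set). The names `VALID_DETECTED` and `valid_rate` are hypothetical and only serve the illustration.

```python
# (detected, total) Valid buildings per scene, copied from Table 6.
# The counts are identical across the feature configurations reported there.
VALID_DETECTED = {
    "Elancourt": (1, 57),
    "Nantes": (0, 55),
    "Paris-13": (3, 21),
}


def valid_rate(detected, total):
    """Share of Valid buildings that were correctly detected, in percent."""
    if total == 0:
        raise ValueError("no Valid building in the test set")
    return 100.0 * detected / total


rates = {scene: valid_rate(*counts) for scene, counts in VALID_DETECTED.items()}

if __name__ == "__main__":
    for scene, rate in rates.items():
        print(f"{scene}: {rate:.1f}% of Valid buildings detected")
```

Even in the most favourable scene, fewer than one Valid building out of six is retrieved under this reading, which is why raw counts are more informative here than a percentage.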

Conclusion

A learning framework was proposed to semantically evaluate the quality of 3D models of buildings. For that purpose, errors were hierarchically organized into a novel flexible taxonomy. It aims to handle the large diversity of urban environments and varying requirements stemming from end-users

Figure 14. F-score mean and standard deviation for the feature ablation study outcomes per zone for finesse level 2. (a) corresponds to the ablation study, (b) to the transferability experiments, and (c) to the representativeness setting.

Table 5. Feature ablation study on the three datasets for the finesse = 2 case.

                            Geom.           Geom. ∪ Hei.    Geom. ∪ Im.     All.
                            Rec     Prec    Rec     Prec    Rec     Prec    Rec     Prec
Elancourt  Building Errors  99.76   85.96   99.82   85.88   99.88   85.57   100     85.55
Elancourt  Facet Errors     91.79   89.79   92.65   89.40   93.21   89.45   93.46   89.16
Nantes     Building Errors  85.98   67.27   87.59   67.79   85.75   68.32   86.90   69.23
Nantes     Facet Errors     91.20   94.01   91.37   94.36   91.20   94.35   91.73   94.21
Paris-13   Building Errors  97.36   68.76   97.36   68.76   97.36   68.76   97.36   68.76
Paris-13   Facet Errors     99.03   91.26   99.03   91.26   99.03   91.26   99.03   91.26

Table 6. Test results expressed in percentage for the finesse = 1 case.

                      Geom.           Geom. ∪ Hei.    Geom. ∪ Im.     All.
                      Rec     Valid   Rec     Valid   Rec     Valid   Rec     Valid
Elancourt  Erroneous  99.95   1/57    99.95   1/57    99.95   1/57    99.95   1/57
Nantes     Erroneous  99.84   0/55    99.84   0/55    100     0/55    100     0/55
Paris-13   Erroneous  99.77   3/21    99.77   3/21    99.77   3/21    99.77   3/21


December 2019

PHOTOGRAMMETRIC ENGINEERING & REMOTE SENSING
