To verify the effectiveness of the evaluation and to compare
the performance of the heterogeneity measures VCAR and MI,
classification-based evaluations were also carried out. The
change curves of overall classification accuracy (OA) are also
illustrated in Figure 12. The correlation results show that the
correlation between OA and NMI is higher than that between OA
and NVCAR, which indicates that the NMI index is more effective
for measuring the heterogeneity between the segmented objects.
However, Ming et al. (2017) also pointed out that the
intersegment heterogeneity based on NMI is computed using the
global average gray value, whereas the theoretical intersegment
heterogeneity concerns local adjacent image objects, so the
validity of using MI to express the intersegment heterogeneity
should be further discussed.
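The concern about the global average gray value can be made concrete with a minimal sketch. It assumes that MI denotes Moran's I, which this section does not spell out, and that segments are summarized by their mean gray values with a binary adjacency matrix; the function name `morans_i` and both inputs are illustrative, not taken from the paper.

```python
import numpy as np

def morans_i(values, adjacency):
    """Global Moran's I over per-segment mean gray values.

    values: 1-D sequence, mean gray value of each segment (assumed input).
    adjacency: (n, n) symmetric 0/1 matrix, 1 where two segments share a border.
    """
    y = np.asarray(values, dtype=float)
    w = np.asarray(adjacency, dtype=float)
    n = y.size
    # Deviations are taken from the GLOBAL mean of all segments,
    # even though only adjacent pairs contribute through w.
    dev = y - y.mean()
    numerator = n * np.sum(w * np.outer(dev, dev))
    denominator = w.sum() * np.sum(dev ** 2)
    return numerator / denominator
```

Under this formulation, lower values mean that neighbouring segments differ more, which is usually read as higher intersegment heterogeneity. The only local information enters through the adjacency weights, while the reference level for every segment is the image-wide mean.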
Conclusions
Evaluating the segmentation results of high spatial resolution
remote sensing imagery is one of the difficulties in the field
of GeOBIA, and it is also an essential technology in the
automatic segmentation process. This paper systematically and
experimentally summarizes the commonly used segmentation
evaluation methods. Currently, the most commonly used
segmentation evaluation method is still the subjective
evaluation method. The indirect evaluation method and the
analytical evaluation method are commonly used as auxiliary
evaluation methods. However, these three evaluation methods
cannot provide quantitative, objective, and comprehensive
evaluation indexes. They are also difficult to apply in an
automatic segmentation system for high spatial resolution
remote sensing images. As research has deepened, supervised
and unsupervised evaluation methods are gradually replacing
the subjective evaluation method, because they provide
objective and quantitative evaluation results.
With a precise, manually established reference segmentation
dataset, the supervised evaluation method gives the most
reliable evaluation result. However, building a reference
dataset for whole remote sensing images, especially those with
high spatial resolution, is tedious and time-consuming. The
supervised evaluation method is therefore more suitable for
evaluating single-scale segmentation results in typical object
recognition applications, which require less reference data.
In addition, because this method depends heavily on the
reference data, it cannot be used in an automatic segmentation
system.
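To illustrate what a supervised discrepancy index looks like in practice, the following is a minimal sketch of a simple area-overlap measure for one reference object. It is chosen here for illustration and is not the index used in the paper; the function name `area_discrepancy`, the largest-overlap matching rule, and the raster inputs are all assumptions.

```python
import numpy as np

def area_discrepancy(reference_mask, segment_labels):
    """Over- and under-segmentation of one reference object.

    reference_mask: 2-D boolean array, True inside the reference object
                    (assumed to be non-empty).
    segment_labels: 2-D integer array of segment ids, same shape.
    The reference object is matched to the segment that overlaps it most.
    """
    ref = np.asarray(reference_mask, dtype=bool)
    seg = np.asarray(segment_labels)
    # Count how many reference pixels fall into each segment.
    ids, overlap = np.unique(seg[ref], return_counts=True)
    best = ids[np.argmax(overlap)]
    inter = overlap.max()
    # Share of the reference object not covered by the matched segment.
    over = 1.0 - inter / ref.sum()
    # Share of the matched segment lying outside the reference object.
    under = 1.0 - inter / np.sum(seg == best)
    return int(best), float(over), float(under)
```

Both values are 0 for a perfect match and approach 1 as the discrepancy grows. The sketch also shows why the approach depends on reference data: nothing can be computed without `reference_mask`.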
The greatest advantage of the unsupervised evaluation method
is that it needs no human intervention to build a reference
dataset manually, which reduces the subjectivity of the
evaluation to a certain extent. The ultimate goal of
segmentation evaluation of remote sensing images is an
automatic quality check of the segmentation result. However,
when evaluating single-scale segmentation results for typical
object recognition, manual intervention is still required to
determine the segmentation and evaluation scope of the typical
objects, which reduces the degree of automation. In that
setting, the advantages of the unsupervised evaluation method
cannot be exploited. Hence, the unsupervised evaluation method
is more suitable for evaluating GeOBIA multi-scale
segmentation results to select the optimal scale parameters,
and it is the most appropriate evaluation method for the
automatic segmentation process in high spatial resolution
remote sensing applications.
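Selecting a scale parameter from unsupervised goodness indexes can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the paper: both indexes are assumed to be oriented so that larger is better, min-max normalization and a linear weighting are hypothetical design choices, and the name `select_scale` and the `weight` parameter are invented for the example.

```python
import numpy as np

def select_scale(scales, homogeneity, heterogeneity, weight=0.5):
    """Pick the scale with the highest combined goodness score.

    scales: candidate scale parameters, one per segmentation result.
    homogeneity, heterogeneity: one index value per scale, larger = better.
    weight: share given to homogeneity (hypothetical; equal weights by default).
    """
    def minmax(x):
        # Rescale to [0, 1] so the two indexes are comparable.
        x = np.asarray(x, dtype=float)
        span = x.max() - x.min()
        return np.zeros_like(x) if span == 0 else (x - x.min()) / span

    score = weight * minmax(homogeneity) + (1 - weight) * minmax(heterogeneity)
    return scales[int(np.argmax(score))], score
```

Because no reference data appears anywhere in the inputs, a loop over candidate scales can run without manual intervention, which is the property that makes the unsupervised approach attractive for automatic segmentation.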
At present, both the supervised and the unsupervised
evaluation methods still have many problems and deficiencies,
so further research on these two evaluation methods is
important.
In the future, research on segmentation evaluation with high
spatial resolution images can focus on the following aspects:
1. For the supervised evaluation method, finding a universal
object matching method and a discrepancy index.
2. The goodness indexes for the unsupervised evaluation
method are not comprehensive enough. Higher-level
information, such as prior knowledge and semantic
information, could be used to establish the goodness indexes.
3. Establishing a goodness index selection system for the
unsupervised evaluation method, determining how to select the
goodness indexes, and assigning a weight to each index
automatically. The experimental results show that it is
difficult to obtain both high intrasegment homogeneity and
high intersegment heterogeneity at the same time, which raises
several questions. First, of the two measures, intrasegment
homogeneity and intersegment heterogeneity, which one is more
important to the final classification? Second, what is the
relationship between the two measures? Is there a coupling
relationship among intrasegment homogeneity, intersegment
heterogeneity, segmentation accuracy, and classification
accuracy? Third, is MI adequate for expressing the
intersegment heterogeneity?
4. Combining the supervised evaluation method and the
unsupervised evaluation method to evaluate the segmentation
result, drawing on the advantages of each to establish a
comprehensive evaluation index and segmentation evaluation
system.
5. With the rapid development of artificial intelligence
technology, research may need to consider combining artificial
intelligence technology with the segmentation evaluation
method to propose innovative supervised discrepancy indexes
and unsupervised goodness indexes.
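One way to start probing the first question under item 3 is to correlate each goodness index with the overall classification accuracy (OA) across segmentation scales. The sketch below assumes a Pearson correlation, which the paper does not specify, and the function name `correlation_with_oa` is illustrative.

```python
import numpy as np

def correlation_with_oa(oa, index_values):
    """Pearson correlation between OA and one goodness index across scales.

    oa: overall classification accuracy, one value per scale.
    index_values: the goodness index at the same scales, in the same order.
    """
    oa = np.asarray(oa, dtype=float)
    idx = np.asarray(index_values, dtype=float)
    # np.corrcoef returns the 2x2 correlation matrix; take the off-diagonal term.
    return float(np.corrcoef(oa, idx)[0, 1])
```

Calling the function once with the homogeneity index and once with the heterogeneity index, on the same series of scales, gives two coefficients that can be compared directly. A correlation of this kind describes association only and does not by itself establish which measure drives classification accuracy.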
Figure 12. Unsupervised segmentation evaluation results: (a)–(b) Image-A,
(c)–(d) Image-B.
PHOTOGRAMMETRIC ENGINEERING & REMOTE SENSING
October 2018