this combination is a promising approach for HSI classification (Wan, Tang, and Li 2015). Dopido, Li, and Plaza (2012) proposed a novel semisupervised active-learning method for urban hyperspectral image classification. First, active learning selects the most informative samples, which yields a large improvement in the classification results. Then the classifier itself estimates the labels of the selected samples, so labeling them incurs no extra cost. To select informative and representative unlabeled data, Di and Crawford (2011) embedded algorithms into multiview active learning. A novel coregularization framework for active learning was presented. The first regularizer exploits the intrinsic multiview information embedded in the hyperspectral data, and the second regularizer is based on the “consistency assumption”. Recently, a method named Collaborative Active and Semisupervised Learning (CASSL) was proposed. This method exploits the unlabeled data by allowing both a human annotator and a classifier to collaboratively label the selected unlabeled data for hyperspectral image classification (Wan, Tang, and Li 2015). However, CASSL stops improving the classification performance as soon as the check classifier and the base classifier always obtain the same prediction on the unlabeled samples. Based on the above analysis, this paper presents an enhanced framework for standard CASSL, named Double-Strategy-Check Collaborative Active and Semisupervised Learning (DSC-CASSL), which aims to improve classification accuracy and robustness simultaneously. DSC-CASSL consists of two parts: an alliance double-verification part in semisupervised learning and an Ensemble Strategy Active Learning (ESAL) part (Cui, Kai, and Zhongjun 2018).
The rest of this paper is organized as follows. Section “Related Work” reviews related work on hyperspectral image classification. Section “Proposed Methodology” describes the proposed framework in detail. Section “Experiments and Analysis” verifies the effectiveness of DSC-CASSL on four data sets. The Conclusion section summarizes this paper.
Related Work
This section briefly introduces some existing work applying AL, SSL, and the combination of the two to hyperspectral image classification.
Active Learning for Hyperspectral Image Classification
AL techniques are suitable for classification problems in which only a small set of labeled data is available. The performance of AL also depends on the method used to select unlabeled data for manual labeling. Many criteria have been proposed and applied to rank unlabeled data according to their structural characteristics and potential information. Much recent research has focused on uncertainty criteria, such as large-margin heuristics, query by committee, and posterior probability. Schohn and Cohn (2000) introduced margin sampling (MS), which selects the data point closest to the current separating hyperplane, that is, the unlabeled sample with the lowest confidence. MS was originally proposed for binary classification problems. Hence, the multiclass-level uncertainty (MCLU) technique has been proposed to deal with multiclass classification problems. In MCLU, the difference between the largest and second-largest distance values to the hyperplanes is used to compute the classification confidence of each unlabeled sample. Thus, the classification confidence is assessed from the two most likely classes to which the test pattern belongs. Because such a heuristic seeks only maximal uncertainty and may select redundant samples, a pixel-diversity criterion is usually imposed in the query process to reduce the redundancy (Demir, Persello, and Bruzzone 2011). MCLU can be described as follows (Demir, Persello, and Bruzzone 2011):
$$r_{1\max} = \arg\max_{i = 1, 2, \ldots, n} \{ f_i(x) \} \tag{1}$$

$$r_{2\max} = \arg\max_{j = 1, 2, \ldots, n,\; j \neq r_{1\max}} \{ f_j(x) \} \tag{2}$$

$$c_{\mathrm{diff}}(x) = f_{r_{1\max}}(x) - f_{r_{2\max}}(x) \tag{3}$$
The $c_{\mathrm{diff}}(x)$ strategy compares the uncertainty between the two most likely categories. If the value of $c_{\mathrm{diff}}(x)$ is high, the sample $x$ is assigned to $r_{1\max}$ with high confidence. Conversely, if this value is small, the decision for $r_{1\max}$ is not reliable, and there is a possible conflict with the class $r_{2\max}$.
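As a rough illustration of the MCLU criterion described above, the following minimal Python sketch computes the confidence of each unlabeled sample as the gap between its two largest decision values and queries the samples with the smallest gap. The function names, the input format (one list of signed hyperplane distances per sample), and the toy values are assumptions made for illustration only; they are not taken from the cited papers, and the diversity step is omitted.

```python
def mclu_confidence(decision_values):
    """Gap between the largest and second-largest decision value of one sample.

    decision_values: list of n >= 2 distances f_i(x), one per class
    (hypothetical input format, e.g. from one-against-all classifiers).
    """
    if len(decision_values) < 2:
        raise ValueError("MCLU needs at least two classes")
    # Class indices ordered by decreasing decision value.
    order = sorted(range(len(decision_values)),
                   key=lambda i: decision_values[i], reverse=True)
    r1, r2 = order[0], order[1]  # the two most likely classes
    return decision_values[r1] - decision_values[r2]


def select_most_uncertain(pool_decision_values, batch_size):
    """Indices of the batch_size samples with the smallest confidence gap."""
    scores = [mclu_confidence(f) for f in pool_decision_values]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    return ranked[:batch_size]


# Toy pool of three unlabeled samples and three classes (made-up values).
pool = [
    [2.1, -1.0, -0.8],  # clearly class 0
    [0.3, 0.2, -1.5],   # classes 0 and 1 nearly tied
    [-0.9, 1.4, 0.1],   # fairly clearly class 1
]
queried = select_most_uncertain(pool, batch_size=1)
print(queried)  # the nearly tied sample, index 1, is queried first
```

In practice the pool would hold the decision values of all unlabeled pixels, and a diversity criterion would then be applied to the most uncertain candidates before they are sent for manual labeling.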
Entropy maximization naturally yields a multiclass heuristic, such as the Entropy Query-by-Bagging (EQB) algorithm applied by Demir, Persello, and Bruzzone (2011). This algorithm is independent of the classifier used. EQB has two main problems. First, EQB tends to oversample locally complex areas, that is, areas where the committee is uncertain among a large number of classes. Sampling is therefore constantly drawn toward areas of uncertainty among several classes, which creates a risk of redundant sampling. Concentrating the sampling in the most uncertain areas may also increase the risk of ignoring relevant samples lying in areas of uncertainty between a small number of classes. Copa, Tuia, and Volpi (2010) proposed the normalized entropy query-by-bagging (nEQB) algorithm to solve this problem. Its measure works as a weighted index that is not affected by the number of classes predicted for the candidate. nEQB avoids oversampling and favors diversity.
The procedure starts by creating $k$ training sets, each drawn from the original data by selection with replacement. Each set is used to train a classifier, which then predicts the class label of every data point in the set of candidates $U$. The process provides $k$ possible labelings for each sample, which can be regarded as a classification frequency and thus as the probability that the candidate $x_i$ is labeled as class $\omega$. This probability is used to assess the uncertainty of the class membership of pixel $x_i$ as:
$$B = \arg\max_{x_i \in U} \frac{H(x_i)}{\log(N_i)} \tag{4}$$

$$H(x_i) = -\sum_{\omega = 1}^{N_i} p(y_i^* = \omega \mid x_i) \log p(y_i^* = \omega \mid x_i) \tag{5}$$

$$p(y_i^* = \omega \mid x_i) = \frac{\sum_{m=1}^{k} \delta(y_{i,m}^*, \omega)}{\sum_{m=1}^{k} \sum_{j=1}^{N_i} \delta(y_{i,m}^*, \omega_j)} \tag{6}$$
$H(x_i)$ is an empirical measure of entropy, $y_i^*$ is the class label predicted for the $i$th pixel, and $p(y_i^* = \omega \mid x_i)$ is the observed probability that class $\omega$ is predicted for the candidate. $N_i$ is the number of classes predicted for $x_i$ by the committee. Note that $1 \le N_i \le N$, where $N$ is the total number of classes. $y_{i,m}^*$ is the class label predicted for pixel $x_i$ by the $m$th classifier.
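The normalized-entropy selection can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of the cited authors: the function names are hypothetical, the committee votes are given directly rather than produced by trained classifiers, and a candidate on which the committee is unanimous ($N_i = 1$, so $\log(N_i) = 0$) is assigned a score of zero, a convention the text does not specify.

```python
import math
import random
from collections import Counter


def bootstrap_sets(labeled, k, seed=0):
    """Draw k training sets from `labeled` by sampling with replacement."""
    rng = random.Random(seed)
    return [rng.choices(labeled, k=len(labeled)) for _ in range(k)]


def neqb_score(votes):
    """Committee entropy of one candidate divided by log(N_i).

    votes: the k class labels predicted for the candidate by the k
    committee members (hypothetical input format).
    """
    counts = Counter(votes)
    n_i = len(counts)  # N_i: number of distinct classes predicted
    if n_i == 1:
        return 0.0     # assumption: a unanimous committee scores zero
    k = len(votes)
    # Observed probability of each predicted class is its vote share.
    entropy = -sum((c / k) * math.log(c / k) for c in counts.values())
    return entropy / math.log(n_i)


def select_candidate(committee_votes):
    """Index of the candidate with the largest normalized entropy."""
    scores = [neqb_score(v) for v in committee_votes]
    return max(range(len(scores)), key=lambda i: scores[i])


# Made-up votes of a committee of k = 4 classifiers on three candidates.
votes = [
    ["a", "a", "a", "a"],  # unanimous
    ["a", "a", "b", "b"],  # even split between two classes
    ["a", "a", "b", "c"],  # uneven split among three classes
]
print(select_candidate(votes))  # 1: the even two-class split ranks first
```

With unnormalized entropy the third candidate would rank first, because it involves more classes; dividing by $\log(N_i)$ ranks the evenly split two-class candidate higher, which illustrates how the normalization removes the bias toward areas where many classes are predicted.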
Semisupervised Learning for Hyperspectral Image Classification
Unlike active learning, SSL combines supervised and unsupervised methods in its learning process: it uses both labeled and unlabeled data. In Alhichri et al. (2015), a
842
November 2019
PHOTOGRAMMETRIC ENGINEERING & REMOTE SENSING