Multi-Response Linear Regression (MRLR)
The MRLR model is an effective method for the ensemble of heterogeneous base classifiers. The main advantage of MRLR is its interpretability: it provides a method of combining the results generated by the level-0 classifiers into a final decision. The weights generated by MRLR indicate the different contributions that each base classifier makes to class prediction, which can be described as follows. Suppose the training sample set $\Phi = \{(x_i, y_i)\}_{i=1}^{N}$ contains $N$ observations, where $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})^T$ is a $p$-dimensional feature vector, $y_i$ is a class label, and $y_i \in \Gamma = \{w_1, w_2, \ldots, w_m\}$. We use the training set $\Phi$ to train $L$ different classification algorithms and obtain the ensemble $\zeta = \{C_1, C_2, \ldots, C_L\}$ of the $L$ base classifiers. We assume that each base classifier $C_i$ ($i = 1, 2, \ldots, L$) predicts an observed value as a posterior probability distribution vector:
$$P_{C_i}(x) = \big(P_{C_i}(w_1|x),\, P_{C_i}(w_2|x),\, \ldots,\, P_{C_i}(w_m|x)\big)^T = \big(P_i^1(x),\, P_i^2(x),\, \ldots,\, P_i^m(x)\big)^T, \quad i = 1, 2, \ldots, L \tag{1}$$
where $P_i^j(x)$ is the probability that pixel $x$ belongs to the $w_j$ class, obtained by the $i$-th base classifier. We can therefore describe the input data of the meta-classifier as an $m \times L$ matrix $P(x)$:
$$P(x) = \big(P_{C_1}(x),\, P_{C_2}(x),\, \ldots,\, P_{C_L}(x)\big) = \begin{pmatrix} P_1^1(x) & P_2^1(x) & \cdots & P_L^1(x) \\ \vdots & \vdots & & \vdots \\ P_1^m(x) & P_2^m(x) & \cdots & P_L^m(x) \end{pmatrix} \tag{2}$$

where column $i$ holds the posterior vector of classifier $C_i$.
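As a concrete illustration, the level-1 input of Eq. (2) is simply the column-wise stacking of each base classifier's posterior vector. A minimal NumPy sketch (the helper name `stacked_input` and the toy posterior values are illustrative, not from the paper):

```python
import numpy as np

def stacked_input(prob_vectors):
    """Build the m x L meta-classifier input P(x) of Eq. (2) by stacking
    each base classifier's m-dimensional posterior vector as a column."""
    # prob_vectors: list of L arrays, each of shape (m,)
    return np.column_stack(prob_vectors)

# Toy example: m = 3 classes, L = 2 base classifiers.
p_c1 = np.array([0.7, 0.2, 0.1])   # posterior vector from classifier C1
p_c2 = np.array([0.6, 0.3, 0.1])   # posterior vector from classifier C2
P = stacked_input([p_c1, p_c2])
print(P.shape)  # (3, 2): rows index classes w_j, columns index classifiers C_i
```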
MRLR transforms the $m$-class classification problem into $m$ regression problems. For example, for class $w_j$, if a sample has the class label $w_j$, its output value is 1; otherwise, the output value is 0. For each class $w_j$, MRLR takes each base classifier's predicted probability that $x$ belongs to class $w_j$ to establish a linear model, which is defined as:
$$LR_j(x) = \sum_{i=1}^{L} a_i^j P_i^j(x), \quad a_i^j \ge 0, \quad j = 1, 2, \ldots, m \tag{3}$$
and the estimation of the parameters $\{a_i^j\}$ usually utilizes the NNLS algorithm.
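As a sketch of this estimation step, the weights $a_i^j$ of Eq. (3) can be fitted with SciPy's `nnls` routine, which solves $\min_a \|Aa - y\|_2$ subject to $a \ge 0$; the design matrix and 0/1 targets below are simulated purely for illustration:

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
N, L = 200, 3                      # N samples, L base classifiers

# Simulated class-j probabilities from the L base classifiers (columns)
# and the 0/1 regression target of Eq. (3).
true_w = np.array([0.5, 0.3, 0.2])
A = rng.random((N, L))
y = (A @ true_w + 0.05 * rng.standard_normal(N) > 0.5).astype(float)

# Non-negative least squares: min ||A a - y||_2 subject to a >= 0.
a_j, residual = nnls(A, y)
print(a_j)          # estimated non-negative weights a_i^j for class w_j
```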
For each sample,
MRLR
utilizes the predicted values of
the
L
base classifiers to construct the input feature data but
ignores the association with neighboring pixels. In this paper,
considering the spatial information, the weighted average of
the sample’s eight neighbors is also taken into account when
constructing the feature data of the meta-classifier. The specific input data can be represented as:
$$Q(x) = \begin{pmatrix} P_1^1(x) & Q_1^1(x) & \cdots & P_L^1(x) & Q_L^1(x) \\ \vdots & \vdots & & \vdots & \vdots \\ P_1^m(x) & Q_1^m(x) & \cdots & P_L^m(x) & Q_L^m(x) \end{pmatrix} \tag{4}$$

with a column pair $(P_i^j(x), Q_i^j(x))$ for each base classifier $C_i$,
where $Q_i^j(x)$ is the weighted average of the probabilities of the eight neighboring pixels in the $w_j$ class obtained by the $i$-th base classifier.
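The $Q_i^j(x)$ term can be computed as a small sliding-window operation over each class-probability map. A minimal NumPy sketch, assuming equal weights over the eight neighbors and reflect-padding at image borders (both are illustrative choices, not specified by the text):

```python
import numpy as np

def neighbor_average(prob_map, weights=None):
    """Weighted average of the 8-neighbor probabilities for every pixel,
    i.e. the Q_i^j(x) term of Eq. (4). prob_map is an (H, W) probability
    image for one class from one base classifier."""
    if weights is None:
        # Equal weights over the 8 neighbors; the center pixel is excluded.
        weights = np.ones((3, 3)) / 8.0
        weights[1, 1] = 0.0
    padded = np.pad(prob_map, 1, mode="reflect")
    out = np.zeros_like(prob_map, dtype=float)
    for di in range(3):            # shift-and-accumulate convolution
        for dj in range(3):
            out += weights[di, dj] * padded[di:di + prob_map.shape[0],
                                            dj:dj + prob_map.shape[1]]
    return out

prob = np.full((4, 4), 0.5)        # constant toy probability map
Q = neighbor_average(prob)
print(Q[1, 1])                     # 0.5: a constant map stays constant
```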
To estimate the model parameters in MRLR, we propose using the FOA and compare it with the NNLS algorithm (Li and Ngom, 2013). The NNLS algorithm is the most commonly used method for parameter estimation of the MRLR model. The FOA is one of the recently developed swarm optimization algorithms and has global optimization ability (Iscan and Gunduz, 2015). Moreover, FOA is a stable algorithm that solves problems quickly.
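To make the idea concrete, the sketch below shows a heavily simplified, multidimensional FOA-style random search for the non-negative weights of Eq. (3); the swarm size, step scheme, and all parameter names are illustrative assumptions rather than the exact formulation of the cited algorithm:

```python
import numpy as np

def foa_nnls(A, y, n_flies=30, n_iter=200, step=0.1, seed=0):
    """Simplified fruit-fly-style search for non-negative weights:
    flies take random steps around the best-known position (smell phase)
    and the swarm moves to the fly with the lowest residual norm
    (vision phase). Clipping at zero enforces a >= 0."""
    rng = np.random.default_rng(seed)
    L = A.shape[1]
    best = np.abs(rng.random(L))                 # initial swarm position
    best_cost = np.linalg.norm(A @ best - y)
    for _ in range(n_iter):
        # Each fly searches randomly around the current best position.
        flies = np.clip(best + step * rng.standard_normal((n_flies, L)),
                        0, None)
        costs = np.linalg.norm(flies @ A.T - y, axis=1)
        i = np.argmin(costs)
        if costs[i] < best_cost:                 # vision phase: move swarm
            best, best_cost = flies[i], costs[i]
    return best, best_cost

A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([0.4, 0.6, 1.0])
w, cost = foa_nnls(A, y)
print(w, cost)   # weights near [0.4, 0.6], residual near 0
```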
Construction of Multi-Source Feature Dataset
and Automatic Selection of Training Samples
As aforementioned, many studies have demonstrated the effectiveness of combining texture, morphological, and spectral features. The gray level co-occurrence matrix (GLCM) is a conventional method of extracting statistical texture features. In this paper, five second-moment descriptors, i.e., mean, variance, homogeneity, contrast, and dissimilarity, are applied. For the window size, according to the size and distribution of the various features in the image, we choose a 5 × 5 window and the 0° direction to extract the features. The morphological features are also a type of texture feature, called structure texture. Two commonly used morphological operators are opening and closing. The mathematical morphology framework defines a series of operators to emphasize homogeneous spatial structures in a gray-level image. The strategy of opening by reconstruction is to dilate an eroded image in order to recover as much as possible of the eroded image. In contrast, closing by reconstruction erodes a dilated image in order to recover the initial shape of image structures that have been dilated. The opening-and-closing reconstruction integrates the advantages of both operations regarding their capacity to preserve the original shapes of spatial structures. Therefore, these three morphological reconstruction filters are used to construct the input dataset. According to the distribution of features in the images, a circular structuring element with a radius of 5 is chosen.
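As an illustration of the texture step, the five GLCM descriptors can be computed for a single window from the normalized co-occurrence matrix. A pure-NumPy sketch at the 0° direction (the quantization to 8 gray levels and the toy window are illustrative):

```python
import numpy as np

def glcm_features(window, levels=8, dx=1, dy=0):
    """GLCM texture descriptors for one (e.g. 5 x 5) window at the
    0-degree direction (dx=1, dy=0). Returns the five descriptors used
    in the paper: mean, variance, homogeneity, contrast, dissimilarity."""
    g = np.zeros((levels, levels))
    h, w = window.shape
    for r in range(h - dy):                 # count co-occurring pairs
        for c in range(w - dx):
            g[window[r, c], window[r + dy, c + dx]] += 1
    g /= g.sum()                            # normalize to a joint pmf
    i, j = np.indices((levels, levels))
    mean = (i * g).sum()
    var = ((i - mean) ** 2 * g).sum()
    homog = (g / (1.0 + (i - j) ** 2)).sum()
    contrast = ((i - j) ** 2 * g).sum()
    dissim = (np.abs(i - j) * g).sum()
    return mean, var, homog, contrast, dissim

win = np.array([[0, 1, 2, 3, 0],
                [1, 2, 3, 0, 1],
                [2, 3, 0, 1, 2],
                [3, 0, 1, 2, 3],
                [0, 1, 2, 3, 0]])
feats = glcm_features(win)
print(feats)
```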
Despite the advantages of supervised classifiers in classification, they require training samples to be labeled beforehand. Manual selection of training samples can lead to incompleteness of the selected categories and is time-consuming. Therefore, in this paper the training samples are selected by Change Vector Analysis (CVA), an unsupervised change detection method. CVA is very effective in combining different types of change features. The training samples are selected from the change map by using two thresholds, defined as:
$$\begin{aligned} t_1 &= \big[T + k \cdot \delta_c,\; T + (k+1) \cdot \delta_c\big] \\ t_2 &= \big[T - (l+1) \cdot \delta_{nc},\; T - l \cdot \delta_{nc}\big] \end{aligned} \tag{5}$$
where $T$ is determined by the expectation maximization (EM) algorithm, $\delta_c$ and $\delta_{nc}$ are the standard deviations of the changed and unchanged pixels, respectively, and $k$ and $l$ are adjustment coefficients with $k = 1, 2, \ldots, a$ and $l = 1, 2, \ldots, b$. Here, $a = (x_{\max} - T)/\delta_c$ and $b = (T - x_{\min})/\delta_{nc}$, with $x_{\max}$ and $x_{\min}$ being the maximum and minimum values of the CVA change map, respectively.
Pixel-wise Change Detection Based on the
Stacked Generalization Hybrid Ensemble System
As mentioned earlier, ELM, SVM, and KNN are chosen to construct the base classifiers at level-0. The MRLR is utilized as the meta-classifier at level-1. In order to improve computational efficiency and ensure high accuracy, the ELM homogeneous integration algorithm based on the random subspace method (RSM) is adopted to label a large part of the pixels. The remaining unlabeled pixels are then classified by the proposed SG hybrid ensemble system. The specific change detection processes are as follows.
1. Generation of the level-0 base classifier
As described in the previous section, we randomly divide the automatically acquired training samples into three sub-training sets, and then utilize two of them to train ELM, SVM, and KNN to generate the base classifiers at level-0. When training the ELM, the two sub-training sets use the RSM ensemble strategy to classify all the pixels. According to the label determination rules, a large number of pixels are labeled, and the remaining pixels are reclassified by the trained SVM and KNN. The outputs of ELM, SVM, and KNN based on the RSM homogeneous integration and the
PHOTOGRAMMETRIC ENGINEERING & REMOTE SENSING, November 2018