[Figure: transformation of the feature vector $x$ into the feature matrix $A$, illustrated with a 12-element vector.]
After this transformation, the between-class and within-class scatter matrices are calculated using the matrix form of the training samples. Then, the eigenvector corresponding to the largest eigenvalue of $S_w^{-1}S_b$ can be selected as the projection vector ($p$). By transforming the original feature vector of each pixel of the image into a feature matrix ($x_{d\times 1} \rightarrow A_{m\times n}$), the $m$-dimensional projected vector ($y$), which is the extracted feature vector of matrix $A$, is obtained by:

$$y_{m\times 1} = A_{m\times n}\, p_{n\times 1}. \qquad (4)$$
We multiply the matrix form of each data sample ($A$) by $p$ to extract an $m$-dimensional feature vector ($y$) from that sample. The scatter matrices in 2DLDA can be calculated as follows:
$$S_b = \sum_{i=1}^{n_c} n_{ti}\,(\bar{A}_i - \bar{A})^T(\bar{A}_i - \bar{A}) \qquad (5)$$
$$S_w = \sum_{i=1}^{n_c}\sum_{j=1}^{n_{ti}} (A_{ji} - \bar{A}_i)^T(A_{ji} - \bar{A}_i) \qquad (6)$$
In the above equations, $A_{ji}$ is the $j$th sample of class $i$, $\bar{A}_i$ is the mean of class $i$, and $\bar{A}$ denotes the mean of the entire set of training samples. Here, $n_c$ is the number of classes and $n_{ti}$ is the number of training samples in class $i$. Note that $A_{ji}$, $\bar{A}_i$, and $\bar{A}$ are $m\times n$ matrices. The projection vector for feature extraction using 2DLDA is obtained by maximizing the Fisher criterion as follows:
$$p = \arg\max_{p} \frac{p^T S_b\, p}{p^T S_w\, p}. \qquad (7)$$
The above optimization problem is solved to obtain the optimal $p$. We solve the following generalized eigenvalue problem to obtain $p$:

$$S_b\, p = \lambda\, S_w\, p \qquad (8)$$
where $\lambda$ is the maximal eigenvalue of $S_w^{-1}S_b$ and $p$ is the eigenvector associated with $\lambda$.
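The complete extraction pipeline is straightforward to prototype. Below is a minimal NumPy/SciPy sketch of Equations 5 through 8, assuming the training samples have already been reshaped into $m\times n$ matrices; the function names, the array layout, and the use of scipy.linalg.eigh for the generalized eigenvalue problem are illustrative choices rather than part of the original method.

```python
import numpy as np
from scipy.linalg import eigh

def scatter_matrices(A, labels):
    """Between-class (Eq. 5) and within-class (Eq. 6) scatter matrices.

    A      : array of shape (N, m, n), one m x n feature matrix per sample
    labels : array of shape (N,) with integer class labels
    """
    A_bar = A.mean(axis=0)                 # overall mean matrix (m x n)
    n = A.shape[2]
    S_b = np.zeros((n, n))
    S_w = np.zeros((n, n))
    for c in np.unique(labels):
        A_c = A[labels == c]               # all samples of class c
        A_c_bar = A_c.mean(axis=0)         # class mean matrix
        D = A_c_bar - A_bar
        S_b += len(A_c) * D.T @ D          # Eq. (5)
        for A_j in A_c:
            E = A_j - A_c_bar
            S_w += E.T @ E                 # Eq. (6)
    return S_b, S_w

def projection_vector(S_b, S_w):
    """Eigenvector of S_b p = lambda S_w p (Eq. 8) for the largest lambda.
    Assumes S_w is nonsingular, as the paper argues is usually the case."""
    eigvals, eigvecs = eigh(S_b, S_w)      # eigenvalues in ascending order
    return eigvecs[:, -1]
```

A sample's extracted feature vector then follows Equation 4 as y = A_sample @ p, which yields an $m$-dimensional vector.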
Note that in traditional LDA (1DLDA), the scatter matrices are $d\times d$, while in 2DLDA, $S_b$ and $S_w$ are $n\times n$ matrices with $n < d$. In addition, in 1DLDA we have a projection matrix ($W_{d\times m}$) for feature extraction, while in 2DLDA we have a projection vector ($p_{n\times 1}$). The use of the 2DLDA approach for feature extraction of hyperspectral data has two main advantages:
1. For high-dimensional matrices, inversion is a sensitive operation that can be performed reliably only if the estimate of the matrix is accurate. In LDA, it is difficult to obtain a precise estimate of $S_w$ using a limited number of training samples; thus, $S_w$ will be nearly singular, which causes overfitting in the LDA method. In 2DLDA, by transforming the feature vector of each data sample into a feature matrix, we cope with the SSS problem, and the within-class scatter matrix is usually nonsingular. Li and Yuan (2005) show that in 2DLDA, the within-class scatter matrix $S_w$ is nonsingular when $N \geq n_c + \frac{n}{\min(m, n)}$ ($m$ and $n$ are the number of rows and the number of columns of $A_{m\times n}$, respectively, and $N$ is the total number of training samples; a quick numeric check of this condition is sketched after this list). Obviously, this inequality is usually satisfied and, therefore, the SSS problem does not exist in 2DLDA.
2. LDA can extract a maximum of $n_c - 1$ features, while 2DLDA can extract any number of features without limitation. In 2DLDA, the rank of $S_b$ is not limited by the number of classes. Moreover, regardless of the rank of $S_b$, only one eigenvector, the one associated with the largest eigenvalue of $S_w^{-1}S_b$, can be considered as the projection vector ($p$). However, we show later that using all eigenvectors to calculate $p$ improves the classification accuracy.
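As a quick illustration of the first advantage, the following sketch checks the Li and Yuan (2005) nonsingularity condition numerically; the class count and matrix dimensions used here are hypothetical and only demonstrate how mild the requirement is.

```python
def sw_nonsingular(N, n_c, m, n):
    """Condition under which the 2DLDA within-class scatter matrix S_w
    is nonsingular: N >= n_c + n / min(m, n) (Li and Yuan, 2005)."""
    return N >= n_c + n / min(m, n)

# Hypothetical example: 16 classes, 6 x 34 feature matrices,
# only 10 training samples per class.
print(sw_nonsingular(N=160, n_c=16, m=6, n=34))  # True: 160 >= 16 + 34/6
```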
We may face two problems when we use the 2DLDA method for feature extraction of hyperspectral images:
1. Because the number of spectral bands ($d$) must be written as a product of two integers ($d = m\times n$), $d$ must be a composite number. In other words, we must be able to transform the feature vector ($x_{d\times 1}$) of each pixel of the image into a feature matrix ($A_{m\times n}$). Thus, if $d$ is a prime number, the use of 2DLDA is not possible.
2. The number of extracted features is equal to the number of rows ($m$) of $A$, because $y_{m\times 1} = A_{m\times n}\, p_{n\times 1}$. Thus, if $d$ is not divisible by $m$, extraction of $m$ features is not possible.
Now, we present the solutions to the aforementioned problems:
1. If the number of spectral bands ($d$) is a prime number, we add $\varepsilon$ new features to the $d$ original features so that $d + \varepsilon$ becomes a composite number and can therefore be written as a product of two integers.
2. For extraction of $m$ features, if $d$ is not divisible by $m$, we add $\varepsilon$ to $d$ in such a way that $d + \varepsilon$ becomes divisible by $m$.
In general, for extraction of $m$ features, we add as small a value of $\varepsilon$ as possible to $d$ so that $d + \varepsilon$ becomes composite and divisible by $m$ ($\varepsilon$ is a nonnegative integer). In effect, by adding $\varepsilon$ to $d$, we add $\varepsilon$ new features to the $d\times 1$ original feature vector ($x_{d\times 1} \rightarrow x_{(d+\varepsilon)\times 1}$), where $d + \varepsilon = m\times n$. We use the central moments of order two or higher ($k \geq 2$) as the added features. We choose the central moments because they are simple and fast to compute and also efficient from a classification-accuracy point of view. The $k$th central moment is defined as:

$$\mu_k = E[(x - m_x)^k], \qquad k = 2, 3, \ldots \qquad (9)$$
where $m_x$ is the mean of the feature vector ($x_{d\times 1}$). For a better understanding of the proposed process, consider the following example. Suppose that the number of spectral bands (features) in a hyperspectral image is $d = 200$. For extraction of $m = 6$ features from the data, we first add $\varepsilon = 4$ new features to the original feature vector of each pixel of the image so that $d + \varepsilon = 204$ becomes divisible by 6. Therefore, we add the central moments of orders 2, 3, 4, and 5 to each feature vector. The feature vector of each sample, which becomes a $204\times 1$ vector, is then transformed into a $6\times 34$ matrix, and the scatter matrices are estimated from the transformed training samples.
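A compact way to carry out this padding-and-reshaping step is sketched below, assuming one NumPy vector per pixel and a row-wise reshape (the paper does not specify the fill order, and the function name pad_and_reshape is illustrative). Note that once $d + \varepsilon$ is divisible by $m$ with $m \geq 2$ and $n \geq 2$, it is automatically composite, so a single divisibility check suffices.

```python
import numpy as np

def pad_and_reshape(x, m):
    """Pad a d-dimensional spectral vector with central moments (Eq. 9)
    so that d + eps is divisible by m, then reshape it into an m x n matrix."""
    d = x.size
    eps = (-d) % m                                   # smallest eps >= 0 making d + eps divisible by m
    moments = [np.mean((x - x.mean()) ** k)          # central moments of orders 2, ..., eps + 1
               for k in range(2, 2 + eps)]
    x_padded = np.concatenate([x, np.array(moments)])
    return x_padded.reshape(m, (d + eps) // m)       # row-wise reshape into m x n

# The worked example from the text: d = 200 bands, m = 6 features.
x = np.random.rand(200)                              # a hypothetical pixel spectrum
A = pad_and_reshape(x, m=6)
print(A.shape)                                       # (6, 34), i.e. eps = 4 moments were appended
```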
$S_w^{-1}S_b$ is an $n\times n$ matrix with $n$ eigenvalues. We can use just the one eigenvector associated with the largest eigenvalue of $S_w^{-1}S_b$ as the projection vector $p$. However, it is better to exploit all of the eigenvectors when calculating $p$, as this improves the classification accuracy. In other words, each eigenvector can contribute to the calculation of $p$ in proportion to the magnitude of its eigenvalue. Thus, the projection vector is calculated in a weighted manner as follows: