often use a simple or advanced search algorithm for finding the best possible subset of spectral bands (Ladha and Deepa, 2011; Yin et al., 2012; Xie et al., 2013). Feature extraction methods usually use a projection matrix to transform the feature space of the data (Imani and Ghassemian, 2014 and 2015; Huang and Kuo, 2010; Kamandar and Ghassemian, 2013). Feature extraction can be done in one of the following ways: supervised (Zhang et al., 2013; Chang et al., 2014; Wen et al., 2013), unsupervised (Yin et al., 2012; Yin et al., 2013; Ghassemian and Landgrebe, 1988), or semi-supervised (Liao et al., 2013). Our focus in this paper is on supervised feature extraction methods.
The best known supervised feature extraction method is linear discriminant analysis (LDA) (Fukunaga, 1990). LDA maximizes the between-class scatter while minimizing the within-class scatter; thus, the separability between different classes is expected to be enhanced. But LDA has some difficulties. Because of singularity in the within-class scatter matrix, LDA fails to work in the small sample size (SSS) situation. Moreover, the number of features extracted by LDA is limited by the number of classes: LDA can extract a maximum of $n_c - 1$ features, where $n_c$ is the number of classes.
Generalized discriminant analysis (GDA) uses the kernel trick to improve the efficiency of LDA (Baudat and Anouar, 2000). Nonparametric weighted feature extraction (NWFE) uses weighted means to define new nonparametric scatter matrices that provide higher classification accuracy (Kuo and Landgrebe, 2004). The number of features extracted by NWFE can exceed $n_c - 1$.
Feature extraction is used in many pattern recognition problems. One important application of feature extraction methods such as LDA is face recognition. In the face recognition problem, each data sample is an image matrix, while LDA works on vectors. Thus, to use one dimensional LDA (1DLDA), or LDA for simplicity, we first need to transform the 2D face image matrices into 1D image vectors, row by row or column by column. For example, an $m \times n$ image matrix is transformed into an $mn \times 1$ vector. As a result, the dimension of the vector becomes high in comparison with the number of training samples. In other words, we have high dimensional data and the SSS problem. Because of the singularity of the within-class scatter matrix due to the SSS problem, LDA usually fails to work in face recognition applications. So far, a variety of approaches has been proposed for solving this problem. An interesting technique is two dimensional LDA (2DLDA) (Li and Yuan, 2005; Noushath et al., 2006). In the 2DLDA method, for face recognition applications, the data samples remain in matrix form. In other words, the data do not lie in a high dimensional space, and thus the within-class scatter matrix usually becomes nonsingular (Yang and Dai, 2009).
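A minimal NumPy sketch of this vectorization step and the resulting dimension blow-up (the image size below is illustrative only, not from any dataset used in the paper):

```python
import numpy as np

# Illustrative image size only; any m x n face image gives an mn x 1 vector.
m, n = 112, 92
image = np.random.rand(m, n)    # stand-in for a 2D face image matrix

x = image.reshape(m * n, 1)     # row-by-row vectorization for 1DLDA
print(x.shape)                  # (10304, 1): far more features than
                                # the typical number of training faces
```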
It is important to note that all the 2DLDA methods proposed so far are for feature extraction in face recognition applications. In this paper, we propose to use 2DLDA for feature extraction of hyperspectral data in the SSS situation. There is a basic difference between face image data and hyperspectral image data: the samples of face data are two dimensional images in matrix form, while the samples of hyperspectral data are pixels in vector form (each sample in a face database is an image, whereas a sample in a hyperspectral dataset is a pixel, not an image). In the proposed 2DLDA approach, the feature vector of each pixel of the hyperspectral image is first transformed into a feature matrix, and then the scatter matrices are estimated from the transformed training samples. Because the 2DLDA methods proposed in the literature are for face databases, whose samples are originally two dimensional images in matrix form, they are not comparable with our proposed method, which is for hyperspectral datasets whose samples are originally pixels in vector form.
We deal with the SSS problem for feature extraction and classification of hyperspectral data. Due to limited training samples, the within-class scatter matrix becomes singular, and thus LDA fails to work for hyperspectral images. To cope with the high dimensional data and the SSS problem, we propose to transform the feature vector of each pixel of the image into a feature matrix. As a result, the produced feature matrices lie in a lower dimensional space, and the SSS problem is therefore alleviated. In other words, the $d \times 1$ feature vector of each pixel of the hyperspectral image is transformed into an $m \times n$ feature matrix, where $d = m \times n$ and $m, n < d$. After transforming the original feature vectors into feature matrices, the within-class and between-class scatter matrices are estimated for feature extraction from the obtained feature matrices.
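A minimal sketch of this vector-to-matrix transformation in NumPy (the helper name and band count are ours; a row-by-row filling order is assumed):

```python
import numpy as np

def vector_to_matrix(x, m, n):
    """Transform a d x 1 pixel feature vector into an m x n feature matrix.

    Assumes d = m * n; the entries are placed row by row (one natural
    choice -- a column-wise ordering would be handled identically).
    """
    assert x.size == m * n, "d must factor as m * n"
    return x.reshape(m, n)

# Example: a pixel with d = 200 spectral bands becomes a 10 x 20 matrix
# (the band count and factorization are illustrative).
pixel = np.random.rand(200, 1)
A = vector_to_matrix(pixel, 10, 20)
print(A.shape)   # (10, 20)
```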
The rest of this discussion is organized as follows: 1DLDA is reviewed in the next section. Then, the proposed 2DLDA method is introduced. After that, the classification results of the proposed method on four real hyperspectral datasets are discussed in comparison with some popular feature extraction methods. Finally, conclusions are presented.
One Dimensional Linear Discriminant Analysis (1DLDA)
Consider a dataset with $N$ training samples $\{x_i\}_{i=1}^{N}$ in $R^d$, where $d$ is the number of spectral bands (features). Let $n_{ti}$ be the number of training samples of the $i$th class and $\sum_{i=1}^{n_c} n_{ti} = N$, where $n_c$ is the number of classes.
1DLDA (for simplicity, LDA) seeks projection directions on which the ratio of the between-class scatter to the within-class scatter is maximized. The between-class scatter matrix ($S_b$) and within-class scatter matrix ($S_w$) in LDA are defined as:
$$S_b = \sum_{i=1}^{n_c} n_{ti}\,(m_i - m)(m_i - m)^T \qquad (1)$$

$$S_w = \sum_{i=1}^{n_c} \sum_{j=1}^{n_{ti}} (x_{ji} - m_i)(x_{ji} - m_i)^T \qquad (2)$$
where $m_i$ is the mean of the $i$th class, $m$ is the mean of the entire set of training samples, and $x_{ji}$ is the $j$th training sample in the $i$th class.
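As a sketch, Equations 1 and 2 can be estimated from labeled training data as follows (the variable names are ours; samples are assumed to be stacked as rows):

```python
import numpy as np

def scatter_matrices(X, y):
    """Estimate S_b and S_w of Equations 1 and 2.

    X : (N, d) array, one training sample x_ji per row.
    y : (N,)  array of integer class labels.
    """
    d = X.shape[1]
    m = X.mean(axis=0)                   # mean of the entire training set
    S_b = np.zeros((d, d))
    S_w = np.zeros((d, d))
    for c in np.unique(y):
        X_c = X[y == c]                  # training samples of class c
        n_ti = X_c.shape[0]              # number of samples in class c
        m_i = X_c.mean(axis=0)           # class mean m_i
        diff = (m_i - m)[:, None]
        S_b += n_ti * (diff @ diff.T)    # Equation 1
        centered = X_c - m_i             # x_ji - m_i, row-wise
        S_w += centered.T @ centered     # Equation 2
    return S_b, S_w
```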
LDA maps each original data sample $x_i \in R^d$ to $y_i \in R^m$ with a linear transformation, $y_i = W^T x_i$, where $W_{d \times m}$ is the projection matrix and $m < d$. $w$ is optimized as follows, where $w$ denotes one of the vectors in the projection matrix $W = (w_1, w_2, \ldots, w_m)$:
$$w = \arg\max_{w} \frac{w^T S_b w}{w^T S_w w}. \qquad (3)$$
The optimal linear projection matrix $W$, which projects the samples into an $m$-dimensional feature space, is composed of the eigenvectors of $S_w^{-1} S_b$ corresponding to its first $m$ largest eigenvalues.
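Continuing the sketch above, $W$ can be obtained from the generalized eigenproblem $S_b w = \lambda S_w w$, which is equivalent to the eigendecomposition of $S_w^{-1} S_b$ when $S_w$ is nonsingular; when $S_w$ is singular (the SSS situation), this step fails, which is exactly the difficulty described above:

```python
import numpy as np
from scipy.linalg import eigh

def lda_projection(S_b, S_w, m_dims):
    """Return the d x m projection matrix W maximizing Equation 3.

    Solves the generalized eigenproblem S_b w = lambda * S_w w; if S_w
    is singular (the SSS situation), eigh raises an error.
    """
    eigvals, eigvecs = eigh(S_b, S_w)    # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]    # reorder: largest first
    return eigvecs[:, order[:m_dims]]    # eigenvectors of the m largest

# Usage with the scatter matrices from the previous sketch:
# W = lda_projection(S_b, S_w, m_dims=5)   # hypothetical m = 5
# Y = X @ W                                # y_i = W^T x_i for each sample
```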
2DLDA for Feature Extraction of Hyperspectral Images
Each sample of hyperspectral data is a $d \times 1$ feature vector. In the proposed method, the first step is to transform the feature vector of each pixel of the image into a feature matrix. In other words, each $d \times 1$ feature vector ($x_{d \times 1}$) is transformed into an $m \times n$ feature matrix ($A_{m \times n}$), where $d = m \times n$ and $m, n < d$. We do this transformation for all data samples. In other words, $(x_{ji})_{d \times 1}$ is transformed to $(A_{ji})_{m \times n}$ ($j = 1, \ldots, n_{ti}$; $i = 1, \ldots, n_c$). For example, if $d = 12$ and $m = 3$, we will have:
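With $m = 3$ the remaining dimension is $n = d/m = 4$; assuming the entries fill the matrix row by row (the ordering is our assumption), the transformation reads:

$$x = (x_1, x_2, \ldots, x_{12})^T \;\longrightarrow\; A = \begin{pmatrix} x_1 & x_2 & x_3 & x_4 \\ x_5 & x_6 & x_7 & x_8 \\ x_9 & x_{10} & x_{11} & x_{12} \end{pmatrix}_{3 \times 4}.$$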