PE&RS October 2015 - page 785

The experimental results show the better performance of 2DLDA compared with other feature extraction methods in the SSS situation. Each sample (pixel) of a hyperspectral image is naturally a vector, not a matrix. To shorten the length of each feature vector, we transform it into a feature matrix, and so we manipulate the natural arrangement of the spectral features. However, the use of 2DLDA alleviates the SSS problem and is an efficient method when the number of available training samples is limited. When enough training samples are available, it seems that LDA is preferable to 2DLDA. For better understanding, consider the experimental results: the classification accuracy of 2DLDA, averaged over 1 to 9 extracted features, is 31.55 percent higher than that of LDA for the Indian dataset using the SVM classifier with 16 training samples. Under the same conditions with 32 training samples, the improvement of 2DLDA over LDA was only 1.67 percent. For the Pavia dataset, using the SVM classifier and 16 training samples, averaged over 1 to 8 extracted features, the accuracy obtained by 2DLDA is 22.75 percent higher than that of LDA, while under the same conditions with 32 training samples, 2DLDA is 6.87 percent better than LDA. We can see that as the number of training samples increases, the difference between the 2DLDA and LDA methods becomes smaller. When the size of the training set is large, it is possible that LDA works better than 2DLDA. In this regard, we can state three reasons:
1. The low accuracy of LDA for small training sample sizes is caused by the singularity of the within-class scatter matrix. When a large number of training samples is available, not only does the within-class scatter matrix become nonsingular, but both the within-class and between-class scatter matrices are estimated with high accuracy. Thus, LDA performs well with a large number of training samples.
2. By transforming the feature vector of each pixel of the hyperspectral image into a feature matrix in the 2DLDA method, we manipulate the natural arrangement of the spectral features. This manipulation is necessary to shorten the length of the feature vector and to cope with the singularity of the within-class scatter matrix in the SSS situation. But when enough training samples are available, this manipulation of the nature of the data may degrade the classification accuracy.
3. In the 2DLDA method, to extract m features, we may add ε new features to the d × 1 original feature vector so that d + ε becomes composite and divisible by m. We consider the central moments of order two or more (k ≥ 2) as the added new features. The ε added features are redundant because they are obtained from the d original features. The use of these redundant features may degrade the classification accuracy. However, adding new redundant features may be necessary for the implementation of 2DLDA in the SSS situation. So, when there are enough training samples, LDA may provide better classification accuracy than 2DLDA.
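The padding-and-reshape step described in reason 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' exact implementation: the function name, the choice of moment orders, and the reshaping orientation are assumptions.

```python
import numpy as np

def to_feature_matrix(x, m):
    """Pad a d-dim spectral vector with central moments of order
    k >= 2 until its length is divisible by m, then reshape it into
    an m-column feature matrix (hypothetical sketch of the padding
    step; the row/column orientation is an assumption)."""
    x = np.asarray(x, dtype=float)
    d = x.size
    eps = (-d) % m                      # number of redundant features to add
    mu = x.mean()
    # central moments of order 2, 3, ... as the added features
    moments = [np.mean((x - mu) ** k) for k in range(2, 2 + eps)]
    padded = np.concatenate([x, moments])
    return padded.reshape(-1, m)        # ((d + eps)/m) x m feature matrix

pixel = np.random.default_rng(0).normal(size=200)   # e.g. a 200-band pixel
F = to_feature_matrix(pixel, m=6)
print(F.shape)   # (34, 6): 200 + 4 added moments = 204 = 34 * 6
```

The added moments are functions of the original d bands, which makes the ε extra features redundant in exactly the sense discussed above.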
As we see from the obtained results, 2DLDA with the SVM classifier (2DLDA + SVM) gives better classification accuracy than 2DLDA + ML. In other words, 2DLDA, which is an appropriate feature extraction method in the SSS situation, is more compatible with nonparametric classifiers such as SVM, which are less sensitive to the number of training samples.
We also compared the computation time of the feature extraction methods. For example, for the Indian dataset, using 16 training samples and 6 extracted features, the computation times of the different methods are as follows: 2DLDA: 9.43 seconds, LDA: 0.69 seconds, NWFE: 90.81 seconds, and GDA: 0.73 seconds. The 2DLDA method is slower than LDA and GDA because it needs to transform the feature vectors of the samples into feature matrices and then calculate the scatter matrices using the matrix form of the samples. Nevertheless, the computation time of 2DLDA is under 10 seconds, and 2DLDA is about 9.63 times faster than NWFE. NWFE is the slowest method because it needs to calculate the weighted mean and the weight of each sample in each class of the data for the calculation of its scatter matrices.
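The extra cost of 2DLDA comes from accumulating scatter matrices over matrix-form samples. A generic sketch of a 2DLDA-style within-class scatter computation is shown below; the function name and array shapes are assumptions for illustration, not the authors' code.

```python
import numpy as np

def within_class_scatter_2d(samples, labels):
    """Within-class scatter for matrix-form samples: for each class,
    accumulate (A - M_c)^T (A - M_c) over its samples, where M_c is
    the class mean matrix. Generic 2DLDA-style sketch."""
    samples = np.asarray(samples, dtype=float)  # shape (n, r, c)
    labels = np.asarray(labels)
    c = samples.shape[2]
    Sw = np.zeros((c, c))
    for cls in np.unique(labels):
        group = samples[labels == cls]
        mean = group.mean(axis=0)               # class mean matrix (r x c)
        for A in group:
            D = A - mean
            Sw += D.T @ D                       # c x c contribution
    return Sw

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 34, 6))                # 40 matrix-form samples
y = rng.integers(0, 4, size=40)
Sw = within_class_scatter_2d(X, y)
print(Sw.shape)   # (6, 6)
```

Because the scatter matrix is only c × c rather than d × d, it stays small and well conditioned even with few training samples, which is the key to 2DLDA's behavior in the SSS situation; the per-sample matrix products are what make it slower than plain LDA and GDA.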
Conclusions
Feature extraction methods play an important role in the classification and analysis of high-dimensional data such as hyperspectral images. The collection of labeled samples is generally expensive, difficult, and time consuming, and therefore the number of available training samples is limited. LDA, the best-known supervised feature extraction method, fails to work with small training sample sizes because of the singularity of the within-class scatter matrix. To cope with the high dimensionality and the small sample size problem in hyperspectral images, we proposed to use the 2DLDA method for feature extraction. In the proposed method, the feature vector of each pixel of the image is transformed into a feature matrix. Thus, the dimensions of the scatter matrices estimated from the transformed samples are reduced. Experiments with the Indian, University of Pavia, KSC, and Botswana datasets were conducted to evaluate our feature extraction method in terms of classification accuracy and computation time. The singularity problem of the LDA method is solved by the 2DLDA method. Moreover, in contrast to LDA and GDA, the 2DLDA method has no limitation on the number of extracted features and can extract any arbitrary number of features. The experimental results show that 2DLDA performs better than some popular feature extraction methods such as LDA, GDA, and NWFE in the SSS situation. The experiments also show that 2DLDA + SVM is more efficient than 2DLDA + ML. Because it uses the matrix form of the samples instead of the vector form, 2DLDA is slower than LDA and GDA. Nevertheless, it is much faster than NWFE. 2DLDA is superior to the other feature extraction methods in the SSS situation, but as the number of training samples increases, the performance of LDA becomes close to that of 2DLDA, and for large training sample sizes it is possible that LDA works better than 2DLDA.