
The Multiple Comparison Problem
in Empirical Remote Sensing
Benjamin W. Heumann
Department of Geography and Center for Geographic Information Science, Central Michigan University, Dow Science Complex Room 296, Mt. Pleasant, MI 48859
Abstract
This paper seeks to draw attention to the multiple comparison problem (MCP) within the remote sensing community and to suggest some easily implemented solutions. The use of repeated statistical tests by remote sensing scientists to identify significant relationships increases the chance of identifying false positives (i.e., type-I errors) as the number of tests increases. This paper provides an introduction to the multiple comparison problem (i.e., the impact on the interpretation of p-values when repeated tests are made), outlines some simple solutions, and provides two case studies to demonstrate the potential impact of the problem in empirical remote sensing. The first case study examines multiple candidate texture metrics for predicting leaf area index. The second case study examines pixel-wise temporal trend detection. The results show how applying solutions to the multiple comparison problem can greatly affect the interpretation of statistical results.
Introduction
Remote sensing of the environment often relies on empirical modeling to identify relationships between sensor measurements of radiance or reflectance and patterns and processes of interest on the ground. Empirical models offer several advantages over process-based models in situations where the processes are poorly understood or where the information measured by the sensor serves as a proxy for the actual phenomenon. Empirical models are commonly used to select remote sensing derived products, to predict ground-based observations such as leaf area index, or to detect temporal trends in multitemporal data. But such models rely on the interpretation of significance, usually the p-value. When repeated tests are used, such as testing multiple spectral indices against observed leaf area index (LAI) or testing for temporal trends in each pixel of a multi-temporal dataset, this affects how we should interpret the significance of those tests (McDonald, 2009). This is often referred to as the multiple hypothesis or comparison problem, or multiplicity. In this paper, the term multiple comparison is used and is considered interchangeable with these other terms.
This paper seeks to draw attention to the multiple comparison problem as it relates to empirical remote sensing. It first provides a definition and explanation of the multiple comparison problem, followed by a description and discussion of some possible and easy-to-implement solutions. Next, two case studies are presented that exemplify the issue in the remote sensing field. The first case study examines the impact of the multiple comparison problem when conducting exploratory analysis of empirical relationships between image texture, derived from grey-level co-occurrence matrices (GLCM), and LAI in a mangrove forest. The second case study shows the effects of the multiple comparison problem on repeated tests of significance when mapping temporal trends from a Normalized Difference Vegetation Index (NDVI) time series.
Defining the Multiple Comparison Problem
Typically, we accept or reject the null hypothesis based on whether the p-value is above or below the threshold of significance, α (i.e., the chance of a type-I error, or a false positive result). By convention, the p-value threshold is often set at 0.10, 0.05, or 0.01, depending on the desired tradeoff between type-I and type-II errors. However, this assessment of significance is problematic when multiple tests are conducted against a single, static dataset (e.g., time series or field measurements). As the number of tests increases, the chance of a false positive also increases. Hypothetically, if 100 hypotheses were tested, there would still be a 1 in 20 chance that any individual result with an α-value of 0.05 is a false positive under the null hypothesis. Therefore, it would be expected that five "significant" results would be detected by chance alone out of 100 tests. In fact, the chance of detecting at least one false positive result in 100 tests with an α-value of 0.05 is 99.4 percent. The multiple comparison problem occurs anytime there are repeated tests using the same test or comparison data. Therefore, it is recommended that anytime repeated tests occur, the significance of those tests should be reinterpreted to address the multiple comparison problem (McDonald, 2009). The specific method of reinterpretation varies depending on the context and implications of the results.
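The 99.4 percent figure is the family-wise error rate for independent tests; the standard calculation behind it, which the text implies but does not show, is:

\[
P(\text{at least one false positive}) = 1 - (1 - \alpha)^{m} = 1 - (1 - 0.05)^{100} \approx 0.994 .
\]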
Related to the MCP is the problem of spurious correlation, especially when the statistical tests are being utilized for inductive (i.e., bottom-up or data-driven) rather than deductive (i.e., top-down or theory-driven) reasoning. If each test being performed does not have a strong theoretical basis, then as positive results are found, there is concern that these results may be spurious. For example, there is a theoretical basis for the relationship between image texture and LAI (Song and Dickinson, 2008). However, there is less of a theoretical basis for a specific GLCM metric or other GLCM parameters. Therefore, when combined with the multiple comparison problem, the interpretation of what might seem like straightforward statistical tests becomes more complicated. For a more humorous description of the multiple comparison problem and spurious correlation, and their implications, see .
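The following short simulation, which is not from the paper (the sample size, metric count, and variable names are illustrative assumptions), shows how purely random candidate metrics, tested repeatedly against the same random response, still yield roughly five percent "significant" correlations at α = 0.05:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

n_samples = 30    # hypothetical number of field plots
n_metrics = 100   # hypothetical number of candidate texture metrics

# Purely random "LAI" observations and candidate metrics: by construction,
# no true relationship exists between any metric and the response.
lai = rng.normal(size=n_samples)
metrics = rng.normal(size=(n_metrics, n_samples))

# Test each metric against the same LAI data, as in an exploratory analysis.
p_values = np.array([stats.pearsonr(m, lai)[1] for m in metrics])

n_significant = int((p_values < 0.05).sum())
print(f"'Significant' correlations found by chance: {n_significant} of {n_metrics}")
# With alpha = 0.05, roughly five spurious "relationships" are expected.
```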
In order to better address the multiple comparison problem, the tradeoff between accepting false positive results and rejecting true positive results must be considered. More stringent significance tests do indeed reduce the false detection rate, but they also increase false rejections, as the sketch following this paragraph illustrates. This tradeoff must be balanced based on the particular needs of the research.
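As a minimal sketch of that stringency tradeoff (the p-values here are invented for illustration, and the two corrections shown, Bonferroni and Benjamini-Hochberg, are common choices rather than necessarily the solutions this paper goes on to recommend):

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# Hypothetical p-values from seven repeated tests (values invented for illustration).
p_values = np.array([0.001, 0.008, 0.020, 0.041, 0.049, 0.130, 0.450])

# Bonferroni controls the family-wise error rate: very stringent, fewest detections.
reject_bonf, _, _, _ = multipletests(p_values, alpha=0.05, method='bonferroni')

# Benjamini-Hochberg controls the false discovery rate: less stringent, more power.
reject_fdr, _, _, _ = multipletests(p_values, alpha=0.05, method='fdr_bh')

print("Significant at raw alpha = 0.05:", int((p_values < 0.05).sum()))  # 5
print("Significant after Bonferroni:   ", int(reject_bonf.sum()))        # 1
print("Significant after B-H FDR:      ", int(reject_fdr.sum()))         # 3
```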
As Veazie (2006) describes, the implementation of models and policy based on false positive results could result in potentially costly errors. For example, remote sensing results are increasingly being used to make management decisions, particularly related to deforestation, desertification, the global carbon cycle, and climate change (UN-REDD Programme, 2008). False