
positive results that over-estimate the extent of the problem may lead to a misallocation of limited funds and resources.
The multiple comparison problem is a long-standing topic within the field of statistics (Dunnett, 1955; Dunn, 1961) and remains an area of active research (McDonald, 2009). As an example of the widespread acceptance of this issue in other fields, the foundational paper by Benjamini and Hochberg (1995) on controlling the false discovery rate (FDR), a popular method to account for the multiple comparison problem, has been cited over 17,000 times (Web of Knowledge, URL: http://webofknowledge.com). The Benjamini and Hochberg (1995) paper is often cited in fields such as biochemistry/molecular biology (2,947 papers), genetics/heredity (2,518 papers), and biotechnology/applied microbiology (1,653 papers), as well as fields closely related to remote sensing such as environmental science/ecology (994 papers) and plant sciences (801 papers).
By comparison, remote sensing papers rarely explicitly consider the multiple comparison problem. For example, in a systematic search of a wide range of disciplines in the natural and physical sciences related to the remote sensing of the Earth’s surface using the Web of Knowledge, only seven remote sensing papers were found that cited Benjamini and Hochberg (1995). Undoubtedly, other remote sensing investigations may have considered the MCP in their statistical analyses without reference to Benjamini and Hochberg, but the difference of at least two orders of magnitude in the number of times that paper has been cited in remote sensing compared with such wide-ranging other disciplines indicates that the multiple comparison problem is largely underappreciated within the field of remote sensing. To illustrate the wide range of remote sensing applications in which the multiple comparison problem applies, papers that explicitly apply FDR include comparisons of multiple land classifiers (Brenning, 2009; Xu et al., 2014), detection of pixel-based temporal trends across many pixels (Brown et al., 2012; Wessels et al., 2012), and comparison of multiple hyperspectral indices to canopy structure (Pena et al., 2012). Perhaps contributing to the obscurity of the multiple comparison problem is that its discussion is typically a methodological footnote rather than the focus of any given paper. This paper aims to help illuminate the MCP issue within the remote sensing community.
Simple Solutions
Since the tradeoffs between false positive and false negative results depend on the context of the research, no single solution for the multiple comparison problem exists. Papers in other fields have discussed this issue in more detail (e.g., Curran-Everett, 2000; Gelman, 2012; McDonald, 2009). Below, four common solutions from other fields that can be easily implemented in remote sensing analyses are presented: Bonferroni Correction, False Discovery Rate, Independent Validation, and Improved Interpretation. More complicated solutions, including multi-level or Bayesian models, may also be implemented but are not discussed here. It should be noted that these solutions target multiple comparisons when testing for significant relationships, not differences between population distributions, which can be addressed using Tukey’s multiple comparison procedure.
The Bonferroni Correction is one of the oldest and simplest solutions. It adjusts the α-value of significance by dividing the α-value by the number of hypotheses tested (Bonferroni, 1936). For example, if an analysis considers ten alternative hypotheses at an α-value of 0.05, the corrected significance threshold is 0.005 (α/10). The Bonferroni Correction is often considered the most traditional and conservative correction because it controls the likelihood of a type I error occurring in any of the multiple tests conducted. In situations where a large number of hypotheses are tested, and especially if sample sizes are small, the adjusted α-value can and often does exclude all hypotheses tested.
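As a minimal sketch of how the correction could be applied, the following Python snippet adjusts the significance threshold for a set of hypothetical p-values; the values and variable names are placeholders for illustration only, not results from any study cited here.

```python
# Bonferroni correction sketch: hypothetical p-values from ten tests.
p_values = [0.001, 0.004, 0.012, 0.030, 0.045,
            0.051, 0.060, 0.210, 0.470, 0.830]

alpha = 0.05
m = len(p_values)                 # number of hypotheses tested
alpha_corrected = alpha / m       # Bonferroni-adjusted threshold (0.005 here)

significant = [p < alpha_corrected for p in p_values]
print(f"Corrected alpha: {alpha_corrected:.4f}")
print(f"Significant tests: {sum(significant)} of {m}")
```

With these placeholder values, only two of the ten tests remain significant after the correction, illustrating how conservative the adjustment becomes as the number of tests grows.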
The False Discovery Rate is an alternative approach proposed by Benjamini and Hochberg (1995) based on controlling the rate of false discovery, or the proportion of false positive results, within a group (i.e., family) of tests. Using the rank of p-values to help interpret significance among multiple comparisons, FDR uses the following equation:
q-value = (i / m) × q    (1)
where i is the rank of the test based on p-values sorted from smallest to largest, m is the total number of tests, and q is the maximum desired false discovery rate for all tests. The p-value for each test is then compared to its q-value, and if the p-value is less than the q-value, the test is significant. For easier interpretation, many software packages report adjusted p-values. Furthermore, FDR adapts to the number of tests because the threshold for significance is the rate at which false discoveries are expected (e.g., 5 percent). For a review of the false discovery rate and other techniques for correcting for multiple comparisons, see Groppe et al. (2011).
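A minimal sketch of the Benjamini and Hochberg (1995) procedure in Equation 1, assuming the same hypothetical p-values used above and a desired false discovery rate of q = 0.05, might look as follows; the variable names are illustrative only.

```python
import numpy as np

# Hypothetical p-values from ten tests and desired false discovery rate.
p_values = np.array([0.001, 0.004, 0.012, 0.030, 0.045,
                     0.051, 0.060, 0.210, 0.470, 0.830])
q = 0.05
m = len(p_values)

order = np.argsort(p_values)                # ranks i = 1..m, smallest p-value first
thresholds = np.arange(1, m + 1) / m * q    # q-value threshold for each rank

# A test is significant if its rank is at or below the largest rank k
# for which the sorted p-value is <= (k/m) * q.
below = p_values[order] <= thresholds
k = below.nonzero()[0].max() + 1 if below.any() else 0

significant = np.zeros(m, dtype=bool)
significant[order[:k]] = True
print(f"Significant tests after FDR control: {significant.sum()} of {m}")
```

With these placeholder values the procedure retains three tests, one more than the Bonferroni Correction, reflecting its less conservative control. Packages such as statsmodels offer the same adjustment (multipletests with method='fdr_bh') and also return the adjusted p-values mentioned above.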
Another approach, and one with a long tradition in remote sensing, is the use of independent training and validation datasets (Congalton, 1991). While the use of validation datasets is less common for trend detection, this approach can help verify the validity of an empirical relationship derived from multiple testing. Since the test of the empirical relationship with the validation dataset is a new, single test, no p-value adjustment is required. Furthermore, if the empirical relationship is used for prediction, then the uncertainty in the prediction can also be assessed. One of the drawbacks of the validation approach is the need for independent data, which is often unattainable. For example, the detection of temporal trends for each pixel over large areas is impossible to validate.
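The workflow can be sketched as follows; the randomly generated data below are placeholders standing in for candidate image metrics (e.g., texture measures) and field measurements, and the names are hypothetical. The point is that the candidate selected by multiple testing on the training data faces only one confirmatory test on the independent validation data.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_train, n_valid, n_candidates = 30, 30, 20

# Placeholder ground measurements and candidate image metrics.
y_train = rng.normal(size=n_train)
y_valid = rng.normal(size=n_valid)
X_train = rng.normal(size=(n_train, n_candidates))
X_valid = rng.normal(size=(n_valid, n_candidates))

# Selection step: pick the candidate most correlated with the training data
# (this step involves many implicit comparisons).
train_r = [np.corrcoef(X_train[:, j], y_train)[0, 1] for j in range(n_candidates)]
best = int(np.argmax(np.abs(train_r)))

# Confirmation step: a single test on independent data, so no adjustment is needed.
r, p = pearsonr(X_valid[:, best], y_valid)
print(f"Selected candidate {best}: training r = {train_r[best]:.2f}")
print(f"Validation r = {r:.2f}, p = {p:.3f}")
```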
The occurrence of the multiple comparison problem does not automatically necessitate a quantitative solution. An alternative approach to the multiple comparison problem is to simply change the interpretation of the p-value (Gelman et al., 2012), rather than trying to adjust the p-value so that it fits the classic interpretation. For example, if 1,000 tests are conducted and 500 are found to be significant, but only 50 would be expected from random datasets, then it has been demonstrated that the number of significant relationships is much greater than the number expected by chance. In a situation where each test detects a temporal trend in each pixel of a time series of images, the results would clearly indicate overall significance (i.e., the number of significant results far exceeds that expected by chance alone), although the number of pixels with significant trends is likely over-estimated. However, since it would be expected that 5 or 10 percent of the “significant” results could be false positives, this uncertainty should be considered when mapping the results. Since it is difficult to ascertain which pixels are false positives, interpretation of trends should focus on aggregate results rather than local patterns. Furthermore, the portion of the total area with significant trends that may represent false positive results should be discussed. In some situations, the inclusion of false positive results may be useful. For example, including potentially false positive results in a pilot study can help indicate areas for future research, as long as those results are not used for generalization. Interpretation of the results should clearly describe this situation if applicable.
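The arithmetic behind this interpretation is simple to make explicit; the sketch below merely restates the hypothetical 1,000-test example from above.

```python
# Hypothetical counts from the example above: 1,000 tests, 500 significant.
n_tests = 1000
n_significant = 500
alpha = 0.05

expected_by_chance = alpha * n_tests           # about 50 false positives expected
excess = n_significant - expected_by_chance    # signal well beyond chance

print(f"Expected by chance: {expected_by_chance:.0f}")
print(f"Observed significant: {n_significant} (excess of {excess:.0f})")
print(f"Up to {expected_by_chance / n_significant:.0%} of the 'significant' "
      "results could still be false positives.")
```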
First Case Study: Estimating Leaf Area Index of Mangroves using Image Texture
Context
This case study demonstrates the impact of the multiple comparison problem when repeatedly testing different remote sensing products against ground measurements, in this case image texture and leaf area index. The amount of