When estimating statistics from manual measurements, it should be considered that not everyone is good at fusing a stereo pair, and a few people are not even capable of perceiving depth differences through stereo fusion. Therefore, outliers need to be identified and removed before evaluating the statistics of the tie-point positions collected from a large group of manual selections.
To identify outliers, we define a simple error function using a pre-computed disparity map $D$. For example, a selection error of a tie-point $(\mathbf{t}_i^k, \mathbf{s}_{mi}^k)$ can be defined as the pixel difference between the manual measurement and the computed disparity map for a point, i.e.,
$e(\mathbf{t}_i^k, \mathbf{s}_{mi}^k : D) = d(\mathbf{t}_i^k, \mathbf{s}_{mi}^k) - d\big(\mathbf{t}_i^k, D(\mathbf{t}_i^k)\big)$, (1)
where $d(\mathbf{t}_i^k, \mathbf{s}_i^k) = \mathbf{s}_i^k - \mathbf{t}_i^k$ and $D(\mathbf{t}_i^k)$ is the corresponding point of $\mathbf{t}_i^k$ defined by the pre-computed disparity map $D$.
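To make the notation concrete, the following is a minimal Python sketch of the selection error in Equation 1; the dense disparity-map layout, the (x, y) point tuples and the helper names are illustrative assumptions rather than part of the original method.

    import numpy as np

    def disparity(t, s):
        # d(t, s) = s - t: horizontal offset between a left tie-point t
        # and a right point s, both given as (x, y) on a rectified pair.
        return s[0] - t[0]

    def selection_error(t, s_manual, D):
        # Equation 1: e(t, s : D) = d(t, s) - d(t, D(t)), i.e. the manual
        # disparity minus the disparity of the pre-computed map D at t.
        x, y = int(round(t[0])), int(round(t[1]))
        d_map = float(D[y, x])            # disparity of the pre-computed map at t
        s_map = (t[0] + d_map, t[1])      # D(t): right point implied by the map
        return disparity(t, s_manual) - disparity(t, s_map)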
With this error metric (1), we can define an inlier set $\hat{S}_i^k$ containing all reliable right tie-points,
$\hat{S}_i^k = \{\, \mathbf{s}_{mi}^k \in S_i^k \mid e(\mathbf{t}_i^k, \mathbf{s}_{mi}^k : D) < \delta,\ \forall\, \mathbf{s}_{mi}^k \in C_m^k \,\}$, (2)
where $\delta$ is an error threshold, normally set to around 10 pixels, and $C_m^k$ is the set of right tie-points collected by the $m$th participant. Thus, an error bound of $\mathbf{t}_i^k$ (denoted by $\mathbf{b}_i^k$ in this paper) can be defined as
$\mathbf{b}_i^k = \sigma_i^2 = \frac{1}{|\hat{S}_i^k|} \sum_{m} \big(\mathbf{s}_{mi}^k - \mathbf{m}_i^k\big)^2, \quad \mathbf{m}_i^k = \frac{1}{|\hat{S}_i^k|} \sum_{m} \mathbf{s}_{mi}^k$. (3)
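Continuing the sketch, the inlier filtering of Equation 2 and the error bound of Equation 3 could be implemented as follows, reusing selection_error from above; the absolute-error test and the list-per-tie-point layout are assumptions, while the 10-pixel default comes from the text.

    import numpy as np

    def inlier_set(t, manual_points, D, delta=10.0):
        # Equation 2: keep the manually selected right points whose selection
        # error against the pre-computed disparity map D is below delta.
        return [s for s in manual_points if abs(selection_error(t, s, D)) < delta]

    def error_bound(inliers):
        # Equation 3: mean and variance of the inlier right points; the
        # variance serves as the error bound b of the tie-point and the
        # mean m is reused by the scoring function later on.
        pts = np.asarray(inliers, dtype=float)
        mean = pts.mean(axis=0)
        var = ((pts - mean) ** 2).mean(axis=0)
        return mean, var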
As a general quality metric of a set of stereo measurements, we can also define a total measurement error as
$e_{tot}(T^k, S^k : D) = \frac{1}{MN} \sum_{j=1}^{M} \sum_{i=1}^{N} \Big( d(\mathbf{t}_i^k, \mathbf{s}_{ji}^k) - d\big(\mathbf{t}_i^k, D(\mathbf{t}_i^k)\big) \Big)$, (4)
where $S^k$ represents all measurements, i.e., $S^k = \cup_i S_i^k$. Similarly, we can also define a measurement error of an inlier set and an outlier set, i.e., $e_{in}(T^k, \hat{S}^k : D)$ and $e_{out}(T^k, S^k - \hat{S}^k : D)$, respectively.
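Under the same assumed layout (tie_points holding the left points and selections[i] the manual right points collected for the i-th of them), the aggregate errors of Equation 4 and their inlier/outlier variants might look like this:

    import numpy as np

    def total_error(tie_points, selections, D):
        # Equation 4: average selection error over all tie-points (N) and
        # all manual measurements (M) collected for each of them.
        errors = [selection_error(t, s, D)
                  for t, manual in zip(tie_points, selections)
                  for s in manual]
        return float(np.mean(errors))

    def inlier_outlier_errors(tie_points, selections, D, delta=10.0):
        # e_in and e_out: the same average restricted to the inlier set of
        # Equation 2 and to its complement, respectively.
        e_in, e_out = [], []
        for t, manual in zip(tie_points, selections):
            for s in manual:
                e = selection_error(t, s, D)
                (e_in if abs(e) < delta else e_out).append(e)
        avg = lambda v: float(np.mean(v)) if v else 0.0
        return avg(e_in), avg(e_out)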
Assessment Criteria
The proposed evaluation method assesses a disparity map in terms of a matching score (M) and a rewarding score (R). The matching score is similar to the classic quality metric used in stereo evaluation, but the main difference is that our method evaluates it against a set of error bounds rather than ground truth. The proposed method also introduces a rewarding score, whose main purpose is to award additional credit when an algorithm copes well with the challenging matching problems defined by the discontinuous point selection.
In order to compute the matching score, we define a 2D Gaussian function from an error bound. For example, a scoring function for $\mathbf{s}_i^k$ (i.e., the right pixel position of $\mathbf{t}_i^k$ obtained from an input disparity map for evaluation) is:
$g(\mathbf{s}_i^k, \mathbf{b}_i^k) = \exp\left( -0.5\, (\mathbf{s}_i^k - \mathbf{m}_i^k)^T \begin{bmatrix} \sigma_{xi}^2 & 0 \\ 0 & \beta\sigma_{xi}^2 \end{bmatrix}^{-1} (\mathbf{s}_i^k - \mathbf{m}_i^k) \right)$, (5)
where $\mathbf{b}_i^k$ is the error bound of $\mathbf{t}_i^k$, $\sigma_{xi}^2$ is the variance of the $x$ values of the $i$th tie-points in the type ($k$) data set, $\mathbf{m}_i^k$ is the mean of the inlier measurements, and $0 < \beta < 1$.
This means that we give a higher matching score when an input disparity is closer to the mean of the inlier measurements. If a stereo selection is not confident (i.e., $\sigma_{xi}$ is high), then we penalise less even if a tie-point is further away from the mean. Another thing to note is that the covariance matrix in Equation 5 is defined by the horizontal variance only, i.e., $\sigma_{xi}^2$. This is because $\sigma_{yi}$ of the manual measurements is nil, as we rectify an input stereo pair before stereo measurement. However, to allow a little variation in the $y$ direction, as some algorithms do refine vertical positions even when an input stereo pair is rectified, we have used the scaled vertical variance $\beta\sigma_{xi}^2$ in our test. Please note that this weighting value ($\beta$) was selected empirically based on our ALSC refinement results of the manual measurements.
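A minimal sketch of the scoring function in Equation 5, assuming the mean and horizontal variance come from the error bound computed earlier; the default β shown here is only a placeholder, since the paper selects this weight empirically from ALSC refinement statistics.

    import numpy as np

    def gaussian_score(s_eval, mean, var_x, beta=0.1):
        # Equation 5: anisotropic 2D Gaussian centred on the mean of the
        # inlier measurements, with covariance diag(var_x, beta * var_x).
        # beta (placeholder value here) keeps the vertical tolerance small
        # on rectified pairs.
        dx = s_eval[0] - mean[0]
        dy = s_eval[1] - mean[1]
        var_x = max(float(var_x), 1e-9)   # guard against a degenerate variance
        quad = dx * dx / var_x + dy * dy / (beta * var_x)
        return float(np.exp(-0.5 * quad))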
A matching score of a set of right points from a disparity map is then defined as a weighted sum of (5), i.e.,
$M(D, B) = \frac{1}{L} \sum_{k} \sum_{i} w_i^k\, g(\mathbf{s}_i^k, \mathbf{b}_i^k)$, (6)
where $L = |T^a| + |T^b| + |T^c|$, $B$ is the set of all error bounds, $D$ is the disparity map under evaluation, which defines $\mathbf{s}_i^k$, and $w_i^k = 1 - \frac{\sigma_{xi}}{2\max_i(\sigma_{xi})}$, i.e., a higher weight is given to a more confident measurement.
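The matching score of Equation 6 then reduces to a weighted average of these Gaussian scores over the three tie-point types; the dictionary bookkeeping below is an assumed layout, and the weight normaliser mirrors the confidence weight given above.

    def confidence_weight(sigma_x, sigma_x_max):
        # w_i^k = 1 - sigma_xi / (2 * max sigma_x): between 0.5 and 1,
        # larger for more consistent (lower-variance) manual selections.
        return 1.0 - sigma_x / (2.0 * sigma_x_max)

    def matching_score(eval_points, bounds, weights):
        # Equation 6: M(D, B) = (1/L) * sum_k sum_i w_i^k * g(s_i^k, b_i^k),
        # with L the total number of tie-points over types a, b and c.
        # eval_points[k][i] : right point s_i^k read from the disparity map D
        # bounds[k][i]      : (mean, var_x) of the inlier manual measurements
        # weights[k][i]     : confidence weight w_i^k
        total, count = 0.0, 0
        for k in ('a', 'b', 'c'):
            for s, (mean, var_x), w in zip(eval_points[k], bounds[k], weights[k]):
                total += w * gaussian_score(s, mean, var_x)
                count += 1
        return total / count if count else 0.0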
The proposed rewarding score is defined for the tie-points at discontinuities (i.e., type (c)). As briefly explained earlier, we have defined pairs of tie-points around object boundaries. Supposing that $P_i$ is the $i$th pair of the discontinuous tie-points obtained around an object boundary, we can define the $i$th pair as $P_i = \{ (\mathbf{t}_{2i}^c, \mathbf{s}_{2i}^c), (\mathbf{t}_{2i+1}^c, \mathbf{s}_{2i+1}^c) \}$; an example of a pair can be found in Figure 3. In this case, our rewarding function is defined as an averaged sum of sigmoid function values, i.e.,
$R(D, B, P) = \frac{1}{|P|} \sum_{P_i \in P} \gamma\Big( \big| d(\mathbf{t}_{2i}^c, \mathbf{s}_{2i}^c) - d(\mathbf{t}_{2i+1}^c, \mathbf{s}_{2i+1}^c) \big| \Big)$, (7)
where $\gamma(x)$ is a sigmoid function, $\gamma(x) = 2/(1 + \exp(-x))$, and $P$ is the set of all pairs of tie-points, $P = \cup_i P_i$. Thus, Equation 7 gives additional scores when a disparity map can give a similar estimation to the average of manual measurements around a depth discontinuity.
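A sketch of the rewarding term, assuming pairs is a list of ((t_2i, s_2i), (t_2i+1, s_2i+1)) tuples with the right points read from the disparity map under evaluation, and reusing disparity from the first sketch; the exact argument fed to the sigmoid is an assumption based on Equation 7.

    import numpy as np

    def sigmoid_gamma(x):
        # gamma(x) = 2 / (1 + exp(-x)), the sigmoid used in Equation 7.
        return 2.0 / (1.0 + np.exp(-x))

    def rewarding_score(pairs):
        # Equation 7: average sigmoid of the absolute disparity difference
        # recovered across each discontinuity pair by the evaluated map.
        scores = [sigmoid_gamma(abs(disparity(t_a, s_a) - disparity(t_b, s_b)))
                  for (t_a, s_a), (t_b, s_b) in pairs]
        return float(np.mean(scores)) if scores else 0.0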
Finally, a total score ($TS$) is defined as a weighted sum of the matching score and the rewarding score, i.e.,
$TS(D, B, P) = (1 - \alpha)\, M(D, B) + \alpha\, R(D, B, P)$, (8)
where $0 < \alpha < 1$. The weighting coefficient in Equation 8 can be set differently depending on the application, e.g., a higher weight (e.g., $\alpha > 0.5$) could be given to put the rewarding score ahead of the matching score of a disparity map.
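Combining the two terms as in Equation 8, with α as an application-dependent weight (the default shown here is arbitrary):

    def total_score(M, R, alpha=0.5):
        # Equation 8: TS = (1 - alpha) * M + alpha * R, with 0 < alpha < 1
        # balancing the matching score against the rewarding score.
        return (1.0 - alpha) * M + alpha * R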
Experimental Results
The evaluation work described in this paper is based on the stereo matching results from UCL-MSSL, NASA-JPL, and the Joanneum Research Institute (JR hereafter) with respect to the datasets from the PRoVisG Mars 3D challenge and the ExoMars PanCam test campaigns. The PRoVisG Mars 3D challenge 2011, aimed at testing and improving the state of the art