
it is equivalent to inferring topics in word-document analysis. Moreover, a traffic state is a combination of frequently co-occurring activities (i.e., interactions), which makes it possible to infer traffic states with a topic model as well.
The HDP [20] is an unsupervised, non-parametric, hierarchical Bayesian topic model that was originally proposed for word-document analysis. It clusters words that frequently co-occur within the same documents into the same topics. Furthermore, unlike other clustering topic models such as LDA [21], HDP is able to automatically determine the number of clusters. The rest of this section shows how to use the HDP model to infer typical activities and traffic states from the input video. Based on the output of the HDP models, we propose a method to construct feature vectors that represent activities with visual words and traffic states with typical activities. These feature vectors are then used to train a classifier to recognize complicated traffic activities in surveillance video.
Learning Activities Using HDP
Figure 2 is a graphical representation of the HDP model; it consists of two Dirichlet processes. The first one generates a global set of activities, and the second one samples a subset of activities from the global set for each clip. Finally, visual words are drawn from the activities.
The possible activities are inferred by HDP, whose standard graphical representation is shown in Figure 2 [20]. The global random measure G_0 = {θ_1, θ_2, …} is a global list of activities that is shared by all clips. Its distribution is a Dirichlet process (DP) with concentration parameter γ and Dirichlet prior H:
$G_0 \mid \gamma, H \sim \mathrm{DP}(\gamma, H)$ (1)
G_0 can be expressed using the stick-breaking formulation [20]:
$G_0 = \sum_{k=1}^{\infty} \pi_k \delta_{\phi_k}$ (2)

$\phi_k \mid \gamma \sim H$ (3)

$\pi_k = \pi'_k \prod_{l=1}^{k-1} (1 - \pi'_l)$ (4)
$\pi'_k \sim \mathrm{Beta}(1, \gamma)$ (5)
where $\{\phi_k\}_{k=1}^{\infty}$ are the parameters of the multinomial distributions over words in the codebook corresponding to activity θ_k, i.e., each φ_k is a word probability vector whose entries sum to 1. $\delta_{\phi_k}$ is the delta function at point φ_k. {π_k} are random probability measures (mixture weights over topics) with $\sum_{k=1}^{\infty} \pi_k = 1$. For convenience, the random probability measure π defined by Equations 1 to 5 is abbreviated as π ~ GEM(γ), where GEM stands for the Griffiths-Engen-McCloskey distribution [22].
The multinomial distribution φ_k over words in the codebook is generated from H. Therefore, H is interpreted as a distribution over multinomial distributions and can thus be defined as a Dirichlet distribution:

$H = \mathrm{Dir}(D_0)$ (6)

$\phi_k \mid \gamma \sim \mathrm{Dir}(D_0)$ (7)
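As a concrete illustration of Equations 2 to 7, the following is a minimal sketch of simulating the global measure G_0 via a truncated stick-breaking construction. It is not the paper's implementation; the truncation level, codebook size, and hyper-parameter values below are assumptions chosen purely for illustration.

```python
import numpy as np

# Assumed values; the paper sets gamma and D_0 empirically.
gamma = 1.0    # concentration parameter of the top-level DP
V = 100        # codebook size (number of visual words), assumed
K_max = 50     # truncation level approximating the infinite sum in Eq. (2)
D_0 = 0.5      # symmetric Dirichlet parameter for H = Dir(D_0), Eq. (6)

rng = np.random.default_rng(0)

# Eq. (5): stick proportions pi'_k ~ Beta(1, gamma)
pi_prime = rng.beta(1.0, gamma, size=K_max)

# Eq. (4): pi_k = pi'_k * prod_{l<k} (1 - pi'_l)
pi = pi_prime * np.concatenate(([1.0], np.cumprod(1.0 - pi_prime[:-1])))

# Eqs. (3), (7): each activity phi_k is a word distribution drawn from Dir(D_0)
phi = rng.dirichlet(np.full(V, D_0), size=K_max)   # K_max x V, each row sums to 1

# G_0 is then approximated by sum_k pi_k * delta_{phi_k}
print("first weights of G_0:", np.round(pi[:5], 3))
```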
G_0 is the prior distribution for the second DP. For each clip t, G_t is a random measure drawn from the second DP with concentration parameter α and base distribution G_0:

$G_t \mid \alpha, G_0 \sim \mathrm{DP}(\alpha, G_0)$ (8)
In our case, G_t describes the multinomial distribution of active topics in clip t, i.e., it is a subset of the global activities in G_0. We express it using the stick-breaking representation again:

$G_t = \sum_{k=1}^{\infty} \pi_{tk} \delta_{\phi_k}$ (9)

$\phi_k \mid \alpha, G_0 \sim G_0$ (10)

$\pi_{tk} = \pi'_{tk} \prod_{l=1}^{k-1} (1 - \pi'_{tl})$ (11)

$\pi'_{tk} \sim \mathrm{Beta}(1, \alpha)$ (12)
For the i-th word in clip t, a topic θ_ti is first drawn from G_t, and then the word x_ti is drawn from the multinomial distribution Multi(x_ti; φ_{θ_ti}), i.e., the multinomial distribution over words in the codebook corresponding to topic θ_ti. Notice that every G_t has the same φ_k as G_0, i.e., different clips share the same set of topics and statistical strength. We apply a Gibbs sampling scheme to do inference under the HDP model, which is a generally applied inference method for topic models. Figure 6 shows the typical activities learned by the HDP models for the QMUL Junction Dataset [8].
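To make the two-level construction concrete, the sketch below samples one clip under a truncated approximation of Equations 8 to 12 together with the word-sampling step just described. The concentration values, truncation level, codebook size, and clip length are illustrative assumptions, and the finite Dirichlet approximation of the clip-level DP is our own simplification, not the paper's inference procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
gamma, alpha = 1.0, 0.5       # top-level and clip-level DP concentrations (assumed)
V, K_max, D_0 = 100, 50, 0.5  # codebook size, truncation level, Dirichlet prior (assumed)

# Truncated global measure G_0, as in the previous sketch (Eqs. 2-7).
pi_prime = rng.beta(1.0, gamma, size=K_max)
pi = pi_prime * np.concatenate(([1.0], np.cumprod(1.0 - pi_prime[:-1])))
phi = rng.dirichlet(np.full(V, D_0), size=K_max)

# Clip-level measure G_t ~ DP(alpha, G_0), Eqs. (8)-(12): with G_0 truncated to
# K_max atoms, the clip weights can be simulated as pi_t ~ Dirichlet(alpha * pi).
# The small floor only guards against numerically zero Dirichlet parameters.
pi_t = rng.dirichlet(alpha * pi + 1e-6)

# Word generation for clip t: draw a topic theta_ti from G_t, then draw the
# word x_ti from Multi(x_ti; phi_{theta_ti}).
N_t = 200                                       # number of visual words in clip t (assumed)
topics = rng.choice(K_max, size=N_t, p=pi_t)    # topic assignments theta_ti
words = np.array([rng.choice(V, p=phi[k]) for k in topics])  # observed words x_ti
```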
The hyper-parameters γ and α are empirically predefined. They are priors on the concentration of the word distribution within topics, and they influence the number of activities in G_0 and G_t. The parameter D_0 for the Dirichlet distribution is also set empirically.
Although HDP models decide the number of topics automatically, some of the discovered activities are unrepresentative, because very rare motions must be explained by individual activities. These may be noise or rare events, and such learned activities could lead to an ambiguous or even misleading analysis of interactions. Therefore, the unrepresentative activities need to be removed. The total number of words assigned to activity k (k = 1, …, K) throughout the training video is denoted as n_k. The occurrence ratio of activity k is computed as

$r_k = \frac{n_k}{n_1 + \cdots + n_K}$ (13)

We rank {r_1, …, r_K} in decreasing order as {r'_1, …, r'_K} and calculate the accumulated sum as

$R'_j = \sum_{i=1}^{j} r'_i$ (14)

The representative activity (topic) set is selected as

$\theta_{\mathrm{typical}} = \{\theta_j \mid R'_j \le 0.99,\ 1 \le j \le K\}$ (15)
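A short sketch of the selection rule in Equations 13 to 15, assuming the per-activity word counts n_k have already been accumulated from the training video; the counts below are made-up numbers used only to illustrate the 0.99 threshold.

```python
import numpy as np

# Hypothetical word counts n_k for each activity k discovered by HDP.
n = np.array([5000, 3000, 2200, 1500, 200, 60, 40])

r = n / n.sum()                # Eq. (13): occurrence ratios r_k
order = np.argsort(r)[::-1]    # rank ratios in decreasing order: r'_1 >= r'_2 >= ...
R = np.cumsum(r[order])        # Eq. (14): accumulated sums R'_j

# Eq. (15): keep the ranked activities whose accumulated ratio stays within 0.99.
representative = order[R <= 0.99]
print("representative activities:", sorted(representative.tolist()))
# With these counts, the four dominant activities are kept and the three rare ones dropped.
```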
Learning States Using HDP-HMM
A busy traffic junction is normally regulated by traffic lights: different traffic states occur sequentially and cyclically in a certain order. The hidden Markov model (HMM) [23] is an efficient method to explore latent states and their transition information. An HMM can be explained as a doubly stochastic Markov chain and is essentially a dynamic variant of a finite mixture model. Teh et al. [20] replaced the finite mixture with a Dirichlet process and proposed the HDP-HMM model, which is illustrated in Figure 3. Its stick-breaking formalism is: