
it is equivalent to inferring topics in word-document analysis. Moreover, a traffic state is a combination of frequently co-occurring activities (i.e., interactions), which makes it possible to infer traffic states with a topic model as well.
The HDP [20] is an unsupervised, non-parametric, hierarchical Bayesian topic model that was originally proposed for word-document analysis. It clusters words that frequently co-occur within the same documents into the same topics. Furthermore, unlike other clustering topic models such as LDA [21], HDP is able to automatically determine the number of clusters. The rest of this section shows how to use the HDP model to infer typical activities and traffic states from the input video. Based on the output of the HDP models, we propose a method to construct feature vectors that represent activities with visual words and traffic states with typical activities. These feature vectors are then used to train a classifier to recognize complicated traffic activities in surveillance video.
Learning Activities Using HDP
Figure 2 is a graphical representation of the HDP model; it consists of two Dirichlet processes. The first one generates a global set of activities, and the second one samples a subset of activities from the global set for each clip. Finally, visual words are drawn from the activities.
The possible activities are inferred by HDP, whose standard graphical representation is shown in Figure 2 [20]. The global random measure G_0 = {θ_1, θ_2, …} is a global list of activities that is shared by all clips. Its distribution is a Dirichlet process (DP) with concentration parameter γ and Dirichlet prior H:
$G_0 \mid \gamma, H \sim \mathrm{DP}(\gamma, H)$ (1)
G_0 can be expressed using the stick-breaking formulation [20]:
$G_0 = \sum_{k=1}^{\infty} \pi_k \delta_{\phi_k}$ (2)

$\phi_k \mid \gamma \sim H$ (3)

$\pi_k = \pi'_k \prod_{l=1}^{k-1} (1 - \pi'_l)$ (4)
$\pi'_k \sim \mathrm{Beta}(1, \gamma)$ (5)
where $\{\phi_k\}_{k=1}^{\infty}$ are the parameters of the multinomial distributions over words in the codebook corresponding to activity θ_k, i.e., each φ_k is a word probability vector whose entries sum to 1. $\delta_{\phi_k}$ is the delta function at point φ_k. {π_k} are random probability measures (mixture weights over topics) with $\sum_{k=1}^{\infty} \pi_k = 1$. For convenience, the random probability measure π defined by Equations 1 to 5 is abbreviated as π ~ GEM(γ), where GEM stands for the Griffiths-Engen-McCloskey distribution [22].
The multinomial distribution φ_k over words in the codebook is generated from H. Therefore, H is interpreted as a distribution over multinomial distributions and can thus be defined as a Dirichlet distribution:

$H = \mathrm{Dir}(D_0)$ (6)

$\phi_k \mid \gamma \sim \mathrm{Dir}(D_0)$ (7)
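As a concrete illustration of Equations 2 to 7, the following is a minimal sketch of simulating the global measure G_0 via a truncated stick-breaking construction. It is not the paper's implementation; the truncation level, codebook size, and hyper-parameter values below are assumptions chosen purely for illustration.

```python
import numpy as np

# Assumed values; the paper sets gamma and D_0 empirically.
gamma = 1.0    # concentration parameter of the top-level DP
V = 100        # codebook size (number of visual words), assumed
K_max = 50     # truncation level approximating the infinite sum in Eq. (2)
D_0 = 0.5      # symmetric Dirichlet parameter for H = Dir(D_0), Eq. (6)

rng = np.random.default_rng(0)

# Eq. (5): stick proportions pi'_k ~ Beta(1, gamma)
pi_prime = rng.beta(1.0, gamma, size=K_max)

# Eq. (4): pi_k = pi'_k * prod_{l<k} (1 - pi'_l)
pi = pi_prime * np.concatenate(([1.0], np.cumprod(1.0 - pi_prime[:-1])))

# Eqs. (3), (7): each activity phi_k is a word distribution drawn from Dir(D_0)
phi = rng.dirichlet(np.full(V, D_0), size=K_max)   # K_max x V, each row sums to 1

# G_0 is then approximated by sum_k pi_k * delta_{phi_k}
print("first weights of G_0:", np.round(pi[:5], 3))
```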
G_0 is the prior distribution for the second DP. For each clip t, G_t is a random measure drawn from the second DP with concentration parameter α and base distribution G_0:

$G_t \mid \alpha, G_0 \sim \mathrm{DP}(\alpha, G_0)$ (8)
In our case, G_t describes the multinomial distribution of active topics in clip t, i.e., it is a subset of the global activities in G_0. We express it using the stick-breaking representation again:

$G_t = \sum_{k=1}^{\infty} \pi_{tk} \delta_{\phi_k}$ (9)

$\phi_k \mid \alpha, G_0 \sim G_0$ (10)

$\pi_{tk} = \pi'_{tk} \prod_{l=1}^{k-1} (1 - \pi'_{tl})$ (11)

$\pi'_{tk} \sim \mathrm{Beta}(1, \alpha)$ (12)
For the i-th word in clip t, a topic θ_ti is first drawn from G_t, and then the word x_ti is drawn from the multinomial distribution Multi(x_ti; φ_{θ_ti}), i.e., the multinomial distribution over words in the codebook corresponding to topic θ_ti. Notice that every G_t has the same φ_k as G_0, i.e., different clips share the same set of topics and statistical strength. We apply a Gibbs sampling scheme to do inference under the HDP model, which is a generally applied inference method for topic models. Figure 6 shows the typical activities learned by the HDP models for the QMUL Junction Dataset [8].
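To make the two-level construction concrete, the sketch below samples one clip under a truncated approximation of Equations 8 to 12 together with the word-sampling step just described. The concentration values, truncation level, codebook size, and clip length are illustrative assumptions, and the finite Dirichlet approximation of the clip-level DP is our own simplification, not the paper's inference procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
gamma, alpha = 1.0, 0.5       # top-level and clip-level DP concentrations (assumed)
V, K_max, D_0 = 100, 50, 0.5  # codebook size, truncation level, Dirichlet prior (assumed)

# Truncated global measure G_0, as in the previous sketch (Eqs. 2-7).
pi_prime = rng.beta(1.0, gamma, size=K_max)
pi = pi_prime * np.concatenate(([1.0], np.cumprod(1.0 - pi_prime[:-1])))
phi = rng.dirichlet(np.full(V, D_0), size=K_max)

# Clip-level measure G_t ~ DP(alpha, G_0), Eqs. (8)-(12): with G_0 truncated to
# K_max atoms, the clip weights can be simulated as pi_t ~ Dirichlet(alpha * pi).
# The small floor only guards against numerically zero Dirichlet parameters.
pi_t = rng.dirichlet(alpha * pi + 1e-6)

# Word generation for clip t: draw a topic theta_ti from G_t, then draw the
# word x_ti from Multi(x_ti; phi_{theta_ti}).
N_t = 200                                       # number of visual words in clip t (assumed)
topics = rng.choice(K_max, size=N_t, p=pi_t)    # topic assignments theta_ti
words = np.array([rng.choice(V, p=phi[k]) for k in topics])  # observed words x_ti
```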
The hyper-parameters γ and α are empirically predefined. They are priors on the concentration of the word distribution within topics, and they influence the number of activities in G_0 and G_t. The parameter D_0 for the Dirichlet distribution is also set empirically.
Although HDP models decide the number of topics automatically, some of the discovered activities are unrepresentative, because very rare motions must be explained by individual activities. These may be noise or rare events, and such learned activities could lead to an ambiguous or even misleading analysis of interactions. Therefore, the unrepresentative activities need to be removed. The total number of words assigned to activity k (k = 1, …, K) throughout the training video is denoted as n_k. The occurrence ratio of activity k is computed as

$r_k = \frac{n_k}{n_1 + \cdots + n_K}$ (13)

We rank {r_1, …, r_K} in decreasing order as {r'_1, …, r'_K} and calculate the accumulated sum as

$R'_j = \sum_{i=1}^{j} r'_i$ (14)

The representative activity (topic) set is selected as

$\theta_{\mathrm{typical}} = \{\theta_j \mid R'_j \le 0.99,\ 1 \le j \le K\}$ (15)
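A short sketch of the selection rule in Equations 13 to 15, assuming the per-activity word counts n_k have already been accumulated from the training video; the counts below are made-up numbers used only to illustrate the 0.99 threshold.

```python
import numpy as np

# Hypothetical word counts n_k for each activity k discovered by HDP.
n = np.array([5000, 3000, 2200, 1500, 200, 60, 40])

r = n / n.sum()                # Eq. (13): occurrence ratios r_k
order = np.argsort(r)[::-1]    # rank ratios in decreasing order: r'_1 >= r'_2 >= ...
R = np.cumsum(r[order])        # Eq. (14): accumulated sums R'_j

# Eq. (15): keep the ranked activities whose accumulated ratio stays within 0.99.
representative = order[R <= 0.99]
print("representative activities:", sorted(representative.tolist()))
# With these counts, the four dominant activities are kept and the three rare ones dropped.
```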
Learning States Using HDP-HMM
A busy traffic junction is normally regulated by traffic lights: different traffic states occur sequentially and cyclically in a certain order. The hidden Markov model (HMM) [23] is an efficient method to explore latent states and their transition information. An HMM can be explained as a doubly stochastic Markov chain and is essentially a dynamic variant of a finite mixture model. Teh et al. [20] replaced the finite mixture with a Dirichlet process and proposed the HDP-HMM model, which is illustrated in Figure 3. Its stick-breaking formalism is: