PERS_April2018_Public - page 208

Rare Motions

The first case is the occurrence of unexpected motions, i.e., motions that do not belong to any typical activity. To detect such abnormal events, in clip $t$ a word set $\mathcal{W}'_t$ of size $N'_t$ is defined as the collection of motions that are not labeled to any learned activity. If $N'_t > T_{\text{word}}$, where $T_{\text{word}}$ is a detection threshold on the number of unlabeled words, we are confident that some abnormal motions exist during this clip.
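The rare-motion test reduces to counting the words in a clip that no learned activity explains. The following is a minimal Python sketch. It assumes each word's activity label is available as an integer and that unlabeled words are marked `None`. The function name `has_rare_motions` and the threshold values are hypothetical illustrations, not the authors' code.

```python
from typing import Optional, Sequence


def has_rare_motions(word_labels: Sequence[Optional[int]], threshold: int) -> bool:
    """Flag a clip as containing rare motions.

    word_labels: one entry per visual word in the clip; an int is the index of the
    learned activity the word was assigned to, None marks a word that no learned
    activity explains (hypothetical encoding).
    threshold: hypothetical detection threshold T_word on the count N'_t.
    """
    unlabeled = [w for w in word_labels if w is None]  # the set W'_t
    n_prime = len(unlabeled)                           # its size N'_t
    return n_prime > threshold


# Example clip: six words, two of which belong to no learned activity
clip = [0, 3, None, 3, None, 1]
print(has_rare_motions(clip, threshold=1))  # True
print(has_rare_motions(clip, threshold=2))  # False
```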

Conflicting Activities

Second, some activities rarely co-occur within a clip, i.e., in a specific traffic state, some specific activities rarely occur. For example, in the state of rightward flow, there should not be any vehicle driving leftward. To detect such abnormal events, we use GP regression to model the temporal relationship among different typical activities during a clip.

As discussed in the Representation of Activities and Video Clips Section, the feature vector of clip $t$ is denoted $\mathbf{x}_t = \{x_{t,1}, \ldots, x_{t,N}\}$, where $N$ is the number of typical activities. Each value $x_{t,i}$ has an underlying relationship with the others. In other words, each $x_{t,i}$ can be estimated from the other values in the same clip. Therefore, a GP regression model is constructed for each element $x_{t,i}$. We denote $\mathbf{x}_{t,-i} = \{x_{t,1}, \ldots, x_{t,i-1}, x_{t,i+1}, \ldots, x_{t,N}\}$ as the input feature vector of $(N-1)$ dimensions, and $x_{t,i}$ as the corresponding output value, where $-i$ means that $x_{t,i}$ is excluded. The trained GP regression model gives a probabilistic prediction of the output value $x_{t,i}$ as:

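$$x_{t,i} \mid \mathbf{x}_{t,-i} \sim \mathcal{N}(\mu, \sigma^2) \tag{32}$$

$$\mu = \mathbf{k}_*^{\top} \left(K + \sigma_n^2 I\right)^{-1} \mathbf{y} \tag{33}$$

$$\sigma^2 = k(\mathbf{x}_{t,-i}, \mathbf{x}_{t,-i}) - \mathbf{k}_*^{\top} \left(K + \sigma_n^2 I\right)^{-1} \mathbf{k}_* \tag{34}$$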

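where the symbols are defined as follows:

- $\mathbf{k}_*$ is the vector of kernel values between the training inputs and the test input $\mathbf{x}_{t,-i}$.
- $K$ is the kernel matrix of the training inputs.
- $\mathbf{y}$ holds the training outputs of activity $i$.
- $\sigma_n^2$ is the noise variance.
- $k(\mathbf{x}_{t,-i}, \mathbf{x}_{t,-i})$ is the prior variance at the test input.

The mean $\mu$ is the predicted value of $x_{t,i}$ based on the other observed activities, and $\sigma^2$ is its variance. The interval $(-\infty, \mu + 1.96\sigma)$ is the one-sided 97.5 percent confidence interval. Hence, if the observed value $x_{t,i}$ is larger than $\mu + 1.96\sigma$, this activity is viewed as conflicting with the others in this clip. Notice that $x_{t,i}$ less than $\mu - 1.96\sigma$ is not viewed as a conflict, because in practice an activity causes conflict only when its intensity is strong enough. Each activity is modeled by one GP regression model. Therefore, $N$ GP regression models are necessary in total.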
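The per-activity conflict test can be illustrated with a small NumPy sketch. It computes the predictive mean and variance of a zero-mean GP (Equations 33 and 34) and flags any activity whose observed intensity lies above $\mu + 1.96\sigma$. The sketch makes several assumptions:

- The kernel is an ARD squared-exponential kernel with fixed, hand-picked hyper-parameters. The paper instead optimizes them by conjugate gradient [27].
- The four-clip training matrix is toy data.
- The function names are hypothetical.

This is an illustration of the technique, not the authors' implementation.

```python
import numpy as np


def ard_rbf(A, B, lengthscales, signal_var):
    """ARD squared-exponential kernel: one length-scale per input dimension."""
    A = A / lengthscales
    B = B / lengthscales
    sq = (np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return signal_var * np.exp(-0.5 * np.maximum(sq, 0.0))


def gp_predict(X_train, y_train, x_star, lengthscales, signal_var, noise_var):
    """Predictive mean and variance of a zero-mean GP at one test input
    (Equations 33 and 34), solved through a Cholesky factor."""
    K = ard_rbf(X_train, X_train, lengthscales, signal_var)
    k_star = ard_rbf(X_train, x_star[None, :], lengthscales, signal_var)[:, 0]
    L = np.linalg.cholesky(K + noise_var * np.eye(len(X_train)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # (K + s_n^2 I)^-1 y
    mu = k_star @ alpha
    v = np.linalg.solve(L, k_star)
    var = signal_var - v @ v  # k(x*, x*) equals signal_var for this kernel
    return mu, var


def conflicting_activities(X_train, x_t, lengthscale=1.0, signal_var=1.0, noise_var=0.1):
    """Indices i whose observed x_t[i] exceeds mu + 1.96 * sigma, using one GP
    per activity trained on the other N - 1 activities (fixed hyper-parameters)."""
    n_acts = X_train.shape[1]
    flagged = []
    for i in range(n_acts):
        others = [j for j in range(n_acts) if j != i]  # the "-i" input vector
        mu, var = gp_predict(X_train[:, others], X_train[:, i], x_t[others],
                             np.full(len(others), lengthscale), signal_var, noise_var)
        sigma = np.sqrt(max(var, 0.0))
        if x_t[i] > mu + 1.96 * sigma:  # only the upper tail counts as a conflict
            flagged.append(i)
    return flagged


# Toy training clips (rows) over three activities: activities 0 and 1 co-occur,
# while activity 2 appears only when both are absent.
X_train = np.array([[1.0, 1.0, 0.0],
                    [0.9, 1.1, 0.0],
                    [0.0, 0.0, 1.0],
                    [0.1, 0.0, 0.9]])
print(conflicting_activities(X_train, np.array([1.0, 1.0, 1.0])))  # activity 2 flagged
print(conflicting_activities(X_train, np.array([1.0, 1.0, 0.0])))  # no conflict
```

Only the upper tail is tested, mirroring the text's argument that a weak activity does not cause a conflict. Building one model per activity is what makes the total number of regression models equal to $N$.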

Illegal State Transition

Finally, a state may be followed by another state that is forbidden according to the specific traffic rules. Figure 11 shows an example of an illegal state transition caused by an abnormal event: a fire engine interrupts the current vertical traffic flow and drives rightward. The scene is in vertical flow in clip $t-1$ and is interrupted by the fire engine in clip $t$. During clip $t$ the fire engine drives across the scene. Therefore, the GP classifier would naturally classify clip $t+1$ as rightward flow with high probability, and Equation 31 can then modify this result. However, judged both by human understanding and by the clip's features, this recognition is correct. According to the learned state transition rule shown in Figure 7, a rightward flow only follows a leftward flow. Hence, such a case should be determined as an abnormal event. We define a logical judgment to identify such abnormal events: if the transition from state $s_{t-1}$ to state $s_t$ is not permitted by the learned state transition rule, it is identified as an illegal state transition, i.e., some abnormal events occur.
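The logical judgment amounts to a lookup in a set of permitted state pairs. The sketch below assumes that a transition is permitted if and only if it was observed at least once in a training sequence of per-clip states. This is a simple stand-in for the transition rule the paper learns with the HDP-HMM (Figure 7). The state names and function names are hypothetical.

```python
from typing import Hashable, List, Sequence, Set, Tuple

Transition = Tuple[Hashable, Hashable]


def learn_allowed_transitions(train_states: Sequence[Hashable]) -> Set[Transition]:
    """Collect every (previous, next) state pair seen in the training sequence.

    Assumption: a transition counts as legal iff it was observed at least once.
    """
    return set(zip(train_states, train_states[1:]))


def illegal_transitions(states: Sequence[Hashable], allowed: Set[Transition]) -> List[int]:
    """Return the clip indices t at which (s_{t-1}, s_t) is not an allowed transition."""
    return [t for t in range(1, len(states))
            if (states[t - 1], states[t]) not in allowed]


# Hypothetical training sequence of per-clip traffic states
train = ["vertical", "vertical", "leftward", "leftward",
         "rightward", "rightward", "vertical"]
allowed = learn_allowed_transitions(train)

# Vertical flow jumps straight to rightward flow, as in the fire-engine example
online = ["vertical", "vertical", "rightward", "rightward"]
print(illegal_transitions(online, allowed))  # [2]
```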

Abnormal Events Localization

Users are always interested in the location of ongoing abnormal events. As discussed in the Visual Features Representation Section, each visual word contains the position information of its cell in the camera scene. Therefore, all visual words belonging to detected abnormal events can be localized.
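Because each visual word carries its cell position, localization is a decode-and-aggregate step. This sketch assumes a hypothetical word encoding, `word = cell_index * n_directions + direction`, with grid cells numbered row by row. It also assumes a cell size of 10 pixels and 4 motion directions. It reports the pixel bounding box of all abnormal cells, which is only one of several reasonable ways to present the location.

```python
from typing import Iterable, Optional, Tuple


def word_to_cell(word: int, n_directions: int, n_cols: int) -> Tuple[int, int]:
    """Decode a visual word index into the (row, col) of its grid cell.

    Hypothetical encoding: word = cell_index * n_directions + direction,
    with cells numbered row by row.
    """
    cell = word // n_directions
    return divmod(cell, n_cols)


def localize(abnormal_words: Iterable[int], n_directions: int, n_cols: int,
             cell_size: int) -> Optional[Tuple[int, int, int, int]]:
    """Pixel bounding box (x_min, y_min, x_max, y_max) of all abnormal cells."""
    cells = {word_to_cell(w, n_directions, n_cols) for w in abnormal_words}
    if not cells:
        return None
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    return (min(cols) * cell_size, min(rows) * cell_size,
            (max(cols) + 1) * cell_size, (max(rows) + 1) * cell_size)


# A 360 x 288 frame with 10-pixel cells has 36 columns; 4 motion directions assumed
print(localize([309, 462], n_directions=4, n_cols=36, cell_size=10))  # (50, 20, 80, 40)
```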

We have discussed three kinds of abnormal events and the methods to detect each of them. Identifying the abnormal events caused by rare motions and illegal state transitions is logic based, which is easy to realize and convenient to apply. Hospedales et al. [11] and [8] use the LDA model to estimate the likelihood by iterative sweeps of the Gibbs sampler, and they detect abnormal events that have a low posterior. Unlike the methods in [11] and [8], we use GP regression for the abnormal events caused by conflicting activities, modeling the temporal relationship among activities during a clip. This provides a probabilistic analysis of each activity without complex computation.

Experiments

Dataset

Experiments were carried out on video data from three complex and crowded traffic scenes regulated by traffic lights.

The QMUL Junction Dataset contains 1 hour of 25 fps video (90,000 frames) with a frame size of 360 × 288. The video covers a busy traffic junction containing three major flows in different directions.

QMUL Junction Dataset 2 has a video length of 52 minutes at 25 fps (78,000 frames). The frame size is 360 × 288. This video is captured in a busy street with particularly busy pedestrian activity.

The MIT Dataset [9] consists of 1.5 hours of 30 fps video (162,000 frames) with a frame size of 720 × 480, and captures a far-field traffic scene.

For each dataset, the first 500 video clips (about 25 minutes in length) were used to learn the typical activities and traffic states. The rest of each video sequence was used to simulate online screened video for testing online performance. Specifically, 699 clips of the QMUL Junction Dataset, 539 clips of QMUL Junction Dataset 2 and 1711 clips of the MIT Dataset were used for testing.
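The frame counts quoted above follow directly from duration times frame rate. The short check below takes every number from the dataset descriptions and the train/test split; the helper name `frame_count` is hypothetical. Running it should report 90,000, 78,000 and 162,000 frames, matching the text.

```python
def frame_count(duration_s: int, fps: int) -> int:
    """Number of frames in a video of the given duration and frame rate."""
    return duration_s * fps


# (duration in seconds, fps, training clips, test clips) as stated in the text
datasets = {
    "QMUL Junction Dataset": (60 * 60, 25, 500, 699),
    "QMUL Junction Dataset 2": (52 * 60, 25, 500, 539),
    "MIT Dataset": (90 * 60, 30, 500, 1711),
}
for name, (seconds, fps, n_train, n_test) in datasets.items():
    print(f"{name}: {frame_count(seconds, fps):,} frames, "
          f"{n_train} training clips, {n_test} test clips")
```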

The ARD kernel was adopted in the GP models, and the hyper-parameters were optimized by Conjugate Gradient [27]. Laplace's approximation method [24] was applied in the GP classification models.

To infer the latent variables under the HDP and HDP-HMM, 1000 sweeps of the Gibbs sampler were executed and the first 500 were used as burn-in. To find the best hyper-parameters $(\alpha, \gamma)$ for our task, a grid search was performed over $\alpha, \gamma \in \{0.1, 0.5, 1.0, 1.5, 2.0\}$. We analyzed the results with different values of $\alpha$ and $\gamma$ and obtained an interesting and useful outcome. Even though the number of clusters increased with larger $\alpha$ and $\gamma$, the numbers of typical activities and states always converged once at least 90% of the total motions were explained. These numbers remained consistent when $\alpha$ and $\gamma$ were both larger than 0.5.

The selected typical activities and states look similar across these settings. The additional activities and states were generated to explain very rare motions. In this work, we are only interested in typical activities and states, and we do not use topic models to estimate likelihood or posterior. Therefore, we did not need precise hyper-parameters for the generative models. The hyper-parameters were fixed at $\alpha = 2$ and $\gamma = 0.5$ for all experiments. In an actual implementation of the HDP and HDP-HMM, the hyper-parameters can be optimized by placing a vague gamma prior on them and sampling them using the scheme proposed in [20].
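One way to read the 90% criterion is as a coverage rule. Under this reading, clusters are ranked by the share of motions they explain, and the largest ones are kept until at least 90% of all motions are covered. The remaining small clusters are treated as explanations of very rare motions. The sketch below implements that reading as an assumption, with hypothetical cluster shares. It is not necessarily the exact selection rule behind Equation 13.

```python
from typing import List, Sequence


def select_typical(percentages: Sequence[float], coverage: float = 0.9) -> List[int]:
    """Greedily keep the largest clusters until they explain >= coverage of all motions."""
    order = sorted(range(len(percentages)), key=lambda i: percentages[i], reverse=True)
    target = coverage * sum(percentages)
    chosen: List[int] = []
    explained = 0.0
    for i in order:
        if explained >= target:
            break
        chosen.append(i)
        explained += percentages[i]
    return chosen


# Hypothetical cluster shares; the two smallest clusters cover only rare motions
shares = [0.40, 0.05, 0.30, 0.15, 0.06, 0.04]
print(select_typical(shares))  # [0, 2, 3, 4]
```

Under this rule, adding more tiny clusters (as larger $\alpha$ and $\gamma$ tend to do) leaves the selected set unchanged. That is consistent with the convergence the text reports.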

Figure 6. (a) to (p) Some dominant activities and their percentages discovered by the HDP models. (q) Manually labeled legal vehicle driving lanes (red lines) and pedestrian walking lanes (yellow dashed lines).

Learning Typical Activities and States

In the QMUL Junction Dataset, the HDP models automatically learned 32 activities in this traffic scene, of which 22 were selected as typical activities (some of them are shown in Figure 6). Their corresponding percentages, computed by Equation 13, are noted beneath each panel. For a better illustration, all possible motion flows for vehicles and pedestrians are manually

208

April 2018

PHOTOGRAMMETRIC ENGINEERING & REMOTE SENSING
