Rare Motions
The first case is the occurrence of unexpected motions. Such motions do not belong to any typical activity. To detect such abnormal events, in clip $t$ a word set $x'_t$ of size $N'_t$ is defined as the collection of motions that are not labeled to any learned activity. If $N'_t > th_{word}$, we can be confident that some abnormal motions exist during this clip.
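The rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: unlabeled visual words are encoded as `None`, and the function names and the threshold value used in the example are hypothetical, not taken from the paper.

```python
from typing import Optional, Sequence


def count_unlabeled_words(labels: Sequence[Optional[int]]) -> int:
    """Return N'_t, the number of visual words in a clip with no activity label.

    Assumption: a word that was not assigned to any learned activity is
    represented by None; any other value is an activity index.
    """
    return sum(1 for label in labels if label is None)


def has_rare_motion(labels: Sequence[Optional[int]], th_word: int) -> bool:
    """Flag the clip as containing rare motions when N'_t > th_word."""
    return count_unlabeled_words(labels) > th_word


# Hypothetical clip: seven visual words, three of them unlabeled.
clip_labels = [0, 3, None, 3, None, None, 1]
print(has_rare_motion(clip_labels, th_word=2))
```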
Conflicting Activities
Second, some activities rarely co-occur during a clip, i.e., in a specific traffic state, some specific activities rarely occur. For example, in the state of rightward flow, there should not be any vehicle driving leftward. To detect such abnormal events, we use GP regression to model the temporal relationship among different typical activities during a clip. As discussed in the Representation of Activities and Video Clips Section, the feature vector of clip $t$ is denoted as $c_t = \{p_{t1}, \dots, p_{tK}\}$. The value of each $p_{ti}$ has an underlying relationship with the others. In other words, each value of $c_t$ can be estimated from the others in the same clip. Therefore, for each element $p_{ti}$ a GP regression model is constructed. We denote $c_t - p_{ti} = \{p_{t1}, \dots, p_{tK}\}$ as the input feature vector of $(K-1)$ dimensions and $p_{ti}$ as the corresponding output value, where $c_t - p_{ti}$ means that $p_{ti}$ is excluded. A probabilistic prediction of the output value $p_{ti}$ is given by the trained GP regression model as:
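$$f_* \mid C - p_i,\; p_i,\; c_t - p_{ti} \sim \mathcal{N}(\mu, \sigma^2), \qquad (32)$$

$$\mu = \mathbf{k}_*^T (\mathbf{K} + \sigma_n^2 I)^{-1} p_i \qquad (33)$$

$$\sigma^2 = k(x_*, x_*) - \mathbf{k}_*^T (\mathbf{K} + \sigma_n^2 I)^{-1} \mathbf{k}_* \qquad (34)$$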
where $\mathbf{k}_* = k(C - p_i, c_t - p_{ti})$ and $\mathbf{K} = K(C - p_i, C - p_i)$. $f_*$ is the predicted $p_{ti}$ based on the other observed activities. If the observed value $p_{ti}$ is larger than $\mu + 1.96\sigma$, this activity is viewed as conflicting with the others in this clip. Here $\mu$ is the predicted mean value, $\sigma^2$ is its variance, and $(-\infty, \mu + 1.96\sigma)$ is the 97.5 percent confidence interval. Note that $p_{ti}$ less than $\mu - 1.96\sigma$ is not viewed as a conflict, because in practice an activity causes a conflict only when its intensity is strong enough. Each activity is modeled by one GP regression model; therefore, $K$ GP regression models are needed in total.
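The leave-one-out scheme of Equations 32 to 34 can be illustrated with a small NumPy sketch. It makes several assumptions that are not in the paper: an isotropic squared-exponential kernel with fixed hyper-parameters stands in for the optimized ARD kernel, the noise variance is set by hand, and the toy training set has only two mutually exclusive activities. The function names are hypothetical.

```python
import numpy as np


def sq_exp_kernel(A, B, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B.

    Assumption: fixed, isotropic hyper-parameters (the paper optimizes an ARD kernel).
    """
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return signal_var * np.exp(-0.5 * d2 / length_scale**2)


def gp_predict(X_train, y_train, x_star, noise_var=1e-2):
    """Predictive mean (Equation 33) and variance (Equation 34) of a zero-mean GP."""
    K = sq_exp_kernel(X_train, X_train)
    k_star = sq_exp_kernel(X_train, x_star[None, :])[:, 0]
    A = K + noise_var * np.eye(len(X_train))          # K + sigma_n^2 I
    mu = k_star @ np.linalg.solve(A, y_train)
    k_ss = sq_exp_kernel(x_star[None, :], x_star[None, :])[0, 0]
    var = k_ss - k_star @ np.linalg.solve(A, k_star)
    return float(mu), float(max(var, 0.0))            # clip tiny negative round-off


def conflicting_activities(C_train, c_t, z=1.96):
    """Return the indices i for which p_ti > mu_i + z * sigma_i (one-sided test)."""
    conflicts = []
    for i in range(C_train.shape[1]):
        X = np.delete(C_train, i, axis=1)             # C - p_i: all other activities
        y = C_train[:, i]                             # p_i: the activity to predict
        mu, var = gp_predict(X, y, np.delete(c_t, i))
        if c_t[i] > mu + z * np.sqrt(var):
            conflicts.append(i)
    return conflicts


# Toy training clips: two activities whose intensities always sum to one,
# e.g. rightward versus leftward traffic (hypothetical data).
u = np.linspace(0.0, 1.0, 21)
C_train = np.column_stack([u, 1.0 - u])

print(conflicting_activities(C_train, np.array([0.9, 0.9])))  # both strong at once
print(conflicting_activities(C_train, np.array([0.7, 0.2])))  # ordinary clip
```

One model is fitted per activity inside the loop, which mirrors the requirement of $K$ regression models. In practice the matrix factorization for each model would be computed once at training time rather than per test clip.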
Illegal State Transition
Finally, a state may be followed by another state that is forbidden according to the specific traffic rule. Figure 11 shows an example of an illegal state transition caused by an abnormal event: a fire engine interrupts the current vertical traffic flow and drives rightward. The scene is in vertical flow in clip $t-1$ and is interrupted by the fire engine in clip $t$. During clip $t+1$ the fire engine is driving across the scene. Therefore, clip $t+1$ would naturally be classified as rightward flow with high probability by the GP classifier, and the result can be modified by Equation 31. Whether judged by human understanding or by the clip's features, this recognition is correct. However, according to the learned state transition rule shown in Figure 7, a rightward flow only follows the leftward flow. Hence, such a case should be determined as an abnormal event. We define a logical judgment to identify such abnormal events. If $p(y_t = s_i \mid y_{t-1} = s_j) = m(s_i, s_j) < th_{word}$, it is identified as an illegal state transition, i.e., some abnormal event has occurred.
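The logical judgment can be sketched as a lookup in a learned transition matrix. The three states, the matrix values, and the threshold below are hypothetical and chosen only to mirror the fire engine example; the matrix is indexed as `M[j, i]` for $p(y_t = s_i \mid y_{t-1} = s_j)$, which is an assumed layout.

```python
import numpy as np

# Hypothetical state indices for a three-phase junction.
VERTICAL, LEFTWARD, RIGHTWARD = 0, 1, 2

# Hypothetical learned transition matrix; each row sums to one.
# M[j, i] = p(y_t = s_i | y_{t-1} = s_j)
M = np.array([
    [0.90, 0.10, 0.00],   # vertical  -> vertical, leftward, rightward
    [0.00, 0.85, 0.15],   # leftward  -> ...
    [0.10, 0.00, 0.90],   # rightward -> ...
])


def is_illegal_transition(M, prev_state, cur_state, threshold):
    """Return True when the learned transition probability falls below the threshold."""
    return bool(M[prev_state, cur_state] < threshold)


# Vertical flow followed directly by rightward flow, as in the fire engine case.
print(is_illegal_transition(M, VERTICAL, RIGHTWARD, threshold=0.01))
# Leftward flow followed by rightward flow, the ordinary cycle.
print(is_illegal_transition(M, LEFTWARD, RIGHTWARD, threshold=0.01))
```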
Abnormal Events Localization
Users are always interested in the location of ongoing abnormal events. As discussed in the Visual Features Representation Section, each visual word contains the position information of its cell in the camera scene. Therefore, all visual words belonging to detected abnormal events can be localized.
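As a rough illustration of the localization step, the sketch below maps visual word indices back to pixel rectangles. The word encoding is an assumption made for this example only (word index = cell index × number of direction bins + direction bin, with cells numbered row by row); the cell size, grid width, and function names are likewise hypothetical.

```python
def word_to_cell(word_id, cells_per_row, n_directions):
    """Recover the (row, col) of the cell encoded in a visual word index.

    Assumed encoding: word_id = cell_index * n_directions + direction_bin.
    """
    cell_index = word_id // n_directions
    return divmod(cell_index, cells_per_row)


def localize(abnormal_words, cells_per_row, n_directions, cell_size):
    """Return the sorted, de-duplicated pixel boxes (x0, y0, x1, y1) of abnormal words."""
    boxes = set()
    for word_id in abnormal_words:
        row, col = word_to_cell(word_id, cells_per_row, n_directions)
        boxes.add((col * cell_size, row * cell_size,
                   (col + 1) * cell_size, (row + 1) * cell_size))
    return sorted(boxes)


# Hypothetical setup: a 360-pixel-wide frame split into 10 x 10 pixel cells,
# with four direction bins per cell.
print(localize([0, 149, 150], cells_per_row=36, n_directions=4, cell_size=10))
```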
We have discussed three kinds of abnormal events and the methods to detect each of them. Identifying the abnormal events caused by rare motions and illegal state transitions is logic based, which is easy to realize and convenient to apply. Hospedales et al. [11] and [8] use the LDA model to estimate the likelihood by iterative sweeps of the Gibbs sampler and detect abnormal events that have low posterior. Different from the methods in [11] and [8], for the abnormal events caused by conflicting activities, we use GP regression to model the temporal relationship among activities during a clip. It provides a probabilistic analysis of each activity without complex computation.
Experiments
Dataset
Experiments were carried out on video data from three complex and crowded traffic scenes regulated by traffic lights. QMUL Junction Dataset contains 1 hour of 25 fps video (90,000 frames) with frame size 360 × 288. The video covers a busy traffic junction containing three major flows in different directions. QMUL Junction Dataset 2 has a video length of 52 minutes at 25 fps (78,000 frames). The frame size is 360 × 288. This video is captured in a busy street with particularly busy pedestrian activity. MIT Dataset [9] consists of 1.5 hours of 30 fps video (162,000 frames) with frame size 720 × 480, and captures a far-field traffic scene.
For each dataset, the first 500 video clips (about 25 minutes in length) were employed to learn the typical activities and traffic states. The rest of the video sequences were employed to simulate online screened video to test online performance, i.e., 699 clips of QMUL Junction Dataset, 539 clips of QMUL Junction Dataset 2, and 1711 clips of MIT Dataset were used for testing.
The ARD kernel was adopted in the GP models and the hyper-parameters were optimized by Conjugate Gradient [27]. Laplace's approximation method [24] was applied in the GP classification models.
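For readers unfamiliar with ARD (automatic relevance determination) kernels, the following sketch shows the common squared-exponential form with one length scale per input dimension. The paper does not spell out its exact kernel form, so this form is an assumption, and the length scales here are set by hand rather than optimized by conjugate gradient.

```python
import numpy as np


def ard_sq_exp_kernel(A, B, length_scales, signal_var=1.0):
    """Squared-exponential kernel with a separate length scale per input dimension.

    A large length scale makes the kernel nearly insensitive to that dimension,
    which is how ARD down-weights irrelevant inputs.
    """
    A = np.asarray(A, dtype=float) / length_scales
    B = np.asarray(B, dtype=float) / length_scales
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return signal_var * np.exp(-0.5 * d2)


# Two points that differ only in the second input dimension.
X = np.array([[0.0, 0.0], [0.0, 5.0]])

K_iso = ard_sq_exp_kernel(X, X, length_scales=np.array([1.0, 1.0]))
K_ard = ard_sq_exp_kernel(X, X, length_scales=np.array([1.0, 1e6]))

print(K_iso[0, 1])  # close to zero: the points look far apart
print(K_ard[0, 1])  # close to one: the second dimension is effectively ignored
```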
To infer the latent variables under the HDP and HDP-HMM, 1000 sweeps of the Gibbs sampler were executed and the first 500 were used as burn-in. To find the best hyper-parameters ($\beta$, $\alpha$) for our task, a grid search was performed on $\beta, \alpha \in \{0.1, 0.5, 1.0, 1.5, 2.0\}$. We analyzed the results with different hyper-parameter values and obtained an interesting and useful outcome: even though the number of clusters increased with larger $\beta$ and $\alpha$, the numbers of typical activities and states always converged when at least about 90% of the total motions were explained. These numbers remained consistent when $\beta$ and $\alpha$ were both larger than 0.5. The selected typical activities and states look similar. The additional activities and states were generated to explain very rare motions. In this thesis, we are only interested in typical activities and states, and we did not use topic models to estimate likelihood or posterior. Therefore, we did not need precise hyper-parameters for the generative models. The hyper-parameters were fixed at $\beta = 2$, $\alpha = 0.5$ for all experiments. In an actual implementation of HDP and HDP-HMM, the hyper-parameters can be optimized by giving them a vague gamma prior and sampling them using the scheme proposed in [20].
Figure 6. (a) to (p) Some dominant activities and their percentages discovered by the HDP models. (q) Manually labeled legal vehicle driving lanes (red lines) and pedestrian walking lanes (yellow dashed lines).
Learning Typical Activities and States
In the QMUL Junction Dataset, the HDP models automatically learned 32 activities in this traffic scene, among which 22 were selected as typical activities (some of them are shown in Figure 6). Their corresponding percentages computed by Equation 13 are noted beneath. For a better illustration, all possible motion flows for vehicles and pedestrians are manually
208
April 2018
PHOTOGRAMMETRIC ENGINEERING & REMOTE SENSING