Geert Pingen @gpgn

Cloud & Machine Learning Engineer, Research Scientist
Tracking People Streams
Image Processing


The management and control of large gatherings of people at events such as festivals poses significant challenges to public safety. The high density of people at these events can give rise to dangerous situations, and crowd analysis can aid in mitigating the security risks. On these occasions, masses of people are monitored by one or more observers and, if necessary, security services are ready to act and redirect them to different areas. However, given the large scale of these situations, an automatic method to detect potentially dangerous phenomena can be a helpful instrument to support human supervision. Good crowd models can also aid in the design of public spaces, since these should be able to withstand the crowds expected to use them.

People stream analysis

Significant effort has already been directed towards crowd and people stream analysis [3] [4] [5], towards the detection of dangerous or incongruous events [6] [7], and towards the integration of such analysis into surveillance systems [8]. A 2013 review of video analytics of crowded scenes can be found in Thida et al. [9].

Crowd behaviour patterns

Several behaviour patterns, such as blockings, bottlenecks, rings, fountainheads and lanes, were proposed by Solmaz et al. [1] and later adapted by Stijntjes [2]. They propose a method for automated analysis of crowd behaviour based on optical flow and regions of interest. The method integrates low-level features retrieved from the optical flow field of a scene with high-level information obtained by evaluating a Jacobian in regions of interest. Since it requires neither training nor analysis of individual objects in the scene, it may prove useful for on-the-fly crowd behaviour analysis.

This project is based primarily on previous research by Solmaz et al. [1] and Stijntjes [2]. Videos of crowded scenes were studied and patterns that may occur and evolve in risky situations were identified; in particular, we focussed our attention on bottlenecks and studied them using a method based on optical flow and eigenvalue maps.

Blockings and bottlenecks may indicate obstructions, such as people stumbling, or partially closed-off passageways. Fountainheads and ring formations may indicate a potentially dangerous situation or object at the centre of these patterns. The automated early detection of such events can be a major contribution to health and safety services: not only to guide large streams of people to safety in an orderly fashion, but also to rapidly respond to threats or administer first aid.


Footage of crowded scenes is easily found on the web, but most of it is recorded from moving cameras or changes perspective frequently, making it poorly suited to our purposes. In this project, input obtained from a fixed camera is analysed and predefined patterns (bottlenecks) are extracted where possible. To extend the dataset, a number of computer-generated video sequences were created using the free software package Pedestrian Dynamics 2, which allows data to be created in a controlled and quantifiable manner. The complete dataset on which we tested our implementation therefore consists of our own generated data supplemented with footage found on the internet.


For a given video sequence, after preprocessing, the optical flow is determined. A Jacobian matrix is calculated over the entire flow field, and from this matrix eigenvalue maps are computed. Finally, relevant sections denoting possible bottleneck features are located in the eigenvalue map. Sections marked as possible bottlenecks are compared with a ground truth in order to determine the correctness of the method. We will now go into more detail on each of these steps.


Raw video sequences are converted to grayscale and exported as individual frames. Any detectable bottlenecks in the videos are identified by visual inspection and stored as ground truth. Furthermore, the geometric scale (pixels/m) of the video is determined from objects in the scene with a known size; this assumes the video is shot top-down, although videos at small angles were used as well. Finally, the input sequences were rescaled to a fixed geometric scale of 10 px/m. A typical input image is shown in Figure 1a.
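The preprocessing steps above can be sketched as follows. This is a minimal NumPy illustration, not the project's actual code; the function names, the Rec. 601 grayscale weights, and the nearest-neighbour resampling are assumptions for the sake of the example.

```python
import numpy as np

def to_grayscale(rgb):
    """Convert an RGB frame (H, W, 3) to grayscale using Rec. 601 weights."""
    return rgb @ np.array([0.299, 0.587, 0.114])

def rescale_to_geometric_scale(img, px_per_m, target_px_per_m=10.0):
    """Nearest-neighbour resample so the frame ends up at target_px_per_m."""
    factor = target_px_per_m / px_per_m
    rows = (np.arange(int(img.shape[0] * factor)) / factor).astype(int)
    cols = (np.arange(int(img.shape[1] * factor)) / factor).astype(int)
    return img[np.ix_(rows, cols)]

frame = np.random.rand(120, 160, 3)                       # stand-in for a video frame
gray = to_grayscale(frame)                                # (120, 160)
scaled = rescale_to_geometric_scale(gray, px_per_m=20.0)  # 20 px/m -> 10 px/m
```

A frame filmed at 20 px/m is halved in each dimension here, so that one metre always spans ten pixels regardless of the source footage.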

Figure 1. Detection method sequence.


Optical Flow

Optical flow is calculated using the Horn–Schunck method. For every two consecutive frames the optical flow is calculated, and the mean is taken over the entire sequence of frames. The result is a 2D field of vectors describing the average motion of each pixel over the video sequence, as can be seen in Figure 1b. The flow field is adjusted for the frame rate of the video sequence and its geometric scale. It is then smoothed with a median filter with a kernel size of 1x1 m (10x10 px) to eliminate large local fluctuations and errors. Finally, the flow field is thresholded to omit small (< 10^-5) values, which are assumed to be caused by noise. The flow field is visualised with a 360-degree colour mapping based on each vector's angle and magnitude (see Figure 1c) to aid in verifying its correctness.
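The averaging, unit conversion, median filtering, and thresholding can be sketched in NumPy as below. The Horn–Schunck computation itself is omitted; the sketch assumes the per-frame flow fields are already available, and the function and parameter names are illustrative rather than the authors' implementation.

```python
import numpy as np

def median_filter(component, k=10):
    """Median-filter one flow component with a k x k kernel (edge-padded)."""
    pad = k // 2
    padded = np.pad(component, pad, mode='edge')
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k))
    out = np.median(windows, axis=(-2, -1))
    return out[:component.shape[0], :component.shape[1]]

def average_flow(flows, fps, px_per_m, eps=1e-5, kernel_m=1.0):
    """flows: (N, H, W, 2) per-frame optical flow in px/frame.
    Returns the time-averaged flow in m/s, median-filtered and thresholded."""
    mean = flows.mean(axis=0) * fps / px_per_m   # px/frame -> m/s
    k = int(kernel_m * px_per_m)                 # 1 m kernel -> 10 px at 10 px/m
    u = median_filter(mean[..., 0], k)
    v = median_filter(mean[..., 1], k)
    field = np.stack([u, v], axis=-1)
    field[np.abs(field) < eps] = 0.0             # suppress values assumed to be noise
    return field

# Five frames of uniform 0.5 px/frame motion at 25 fps and 10 px/m.
flows = np.full((5, 20, 30, 2), 0.5)
field = average_flow(flows, fps=25, px_per_m=10.0)
```

For this uniform synthetic input the averaged field is constant, so the median filter leaves it unchanged; on real footage the 1x1 m kernel removes isolated outlier vectors.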

Jacobian and Eigenvalue maps

We construct a Jacobian from the optical flow field by considering a continuous dynamical system $w' = F(w)$, with $w' = [u(w), v(w)]^T$ and $w(t) = [x(t), y(t)]^T$ denoting particle velocities and positions respectively. The points $w^*$ at which stability is assessed, referred to as critical points, satisfy $F(w^*) = 0$. Taking into account small disturbances and noise $z = w - w^*$, Taylor's theorem yields $F(w^* + z) = F(w^*) + J_F(w^*)z + \text{H.O.T.}$ If we then consider only the critical points (satisfying $F(w^*) = 0$) and disregard the higher-order terms, this leaves $z' = J_F(w^*)z$, with $J_F$ being the Jacobian matrix:

$$ J_F = \begin{bmatrix} \frac{\partial u}{\partial x} & \frac{\partial u}{\partial y} \\ \frac{\partial v}{\partial x} & \frac{\partial v}{\partial y} \end{bmatrix} $$

The Jacobian's eigenvalues, trace ($\tau$), and determinant ($\Delta$) are computed and used to assign behaviour types to regions, coloured according to the mapping from Solmaz et al. [1] and Stijntjes [2] as depicted in Figure 1d.
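Per-pixel trace and determinant maps can be computed from the flow field with central differences; a minimal sketch (function names are illustrative):

```python
import numpy as np

def trace_det_maps(u, v):
    """Per-pixel trace and determinant of the 2x2 flow Jacobian.
    np.gradient returns derivatives along rows (y) first, then columns (x)."""
    du_dy, du_dx = np.gradient(u)
    dv_dy, dv_dx = np.gradient(v)
    tau = du_dx + dv_dy                      # trace
    delta = du_dx * dv_dy - du_dy * dv_dx    # determinant
    return tau, delta

# A purely converging flow (a sink, bottleneck-like behaviour):
y, x = np.mgrid[0:20, 0:20].astype(float)
u = -(x - 10.0)
v = -(y - 10.0)
tau, delta = trace_det_maps(u, v)
```

For this synthetic sink the trace is negative and the determinant positive everywhere, which in the $\tau$-$\Delta$ classification corresponds to a stable (attracting) critical point, the kind of region the bottleneck filter looks for.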

Feature detection

A filter mask is applied to the eigenvalue map to retain only the portions relevant to a bottleneck, according to the values in Figure 2; the colours signifying a bottleneck in Figure 1d are yellow and cyan. The relevant portions are retained in a binary image, as can be seen in Figure 1e. Blob detection is then applied, at which point we discard blobs smaller than 15x15 px (1.5 m x 1.5 m). This reduces noise, based on the assumption that a bottleneck feature is at least this size. Blobs in proximity to one another (within 1 m) are merged. The process of blob detection and classification runs in a single iteration, so there is no chance of blobs repeatedly converging into a single blob (Figure 1f). Detected blobs are then labelled according to their behaviour type (Figure 1g).
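The blob detection and size filtering can be illustrated with a simple flood-fill labelling; this is a sketch under the 10 px/m assumption, not the project's actual code. Merging blobs within 1 m of one another could be approximated by dilating the mask by 5 px before labelling, which is omitted here for brevity.

```python
import numpy as np
from collections import deque

def label_blobs(mask):
    """4-connected component labelling of a boolean mask via BFS flood fill."""
    labels = np.zeros(mask.shape, dtype=int)
    current = 0
    for i, j in zip(*np.nonzero(mask)):
        if labels[i, j]:
            continue
        current += 1
        labels[i, j] = current
        queue = deque([(i, j)])
        while queue:
            y, x = queue.popleft()
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if (0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                        and mask[ny, nx] and not labels[ny, nx]):
                    labels[ny, nx] = current
                    queue.append((ny, nx))
    return labels, current

def filter_small_blobs(mask, min_area_px=15 * 15):
    """Keep only blobs of at least 15x15 px (1.5 m x 1.5 m at 10 px/m)."""
    labels, n = label_blobs(mask)
    keep = np.zeros_like(mask)
    for lbl in range(1, n + 1):
        blob = labels == lbl
        if blob.sum() >= min_area_px:
            keep |= blob
    return keep

mask = np.zeros((50, 50), dtype=bool)
mask[5:25, 5:25] = True      # 20x20 px blob: large enough to keep
mask[40:45, 40:45] = True    # 5x5 px blob: discarded as noise
kept = filter_small_blobs(mask)
```

Only the larger region survives the area threshold, matching the assumption that genuine bottleneck features occupy at least 15x15 px.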

Figure 2. Labelling of candidate regions. This table is a corrected version of the one used by Stijntjes.



In comparison to our labelled ground truth data we obtain around 80% precision and around 80% accuracy, as can be seen in Figure 3. These scores are plotted against the $\epsilon$ value in the filter map from Figure 2.
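A pixel-wise version of these scores could be computed as below; this is a sketch, and the paper's actual matching criterion (e.g. region-level overlap with the ground truth) may differ.

```python
import numpy as np

def precision_accuracy(pred, truth):
    """pred, truth: boolean arrays marking detected / ground-truth bottleneck regions."""
    tp = np.sum(pred & truth)        # correctly detected
    fp = np.sum(pred & ~truth)       # false alarms
    tn = np.sum(~pred & ~truth)      # correctly rejected
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = (tp + tn) / pred.size
    return precision, accuracy

pred = np.array([True, True, False, False])
truth = np.array([True, False, True, False])
p, a = precision_accuracy(pred, truth)
```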

Figure 3. Bottleneck detection result metrics.



We see an increase in accuracy compared to Stijntjes [2] and Solmaz et al. [1]. Misclassifications can also be seen to decline rapidly for large values of $\epsilon$.

These results also show that, for detecting bottlenecks at the fixed geometric scale, a suitable and distinct range of values for $\epsilon$ can be found empirically.
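Such an empirical search amounts to sweeping $\epsilon$ and scoring each resulting detection against the ground truth; a minimal sketch, assuming a scalar per-pixel bottleneck score map (the names and the accuracy criterion are illustrative):

```python
import numpy as np

def sweep_epsilon(score_map, truth, epsilons):
    """Threshold score_map at each epsilon, score against ground truth,
    and return the (epsilon, accuracy) pair with the best accuracy."""
    best = (None, -1.0)
    for eps in epsilons:
        pred = score_map > eps
        acc = np.mean(pred == truth)   # fraction of pixels classified correctly
        if acc > best[1]:
            best = (eps, acc)
    return best

score_map = np.array([0.1, 0.4, 0.6, 0.9])
truth = np.array([False, False, True, True])
best_eps, best_acc = sweep_epsilon(score_map, truth, epsilons=[0.2, 0.5, 0.8])
```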

The method of detecting features by applying binary filters to the colour labels in the eigenvalue maps is simple but not very robust; for a larger number of feature types and sizes, more sophisticated methods (e.g. cluster analysis) are required.

Further reading

Results were published in the University of Twente Students Journal of Biometrics and Computer Vision (2014/2015).

A full publication is available at:


[1] Solmaz, Berkan, Brian E. Moore, and Mubarak Shah. "Identifying behaviors in crowd scenes using stability analysis for dynamical systems." Pattern Analysis and Machine Intelligence, IEEE Transactions on 34.10 (2012): 2064-2070.

[2] Stijntjes, Ingo. "Assessment of automated crowd behaviour analysis based on optical flow." (2014).

[3] Ali, Saad, and Mubarak Shah. "A lagrangian particle dynamics approach for crowd flow segmentation and stability analysis." Computer Vision and Pattern Recognition, 2007. CVPR'07. IEEE Conference on. IEEE, 2007.

[4] Santoro, Francesco, et al. "Crowd analysis by using optical flow and density based clustering." Signal Processing Conference, 2010 18th European. IEEE, 2010.

[5] Jin, Xiaogang, et al. "Interactive control of large-crowd navigation in virtual environments using vector fields." IEEE Computer Graphics and Applications 6 (2008): 37-46.

[6] Velastin, Sergio A., Boghos A. Boghossian, and Maria Alicia Vicencio-Silva. "A motion-based image processing system for detecting potentially dangerous situations in underground railway stations." Transportation Research Part C: Emerging Technologies 14.2 (2006): 96-113.

[7] Khan, Sultan Daud, et al. "Detecting dominant motion flows and people counting in high density crowds." (2014).

[8] Fradi, Hajer, and Jean-Luc Dugelay. "Towards crowd density-aware video surveillance applications." Information Fusion 24 (2015): 3-15.

[9] Thida, Myo, et al. "A literature review on video analytics of crowded scenes." Intelligent Multimedia Surveillance. Springer Berlin Heidelberg, 2013. 17-36.