Contextualized Privacy Preservation Filters Using Crowd Density Maps


Crowd density estimation

Crowd density estimation is an important problem in crowd analysis and has been studied in a number of works. Intuitively, different crowd densities should receive different levels of attention. The related works aim either to estimate the crowd level or to count the number of pedestrians. Methods that estimate crowd density by counting persons fall into two paradigms: detection-based (direct) methods and regression-based (also called map-based or indirect) methods. The first paradigm aggregates person counts from local object detectors. Once the object detector has been applied, the locations of all person instances are known, and counting them is straightforward. With these methods, the count is accurate as long as people are correctly segmented. The difficulty is that detecting people is itself a complex task.
Detection-based counting can produce accurate estimates in low-density scenes, but it faces difficulties in highly crowded scenes because of occlusions. This problem has been partially addressed by adopting part-based detectors [130], or by detecting only heads [74, 125] or the Ω-shape formed by heads and shoulders [73]. These attempts to mitigate occlusions can be effective in moderately crowded scenes; however, they are not applicable in very crowded scenes, which are of primary interest for people counting.
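The detection-based paradigm described above can be illustrated with a minimal sketch. It assumes that a person detector has already produced a list of detections, each with a confidence score; the tuple layout `(x, y, width, height, score)`, the function name `count_people` and the threshold `min_score` are hypothetical choices made for illustration and are not taken from the cited works.

```python
from typing import List, Tuple

# Assumed layout of one detection: (x, y, width, height, score).
Detection = Tuple[float, float, float, float, float]


def count_people(detections: List[Detection], min_score: float = 0.5) -> int:
    """Return the number of detections whose score reaches min_score."""
    return sum(1 for det in detections if det[4] >= min_score)


# Made-up detector output for a single frame.
detections = [
    (10.0, 20.0, 30.0, 80.0, 0.92),
    (55.0, 22.0, 28.0, 76.0, 0.81),
    (60.0, 25.0, 27.0, 75.0, 0.30),  # weak response, e.g. a heavily occluded person
]

print(count_people(detections))
```

The third detection falls below the threshold and is not counted, which mirrors the way occluded persons are missed in dense scenes: the count is only as good as the detector that feeds it.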
Since analyzing crowded scenes remains challenging (because spatial overlaps make it difficult to delineate people), most recent works bypass the task of detecting people and instead focus on extracting a set of low-level image features. This paradigm of counting methods uses regression to learn the relationship between the set of extracted features and the number of persons [98]. Once the regressor is trained, an estimate of the object count can be obtained from the values of the extracted features. In this context, many different features have been studied. Some are features of foreground pixels (e.g. total area, textures and edge count [15], [28], [67], [83]) and others are based on measurements of interest points (e.g. corner points [3] and SURF features [23]). Different regression functions have also been applied (e.g. linear in [3] and [92], ε-SVR regressor and ANFIS in [2], Bayesian Poisson in [18] and Gaussian Process regression in [15]) in order to select the one that best fits the features. This extensive variation of the features and of the trainable function stems from the fact that the features deviate from the ideal case in which the number of persons is simply proportional to the features. Therefore, instead of adding more features and testing different regression functions, we are interested in revealing the factors that affect the relationship between the features and the number of persons. More details about the works related to the people counting problem and about our proposed approach can be found in Chapter 3.
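As a minimal sketch of the regression-based paradigm, the following example fits a linear function that maps a single frame-level feature to a person count by ordinary least squares. The feature (foreground area in pixels), the training values and the function names `fit_line` and `predict_count` are assumptions made for illustration; the cited works use richer feature sets and other regressors.

```python
from typing import List, Tuple


def fit_line(features: List[float], counts: List[float]) -> Tuple[float, float]:
    """Fit count = slope * feature + intercept by ordinary least squares."""
    n = len(features)
    mean_x = sum(features) / n
    mean_y = sum(counts) / n
    sxx = sum((x - mean_x) ** 2 for x in features)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(features, counts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_count(feature: float, slope: float, intercept: float) -> float:
    """Estimate the person count of a new frame from its feature value."""
    return slope * feature + intercept


# Made-up training data: foreground area (pixels) and annotated person count.
train_features = [100.0, 200.0, 300.0, 400.0]
train_counts = [2.0, 4.0, 6.0, 8.0]

slope, intercept = fit_line(train_features, train_counts)
print(predict_count(250.0, slope, intercept))
```

The toy data are exactly proportional, which is the ideal case mentioned above. In real footage, perspective and occlusion break this proportionality, which is why the works cited above vary the features and the regression function.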

Detection and Tracking in crowded scenes

Automatic detection and tracking of people in video data is a common task in video analysis research, and its results lay the foundations for a wide range of applications such as video surveillance, behavior modeling, security applications, traffic control, and mugging detection. Many tracking algorithms use the “tracking-by-detection” paradigm, which applies a detection algorithm to individual frames and then estimates the tracks of the different objects by associating the previously computed detections across frames. Tracking methods based on this paradigm are manifold and include, for example, graph-based approaches ([52], [93]), particle filtering frameworks ([11]) and methods using Random Finite Sets ([34]).
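The association step of tracking-by-detection can be sketched as follows. This example uses a simple greedy nearest-neighbor matching between existing track positions and the detections of the next frame; it is a stand-in chosen for brevity and is not the graph-based, particle filtering or Random Finite Set formulation of the cited works. The function name `associate` and the gating distance `max_dist` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def associate(tracks: Dict[int, Point], detections: List[Point],
              max_dist: float = 50.0) -> Dict[int, Point]:
    """Greedily match each track to its nearest unused detection.

    Unmatched detections start new tracks; unmatched tracks are dropped.
    """
    # All (distance, track id, detection index) pairs, closest first.
    pairs = sorted(
        (math.hypot(pos[0] - det[0], pos[1] - det[1]), tid, j)
        for tid, pos in tracks.items()
        for j, det in enumerate(detections)
    )
    updated: Dict[int, Point] = {}
    used = set()
    for dist, tid, j in pairs:
        if dist > max_dist:
            break  # remaining pairs are even farther apart
        if tid in updated or j in used:
            continue
        updated[tid] = detections[j]
        used.add(j)
    # Every detection left unmatched opens a new track.
    next_id = max(tracks, default=-1) + 1
    for j, det in enumerate(detections):
        if j not in used:
            updated[next_id] = det
            next_id += 1
    return updated


tracks = {0: (0.0, 0.0), 1: (100.0, 100.0)}
detections = [(102.0, 98.0), (3.0, 4.0), (300.0, 300.0)]
print(associate(tracks, detections))
```

In this toy frame, both existing tracks are matched to nearby detections and the distant third detection starts a new track. With many close targets, such a greedy rule easily swaps identities, which is one reason why more elaborate association schemes are used in practice.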
Although there are different approaches to the tracking problem, all of them rely on efficient detectors, which have to identify the positions of persons in the scene while minimizing false detections (clutter) in areas without people. Techniques based on background subtraction such as [38] are widely applied thanks to their simplicity and effectiveness, but they are limited to scenes with few and easily perceptible constituents. In general, conventional tracking algorithms that focus on one particular object in the scene have difficulty dealing with an unknown number of targets and with the interactions among them in the multi-target tracking problem. Applying these object-centric methods to videos containing dense crowds is therefore even more challenging, and further issues can be encountered in such cases.
Crowded scenes exhibit particular characteristics that make multi-target tracking difficult. Targets are often occluded by other objects in the scene or by other targets, which makes it difficult to distinguish one specific person from the others. In addition, a target in a crowd is usually small, which affects its appearance in the video sequence.
The aforementioned factors contribute to the loss of observations of the target objects in crowded videos. These challenges come on top of the classical difficulties that hamper any tracking algorithm, such as changes in the appearance of targets related to the camera field of view, the discontinuity of trajectories when a target exits the field of view and later re-appears, cluttered backgrounds, and the similar appearance of some objects in the scene.
Because of all these issues, conventional human detection and tracking paradigms fail in such scenarios. The problem of tracking in crowds has been studied in many works, which attempt to track in scenes of medium-to-high density from monocular video sequences [5, 48, 75, 70, 11, 136] or from sequences recorded with multiple-camera configurations [66, 42]. In moderately crowded scenes, multi-target tracking can be performed by applying tracking-by-detection [11, 70]. For extremely crowded scenes, another category of methods has recently been proposed, which learns motion patterns in order to constrain the tracking problem. For instance, in [5], global motion patterns are learned and participants of the crowd are assumed to follow a similar pattern. Rodriguez et al. [100] extend this approach to cope with multimodal crowd behaviors by studying overlapping motion patterns. These solutions are not suitable for tracking objects whose movements do not conform to the global motion patterns. Moreover, these methods operate off-line: they require the entire test sequence to be available. Finally, the learned patterns are tied to a particular scene.


Crowd change modeling, detection and event recognition

Crowd behavior analysis has recently attracted research attention. This problem covers different subproblems such as crowd change or anomaly detection [12, 29, 86, 58, 118, 19], and crowd event recognition [106, 47, 3, 16, 64, 135, 36]. The goal is to automatically detect changes and to recognize crowd events in video sequences. Usually, activity analysis in a video sequence can be divided into three main steps: (1) detection, (2) tracking, and (3) event recognition [47]. Given the difficulties encountered when analyzing crowded scenes, research works related to crowd event recognition usually bypass the detection and tracking of individuals in the scene. Instead, some works focus on detecting and tracking local features [58, 19, 106, 3] or particles [86, 64, 135]. The extracted local features (points of interest) are employed to represent the individuals present in the scene. In this case, the motion patterns that would otherwise be associated with individuals are assigned to the local features. In this way, tracking individuals in crowds, which is a daunting task, is avoided. Likewise, alternative solutions based on particle tracking observe that when persons are densely crowded, individual movement is restricted; they therefore consider the members of the crowd as granular particles. These methods then place a grid of particles over the image frame and move them with the flow field. Other methods operate on foreground masks [29, 17, 12] by considering the foreground areas as regions of interest, denoted as activity area in [12].
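The particle-based idea mentioned above, placing a grid of particles over the frame and moving them with the flow field, can be sketched in a few lines. The flow field is assumed here to be available as a function returning a displacement `(u, v)` per frame at any position; in practice it would come from an optical flow estimate. The names `make_grid`, `advect` and `uniform_flow` are hypothetical, and the single forward Euler step is a simplification.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Flow = Callable[[float, float], Tuple[float, float]]


def make_grid(width: int, height: int, step: int) -> List[Point]:
    """Place one particle every `step` pixels over a width x height frame."""
    return [(float(x), float(y))
            for y in range(0, height, step)
            for x in range(0, width, step)]


def advect(particles: List[Point], flow: Flow, dt: float = 1.0) -> List[Point]:
    """Move every particle by the flow sampled at its current position."""
    moved = []
    for x, y in particles:
        u, v = flow(x, y)
        moved.append((x + dt * u, y + dt * v))
    return moved


def uniform_flow(x: float, y: float) -> Tuple[float, float]:
    """Toy flow field (assumption): everything moves 2 pixels to the right."""
    return (2.0, 0.0)


particles = make_grid(4, 4, 2)
moved = advect(particles, uniform_flow)
print(moved)
```

Repeating the `advect` step over successive frames yields particle trajectories, which stand in for the trajectories of individuals that cannot be tracked reliably in dense crowds.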

Table of contents:

Abstract
Contents
List of Figures
List of Tables
1 Introduction
1.1 Context and Motivation
1.2 Thesis Contributions
1.3 Thesis Outline
2 Video Surveillance Systems and Crowd Analysis
2.1 Introduction
2.2 Automated Surveillance Systems
2.2.1 Detection of interesting objects
2.2.2 Object Tracking
2.2.3 Object Categorization
2.2.4 Behavior Analysis
2.3 Crowd Analysis
2.3.1 Crowd density estimation
2.3.2 Detection and Tracking in crowded scenes
2.3.3 Crowd change modeling, detection and event recognition
2.4 Conclusion
I Low Level Features Analysis for Crowd Density Estimation
3 People Counting Using Frame-Wise Normalized Feature
3.1 Introduction
3.2 Related Works
3.3 Frame-Wise Normalized Feature Extraction
3.3.1 Based on measurements of interest points
3.3.2 Based on measurements of foreground pixels
3.4 Gaussian Process regression
3.5 Experimental Results
3.6 Conclusion
4 Crowd Level Estimation using Texture Features Classification
4.1 Introduction
4.2 Related Works
4.3 Patch-Level Analysis
4.4 Subspace Learning on Local Binary Pattern
4.4.1 Block-based Local Binary Pattern extraction and histogram sequence normalization
4.4.2 Discriminative subspace learning
4.5 Multi-Class SVM classifier
4.5.1 Baseline multi-class SVM method
4.5.2 MultiClass SVM based on Graded Relevance Degrees
4.6 Experimental Results
4.6.1 Dataset
4.6.2 Experiments
4.6.3 Results and analysis
4.7 Conclusion
5 Crowd Density Map Estimation Using Sparse Feature Tracking
5.1 Introduction
5.2 Motivation
5.3 Crowd Density Map Estimation
5.3.1 Extraction of local features
5.3.2 Local features tracking
5.3.3 Kernel density estimation
5.4 Evaluation methodology
5.5 Experimental Results
5.5.1 Datasets and Experiments
5.5.2 Results and Analysis
5.6 Conclusion
II Crowd Density-Aware Video Surveillance Applications
6 Enhancing Human Detection and Tracking in Crowded scenes
6.1 Introduction
6.2 Related Works
6.3 Human detection using Deformable Part Based-Models
6.4 Integration of geometrical and crowd context constraints into human detector
6.4.1 Geometrical Constraints
6.4.2 Crowd Context Constraint
6.4.3 Summary of the integration algorithm
6.5 Tracking-by-detection using Probability Hypothesis Density
6.6 Experimental Results
6.6.1 Datasets and Experiments
6.6.2 Results and Analysis
6.7 Conclusion
7 Contextualized Privacy Preservation Filters Using Crowd Density Maps
7.1 Introduction
7.2 Related Works
7.3 Incorporation of Crowd Density Measure in a Privacy Preservation Framework
7.3.1 RoIs detection
7.3.2 Adaptive privacy filters
7.4 Experimental Results
7.4.1 Datasets and Experiments
7.4.2 Results and Analysis
7.5 Conclusion
8 Crowd Change Detection and Event Recognition
8.1 Related Works
8.2 Crowd attributes
8.2.1 Local crowd density
8.2.2 Crowd motion: Speed and Orientation
8.3 Abnormal change detection and event recognition
8.3.1 Crowd modeling
8.3.2 Crowd Change Detection
8.3.3 Event Recognition
8.4 Crowd event characterization
8.4.1 Walking/Running
8.4.2 Evacuation
8.4.3 Crowd Formation/Splitting
8.4.4 Local Dispersion
8.5 Experimental Results
8.5.1 Datasets
8.5.2 Experiments and Analysis
8.6 Conclusion
9 Conclusions and Future Perspectives
9.1 Conclusions
9.2 Limitations, extensions and directions for future research
A Foreground Segmentation
A.1 Introduction
A.2 Baseline Method: Background subtraction by Gaussian Mixture Model
A.3 Improved Foreground Segmentation Using Uniform motion estimation
A.4 Experimental Results
A.5 Conclusion
B Summary in French
B.1 Introduction
B.1.1 Context and motivation
B.1.2 Contributions
B.1.3 Outline
B.2 Low-level feature analysis for crowd density estimation
B.2.1 People counting using a normalized feature
B.2.2 Crowd level estimation by texture feature classification
B.2.3 Density map estimation using local feature tracking
B.3 Applications using crowd density
B.3.1 Detection and tracking of people in dense scenes
B.3.2 Crowd behavior analysis
B.3.3 Improving the compatibility between privacy and surveillance
B.4 Conclusion
Bibliography