Methods for tracking with overlapping multi-cameras


Many works on multi-camera surveillance require overlapping fields of view. Using multiple cameras with overlapping fields of view makes it possible to see an object from several different angles, which in turn helps to solve the problem of occlusion. A synthesis of these methods is presented in Table 1-1, and more details are given below.
- Jain and Wakimoto (Jain, et al., 1995) develop an approach for person tracking using calibrated cameras and a computed 3D model of the environment. The alignment of viewpoints of the same person relies on the absolute position of objects in the 3D environment.
- Cai et al. (Cai, et al., 1999) use calibrated cameras to track the object. Their approach tracks the person with a single camera and switches cameras when the current one no longer has a good view of the object. The camera to switch to is selected by a simple threshold on the tracking confidence. They model three types of features, location, intensity and geometry, with a multivariate Gaussian, and use the Mahalanobis distance for matching.
- Chang and Gong (Chang, et al., 2001) overcome occlusion by matching objects across several cameras. They use a Bayesian framework to fuse geometric constraints (epipolar geometry, landmarks and homography) with an appearance color model and the height of the object.
- Lee et al. (Lee, et al., 2000) use overlapping cameras to track objects in urban scenarios. They use the intrinsic camera parameters and the alignment of moving-object centroids to map these points onto a common frame.
- Dockstader and Tekalp (Dockstader, et al., 2001) use a Bayesian network to iteratively fuse independent observations from multiple cameras and produce the most likely vector of 3D state estimates for the tracked objects. They also use a Kalman filter to update this estimate. Sparse motion estimates and foreground region clusters are extracted as features.
- Mittal and Davis (Mittal, et al., 2003) segment the image in each camera view and track objects with a Kalman filter. They then use epipolar geometry to match object segments and recover the depth of object points. The points are mapped onto the epipolar plane and correspondences between them are established with a Bayesian network. Foreground objects are modeled by their color.
- Khan and Shah (Khan, et al., 2003) avoid calibration by using constraints on the field-of-view (FOV) lines. The FOV information is learned during a training phase. Using this information, they can predict the label of an object tracked in one camera in all the other cameras in which the object is visible.
- Kim et al. (Kim, et al., 2006) build a colored part-based appearance model of a person. The models across cameras are then integrated to obtain the ground-plane locations of people. The correspondence of a model across multiple cameras is established by mapping the model through planar homographies to construct a global top view. The result of the projection is extended to a multi-hypothesis framework using particle filtering.
- Gandhi and Trivedi (Gandhi, et al., 2007) use calibrated cameras and a model of the environment to obtain the 3D position of a person (see Figure 1.17). Their approach captures information over a range of elevation and azimuth angles around the person, producing a panoramic appearance of the person (Panoramic Appearance Map) which is used for matching.
- Calderara et al. (Calderara, et al., 2008) present methods for consistent labeling (tracking). They extend the approach of Khan et al. (Khan, et al., 2003) with an offline training process that computes the homography and epipolar geometry between cameras. Consistent labeling is solved using Bayesian statistics. Their color appearance model consists of the centroid, the color template and the mask of the object.
- Kayumbi et al. (Kayumbi, et al., 2008) use homographic transformations to integrate the different views into a large top-view plane. They transform the detected player locations from the camera image plane to the ground plane; tracking on the ground plane is then achieved by graph matching of object trajectories.
- Dixon et al. (Dixon, et al., 2009) avoid calibration by using sheets (a sheet is an image constructed from video data extracted along a single curve in image space over time) to decompose a 2D tracking problem into a collection of 1D tracking problems, exploiting the fact that in urban areas vehicles often remain within lanes. Data association is expressed as MAP (maximum-a-posteriori) estimation using a min-cost flow formulation.
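The Gaussian feature matching used by Cai et al. above can be sketched as follows: a tracked object is modeled by the mean and covariance of its features (location, intensity, geometry), and each candidate in another view is scored by Mahalanobis distance, the smallest distance winning. A minimal sketch; the feature values and the diagonal covariance below are illustrative, not taken from the paper:

```python
import numpy as np

def mahalanobis(x, mean, cov):
    """Mahalanobis distance of feature vector x from a Gaussian (mean, cov)."""
    d = x - mean
    return float(np.sqrt(d @ np.linalg.inv(cov) @ d))

# Hypothetical tracked-object model over (location, intensity, geometry) features
mean = np.array([10.0, 5.0, 120.0])   # model mean
cov = np.diag([4.0, 1.0, 25.0])       # model covariance (diagonal for simplicity)

# Candidate detections seen from another camera
candidates = {
    "a": np.array([11.0, 5.5, 118.0]),
    "b": np.array([30.0, 2.0, 90.0]),
}
dists = {k: mahalanobis(v, mean, cov) for k, v in candidates.items()}
best = min(dists, key=dists.get)      # candidate "a" is the closer match
```

In the original approach a confidence threshold would additionally gate the match before switching cameras.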
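The ground-plane integration used by Kayumbi et al. (and the top view of Kim et al.) ultimately reduces to applying a 3x3 planar homography to homogeneous image points. A minimal sketch with a made-up homography matrix (here a pure translation on the plane, purely for illustration):

```python
import numpy as np

def to_ground_plane(H, pt):
    """Map an image point (x, y) to the ground plane via a 3x3 homography H."""
    x, y = pt
    p = H @ np.array([x, y, 1.0])  # lift to homogeneous coordinates and map
    return p[:2] / p[2]            # normalise by the homogeneous coordinate

# Hypothetical homography: translation by (5, -3) on the ground plane
H = np.array([[1.0, 0.0,  5.0],
              [0.0, 1.0, -3.0],
              [0.0, 0.0,  1.0]])

ground = to_ground_plane(H, (2.0, 7.0))
```

In practice H is estimated offline from point correspondences between each camera view and the common plane; the division by `p[2]` matters because a general homography is only defined up to scale.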


Shape features

The use of moments is widespread in the pattern recognition domain. Hu (Hu, 1962) introduced a set of moments invariant to translation, scaling and rotation using the theory of algebraic invariants. Flusser and Suk (Flusser, et al., 1993) extended them to the affine moment invariants (AMI), which are invariant to affine transforms, and Van Gool et al. (Gool, et al., 1996) proposed a set that is additionally invariant to photometric conditions. These moments are more sensitive to noise than moments built on an orthogonal basis, such as the Zernike moments (Khotanzad, et al., 1990), which are invariant to rotation and scale. Wang et al. (Wang, et al., 1998) extended these moments to be invariant to illumination, with good experimental results, but the method involves a high computational complexity. The work in (Adam, et al., 2001) showed that the Fourier-Mellin transform gives better results than other signatures generally used in the literature for recognizing multi-oriented, multi-scale characters with rotations of up to 180°, and is robust against noise.
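To make the moment invariants concrete, here is a minimal sketch of Hu's first invariant, phi1 = eta20 + eta02, computed from central moments: translating the shape leaves it unchanged, since central moments are taken about the centroid. The test images are synthetic blobs invented for the example, not data from any cited work:

```python
import numpy as np

def hu_phi1(img):
    """First Hu invariant phi1 = eta20 + eta02 of a grayscale image."""
    ys, xs = np.mgrid[:img.shape[0], :img.shape[1]]
    m00 = img.sum()                                # zeroth-order moment (mass)
    xbar = (img * xs).sum() / m00                  # centroid x
    ybar = (img * ys).sum() / m00                  # centroid y
    mu20 = (img * (xs - xbar) ** 2).sum()          # central moments
    mu02 = (img * (ys - ybar) ** 2).sum()
    # normalised central moments: eta_pq = mu_pq / m00^((p+q)/2 + 1)
    return (mu20 + mu02) / m00 ** 2

# The same rectangular blob at two different positions gives identical phi1
img = np.zeros((20, 20)); img[4:8, 3:9] = 1.0
shifted = np.zeros((20, 20)); shifted[10:14, 9:15] = 1.0
```

Scale and rotation invariance come from the normalisation by powers of m00 and from the particular combinations phi1..phi7; only the simplest one is shown here.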

Contour features

The second approach typically referred to is the Fourier descriptor (Rui, et al., 1996), which characterizes the contour of the shape. The curvature scale space descriptors (CSSDs) (Abbasi, et al., 1999) are also widely used as shape descriptors; they detect the curvature points of the contour at different scales by convolving successive silhouettes with a Gaussian kernel. The experimental results in (Zhang, et al., 2003) show that Fourier descriptors are more robust to noise than CSSDs. The main drawback of contour features is the need for a clearly segmented object contour, and it is difficult to obtain a fully closed contour of the object. Furthermore, the detection of all the edges contained in an image can be disturbed by internal or external contours of the object.
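A minimal sketch of a Fourier descriptor of the kind discussed above, assuming the closed contour is sampled as complex boundary points z = x + iy: dropping the DC term removes translation, taking magnitudes removes rotation and starting point, and dividing by the first harmonic removes scale. The circular test contour is invented for the example:

```python
import numpy as np

def fourier_descriptor(contour, k=8):
    """Translation-, rotation- and scale-invariant Fourier descriptor.

    contour: complex array of boundary points z = x + iy (closed curve).
    """
    F = np.fft.fft(contour)
    F[0] = 0.0                       # drop DC term -> translation invariance
    mags = np.abs(F)                 # magnitudes -> rotation/start-point invariance
    return mags[1:k + 1] / mags[1]   # normalise by |F_1| -> scale invariance

# A circular contour and a translated, scaled copy yield the same descriptor
t = np.arange(16)
contour = np.exp(2j * np.pi * t / 16)        # unit circle, 16 samples
moved = 3.0 * contour + (5.0 + 2.0j)         # scaled by 3, translated
```

Real contours would first be resampled to a fixed number of points so that descriptors of different shapes are comparable.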

Table of contents:

Résumé
Plan du mémoire
ABSTRACT
Manuscript outline
1 What is IDVSS?
Motivation
Definition
1.1 Techniques used in surveillance systems
1.1.1 Object detection
1.1.2 Object classification
1.1.3 Tracking
1.1.4 Understanding
1.1.5 Databases and semantic description
1.2 Intelligent Distributed Video Surveillance Systems
1.2.1 Methods for tracking with overlapping multi-cameras
1.2.2 Methods for tracking with non-overlapping multi-cameras
1.3 Conclusion
2 Visual features
Motivation
Introduction
2.1 Feature types
2.1.1 Shape features
2.1.2 Contour features
2.1.3 Texture features
2.1.4 Color features
2.2 Representation of global features
2.3 Local Features
2.3.1 The principle of interest points
2.3.2 Interest points detector
2.3.3 Efficient implementations of IP detectors
2.3.4 The interest points descriptors
2.3.5 Choice of descriptors and detector
2.4 Conclusion
3 Pedestrian Detection with keypoints
Motivation
Introduction
3.1 Related works
3.1.1 Background subtraction
3.1.2 Statistical Methods
3.1.3 Spatial-based Background Models
3.2 The proposed method
3.2.1 Construction of the initial background
3.2.2 Adaboost Classification
3.2.3 AdaBoost training algorithm
3.3 Results
3.3.1 Background Subtraction results
3.3.2 Keypoint classification results
3.3.3 Evaluation of Keypoints filtering with Adaboost only
3.3.4 Evaluation of the cascade filtering
3.4 Comparison with HoG detector
3.5 Conclusion
4 Re-identification
Motivation
Introduction
4.1 Related works
4.1.1 Template matching methods
4.1.2 Color histogram based methods
4.1.3 Local features based methods
4.1.4 Kd-Tree search
4.2 The proposed algorithm
4.2.1 Schema of the algorithm
4.3 Experimental evaluation
4.3.1 Evaluation metrics
4.3.2 Caviar Dataset
4.3.3 Our newly built corpus with 40 persons
4.3.4 Individual recognition analysis
4.3.5 NNDR ratio influence
4.4 Proposed improvement for the re-­‐identification
4.4.1 Construction of the model
4.4.2 Construct the query
4.4.3 Experimental Results
4.5 Experimental Results using ETHZ
4.6 Comparison with other region descriptors
4.6.1 Comparison SURF vs. SIFT
4.6.2 Comparison SURF vs. COLOR
4.6.3 Comparison HOG vs. SURF
4.7 Comparison KD-tree vs. hierarchical k-means tree
4.8 Conclusions
5 Conclusion & Perspective
5.1 Summary
5.2 Future work
6 Publications
7 Bibliography
