Asynchronous Part-based Shape Tracking
Object tracking is a fundamental task in many computer vision applications, including video-based surveillance systems, human-computer interaction, augmented reality and traffic monitoring. Despite being extensively studied over the last decade, fast object recognition and tracking remains a challenging and computationally expensive problem.
A major application of visual tracking is face detection and tracking. This area has been dominated in recent years by appearance-based methods. These techniques learn a global representation of the object from a set of training images and have been studied in depth, originating from the pioneering work of Viola and Jones and later improved by a number of researchers. A complete review of recent advances in face detection can be found in the literature, and an example also exists in the field of event-based vision.
Appearance-based methods implicitly assume a rigid spatial relation between the parts that constitute the object. Other methods overcome this limitation by modeling an object as a set of simple parts with a flexible geometric relation between them. These techniques, known as part-based methods, have received a lot of attention, as they allow the recognition of generic classes of objects and the estimation of their pose. An early example of this technique is the Pictorial Structures model, in which the local elements of the object are assumed to be rigid and linked by springs. A probabilistic model is defined that looks for the best match by minimizing two cost functions: one for the matching of individual parts and another for the geometric relations between them. The main limitation of this technique, which prevented its use for a long time, is the high computational cost needed to solve the resulting minimization problem. With the increase in available computational power, this method has received renewed attention.
In spite of these recent improvements, such techniques are fundamentally limited by the low dynamics imposed by the frame rates of current cameras. In this chapter we present a new approach to the problem, which relies on event-based vision to provide high temporal resolution and sparse output, thus yielding a truly dynamic framework for the problem.
Despite its promising characteristics, few object tracking algorithms exploiting the possibilities of event-driven acquisition have been developed so far. An event clustering algorithm has been introduced for traffic monitoring, where clusters can change in size but are restricted to a circular form. The sensor has recently been applied to track particles in microrobotics and in fluid mechanics. It was also used to track a micro-gripper's jaws to provide real-time haptic feedback at the micrometer scale. A blob tracker has also been described which is capable of adapting its shape and position to the distribution of incoming events, assuming that it follows a bivariate Gaussian distribution.
The part-based shape tracking introduced in this chapter is designed to track more complex shapes. It defines the model of an object as a set of blob trackers linked by springs. Event-based vision, with its asynchronous acquisition, allows a data-driven part-based update of the model for every incoming event. Thus, instead of explicitly minimizing an energy function, we simply let the system evolve: as we apply the effects of the springs, their elastic energy tends to decrease, just as it would in a real mechanical system. The resulting algorithm is therefore simple yet robust.
This chapter is organized as follows: we first give a mathematical description of the stream of visual events generated by neuromorphic cameras in Section 2.2. Next, we describe our part-based shape tracking in Section 2.3, while Section 2.4 presents the experiments carried out to validate the method. Finally, in Section 2.5 we briefly discuss the obtained results.
Stream of Visual Events
As previously explained in Chapter 1, neuromorphic cameras encode visual information as a stream of asynchronous events. Here, events indicate a sufficient change of luminance at a given position on the focal plane and at a precise instant. The stream of visual events can then be mathematically described as follows: let e_k = [u_k^T, t_k, p_k]^T be a quadruplet describing an event occurring at time t_k at the spatial position u_k = [x_k, y_k]^T on the focal plane. The polarity p_k can take the values 1 or -1, depending on whether a positive or negative change of luminance has been detected. The time t_k is often referred to as the timestamp of the event, and is expressed with microsecond precision.
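As a concrete illustration (not part of the original text), the event quadruplet can be represented in code as follows; the names `Event` and `is_valid` are purely illustrative:

```python
from typing import NamedTuple

class Event(NamedTuple):
    x: int  # column on the focal plane
    y: int  # row on the focal plane
    t: int  # timestamp in microseconds
    p: int  # polarity: +1 (positive change) or -1 (negative change)

def is_valid(e: Event) -> bool:
    """Basic sanity check on an incoming event."""
    return e.p in (-1, 1) and e.t >= 0

e = Event(x=120, y=64, t=1_000_042, p=1)
```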
For the remainder of this document, the subscript k will always indicate the time dependency of a given event-based variable (i.e. its value at time t_k).
Part-Based Shape Tracking
We next present our part-based shape tracking. Let us remind the reader that the method presented in this chapter describes an object as a set of simple shapes linked by springs. Here, the simple shapes are the Gaussian trackers developed in previous work and explained in Section 2.3.1. The spring-like connections are then presented in Section 2.3.2.
Gaussian Blob Trackers
When an object moves, the pixels generate events which geometrically form a point cloud representing the spatial distribution of the observed shape. The event-based Gaussian tracker developed in previous work assumes that these events are normally distributed. Under this assumption, the event cloud approximates a bivariate Gaussian distribution, whose parameters can be iteratively updated with the incoming events. Thus, the Gaussian tracker is driven by the asynchronous events, causing it to move and deform so as to approximate the event cloud's spatial distribution.
Figure 2.1: A Gaussian tracker B following a cloud of events is defined by its location μ_k and covariance matrix Σ_k.
In order to illustrate the update procedure, let us first assume that we are observing a scene with just one object. Let B(μ_k, Σ_k) be the Gaussian tracker shown in Fig. 2.1, defined by its mean μ_k ∈ R^2 and its covariance matrix Σ_k ∈ R^{2×2}. Here, the mean represents the object's position, and the covariance matrix is used to compute its size and orientation (the interested reader can refer to the original work for details). Following the standard convention in this document, the subscript k indicates the time dependency of these variables (i.e. μ_k represents the value of μ at time t_k, etc.).
The position μ_k is then computed as the weighted mean of the positions of the events previously assigned (i.e. spatio-temporally close) to the tracker. Without loss of generality, and simply to ease the notation during the mathematical development, let us assume that all the previous events have been assigned to this Gaussian tracker.
We can think of a number of different weighting strategies but, in general, we will set the weights ω_{k,j} so that they decrease with j, giving greater importance to the most recent events. Here, the denominator is just a normalizing factor.
which are the iterative expressions used in the original work. Here, ω_0 and σ_0 are update factors that should be tuned according to the event rate. Let us note that these expressions still hold when not all the previous events have been assigned to the same Gaussian tracker, provided that the position and the covariance matrix of a tracker are only updated when an event is assigned to it.
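Since the exact expressions (2.6) and (2.7) are not reproduced here, the following sketch uses one plausible exponential form of the iterative update, with the update factors named `w0` and `s0`; it is an illustration of the idea, not the thesis's exact formulas:

```python
import numpy as np

def update_tracker(mu, sigma, u, w0=0.05, s0=0.05):
    """Blend a new event position u into the mean mu and covariance sigma."""
    mu_new = (1.0 - w0) * mu + w0 * u
    d = (u - mu).reshape(2, 1)                 # deviation from the previous mean
    sigma_new = (1.0 - s0) * sigma + s0 * (d @ d.T)
    return mu_new, sigma_new

mu = np.array([10.0, 10.0])
sigma = np.eye(2)
mu, sigma = update_tracker(mu, sigma, np.array([12.0, 10.0]))
```

With small update factors, the tracker drifts smoothly toward the incoming event cloud instead of jumping to each event, which matches the iterative behavior described above.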
Next, let us imagine that the scene we are observing contains several objects and, consequently, that several Gaussian trackers have been initialized. These Gaussian trackers are identified by their index i, and denoted B^{(i)}(μ_k^{(i)}, Σ_k^{(i)}). The probability that a Gaussian tracker B^{(i)} generates an event e_k at the spatial position u_k is then given by:
where the probability is computed with the last available position and covariance matrix (i.e. the values at t_{k-1}). Then, assuming that all the Gaussian trackers have the same prior probability of generating an event, Bayes' theorem allows us to conclude that the tracker that has most likely generated e_k is the one with the highest Gaussian probability p_k^{(i)}. Each incoming event will then be assigned to the tracker with the highest p_k^{(i)}, provided that this probability is greater than a predefined threshold (usually set to 0.1). Once the most probable tracker has been identified, its parameters are updated by integrating the last distribution with the current event information, as described in (2.6) and (2.7).
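The assignment rule above can be sketched as follows; each tracker is a (mean, covariance) pair, the density is the standard bivariate Gaussian, and `p_min` stands for the predefined threshold:

```python
import numpy as np

def gaussian_density(u, mu, sigma):
    """Bivariate Gaussian density of event position u under tracker (mu, sigma)."""
    d = u - mu
    inv = np.linalg.inv(sigma)
    norm = 2.0 * np.pi * np.sqrt(np.linalg.det(sigma))
    return float(np.exp(-0.5 * d @ inv @ d) / norm)

def assign_event(u, trackers, p_min=0.1):
    """Return the index of the most probable tracker, or None if below p_min."""
    probs = [gaussian_density(u, mu, sig) for mu, sig in trackers]
    best = int(np.argmax(probs))
    return best if probs[best] > p_min else None
```

An event far from every tracker is left unassigned, which is what keeps noise from corrupting the trackers' parameters.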
Finally, the activity A_k^{(i)} of each tracker B^{(i)} is updated with each incoming event e_k, following an exponential decay function which describes the temporal dimension of the Gaussian kernel.
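A minimal sketch of such an exponential activity decay, where the time constant `tau` and the unit increment on assignment are assumptions for illustration:

```python
import math

def update_activity(activity, t_last, t_now, tau=10_000.0, assigned=False):
    """Decay the activity between t_last and t_now (microseconds); bump it on assignment."""
    a = activity * math.exp(-(t_now - t_last) / tau)
    return a + 1.0 if assigned else a
```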
Spring-Like Connections
The Gaussian tracker introduced in the previous section produces robust results when dealing with simple objects, especially in the case of ellipse-like shapes. When attempting to track a more complex object, we can assume that this object is composed of a set of simple shapes linked by geometric relations. These relations, however, cannot be fixed, as the movement of the object in 3D space will cause its projection onto the focal plane to change.
In order to build a tracker capable of following the structure of observed objects, we model our system as a set of simple trackers linked by springs. Thus, each tracker of the set will be driven both by the incoming events and by the elastic connections linking it to other elements.
The force F exerted by a linear spring is given by the well-known Hooke's law, which states that the direction of the force is that of the axis of the spring, and its magnitude is given by the expression:
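Written out (a reconstruction of the expression lost in extraction; the restoring-force sign convention is assumed), Hooke's law reads:

```latex
F = -C \, \Delta l
```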
where Δl = l − l_0 represents the elongation of the spring, which is the difference between its current length l and the equilibrium length l_0. C is a characteristic of the spring known as its stiffness.
Fig. 2.2(a) shows a linear spring to which an object of mass m has been attached. An energy dissipation mechanism is also added, represented in this case by an ideal damper. The damping force D applied by a damper is modeled as being proportional and opposed to the relative speed of its two ends:
Figure 2.2: Principle of a damped spring: (a) A mass m is attached to a mechanical system composed of a linear spring and a linear damper. (b) When the system is out of its equilibrium position, two forces F (elastic) and D (frictional) appear. Their directions are opposed respectively to those of the displacement and the velocity.
Fig. 2.2(b) shows the system out of its equilibrium position and with a certain speed. In that case, two forces F and D appear, their directions being opposed to those of the displacement and the velocity respectively. Applying Newton's second law, we obtain the differential equation of the system, which must be solved in order to calculate the object's acceleration, velocity and position:
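Denoting by x the displacement from equilibrium and by c the damping coefficient (our naming), a reconstruction of the omitted equation is the familiar damped-oscillator form:

```latex
m \, \ddot{x} + c \, \dot{x} + C \, x = 0
```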
This is a typical problem in classical mechanics, and its solutions are well known and studied. If a series of connections is set between the different trackers, assigning masses to these trackers allows us to model the behavior of this virtual dynamic system. Even if we are not modeling a real system, keeping the concept of masses for the trackers allows us to control their relative displacements, assigning bigger masses to the elements that we wish to be more stable.
Let C^{(ij)} be a connection binding the Gaussian trackers B^{(i)} and B^{(j)}. As shown in Fig. 2.1, the positions of the trackers are represented by μ_k^{(i)} and μ_k^{(j)}. Fig. 2.3(a) shows this connection in its equilibrium state, where l_0^{(ij)} represents the equilibrium distance and θ_0^{(ij)} the angle formed by the axis of the connection and the horizontal axis. In what follows, we will use a simplified representation of the connection, showing only the spring.
Figure 2.3: (a) Connection C^{(ij)} (linking the Gaussian trackers B^{(i)} and B^{(j)}) in its initial equilibrium state: l_0^{(ij)} represents the initial length of the spring, and θ_0^{(ij)} the angle formed by the axis of the connection and the horizontal axis. (b) The trackers follow incoming events, moving the connection away from its equilibrium position. The difference between the current length of the connection l_k^{(ij)} and the initial length is known as the elongation of the spring Δl_k^{(ij)}. (c) The trackers are driven by the connection, which tries to recover its initial length.
Let us assume that this connection behaves as a single linear damped spring that can freely rotate around its ends, where it is connected to the respective trackers. When modeling the system this way, we only take into account the euclidean distance between the trackers, and not the direction of the connection. From now on, we will refer to this configuration as the euclidean configuration.
Fig. 2.3(b) and Fig. 2.3(c) illustrate the evolution of the trackers from equilibrium, as they are driven by both the incoming events and the spring-like connections. As the events start arriving, they will cause the trackers to move away from their initial positions, eventually producing a certain elongation of the spring. Fig. 2.3(b) shows the state of the system after the trackers have been displaced by the events, where l_k^{(ij)} represents the current distance between the trackers and Δl_k^{(ij)} the corresponding elongation, given by:
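Given the definitions above, the omitted expression is presumably:

```latex
\Delta l_k^{(ij)} = l_k^{(ij)} - l_0^{(ij)}
```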
As a consequence of this elongation, the trackers will then be driven by the spring-like connection, which tries to recover its initial equilibrium length. Fig. 2.3(c) shows the corresponding displacements of the trackers Δμ_k^{(i)} and Δμ_k^{(j)}, which are always in the opposite direction to the elongation. Thus, when we apply the effect of the springs, we update the positions of the trackers as:
In order to compute these displacements, we will simply assume that they are proportional to the elongation. This approximation, which causes an exponential decay towards equilibrium, is valid under the conditions described in Appendix B, and results in the following values for the displacements:
where α^{(ij)} is a scaling factor that controls the stiffness of the connection, and m^{(i)}, m^{(j)} represent the masses associated with the ith and jth trackers respectively. Here, θ_k^{(ij)} denotes the angle between the axis of the connection and the horizontal line.
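The euclidean update can be sketched as follows; this is a sketch consistent with the description above, not the thesis's exact formulas, with `alpha` playing the role of the stiffness scaling factor and each displacement scaled inversely to the tracker's mass:

```python
import numpy as np

def spring_displacements(mu_i, mu_j, l0, m_i=1.0, m_j=1.0, alpha=0.1):
    """Displacements of two trackers along their connection axis."""
    axis = mu_j - mu_i
    l = float(np.linalg.norm(axis))      # current length l_k
    dl = l - l0                          # elongation (positive if stretched)
    unit = axis / l                      # unit vector from tracker i to tracker j
    d_i = (alpha / m_i) * dl * unit      # i is pulled toward j when stretched
    d_j = -(alpha / m_j) * dl * unit     # j is pulled toward i when stretched
    return d_i, d_j

mu_i, mu_j = np.array([0.0, 0.0]), np.array([4.0, 0.0])
d_i, d_j = spring_displacements(mu_i, mu_j, l0=2.0)
```

Applying these displacements repeatedly shrinks the elongation geometrically, which is the exponential decay towards equilibrium mentioned above.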
Torsional Configuration
If we want to keep the angle of each connection close to its equilibrium value, we can imagine the trackers as being linked by torsion springs. The force applied by a torsion spring is proportional to the difference between the current angle and the equilibrium angle. Adding an ideal prismatic joint between the trackers allows us to ignore the distance between them. The equivalent mechanical system can be seen in Fig. 2.4, where θ_0^{(ij)} represents the initial equilibrium angle of the connection and θ_k^{(ij)} its current value.
We will refer to this configuration as the torsional configuration. In this case, the torsional elongation is given by:
Figure 2.4: Torsional configuration: trackers are linked by torsion springs. The equilibrium angle is initially θ_0^{(ij)}, while θ_k^{(ij)} denotes its position after torsion.
where Δθ̄_k is the mean torsional elongation over all connections. Thus, if all the connections rotate through the same angle in the same direction, the torsional elongation will be zero for all of them, making the system insensitive to global rotation.
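One reconstruction of the torsional elongation consistent with this description (subtracting the mean so that a global rotation produces zero elongation for every connection) is:

```latex
\Delta\theta_k^{(ij)} = \left(\theta_k^{(ij)} - \theta_0^{(ij)}\right) - \overline{\Delta\theta}_k
```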
Next, from the value of the elongation, we compute the corresponding displacements to be applied to the trackers. As in the case of the euclidean configuration, we simplify the effect of the spring using a first-order approximation.
Cartesian Configuration
If we want to keep both the distance and the angle of each connection close to their equilibrium values, we can set horizontal and vertical equilibrium distances. The equivalent mechanical system is represented in Fig. 2.5, where x_0^{(ij)} and y_0^{(ij)} are the horizontal and vertical equilibrium distances respectively. Δx_k^{(ij)} and Δy_k^{(ij)} represent the horizontal and vertical elongations, given by:
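A reconstruction of the omitted expression consistent with the notation above (the sign convention is an assumption) is:

```latex
\Delta x_k^{(ij)} = \left(x_k^{(j)} - x_k^{(i)}\right) - x_0^{(ij)}, \qquad
\Delta y_k^{(ij)} = \left(y_k^{(j)} - y_k^{(i)}\right) - y_0^{(ij)}
```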
Figure 2.5: Cartesian configuration: the initial distances between the trackers are equal to x_0^{(ij)} and y_0^{(ij)}, which correspond to the horizontal and vertical equilibrium distances of the connection. The tracker activity following events originates an elongation given by Δx_k^{(ij)} and Δy_k^{(ij)}, causing a change of distance between trackers B^{(i)} and B^{(j)}.
Using the energy as a matching criterion
The elastic energy of a mechanical spring is given by the expression:
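In standard form (a reconstruction of the omitted expression):

```latex
E = \frac{1}{2} \, C \, (\Delta l)^2
```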
The elastic energy of a connection constitutes a measure of its deformation. Consequently, the energy of the spring-like connections can be used as a quality criterion for the tracking: if our tracker is correctly following the desired object, chances are that the energy of all the connections will be coherent. Thus, we define two energy-based criteria designed to render the tracker more robust to partial occlusions.

Let us imagine that our part-based tracker is correctly tracking an object. Fig. 2.6(a) shows the state of the system in such a situation. As a certain degree of deformation is acceptable, the energy of its connections will typically be different from zero. However, we will assume it to remain relatively stable and similar across connections as long as the system is correctly tracking the desired object. Next, let us imagine that a partial occlusion occurs, generating a cloud of events that does not correspond to the tracked object. As a first step, let us imagine that a single tracker B^{(i)} starts following this wrong cloud of events, while the rest of the trackers are unaffected. Fig. 2.6(b) shows the resulting situation: in this case, the energy of all the connections C^{(ij)} binding B^{(i)} will grow to be much bigger than the rest. Thus, when the energy of all the connections binding a certain tracker is bigger than a threshold, we will make this tracker stop following events. As we wish to allow stable growth of the elastic energy, this threshold is defined as proportional to the mean elastic energy of all the connections. The criterion is therefore expressed by:
where Ē_k represents the mean elastic energy of all the connections and s is a positive scaling factor. An insensitive tracker cannot have events assigned to it. As this condition is evaluated for every incoming event, the tracker will remain insensitive until the energy of one of its connections drops below the threshold.
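The criterion reduces to a simple predicate; in this sketch, `connection_energies` holds the elastic energies of the connections binding one tracker, and the default `s=2.0` is an illustrative value, not one from the thesis:

```python
def is_insensitive(connection_energies, mean_energy, s=2.0):
    """True if every connection binding the tracker exceeds s times the mean energy."""
    threshold = s * mean_energy
    return all(e > threshold for e in connection_energies)
```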
Figure 2.6: (a) The system is correctly tracking an object, until an occlusion occurs. (b) If this occlusion attracts only one tracker B^{(i)}, the energy of every connection linking this tracker will grow to be bigger than the rest. (c) If the elastic energy of every connection linking the tracker B^{(i)} gets bigger than a threshold (defined as proportional to the mean elastic energy), then the tracker becomes insensitive. This means that no event can be assigned to the tracker. Consequently, it will be driven exclusively by its connections, and quickly recover its equilibrium position relative to its neighbors.
As a consequence of the tracker being insensitive to the incoming events, it will be driven exclusively by the spring-like connections. This will cause it to quickly recover its equilibrium position relative to its neighbors, typically finding the desired object again.
Preventing a group of trackers from following the wrong cloud of events
The second mechanism is designed to prevent a group of trackers from following the wrong cloud of events. Fig. 2.7(a) shows the same system as in the previous case, correctly tracking the desired object. In this case, however, the cloud of events generated by the partial occlusion attracts two trackers B^{(i)} and B^{(j)}. As we can see in Fig. 2.7(b), the previous mechanism will not be activated in this situation, as the energy of the connection binding B^{(i)} and B^{(j)} remains stable.
Figure 2.7: (a) The system is correctly tracking an object, until an occlusion occurs. In this case, we suppose the occlusion to attract two trackers B^{(i)} and B^{(j)}. (b) Here, E_k^{(ij)} remains stable, and the previous mechanism will not be activated. However, the energy of the rest of the connections linking these trackers will grow beyond the energy threshold. (c) If the elastic energy E_k^{(ik)} of any connection C^{(ik)} gets bigger than a threshold (defined as proportional to the mean elastic energy), then both trackers B^{(i)} and B^{(k)} linked by the connection are attracted towards their equilibrium positions, relative to the center of mass of the set of feature trackers.
Let G_k denote the coordinates of the center of mass of the set of trackers at time t_k, and let ΔG_0^{(i)} = μ_0^{(i)} − G_0 represent the difference between a generic tracker B^{(i)} and the center of mass at the initial equilibrium position (see Fig. 2.7(a)). If the energy E_k^{(ij)} of a connection C^{(ij)} is bigger than a certain multiple of the mean elastic energy, then we displace both trackers B^{(i)} and B^{(j)} towards their equilibrium positions, relative to the current position of the center of mass. In the same way as for the spring-like connections, the displacements applied to the trackers are proportional to the distance to the equilibrium position (relative, in this case, to the center of mass).
where α^{(en)} is the proportionality factor, equivalent to the stiffness of our spring-like connections. This mechanism is in fact quite similar to that of the spring-like connections. However, there is a fundamental difference: it is only applied when the connections surpass the energy threshold, and its equilibrium distance is defined relative to the center of mass.
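This recovery step can be sketched as follows; `dG0` is the tracker's equilibrium offset from the center of mass, `G_k` the current center of mass, and `alpha_en` plays the role of the proportionality factor (the names and the default value are ours):

```python
import numpy as np

def recover_toward_com(mu, dG0, G_k, alpha_en=0.1):
    """Pull one tracker toward its equilibrium position G_k + dG0."""
    target = G_k + dG0                    # equilibrium position relative to G_k
    return mu + alpha_en * (target - mu)  # displacement proportional to the distance
```

Applied to both trackers of an over-energized connection, this draws them back toward the shape's current layout rather than toward fixed absolute positions.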
Let us note that the parameter s has a strong impact on the behavior of the system, and it should be chosen carefully. A detailed study of its impact will be presented in the next section.
Table of contents:
2 Asynchronous Part-based Shape Tracking
2.2 Stream of Visual Events
2.3 Part-Based Shape Tracking
2.3.1 Gaussian Blob Trackers
2.3.2 Spring-Like Connections
2.3.3 Using the energy as a matching criterion
2.3.5 Global algorithm
2.4.1 Tracking a planar grid
2.4.2 Face tracking
2.4.3 Computational Time
3 Event-Based Line and Segment Detection
3.2 Event-Based Line Detection
3.2.1 Event-Based Visual Flow
3.2.2 Event-Based Least-Squares Line Fitting
3.2.3 Optimization strategy
3.2.4 Event-Based Line Detection Algorithm
3.3 Event-Based Segment Detection
3.3.1 Activity of each pixel
3.3.2 Generation of Endpoint Events
3.3.3 Event-Based Segment Detection Algorithm
3.4.1 Controlled scene
3.4.2 Urban scenes
3.4.3 Computational time
4 Event-Based 3D Pose Estimation
4.2 Event-based 3D pose estimation
4.2.1 Problem formulation
4.2.2 Rotation formalisms
4.2.3 2D edge selection
4.2.4 3D matching
4.2.5 Rigid motion estimation
4.2.6 Global algorithm
4.3.3 2D matching using Gabor events
4.3.4 Fast spinning object
4.3.5 Degraded temporal resolution
4.3.6 Computational time
5 An Event-Based Solution to the Perspective-n-Point Problem
5.2 Event-based solution to the PnP problem
5.2.1 Problem Description
5.2.2 Rotation formalisms
5.2.3 Object-space collinearity error
5.2.4 Optimal Translation
5.2.6 Global algorithm
5.3.1 Synthetic scene
5.3.2 Real recordings
5.3.3 Computational time
6 Conclusion and Future Perspectives
6.2 Future Perspectives