Addressing the scenario of free-standing conversational groups, Haritaoglu and Flickner proposed a monocular real-time computer vision system for identifying shopping groups. First, silhouettes are identified in the image. People waiting in a checkout line or at a service counter are then grouped into shopping groups by analyzing their interbody distances. The system also monitors the cashier’s activities to determine when shopping transactions start and end. It is an ad hoc system, implemented without considering any sociological background.
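The idea of grouping people by interbody distance can be sketched as follows. This is a minimal illustration, not the system of Haritaoglu and Flickner: the function name `group_by_distance`, the threshold `max_gap` and the sample positions are assumptions, and people are simply merged whenever a chain of pairwise distances stays below the threshold.

```python
from itertools import combinations
from math import dist


def group_by_distance(positions, max_gap):
    """Group people connected by a chain of distances <= max_gap.

    positions: list of (x, y) tuples, one per detected person.
    max_gap: hypothetical distance threshold, same units as positions.
    Returns a sorted list of index lists, one per group.
    """
    parent = list(range(len(positions)))

    def find(i):
        # Follow parent pointers to the representative, halving the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair of people standing close enough to each other.
    for i, j in combinations(range(len(positions)), 2):
        if dist(positions[i], positions[j]) <= max_gap:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(positions)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Illustrative positions in meters (assumed data).
people = [(0.0, 0.0), (0.5, 0.1), (3.0, 0.0), (3.4, 0.2), (8.0, 0.0)]
groups = group_by_distance(people, max_gap=1.0)
```

With these assumed values, persons 0 and 1 form one group, persons 2 and 3 another, and person 4 stays alone. A single fixed threshold is the simplest possible choice and ignores body orientation entirely.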
Bazzani introduces person relationships based on what he calls the Subjective View Frustum (SVF), which is a 3D geometric representation of the VFOA (i.e. the space a person is able to see). Each person has an SVF; the method then analyzes the intersections of all the SVFs over a period of time in order to suggest possible social interactions. This method is considerably affected by head movements.
Vázquez et al. used lower-body pose tracking as the input for these algorithms. They define a distribution for every subject in the scene, mix these functions, and use the Hessian of the mixture to localize the centers of groups. A person's stride is used to compute these functions in order to find an O-space. Using a fixed stride value may be inconvenient when different subjects appear in a scenario. Figure 2.5a depicts the contours of the proposed function together with the calculation of new O-spaces. This work is classified as dynamic because it relies on data obtained by a real-time tracking system. However, the results are computed on a dataset whose sampling rate is 0.2 Hz.
Lau clusters people with a multi-model hypothesis analysis. The perception system is based on a SICK laser; a supervised learning algorithm detects which points compose a person. However, the orientation of each person is noisy because it is derived from the person's current and previous positions. This work tracks and reasons about multiple social grouping hypotheses in a recursive way. Figure 2.5b shows an example in which a group splits into two groups.
Navigating within Humans and Motion Planners
In robot navigation, planners usually minimize the time or distance needed to go from point A to point B. This minimization takes into account the robot's geometry and its constraints (e.g. walls). Widely known methods for this kind of navigation are occupancy grid mapping and potential fields. However, these criteria are often not sufficient for social navigation, because the robot also needs to respect the private and social spaces of a person or group of people.
One of the earliest works on human-aware robot navigation was developed by Tadokoro et al. They use a grid to represent the steps that the robot takes toward the goal while it tries to keep the safety risk for the human low. Other pioneering work, based on a Partially Observable Markov Decision Process (POMDP), is that of Foka and Trahanias, who use prediction of obstacles. The weak points of Foka’s work are that the POMDP is computationally expensive and that the environment was only simulated.
In another work, Arras et al. developed a robot to perform exhibitions at the Swiss National Exhibition Expo-02. However, their focus is primarily the localization of the robot; navigation is performed through waypoints, and the robot stops when it detects an obstacle.
In more recent theory, we find the Human Aware Motion Planner (HAMP) developed in . Its authors state that a social motion planner must not only provide safe robot paths, but also synthesize socially acceptable and legible paths. HAMP is a general Human Robot Interaction framework that considers the safety and comfort of people. Subsequently, an extension of this framework was developed by . In this extension, human actions can be considered in order to help the robot accomplish its goal: for example, the person can move to let the robot pass, whereas otherwise the robot may be forced to take a long path or may not be able to reach its goal. Analyzing the HAMP methodology further, the left side of Figure 2.6a presents a safety cost around a person. In a trajectory planner this cost means the following: the closer the robot is to a person, the higher the cost of being in that position. Thus, the safety of a person is ensured by not letting the robot come unreasonably near. The authors also propose a visibility cost that serves an analogous purpose: instead of avoiding positions near the person, it avoids positions that the person is not able to see.
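A distance-based safety cost of this kind can be sketched as below. The Gaussian shape, the parameters `sigma` and `max_range`, and the function name `safety_cost` are assumptions chosen for illustration; the text only states that the cost grows as the robot gets closer to the person.

```python
from math import exp, hypot


def safety_cost(robot_xy, person_xy, sigma=1.0, max_range=3.0):
    """Cost in [0, 1] that increases as the robot approaches the person.

    sigma (spread, in meters) and max_range (cut-off distance, in meters)
    are illustrative assumptions, not values taken from HAMP.
    """
    d = hypot(robot_xy[0] - person_xy[0], robot_xy[1] - person_xy[1])
    if d > max_range:
        return 0.0  # far enough away: no safety penalty
    return exp(-(d * d) / (2.0 * sigma * sigma))


person = (0.0, 0.0)
near = safety_cost((0.5, 0.0), person)  # robot 0.5 m from the person
far = safety_cost((2.0, 0.0), person)   # robot 2.0 m from the person
```

A planner that adds this value to the cost of each candidate position will prefer the farther position, since `near` is larger than `far`. A visibility cost could follow the same pattern with the angle to the person's gaze direction in place of the distance.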
Naive Global Planner
The result of the IRL provides the rewards of the MDP. By applying the optimal navigation policy in this MDP, the robot moves along the sequence of states that forms the optimal trajectory to approach a person. Each state (i.e. a cell in the representation described in the previous section) is represented by its center. As a result, the trajectory is a discontinuous line, shown in green in Figure 3.8a. We hence need to smooth this trajectory, taking into account the robot orientation and the human orientation. The smoothing process is described next, and its result is also shown in Figure 3.8a. These trajectories form the global plan; nonetheless, they do not take into account other constraints such as obstacle avoidance.
Layered Costmap Navigation
After the learning process, the weight vector w is set. One important aspect is that Φ(s, a) = Φ(s) and that s is represented by spatial features. Thus, a costmap can be generated in the environment. The cost of an area around the person is calculated by normalizing R(s) = w^T Φ(s) over all the coordinates in the map. To do so, s must be translated to the polar coordinates of the human frame. Then, based on , the cost is passed to the upper layer for every cell whose value is higher than the one already set in the upper layer. An implementation of Dijkstra’s algorithm is then used to calculate the best path. The goal position of the planner is the position at which the maximum reward is found in the costmap. Finally, the goal orientation is set toward the human.
Figure 3.9 shows the costmap that results from the weighted values of R(s), obtained by applying IRL to the demonstrations given in Figure 3.7. This is feasible because the features are represented as continuous functions: even though the states are discrete, the distance and angle coordinates take values in ℝ.
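The costmap construction described in this section can be sketched as follows. This is a minimal illustration under stated assumptions: the two features in `features`, the weights, the grid size and the min-max normalization are hypothetical choices, since the actual feature set and normalization are not given here, and the path search with Dijkstra’s algorithm is omitted.

```python
from math import atan2, cos, exp, hypot, pi


def to_human_frame(x, y, human):
    """Polar coordinates (distance, angle) of map point (x, y) in the human frame.

    human is (hx, hy, heading), with heading in radians in the map frame.
    """
    hx, hy, heading = human
    dx, dy = x - hx, y - hy
    # Wrap the relative bearing to [-pi, pi).
    angle = (atan2(dy, dx) - heading + pi) % (2.0 * pi) - pi
    return hypot(dx, dy), angle


def features(d, angle):
    """Hypothetical continuous features Phi(s); not the thesis feature set."""
    return [exp(-((d - 1.2) ** 2)), cos(angle)]


def reward_layer(w, human, rows, cols, resolution):
    """Evaluate R(s) = w^T Phi(s) per cell and min-max normalize to [0, 1]."""
    raw = []
    for r in range(rows):
        row = []
        for c in range(cols):
            phi = features(*to_human_frame(c * resolution, r * resolution, human))
            row.append(sum(wi * fi for wi, fi in zip(w, phi)))
        raw.append(row)
    lo = min(min(row) for row in raw)
    hi = max(max(row) for row in raw)
    span = (hi - lo) or 1.0  # avoid dividing by zero on a flat layer
    return [[(v - lo) / span for v in row] for row in raw]


def merge_max(upper, layer):
    """Keep, per cell, the higher of the upper-layer value and the new value."""
    return [[max(u, v) for u, v in zip(urow, lrow)]
            for urow, lrow in zip(upper, layer)]


def goal_pose(layer, human, resolution):
    """Cell holding the maximum value, with a yaw that faces the human."""
    r, c = max(
        ((r, c) for r in range(len(layer)) for c in range(len(layer[0]))),
        key=lambda rc: layer[rc[0]][rc[1]],
    )
    yaw = atan2(human[1] - r * resolution, human[0] - c * resolution)
    return (r, c), yaw


human = (2.0, 2.0, 0.0)  # assumed pose: at (2 m, 2 m), facing +x
layer = reward_layer([1.0, 0.5], human, rows=41, cols=41, resolution=0.1)
master = merge_max([[0.0] * 41 for _ in range(41)], layer)
goal_cell, goal_yaw = goal_pose(master, human, 0.1)
```

With these assumed features the maximum lies 1.2 m in front of the human, so the goal cell is on the human's heading axis and the goal yaw points back at the human. Because the features are continuous functions of distance and angle, the same weights can be evaluated at any grid resolution.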
Proposed algorithms for group detection
In this work, we use the definition of gathering in public places provided by Goffman: a gathering consists of any set of two or more individuals in mutual presence at a given moment who are having some form of social interaction. We argue that this definition is particularly suitable when a robot has to perform group detection tasks, considering that a robot with on-board cameras and a laser is able to perceive and recognize people based on state-of-the-art computer vision techniques.
Two algorithms were conceived and developed. The first one, the Link Method, relies on evaluating, at each instant of time, the graph of possible connections between pairs of people in the scene. Its time parameters are inspired by Ebbinghaus’s forgetting curve. The novelty of this approach is that it merges dynamic and static analysis for group detection. The second algorithm, the Interpersonal Synchrony Method, is grounded on the hypothesis of Fiske and Lakens. This hypothesis asserts that interpersonal synchrony is an antecedent of entitativity, that is, the degree to which a collection of people is perceived as a group (Campbell). The following subsections detail the methods we propose.
The Link Method is performed in three steps:
1. Static Analysis: subdivided into Link Method Simple and Link Method Gauss, this step is inspired by proxemics and processes the data about the people acting in the scene.
2. Dynamic Analysis: this step is inspired by Ebbinghaus’s forgetting curve.
3. Forming Groups from Pairs: this step clusters people into groups.
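The dynamic analysis and the pair-to-group step can be sketched as below. This is an illustration under assumptions, not the Link Method itself: it uses the exponential form R = exp(-t / S) commonly associated with the forgetting curve, and the names `LinkMemory`, `stability`, `threshold` and `groups_from_pairs` are hypothetical. The pairs passed to `observe` stand in for the output of the static analysis.

```python
from math import exp


class LinkMemory:
    """Decaying memory of pair links, R = exp(-t / S) (illustrative)."""

    def __init__(self, stability, threshold):
        self.stability = stability  # S: larger values forget more slowly
        self.threshold = threshold  # links below this retention are dropped
        self.last_seen = {}         # (a, b) -> time the pair was last linked

    def observe(self, pairs, now):
        """Record pairs that the static analysis linked at time `now`."""
        for a, b in pairs:
            self.last_seen[tuple(sorted((a, b)))] = now

    def active_links(self, now):
        """Pairs whose retention is still at or above the threshold."""
        return sorted(
            pair for pair, t in self.last_seen.items()
            if exp(-(now - t) / self.stability) >= self.threshold
        )


def groups_from_pairs(pairs):
    """Cluster people into groups as connected components of the link graph."""
    adjacency = {}
    for a, b in pairs:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, groups = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, members = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            members.append(node)
            for nxt in adjacency[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(sorted(members))
    return groups


memory = LinkMemory(stability=2.0, threshold=0.5)
memory.observe([("ann", "bob"), ("bob", "cy")], now=0.0)
early = groups_from_pairs(memory.active_links(now=1.0))
memory.observe([("ann", "bob")], now=3.0)  # bob and cy are no longer linked
late = groups_from_pairs(memory.active_links(now=4.0))
```

In this assumed scenario all three people form one group at t = 1, while at t = 4 the link between bob and cy has decayed below the threshold and only ann and bob remain grouped. The decay lets a link survive a short gap in the static detections without persisting indefinitely.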
Table of contents:
Acknowledgment and Dedications
Table of Contents
List of Figures
List of Tables
1.2 Summary of Contributions
1.3 Outline of the Dissertation
2 Related Works
2.1 Identification of Groups of People
2.1.1 Static Groups
2.1.2 Dynamic Groups
2.2 Social Navigation
2.2.1 Navigating within Humans and Motion Planners
2.2.2 Approaching Humans
3 Approaching One Person
3.1 Modeling Steps
3.1.1 MDP and IRL
3.1.3 Actions and Transitions
3.1.4 Feature Representation
3.2 Adapting IRL Results
3.2.1 Naive Global Planner
3.2.2 Layered Costmap Navigation
3.3 Experimental Results
3.3.1 Experimental Setup
4 Analysis of Groups of People
4.1 Proposed algorithms for group detection
4.1.1 Link Method
4.1.2 Interpersonal Synchrony Method
4.2 Experimental Evaluation
4.2.1 Data sets
5 Approaching Groups of People
5.1 Environment and Demonstration
5.2 Graph Representation
5.2.1 Computation of Vertices
5.2.2 Computation of Edges
5.3 Generalization of States and Rewards
5.3.1 Social Features
5.3.2 Non-social Features
6 Conclusions and Outlook
6.1.1 Main contributions
A Résumé (French)
A.1 État de l’art
A.1.1 Identification de groupes de personnes
A.1.2 La navigation sociale
A.2 Analyse de groupes
A.2.1 Modèles proposés
A.3 Comment approcher une personne
A.4 Comment approcher un groupe de personnes
A.4.2 Modèle d’apprentissage
A.4.3 Généralisation de l’état