The EXPO-sitting coding scheme annotates the position of each body part separately. Each posture annotation therefore describes only one aspect of the body posture, for instance that the trunk is leaning forward or that the left arm is raised at shoulder level. A posture observation is thus represented as a vector of elementary posture annotations, one per body part.
We wanted to investigate the most common whole-body postures that the subjects take up, so-called posture types. A posture type refers to a typical posture generalized from the posture observations.
We exported 19,183 frames from Anvil corresponding to the posture annotations of 15 minutes of a single subject (CM). We then reduced these 19,183 instances to a set of significant postures according to the two criteria below:
Exclude the frames in which there is no annotation in any track.
Exclude the successive frames that contain exactly the same annotations.
These criteria are inspired by those of the Posture Scoring System (Bull 1987). Bull emphasized that if the speaker moves out of one posture without establishing a different posture and then returns to the original posture, the time spent moving should be excluded from the total duration of that posture. For that reason, we excluded the frames in which there was movement in any body part.
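The two filtering criteria can be sketched as a single pass over the exported frames. This is a minimal illustration only: the frame representation (one dictionary per frame, one key per annotation track) and the track names are assumptions, not the actual Anvil export format.

```python
def significant_postures(frames):
    """Keep only frames that are fully annotated in every track and
    that differ from the previously kept frame (criteria 1 and 2)."""
    kept = []
    for frame in frames:
        # Criterion 1: exclude frames with no annotation in some track.
        if any(value is None for value in frame.values()):
            continue
        # Criterion 2: exclude successive frames with identical annotations.
        if kept and frame == kept[-1]:
            continue
        kept.append(frame)
    return kept

# Illustrative frames (track names are hypothetical):
frames = [
    {"trunk": "lean-forward", "left_arm": "raised"},
    {"trunk": "lean-forward", "left_arm": "raised"},  # duplicate -> dropped
    {"trunk": None, "left_arm": "raised"},            # missing track -> dropped
    {"trunk": "upright", "left_arm": "down"},
]
print(len(significant_postures(frames)))  # -> 2
```

Applied to the full export, such a pass reduces the 19,183 raw frames to the set of significant postures.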
The extracted data formed a matrix of 89 frames by 25 features. The 25 features comprised 24 attributes from the EXPO coding scheme plus one posture-type feature labeling the recurrent posture to be identified.
A scheme for coding gesture space
The focus of this thesis is not only on posture but also on how space is used during bodily interaction. We did not find any multimodal corpus study that defines and applies a scheme for manually annotating gesture space during conversations. We therefore defined our own gesture-space coding scheme based on the textual and graphical descriptions of gesture space proposed by (McNeill and Duncan 2000). The gesture space is divided into four regions (center-center, center, periphery, and extreme periphery) and twelve coordinates (no coordinate, right, left, left-and-right (both hands), upper right, upper left, lower right, lower left, upper, lower, upper left-right, lower left-right). We used McNeill's diagram to define our coding scheme (Figure 14). These attributes were annotated independently for the two hands; the left-and-right coordinate is used to code a gesture produced with both hands.
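The region and coordinate inventories above can be captured as simple validated data structures. This is a hypothetical sketch of the scheme as code, not the annotation tool's API; the function and field names are illustrative.

```python
# Inventories taken from the gesture-space coding scheme described above.
REGIONS = ("center-center", "center", "periphery", "extreme periphery")
COORDINATES = (
    "no coordinate", "right", "left", "left-and-right",
    "upper right", "upper left", "lower right", "lower left",
    "upper", "lower", "upper left-right", "lower left-right",
)

def code_hand(region, coordinate):
    """Validate and build one hand's gesture-space annotation."""
    if region not in REGIONS:
        raise ValueError(f"unknown region: {region}")
    if coordinate not in COORDINATES:
        raise ValueError(f"unknown coordinate: {coordinate}")
    return {"region": region, "coordinate": coordinate}

# Each hand is annotated independently; a two-handed gesture uses
# the left-and-right coordinate on both hands.
annotation = {
    "left_hand": code_hand("center", "left-and-right"),
    "right_hand": code_hand("center", "left-and-right"),
}
```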
Thesis contribution: Estimating postural convergence during dyadic conversations
The section on related work explained why the phenomenon of convergence might be of interest for the study of postures and space. Yet, we did not find any adequate corpus and coding schemes to enable such studies. This section explains how we exploited the CID video corpus and how we defined coding schemes that we applied for studying postural convergence.
A formal definition of postural convergence
As explained previously, the number of postural segments and their durations are similar for the two subjects AB and CM. We wanted to investigate whether this might be explained by a phenomenon of postural convergence between the two subjects.
During a preliminary observation of the video, we found several cases in which the two subjects display similar postures, as illustrated in Figure 12. In the first image (top left), the speaker raises her left arm and puts it on top of her head. Half a second later, as shown in the second image (top right), the listener prepares to put her right arm on her head, and this position is maintained for more than one second. This illustrates what we call postural convergence.
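The example above suggests a simple operationalization: one subject adopts a posture similar to the other's within a short lag and holds it for some minimum duration. The sketch below assumes posture segments with `posture`, `start`, and `end` fields and illustrative threshold values; it is not the formal definition used in the thesis.

```python
def converges(seg_a, segs_b, max_lag=2.0, min_hold=1.0):
    """True if some segment of subject B shows the same posture as
    seg_a, starts within max_lag seconds after seg_a starts, and is
    held for at least min_hold seconds.  Thresholds are hypothetical."""
    for seg in segs_b:
        lag = seg["start"] - seg_a["start"]
        if (seg["posture"] == seg_a["posture"]
                and 0.0 <= lag <= max_lag
                and seg["end"] - seg["start"] >= min_hold):
            return True
    return False

# The Figure 12 example: B mirrors A's posture half a second later
# and maintains it for more than one second.
a = {"posture": "arm-on-head", "start": 10.0, "end": 13.0}
b = [{"posture": "arm-on-head", "start": 10.5, "end": 12.0}]
print(converges(a, b))  # -> True
```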
Relations between the speaker / listener role and the postures
Our definition of postural convergence requires information about the speaking role of each subject (i.e. speaker or listener). Such annotations enable us to investigate whether convergence occurs in the postural expressions taken up by two subjects during a conversation. We therefore annotated one hour of the CID corpus in terms of speaker/listener roles. The speaker/listener role is based on the concept of floor, which refers to the right of one member to speak in preference to other members (Bavelas, Coates et al. 2000). We focus only on this notion of speaking role and do not consider more sophisticated functions such as backchannels (the verbal and nonverbal listener responses) or turn-taking (turn construction and turn allocation) (Sacks, Schegloff et al. 1974; Allwood 1999). Our hypothesis is that the speaker/listener roles are sufficient to investigate postural convergence. Thus, backchannel behaviors occurring during a listening turn were not coded in the annotations described below (i.e. they were included in a segment annotated as listener).
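Such role annotations can be represented as time segments per subject, so that each subject's role can be looked up at any point in the recording. This is a minimal sketch with invented segment times; the segment format is an assumption, not the actual annotation export.

```python
# Hypothetical speaker/listener segments for one subject, as
# (role, start_s, end_s) triples.  Backchannels are folded into the
# listener segments, as in the annotation scheme described above.
roles_cm = [
    ("speaker", 0.0, 12.4),
    ("listener", 12.4, 20.1),
]

def role_at(segments, t):
    """Return the annotated role covering time t, or None."""
    for role, start, end in segments:
        if start <= t < end:
            return role
    return None

print(role_at(roles_cm, 15.0))  # -> listener
```

Combining the role track with the posture track then makes it possible to ask whether convergence flows from speaker to listener, as in the Figure 12 example.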
Table of contents
I.1 Related work and limitations of current systems
I.2 Thesis scope and objectives
I.4 Collaboration projects
I.5 Outline of the thesis
II. Coding and analyzing static postures of humans
II.1 Related work
II.2 Thesis contribution: the EXPO-sitting scheme for coding whole body static postures during seated dyadic conversations
II.3 Thesis contribution: Estimating postural convergence during dyadic conversations
II.4 Thesis contribution: the EXPO-standing scheme for coding standing leg postures
III. Bodily expressions in virtual characters: Application to Affective Interaction
III.1 Related Work
III.2 Thesis Contribution: Designing static postural expressions of action tendencies
III.3 Thesis Contribution: Evaluating the static postural expressions of action tendencies
III.4 Thesis Contribution: Designing dynamic postural expressions of action tendencies
III.5 Thesis contribution: Evaluating the dynamic postural expressions of action tendencies
IV. Bodily Expressions in Virtual Characters: Application to Ambient Interaction
IV.1 Related work
IV.2 Designing and evaluating a location-aware virtual character in Ambient Interaction
V. Conclusion and Future Directions
V.1 Research contributions
V.2 Practical contributions
V.3 Future directions