User Study on Human Teaching Behavior Towards Robots in a Sensorimotor Task 


Social- and Task Channel

[Figure: tutor and learner, each acting as both provider and recipient, exchange information over two channels: the social channel and the task channel.]

According to the Shannon-Weaver model (see Section 2.2), the channel is an essential part of communication. While the channel definition of the Shannon-Weaver model is rather technical, it is also useful in a social context. It is therefore worth taking a closer look at the properties of the channels available in a human-robot interaction, in particular in a learning setting. In this section, the focus lies on a theoretical perspective on these channels.
Sigaud et al. (2021) conceptualize social learning processes (Bandura and McClelland, 1977), more specifically interactions between a tutor and a learner, as a mutual exchange process using two main communication channels: the social channel and the task channel, as shown in the figure above. Interaction via the task channel includes recognizing pedagogical signals (see Section 4.2), learning from demonstration (see Section 3.5), observational learning (Varni et al., 1979; Meltzoff, 1999; Burke et al., 2010) and inference from indirect goal-related signals (Bobu et al., 2020; Reddy et al., 2021). Except for tasks that explicitly include speech, these signals are nonverbal.
Interaction via the social channel includes feedback, instructions (see Section 3.3) and gaze (Nomikou et al., 2016; Fournier, Sigaud, and Chetouani, 2017).
Even if it is not always called a channel, the idea of relying on a task dimension as well as a social dimension of an interaction is present in other work too. For example, in Castellano, Pereira, et al. (2009), children play chess with a robot companion called iCat. User engagement is detected using task features (e.g. the game state) as well as social features (e.g. the user smiling at or looking at the iCat, and the iCat displaying affective reactions). Similarly, Leclère et al. (2016) use task and social features to distinguish, in the context of mother-infant interaction, dyads at high risk of neglect from dyads at low risk. Ivaldi et al. (2014) apply related ideas to the interaction of a human with an iCub, a humanoid robot: a social cue (gaze) is combined with task information (the color of an object) to teach the robot the colors of objects. Notably, the participants indicated that they would like to see improved behaviors even when these were not task related.
A concept that explicitly conveys pedagogical intentions over the task channel is SMC (see Section 2.6). Interaction via the social channel includes learning from feedback, instructions, joint attention and engagement. These signals can be verbal, e.g. feedback, or non-verbal, like joint attention established by gaze following. While research on robot learning usually makes use of these channels, research explicitly on channel usage is rather sparse.
While not explicitly calling it a channel, Ho, Littman, Cushman, et al. (2015) and Ho, Cushman, et al. (2019) provide research on how people use evaluative feedback, namely rewards and punishments. They investigate whether evaluative feedback should be interpreted as reinforcement meant to shape the learner, or as communication meant to make the learner reason about the tutor's pedagogical goals. They come to the conclusion that people have a strong bias to use evaluative feedback as communication rather than as reinforcement.
Interesting insights into how people use the available channels for giving feedback to robots can be found in Thomaz and Breazeal (2008) and Thomaz and Breazeal (2006b). These works introduce the Sophie's Kitchen framework, in which people were asked to teach a reinforcement learning agent how to bake a cake. The studies show that people use the reward channel not only for rewards, but also for future-directed guidance. The introduction of an explicit guidance channel sped up the learning.
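To make the difference between the two channels concrete, the following sketch shows one way a guidance channel can be combined with a reward channel in a tabular Q-learner. This is a minimal illustration of our own, not the Sophie's Kitchen implementation; all names and hyperparameters are illustrative.

```python
import random
from collections import defaultdict

class GuidedQLearner:
    """Q-learning sketch with a human reward channel and a separate
    guidance channel that restricts which actions are explored."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.2):
        self.q = defaultdict(float)  # (state, action) -> value estimate
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, state, guidance=None):
        # Guidance channel: the tutor may pre-select a subset of actions.
        candidates = guidance if guidance else self.actions
        if random.random() < self.epsilon:
            return random.choice(candidates)
        return max(candidates, key=lambda a: self.q[(state, a)])

    def update(self, state, action, human_reward, next_state):
        # Reward channel: the human's evaluative feedback is treated as
        # the reward signal in a standard Q-learning update.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = human_reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
```

In this reading, guidance narrows exploration before an action is taken, while evaluative feedback only affects the value estimate after the fact, mirroring the future-directed versus past-directed distinction described above.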
We see that social and task dimensions play an important role in HRI research, and we integrate this idea into our work in the form of the task channel, the social channel, and the combined task and social channel (see Chapter 6).

Ostensive-Inferential Communication

As already explained in Section 2.2, the code model fails to account for the full range of human communication. The main defect of the code model is its descriptive inadequacy: there is more to communication than coding and decoding.
A good example of this is language. The same sentence can express a variety of thoughts, depending on the context and the relation between the communication partners. Comprehending what is being said requires more than just decoding a linguistic signal. Take, for example, the utterance « Do you know what time it is? ». It could be a genuine question asked to find out the time. It could likewise be used to express that the speaker is quite annoyed about being called very late at night. Thus, the meaning of the utterance depends on the context.
Grice (1957) provides an analysis that can be used as a starting point for an inferential model of communication (Sperber and Wilson, 1995): « '[S] meant something by x' is (roughly) equivalent to '[S] intended the utterance of x to produce some effect in an audience by means of recognition of this intention' ».
Thus, the communication succeeds when the hearer not only decodes the linguistic meaning, but also infers what the speaker wants to convey.
While this approach has been criticized (e.g. Searle, 1969) on the grounds that understanding the intention can simply be included in the decoding step, this criticism misses an important point: communication can happen without any code (Sperber and Wilson, 1995). For example, if Mary asks Bob how he is doing and he shows her his packet of painkillers in response, the conveyed message is that he is in pain, without any explicit code being present. Thus, the concept of inferential communication can be used for cases the code model cannot account for. Consequently, the two models complement each other.
Sperber and Wilson (1995) extend the concept of inferential communication to ostensive-inferential communication. They put forward that ostension provides two layers of information: the informative intention and the communicative intention.
The informative intention is the information itself that has been pointed out. The communicative intention is to make mutually manifest that the communicator has the informative intention. Thus, the communicative intention can be seen as a « meta » intention. While in some cases the informative intention can be recognized without recognizing the communicative intention, in general failing to recognize the communicative intention may lead to missing the informative intention. In the next section we turn to SMC, a certain type of ostensive-inferential communication that we consider promising for HRI and that we investigate further in Chapter 7.

Robots as Embodied Agents

In this section we take a closer look at the characteristics that define a robot, because these characteristics are important when addressing the question of how we can enable robots to learn. Robots can be considered a special type of agent. An agent is an entity that is capable of making decisions. While theoretically these decisions could be random, in most cases they will somehow be based on information gathered by, or provided to, the agent. Rational agents try to achieve the best outcome with respect to some objectives. While there might be useful applications of non-rational agents, we will consider agents to be rational.
Following this definition, robots are a certain type of agent: robots are embodied agents. Technically, a robot could be considered a (software) agent that reads and processes information coming from sensors and controls certain hardware. However, we will consider all these parts together as integral parts making up a single robot entity. In this sense, a robot is more than just an agent, and this embodiment comes with its own problems and advantages.
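To illustrate this view of the robot as one entity, the sketch below wires a decision-making core to sensors and actuators. It is a purely illustrative sketch of our own; Agent, EmbodiedAgent and their methods are not an established API.

```python
from abc import ABC, abstractmethod

class Agent(ABC):
    """An agent maps gathered information to decisions."""

    @abstractmethod
    def decide(self, observation):
        """Return an action based on the current observation."""

class EmbodiedAgent(Agent):
    """A robot viewed as a single entity: decision making together with
    the sensors it reads from and the actuators it writes to."""

    def __init__(self, sensors, actuators):
        self.sensors = sensors      # e.g. cameras, joint encoders
        self.actuators = actuators  # e.g. motors, grippers

    def step(self):
        # Sense, decide, act: one cycle of the embodied agent.
        observation = [sensor.read() for sensor in self.sensors]
        action = self.decide(observation)
        for actuator in self.actuators:
            actuator.apply(action)
```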
The most obvious and striking difference between a robot and a virtual agent is that the robot can interact with the real world. The capability of interacting with and manipulating objects in the environment has been used to learn different interesting tasks such as locomotion (Jun Nakanishi et al., 2004), the game « ball in a cup » (Kober, Mohler, and Peters, 2008) and table tennis (Muelling, Kober, and Peters, 2010). The capability to interact with the real world makes robots well suited to automating tiring or even dangerous tasks that would otherwise need to be executed manually by humans. Consequently, such robots are widely deployed in industry.
However, robots can be used not only for automation, but also in a social context. Thus, research on social robots, i.e. robots that are able to communicate and engage in social interactions with humans, has recently received more attention (Fong, Nourbakhsh, and Dautenhahn, 2003; Bütepage and Kragic, 2017; Dautenhahn, 2007). While most striking, the capability to physically interact is not the only advantage of robots over virtual agents. The mere physical presence of a robot can already be advantageous. The work of Leyzberg et al. (2012) shows that using a robot increased learning gains for a human learner in comparison with a purely virtual system. Furthermore, a review on social robots for education (Belpaeme et al., 2018) identifies three advantages of robots over virtual systems. The first two, already mentioned, are the capability to interact with the real world and increased learning gains for the human learner. The third is that users show more social behavior beneficial for learning.
While using robots instead of virtual systems comes with advantages, it also brings its own challenges and problems. These problems are of two different types. The first type comprises problems that directly concern the hardware. The second type comprises restrictions on the software; these restrictions derive from the fact that hardware is used, but do not concern the hardware directly.
One considerable aspect concerning the hardware directly is financial: robotic systems are usually considerably more expensive than virtual systems. Not only are the acquisition costs higher, but so are maintenance costs, since robots are exposed to wear and tear. They can break and malfunction for mechanical reasons, and unfortunately they often do so at inopportune moments. Even if they function properly, conducting robot experiments is time consuming: somebody has to be around to ensure smooth execution and verify that nothing goes wrong. Ensuring that multiple experiments in a row have exactly the same conditions is difficult, and running multiple experiments in parallel is even harder. Furthermore, depending on the robot, malfunctions can be physically dangerous to humans interacting with or operating the robot.
On the software side, we face the problem that typical assumptions made in machine learning often do not hold in robotics. Usually, it can neither be assumed that the true state is fully observable nor that the data is noise free.
In addition, the state and action spaces are high-dimensional and continuous (Kober, Bagnell, and Peters, 2013). While it is possible to simulate the robot, it is quite unrealistic that the real robot will exactly match the simulated behavior. As a consequence, the algorithms being used need to be robust with respect to models that do not capture all details of the real system correctly (Kober, Bagnell, and Peters, 2013).
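One simple way to confront a learning algorithm with these conditions in simulation is to corrupt and truncate the state before the learner sees it. The sketch below assumes a Gym-style environment with reset() and step(); the wrapper and its noise model are illustrative choices of our own, not a standard component.

```python
import numpy as np

class NoisyPartialObservation:
    """Wraps a Gym-style environment so that the learner only sees a
    noisy subset of the true state, mimicking real sensor limitations."""

    def __init__(self, env, visible_dims, noise_std=0.05, seed=0):
        self.env = env
        self.visible_dims = visible_dims  # state indices the "sensors" cover
        self.noise_std = noise_std
        self.rng = np.random.default_rng(seed)

    def _observe(self, state):
        partial = np.asarray(state)[self.visible_dims]  # partial observability
        return partial + self.rng.normal(0.0, self.noise_std, partial.shape)

    def reset(self):
        return self._observe(self.env.reset())

    def step(self, action):
        state, reward, done, info = self.env.step(action)
        return self._observe(state), reward, done, info
```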


Overview of Approaches to Robot Learning

[Figure 3.1: The exploration-control spectrum, ranging from autonomous exploration driven by a reward function, via evaluative feedback, corrective feedback, guidance and instruction, to demonstration under human control; adapted from Najar and Chetouani (2021).]
Having gained a better understanding of the characteristics of a robot, we now turn to an overview of commonly used approaches for enabling robots to learn. These approaches can be located on an exploration-control spectrum (Najar and Chetouani, 2021; Breazeal and Thomaz, 2008), as shown in Fig. 3.1. On the left side of the spectrum we find approaches where the agent learns autonomously, like RL (Sutton and Barto, 1998). RL provides a mathematical framework implementing the idea of trial-and-error learning and has a broad corpus of research, particularly in robotics (Kober, Bagnell, and Peters, 2013). Classical reinforcement learning relies purely on the agent exploring the effects of its actions on the environment. On this side of the spectrum the agent has high autonomy and learns by itself.
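On this autonomous end, learning reduces to the plain agent-environment loop sketched below; the reset/step/act/update interface is illustrative and not tied to a particular library.

```python
def run_episode(env, agent, max_steps=200):
    """Classical RL: the agent explores autonomously and learns only
    from the environment's reward signal, with no human in the loop."""
    state = env.reset()
    for _ in range(max_steps):
        action = agent.act(state)                        # trial ...
        next_state, reward, done = env.step(action)
        agent.update(state, action, reward, next_state)  # ... and error
        if done:
            break
        state = next_state
```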
When moving towards the right of the spectrum, the influence of the human on the learning process increases. Starting from classical reinforcement learning, we move to approaches that integrate feedback that the agent receives from a human tutor on the actions it has taken. These approaches are often combined with RL. However, how to integrate the feedback into the learning algorithm is a research question of its own (e.g. Knox and Stone, 2012b; Li et al., 2019).
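As a minimal illustration of why this integration is non-trivial, the sketch below folds human feedback into the update as a simple additive shaping term. This naive scheme is our own illustration, not the method of the cited works, which study far more careful ways of modeling the human signal.

```python
def shaped_update(agent, state, action, env_reward, human_feedback,
                  next_state, beta=1.0):
    """Naive integration of human evaluative feedback into an RL update:
    treat it as an additive shaping term on the environment reward.
    The weight beta is illustrative; deciding how (and whether) to mix
    the two signals is itself a research question."""
    combined_reward = env_reward + beta * human_feedback
    agent.update(state, action, combined_reward, next_state)
```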
Moving further along the spectrum, we find guidance and instruction. These approaches limit the set of possible actions or suggest optimal actions (Thomaz and Breazeal, 2006a). At the right end of the spectrum we find the idea of demonstrations, implemented in the LfD framework (Argall et al., 2009; Calinon, 2019). LfD is a commonly applied approach for robots learning new skills from humans: a human demonstrator shows the robot how to solve a certain task, and the robot learns from these demonstrations how to solve that particular task.
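One of the simplest instantiations of the LfD idea is behavioral cloning, i.e. supervised learning of a state-to-action mapping from recorded demonstrations. The sketch below uses a nearest-neighbor classifier for discrete actions; it is a minimal baseline of our own choosing, not representative of the full LfD literature.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def learn_from_demonstrations(demo_states, demo_actions, k=3):
    """Behavioral cloning: fit a supervised state -> action mapping
    on (state, action) pairs recorded from a human demonstrator."""
    policy = KNeighborsClassifier(n_neighbors=k)
    policy.fit(np.asarray(demo_states), np.asarray(demo_actions))
    return policy

# The cloned policy then imitates the demonstrator in new states:
# action = policy.predict(state.reshape(1, -1))[0]
```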
Except for classical reinforcement learning, all approaches on the spectrum can be counted among interactive learning methods. In interactive learning approaches, the teaching signals can be conveyed to an agent via a variety of teaching channels, such as natural language (Paléologue et al., 2018; Cruz et al., 2015; Kuhlmann et al., 2004), computer vision (Atkeson and Schaal, 1997; Najar, Sigaud, and Chetouani, 2019), computer code (Maclin et al., 2005; Torrey et al., 2006), artificial interfaces (Abbeel, Coates, and Ng, 2010; Suay and Chernova, 2011; Knox, Stone, and Breazeal, 2013) or physical interaction (Akgun et al., 2012). Najar and Chetouani (2021) identify two main categories of teaching signals based on how they are produced: advice and demonstration. While these teaching signals could use the same channel, they are fundamentally different, as demonstration requires task execution and advice does not. In other words, demonstrations rely mainly (if not exclusively) on the task-channel characteristics of the communication channel, while advice relies mainly on social-channel characteristics (see Section 2.4).
Furthermore, Najar and Chetouani (2021) define advice as « teaching signals that can be communicated by the teacher to the learning system without executing the task ». Based on these considerations, Najar and Chetouani (2021) propose the following taxonomy of advice (rendered as a small data structure after the list):
• General advice can be used to provide prior information about the task before the learning starts. It can be split into general constraints and general instructions.
  - General constraints include information about the task such as domain concepts, behavioral constraints and performance heuristics.
  - General instructions explicitly specify which actions to perform. They can be provided either in the form of if-then rules or as detailed action plans.
• Contextual advice is provided during the task and depends on the current state of the teacher-agent setting. It can be split into guidance and feedback.
  - Guidance informs about future actions. In the most specific sense, it aims at limiting the set of all possible actions to a subset that is favored by the teacher.
  - Contextual instructions are a particular type of guidance where only one action is suggested by the teacher.
  - Feedback informs about past actions taken by the agent. It can be split into corrective and evaluative feedback.
  - Corrective feedback can consist of either a corrective instruction or a corrective demonstration.
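As announced above, the taxonomy can be written down as a small tree. The data structure below merely transcribes the list and adds no content beyond it.

```python
# Taxonomy of advice (Najar and Chetouani, 2021), as listed above.
ADVICE_TAXONOMY = {
    "general advice": {                  # provided before learning starts
        "general constraints": [
            "domain concepts", "behavioral constraints",
            "performance heuristics",
        ],
        "general instructions": ["if-then rules", "detailed action plans"],
    },
    "contextual advice": {               # provided during the task
        "guidance": {
            # special case of guidance: exactly one action is suggested
            "contextual instructions": [],
        },
        "feedback": {                    # informs about past actions
            "corrective feedback": [
                "corrective instruction", "corrective demonstration",
            ],
            "evaluative feedback": [],   # no further split listed above
        },
    },
}
```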

Table of Contents

List of Figures
List of Tables
List of Acronyms
I Introduction 
1 Introduction 
1.1 Motivations
1.2 Research Approach
1.3 Thesis Outline
1.4 Contributions
1.5 Publications
1.6 The Animatas Project
II Background and Related Work 
2 Cognition and Communication 
2.1 Introduction
2.2 The Code Model
2.3 Theory of Mind
2.4 Social- and Task Channel
2.5 Ostensive-Inferential Communication
2.6 Sensorimotor Communication
3 Approaches to Robot Learning 
3.1 Introduction
3.2 Robots as Embodied Agents
3.3 Overview of Approaches to Robot Learning
3.4 Reinforcement Learning
3.5 Learning from Demonstration
4 Teaching Machines and Robots 
4.1 Introduction
4.2 Pedagogy
4.3 Machine Teaching
4.4 Humans Teaching Robots
5 Observer Related Metrics 
5.1 Introduction
5.2 Legibility
5.3 Predictability
III Implementation of Research 
6 Communication Model 
6.1 Introduction
6.2 General Communication Model
6.3 Specific Approach
6.3.1 Specific Model
6.3.2 Model Application to Implemented Research
7 User Study on Human Teaching Behavior Towards Robots in a Sensorimotor Task 
7.1 Introduction
7.2 Study
7.2.1 Overview
7.2.2 Experiment 1
7.2.3 Experiment 2
7.3 Conclusion
8 Augmenting RL with Social Channel Usage 
8.1 Introduction
8.2 Integrating Observer Feedback on Legibility into Interactive RL
8.2.1 Interactive RL
8.2.2 Legibility
8.2.3 Modeling the Observer
8.3 Experiments
8.3.1 Environment 1
8.3.2 Environments 2 – 5
8.4 Discussion
8.5 Conclusion
9 Discussion and Conclusion 
9.1 Summary of Contributions
9.2 General Limitations of the Approach
9.3 Perspectives
9.4 Conclusion
Bibliography 
