Yumi industrial robot learning complex hierarchical tasks on a tangible interactive table 


Multi-task learning by a hierarchical representation

Indeed, an essential component of autonomous, flexible and adaptive robots will be to exploit temporal abstractions, i.e. to treat complex tasks of extended duration (e.g. making a drawing) not as a single skill, but rather as a sequential combination of skills (e.g. grasping the pen, moving the pen to the initial position of the drawing, etc.). Such task decompositions drastically reduce the search space for planning and control, and are fundamental to making complex tasks amenable to learning. This idea can be traced back to the hypothesis posed in Elman, 1993 that learning needs to be progressive and to develop, starting small. It has been reintroduced as curriculum learning in Bengio et al., 2009, formalised in terms of the order of the training dataset: the examples should not be presented randomly but organized in a meaningful order which gradually illustrates more concepts, and gradually more complex ones. For multi-task learning in the reinforcement learning framework, it has been studied as hierarchical reinforcement learning, as introduced in Barto and Mahadevan, 2003, relying on task decomposition or task hierarchy.
Indeed, the relationships between tasks in a task hierarchy have been successfully exploited for learning tool use (Forestier and Oudeyer, 2016) or learning inverse models for parameterized motion primitives (Reinhart, 2017), allowing the robot to reuse previously learned tasks to build more complex ones. As opposed to classical methods enabling robots to learn tool use, such as (Brown and Sammut, 2012) or (Schillaci, Hafner, and Lara, 2012), which consider tools as objects with affordances to learn using a symbolic representation, (Forestier and Oudeyer, 2016) does not require this formalism and learns tool use with simply parametrized skills, leveraging a pre-defined task hierarchy. Barto, Konidaris, and Vigorito (2013) showed that building complex actions out of lower-level actions according to the task hierarchy can bootstrap exploration by reaching interesting outcomes more rapidly. Temporal abstraction has also proven to enhance the learning efficiency of a deep reinforcement learner in Kulkarni et al. (2016).
Taking a different approach, (Arie et al., 2012) also showed that composing primitive actions through observation of a human teacher enables a robot to build sequences of actions to perform object manipulation tasks. This approach relies on neuroscience modelling of mirror neuron systems. From the computational neuroscience point of view, for sequence-learning tasks with trial-and-error, Hikosaka et al. (1999) suggested that procedural learning proceeds as a gradual transition from a spatial sequence to a motor sequence, based on observations that the brain uses two parallel learning processes to learn action sequences: a spatial sequence (goal-oriented, task space) mechanism and a motor sequence (action space) mechanism. Each of the acquired motor sequences can also be used as an element of a more complex sequence.

Strategic Intrinsically Motivated learner

To tackle this problem of learning to reach fields of outcomes using sequences of motor actions, I consider the family of strategic learning algorithms. These strategic learners propose an active learning architecture able to decide when, what and how to learn at any given time. When to learn refers to the time at which the agent will learn, what to learn refers to the outcomes to focus on, and how to learn refers to the method used to learn to reach those outcomes, called the strategy. More particularly, I focused on the branch of intrinsically motivated algorithms that started with the Self-Adaptive Goal Generation – Robust Intelligent Adaptive Curiosity (SAGG-RIAC) algorithm.
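The "what" and "how" decisions of such a strategic learner can be sketched as a choice over (outcome space, strategy) pairs driven by empirical competence progress. This is a minimal illustration, not the actual architecture: the class name, the exponential smoothing factor and the epsilon-greedy exploration are all assumptions for the sake of the example.

```python
import random
from collections import defaultdict

class StrategicLearner:
    """Minimal sketch of a strategic learner: it tracks competence
    progress for each (outcome space, strategy) pair and favours the
    pair showing the highest recent progress."""

    def __init__(self, outcome_spaces, strategies, epsilon=0.2):
        self.outcome_spaces = outcome_spaces
        self.strategies = strategies
        self.epsilon = epsilon              # fraction of random exploratory choices
        self.progress = defaultdict(float)  # (space, strategy) -> smoothed progress

    def choose(self):
        """Decide *what* to learn (outcome space) and *how* (strategy)."""
        pairs = [(o, s) for o in self.outcome_spaces for s in self.strategies]
        if random.random() < self.epsilon:
            return random.choice(pairs)     # occasional random pick to keep exploring
        return max(pairs, key=lambda p: self.progress[p])

    def update(self, space, strategy, old_error, new_error):
        """Progress = reduction of error, smoothed over episodes."""
        delta = old_error - new_error
        key = (space, strategy)
        self.progress[key] = 0.9 * self.progress[key] + 0.1 * delta
```

In this sketch, a pair on which errors keep shrinking accumulates progress and is selected more often, which captures the "decide what and how to learn" principle described above.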


This algorithm, presented in Baranes and Oudeyer, 2010, focuses on the problem of helping a learning agent decide what outcome to focus on at any given time. It learns by episodes, in which a goal outcome is generated based on the competence improvement recorded during the learning process. This goal outcome tends to be generated in areas where this competence improvement, or progress, is maximal. Then the algorithm performs an autonomous exploration of the action space, generating an action to reach that goal based on its inverse model.
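The goal self-generation step can be illustrated with a toy version of this mechanism on a one-dimensional outcome space. Note the simplifying assumptions: the real SAGG-RIAC splits outcome regions adaptively, whereas this sketch uses a fixed grid of regions, and the interest measure below is only one plausible form of competence progress.

```python
import random

class GoalBabbler:
    """Toy sketch of SAGG-RIAC-style goal generation on a 1-D outcome
    space in [0, 1), split into fixed regions (an assumption: the real
    algorithm splits regions adaptively)."""

    def __init__(self, n_regions=4):
        self.n_regions = n_regions
        # competence history per region, used to measure progress
        self.history = [[] for _ in range(n_regions)]

    def record(self, goal, competence):
        """Store the competence reached when pursuing `goal`."""
        region = min(int(goal * self.n_regions), self.n_regions - 1)
        self.history[region].append(competence)

    def interest(self, region):
        """Interest = recent competence improvement in the region."""
        h = self.history[region]
        if len(h) < 2:
            return float("inf")  # unexplored regions are maximally interesting
        mid = len(h) // 2
        return abs(sum(h[mid:]) / len(h[mid:]) - sum(h[:mid]) / mid)

    def sample_goal(self):
        """Generate a goal outcome inside the region of maximal progress."""
        best = max(range(self.n_regions), key=self.interest)
        return (best + random.random()) / self.n_regions
```

After each episode the competence reached is recorded, so goals drift toward regions where competence is improving fastest, which is the behaviour described above.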
This algorithm was successfully used on high-dimensional robots that learned to reach whole outcome spaces using primitive actions. It was also extended by the Socially Guided Intrinsic Motivation (SGIM) architecture, to add new imitation strategies that the robot can use to bootstrap its learning process thanks to demonstrations provided by human teachers.

Tackling the experiment of the rotating robot arm drawing

If we test the SGIM-SAHT architecture on the experimental setup of the rotating robot arm drawing introduced in section 2.2, we can expect to see interesting patterns in the results showing its capabilities. This architecture shall learn faster and more broadly than simpler strategic algorithms like SAGG-RIAC or SGIM-ACTS.
If provided with different teachers and autonomous strategies, it shall be able to identify the combinations of strategies and tasks that best optimize its learning accuracy and speed. It shall also be able to organize its learning process so as to learn the easiest tasks first (in this case Ω0), then tackle more and more complex tasks (Ω1 and finally Ω2) when maturing. It shall rely more heavily on interactive strategies early on, to benefit from their observed bootstrapping effect on learning, while relying more on autonomous strategies in the long run. The evolution of the learner's strategic decisions of which task to focus on and which strategy to use shall also be greatly influenced by the task hierarchy.

SGIM-SAHT shall be able to discover and use the task hierarchy wisely to learn faster. For example, using the task hierarchy would be useless for learning to move its tip (Ω0), as it corresponds to the simplest task of the setup. Also, as this task is the base of the task hierarchy, its learning shall be a priority for the robot. However, when desiring to move the pen (Ω1), the robot shall hopefully use its skill at moving its arm's tip as an intermediary to learn more easily, which is possible only once the learning of the tip motion task (Ω0) has matured. Finally, when able to move the pen reliably, making drawings (Ω2) shall be decomposed into a first skill moving the pen towards the first position of the drawing, then a second one moving the arm's tip towards the last position of the drawing. Note that the robot might also decide to combine two displacements of the pen as skills to make this same drawing, although it might lead to a less efficient learning. Indeed, after the first displacement of the pen, the latter is still in the robot's grasp, so the robot will not need to grab it again to move it afterwards: a simple arm motion will suffice.
This could lead to two potential problems. The first is that when chaining two pen displacement skills together, the second one might include an unnecessary hovering by the initial pose of the pen, leading to more complex actions than needed. The second is that the robot will necessarily have learned more skills to displace its tip than to move the pen, so it would be able to tune an arm's tip motion to reach the final drawing position more accurately than a pen motion. This is an example of a potentially suboptimal use of the task hierarchy by the learner. More extreme examples could be to use drawing skills to move the arm's tip.
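The preferred decomposition of the drawing task can be made concrete with a small sketch. Here `move_pen` and `move_tip` stand in for learned inverse models that return the primitive action sequence reaching a requested position; both names, and the tuple representation of actions, are illustrative assumptions.

```python
def draw_segment(move_pen, move_tip, start, end):
    """Sketch of the two-skill decomposition of a drawing (Omega2):
    first move the pen to the start point (an Omega1 skill), then,
    since the pen is already in the robot's grasp, a simple arm's tip
    motion to the end point (an Omega0 skill) suffices."""
    actions = []
    actions += move_pen(start)  # grasp the pen and bring it to `start`
    actions += move_tip(end)    # pen in hand: a tip motion finishes the stroke
    return actions
```

The alternative discussed above would replace the second call with another `move_pen(end)`, producing a valid but likely longer and less accurate action sequence.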
Finally, SGIM-SAHT, while chaining multiple primitive actions together to move the pen or make drawings, shall be able to limit the complexity of these action sequences. Indeed, it would be suboptimal to use actions more complex than primitives to move the arm's tip around. Moving the pen shall be possible using single primitives too, or at most sequences of 2 primitives: using the task hierarchy by reusing its skills in moving the arm's tip could ease the learning of the pen's motion, at the cost of more complex actions of 2 primitive actions. However, making a drawing shall not require more than the number of primitives used for moving the pen plus an additional one. The tradeoff between the accuracy of the skills built and their efficiency in terms of number of actions chained can be tuned by the γ parameter.
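One plausible way to express such a tradeoff is to discount a skill's interest by the length of the primitive sequence it chains. The exact role of γ in SGIM-SAHT is not detailed here, so the functional form below is purely an assumption used to illustrate the idea.

```python
def interest(progress, n_primitives, gamma=0.5):
    """Hypothetical accuracy/complexity tradeoff: the interest of a
    skill is its competence progress, discounted by the length of the
    primitive-action sequence it chains. A larger `gamma` (standing in
    for the document's gamma parameter) penalizes long sequences more."""
    return progress / (n_primitives ** gamma)
```

Under this form, a skill reaching the same progress with fewer chained primitives is always preferred, and γ = 0 would ignore sequence length entirely.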


Table of contents :

1 Life-long learning of hierarchical tasks using sequences of motor primitives
1.1 Life-long learning problem
1.2 Learning methods
1.2.1 Learning motor action sequences
1.2.2 Multi-task learning by a hierarchical representation
1.2.3 Active motor learning in high-dimensional spaces
Intrinsic motivation
Social guidance
Strategic learning
2 A strategic intrinsically motivated architecture for life-long learning 
2.1 Formalization of the problem
2.2 Example of experimental setup: rotating robotic arm drawing
2.3 Strategic Intrinsically Motivated learner
2.3.3 Extension to complex tasks
2.4 Socially Guided Intrinsic Motivation for Sequence of Actions through Hierarchical Tasks
2.5 Tackling the experiment of the rotating robot arm drawing
2.6 Conclusion
3 Poppy humanoid robot learning inter-related tasks on a tactile tablet 
3.1.1 Strategies
Mimicry of an action teacher
Autonomous exploration of the primitive action space
3.1.2 Interest Mapping
3.2 Experiment
Description of environment
Dynamic Movement Primitives
Action space
Observable spaces
Task spaces
3.2.1 The teacher
3.2.2 Evaluation Protocol
Evaluation method
Compared algorithms
3.2.3 Results
Evaluation performance
Learning process organization
3.2.4 Conclusions
4 Using the task hierarchy to form sequences of motor actions 
4.1 Experimental setup
4.1.1 Environment
4.1.2 Formalization of tasks and actions
Action spaces
Outcome subspaces
4.2 Procedures framework
4.3 Intrinsically Motivated Procedure Babbling
4.3.1 Strategies
Autonomous exploration of the action space
Autonomous exploration of the procedure space
4.3.2 Overview
4.3.3 Experiment
Evaluation Method
4.3.4 Results
Evaluation performance
Lengths of action sequences used
4.3.5 Conclusion
4.4 Socially Guided Intrinsic Motivation with Procedure Babbling
4.4.1 Interactive strategies
Action teachers
Procedural teachers
4.4.2 Algorithm overview
4.4.3 Experiment
Evaluation method
4.4.4 Results
Distance to goals
Analysis of the sampling strategy chosen for each goal
Length of the sequence of primitive actions
4.4.5 Conclusion
5 Yumi industrial robot learning complex hierarchical tasks on a tangible interactive table 
5.1 Simulated experiment
5.1.1 Setup
5.1.2 Experiment variables
Action spaces
Task spaces
5.1.3 The teachers
5.1.4 Evaluation method
5.1.5 Results
Evaluation performance
Analysis of the sampling strategy chosen for each goal
Length of the sequence of primitive actions
5.1.6 Conclusion
5.2 Physical experimental setup
5.2.1 Description of the environment
5.2.2 Formalization of tasks and actions
5.2.3 Teachers
5.2.4 Evaluation method
5.2.5 Results
Evaluation performance
Analysis of the sampling strategy chosen for each goal
Length of actions chosen and task hierarchy discovered
5.2.6 Conclusion
5.3 Transfer learning
5.3.1 Experimental setup
5.3.2 Definition of the problem
5.3.3 Transfer Learning in SGIM-PB
5.3.4 Teachers
5.3.5 Evaluation method
5.3.6 Results
5.3.7 Conclusion
6 Conclusion 
6.1 Conclusion of the manuscript
6.2 Conclusions and limitations
6.2.1 Conclusions of the approach
6.2.2 Limitations of the approach
6.2.3 Perspectives
6.3 Contributions
6.4 Takeaway message
6.5 Impact
6.6 Papers

