Humanoid robot locomotion control
Bipedal locomotion is a complex task whose control needs to take into account multiple constraints and performance criteria. Locomotion controllers in biped robots often face the challenge of having to adapt to the dynamics of the locomotion, the reality of the environment surrounding the robot, and the user requirements for the task. Some controllers are able to handle some of these aspects to a greater or lesser degree, but not all of them in a general way.
The goal of this work is to build a framework that can be used with any humanoid locomotion controller with open parameters (i.e., parameters whose values are not set), both to optimize those parameters towards user requirements and to adapt them to changes related to the locomotion process, such as changes in the environment or in the robot itself.
This chapter outlines the basic characteristics of humanoid locomotion in order to give context to the most common approaches to its control. In discussing these, we will highlight how the existence of open parameters in these controllers, and many others, leads to the possibility of tuning these parameters for better outcomes in an optimization process. Afterwards comes a discussion of some approaches to that process, and the chapter finishes with a discussion of adaptation to changes in the locomotion task that are external to the controller (most commonly, terrain changes).
Humanoid robot locomotion
There are advantages to a humanoid robot mimicking the human body in appearance and functionality. Humanoid robots can more easily work in environments and with equipment designed for humans (Kajita et al., 2014), and there is increased empathy and understanding in closer interaction between humans and robots (Meng and Lee, 2009). Moreover, there is the postulation that natural structures and processes have evolved towards being close to optimal in the context of what is possible with a given structure (Alexander, 1996), and are therefore desirable to replicate.
Figure 2.1: Abstract link-segment model of a biped locomotion system. Approximations to the center of mass (CoM) of a robotic system, and its projection on the ground (PCoM), are also represented.
Fully understanding and replicating a walking motion in a stable and optimal fashion is a challenge for people in multiple fields, spanning biology, physiology, medicine, mathematics, and engineering, and its optimization is subject to multiple criteria (Vukobratovic et al., 1990). The mechanical complexity observed in biological systems is usually replicated in robots by individually fabricated parts, which are costly in mass-scale production. Humans share this mechanical complexity, along with biological energy storage systems that can achieve torque, response times, and conversion efficiency exceeding those of man-made robotic systems of a similar scale (Siegwart, Nourbakhsh, and Scaramuzza, 2011). These production-related problems are accentuated by the nature of an underactuated system, and the associated nonlinear multibody dynamics. As such, humanoid locomotion is still unable to match the robustness and finesse of biological systems.
Because of the challenges in replicating the locomotion of humans, the design of the humanoid and its control system usually resorts to accurate kinematic and dynamic models. These are usually based on a link-segment model in which parts of the body are represented as rigid segments of constant length, with their mass concentrated at their CoM, linked by one or more hinge joints. A basic representation of a humanoid mechanism of this type can be seen in Figure 2.1. Humanoid locomotion systems have a high number of DoFs, which by itself causes complex dynamics and coordinate frame handling. The lack of fixed frames of reference makes these underactuated systems, whose interactions with the environment lead to a conversion of internal joint forces into external reaction forces (Vukobratovic et al., 1990). These external reaction forces are observed in the contact between the feet and the ground, which is essential for walking since it can cause the body's relative position to be changed (Vukobratovic and Borovac, 2004).
When walking, a humanoid comes in contact with the ground at multiple points, and these contacts are broken and then recovered in order to generate movement (Wieber, 2002). This brings versatility to overcome obstacles, but also instability that requires the robot to be stabilized. The robot can collect information to use in that stabilization, using sensors that provide feedback about the robot itself, such as gyroscopes and accelerometers to find the position and the orientation of the robot, and force/torque sensors to measure the contact forces and torques between the feet and the floor (Kajita et al., 2014), or sensors that provide feedback about the robot's environment, like vision or sound sensors.
Characteristics of humanoid locomotion
Humans adjust the locomotion pattern of the limbs (the gait) to adapt to different walking motions under different circumstances, such as changes in terrain, and to desired behaviors, including, but not limited to, a target walking velocity (Alexander, 1984). When describing the gait, a stride is a cyclical movement which is usually divided into the stance and swing phases. In the stance phase the foot is on the ground and can be used as a pivot to push the body forward while maintaining an upright posture. In the swing phase the foot comes off the ground to enable the contrary leg to advance and its foot to enter its stance phase. There is a period, between right after the heel strike of one foot and the toe-off of the contrary foot, where both feet are on the ground, constituting a double support phase (Morecki, 1997). A visualization of these phases can be seen in Figure 2.2. In the double support phase the locomotion mechanism is a closed kinematic chain, because both ends of the chain (the feet) are in contact with the ground, while in the single support phase one of the ends has no anchor, making it an open kinematic chain (Vukobratovic and Borovac, 2004).
The human gait can be quantified with the stride (or step) frequency (the number of strides taken per unit of time), the stride length (the distance traveled in a stride), the duty factor of a foot (the fraction of time for which the foot is on the ground), and the relative phase of a foot (the stage of the stride at which the heel strike occurs).
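These metrics can be computed directly from logged foot-contact events. The sketch below is a hypothetical helper (not from any cited work), assuming heel-strike and toe-off timestamps in seconds and the forward position of the foot at each heel strike:

```python
def gait_metrics(heel_strikes, toe_offs, positions):
    """Compute basic gait metrics for one foot.

    heel_strikes: times (s) at which the heel touches the ground
    toe_offs:     times (s) at which the foot leaves the ground, with
                  toe_offs[k] between heel_strikes[k] and heel_strikes[k+1]
    positions:    forward position (m) of the foot at each heel strike
    """
    stride_times = [t2 - t1 for t1, t2 in zip(heel_strikes, heel_strikes[1:])]
    stride_lengths = [p2 - p1 for p1, p2 in zip(positions, positions[1:])]
    # duty factor: fraction of each stride during which the foot is on the ground
    duty = [(toe_offs[k] - heel_strikes[k]) / stride_times[k]
            for k in range(len(stride_times))]
    mean = lambda xs: sum(xs) / len(xs)
    return {"stride_frequency": 1.0 / mean(stride_times),
            "stride_length": mean(stride_lengths),
            "duty_factor": mean(duty)}
```

For a foot striking at 0 s, 1 s, and 2 s, leaving the ground at 0.6 s, 1.6 s, and 2.6 s, this yields a stride frequency of 1 Hz and a duty factor of 0.6.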
When a humanoid robot is in single support phase, its dynamics can be largely represented by an inverted pendulum connecting the support foot to the CoM of the robot (Kajita et al., 2001), referred to as the linear inverted pendulum model (LIPM), and represented in Figure 2.3. In this analogy, the kinetic energy of a stiff inverted pendulum being traded for potential energy, and subsequently back into kinetic energy, is akin to how energy is expended and transformed during a step (Kuo, Donelan, and Ruina, 2005).
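The LIPM keeps the CoM at a constant height z_c, so its horizontal motion reduces to x'' = (g/z_c)(x − p), with p the pivot point under the support foot. A minimal simulation sketch (the parameter values are illustrative, not taken from any cited robot):

```python
def simulate_lipm(x0, v0, p, z_c=0.8, g=9.81, dt=0.001, t_end=0.5):
    """Euler-integrate the planar LIPM dynamics x'' = (g/z_c)*(x - p),
    where x is the horizontal CoM position and p the pivot (support)
    point.  Returns the final CoM position and velocity."""
    x, v = x0, v0
    for _ in range(int(t_end / dt)):
        a = (g / z_c) * (x - p)   # CoM accelerates away from the pivot
        v += a * dt
        x += v * dt
    return x, v
```

A CoM starting at rest directly above the pivot stays there; any offset grows exponentially, which is why the support point must be actively switched from step to step.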
Statically and dynamically balanced locomotion
Balance in bipedal locomotion can be divided into static balance and dynamic balance. Static balance is normally characterized by small velocities and accelerations, and is achieved if the projection of the system's CoM on the ground is kept inside the supporting area of the feet. During static balance, the projection of the CoM on the ground corresponds to the center of pressure (CoP), the point on the ground where all the ground reaction forces of the system act. Static balance is so called because the robot can stop at any instant and keep its balance. Dynamic balance is only maintained with continuous movement, and stopping without taking into account the body's equilibrium can result in a high risk of falling. In dynamic balance the projection of the CoM can be placed outside the supporting area of the feet, as long as the CoP of the mechanism is kept inside it (Nwokah and Hurmuzlu, 2002). The supporting area of the feet is the support polygon formed by the feet in contact with the ground, which is the convex hull of the supporting points, as exemplified in Figure 2.4.
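The static balance criterion amounts to a point-in-convex-polygon test for the CoM projection. A minimal sketch, assuming the support polygon vertices are given in counter-clockwise order:

```python
def in_support_polygon(point, polygon):
    """Check whether a ground point lies inside a convex support polygon
    given as a list of (x, y) vertices in counter-clockwise order.
    Points on the boundary count as inside."""
    px, py = point
    n = len(polygon)
    for i in range(n):
        ax, ay = polygon[i]
        bx, by = polygon[(i + 1) % n]
        # cross product of edge (a->b) with (a->p); negative means the
        # point is to the right of the edge, i.e. outside the polygon
        if (bx - ax) * (py - ay) - (by - ay) * (px - ax) < 0:
            return False
    return True
```

The same test, applied to the CoP instead of the CoM projection, expresses the dynamic balance condition stated above.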
The zero moment point (ZMP) is a concept introduced in the context of preserving dynamic balance in legged locomotion (Vukobratovic and Borovac, 2004), and is related to the CoP. The ZMP is termed "the point where the influence of all forces acting on the mechanism can be replaced by one single force" (Vukobratovic et al., 1990; Vukobratovic and Borovac, 2004). In the single support phase, the dynamic reaction force and moment produced at the contact of the foot with the ground exist as a result of the moments and forces produced by the rest of the mechanism. If the point where this reaction occurs does not produce any moment in the horizontal direction, the body will not rotate around it, something which could cause the robot to fall. This is equivalent to the point where the total horizontal inertia and gravity forces equal zero. For these reasons, the point was called the zero moment point. If the force acting at the CoP balances all forces acting on the mechanism in motion, that point is also the ZMP, which means the CoP and the ZMP coincide in a dynamically balanced gait. When balance does not exist, the ZMP also does not exist (i.e., the CoP is outside the support area) and the mechanism collapses around the foot edge. This relationship is illustrated in Figure 2.5.
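On flat ground, the CoP (and hence the ZMP of a dynamically balanced gait) can be computed directly from a foot force/torque sensor reading. A minimal sketch, assuming the wrench is expressed at a frame whose origin lies on the ground plane:

```python
def cop_from_wrench(f_z, tau_x, tau_y):
    """Center of pressure on flat ground from a foot force/torque
    reading (f_z: vertical force, tau_x/tau_y: horizontal torques),
    assuming the wrench is expressed at a frame with its origin on the
    ground plane.  Only valid while the foot is loaded (f_z > 0).
    Requiring zero horizontal moment about the CoP gives
    p_x = -tau_y / f_z and p_y = tau_x / f_z."""
    if f_z <= 0.0:
        raise ValueError("foot not in contact")
    return (-tau_y / f_z, tau_x / f_z)
```

Checking that this point stays inside the support polygon is the usual practical form of the ZMP criterion.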
The ZMP criterion, in its original form, is restricted to locomotion on flat ground with unbounded tangential friction forces between the feet and the ground, which translates to a flat friction cone (Dai and Tedrake, 2016).
Biped locomotion control strategies
The problem of controlling a biped robot can be defined as choosing the proper inputs to its joints such that the system behaves in the desired fashion. One way to separate approaches to this problem is by the amount of information they use regarding the dynamic model of the system. One category of approaches uses precise dynamic information about the system, including the mass, the location of the center of mass, and the inertia of each segment, as well as the whole kinematic structure, which details how each segment is connected by a number of joints to other segments, forming a tree. The other category uses little or no information about the system, such as only its center of mass or total momentum (Kajita et al., 2001).
Sections 2.2.1 and 2.2.2 cover the most prevalent control strategies in these two categories of approaches, and highlight their general drawbacks regarding open parameters and the consequent optimization towards certain goals. Later sections analyze works that tackle these optimization possibilities.
Control based on dynamic models
This type of control takes into account the dynamics of the robot and its interactions with its environment during locomotion. Here, the robot is considered a multi-body system modeled with rigid bodies, applying bounded torque on each joint. It follows the Euler-Lagrange motion equations (Salini, Padois, and Bidaud, 2011),

M(q)q̈ + N(q, q̇)q̇ = g(q) + J(q)ᵀχ, (2.1)

which relate the joint positions q, velocities q̇, and accelerations q̈ to the mass matrix M(q), the nonlinear effects matrix N(q, q̇), the gravity force vector g(q), and the generalized wrench Jacobian J(q), which describes how external forces affect the system in motion. χ is called the action variable, and is composed of the vector of contact forces and the vector of torque inputs, χ = [w_cᵀ, τᵀ]ᵀ. The motion equation can be used to optimize the system over the torque inputs, the joint accelerations, or the contact forces (e.g., minimizing the accelerations).
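As a concrete one-DoF instance of Equation (2.1), consider a single actuated pendulum (a point mass m at distance l from the joint, with no contacts): M(q) = ml², N = 0, and g(q) = −mgl sin(q). The sketch below (illustrative values) computes the torque needed to realize a desired joint acceleration:

```python
import math

def pendulum_torque(q, dq, ddq, m=1.0, l=0.5, g=9.81):
    """Inverse dynamics for a single actuated pendulum, a 1-DoF
    instance of M(q)*ddq + N(q,dq)*dq = g(q) + tau, with
    M = m*l**2, N = 0, and g(q) = -m*g*l*sin(q)."""
    M = m * l ** 2
    grav = -m * g * l * math.sin(q)
    return M * ddq - grav  # torque needed to realize acceleration ddq
```

Holding the pendulum still at the horizontal (q = π/2, ddq = 0) requires exactly the gravity-compensation torque mgl.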
A well-known example of control that uses a dynamic model of the controlled system is Kajita's preview control (Kajita et al., 2003). It uses a simplified version of the locomotion's dynamics, and not the full whole-body dynamics of the system. In preview control the goal is to have an output that tracks a reference signal, in this case a trajectory for the ZMP of the system. In this work, the authors first modeled the dynamics of the locomotion of a biped robot as a 3D linear inverted pendulum. This model's purpose is to offer a preview of the dynamics of the system that is accurate enough to enable locomotion, specifically the position of the CoM given the position of the ZMP, while being simple enough to be reduced to a handful of computationally inexpensive calculations. The system then generates a CoM trajectory such that the acceleration of the CoM throughout the locomotion is minimized and the resulting ZMP follows a given reference trajectory as closely as possible. This trajectory is then translated into a walking pattern by solving an inverse kinematics problem. This approach has the constraint that the footsteps are fixed and impossible to change. These footsteps need to be predefined by the user, who needs to have enough knowledge about the environment to define a trajectory that results in a stable gait, and has no way to adapt to new environments automatically.
Expanding on Kajita’s work, Wieber (2006) proposed what he called a linear model predictive control (LMPC) scheme, improving on the original ZMP preview control.
The ZMP equations from the previous work are formalized as a quadratic program (QP), a mathematical optimization problem, that, given the tracking of a ZMP reference trajectory, minimizes the jerks (the derivative of the acceleration) of the CoM of the robot. The idea of Wieber's LMPC approach is to execute only a small part of the trajectory, and then recompute a new trajectory taking into account the current state of the CoM, therefore allowing for some feedback. The author also added a constraint to the QP that restricts the reference points of the ZMP to always stay a certain margin inside the convex hull of the two feet. This scheme was shown to be able to produce a CoM trajectory and adapt it in the middle of the locomotion after having a mass corresponding to 33% of the total mass of the robot hit its trunk. It was not shown, however, how it would respond to variations of that perturbation (an obstacle for the feet would be more likely and more impactful), and, more importantly for our work, whether this adaptation allows the robot to automatically adapt to different floor conditions. The main issue this scheme would have with this kind of adaptation is the fact that the output of the QP is a trajectory for the CoM: the joint commands are obtained through inverse kinematics, and if a perturbation to the environment changes the way the gait of the robot translates into CoM positions there is no way in this predictive control to account for it.
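The core of such a scheme can be sketched as an unconstrained least-squares version of the QP (the inequality constraints on the ZMP are omitted here; the discretization follows the standard triple-integrator CoM model with the ZMP output z = x − (h/g)ẍ, and the parameter values are illustrative):

```python
import numpy as np

def lmpc_jerk(xi0, z_ref, T=0.1, h=0.8, g=9.81, alpha=1e-6, beta=1.0):
    """Unconstrained core of the linear MPC scheme: find the CoM jerk
    sequence u minimizing alpha/2*||u||^2 + beta/2*||z - z_ref||^2,
    where z_k = x_k - (h/g)*xddot_k is the predicted ZMP and
    xi0 = (x, xd, xdd) is the current CoM state."""
    N = len(z_ref)
    A = np.array([[1, T, T**2 / 2], [0, 1, T], [0, 0, 1]])  # CoM dynamics
    B = np.array([T**3 / 6, T**2 / 2, T])                   # jerk input
    C = np.array([1.0, 0.0, -h / g])                        # state -> ZMP
    Px = np.zeros((N, 3))   # free response of the predicted ZMP
    Pu = np.zeros((N, N))   # forced response of the predicted ZMP
    Ak = np.eye(3)
    for k in range(N):
        Ak = A @ Ak         # A^(k+1)
        Px[k] = C @ Ak
        for j in range(k + 1):
            Pu[k, j] = C @ np.linalg.matrix_power(A, k - j) @ B
    H = beta * Pu.T @ Pu + alpha * np.eye(N)
    f = beta * Pu.T @ (np.asarray(z_ref) - Px @ np.asarray(xi0))
    return np.linalg.solve(H, f)  # optimal jerk sequence
```

In the receding-horizon spirit described above, only the first jerk of the returned sequence would be applied before re-solving from the measured CoM state; the real scheme additionally imposes the support-polygon inequality constraints on z.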
Later, Diedam et al. (2008) kept building on the LMPC scheme, making it so the positions of the feet do not have to be decided beforehand by a step planner; instead, they are decided by simply adding new variables to the QP that correspond to the footsteps occurring over the prediction horizon. They propose to still use a step planner based on inverse kinematics, but keep complete freedom in the final choice of the step positions according to the robot's stability and mechanical limits. When allowing for this freedom of choosing step positions, one also needs to make sure they will not lead to motions impossible to realize because of geometric and kinematic constraints of the robot (like leg length or joint limits), which was solved by adding inequality constraints to the QP. The controller's ability to adapt was tested in a similar way to the experiments done in Wieber (2006), but managing to recover from stronger perturbations.
Herdt et al. (2010) further improved this motion generation scheme to allow it to generate stable walking motions without the use of predefined footsteps. A reference speed is given to the controller, and according to this speed and the current state of the robot a footstep placement is decided. Since the position of the CoP no longer follows a reference trajectory, the feasibility of the motion is obtained by constraining this position to lie in the middle of the feet positions decided by the algorithm. Later, Herdt, Perrin, and Wieber (2010) improved the model predictive control (MPC) by adding an algorithm for the control of the orientations of the feet and the trunk, allowing the robot to turn in a safe way, and also adding polygonal constraints on the computed feet positions to improve its reliability.
The work of Salini, Padois, and Bidaud (2011) and Salini (2012) uses the same Linear Quadratic Programming (LQP) optimization approach as the MPC, but organizes the problem's constraints in a hierarchy, and treats them as different tasks to be performed with an overall objective. These tasks consist of movements related to the whole-body control of the robot. For the purpose of locomotion, it uses the trajectory tracking from a version of the ZMP preview control (Wieber, 2006) as one of its tasks, and adds additional constraints that take into account the physical laws of motion, the robot's actuation limits, and the physical properties of the robot's environment. This hierarchical organization provides the ability to numerically change the priority of a postural task or the trajectory tracking, for example.
The changes to MPC introduced by Herdt et al. (2010), making it so it only requires the reference speed as an input (besides the dynamic, kinematic, and physical information of the controlled robot), cause the step positions to be automatically selected with mostly stability in mind. This sometimes results in large oscillations in the system's forward speed in order to maintain the target mean velocity (Herdt, Perrin, and Wieber, 2010). It also translates into a lack of flexibility in choosing characteristics of the locomotion like the length, frequency, and width of each step, as well as the duty factor of the feet. This, in turn, can lead to less potential to explore the diverse behaviors provided by the controller. These behaviors could result in more constant speeds or better energy efficiency, for example, with high energy costs being pointed out as a functional shortcoming of MPC in general (Torricelli et al., 2016).
Because of this potential for exploring diverse behaviors, and because MPC is a robust and well-regarded control system, it was chosen to be used in this project's work on parameter tuning and locomotion behavior exploration and optimization, leading then into terrain adaptation. The version of MPC used was that of Wieber (2006), implemented by Salini, Padois, and Bidaud (2011), since there was easy access to it and this specific implementation provides control over various parameters of the locomotion control (as opposed to just a reference speed), while also improving locomotion stability by adding posture tasks in addition to the locomotion task. This control scheme is described in Appendix B.
Biologically inspired control
Since humanoid robots have the goal of mimicking humans in both form and functionality, it is natural to look at humans for inspiration in terms of control strategy. Although they are not restricted to their own category in a definitive way, biologically inspired locomotion control approaches, specifically ones that do not explicitly use the whole dynamic and kinematic models of the robotic system, can be defined as using a more abstract model of the locomotion system, or some part of it, inspired by what is observed in humans or other animals.
Models that seek to identify and explain the principles behind movement generation in animals are based on neurophysiological principles, and, in some animals (including humans), describe this system as distributed signal generation and processing, where the brain performs high-level movement control supported by a feedback system of sensors such as pressure, force, and intramuscular ones (Katz, 1996; Marder et al., 2005; Kiehn, 2006; Orlovsky, Deliagina, and Grillner, 1999).
Central Pattern Generators
Movement generation and control in humans is enacted by neural networks of the central nervous system. These networks transmit neuromuscular excitatory signals that travel through the body via action potential propagation in cells (Hodgkin and Huxley, 1952). They can produce rhythmic pattern outputs while receiving only simple input commands, without the need for sensory feedback, and are referred to as Central Pattern Generators (CPGs) (Marder and Bucher, 2001). These findings resulted from earlier studies (Jones, Tansey, and Stuart, 2011) that replaced the view that locomotion behavior resulted from consecutive muscle sensor reflexes chained together (Clower, 1998). It is believed there is a CPG unit for each limb, in turn composed of smaller circuits that each control one muscle group of extensors and flexors of a limb (Grillner, 2011).
Although sensory feedback is not required for CPGs to produce the motion patterns, it can be used to adapt movements dynamically to changes in the environment (Grillner, 2006). Signals from low-level sensors can be combined with the high-level brain signals that activate the CPGs in order to achieve different motor outputs, by selecting different rhythmic patterns and modulating the amplitude and frequency of burst signals (Rossignol, Dubuc, and Gossard, 2006), with increased stimulus resulting in a higher frequency of the network rhythm. This modulation is fundamental for keeping coordination between body movements, since changing the phase of the signals changes the timing with which each movement operates: the unit CPGs must be coordinated in different ways in order to generate a different activation pattern. It can also change the duration of the step phases, their structure, and the transition between them (Rossignol, Dubuc, and Gossard, 2006).
Very frequently, control based on CPGs presents a rhythm generation layer, which serves as a temporal reference for a pattern generation layer. Such is the case with the implementation from Matos (2013), where the rhythm layer consists of coupled phase oscillators,

φ̇_i = ω + k sin(φ_o − φ_i − φ_{o,i}), (2.2)

This results in φ_i being an increasing periodic signal that is used as the phase of leg i, with rate ω. φ_o is the phase of leg o, kept in a desired relationship φ_{o,i} with the oscillator for φ_i. The coupling strength can be controlled with k. The pattern generation uses the periodic signal to control motion patterns encoded as a set of nonlinear dynamical equations with well-defined attractor dynamics, which can be smoothly regulated with regard to their amplitudes, frequencies, and pattern offsets,

ż_{j,i} = α (O_{j,i} − z_{j,i}) + Σ_f f(z_{j,i}, φ_i, φ̇_i). (2.3)
The position z_{j,i} of joint j of leg i is generated according to the current phase of the leg, φ_i. The offset attractor O_{j,i} is a position the equation converges to if α > 0. Here, each function f defines a motion primitive, whose sum is used as the final output trajectory for the robot's joints.
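A two-layer CPG of this kind can be integrated numerically with a simple Euler scheme. The sketch below uses a single sinusoidal motion primitive and illustrative parameter values (none of them taken from Matos, 2013):

```python
import math

def cpg_step(phi_i, phi_o, z, dt, omega=2 * math.pi, k=2.0,
             phase_rel=math.pi, alpha=5.0, amp=0.3, offset=0.0):
    """One Euler step of a minimal two-layer CPG (hypothetical values).
    Rhythm layer: a phase oscillator for leg i coupled to the phase of
    the other leg, kept at the desired relationship phase_rel.
    Pattern layer: attractor toward 'offset' plus a single sinusoidal
    motion primitive driven by the current phase."""
    dphi = omega + k * math.sin(phi_o - phi_i - phase_rel)
    primitive = amp * math.cos(phi_i)          # one motion primitive f
    dz = alpha * (offset - z) + primitive
    return phi_i + dphi * dt, z + dz * dt
```

In a full controller one such oscillator would run per leg, with the pattern layer's output z used as the joint position reference; the phase coupling is what keeps the two legs in antiphase.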
Control based on CPGs
In his review, Ijspeert (2008) highlights the notion that CPGs usually have a significant problem with regard to parameter tuning. He notes that one of the problems to overcome in their design is the learning and optimization used to fit the sometimes dozens of parameters that shape the outputs of nonlinear dynamical systems to the desired waveforms. The waveforms themselves also have to be decided upon, and the way they translate into joint trajectories, and then into the robot's locomotion, is not always apparent.
CPG models are frequently used in the control of biped locomotion. Most of these implementations involve sets of coupled nonlinear oscillators implemented with coupled differential equations that produce stable limit cycles (isolated periodic solutions that are, in this case, attractive towards neighboring solutions). The phase of the oscillators can be used to control rhythmic nominal trajectories and to achieve interlimb coordination (Ijspeert, 2008). Taga, Yamaguchi, and Shimizu (1991) used coupled oscillators to model a neural rhythm generator, given the oscillatory dynamics of both the generator and the musculo-skeletal system the authors used. The coupling also allows for entrainment between those two systems, meaning they share the same period of operation. This framework, like many CPG-based approaches discussed here, has many open parameters, such as a nonspecific one that controls gait patterns, ones controlling feedback to the system, and parameters related to the interconnection between unit systems. These last ones were set manually in a way that modulates amplitudes and relative frequencies to produce specific joint motions, which may or may not be optimal. It is not clear whether the parameters related to feedback were set manually or optimized in some way.
A later revision of Taga's control scheme came with the objective of making the model of the musculo-skeletal system step over visible obstacles (Taga, 1998). The rhythm generator was combined with a discrete movement generator that, receiving visual information regarding the obstacle, modifies the gait pattern. This generator uses modification signals whose amplitudes are controlled by different parameters, which have to be adjusted to produce smooth and coordinated changes in the gait. The author also noted that choosing appropriate values for these parameters and for ones that control step length is important for obstacle clearing, but does not clarify how that choice is made. Additionally, there are parameters that determine the strength of the torque. In short, both the original control framework and its revision have open parameters that affect the behavior of the locomotion task and have no straightforward way to be tuned toward objective optimization or even basic, stable locomotion.