Natural Policy Gradient Method

Table of contents

1 Introduction 
2 Background 
2.1 Reinforcement Learning Problem
2.2 Value-Function Approach
2.2.1 Monte-Carlo Methods
2.2.2 Temporal-difference (TD) methods
2.3 Policy Search Approach
2.3.1 Stochastic Policy Gradient Method
2.3.2 Deterministic Policy Gradient Method
2.3.3 Natural Policy Gradient Method
2.3.4 Weighted Maximum Likelihood Methods
2.3.5 Trust Region Optimization Method
2.3.6 Direct policy search
2.4 Model-Based Policy Search
2.4.1 Learning the Model
2.4.2 Policy optimization
Back-propagation through time
Direct policy search
Information-theoretic
Sampling-based
2.5 Data-efficient Robot Learning with Priors
2.5.1 Priors on the dynamical model
Generic Robotic Priors
Gaussian Process Model with Non-Constant Prior
Meta-Learning of Dynamical Model
2.5.2 Priors on the policy
2.5.3 Priors on the objective function model
2.5.4 Repertoire-based prior
2.6 Conclusion
3 Data-efficient Robot Policy Search in Sparse Reward Scenarios 
3.1 Introduction
3.2 Problem Formulation
3.3 Approach
3.3.1 Learning system dynamics and reward model
Learning system dynamics with sparse transitions
3.3.2 Exploration-Exploitation Objectives
3.3.3 Multi-Objective Optimization
3.4 Experiments
3.4.1 Sequential goal reaching with a robotic arm
3.4.2 Drawer opening task with a robotic arm
3.4.3 Deceptive pendulum swing-up task
3.4.4 Additional Experiments
Drawer opening task with 4-DOF arm
Multi-DEX on non-sparse reward task
3.4.5 Pareto optimality vs. weighted sum objectives
3.5 Discussion and Conclusion
3.6 Details on the Experimental Setup
3.6.1 Simulator and source code
3.6.2 General and Exploration Parameters
3.6.3 Policy and Parameter Bounds
3.6.4 NSGA-II Parameters
3.6.5 Gaussian Process Model learning
4 Adaptive Prior Selection for Data-efficient Online Learning 
4.1 Introduction
4.2 Problem Formulation
4.3 Approach
4.3.1 Overview
4.3.2 Generating Repertoire-Based Priors
4.3.3 Learning the Transformation Models with Repertoires as Priors
4.3.4 Model-based Planning in the Presence of Multiple Priors
4.4 Experimental Results
4.4.1 Object Pushing with a Robotic Arm
4.4.2 Goal Reaching Task with a Damaged Hexapod
4.5 Discussion and Conclusion
5 Fast Online Adaptation through Meta-Learning Embeddings of Simulated Priors 
5.1 Introduction
5.2 Approach
5.2.1 Meta-learning the situation-embeddings and the dynamical model
5.2.2 Online adaptation to unseen situations
5.3 Experimental Results
5.3.1 Goal reaching with a 5-DoF planar robotic arm
5.3.2 Ant locomotion task
5.3.3 Quadruped damage recovery
5.3.4 Minitaur learning to walk
5.4 Discussion and Conclusion
6 Discussion 
6.1 Learning the Dynamical Model from the Observations
6.2 Model-Learning in Open-Ended Scenarios
6.3 Repertoire-based Learning
6.4 Using Priors from the Simulator
6.5 What is Next?
7 Conclusion 
Bibliography
