Natural Policy Gradient Method

Table of contents

1 Introduction 
2 Background 
2.1 Reinforcement Learning Problem
2.2 Value-Function Approach
2.2.1 Monte-Carlo Methods
2.2.2 Temporal-difference (TD) methods
2.3 Policy Search Approach
2.3.1 Stochastic Policy Gradient Method
2.3.2 Deterministic Policy Gradient Method
2.3.3 Natural Policy Gradient Method
2.3.4 Weighted Maximum Likelihood Methods
2.3.5 Trust Region Optimization Method
2.3.6 Direct policy search
2.4 Model-Based Policy Search
2.4.1 Learning the Model
2.4.2 Policy optimization
Back-propagation through time
Direct policy search
Information-theoretic
Sampling-based
2.5 Data-efficient Robot Learning with Priors
2.5.1 Priors on the dynamical model
Generic Robotic Priors
Gaussian Process Model with Non-Constant Prior
Meta-Learning of Dynamical Model
2.5.2 Priors on the policy
2.5.3 Priors on the objective function model
2.5.4 Repertoire-based prior
2.6 Conclusion
3 Data-efficient Robot Policy Search in Sparse Reward Scenarios 
3.1 Introduction
3.2 Problem Formulation
3.3 Approach
3.3.1 Learning system dynamics and reward model
Learning system dynamics with sparse transitions
3.3.2 Exploration-Exploitation Objectives
3.3.3 Multi-Objective Optimization
3.4 Experiments
3.4.1 Sequential goal reaching with a robotic arm
3.4.2 Drawer opening task with a robotic arm
3.4.3 Deceptive pendulum swing-up task
3.4.4 Additional Experiments
Drawer opening task with 4-DOF arm
Multi-DEX on non-sparse reward task
3.4.5 Pareto optimality vs. weighted sum objectives
3.5 Discussion and Conclusion
3.6 Details on the Experimental Setup
3.6.1 Simulator and source code
3.6.2 General and Exploration Parameters
3.6.3 Policy and Parameter Bounds
3.6.4 NSGA-II Parameters
3.6.5 Gaussian Process Model learning
4 Adaptive Prior Selection for Data-efficient Online Learning 
4.1 Introduction
4.2 Problem Formulation
4.3 Approach
4.3.1 Overview
4.3.2 Generating Repertoire-Based Priors
4.3.3 Learning the Transformation Models with Repertoires as Priors
4.3.4 Model-based Planning in the Presence of Multiple Priors
4.4 Experimental Results
4.4.1 Object Pushing with a Robotic Arm
4.4.2 Goal Reaching Task with a Damaged Hexapod
4.5 Discussion and Conclusion
5 Fast Online Adaptation through Meta-Learning Embeddings of Simulated Priors 
5.1 Introduction
5.2 Approach
5.2.1 Meta-learning the situation-embeddings and the dynamical model
5.2.2 Online adaptation to unseen situations
5.3 Experimental Results
5.3.1 Goal reaching with a 5-DoF planar robotic arm
5.3.2 Ant locomotion task
5.3.3 Quadruped damage recovery
5.3.4 Minitaur learning to walk
5.4 Discussion and Conclusion
6 Discussion 
6.1 Learning the Dynamical Model from the Observations
6.2 Model-Learning in Open-Ended Scenarios
6.3 Repertoire-based Learning
6.4 Using Priors from the Simulator
6.5 What is Next?
7 Conclusion 
Bibliography
