Motivations
Computational models are essential tools in the scientific study of the nervous system [Day94]. They help to synthesize large quantities of empirical data in all disciplines of neuroscience, from studies at the molecular level to studies at the behavioural level. Theories regarding the mechanisms and functional roles of the various elements of the nervous system, e.g. anatomical parts or specific chemicals, or even more general capacities, e.g. memory, are often suggested beforehand, formalized and/or validated afterwards by computational models. Such models allow one to replicate results, to explain findings with simple notions, to draw predictions and to guide research towards important questions.
Computational neuroscience is of particular interest for studying how one learns from one's interactions with the world, anticipates future events and ultimately selects and/or produces actions. Among these capacities, one can distinguish between Pavlovian and instrumental conditioning. Pavlovian conditioning [Pav27] is the acquisition of responses towards neutral stimuli that have been paired with rewards, such as when one salivates at the bell of the ice cream truck. Instrumental conditioning [Ski38] is the expression of a behaviour in order to achieve a goal, such as when one learns to dial a specific number on a phone to call someone. Combined, these mechanisms are at the heart of our learning capacities, and their study benefits significantly from computational models.
Reinforcement learning (RL) [SB98], in short, learning by trial and error which action to take in a given situation to achieve a specific goal, is one of the major frameworks used in current computational models of Pavlovian and instrumental conditioning. As an example of its deep contribution to the study of Pavlovian conditioning, the learning algorithm TD-Learning [SB81], first developed to explain the prediction capacities of animals in some experimental tasks, was subsequently shown to rely on a signal that could be paralleled with the activity of some dopaminergic neurons during such tasks [Sch+97]. Hence, TD-Learning successfully linked the expression of a behaviour with possible underlying neural correlates. With the accumulation of evidence, it is now well accepted that conditioning results from the combination of some kind of reinforcement learning processes.
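To give an idea of the mechanism, the following is a minimal sketch of tabular TD(0) value learning in Python. The toy cue/reward states and parameter values are our own illustrative choices, not the exact formulation of [SB81]:

```python
# Minimal sketch of tabular TD(0) value learning, in the spirit of
# TD-Learning [SB81]. States, rewards and parameters are purely
# illustrative, not taken from any specific conditioning experiment.

def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """One TD(0) step: move V[s] toward r + gamma * V[s_next].

    The TD error `delta` is the kind of signal that was later
    paralleled with phasic dopaminergic activity [Sch+97].
    """
    delta = r + gamma * V[s_next] - V[s]
    V[s] += alpha * delta
    return delta

# Toy episode: a cue state ('CS') reliably followed by a rewarded
# state ('US'), then the end of the trial.
V = {"CS": 0.0, "US": 0.0, "end": 0.0}
for _ in range(200):
    td0_update(V, "CS", 0.0, "US")   # cue presented, no reward yet
    td0_update(V, "US", 1.0, "end")  # reward delivered

# After training, value has propagated back from the reward to the
# cue: V["US"] approaches 1.0 and V["CS"] approaches gamma * V["US"].
```

The key point for the parallel with dopamine is that `delta` is large and positive when reward is unexpected, and shrinks toward zero as the cue comes to predict the reward.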
Surprisingly, although used early in the investigation of Pavlovian conditioning, the modern RL framework is better suited to the investigation of instrumental conditioning, where actions are indeed required to achieve goals. Recent computational models of instrumental conditioning are often the result of a combination of multiple RL systems [Daw+05; Ker+11; Pez+13]. Recent computational models of Pavlovian conditioning, however, rely less on this general framework than on more specific architectures [Sch+96; MM88; Cou+04; SM07; Jam+12; Has+10]. This is a problem when one investigates the interactions between the two types of conditioning, as combining the various computational models is often neither straightforward nor natural.
Objectives
In the present thesis, we aim at finding key concepts that could be used in RL computational models to allow the study of Pavlovian conditioning, instrumental conditioning and their interactions. Taking inspiration from a variety of experimental data, our intuition is that combining dual learning and factored representations may help to explain experimental data yet unaccounted for. Dual learning is a commonly accepted concept in the study of instrumental conditioning, while factored representations are neglected in RL algorithms of conditioning but often present in the alternative architectures developed to account for Pavlovian conditioning. In particular, we investigated experimental data about behavioural inter-individual differences in rats undergoing a Pavlovian conditioning task, and other experimental data about maladaptive behaviours expressed by pigeons in a supposed interaction task, both of which could well be explained by such concepts.
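To hint at what factored representations bring, the following sketch (with stimuli and parameters of our own illustrative choosing) carries value on stimulus features rather than on monolithic states, in the spirit of Rescorla-Wagner-style models of Pavlovian conditioning:

```python
# Sketch of a factored representation: a state is a set of stimulus
# features, and value is carried by per-feature weights rather than by
# one entry per monolithic state. Stimuli and parameters are
# illustrative.

def factored_value(features, weights):
    """Value of a state = sum of the values of its features."""
    return sum(weights.get(f, 0.0) for f in features)

def rescorla_wagner_update(features, r, weights, alpha=0.1):
    """Distribute the prediction error over the features present."""
    delta = r - factored_value(features, weights)
    for f in features:
        weights[f] = weights.get(f, 0.0) + alpha * delta
    return delta

weights = {}
# Train on a compound stimulus (light + tone) followed by reward.
for _ in range(200):
    rescorla_wagner_update({"light", "tone"}, 1.0, weights)

# Generalization: a never-seen compound sharing one feature inherits
# part of the learned value, which a monolithic state table cannot do.
v_novel = factored_value({"light", "lever"}, weights)
```

Because the two trained features share the prediction error, each converges to about half the reward value, so the novel compound already has a non-zero value before any training on it.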
Methods

To address these issues, the work presented in this thesis is grounded in a multidisciplinary approach, combining tools and data from neuroscience and artificial intelligence.
On the neuroscience side, we took inspiration from experimental data about maladaptive behaviours and inter-individual variability in conditioning tasks, including behavioural, neurophysiological, neuropharmacological and neuropsychological data. Behavioural analyses involve observing animals in experimental tasks to investigate their response properties, their capacities, their limits and the strategies they develop. Neurophysiology consists in recording the activity of brain regions and/or particular cells, using various techniques such as functional Magnetic Resonance Imaging (fMRI) or Fast-Scan Cyclic Voltammetry (FSCV). It helps to investigate the signals on which the learning processes leading to the observed behaviours might rely, and to locate where values, variables or associations might be stored. Neuropharmacology studies the effects of drugs on the nervous system. By injecting drugs that affect specific cells or chemicals, e.g. dopamine, either locally or systemically, it helps to investigate their functions and contributions to the observed behaviours. Similarly, neuropsychology studies the effects of brain lesions to identify which brain areas contribute to the different aspects of particular behaviours, and how.
On the artificial intelligence side, we mainly use computational models based on machine learning and evolutionary algorithms. Machine learning algorithms, of which reinforcement learning algorithms are a subset, are designed to learn from data in a wide diversity of manners and for as many different purposes. In our case, we use them mainly to learn how to accumulate rewards efficiently. Evolutionary algorithms are population-based metaheuristic optimization algorithms that can be used to tune other algorithms so that they fit particular behaviours or results as closely as possible.
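As an illustration of this second use, here is a sketch of a simple (mu + lambda) evolution strategy tuning a single free parameter of a toy model so that its output matches target data. The model, data and fitness function are hypothetical stand-ins, not the actual fitting procedure used in this thesis:

```python
# Sketch: tuning one model parameter with a (mu + lambda) evolution
# strategy so that simulated behaviour matches observed data. The
# "model", target data and fitness are purely illustrative.
import random

random.seed(0)

def simulate(learning_rate):
    """Hypothetical model output: response strength over 10 trials."""
    v, out = 0.0, []
    for _ in range(10):
        v += learning_rate * (1.0 - v)
        out.append(v)
    return out

target = simulate(0.3)  # pretend these are the observed data

def fitness(lr):
    """Negative squared error to the data (higher is better)."""
    return -sum((a - b) ** 2 for a, b in zip(simulate(lr), target))

# (mu + lambda): keep the 5 best, each produces 3 mutated children.
pop = [random.uniform(0.01, 0.99) for _ in range(20)]
for _ in range(50):
    parents = sorted(pop, key=fitness, reverse=True)[:5]
    children = [min(0.99, max(0.01, p + random.gauss(0, 0.05)))
                for p in parents for _ in range(3)]
    pop = parents + children

best = max(pop, key=fitness)  # should recover roughly lr = 0.3
```

In the real case, the fitness would compare the model's simulated behaviour against the recorded behaviour of the animals, over several free parameters rather than one.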
In the present work, we first investigated experimental data about conditioning, collected through different approaches, from which we extracted results that challenge the current literature and hints about the mechanisms that might underlie them. We then developed a computational model embodying such mechanisms, tuned it with evolutionary algorithms and confronted it with the data for validation.
Organization
Chapter 5 presents our computational model and applies it to experimental data on inter-individual differences in an autoshaping conditioning task, data that conflict with the current literature. Chapter 6 extends this first study with detailed predictions drawn from the model through new simulations in proposed variants of the original experimental protocol. Chapter 7 shows some generalization abilities of the computational model by applying it to another set of experimental data suggesting inter-individual variability in a different conditioning task involving maladaptive behaviours. Each chapter begins with a short introduction that outlines the content of the article and how it relates to the present work.
Finally, Chapter 8 details our contributions and their limits, discusses our architectural choices, and gives possible directions for future research.
Table of contents:
1 Introduction
1.1 Motivations
1.2 Objectives
1.3 Methods
1.4 Organization
2 Reinforcement learning
2.1 Introduction
2.2 Markov Decision Processes
2.3 Reinforcement learning
2.4 Extensions of the classical framework – Factored representations
3 Animal conditioning
3.1 Introduction
3.2 Instrumental conditioning
3.3 Pavlovian conditioning
3.4 Pavlovian-instrumental interactions
4 Synthesis and working hypothesis
5 Modelling individual differences in autoshaping CRs (article)
6 Predictions from a computational model of autoshaping (article)
Abstract
Introduction
Material and methods
Results
Discussion
7 Model of individual differences in negative automaintenance (article)
Abstract
Introduction
Methods
Results
Discussion
8 Discussion
8.1 Contributions synthesis
8.2 Limits and perspectives
8.3 Concluding remarks