A computational model for human executive control and adaptive behavior


Graphene as transparent conducting electrode

Most of the techniques described above have been used to try to transform the graphene sheet into a transparent conducting electrode (TCE). Graphene is very promising for this kind of application because of its transparency and its impressive transport properties. However, undoped (or slightly doped) graphene has a sheet resistance too high for TCE applications, and doping is thus necessary to increase its carrier density and lower RS. Electrostatic doping performed with conventional techniques, as shown in References [37, 38, 39, 40, 41, 42], is not suitable for TCE applications because it involves depositing other materials on the graphene sheet, which lowers its transparency and may not reach very high doping levels. Moreover, depositing materials on graphene significantly alters its electronic transport properties.
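The link between doping and sheet resistance can be made concrete with the Drude-like relation RS = 1/(e·ns·μ): raising the 2D carrier density ns lowers RS at fixed mobility. A minimal sketch, with purely illustrative numbers (not values from this thesis):

```python
# Illustrative sketch: sheet resistance of a 2D conductor from carrier
# density and mobility, R_S = 1 / (e * n_s * mu). Numbers are assumptions
# chosen only to show the order-of-magnitude effect of doping.
E = 1.602e-19  # elementary charge (C)

def sheet_resistance(n_s_per_m2, mobility_m2_per_Vs):
    """Sheet resistance in ohm/sq from 2D carrier density (m^-2) and mobility (m^2/Vs)."""
    return 1.0 / (E * n_s_per_m2 * mobility_m2_per_Vs)

# Lightly doped: n_s ~ 1e12 cm^-2 = 1e16 m^-2, mu ~ 1000 cm^2/Vs = 0.1 m^2/Vs
low = sheet_resistance(1e16, 0.1)   # ~6.2 kOhm/sq: too resistive for a TCE
# Heavily doped: n_s ~ 1e13 cm^-2 = 1e17 m^-2, same mobility
high = sheet_resistance(1e17, 0.1)  # ~620 Ohm/sq: an order of magnitude lower
print(round(low), round(high))
```

A tenfold increase in carrier density buys a tenfold drop in RS, which is why effective doping is the central requirement for graphene TCEs.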
Chemical doping has been successfully used to dope graphene for TCE applications. For example, in Reference [14] CVD graphene doped with nitric acid is used as a TCE for display and touchscreen applications, while in References [43, 44] graphene doped with AuCl3 is used for the fabrication of solar cells. As mentioned before, however, chemical doping has significant issues in terms of the time stability of the carrier concentration.
Another approach to producing a TCE with graphene is to use multi-layer graphene. If several layers of graphene are stacked together, the resulting conductivity can be significantly enhanced with respect to the monolayer, while the transmittance remains above 90% in the visible range [43]. In Reference [50], multi-layer graphene has been demonstrated for the fabrication of LEDs. However, this technique is very inconvenient because it requires many transfer processes to obtain the desired multi-layer structure, with the risk of damaging the whole device at every transfer step.
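The trade-off behind stacking can be sketched numerically: N layers conduct roughly in parallel (RS divides by N), while each graphene layer absorbs about 2.3% of visible light, so transmittance falls geometrically with N. The single-layer sheet resistance below is an assumed illustrative value:

```python
# Illustrative sketch of the multilayer trade-off: parallel conduction
# lowers R_S by 1/N, while each layer absorbs ~2.3% of visible light.
def multilayer(rs_single_ohm_sq, n_layers, absorption_per_layer=0.023):
    rs = rs_single_ohm_sq / n_layers              # layers conduct in parallel
    t = (1.0 - absorption_per_layer) ** n_layers  # optical transmittance
    return rs, t

for n in (1, 2, 4):
    rs, t = multilayer(1000.0, n)  # 1 kOhm/sq single layer: an assumed value
    print(n, rs, round(t, 3))
# Four layers still transmit ~91% of visible light, consistent with the
# "above 90%" figure quoted above, at a quarter of the sheet resistance.
```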
As we will show later in this thesis, space charge doping can overcome many of these problems and thus has potential applications in the field of TCEs.

Optical microscope

The optical microscope is a useful tool for a first characterization of the graphene and ZnO samples. The images of the samples are recorded with a Leica DM2500 and a CCD camera, using 5X, 10X, 50X and 100X objectives in bright-field imaging mode. For anodic bonded graphene, optical microscope characterization is useful to determine the number of layers of the sample through the optical contrast, while for CVD graphene and zinc oxide it is useful to assess the quality of the sample surface.

Atomic Force Microscopy

Atomic force microscopy (AFM) was invented in 1986 [84] as an evolution of the Scanning Tunneling Microscope (STM). It is a tool which scans the surface of a sample to obtain a topography at extremely high resolution. Contrary to STM, which measures the tunneling current between the sample and the scanning tip, AFM measures forces at the atomic scale, which makes it possible to scan both conductive and insulating samples [85]. The basic principle is described in the following.

Electronic transport measurements setup

Space charge doping, electronic transport and Hall measurements are all carried out under high vacuum (< 10⁻⁶ mbar) in a custom-made continuous He flow cryostat. It allows the sample temperature to be controlled in the range 3–420 K, so that the doping of the samples and the low-temperature transport measurements can be performed in situ. The cryostat is held inside an electromagnet capable of reaching a magnetic field of 2 T. The doping and resistivity measurements are controlled by a LabVIEW program which coordinates the measurement instruments and saves the data to a file.
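As a reminder of what the Hall measurement yields, for a single-carrier 2D conductor the transverse resistance is R_xy = B/(e·ns), so the sheet carrier density follows directly from the measured Hall resistance. A minimal sketch, assuming the 2 T field quoted above and an illustrative R_xy value (not a measurement from this thesis):

```python
# Sketch: sheet carrier density from a Hall measurement on a 2D conductor,
# n_s = B / (e * R_xy). The R_xy value below is an assumed illustration.
E = 1.602e-19  # elementary charge (C)

def sheet_density_from_hall(b_tesla, r_xy_ohm):
    """2D carrier density (m^-2) from the transverse (Hall) resistance."""
    return b_tesla / (E * r_xy_ohm)

n_s = sheet_density_from_hall(2.0, 125.0)  # e.g. R_xy = 125 ohm at B = 2 T
print(f"{n_s:.2e} m^-2")  # ~1e17 m^-2, i.e. ~1e13 cm^-2
```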

The principles of space charge doping

To understand the working principles of space charge doping, it is worthwhile to take a closer look at the structure of glasses in general and at ionic transport inside glass.

Glass atomic structure

It is known that glasses are formed by an amorphous network of certain glass-forming oxides which respect some rules [95]. Commercial glasses are formed by three-dimensional networks of SiO2, as in the case of soda-lime glass, with B2O3 also contributing in the borosilicate glasses. The building block of the glass atomic network is the oxygen tetrahedron which surrounds silicon atoms, and each tetrahedron shares a corner with other tetrahedra. The angle between the different tetrahedra varies in an unpredictable way, which is why the atomic structure of glass is considered amorphous. Network modifiers are added to the glass during production in order to enhance certain properties. Sodium oxide (Na2O) and calcium oxide (CaO) are the main compounds used as modifiers. The introduction of these species breaks some Si-O-Si bridges, creating non-bridging oxygen units which serve as anions for the Na+ and Ca2+ cations incorporated in these sites. Oxygen atoms are linked covalently to the network and Ca2+ ions possess a very low mobility, making Na+ ions the most mobile species in glass. Figure 3.1 is a simplified and schematic representation of the structure of glass at the atomic level.

Link from AN to reinforcement learning

Q-learning [Watkins and Dayan, 1992] is a model-free reinforcement learning technique for a Markov decision process. The agent receives a reward at each state, and accordingly optimizes the action-value function Q(s, a) by value iteration. The goal is to predict the next action for maximizing future rewards. The policy is fixed, and is generally a softmax on the updated Q values.
This learning rule has been shown to explain conditional associative learning behavior in many experimental paradigms, and to exhibit specific neural correlates (see section 1.2.1). A direct parallel can be made between learning these behavioral Q values and learning the synaptic weights between event-selective neural populations in a decision network, through the stochastic Hebbian learning rule of Fusi and colleagues.
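A minimal tabular sketch of the standard Q-learning scheme described above (generic textbook algorithm, not the thesis code; the toy task, learning rate, discount and softmax temperature are all assumptions): the value of each chosen (state, action) pair moves toward reward plus discounted best next value, and actions are drawn from a softmax on the Q values.

```python
# Minimal tabular Q-learning with a softmax policy (generic sketch).
import math
import random

random.seed(0)
N_STATES, N_ACTIONS = 3, 3
ALPHA, GAMMA, BETA = 0.5, 0.9, 5.0  # learning rate, discount, softmax inverse temperature
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

def softmax_action(state):
    # Fixed policy: softmax over the current Q values of this state.
    weights = [math.exp(BETA * q) for q in Q[state]]
    return random.choices(range(N_ACTIONS), weights=weights)[0]

def q_update(s, a, reward, s_next):
    # Value-iteration step: move Q(s, a) toward reward + gamma * max_a' Q(s', a').
    target = reward + GAMMA * max(Q[s_next])
    Q[s][a] += ALPHA * (target - Q[s][a])

# Toy task (assumed): in state s, action a == s is rewarded; transitions random.
for _ in range(2000):
    s = random.randrange(N_STATES)
    a = softmax_action(s)
    q_update(s, a, 1.0 if a == s else 0.0, random.randrange(N_STATES))

print([max(range(N_ACTIONS), key=lambda a: Q[s][a]) for s in range(N_STATES)])
```

The learned greedy action in each state matches the rewarded association, which is the behavioral counterpart of the synaptic weights discussed next.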

Learning and forgetting associations over several episodes

The forget-and-learn pattern of the AN is illustrated in figure 4.1. This figure shows a simulation of the recurrent session, where an uncued episode switch marks a new occurrence of one of the three recurrent task-sets.
The evolution of the AN synaptic weights corresponding to each task-set is displayed in panel (A). After an episode switch, the new task-set is learned, and the previous task-set is unlearned. Other transients are plotted in red; they correspond to synaptic strengths which are incorrect according to any of the task-sets, and thus to exploratory stimulus-action associations. The corresponding performance is displayed in panel (B). The stochasticity in the model comes only from the action-selection rule studied in section 3.2. As soon as the imbalance between synaptic weights becomes greater than of order 1/β, the dynamics of the model are nearly deterministic. The remaining stochasticity emanates exclusively from the greediness parameter ε. Throughout this chapter, this parameter equals 0 for illustration purposes. The AN encodes a single task-set at a time, and there is no possible memory of multiple task-sets.
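The forget-and-learn pattern can be sketched with a toy Hebbian update. This is not the thesis model: the potentiation and depression rates, the bounded-weight form of the update, and the two example task-sets are all assumptions chosen only to show how rewarded trials for a new task-set overwrite the old one, association by association.

```python
# Toy forget-and-learn sketch (assumed rates and update form, not rule 3.5):
# on a rewarded trial, the chosen stimulus->action synapse is potentiated and
# the competing synapses from the same stimulus are depressed.
P_PLUS, P_MINUS = 0.4, 0.2            # assumed potentiation/depression rates
N_STIM, N_ACT = 3, 3
J = [[0.5] * N_ACT for _ in range(N_STIM)]  # synaptic weights, bounded in [0, 1]

def rewarded_trial(stimulus, action):
    for a in range(N_ACT):
        if a == action:
            J[stimulus][a] += P_PLUS * (1.0 - J[stimulus][a])  # potentiate chosen
        else:
            J[stimulus][a] -= P_MINUS * J[stimulus][a]         # depress competitors

task_set_1 = {0: 0, 1: 1, 2: 2}  # episode 1: stimulus -> correct action
task_set_2 = {0: 1, 1: 2, 2: 0}  # episode 2: a different recurrent task-set

for _ in range(20):              # episode 1: task-set 1 is learned
    for s, a in task_set_1.items():
        rewarded_trial(s, a)
w_old_before_switch = J[0][0]    # strong at the end of episode 1
for _ in range(20):              # episode 2: task-set 1 is unlearned
    for s, a in task_set_2.items():
        rewarded_trial(s, a)
print(round(w_old_before_switch, 2), round(J[0][0], 2), round(J[0][1], 2))
```

The old association's weight collapses while the new one saturates, reproducing the learn/unlearn alternation seen in the simulation.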

Sensorimotor associations are learned one by one

In this subsection we emphasize that stimulus-action associations are learned one by one in the AN. A task-set can be encoded by this network, but this encoding simply reflects the one-by-one encoding of stimulus-action associations.
From the learning rule 3.5, we note that potentiation and depression affect only the synaptic weights emanating from the activated stimulus-selective neural population. There is no interdependent learning of associations from two distinct stimuli. As a consequence, the AN cannot retrieve the action associated with one stimulus from knowing another stimulus-action association, which is a measure of task-set retrieval ability.
We explore the effect of this independent associative learning through the synaptic dynamics and the corresponding performance after a first correct trial, displayed in figure 4.2 (B) and (E). For each simulated episode, s_T is the stimulus presented at the first correct trial T, associated to action a_T. From this trial, we consider the next trial T + k for which the presented stimulus is different, i.e. the smallest k ∈ ℕ such that s_{T+k} ≠ s_T. As stimulus presentation is randomized, this happens after k trials with probability (2/3)(1/3)^(k−1). We are interested in the value of the synaptic weight from the neural population selective to this stimulus s_{T+k} to the population selective to the correct action a_{T+k}. This synaptic weight is J^{AN}_{s_{T+k}→a_{T+k}}. In each task-set, an action is associated to a single stimulus, so a_{T+k} ≠ a_T.
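The probability quoted above is a standard geometric distribution and can be checked directly by simulation (generic probability check, not thesis code):

```python
# Monte Carlo check: with three equiprobable stimuli drawn independently,
# the first trial T + k showing a stimulus different from s_T occurs at
# lag k with probability (2/3) * (1/3)^(k - 1).
import random

random.seed(1)
STIMULI = [0, 1, 2]

def first_different_lag(s_t):
    k = 1
    while random.choice(STIMULI) == s_t:
        k += 1
    return k

counts = {}
n = 100_000
for _ in range(n):
    k = first_different_lag(0)
    counts[k] = counts.get(k, 0) + 1

for k in (1, 2, 3):
    print(k, round(counts[k] / n, 3), round((2 / 3) * (1 / 3) ** (k - 1), 3))
```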


Table of contents:

1 Introduction 
1.1 Task-set, from the what to the how
1.1.1 What is a task-set?
1.1.2 The prefrontal cortex is the locus of cognitive control
1.2 A computational model for human executive control and adaptive behavior
1.2.1 Reinforcement learning
1.2.2 Previous studies on reward-based decision making
1.2.3 Bayesian inference
1.2.4 The model of Collins and Koechlin
1.3 Building blocks of a representational model of task-set implementation in the PFC
1.3.1 Fusi and Wang's biologically realistic network model for decision-making
1.3.2 Hebb rule for synaptic plasticity
1.3.3 Experimental evidence for the behavioral Hebbian learning rule
1.3.4 Theoretical work of Mattia Rigotti and Stefano Fusi
2 Experimental results 
2.1 Experimental design
2.1.1 The experimental task
2.1.2 Debriefing
2.2 Behavioral results
2.2.1 General behavior
2.2.2 Quantitative measures of behavior
2.2.3 Behavioral results of Experiment 1
2.2.4 Behavioral results of Experiment 2
2.2.5 Summary on the behavioral results of Experiments 1 and 2
3 Model description 
3.1 The network architecture
3.1.1 The associative network
3.1.2 The task-rule network
3.2 Network dynamics
3.2.1 Decision-making dynamics in AN
3.2.2 Network dynamics in TN
3.3 Plasticity
3.3.1 General form of plasticity rules
3.3.2 AN plasticity rules
3.3.3 TN plasticity rules
3.4 Comparison with other models
3.4.1 Fusi and Wang's biologically realistic network model for decision-making
3.4.2 Link from AN to reinforcement learning
3.4.3 TN difference with the study from M. Rigotti et al.
4 Synaptic dynamics of the AN 
4.1 Learning and forgetting associations
4.1.1 Learning and forgetting associations over several episodes
4.1.3 Detailed analysis of the forget and learn pattern
4.2 Sensorimotor associations are learned one by one
4.3 Effect of a noisy trial on AN connectivity
4.3.1 A noisy trial at the beginning of an episode
4.3.2 A noisy trial at the end of an episode
4.3.3 Conclusion on the effect of noisy trials on AN connectivity
4.4 Conclusion on the learning dynamics of the AN
5 Synaptic dynamics of the TN: the formation of task-sets 
5.1 Synaptic encoding of temporal contiguity
5.1.1 A framework from Ostojic and Fusi
5.1.2 Application of this framework to the TN model
5.1.3 The speed-accuracy trade-off
5.2 Introduction: an example of TN activity
5.3 The TN is able to encode the task-sets of the recurrent session
5.4 Effect of the inferential bias from the TN to the AN
5.4.1 Inference bias from the Perfect TN
5.4.2 Inference bias with the Plastic TN
5.5 Effect of the inferential bias from the TN to the AN in a noisy environment
5.5.1 A noisy trial at the beginning of an episode
5.5.2 A noisy trial at the end of an episode
5.5.3 The overall effect of noisy trials
5.5.4 Conclusion on the effect of noisy trials
5.6 Learning limits and their effect on the inferential bias
5.6.1 Propagation of a spurious connection after learning in the two-dimensional AN-TN
5.6.2 Learning in the TN without any inferential bias from the TN to the AN
5.6.3 Effect of the inferential feedback from the TN to the AN
5.6.4 Effect of the inferential feedback on first correct trials
5.6.5 Effect of the strength of the inference bias
5.7 Conclusion
6 Model fitting and comparison 
6.1 Methods
6.1.1 Model fitting: techniques
6.1.2 Model comparison: quantitative criteria
6.1.3 Model comparison: qualitative criteria
6.1.4 Model specifications
6.2 Model selection and comparison: recurrent session
6.2.1 LSE model fits and simulations of the recurrent session
6.2.2 AIC and BIC in the recurrent session
6.2.3 Parameters analysis
6.2.4 TN dynamics with fitted parameters
6.3 Model selection and comparison: open-ended session
6.3.1 LSE model fits and simulations of the open-ended session
6.3.2 AIC and BIC in the open-ended session
6.3.3 Parameters analysis
6.4 The memory effect
6.5 Conclusion
7 Testing model predictions: effects of learning the task-structure on performance 
7.1 Positive bias from learned rules: task-set retrieval
7.1.1 Model-based classification of trials
7.1.2 Trials and episodes distributions
7.1.3 Model prediction: probability of making a correct choice on the AFC trial
7.1.4 Model-based classification of subjects
7.1.5 Link between the model-based classification of subjects and the increment parameter values, as well as mean performance
7.1.6 Link to the post-experiment debriefing classification of subjects
7.1.7 Reaction times
7.1.8 Summary of the task-set retrieval prediction due to positive bias from learned rules
7.2 Prediction for an incorrect bias from learned rules
7.2.1 A rewarded noisy trial
7.2.2 The model-based classification of trials
7.2.3 Trials and episodes distributions
7.2.4 Model prediction: probability of making an incorrect choice on the AMN trial
7.2.5 Summary of the incorrect task-set retrieval prediction due to an incorrect bias from learned rules
7.3 Discussion
7.4 Prediction for an incorrect bias from two overlapping rules
8 Neuroimaging analysis 
8.1 Description of Functional Magnetic Resonance Imaging
8.1.1 BOLD physiology
8.1.2 Model-based fMRI
8.2 Model-based fMRI: methods
8.2.1 Pre-processing
8.2.2 General Linear Model, first-level analysis
8.2.3 Second level analysis
8.2.4 Region of Interest (ROI)
8.3 Retrieving previous studies results on prediction error with GLM 1
8.4 AN synaptic strength of the chosen association when making a decision
8.5 Consistency between AN encoding and TN belief when making a decision
8.6 Investigating task-set retrieval at the feedback time
8.6.1 Positive linear eects in a dorsomedial prefrontal node
8.6.2 Positive linear eects in a dorsolateral prefrontal node
8.7 Comparative analysis of the two sessions
8.7.1 Dorsomedial prefrontal cortex is activated specifically when a task-set is monitored at the time of decision
8.7.2 Dorsolateral prefrontal cortex is activated specifically when a task-set is retrieved
8.7.3 Dorsomedial prefrontal cortex is also activated specifically when a task-set is retrieved
8.8 Conclusion
9 Discussion 
9.1 General conclusion
9.2 Computational complexity
9.3 The question of the inference bias from TN to AN
9.4 Expected and unexpected uncertainties
9.5 Tracking the statistics of the environment
9.6 Behavioral shift
9.7 Where are the AN and the TN in the brain?

