Data driven integrated modelling of social dynamics
My research on data driven models of social dynamics has taken place in the context of European projects. It started in 2000 with my participation in the European project IMAGES (1997 – 2001). This project aimed at developing a decision-support model for defining the agri-environmental measures proposed by the European Commission to favour environmentally friendly agricultural practices. As a first step, the model targeted a better understanding of how these agri-environmental measures diffuse among farmers. As it was an individual based model, various dynamics had to be identified and modelled, and an initial population had to be generated consistently with the chosen dynamics. For example, for the particular case of the diffusion of the organic farming measure in Allier, we faced the following questions: how and when had the information about the measure been diffused to farmers? How do farmers decide? How many organic farmers are there initially in the population? How can we generate an initial population suitable for studying the diffusion?
The project made some progress in answering these questions, but we also identified shortcomings motivating new directions of research that were put forward in the European project PRIMA (2008 – 2011). The aim of this project was to evaluate how rural municipalities evolve under European policies for agriculture and rural areas, and I have been strongly involved in the cycles of elaboration, implementation and testing of its model. I was in charge of the day-to-day management of the LISC team involved in this project (comprising a PhD student, a post-doctoral student and an engineer) under the supervision of G. Deffuant for the main orientations. Moreover, in the modelling process, I identified how the different data sources could be best used for our modelling purposes. Part 1 of this document is dedicated to my work within this project, and more particularly to the implementation of the conceptual model for the rural municipalities of the French département of Cantal.
An important shortcoming of the IMAGES model was that the farm population did not evolve. Indeed, in this model the farms include precise economic parameters (Deffuant, Amblard et al. 2000; Deffuant et al. 2001; Deffuant, Huet et al. 2002) but the farms and the population of farmers remain the same during the 10 years of the simulation. However, during this period, many farmers retire while far fewer new farmers settle. The average size of farms increases because the abandoned farms are bought by other farmers. Yet the size of the farm is an important parameter for determining the economic impact of adopting organic farming: the larger the farm, the better the capacity of the farmer to feed his livestock without external inputs, and the higher the benefit of the conversion. This shortcoming points to the need for a dynamic population that ages and changes labour status. That is why, when we elaborated the PRIMA project, we considered the evolution of the population as a central question.
Modelling the evolution of countryside populations turned out to be very rich and interesting. While for decades the countryside in many regions of Europe was synonymous with inevitable decline, nowadays some areas experience a “rebirth, even in areas where, until recently, development was not considered possible” (Champetier 2000). A recent ESPON (European Observation Network for Territorial Development and Cohesion) project report (Johansson and Rauhut 2007) concludes that “since the 1970s a global process of counter-urbanization has become increasingly manifest”. However, this general rebirth of the countryside hides deep heterogeneities. This can be observed in the Cantal département in France, where the population of some municipalities is increasing while that of others is decreasing.
Part 1 of this document presents the main steps of this modelling work. I organise them in chronological order, corresponding to the tasks for designing and parameterising the model. Only the first three chapters have been published: the conceptual model (chapter 1.1) has been approved by the European Commission as a deliverable and also forms part of a book chapter (the other part being chapter 1.3); chapter 1.2 has been published in PLOS One. The chapters were written at different stages of the model elaboration, and some minor parts differ from one chapter to another. That is especially the case of chapter 1.3, in which the heuristic to search for a job or a residence is deprecated (some comments have been added to the paper to point out the deprecated methods). The other chapters are up to date.
Chapter 1.1 presents the conceptual model we have designed (Huet, Dumoulin et al. 2012) as a basis for implementations of various case studies in Europe. We follow the prescriptions of recent reviews dedicated to land use and land cover change modelling, which recommend hybrid approaches (Boman and Holm 2005; Birkin and Clarke 2011; Birkin and Wu 2012), and more particularly the coupling of microsimulation and agent-based modelling. This choice allows us to include some individual dynamics that are poorly known and for which no direct data are available, such as the residential location decision (Coulombel 2010), and also to derive other dynamics from data when possible.
Our conceptual model considers individuals, members of households located in municipalities of a region, and their state transitions expressing demographic and activity events: birth, finding a partner, moving, changing job, separating from their partner, retiring, dying, etc. The municipalities include offers of jobs and dwellings which constrain the possible state transitions. Because we are interested in better understanding the dynamics leading to the development or, on the contrary, to the decline and possible disappearance of municipalities and settlements, two sets of cruxes can be identified in the model: the individual dynamics, which determine the needs for residence and jobs; and the dynamics of dwelling and job offers at the local (i.e. municipality) level. The municipality offers of jobs and dwellings can be parameterised with commonly available data. The individual dynamics, however, are much more difficult to define and parameterise.
Thus, chapters 1.2 to 1.4 are dedicated to the parameterisation of the individuals of this conceptual model for simulating the evolution of the municipalities of the French département of Cantal (150 000 inhabitants in 260 municipalities over 5726 km²). In practice, the purpose is to find submodels that correctly describe the evolution of the chosen objects. The design and the choice of relevant submodels are data driven, and while the link to data is straightforward in basic microsimulation, it is not so easily manageable with an individual based approach. Indeed, in dynamic microsimulation (which remains rare (Birkin and Wu 2012)), the most common way to introduce change into the demographic structure is to apply static ageing techniques, which consist in reweighting the age classes according to external information. Such approaches avoid having to consider functions describing the evolution of individual behaviour, and their parameterisation. Multi-agent modelling (Berger and Schreinemachers 2006) holds the promise of providing an enhanced collaborative framework in which experimental designers, modellers, and stakeholders may learn and interact, but the fulfilment of this promise depends on the empirical parameterisation of the model. Although multi-agent models have been widely applied in experimental and hypothetical settings, only a few studies have strong linkages to empirical data (Fernandez, Brown et al. 2005) and the literature on methods of empirical parameterisation is still limited.
Chapter 1.2 focuses on the problem of initialising the population, which has to be solved as a first step in every individual-based model. In theoretical studies, the population can be drawn from an arbitrary distribution. This cannot be the case for a model aiming at reproducing the evolution of a particular population. We need to build a population as close as possible to the reference data, using a set of indicators chosen for their relevance to the general purposes of the model. For a human population, the reference data generally come from censuses. A particular problem has to be solved by a model considering both households and individuals. Indeed, some decisions, such as the residential move, concern households, while other processes, such as ageing or labour status, are specific to the individual. In this case, the problem is hence to build an initial virtual population that simultaneously fits reference data about households and individuals.
The classical method consists in starting from a sample of households, a subset of the population for which all the attributes of each household and its individuals are known, and associating a weight with each of these households in order to get the best fit to the available regional statistics. This is done using the classical Iterative Proportional Fitting procedure (Deming and Stephan 1940). However, this is not possible when no initial sample is available, which was our case. We propose a method starting from aggregated data and creating, on the one hand, the right number of individuals with their own properties and, on the other hand, the right number of households of the appropriate sizes. Then, a heuristic fills the households with the created individuals while respecting the constraints given by available data about the relationship between individuals and households. Chapter 1.2 describes the method in more detail and evaluates its efficiency. A more recent work, comparing an improvement of the classical sample-based IPF method with our sample-free method, shows that the latter tends to be slightly better (Lenormand and Deffuant 2012).
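To make the classical approach concrete, here is a minimal sketch of Iterative Proportional Fitting on a two-way table. It is only an illustration of the general technique, not the implementation used in chapter 1.2: the function name `ipf`, the toy seed table and the target margins are all invented, and a real application would reweight sampled households against several census margins at once.

```python
def ipf(seed, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Rescale a positive 2-D seed table until its margins match the targets."""
    table = [[float(v) for v in row] for row in seed]
    for _ in range(max_iter):
        # Row step: scale each row so that it sums to its target.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            table[i] = [v * target / total for v in table[i]]
        # Column step: scale each column so that it sums to its target.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            for row in table:
                row[j] *= target / total
        # Columns are now exact; stop once the rows still match as well.
        if all(abs(sum(row) - t) <= tol for row, t in zip(table, row_targets)):
            break
    return table


# Invented example: rows could stand for two household sizes, columns for
# three age classes; both sets of targets sum to the same total (1000).
seed = [[10, 20, 5], [15, 10, 30]]
fitted = ipf(seed, row_targets=[400, 600], col_targets=[300, 300, 400])
row_sums = [sum(row) for row in fitted]
col_sums = [sum(row[j] for row in fitted) for j in range(3)]
```

The procedure needs a seed table, which is exactly what is missing when no sample of households is available; this is the situation the sample-free method is designed for.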
Chapter 1.2 presents only a part of the initialisation of the population; it leaves out the initialisation of the individual labour status and place of work. The initialisation of the place of work is based on a new algorithm modelling commuting. Several papers are dedicated to this work: one presenting the algorithm and several use cases (Gargiulo, Lenormand et al. 2011), an improvement of the algorithm solving the problem of closure of the system and making the algorithm universal (Lenormand, Huet et al. 2012b), and a comparison with other universal algorithms (Lenormand, Huet et al. 2012a). We did not include these papers in the present document because they are devoted to the initialisation of the model, whereas we preferred to focus this document on dynamic modelling.
Chapter 1.3 focuses on the design and the parameterisation of the individual dynamics in the labour market. After a first step in which we collected various possible sources, we chose the European Labour Force Survey (LFS) and the National Censuses as our main data sources. They spare us many assumptions because a large part of their variables have the same definitions in both sources. They contain data on age and situation (student, retired, active, employed or not, inactive, etc.), allowing us to make a connection between the two sources of data. Moreover, they are “official” data sources which are regularly used by policy-makers and stakeholders. Hence, their variables correspond to the common knowledge they have about the social system. This makes the communication around the model easier and clearer.
We consider the basic classical statuses: student, unemployed, employed, inactive and retired. Moreover, we give the individual attributes describing her job: the socio-professional category (SPC) and the activity sector. In France, the SPC is available in the LFS and the French Censuses. The job offers of every municipality also use the SPC as a description of the job. The activity sector completes the description of the jobs. For example, in France, we consider 24 different possible jobs (6 SPC in 4 activity sectors). The Labour Force Survey, and particularly the Employment Survey, which is its French version, allows us to extract the probabilities of transition between these various statuses. The European Labour Force Survey is a continuous survey following the state of individuals over several years (3 years and, more recently, 18 months), during which they are interviewed several times. It is based on a very large representative sample and gives the weights for projecting it at different scales. We use these weights to extract data for municipalities of fewer than 50 000 inhabitants, which is more relevant to our study of rural areas.
We also extract from this survey the probability of the first profession of a young individual depending on the profession of the father. Depending on this first profession, we then extract the age distribution for entering the labour market.
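As an illustration of how such extracted probabilities can drive the individual dynamics, the sketch below enumerates the 24 job types (6 SPC in 4 sectors) and draws yearly status transitions from a table. Everything numeric here is invented for the example: the labels `SPC1`…`SPC6` and `sector1`…`sector4` are placeholders, and the transition probabilities are not taken from the Labour Force Survey. In the actual model they would come from the survey and may depend on further attributes such as age.

```python
import random
from itertools import product

# Placeholder labels: the text gives only the counts (6 SPC, 4 sectors).
SPC = [f"SPC{i}" for i in range(1, 7)]
SECTORS = [f"sector{i}" for i in range(1, 5)]
JOB_TYPES = list(product(SPC, SECTORS))  # 6 x 4 = 24 job types

# Invented yearly transition probabilities between the five statuses.
TRANSITIONS = {
    "student": {"student": 0.80, "employed": 0.12, "unemployed": 0.06, "inactive": 0.02},
    "employed": {"employed": 0.90, "unemployed": 0.05, "inactive": 0.02, "retired": 0.03},
    "unemployed": {"unemployed": 0.55, "employed": 0.35, "inactive": 0.08, "retired": 0.02},
    "inactive": {"inactive": 0.85, "employed": 0.08, "unemployed": 0.05, "retired": 0.02},
    "retired": {"retired": 1.0},  # retirement is absorbing in this toy table
}


def next_status(status, rng):
    """Draw next year's status from the row of the current status."""
    targets = list(TRANSITIONS[status])
    weights = [TRANSITIONS[status][t] for t in targets]
    return rng.choices(targets, weights=weights, k=1)[0]


# Follow one hypothetical individual for ten simulated years.
rng = random.Random(0)
trajectory = ["student"]
for _ in range(10):
    trajectory.append(next_status(trajectory[-1], rng))
```

Keeping the probabilities in a plain table makes it easy to swap in values estimated from survey data without touching the sampling code.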
Chapter 1.4 presents the parameterisation of the demographic dynamics. They are related to the formation and the break-up of couples, to the birth rate, and to residential mobility, which sometimes leads to out-migration. We do not have enough data at the Cantal level to extract these dynamics directly. Therefore we have to design them and link them with the dynamics defined from data. For example, regarding the “giving birth” process, we have to decide whether an increase in births is due to an increase in the number of births per individual or to a structural change of the population (more people of an age and in a condition to have children). We made hypotheses allowing us to distinguish between these two possibilities. Then, for each case, we check whether the number of births given by the reference data is a possible result of the model. From this first phase, we conclude that we need to increase the number of births per individual. We also perform an analysis of the variance of the number of births in order to identify the parameters having the biggest impact on the ability of the model to be compatible with the reference data. More generally, the method is as follows. We assume different hypotheses for the dynamics and study their capacity to produce results close to reference data chosen because they are directly impacted by the tested dynamics (for example, the number of births for testing the “giving birth” process). In practice, we check that the reference data are a possible result of the model over a large set of possible values of the parameters of all the dynamics. In a second step, we study the sensitivity of other indicators to the dynamics, considering only a subset of them: this last study is restricted to parameters (and implicitly their related dynamics) selected on the basis of an analysis of variance. Applying this approach to the elaboration of the demographic processes, we conclude that:
• The implementation for Cantal requires an increasing number of births per individual to reproduce the number of births given by the reference data.
• A two-step dynamics should be considered for couple formation to reproduce the migratory and natural balances in Cantal: first, the annual decision of an individual to search for a partner; second, the searching strategy in terms of the effort made to meet a suitable partner. In practice, to fit the reference indicators for Cantal, a single person should be limited in her motivation to search for a partner (i.e. the probability of searching), and at the same time the level of effort made once she has decided to search in a given year has to be restricted (i.e. the maximum annual number of trials to meet someone suitable for her). If these conditions are not respected, couples, and then children, are too numerous.
• A constant probability for couples to split appears sufficient to match the natural and migratory balances in Cantal.
• A dynamics based on a spatially limited search for partner and dwelling (i.e. a search within a maximum distance) and a probabilistic avoidance of the largest municipalities as a possible place of residence is necessary to reproduce the spatial characteristics of the evolution of the population.
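The two-step couple formation and the spatially limited search listed above can be sketched as follows. This is a simplified illustration under stated assumptions, not the model's implementation: the function `search_partner`, its parameter names, the one-dimensional position `km` and the age-gap compatibility rule `age_gap_ok` are all hypothetical, since the text does not specify how compatibility or distance are computed.

```python
import random


def search_partner(single, singles, p_search, max_trials, max_distance, is_match, rng):
    """Return a partner found this year, or None."""
    # Step 1: annual decision to search at all (limited motivation).
    if rng.random() >= p_search:
        return None
    # Spatial limit: only singles living within max_distance are reachable.
    reachable = [s for s in singles
                 if s is not single and abs(s["km"] - single["km"]) <= max_distance]
    if not reachable:
        return None
    # Step 2: bounded effort, at most max_trials random meetings this year.
    for _ in range(max_trials):
        candidate = rng.choice(reachable)
        if is_match(single, candidate):
            return candidate
    return None


def age_gap_ok(a, b):
    # Hypothetical compatibility rule, used only for this example.
    return abs(a["age"] - b["age"]) <= 10


alice = {"name": "A", "age": 30, "km": 0.0}
near = {"name": "B", "age": 33, "km": 12.0}
far = {"name": "C", "age": 31, "km": 80.0}
rng = random.Random(42)
found = search_partner(alice, [alice, near, far], p_search=1.0, max_trials=3,
                       max_distance=20.0, is_match=age_gap_ok, rng=rng)
```

Lowering `p_search` or `max_trials` reduces the number of couples formed per year, which is the lever the conclusions above describe for keeping couples and births from becoming too numerous.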
In addition, in chapter 1.4, we collect some information about the relevant ranges of values for each parameter. At the same time, we identify the indicators which can probably be correctly reproduced by the model and those which cannot.
The work on this model is still in progress. Many investigations are still needed for a better understanding of the impact of the basic dynamics and their interactions. The problem of modelling the evolution of the population, coming from the IMAGES project, led us to many others. In fact, the IMAGES project provides a link between the two parts of this document, because the research presented in the second part can also be seen as originating in questions and problems raised by this project.
Theoretical individual based models of social influence
During the IMAGES project, I developed a submodel computing the economic impact of organic farming adoption depending on the type of farm (Huet and Deffuant 2001; Deffuant, Huet et al. 2002; Lenormand, Huet et al. 2012a). This model, validated by the organic farming technician of the Allier Agricultural department, showed that a large share of Allier farmers would obtain an economic benefit from adopting the organic farming measure. However, very few farmers adopted the measure, apparently because of strong cultural resistance and/or a lack of information. This led me to study the dynamics of filtering or rejecting messages that could be the cause of the low adoption level. A theoretical approach appeared more relevant for better understanding these mechanisms. Indeed, while decision-support models are generally applied and driven by data describing the problem, a theoretical approach can focus on the impact of a particular dynamics without being constrained by the data. It can be done with a model coupling several processes, as we did in (Deffuant, Huet et al. 2005) for the adoption of agri-environmental measures by farmers in various prototypical case studies, with stylised farmers and stylised measures. It is more often done using very simple models involving only one or two dynamics, as I did with Margaret Edwards on a binary decision model (Edwards, Huet et al. 2003; Edwards, Ferrand et al. 2005). In this approach, the dynamics is simple enough to explore the model trajectories extensively.
The chapters of part 2 are devoted to the questions of information filtering and/or cultural resistance, studied through a theoretical modelling approach. Four of them have been published in scientific journals, one has been presented at a conference and the last one is a discussion we wrote recently and which has not been published.
Chapter 2.1 proposes a simple model of information filtering. The model considers an object defined by a set of features, each feature being characterised by a utility (or attitude, assumed to be shared by all individuals). The model supposes that individuals tend to ignore the features of an object which are not important enough or which contradict their current view. A feature which has not been ignored is saved in memory, and changes the individual’s global attitude towards the object. The global attitude is the sum of an a priori attitude towards the object and the attitudes towards the saved features.
The model assumes that an individual has filters which select only important features. The importance of a feature is assessed by comparing the absolute value of attitude towards the feature with a threshold. When the attitude towards the feature and the global attitude towards the object are of the same sign (congruent feature), the threshold is smaller than when the signs are different (incongruent feature).
Our individuals are all in contact with the media, which communicate randomly chosen features of an object. An individual can hear about a feature from the media or from a peer with whom she regularly discusses. An individual only communicates about her known congruent features because she is reluctant to talk about her incongruent features. As we know little about the incongruence threshold, we decided to consider two variants (Deffuant and Huet 2007):
• A constant threshold which is supposed to be an attribute of the individual expressing when someone considers something as important; it is called the Constant Incongruence Threshold model (CIT);
• A dynamic threshold which depends on the current global attitude value; it is called the Dynamic Incongruence Threshold model (DIT).
We first compared our model with the case of individuals informed by the media but not discussing or exchanging features with one another. The media diffuse the features of the object in a random order. In our study, we consider a neutral object, meaning that the sum of all the feature attitudes is zero. It comprises:
• A set of negative features with an absolute attitude higher than the incongruence threshold of the individual, called major features;
• A set of positive features with an absolute attitude between the congruence and the incongruence thresholds, called minor features.
All the individuals have an initial attitude of 0, considered as positive, which makes the positive features congruent and the negative ones incongruent. With the Constant Incongruence Threshold model, negative features are always saved by the individual; on the contrary, the positive features are saved only if the sign of the individual’s global attitude is positive: if the individual’s global attitude becomes negative, the positive features are ignored.
The rational model assumes that all features are saved and the object is considered as neutral, whatever the order of the features. However, in our model, when the negative features are at the beginning, the object is finally perceived as negative, because once the major negative feature is saved, the positive features are ignored. That is why we say the model exhibits the primacy bias. We compared the two versions (constant or dynamic incongruence threshold). They slightly differ in their impact but both exhibit the primacy bias.
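The order effect described above can be reproduced with a few lines of code. The sketch below is a minimal reading of the constant-threshold variant for a single individual who hears each feature once from the media; the function name `perceive`, the threshold values (0.5 and 2) and the feature values (one major negative feature of -3, three minor positive features of +1) are invented for the illustration and are not taken from chapter 2.1.

```python
def perceive(features, congruence_threshold, incongruence_threshold, prior=0.0):
    """Return the final global attitude after hearing the features in order."""
    attitude = prior
    saved = []
    for value in features:
        # An attitude of exactly 0 counts as positive, as in the text.
        congruent = (value >= 0) == (attitude >= 0)
        threshold = congruence_threshold if congruent else incongruence_threshold
        # A feature is saved only if it is important enough for its filter.
        if abs(value) > threshold:
            saved.append(value)
            # Global attitude = a priori attitude + attitudes of saved features.
            attitude = prior + sum(saved)
    return attitude


# Neutral object: the feature values sum to zero.
negative_first = perceive([-3, 1, 1, 1], congruence_threshold=0.5,
                          incongruence_threshold=2)
positive_first = perceive([1, 1, 1, -3], congruence_threshold=0.5,
                          incongruence_threshold=2)
print(negative_first)  # -3.0: the minor positive features are all ignored
print(positive_first)  # 0.0: every feature is saved, the object stays neutral
```

With the same set of features, only the order of reception changes the final attitude, which is the signature of the primacy bias. The dynamic-threshold variant is not sketched here because its functional form is not given in this summary.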
Table of contents:
Data driven integrated modelling of social dynamics
Theoretical individual based models of social influence
Part 1. Data driven integrated modelling of social dynamics
Chapter 1.1 The SimMunicipality model
Main entities, state variables and scales
Process overview and scheduling
Chapter 1.2 Generating the initial population
Materials and Methods
Chapter 1.3 Parametrisation of the individual activity dynamics
Designing and parameterising the individual activity
Lessons / Experience
Chapter 1.4 Parametrisation of the unknown laws of demography
The Cantal and its demography
How to model couple and birth dynamics in Cantal
How to model moving
Part 2. Theoretical individual based models of social influence
Chapter 2.1 Disregarding information – a model exhibiting the primacy bias
Diffusion of a two-feature object
Diffusion of more than two features
Chapter 2.2 Disregarding information – a double modelling approach
The individual based model (IBM)
Bird’s-eye view of the IBM for complex « interaction » cases
The aggregated model helping to predict the IBM
Chapter 2.3 Attraction-Rejection – designed from theories
Overview of the model
Analysis of several examples
Systematic analysis of the number of clusters
Discussion and conclusion
Chapter 2.4 Attraction-rejection – a double modelling approach
Number of clusters in the ABM when the uncertainty varies
Aggregate model at the limit of infinite population
Discussion and conclusion
Chapter 2.5 Attraction-Rejection – designed from experiments
The Dynamical Model of Interacting Individuals
Typical Evolutions of the population
Systematic experiments
Discussion and conclusion
Conclusions and perspectives
A methodological point of view on data and design
Data and design in data-driven modelling
Data and design in theoretical modelling
Developing the loop between data-driven and theoretical modelling