Dimension Reduction and Clustering with NMF-EM


Sociological considerations

A high-quality public transportation system allows citizens to minimize their use of personal motorized vehicles when traveling across the city.
[Litman, 2016] showed that it also has a positive impact on users’ health: it reduces vehicle accidents and pollutant emissions while improving passengers’ mental health and fitness, since they walk more to reach stations and stops than people who use their own vehicle. Moreover, an efficient network allows disadvantaged people without a car to reach more neighborhoods and thus more services (such as shops or health services), and so to improve their lifestyle. The authors of [Gendron-Carrier et al., 2018] also showed that a decrease in particulate pollution is measured when a new subway line opens. In [Zion and Lerner, 2017], mobility patterns are used to characterize the different neighborhoods of a city, and through them its sociology.

Economical considerations

In the monocentric city model mentioned earlier, commute trips by private modes have often been studied. In [Arnott et al., 1993; Anderson and De Palma, 2007], for example, the authors studied congestion at a city’s gates and in its parking facilities during peak periods. Although American cities, on which most of these studies are based, generally have little public transportation infrastructure, this is not the case for European cities. Unfortunately, we found only a few references dealing with the impact of a developed public transportation network on the economic health of a city.
Some papers [De Palma et al., 2015, 2017] focus on congestion in public transportation from the commuters’ point of view. Trains with larger capacities offer more comfort to peak-period passengers, while a well-designed timetable also lets travelers who want to avoid crowds arrive earlier or later at their destination in less crowded trains. The latter spreads congestion over time. In addition, [Litman, 2015] compared cities with a large rail infrastructure, cities with a small rail infrastructure and cities with no rail infrastructure. The author stated that a larger rail infrastructure, and thus a more efficient network, comes with higher ridership per capita, fewer traffic fatalities and a smaller share of household budgets allocated to transportation. Households can then spend more on other goods and services.
Taken together, these results show that an efficient transportation network goes with a healthier and wealthier local economy. Another example is given by [Pang, 2018], which shows that a dense network improves the employability of low-skilled workers.
The economic studies mentioned here show that an efficient public transportation network is not only more attractive to citizens, but also appears to be a driver of economic development for cities. Everything described in this thesis so far motivates a detailed study of transport data.

Smart Cities and Urban Data

Traditional studies rely on yearly data. With digital miniaturization, however, almost every conceivable type of object can now embed a computer that generates huge quantities of data. As [Batty, 2013] puts it, we are now able to understand how cities function over much shorter time scales than before. As seen previously, traditional studies focus on the location of land use and on the long-term functioning of cities. With these new ubiquitous sensors, it is easier than before to study movement and mobility. In the same way that we call our phones “smart” because they are small computers, we can now speak of smart cities. The term first appeared in 1994 and is increasingly used in the literature. A full definition of smart cities can be obtained by combining the works of [Dameri and Cocchia, 2013; Nam and Pardo, 2011; Harrison et al., 2010], to name but a few. The goal of a smart city is to offer the best living conditions to its citizens and visitors.
To achieve this, the city first needs to be instrumented, that is, able to collect a large amount of data through sensors, meters, kiosks, personal devices, cameras, appliances, smart phones, implanted medical devices and even social networks. The authorities then need to be interconnected, by implementing a platform where these data can be stored and shared among the various city services. Finally, the city needs to become intelligent, by adding analytics, modeling, computing and visualization. All of these can then help address increasingly frequent urban issues, such as road congestion, noise and air pollution, energy and water consumption, and waste treatment.


Table of contents:

1 Introduction (French)
1.1 Contexte et motivations
1.1.1 Considérations sociologiques
1.1.2 Considérations économiques
1.1.3 Villes intelligentes et données urbaines
1.1.4 Transdev
1.2 Résumé substantiel des chapitres
1.2.1 Segmentation
1.2.2 Régression et Prévision
2 Introduction (English)
2.1 Context and motivations
2.1.1 Sociological considerations
2.1.2 Economical considerations
2.1.3 Smart Cities and Urban Data
2.1.4 Transdev
2.2 Summary of the chapters
2.2.1 Clustering
2.2.2 Regression and Forecasting
3 NMF as a pre-processing tool for clustering
3.1 Introduction
3.2 The data
3.3 Results obtained by EM
3.4 Results obtained by NMF
3.5 Conclusion
Analyse de données volumineuses dans le domaine du transport
4 Dimension Reduction and Clustering with NMF-EM
4.1 Introduction
4.2 Factorization of mixture parameters and the NMF-EM algorithm
4.2.1 Factorization of mixture parameters
4.2.2 The NMF-EM algorithm
4.2.3 The NMF-EM algorithm for mixture of multinomials
4.2.4 Discussion on the choice of H and K
4.3 Simulation study
4.4 Application to ticketing data
4.4.1 Description of the data
4.4.2 Passenger profile clustering
4.4.3 Stations profile clustering
4.4.4 Passengers profile clustering on another network
4.5 Conclusion
5 Forecasting and anomaly detection
5.1 Introduction
5.2 Data presentation
5.3 Modelization
5.3.1 Linear model
5.3.2 Generalized Additive Model
5.3.3 Random Forest
5.4 Confidence intervals
5.5 Application: impact of the 2018 SNCF social strike on one network
5.5.1 Introduction
5.5.2 Model selection
5.5.3 Results
5.6 Conclusion