Mobile Data Traffic Prediction

To what degree is Internet traffic predictable? This question has raised a number of interesting issues and has been investigated continuously since the invention of the Internet [50]. In this section, we review the state of the art in the prediction of mobile data traffic. Our discussion is organized from two perspectives:
Aggregated mobile data traffic. In this perspective, we consider the mobile data traffic from the viewpoint of a mobile network operator. Such data traffic is aggregated over many mobile devices within the same cell, the same close geographical area, or the same service/application.
Individual mobile data traffic. Here we take the viewpoint of an individual user, i.e., we consider the mobile data traffic generated by a single mobile device.
For each perspective, we briefly introduce the characterization of the data traffic and then present the practical prediction techniques. It is worth noting that, in this section, we focus on studies of Internet traffic and exclude those on other traffic (e.g., voice calls).

Literature on Aggregated Mobile Data Traffic

The investigation of aggregated mobile data traffic is mainly driven by analyses of large-scale operator-collected datasets from around the world. For instance, datasets covering nationwide populations are mined in depth in the studies by Paul et al. [27] (USA), Hoteit et al. [51] (France), and Xu et al. [37, 38] (China).

Characterization

The characterization has two major aspects: temporal dynamics and spatiotemporal correlation.
There is general agreement on the regularity of the temporal variation of aggregated mobile data traffic [23]. At almost the same time, Paul et al. [27] and Shafiq et al. [33] separately investigate the temporal evolution of the aggregated mobile data traffic of cell towers and of popular applications. Both find that such traffic follows a daily repetitive pattern over weekdays: in general, demand is low during nighttime and high during daytime. The same repetitive pattern is also observed by Xu et al. [37, 38]. It is also remarked in [27, 33, 52] that traffic on weekdays and on weekends has different repetitive patterns and demands; demand is larger on weekdays than on weekends. An interesting fact is that the temporal variations observed by Paul et al. [27] and Shafiq et al. [33] have different peak hours, which is also observed in other network traffic [23]. A possible explanation is that, at a higher temporal resolution, such temporal variation partially depends on the area of study.
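The weekday and weekend patterns described above can be made concrete with a small sketch. The following is a minimal illustration, not the procedure used in the cited studies: it assumes a hypothetical input of (timestamp, volume) samples for one cell tower or application and averages them per hour of day, separately for weekdays and weekends.

```python
from datetime import datetime


def daily_profiles(samples):
    """Average traffic per hour of day, split into weekday and weekend profiles.

    `samples` is an iterable of (datetime, volume) pairs; this input format is
    an assumption made for the sketch, not one taken from the cited datasets.
    """
    sums = {"weekday": [0.0] * 24, "weekend": [0.0] * 24}
    counts = {"weekday": [0] * 24, "weekend": [0] * 24}
    for timestamp, volume in samples:
        # datetime.weekday() returns 5 for Saturday and 6 for Sunday.
        kind = "weekend" if timestamp.weekday() >= 5 else "weekday"
        sums[kind][timestamp.hour] += volume
        counts[kind][timestamp.hour] += 1
    # Hours without any sample are reported as 0.0.
    return {
        kind: [s / c if c else 0.0 for s, c in zip(sums[kind], counts[kind])]
        for kind in sums
    }
```

Comparing the two 24-value profiles gives a quick view of the nighttime/daytime contrast and of the weekday/weekend difference in both shape and demand.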
A spatiotemporal correlation exists among the data traffic that cell towers carry for the many users in the same area. From the purely spatial perspective, the distribution of the data traffic is spatially heterogeneous: it varies over different regions, as revealed by Paul et al. [27] and Xu et al. [37, 38]. Further, the latter authors find that cell towers have similar data traffic profiles according to their regions (i.e., resident, transport, office, and entertainment) and that the profiles of adjacent cell towers are correlated. From the spatiotemporal perspective, the two papers show that this spatial heterogeneity also varies over time: peak hours depend on regions. The former authors leverage a quantitative measure (i.e., the Moran's I statistic) to evaluate the spatiotemporal diversity of the data traffic. They find that, in general, the imminent loads of adjacent cell towers are more correlated when these loads are high, but the correlation is relatively weak and almost disappears around midnight. Recently, [53] further investigates the spatiotemporal correlation and proposes an approach to infer the hidden spatial and temporal structures of aggregated mobile data traffic.
Several studies also reveal the spatial heterogeneity of traffic aggregated over applications. The earlier work by Trestian et al. [40] already shows that Internet traffic over services and applications is consumed differently at home and at work locations. Hoteit et al. [51] find that the data traffic loads of cell towers have different inner diversities among TCP- and UDP-based services. Later, the extended analysis by Shafiq et al. [39] finds that the data traffic aggregated by popular applications is strongly heterogeneous over regions. This makes it possible to categorize cell towers into four classes (web browsing, email, audio, and mixed traffic) with respect to the major applications in their data traffic loads.
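For readers unfamiliar with the Moran's I statistic mentioned above, the sketch below computes its standard global form, I = (N / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2, where W is the sum of all weights. The binary adjacency weights between towers are an assumption of this example; the weighting actually used in [27] is not stated here.

```python
def morans_i(values, weights):
    """Global Moran's I for one snapshot of per-tower loads.

    `values[i]` is the load of tower i; `weights[i][j]` is a spatial weight
    (here assumed to be 1 for adjacent towers, 0 otherwise, 0 on the diagonal).
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    weight_total = sum(sum(row) for row in weights)
    # Cross-products of deviations, counted only for spatially related pairs.
    numerator = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )
    denominator = sum(d * d for d in deviations)
    return (n / weight_total) * (numerator / denominator)
```

Values near +1 indicate that neighboring towers carry similar loads, values near -1 indicate that neighbors differ systematically, and values near 0 indicate no spatial structure. Evaluating the statistic per time slot is one way to observe how the correlation changes over the day.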

Prediction

Some effort has been devoted to the prediction of aggregated mobile data traffic, with the aim of converting the dynamics and correlations observed above into practical prediction techniques. In the following, we review the proposed prediction techniques according to the level of aggregation.
Cell-level data traffic. It is commonly observed that the data traffic of cell towers has a high degree of both theoretical and practical predictability. From the theoretical viewpoint, Zhang et al. [31, 32] investigate the limits of theoretical predictability by observing the traffic of 7,000 cell towers in China. They find that, at a temporal resolution of 30 minutes, aggregated traffic (voice, text, and data) can be well predicted from the historical demand of the preceding 15 hours; the theoretical predictability of data traffic is lower than that of voice calls or text messages. They also find that knowledge of the traffic demands of adjacent cell towers can enhance the theoretical predictability, but to a lesser degree for data traffic than for the others, which supports the quantitative evaluation of the spatiotemporal correlation by Paul et al. [27]. Their results support the applicability of time series prediction techniques to such traffic.
Regarding practical prediction techniques, Xu et al. [37, 38] show that cell-level data traffic is predictable via a linear combination of four primary components corresponding to human activities. Zang et al. [54] propose a mixed machine learning approach composed of K-means clustering, an Elman Neural Network, and wavelet decomposition. An alternative prediction approach is proposed by Yi et al. [55]; it builds a complex network among cell towers, measures the traffic on the most important ones, and predicts the traffic of the others using Support Vector Regression, another machine learning method. It can recover the whole picture of the traffic demand from only 8% of the total cell towers. From the opposite viewpoint, Nika et al. [36] perform an empirical study on data hotspots using a large-scale operator-collected dataset of 5,327 cell towers, and show that standard machine learning methods can predict future hotspots (cell towers) of traffic demand from past history.
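To give a sense of scale for the figures above: at a 30-minute resolution, the preceding 15 hours correspond to 30 past samples. The sketch below shows one common way, assumed here for illustration and not taken from [31, 32], of turning a traffic time series into (history window, next value) pairs that a time series predictor could consume.

```python
def lagged_windows(series, history_hours=15, resolution_minutes=30):
    """Build (history, target) pairs from a regularly sampled traffic series.

    The defaults mirror the 15-hour history and 30-minute resolution quoted
    in the text; the windowing scheme itself is an illustrative assumption.
    """
    lags = history_hours * 60 // resolution_minutes  # 30 samples by default
    pairs = []
    for t in range(lags, len(series)):
        # The history is the `lags` samples immediately preceding sample t.
        pairs.append((series[t - lags:t], series[t]))
    return lags, pairs
```

Each pair can then be fed to whichever regression or time series model is being evaluated.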
Application-level data traffic. The early paper by Keralapura et al. [56] proposes a technique to cluster users and their browsing profiles. The authors find that user behavior in terms of Internet surfing can be captured using a small number of clusters. Such heterogeneity of aggregated mobile data traffic is also investigated by Ying et al. [57]. Later, Shafiq et al. [33] use a Zipf-like model to capture the distribution of application-level mobile data traffic and find that this regularity makes the temporal variation of the traffic highly predictable from the history of past demand using a simple Markovian method. Recently, Zhang et al. [58] design a mixed application-level traffic prediction framework that leverages the α-stable model property and dictionary learning to deal separately with the temporal variation and the spatial sparsity of the traffic. Marquez et al. [59] extend the analysis in [33] and reveal a strong heterogeneity in the demands of different mobile services using correlation and clustering. They show that temporal usage patterns differ considerably from service to service. Besides, several works focus on the traffic generated by specific services, such as chatting (e.g., WhatsApp [60] and WeChat [61]), video streaming [62], and mobile cloud [63].
In summary, the proposed techniques extend the technical bounds of mobile data traffic prediction: they not only leverage the legacy tools that were used for analyzing wired network traffic (e.g., entropy, the Markov property, the α-stable model property) to capture the temporal variation, but also import several state-of-the-art machine learning tools to exploit the spatiotemporal correlation.
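As an illustration of what a simple Markovian predictor of the kind mentioned above can look like, the sketch below fits a first-order Markov chain over discretized demand levels and predicts the most frequent successor of the current level. The discretization into labels such as "low" or "high" and the function names are assumptions of this example; the exact method of [33] is not reproduced here.

```python
from collections import Counter, defaultdict


def fit_transitions(states):
    """Count first-order transitions in a sequence of discretized demand levels."""
    counts = defaultdict(Counter)
    for current, following in zip(states, states[1:]):
        counts[current][following] += 1
    return counts


def predict_next(counts, current):
    """Return the most frequently observed successor of `current`, or None if unseen."""
    if current not in counts:
        return None
    return counts[current].most_common(1)[0][0]
```

The more regular the historical pattern, the more concentrated each row of transition counts becomes, which is why strong daily regularity translates into accurate predictions even with such a small model.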

Literature on Individual Mobile Data Traffic

A relatively small body of literature investigates individual mobile data traffic, and it is also driven by data mining. Unlike the aggregated case, the relevant studies utilize both large-scale operator-collected datasets, e.g., by Paul et al. [27] and Oliveira et al. [34, 52], and small-scale mobile crowdsensing datasets, e.g., by Jo et al. [35].

Characterization

The characterization from the individual viewpoint is performed by Paul et al. [27], Jo et al. [35], Li et al. [42], Oliveira et al. [34, 52], among others.
There is general agreement on the heterogeneity of the data traffic with respect to the user population and time. This view is shared by Paul et al. [27] and Oliveira et al. [34, 52], who show that most of the total data traffic is generated by a small group of "heavy" users.
Regarding the temporal variation, both groups of authors find that, in general, each user is highly active during only a few hours per day and that the temporal variation differs between weekdays and weekends, as in aggregated mobile data traffic. The latter authors [34, 52] find that individual mobile data traffic also follows daily repetitive patterns and that users also have peak and non-peak hours in terms of data traffic. In particular, they find that the variation across different hours of the same day is stronger than that across the same hours of different days.
As to the spatiotemporal correlation, Paul et al. [27] point out that a user is usually active at only a few of their common locations. Jo et al. [35] mine a small dataset of the locations and services of 124 users over 16 months and identify spatiotemporal correlations in service usage patterns.
Other dynamics with respect to social features are also revealed. For instance, Oliveira et al. [34, 52] find that the distribution of individual mobile data traffic is slightly heterogeneous over age and gender; Li et al. [42] focus on the major smartphone operating systems and discuss the traffic dynamics and major applications in each system.
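The concentration of traffic among heavy users can be quantified with a simple share computation. The sketch below is illustrative only: the input mapping from user identifier to total volume and the 10% default cutoff are assumptions of this example, not values reported in [27] or [34, 52].

```python
def top_user_share(per_user_volume, top_fraction=0.1):
    """Fraction of total traffic generated by the heaviest `top_fraction` of users.

    `per_user_volume` maps a user identifier to that user's total traffic
    volume (hypothetical input format).
    """
    volumes = sorted(per_user_volume.values(), reverse=True)
    total = sum(volumes)
    if not volumes or total == 0:
        return 0.0
    # Always include at least one user so small populations still give a result.
    top_count = max(1, int(len(volumes) * top_fraction))
    return sum(volumes[:top_count]) / total
```

A share close to 1 for a small `top_fraction` indicates that a few users dominate the total demand.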

Table of contents :

1 Introduction 
1.1 Predicting Per-user Mobile Data Traffic
1.2 Utilizing Operator-collected Datasets
1.3 Contributions and Thesis Outline
2 Background 
2.1 Mobile Data Traffic Prediction
2.2 Operator-collected Mobility Data Utilization
2.3 Summary
3 Datasets: Characteristics and Challenges 
3.1 Human Behavior Collection
3.2 Operator-collected Large-scale Datasets
3.3 Application-based Mobility Datasets
3.4 Challenge of Completeness
3.5 Challenge of Mobility Measurement
3.6 Challenge of Data Processing
3.7 Summary
4 CDR-based Trajectory Completion 
4.1 Terminology
4.2 Completing Instant CDR-based Trajectories
4.3 Completing Slotted CDR-based Trajectories
4.4 Summary
5 Per-User Mobile Data Traffic Prediction 
5.1 Terminology and Definitions
5.2 Characterizing Individual Mobile Data Traffic
5.3 Constructing Per-user Spatiotemporal Behavioral Data
5.4 Investigation through Temporal Dynamics
5.5 Investigation through Spatiotemporal Dynamics
5.6 Additional Investigation of Human Mobility
5.7 Summary
6 Conclusion and Outlook 
6.1 Summary of the Thesis
6.2 Limitation and Outlook
6.3 Concluding Remarks
Bibliography
