Live Video Streaming Traffic Studies
Many papers have studied the network bandwidth used by video over the Internet. In particular, the authors of [IP11] analyzed more than five years of users' web traffic data to examine different characteristics of Internet usage. They highlighted the increasing importance of video content (up to 28% within the five years). In [FMM+11], the YouTube traffic generated by mobile devices is compared to the traffic generated by regular desktop computers. Their results showed that access patterns are similar across the sources of traffic. In the example illustrated by Figure 2.4, we observe that people using different devices and networks are interested in the same type of YouTube content: short videos. In both environments, half of the population watches videos shorter than 4 minutes and smaller than 20 MB. In [ZLAZ11], the total number of YouTube videos allows the authors to draw conclusions about the bounds of the total bandwidth and storage space necessary for YouTube.
This study emphasizes the critical resource needs of video systems. The video traffic generated by YouTube is analyzed from the standpoint of an ISP in [AJZ10]. Overall, these studies have emphasized the weight of services like YouTube in overall Internet traffic and the rapidly growing resources needed to serve the population. For our first contribution, we use similar techniques to analyze the behavior of people who contribute to a live video service as well as the bounds of total bandwidth usage for live video delivery.
Many measurement campaigns have been conducted to understand the motivations of contributors to UGC platforms. In particular, the YouTube system has been extensively studied since 2007 [CKR+07]. For instance, a study of the behavior of YouTube uploaders is given in [DDH+11], where it is explained that the most popular uploaders upload copied content. However, to the best of our knowledge, only a few papers have addressed data traces of user-generated content in live platforms. Two of them deal with “gamecasting”, i.e., gamers capturing and broadcasting their activity within a game. In the first one [KSC+12], the behavior of eSports and Twitch users is discussed. Over a 100-day trace, the authors showed evidence of the relationship between peaks of popularity on the platform and major eSports events, as illustrated by Figure 2.5. They propose a prediction of session popularity based on its early popularity, whereas in our contribution we offer a prediction based on past sessions. Indeed, the scheduling of tasks related to session processing and delivery should be done as soon as the session starts, and at this point of the session life cycle early popularity is not yet available. The authors of [SI11] study XFire, which is a social network for gamers featuring live video sharing. The authors focus on analyzing the similarities between the activity of gamers in XFire and their activity in the actual games. Another study dealing with live video sharing is [VAJ+06]. The authors analyzed 28 days of data from two channels associated with a popular Brazilian TV program that aired in 2002. Our contribution differs fundamentally in scale, since we evaluated several thousand channels.
Adaptive Bit Rate Streaming
The DASH standard [Sto11] is a popular ABR design and has been the common choice of various academic studies and industrial implementations. As presented in Section 2.2, ABR can demand extra computing power for transcoding the different video representations. This overhead was evaluated on DASH for low-latency scenarios, where it represents only 13% of the overall video streaming process [BCF14].
Even if the overhead is relatively small for one live session, it can put significant stress on the infrastructure in the case of UGC live streaming, where thousands of broadcasters can be streaming concurrently. In our contributions, we explore the trade-off between the benefits of ABR and this computing overhead for the transcoding operations.
A CDN live DASH approach is analyzed in [LSRT14]. Both a theoretical formulation and a practical implementation are given. The focus of this work is to maximize viewers' QoE subject to an under-provisioned CDN infrastructure. An implementation of DASH assisted by P2P reduced the servers' outgoing network bandwidth by up to 25% thanks to peer assistance [LMT12]. An optimization based on viewers' QoE for the DASH standard for VoD services was explored in [JdV14]. Similarly, our transcoding contributions target viewers' QoE, but for live services. We consider service providers assisted by CDNs that have the infrastructure needed to satisfy all viewers. In one of our contributions, the bandwidth burden on the service provider's servers is reduced by migrating it to the CDN.
A dynamic scheduler for DASH transcoding jobs that supports high-priority processing and load balancing in a cloud environment was designed in [MSZ14]. Although close to live transcoding, their objective differs from ours: they aim at video completion time, system load balance, and video playback smoothness, while our contributions target viewers' QoE and the reduction of resource costs.
Filters Used to Clean Up Traces
We observed in the measurements that a significant number of channels were typical of a broadcaster who tests the service. Two main behaviors were identified. The first one is a broadcaster who launched a channel for only one session lasting less than ten minutes over the whole three months. In other words, there is only one occurrence of this channel over the whole set of traces. The second type of “tester” is one who set up a channel with sessions longer than ten minutes, but whose channel attracted no viewers at all during the analyzed period.
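The two filters above can be sketched as follows. This is a minimal illustration, not the tooling used for the traces: the `Channel` record, its field names, and the reading of "sessions longer than ten minutes" as "at least one such session" are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List

TEN_MINUTES = 10 * 60  # threshold from the text, in seconds


@dataclass
class Channel:
    # Hypothetical record layout; the actual trace schema is not given here.
    name: str
    session_durations: List[int]  # seconds, one entry per observed session
    max_viewers: int              # highest viewer count seen over the period


def is_tester(channel: Channel) -> bool:
    """Return True if the channel matches one of the two 'tester' behaviors."""
    durations = channel.session_durations
    # Behavior 1: a single session shorter than ten minutes over the whole trace.
    single_short = len(durations) == 1 and durations[0] < TEN_MINUTES
    # Behavior 2: at least one session longer than ten minutes (assumed reading),
    # but the channel never had any viewer.
    never_watched = channel.max_viewers == 0 and any(
        d > TEN_MINUTES for d in durations
    )
    return single_short or never_watched


def clean(channels: List[Channel]) -> List[Channel]:
    """Keep only the channels that do not look like service tests."""
    return [c for c in channels if not is_tester(c)]
```

Applying `clean` to the full list of channels keeps only those that match neither behavior.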
Status of Live Streaming Services
The first question to answer is how big the live streaming systems can be. To approximate how much bandwidth is used by each system, we summed, over all sessions, the bit rate multiplied by the number of viewers. In the case of YouTube Live, where bit rates are not available, we used the average value of 2 megabits per second (Mbps) observed in Twitch sessions as the bit rate for all YouTube Live channels.
In Figure 3.3 we present the results of the bandwidth approximation. The calculated line indicates, for both services, the sum over channels of the average bit rate (2 Mbps) multiplied by the number of viewers in each channel. The estimation line indicates, for Twitch, the sum of each session's bit rate multiplied by its number of viewers. It was not possible to produce this estimation for YouTube Live since its API does not expose the session bit rate. Both services had bandwidth peaks of more than 1 terabit per second (Tbps) on the 14th day. On Twitch, peaks near and over 1 Tbps are frequent. This information about the volume of bandwidth consumption is important not only for the live streaming services themselves but also for ISPs and operators, who need to deliver all this content to end users. Recall also that this content is live: it cannot be pre-fetched or cached in advance, and new content is produced at every moment.
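The two curves described above can be sketched as follows for a single snapshot. This is an illustrative sketch only: the `Session` tuple layout and the function names are assumptions, and a missing bit rate is represented here by `None`.

```python
from typing import Iterable, Optional, Tuple

AVERAGE_BITRATE_MBPS = 2.0  # average Twitch bit rate reported in the text

# One session at a given instant: (bit rate in Mbps, or None if unknown; viewers).
# Hypothetical layout chosen for this example.
Session = Tuple[Optional[float], int]


def calculated_bandwidth_mbps(sessions: Iterable[Session]) -> float:
    """'Calculated' line: average bit rate times viewers, summed over sessions."""
    return sum(AVERAGE_BITRATE_MBPS * viewers for _, viewers in sessions)


def estimated_bandwidth_mbps(sessions: Iterable[Session]) -> float:
    """'Estimation' line: each session's own bit rate times its viewers."""
    total = 0.0
    for bitrate, viewers in sessions:
        if bitrate is None:
            # Mirrors the YouTube Live case, where no per-session bit rate exists.
            raise ValueError("per-session bit rate required for the estimation")
        total += bitrate * viewers
    return total
```

Dividing either result by 1,000,000 converts Mbps to Tbps, the unit used for the peaks discussed above.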
Zipf’s Law in UGC Live Streaming
The distribution of popularity found in UGC and VoD systems typically follows Zipf’s law [AH02]. The Zipf’s law function is given by Equation 3.1, where the variable α is the Zipf rank exponent. This exponent dictates how homogeneous the popularity is: the bigger the exponent, the bigger the differences in content popularity. For example, the gap between the popularity of the content ranked first and second (and likewise for other ranks) is bigger for an exponent of 2 than for an exponent of 0.5.
$F_i(x) = A_i x^{-\alpha}$ (3.1)
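As a worked illustration of the effect of the exponent, and assuming that x in Equation 3.1 denotes the popularity rank, the ratio between the popularity of the first-ranked and second-ranked content is:

```latex
\[
\frac{F_i(1)}{F_i(2)}
  = \frac{A_i \cdot 1^{-\alpha}}{A_i \cdot 2^{-\alpha}}
  = 2^{\alpha},
\qquad
2^{0.5} \approx 1.41,
\qquad
2^{2} = 4 .
\]
```

Under this reading, the top-ranked content is about 1.41 times as popular as the second one for an exponent of 0.5, and 4 times as popular for an exponent of 2.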
We checked with our traces whether live videos follow Zipf’s law as well. First, we show in Figure 3.8 examples of the popularity distribution found in the YouTube Live traces at two different hours on January 6, 2014. Using the traditional logarithmic scales, we then approximated the Zipf parameters with a curve-fitting process in the R software [R C14].
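One common way to approximate Zipf parameters is an ordinary least-squares fit of a straight line on log-log axes. The sketch below uses that technique in plain Python; it is not the R procedure used for the traces, whose exact fitting method is not detailed here, and the function name `fit_zipf` is an assumption.

```python
import math
from typing import List, Tuple


def fit_zipf(viewers: List[int]) -> Tuple[float, float]:
    """Fit F(x) = A * x**(-alpha) by least squares on log-log axes.

    `viewers` holds one popularity value per channel; returns (A, alpha).
    """
    # Rank channels by decreasing popularity; drop zeros (log is undefined).
    ranked = sorted((v for v in viewers if v > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two channels with viewers")
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(v) for v in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # slope of log F versus log x
    intercept = mean_y - slope * mean_x    # log A
    return math.exp(intercept), -slope     # alpha is the negated slope
```

A log-log regression weights every rank equally, so the long tail influences the fit as much as the head; a nonlinear fit on the raw values can return different parameters.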
Identifying Popular Broadcasters Sessions
As previously explained, the most popular sessions should be identified as early as possible, ideally as soon as they start, in order to decide on the delivery means and to dimension the infrastructure (transcoding and delivery). Furthermore, the results related to the Zipf distribution of popularity indicate that the most popular channels are far more popular (hundreds of thousands more) than the long tail, which puts even more pressure on identifying them early. We selected the 1% most popular channels of both services and simply refer to them as popular.
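The selection of the 1% most popular channels can be sketched as below. The popularity measure (here a single viewer count per channel) and the choice of rounding the group size up are assumptions made for the example; the text does not specify either.

```python
import math
from typing import Dict, List


def most_popular(channel_viewers: Dict[str, int], fraction: float = 0.01) -> List[str]:
    """Return the names of the top `fraction` of channels by viewer count."""
    # Sort channel names by decreasing popularity.
    ranked = sorted(channel_viewers, key=channel_viewers.get, reverse=True)
    # Assumed convention: round up and keep at least one channel.
    count = max(1, math.ceil(len(ranked) * fraction))
    return ranked[:count]
```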
The most obvious characteristics of channels are the length of their sessions, the interval between sessions, and the number of sessions that we observed during the three months. Intuitively, the most popular channels can be identified from these three characteristics. For each characteristic, we distinguish three “bins”. To select the bin limits, we took the whole group of channels and divided it into three equal parts for each characteristic. We then applied the same limits to the popular group. The characteristics and the partition limits used are given in Table 3.3. The results obtained with this partition are depicted in Figure 3.10.
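The binning procedure described above can be sketched as follows for one characteristic. This is a minimal sketch: the function names and the boundary conventions (cut points taken from the sorted values, lower bound inclusive) are assumptions, and the actual limits are those of Table 3.3.

```python
from typing import List, Tuple


def tercile_limits(values: List[float]) -> Tuple[float, float]:
    """Return the two cut points that split `values` into three equal-sized parts."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 3:
        raise ValueError("need at least three values")
    return ordered[n // 3], ordered[(2 * n) // 3]


def bin_counts(values: List[float], limits: Tuple[float, float]) -> List[int]:
    """Count how many values fall into each of the three bins defined by `limits`."""
    low, high = limits
    counts = [0, 0, 0]
    for v in values:
        if v < low:
            counts[0] += 1
        elif v < high:
            counts[1] += 1
        else:
            counts[2] += 1
    return counts
```

The limits are computed once on the whole group of channels and then reused with `bin_counts` on the popular group, so that a skew of the popular channels toward one bin becomes visible.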
Table of contents:
1.2 Live Streaming Services Challenges
1.3 Summary of Contributions
1.4 Thesis Organization
1.5 List of Publications
2 State of the Art of Live Streaming Systems
2.3 Live Video Streaming Traffic Studies
2.4 User-Generated Content
2.5 Multimedia Delivery Architectures
2.5.1 Video Delivery Models
2.5.2 Composing Hybrid Delivery Models
2.6 Video Transcoding
2.7 Adaptive Bit Rate Streaming
3 Live Streaming Sessions Data Set
3.2 Live Streaming Providers
3.2.2 YouTube Live
3.3 Data Retrieval
3.4 Filters Used to Clean Up Traces
3.5 Status of Live Streaming Services
3.5.1 How Big are the Systems?
3.5.2 Are they 24/7 Services?
3.5.3 Zipf’s Law in UGC Live Streaming
3.6 Identifying Popular Broadcasters Sessions
3.6.1 Broadcasters Characteristics
3.6.2 Video Quality and Popularity
4 Mapping Sessions to Servers
4.3 Mapping live video sessions on broadcasting servers
4.3.1 Popularity predictability discussion
4.3.2 Number of servers versus bandwidth usage trade-off
4.3.3 Taking video sessions popularity into account
5 Mixing Data Center and CDN for Delivery
5.2 Model for Hybrid Delivery
5.3 Theoretical Optimization Problem
5.4 Motivations for Hybrid Delivery
6 Transcoding for Adaptive Streaming
6.2 DASH Sessions Data Set
6.3 Which Channels to Transcode
6.3.1 Trade-off and Problem Definition
6.3.2 An On-the-Fly Strategy
6.3.3 An At-Startup Strategy
6.4.3 Playing with Strategies Parameters
7.1.1 Live Sessions Data Set
7.1.2 Cloud Delivery
7.1.3 Hybrid Delivery
7.1.4 DASH on Live Streaming
7.1.5 Data Set Applications
7.1.6 Additional Contributions
7.2.1 Model Extension
7.2.2 Statistical and Learning Mechanisms
7.2.3 Middleware Integration
A Algorithms in Pseudo-code
B Résumé Étendu en Français
B.2 Service de diffusion directe de vidéo en ligne
B.3.1 L’ensemble de données des sessions en direct
B.3.2 Livraison par le nuage
B.3.3 Livraison hybride
B.3.4 Direct avec DASH
B.3.5 Applications de l’ensemble de données
B.3.6 Contributions additionnel
C Live Sessions Data Set Applications
C.2 CDN Fairness on Live Delivery
C.2.3 Maximizing the CDN revenue
C.3 Transcoding Live Adaptive Video Streams in the Cloud
C.3.2 Current Industrial Strategies
C.3.3 Transcoding CPU and PSNR Data Set
C.3.4 Optimizing Stream Preparation
C.3.5 A Heuristic Algorithm
C.4 Appendix Conclusion