Chasing Linux Jitter Sources for Uncompressed Video

As discussed in chapter 1, the development of network-based transport for media streams, such as SMPTE 2022-6 and SMPTE 2110, is an enabler for software-based media processing on commodity servers, because those have Network Interface Cards (NICs) and can therefore receive, process, and transmit network packets. However, those streams have high data rates (around 1.5 Gbit/s for SMPTE 2022-6) and high packet rates (around 135000 packets per second for SMPTE 2022-6), resulting in Packet Inter-arrival Times (PIT) in the order of 7.41 µs. Moreover, chapter 1 detailed the reasons why those streams undergo Constant Rate (CR) packet transmission and why they are sensitive to packet jitter. As a consequence, it is critical to understand, and to be able to quantify, the jitter introduced by software-based packet processing of CR packet streams when it is performed on commodity servers running general-purpose Operating Systems (OS). An understanding of this jitter informs on the suitability of commodity servers for software-based media processing, and on the buffering capacity required at a SMPTE 2022-6 or 2110 receiver consuming a software-processed media stream.
In this chapter, the term Video Processing Function (VPF) generically designates a piece of software receiving a packet-based media stream.
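The orders of magnitude quoted above can be checked with a few lines of arithmetic. The following Python sketch uses only the approximate figures given in this chapter (1.5 Gbit/s and 135000 packets per second); the function and constant names are illustrative and are not taken from the thesis.

```python
# Back-of-the-envelope check of the SMPTE 2022-6 figures quoted in the text.
# Both inputs are the approximate values from this chapter, not exact values
# taken from the standard.
DATA_RATE_BIT_PER_S = 1.5e9   # ~1.5 Gbit/s
PACKET_RATE_PER_S = 135_000   # ~135000 packets per second


def packet_inter_arrival_time_us(packet_rate_per_s):
    """Mean packet inter-arrival time (PIT) of a constant-rate stream, in µs."""
    return 1e6 / packet_rate_per_s


def mean_packet_size_bytes(data_rate_bit_per_s, packet_rate_per_s):
    """Average packet size implied by the two rates, in bytes."""
    return data_rate_bit_per_s / packet_rate_per_s / 8


if __name__ == "__main__":
    pit = packet_inter_arrival_time_us(PACKET_RATE_PER_S)
    size = mean_packet_size_bytes(DATA_RATE_BIT_PER_S, PACKET_RATE_PER_S)
    print(f"PIT  = {pit:.2f} us")
    print(f"size = {size:.0f} bytes per packet")
```

With these inputs the PIT evaluates to about 7.41 µs, and the implied average packet size is about 1389 bytes.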

Related Work

Understanding jitter on general-purpose OSes has been studied especially for real-time or High-Performance Computing (HPC) applications. OS jitter quantifies how unpredictable the performance of a running application will be. An experimental analysis of its effects on CPU-bound tasks in a distributed HPC environment is given in [61], which shows that jitter affects the overall performance of multi-stage workloads where each stage runs on parallel nodes. Specifically, jitter significantly impacts the synchronisation steps between stages, incurring a significant waste of computing capacity. In-kernel methods to accurately quantify the contribution of each jitter source to the overall system jitter are developed and evaluated in [62, 63].
For hard real-time applications, a deterministic lower bound on the performance is required. A recurring problem is determining the variability of the response time, i.e., the total time elapsed from when an interrupt request is raised until the corresponding application-level thread is scheduled. From this perspective, [64] compares Real-Time Operating Systems (RTOS) and general-purpose OSes, in the context of embedded systems used in experimental nuclear physics.
Aside from the analysis in [65] of periodic networked systems with events in the order of 100 µs on a FreeBSD-based Commercial Off-The-Shelf (COTS) server, little attention has been given to characterising jitter on periodic events.
Yet, with SMPTE 2022-6 receivers expecting a CR stream, giving rise to packet arrivals with a periodicity in the order of 7.41 µs, a granular understanding of the jitter properties of a COTS server is required if a VPF is to be successfully executed on it.

Statement of Purpose

This chapter characterises the jitter, introduced by a COTS x86 server running a Linux-based operating system, upon reception of network packets corresponding to a SMPTE 2022-6 video stream. This includes an analysis of the packet reception path in the Linux kernel, an enumeration of identified jitter sources, and an experimental quantification of the relative contribution of each of these.

Chapter Outline

The remainder of this chapter is organised as follows: section 2.1 describes the data-path taken by a packet, from wire to application, enumerating the potential sources of jitter that can be encountered. Section 2.2 motivates and introduces the experimental setup used to quantify these sources of jitter, which is then used for producing the results presented in section 2.3. This chapter is concluded in section 2.4.

From wire to application

The jitter sources along the path of a packet through a COTS server, depicted in figure 2.1, from its arrival at the Network Interface Card (NIC) until it is delivered to an application, are analysed in this section.

From Wire to Interrupt

When a packet arrives at the NIC, it is decoded and copied into RAM using Direct Memory Access (DMA). DMA allows external devices, such as NICs, controlled access to a portion of the CPU’s RAM.
DMA uses the system PCI bus, which is a shared resource with potential contention for access – and hence, is a potential source of jitter. Another source of jitter is access to the RAM itself, since the NIC (hardware), and the NIC driver (part of the operating system) will be competing for access hereto. Finally, multiple layers of cache, which are shared with all the processes of a CPU, can introduce further jitter during the phases of copying data to and from RAM.
Once a packet has been copied into RAM, the NIC raises an interrupt to signal to the CPU that a new packet is available. Interrupts are also raised through the system PCI bus, where contention may again introduce jitter. However, some NICs also implement Interrupt Rate Throttling (ITR), which delays or suppresses some interrupts, so as to avoid interrupt overload at high data rates. While this feature reduces the per-packet processing cost for the OS, it constitutes an additional source of jitter, especially among packets received in a periodic stream. As a simple example, if ITR suppresses 9 out of 10 interrupts, then packet number 1 in a stream must wait for the receipt of another 9 packets before an interrupt is raised and it can be processed, whereas receipt of packet number 10 causes an interrupt to be raised immediately.
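The ITR example can be made concrete with a small model. The Python sketch below rests on a deliberately simplified assumption that does not describe any particular NIC: packets arrive exactly every 7.41 µs, and an interrupt is raised only on every tenth packet. The function name and parameters are hypothetical.

```python
def itr_delays_ns(n_packets, period_ns=7_410, batch=10):
    """Delay between each packet's arrival and the interrupt that reports it.

    Simplified model (an assumption, not the behaviour of a real NIC):
    packets arrive exactly every `period_ns`, and an interrupt is raised
    only when every `batch`-th packet arrives.
    """
    delays = []
    for number in range(1, n_packets + 1):        # packets numbered from 1
        # Packets still to arrive before the next interrupt is raised.
        remaining = (batch - number % batch) % batch
        delays.append(remaining * period_ns)
    return delays


if __name__ == "__main__":
    for number, delay in enumerate(itr_delays_ns(10), start=1):
        print(f"packet {number:2d}: waits {delay / 1000:.2f} us for its interrupt")
```

In this model, packet number 1 waits nine packet periods (about 66.7 µs) while packet number 10 waits none, so the throttling alone spreads the interrupt-level delay over a range roughly nine times the packet inter-arrival time.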

From Interrupt to Application

A raised interrupt triggers a call to the kernel Interrupt Service Routine (ISR). The time from when an interrupt is raised until the beginning of the execution of the ISR can vary, e.g., due to other higher-priority or non-masked interrupts, or the need to awaken the core executing the ISR from suspension. Thus, this constitutes a potential jitter source.
Execution of the ISR is the first event that can be timestamped in software by the operating system. In Linux, specifically, this is the irq_entry event. The ISR then calls the New API (NAPI) component, which attempts to reduce the load induced by network activity on the CPU during high-load scenarios by processing packets in bursts. Thus, this also constitutes a potential jitter source.
NAPI calls the NIC driver, which fetches the packets from RAM (where they had been placed using DMA by the NIC) – and hands these off to the set of kernel components constituting the network stack, see figure 2.1, for further processing (decoding received packet headers, extracting metadata corresponding to the different network layers, etc). This processing is subject to optimisations such as memory prefetching, cache hits, etc., and therefore also constitutes a potential jitter source.
The final processing step in the networking stack is to identify whether a given packet matches an open socket, i.e., whether there is an application able to receive the payload of the packet. If there is, and if the application process is sleeping or is waiting for data from this socket, it is awoken by the kernel, which requires (i) a call to the scheduler and (ii) a context switch. These two operations also constitute a potential jitter source.

Network-Independent Jitter

In addition to the jitter sources within the data-path itself, other sources of jitter, henceforth called network-independent jitter, exist. Essentially, these consist of events that temporarily interrupt packet processing anywhere on the path discussed in section 2.1.2 and illustrated in figure 2.1.
First, the Linux kernel’s scheduler can preempt running processes. Suspending a running process from execution will cause jitter, as the process will not be able to perform any action during the time it is not scheduled.
Second, hardware interrupts take precedence over any other kernel-space or userspace task. Thus, a non-masked interrupt being raised will trigger the kernel ISR, interrupting any other execution on the CPU core charged with handling that interrupt. This can introduce jitter in any part of the stack – noting that a high-priority interrupt being raised can delay the execution of the ISR corresponding to a packet arrival.
Completely transparent to the operating system, System Management Interrupts (SMI) literally steal control of a CPU from the OS, for doing low-level house-keeping tasks. With no direct proof of their execution provided to the operating system kernel, SMIs are both a potential jitter source and are very hard to detect. One possible way to detect SMIs is to run an infinite loop polling the current time and to detect gaps in those measurements.
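The detection approach mentioned above can be sketched as follows, in Python. This is a minimal illustration under stated assumptions: the loop is finite rather than infinite, the 50 µs threshold is arbitrary, and the function names are hypothetical. A gap found this way only shows that the loop was not running; it may equally be caused by preemption, an ordinary interrupt, or the Python interpreter itself, so it is an indication of an SMI rather than a proof.

```python
import time


def find_gaps(timestamps_ns, threshold_ns):
    """Return (index, gap_ns) for consecutive samples further apart than the threshold."""
    return [
        (index, later - earlier)
        for index, (earlier, later) in enumerate(zip(timestamps_ns, timestamps_ns[1:]))
        if later - earlier > threshold_ns
    ]


def poll_clock(n_samples):
    """Sample a monotonic clock back-to-back, as fast as the interpreter allows."""
    now = time.perf_counter_ns
    return [now() for _ in range(n_samples)]


if __name__ == "__main__":
    samples = poll_clock(1_000_000)
    # 50 µs is an arbitrary threshold chosen for illustration only.
    for index, gap in find_gaps(samples, threshold_ns=50_000):
        print(f"gap of {gap} ns after sample {index}")
```

Running the polling loop on an isolated core reduces the number of gaps that have other explanations, which makes the remaining ones more likely to be SMIs.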

Experimental Setup

To quantify the contribution of the potential jitter sources, identified in section 2.1, to the overall jitter of a VPF receiving a SMPTE 2022-6 stream, each is studied in an isolated environment.

A Packet Sink VPF

In order to eliminate any application impact (such as memory bandwidth consumption, CPU cache pollution, etc.), a “packet sink VPF” with minimal application behaviour is used: on receipt of a packet, the application generates a timestamp, drops the packet without inspecting the payload, and computes the sequence of packet inter-arrival times T. The resulting time series can then be analysed to quantify the jitter introduced by the server.
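The post-processing step of the packet sink, turning receive timestamps into the series T and summarising it, can be sketched in Python as follows. The sketch assumes that the timestamps are integer nanoseconds from a monotonic clock; the function names and the choice of summary statistics are illustrative and are not taken from the thesis.

```python
import statistics


def inter_arrival_times(timestamps_ns):
    """Series T: difference between each timestamp and the previous one."""
    return [later - earlier for earlier, later in zip(timestamps_ns, timestamps_ns[1:])]


def jitter_summary(timestamps_ns):
    """Basic statistics of T; assumes at least two timestamps."""
    t = inter_arrival_times(timestamps_ns)
    return {
        "mean_ns": statistics.fmean(t),
        "stdev_ns": statistics.pstdev(t),   # population standard deviation
        "min_ns": min(t),
        "max_ns": max(t),
    }


if __name__ == "__main__":
    # Synthetic timestamps: a 7.41 µs period with one packet arriving 1 µs late.
    stamps = [0, 7_410, 14_820, 23_230, 29_640]
    print(inter_arrival_times(stamps))
    print(jitter_summary(stamps))
```

A late packet shows up in T as one long interval followed by one short interval, which is why the standard deviation of T, rather than its mean, is the quantity of interest here.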
To differentiate between hardware-level and kernel-level jitter, another source of timestamps is needed at the ingress of the kernel. For this purpose, the Linux kernel event tracing subsystem is used to record events at key steps of the packet data path. In particular, this allows recording timestamps for ISRs triggered by interrupts raised from the NIC, thus providing a second time series related to packet arrivals.

Quantitative Scope

The test setup is illustrated in figure 2.2, where the stream at the ingress of the server is assumed to be CR. Understanding to what extent this assumption holds is necessary to be able to interpret the recorded time series meaningfully.
Thus, the COTS server in figure 2.2 was substituted by a SMPTE 2022-6 hardware network analyser. The measurement results are depicted in figure 2.3 and show a standard deviation of 0.6 µs in the distribution of the packet inter-arrival times. The stream received in the experimental setup is, therefore, CR with a precision of 1 µs, and any sub-microsecond packet delay variation observed cannot, therefore, be attributed to the server hardware or software in this setup.

Hardware setup

The hardware setup is as follows, with reference to figure 2.2:
(a) A commercial SDI to SMPTE 2022-6 converter configured to output a SMPTE 2022-6 stream encapsulating a 1080i 29.97 frames per second video test pattern is used as CR Generator.
(b) A server with two Intel(R) Xeon(R) E5-2690 v4 CPUs is used as the COTS server.
(c) An Intel(R) XL710 with a 40 Gbit/s optical interface is used as the Ingress NIC.
(d) The interconnection between the packet generator and the COTS server is implemented by a Cisco Nexus 9000 fully non-blocking switch.

Experiments and Results

This section experimentally quantifies the contribution of each source of jitter, as enumerated in section 2.1. In the setup of section 2.2, with all known sources of jitter eliminated, a baseline is established. These sources [...]

Baseline: Minimal Jitter

To eliminate external jitter sources, some of the available CPU cores are isolated from the scheduler, and assigned statically and exclusively to executing the (user-space) VPF, handling NIC interrupts, and handling other (non-NIC) interrupts.
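The text does not state which mechanism was used to isolate and assign the cores. As an assumption for illustration only, the Python sketch below uses two standard Linux facilities: the process CPU affinity set through os.sched_setaffinity, and the hexadecimal CPU bitmask accepted by /proc/irq/<N>/smp_affinity for steering an interrupt to given cores. Isolating cores from the scheduler is commonly done with the isolcpus kernel boot parameter, which is not shown. The helper names are hypothetical.

```python
import os


def cpu_mask_hex(cpus):
    """Hexadecimal CPU bitmask with bit i set for each CPU number i in `cpus`.

    Valid as written for CPU numbers below 32; larger systems use
    comma-separated 32-bit groups, which this sketch does not produce.
    """
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


def pin_current_process(cpus):
    """Restrict the calling process to `cpus` (Linux only) and return the result."""
    os.sched_setaffinity(0, set(cpus))
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    # Hypothetical core assignment: core 2 for the VPF, core 3 for NIC interrupts.
    print("VPF now runs on CPUs:", pin_current_process([2]))
    print("mask to write to smp_affinity for core 3:", cpu_mask_hex([3]))
```

Writing the mask to smp_affinity requires root privileges, and the core numbers above are placeholders that must match cores actually present on the machine.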

Baseline: Experiments and Results

To differentiate between network-independent jitter, which was defined in section 2.1.3, and the jitter introduced by the processing of incoming packets, a dummy program is implemented; it consists of a loop, busy waiting for 7.41 µs and generating a timestamp at each iteration. This dummy program therefore has no interaction with the network stack and provides measurements of the network-independent jitter of the system. In that setup, the sequence of timestamps should increase by 7.41 µs at every iteration, unless the dummy program is somehow interrupted. In that case, the time series T corresponding to the difference between a sampled timestamp and the next one in the loop would show spikes of the same order of magnitude as the network-independent jitter.
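A loop of this kind can be sketched in Python as follows. This is not the program used for the measurements: the interpreter adds overhead and jitter of its own, so the sketch only illustrates the structure of the loop, and a compiled implementation would be needed for meaningful numbers. The function names are hypothetical.

```python
import time


def busy_wait_loop(iterations, period_ns=7_410):
    """Busy-wait `period_ns` per iteration and record a timestamp each time."""
    now = time.perf_counter_ns
    stamps = [now()]
    for _ in range(iterations):
        deadline = stamps[-1] + period_ns
        current = now()
        while current < deadline:      # spin: never sleep, never yield
            current = now()
        stamps.append(current)
    return stamps


def deltas(stamps):
    """Series T: difference between each timestamp and the next one."""
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    series = deltas(busy_wait_loop(100_000))
    print("min:", min(series), "ns  max:", max(series), "ns")
```

Because each deadline is computed from the previous timestamp, every value of T is at least the period, and any interruption of the loop appears as a single value of T well above it.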
Figure 2.5 shows the results obtained with the dummy program running for one million iterations, corresponding to 7.41 s in the baseline configuration. Four spikes in the order of 15 µs can be observed, which gives an idea of the minimum jitter that can be observed on such a system, independently of the network stack.
In that same setup, figure 2.6b depicts the time series of packet inter-arrival times as measured by the VPF, while figure 2.6a shows the time series of the duration between two consecutive ISRs, this data being obtained with the kernel tracing subsystem. Comparing both figures with figure 2.5 suggests that the jitter seen by the VPF is a mixture of (i) network-independent jitter, as shown by the similarity of the spikes in figure 2.5 and figure 2.6b, and (ii) network jitter, as shown by the similarity of the 1 µs-wide noise around the 7.41 µs average in figure 2.6a and 2.6b.
Figure 2.7a and figure 2.7b give finer-grained information about the jitter introduced by the network stack itself, i.e., from the ISR to the VPF. For example, there is a small but noticeable increase in the standard deviation between the distribution of T at the ISR level and the distribution at the VPF level, which corresponds to the jitter introduced by the network stack. Moreover, the clustering and discrete patterns observed in figure 2.7b can plausibly be explained by associating each cluster with a succession of events that happened during the packet processing. In other words, each cluster could correspond to a possible code path.

Table of contents :

1 Introduction 
1.1 Professional Broadcasting
1.1.1 Media Distribution: From Internet Protocol Television (IPTV) to Over-The-Top (OTT)
1.1.2 Media Production: The Serial Digital Interface (SDI)
1.2 Media Production on Commodity Hardware
1.2.1 Packetising SDI
1.2.2 Software-based Media Processing on Commodity Servers: Challenges and Limitations of SMPTE 2022-6
1.3 Thesis Contributions
1.4 Publications and Software Production
2 Chasing Linux Jitter Sources for Uncompressed Video
2.1 From wire to application
2.1.1 From Wire to Interrupt
2.1.2 From Interrupt to Application
2.1.3 Network-Independent Jitter
2.2 Experimental Setup
2.2.1 A Packet Sink VPF
2.2.2 Quantitative Scope
2.2.3 Hardware setup
2.3 Experiments and Results
2.3.1 Baseline: Minimal Jitter
2.3.2 Baseline: Experiments and Results
2.3.3 ISR Start Of Execution
2.3.4 Linux Scheduler induced jitter
2.3.5 Interrupt Throttling jitter
2.4 Conclusion
3 OP4T: Bringing Advanced Network Packet Timestamping into the Field 
3.1 Related Work and Limitations
3.1.1 Performance
3.1.2 Programmability
3.2 Hardware Architecture
3.2.1 Packet Flow
3.2.2 Timestamp Acquisition
3.2.3 Reconfigurable Packet Processor
3.3 Implementation
3.3.1 Overview
3.3.2 DMA Core integration
3.3.3 P4 Packet Processor and Partial Reconfiguration
3.3.4 Discussion
3.4 Case Study: OP4T for Software Switch Testing
3.4.1 Scenario
3.4.2 OP4T-SST Packet Processor
3.4.3 Precision and Cross-connect
3.5 Evaluation
3.5.1 FPGA Resource utilisation
3.5.2 Experimental Setup
3.5.3 Results
3.6 Conclusion
4 High-Accuracy Packet Pacing on Commodity Servers for Constant-Rate Flows 
4.1 System Model
4.1.1 Time sequences
4.1.2 (b; f)-paced streams
4.2 Limitations of a pure software approach
4.2.1 Software Execution Model
4.2.2 Timers
4.2.3 Timer limitations: drift
4.2.4 Latency
4.2.5 Quantitative analysis of the impact of SMIs
4.3 Pacing with a Pacing-Assistant
4.3.1 Assisted Pacing
4.3.2 PA-based free-running pacing
4.3.3 PA-based frequency-controlled pacing
4.4 Analysis
4.4.1 Safety
4.4.2 Free-running pacer period
4.4.3 Frequency-controlled pacer period
4.4.4 ALT-Jitter
4.5 Constructing a PA and a frequency-controller
4.5.1 Constructing a Pacing-Assistant
4.5.2 Constructing a frequency controller: basic version Fb
4.5.3 Constructing F: NW-regularized version, Fr
4.5.4 Implementation considerations of algorithms 1 and 2
4.6 Experimental Evaluation
4.6.1 Setup and methodology
4.6.2 Results
4.6.3 Experimental qualification of F
4.6.4 Operational perspective
4.7 Discussion
4.7.1 Practical impact of jitter reduction
4.7.2 Quantitative impact of drift compensation
4.8 Conclusion
5 vMI: Software Architecture for Transparent High-Performance Media Transport 
5.1 Motivation
5.1.1 SDI-based media production: analysis
5.1.2 Processing high-throughput packet streams for media production
5.2 Overview of the vMI framework
5.2.1 Main Concepts
5.2.2 The Flow of a vMI frame
5.2.3 Disaggregated media-processing
5.3 High-performance vMI frame transport
5.3.1 Interprocess vMI frame sharing
5.3.2 Kernel-bypass networking
5.4 Implementation
5.5 Evaluation
5.5.1 Experimental methodology
5.5.2 Microbenchmarks
5.5.3 Full media-processing pipeline
5.6 Conclusion
6 Conclusion 
A Mathematical Proofs for Chapter 4
B Résumé en français
List of Figures
List of Tables
List of Algorithms

