State of the Art and Challenges


In this chapter, we survey the related work in IoT, explain why we believe IoT networks can be deterministic, and expose the open issues and challenges.
The IoT Standards

A Diversity of Applications

As IoT is a vast term, many protocols can be associated with it. In this section, I give a brief overview of the different protocols, classified by the categories of applications presented in Chapter 1, namely Home Automation, Smart Metering, Environmental Monitoring, and IIoT. This categorization might not be accurate for every application and is only used here to give the reader a first understanding of the variety of scenarios IoT can cover.
Home Automation: In Home Automation, a small number of devices are deployed within a small range (e.g. < 15 m) to provide monitoring (temperature, humidity, noise…) and actuation (opening the windows, turning off the lights…). We only consider devices communicating wirelessly, although Home Automation also includes devices communicating over wires. The most broadly adopted wireless standards in Home Automation are IEEE802.11 (WiFi) and IEEE802.15.1 (Bluetooth). WiFi-based networks consume a lot of energy (i.e. > 10 mA on average) and need to be powered by wire (or recharged frequently, as is done with WiFi-enabled phones). Bluetooth-based networks consume less energy, especially since the appearance of Bluetooth Low Energy (BLE), and target applications that require a small number of sensing points. Bluetooth devices can be battery-powered and can last a few years when using a low data rate. In its latest versions, Bluetooth can form mesh topologies, but only with a limited number of devices. In this mode, the lifetime of the devices that relay information does not go beyond a few weeks. Then come 802.15.4-based networks: technologies such as Thread or Zigbee offer years of operation with battery-powered devices. Those technologies can be densely deployed and cover larger areas than Bluetooth using a single gateway, as they are meant to form a mesh topology. The last one I would like to mention is Z-Wave. Unlike the previously presented technologies, which use the 2.4 GHz ISM frequency band, Z-Wave uses sub-GHz frequencies and can thus form links with a longer range. Z-Wave can also form self-organizing mesh topologies to increase range and reliability. Again, this list is not exhaustive and many other technologies exist. I am only presenting the most widely adopted ones to illustrate the diversity of approaches.
Smart Metering: As said in Chapter 1, Smart Metering applications do not require low latency or a high data rate. Usually, only a few bytes are generated periodically (e.g. every hour) per meter. Technologies include Ultra Narrowband-based solutions like Sigfox and Spread Spectrum-based solutions like LoRa. The Smart Utility Networks (SUN) task group is also working on a PHY amendment to address such applications. Note that Smart Metering can also be done using Power Line Carrier (PLC), but this is out of scope.
Environmental Monitoring: Those applications are very similar to Smart Metering, as they do not produce a lot of data and rarely need to be densely deployed. As they are located in remote areas, they do not have a strong requirement to cope with external interference or multipath. Remote locations are, however, more likely to be too far from any cellular connectivity or power supply, and thus require low energy consumption. The devices also need to be extremely robust and reliable, as they must work for years without human intervention in harsh conditions (temperature, humidity, rain, wildlife, …).
Industrial IoT: Unlike the previously presented applications, IIoT requires the combination of high reliability and years of battery lifetime, and must work even when densely deployed in radio environments where multipath and external interference are present. This is the type of application we focus on in this work. To the best of my knowledge, the only solutions that can provide such guarantees today are TSCH-based technologies. We define TSCH, and explain why we think it is the best suited solution available, in Section 2.2.

Standardization and Interoperability

Many Standards Developing Organizations (SDOs) exist in the IoT world. The goal of these organizations is to design technical standards and protocols so that multiple technologies can talk the same language and thus interoperate. The usual process is that private companies innovate and implement their own technology. As pioneers, they lead this new technology for a few years and obtain recognition and market share for it. Then other companies implement their own version of the technology, and multiple similar solutions coexist for a time. Finally, those companies, together with other technology experts, merge efforts and create standards so that one technology gets adopted by all and the different implementations can interoperate.
Standards and protocols are usually grouped into abstraction layers, where each layer is responsible for a set of tasks (e.g. formatting, compressing, routing, signaling). In networking, the complete set of layers is called a Networking Stack (or just Stack). The most widely used stack models are the Open Systems Interconnection (OSI) model and the Internet Protocol Suite, commonly known as the TCP/IP model. Each layer knows the tasks it has to do and what to expect from the layers directly above and below it. This layering makes the protocols interchangeable. The main SDOs in IIoT are the Institute of Electrical and Electronics Engineers (IEEE), responsible for the lowest layers (close to the hardware), the Internet Engineering Task Force (IETF), responsible for routing and networking, the European Telecommunications Standards Institute (ETSI), famous for Machine-to-Machine (M2M) standards, and the International Society of Automation (ISA), oriented towards control system regulations.
A wide range of standards and protocols exist in the IoT world, but until recent years, no complete networking stack existed to answer the Industrial IoT requirements. Such a stack needs to be IPv6-ready to communicate with other devices on the Internet, Low Power to operate for years on battery, and Highly Reliable to offer Quality of Service (QoS) guarantees.
One of the main ideas behind the Internet of Things is that devices need to be addressable in order to interact with the rest of the Internet. Being able to communicate with other devices on the Internet opens the way to a wide range of applications. As an example, the Internet removes the limits of geographical distance: two machines can exchange information across continents to optimize a delivery process. The Internet Protocol (IP) is the main addressing protocol on the Internet. Its first version (IPv4) was released in 1981 and we are now moving towards the latest version, IPv6. In 2007, an IETF working group called “IPv6 over Low-Power Wireless Personal Area Networks” (6LoWPAN) was created to bring IP capabilities to low-power and constrained devices [4]. This resulted in the creation of the 6LoWPAN adaptation layer, a set of protocols and methods to enable efficient transport of IPv6 packets over IEEE802.15.4 frames. At that point, the IPv6-ready goal was achieved, but reliability and low power consumption were still lacking.
Right after, the IEEE task group “4e” (TG4e) was created, chartered with defining an amendment to IEEE802.15.4 for the MAC layer to better support the industrial markets. Among other mechanisms, the TG4e chose to incorporate time slotting and channel hopping techniques, embodied in a MAC layer mode called Time Slotted Channel Hopping (TSCH). In 2016, those changes were merged into the standard’s new version, IEEE802.15.4-2015. This was, however, not enough to yield a fully standardized IIoT stack, as there was still no standard for the networking layer to access and reserve MAC resources. A component was missing to bridge the gap between IEEE802.15.4 TSCH and the networking layer. The IETF 6TiSCH Working Group (WG) was created in 2013 to bridge that gap and propose a fully standardized stack for the Industrial IoT. At the time of writing, two Requests for Comments (RFCs) have been published by the working group [5], [6], and the 6TiSCH stack architecture is about to be published [7].
In the next sections we explain the concepts and mechanisms of TSCH and detail all the layers of the 6TiSCH stack, the Industrial IoT networking stack.

Time Slotted Channel Hopping

Time Slotted Channel Hopping (TSCH) is a channel access method for shared medium networks. TSCH is designed for applications that require reliability and ultra long battery life.
History & Description

In 2006, a startup company called Dust Networks introduced TSMP (Time Synchronized Mesh Protocol) [8], a protocol for self-organizing Wireless Sensor Networks (WSN). In a TSMP network, the devices (called motes, as in small particles of dust) are synchronized to each other and communicate according to a time schedule. Time is divided into slots (timeslots), similar to other Time Division Multiplexing (TDM) systems. The devices know when to sleep, transmit or receive, and stay asleep most of the time, allowing extremely low energy consumption. On top of the time scheduling, TSMP devices use Channel Hopping, a technique in which transmissions are distributed over time and radio channels. Hopping through channels increases communication reliability in noisy environments.
In 2008, the base concepts of Dust Networks’ TSMP were included in two low-power wireless industrial standards, WirelessHART (2008) and ISA100.11a (2009), under the name Time Synchronized Channel Hopping. These standards have been very successfully rolled out in the industrial market (industrial process monitoring, factory automation). WirelessHART is an interoperable wireless standard designed to provide a reliable, cost-effective, high-quality system for industrial wireless sensing applications [9]. WirelessHART was widely adopted as it is backward-compatible with the legacy (wired) HART.
In the meantime, the IEEE 802.15 Task Group 4e was chartered to define a MAC amendment to the existing standard 802.15.4-2006 to “better support the industrial markets”. The task group chose to reuse the TSMP concepts and named the result Time Slotted Channel Hopping (TSCH). The amendment was published in April 2012 and was incorporated into the IEEE802.15.4-2015 standard [10].
The 6TiSCH working group [11] is now standardizing the use of IPv6 on top of IEEE802.15.4-2015 TSCH technology.

A Slotted Structure

In TSCH, time is divided into time slots (also called slots) that typically last around 10 ms. Time slots are grouped into a slotframe that repeats over time (as depicted in Fig. 2.1). The way time slots are organized into slotframes is called a schedule. Each time slot is associated with an action (sleep, transmit, or receive) so that each node knows in advance what to do next. If the action is sleep, the node turns its radio off and waits for the duration of a time slot. When sleeping, the node only consumes a few µA at 3.6 V. If the action is transmit, the node turns its radio on and transmits a frame. If the frame needs to be acknowledged, the node then listens for an acknowledgement frame in the same slot. If the action is receive, the node turns its radio on and listens for a frame. If the received frame needs to be acknowledged, the node transmits an acknowledgement. A schedule example is depicted in Fig. 2.2. In the first time slot, node D transmits to node B and node C transmits to node A. In the second time slot (about 10 ms later), node B transmits to node A. We will see in Section 2.3 how the schedule is managed.
As depicted in Fig. 2.4, slots are long enough (e.g. 10 ms) for both the transmission of a data frame and the transmission of an acknowledgement (ACK) frame. Unlike with wired transmissions, there is no way to know whether a collision occurred during a transmission when using wireless communication. The only way to know whether a transmission failed or succeeded is to ask the receiver for confirmation. This is done using an ACK frame. When the sender receives an ACK, it knows the receiver got the message. Of course, there is no way for the receiver to know whether the sender correctly received the ACK. Rather than sending a second ACK, nodes usually assume that if the first message went through, the probability of success of the ACK is high (as the delay between the two messages is short and the frequency does not change).
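The schedule example described above (D to B and C to A in the first slot, B to A in the second) can be written down as a small lookup table. The following is only an illustrative sketch: the slotframe length of 4 slots, the table layout and the function name `action_at` are assumptions made for this example, and channels are deliberately left out.

```python
from typing import Dict, Tuple

# Assumed value for illustration; the text does not give the slotframe length.
SLOTFRAME_LENGTH = 4

# schedule[node][slot_offset] = (action, peer); any missing slot means "sleep".
SCHEDULE: Dict[str, Dict[int, Tuple[str, str]]] = {
    "A": {0: ("receive", "C"), 1: ("receive", "B")},
    "B": {0: ("receive", "D"), 1: ("transmit", "A")},
    "C": {0: ("transmit", "A")},
    "D": {0: ("transmit", "B")},
}


def action_at(node: str, slot_count: int) -> Tuple[str, str]:
    """Return (action, peer) for `node` in the slot numbered `slot_count`.

    The slotframe repeats over time, so only the offset inside the
    slotframe (slot_count modulo the slotframe length) matters.
    """
    slot_offset = slot_count % SLOTFRAME_LENGTH
    return SCHEDULE[node].get(slot_offset, ("sleep", ""))


if __name__ == "__main__":
    for slot in range(6):
        print(slot, {n: action_at(n, slot)[0] for n in "ABCD"})
```

With this table, every node sleeps in the slots where it has nothing scheduled, which is where the low average consumption comes from.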
TSCH defines two types of cells:
• A Shared Cell is a cell in which multiple communications can occur. When using shared cells, nodes need to expect collisions and thus implement mechanisms to prevent them.
• A Dedicated Cell is a contention-free cell. The cell is dedicated to a pair of nodes, in one direction so it has a fixed source and a destination. When using a dedicated cell, a source node can assume that no other transmitter will use that same dedicated cell with the destination node.
The main source of the nodes’ energy consumption is the radio; there is a direct link between the number of cells that are scheduled and the energy consumed. Typically, nodes stay in sleep mode most of the time, allowing them to consume only a few tens of µA at 3.6 V on average and to achieve years of battery lifetime. Vilajosana et al. [12] propose a realistic model to estimate a node’s energy consumption based on the number and type of cells used in the schedule. We use those results in Chapter 6 to evaluate the relation between energy consumption and latency in different environments and network configurations.
On top of time division, TSCH also does frequency division. The available frequency band is divided into channels by the physical layer. Each channel is an available resource the MAC layer can use. Multiple channels can be used at the same time, meaning that in each time slot, all the available channels can be used in parallel. This results in the notion of a cell: a part of the schedule identified by a slot offset and a channel offset (as depicted in Fig. 2.3).
Not all channels are equally affected by external interference or multipath fading, so using multiple channels increases the diversity of the radio environment and reduces the probability of failure [2]. For instance, if the third channel is used continuously by another technology (e.g. WiFi) and all other channels are free, spreading transmissions evenly over all available channels gives a probability of failure of 1/number_channels. To make sure all channels are used and well distributed, TSCH uses (2.1):
\[ \mathit{channel} = (\mathit{ASN} + \mathit{channel\_offset}) \bmod \mathit{number\_channels} \qquad (2.1) \]
where channel_offset is the channel offset of the cell and number_channels is the total number of channels available. The Absolute Sequence Number (ASN) is a global time slot counter that gives all nodes the same sense of time. Because the ASN increases at every slot, two consecutive cells on the same channel offset do not use the same channel, and a given cell does not use the same channel in two consecutive slotframes (unless the slotframe length is a multiple of the number of channels; choosing a slotframe length that is mutually prime with the number of channels makes the cell visit every channel in turn). The channel changes at every time slot; this is called “channel hopping”. The resulting frequency diversity greatly increases the reliability and stability of the links [2], making TSCH a perfectly suited MAC layer technique for a reliable IoT.
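The effect of the slotframe length on channel hopping can be checked with a few lines of code. This is a minimal sketch that reads (2.1) as channel = (ASN + channel_offset) mod number_channels and maps the result directly onto the 2.4 GHz channel numbers 11 to 26. That direct mapping, the function names, and the slotframe lengths 101 and 16 are assumptions made for this example; an actual implementation may go through a hopping-sequence table instead.

```python
from typing import List

NUMBER_CHANNELS = 16  # 2.4 GHz IEEE802.15.4 channels
FIRST_CHANNEL = 11    # channels are numbered from 11 to 26


def hop_channel(asn: int, channel_offset: int,
                number_channels: int = NUMBER_CHANNELS) -> int:
    """Channel used by a cell with `channel_offset` in the slot numbered `asn`."""
    # Assumption: the computed index maps directly onto channel 11 + index.
    return FIRST_CHANNEL + (asn + channel_offset) % number_channels


def channels_of_cell(slot_offset: int, channel_offset: int,
                     slotframe_length: int, slotframes: int) -> List[int]:
    """Channels used by one cell over `slotframes` consecutive slotframes."""
    return [
        hop_channel(k * slotframe_length + slot_offset, channel_offset)
        for k in range(slotframes)
    ]


if __name__ == "__main__":
    # Slotframe length 101 is mutually prime with 16: every channel is visited.
    print(sorted(set(channels_of_cell(3, 2, 101, 16))))
    # Slotframe length 16 is a multiple of 16: the cell is stuck on one channel.
    print(set(channels_of_cell(3, 2, 16, 16)))
```

In this sketch, a cell in a 101-slot slotframe cycles through all 16 channels, whereas the same cell in a 16-slot slotframe always lands on the same channel and loses the benefit of frequency diversity.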

Time Synchronization

Because every communication in the network is scheduled, nodes must stay synchronized. Nodes locally keep track of time with an internal clock. This is typically done using a crystal oscillator (although on-chip ring oscillators were proven to work for millimeter-scale motes [13]). Those clocks are never perfectly accurate and drift in time with respect to one another. Nodes therefore need to periodically resynchronize.
Clock drift is usually measured in parts per million (ppm), that is, how many clock “ticks” are off in a million. A typical crystal drift in WSN devices is between 10 ppm and 50 ppm. A 10 ppm clock drift corresponds to a maximum drift of 10 µs per second, or ±0.864 s per day. Although this number might appear small, it makes a difference, as a slot duration is usually on the order of 10 ms. This is even more important as the clocks of two neighbor nodes might drift in opposite directions (one going 10 ppm slower, the other 10 ppm faster), making the nodes’ relative clock drift twice the absolute clock drift.
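The drift figures above follow from simple arithmetic, which the sketch below reproduces. The function names are chosen for this example only, and the 10 ms slot duration is the typical value quoted in the text.

```python
def accumulated_drift_s(drift_ppm: float, elapsed_s: float) -> float:
    """Worst-case clock error (seconds) after `elapsed_s` seconds at `drift_ppm`."""
    return drift_ppm * 1e-6 * elapsed_s


def worst_case_relative_ppm(drift_a_ppm: float, drift_b_ppm: float) -> float:
    """Relative drift when two clocks drift in opposite directions."""
    return drift_a_ppm + drift_b_ppm


def time_to_drift_s(offset_s: float, drift_ppm: float) -> float:
    """Time (seconds) needed to accumulate an error of `offset_s` seconds."""
    return offset_s / (drift_ppm * 1e-6)


if __name__ == "__main__":
    seconds_per_day = 24 * 3600
    print(accumulated_drift_s(10, seconds_per_day))   # about 0.864 s per day
    relative = worst_case_relative_ppm(10, 10)        # 20 ppm
    print(time_to_drift_s(0.010, relative))           # about 500 s for a full 10 ms slot
```

Under these assumptions, two unsynchronized neighbors drifting in opposite directions at 10 ppm each would be off by an entire 10 ms slot after roughly 500 s.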
What is the maximum desynchronization two nodes can tolerate, and how can nodes resynchronize? Fig. 2.4 depicts the different steps performed within a cell when a transmission occurs. The transmission starts exactly after a fixed delay (TxOffset). If the destination node (DST in Fig. 2.4) started receiving exactly at TxOffset, it might miss the beginning of the frame, as its own view of time might differ from that of the transmission source (SRC in Fig. 2.4). Thus, the destination starts listening GuardTime before TxOffset and keeps listening until it receives a frame, for a maximum listening duration of GuardTime. If the relative desynchronization is higher than GuardTime, the transmission fails. Another way of reducing the impact of clock drift is to compensate for it using an estimate of the drift. Quartz manufacturers provide a drift estimation depending on temperature. To take this into account, WSN hardware typically comes with a temperature sensor so that nodes can compensate for the clock drift in software. Adding a GuardTime and compensating for temperature help reduce the impact of clock drift, but only up to a point.
Nodes need to resynchronize periodically. In TSCH, resynchronization is done by exchanging frames. Each IEEE802.15.4 frame is timestamped: when a node receives a frame, it calculates the delta between the reception time and the frame timestamp, and can recalibrate its clock. How often a node’s clock needs to be recalibrated depends on the drift and GuardTime values. With a 30 ppm drift and a 1 ms GuardTime (typical values), two nodes need to resynchronize every 1 ms / 30 ppm ≈ 33 s. If no data frame is sent during that period, nodes exchange dedicated keep-alive frames for resynchronization. A transmission of 4 ms (typical duration) every 33 s leads to a duty cycle of 4 ms / 33 s ≈ 0.012%. The cost of resynchronization in TSCH is hence very low.
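The resynchronization period and its cost can be recomputed with explicit units, which helps avoid mixing milliseconds and seconds. This is a small sketch using the typical values quoted above (30 ppm, 1 ms GuardTime, 4 ms transmission); the function names are assumptions made for this example.

```python
def resync_period_s(guard_time_s: float, drift_ppm: float) -> float:
    """Longest time (seconds) two nodes can wait before drifting by GuardTime."""
    return guard_time_s / (drift_ppm * 1e-6)


def duty_cycle(on_time_s: float, period_s: float) -> float:
    """Fraction of time the radio is on (1.0 means always on)."""
    return on_time_s / period_s


if __name__ == "__main__":
    period = resync_period_s(guard_time_s=1e-3, drift_ppm=30)
    cost = duty_cycle(on_time_s=4e-3, period_s=period)
    print(f"resynchronize every {period:.1f} s")
    print(f"keep-alive duty cycle: {cost * 100:.3f} %")
```

With both durations expressed in seconds, the keep-alive traffic amounts to a duty cycle of about 1.2e-4, that is, roughly 0.012%.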
In the next section, we describe the diﬀerent protocols and mechanisms that form what we call an Industrial IoT Stack. We give an overview of the diﬀerent approaches taken by the research community and companies to result in a fully standardized stack for the Industrial IoT: the 6TiSCH stack.

Industrial IoT Stack

A networking stack is the assembly of a series of network protocols. It is called a stack because those protocols can be grouped in layers that are piled up from the highest layer (the application) to the lowest layer (the physical layer).
In this section, we present a subset of the different protocols that exist in the IoT, and describe in more detail the ones that were selected to form 6TiSCH, the networking stack for the IIoT.

The 6TiSCH Networking Stack

6TiSCH is the name of the Internet Engineering Task Force (IETF) working group in charge of designing a networking stack for the Industrial IoT.
The goal of the 6TiSCH working group is to provide a standardized networking stack for the IoT. The stack needs to be Low Power, Highly Reliable and Internet-Enabled. The 6TiSCH name comes from the combination of IPv6 and the TSCH mode of IEEE 802.15.4, two proven technologies selected because they provide interoperability and reliability, respectively.
The proposed stack is flexible, as there is room to fine-tune several protocols in the stack for a particular application. The goal is to propose a reference stack that Industrial IoT designers can adapt to their particular needs. As a proof of concept (PoC), an RFC was published to define a 6TiSCH minimal mode of operation that 6TiSCH-compliant devices should implement [6].
The 6TiSCH stack is similar to the Open Systems Interconnection (OSI) model or the TCP/IP layering model. The protocols are separated into abstraction layers, each layer only interacting with the layers directly above and below it. Thanks to this separation of work, a stack can easily be adapted to a particular use case, just by replacing the protocol of one layer with a better suited one. The 6TiSCH stack is depicted in Fig. 2.5 and is separated into five layers that we define in the following sections.
In the following subsections we describe, layer by layer, the main protocols and mechanisms that exist to build an IIoT stack and present which ones were selected to be part of the 6TiSCH stack and why.

Physical

Wireless nodes communicate by sending digital information as electromagnetic waves over the air. To transform digital information into an electromagnetic wave, the digital information must first be translated into an analog signal. This transformation is called modulation. Then the analog signal needs to be amplified and sent over the antenna. A typical low-power radio outputs 0 dBm (1 mW). As explained in Chapter 1, the radio signal power might fade due to obstacles, distance or external interference. As a result, a receiving node needs to amplify the received signal before demodulating it. To extract relevant information from the signal, the ratio between the signal carrying the data and the ambient radio noise must be kept high. This is quantified using the Signal-to-Noise Ratio (SNR).
The amplifier and modulator components draw a significant amount of energy, making the radio the most power-hungry part in most designs. Those components do not consume energy when off. The challenge is thus to limit their usage while providing reliable communication. An energy-efficient communication stack usually uses those components less than 1% of the time.
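The correspondence between 0 dBm and 1 mW follows from the definition of the dBm unit (power in decibels relative to 1 mW), and an SNR expressed in dB is simply the difference between the signal and noise powers in dBm. The sketch below illustrates both conversions; the received and noise power values in the usage part are made-up numbers, not measurements from this work.

```python
import math


def dbm_to_mw(power_dbm: float) -> float:
    """Convert a power in dBm to milliwatts."""
    return 10 ** (power_dbm / 10)


def mw_to_dbm(power_mw: float) -> float:
    """Convert a power in milliwatts to dBm."""
    return 10 * math.log10(power_mw)


def snr_db(signal_dbm: float, noise_dbm: float) -> float:
    """Signal-to-Noise Ratio in dB, from signal and noise powers in dBm."""
    return signal_dbm - noise_dbm


if __name__ == "__main__":
    print(dbm_to_mw(0))        # 1.0 mW, the typical low-power radio output
    # Hypothetical values, for illustration only.
    print(snr_db(signal_dbm=-85, noise_dbm=-95))
```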
The PHY layer is responsible for the activation and deactivation of the radio, preamble detection (identifying symbols that correspond to a known modulation), and energy detection (ED). The chosen standard in 6TiSCH is IEEE802.15.4-PHY as it is the most prominent standard in low-power radio technologies. IEEE802.15.4-PHY proposes multiple modulation schemes.
In 6TiSCH, the default modulation scheme is Offset Quadrature Phase-Shift Keying (OQPSK) [6] on the 2.4 GHz ISM band. In this modulation, the digital data is translated into an analog signal by modulating (changing) the signal phase. OQPSK is a variant of phase-shift keying modulation that uses four different values of the phase to transmit. This modulation technique is also used in Bluetooth and RFID. IEEE802.15.4 with OQPSK provides 16 independent channels (numbered from 11 to 26); that is, sending on channel 12 will not impact another communication (using the same technology) on a neighboring channel such as channel 11.
In IEEE802, the Data Link layer is divided into two sub-layers: Medium Access Control (MAC) and Logical Link Control (LLC). The MAC sub-layer is responsible for controlling how devices gain access to a physical medium (e.g. which channel to use). The LLC sub-layer is responsible for bridging the gap between the MAC sub-layer and the network layer, and also controls error checking and frame synchronization. 6TiSCH uses IEEE802.15.4-2015 with the TSCH mode at the MAC layer and the 6TiSCH Operation Sublayer (6top) for LLC. The 6top sub-layer includes the 6top Protocol (6P), which defines the commands and interactions between nodes to reserve resources (cells), and the Scheduling Function (SF), which is internal to each node and defines when to add or delete cells in the schedule. Work is being proposed at the IEEE (802.15.12 PAR) for an LLC that would logically include the 6top sublayer.
The Medium Access Control (MAC) sub-layer provides an interface between the physical layer (PHY) and the LLC layer. The diﬀerent MAC designs can be separated into two diﬀerent paradigms, contention-based and time-divided. In contention-based MAC, nodes need to make sure the radio medium is not used by other transmissions before transmitting. This is usually done using Carrier-Sense Multiple Access with Collision Avoidance (CSMA/CA), a method in which nodes send a short message named Request to Send (RTS) before sending, to make sure no other node is transmitting. If the receiver has no ongoing transmission, it replies with another short message named Clear to Send (CTS). Then the actual data transmission can start. This technique is used in broadly adopted technologies such as WiFi. Contention-based MAC protocols are appealing as they allow multiple devices to share the same radio channel without being pre-coordinated.
One of the first adaptations of existing contention-based MAC protocols for Wireless Sensor Networks is S-MAC [14]. S-MAC is designed to reduce energy consumption while supporting good scalability and collision avoidance. S-MAC consumes 2 to 6 times less energy than traditional IEEE802.11 MAC protocols. Still, it shows a 30% duty cycle, which is far above what is expected of a solution meant to run for years. Contention-based solutions cannot provide strict guarantees with ultra-low power consumption.
The other MAC paradigm is time division, where devices share the same radio frequency by using it at different times. This is known as Time-Division Multiple Access (TDMA). Time is usually divided into slots of the same duration. As opposed to contention-based solutions, nodes need to agree on a time-division scheme (e.g. the duration of a time slot). TDMA can be coupled with Frequency-Division Multiple Access (FDMA), in which a frequency band is sub-divided into channels and multiple channels can be used at the same time without interfering with each other. This is now a widely adopted technique in Wireless Sensor Networks (see footnote 2).
Before 2012, the IEEE802.15.4 MAC sub-layer was designed for star networks, in which all motes communicate directly with a central coordinator mote. The way the sub-layer was built would not match IIoT use cases for two reasons:
• Link Reliability. As described in Chapter 1, the radio environment is unreliable in nature. The quality of a transmission over one frequency can change from one second to another. Until 2012, the MAC sub-layer was not using Channel Hopping.
• Relay Energy Consumption. To have a mesh network, some nodes need to act as relays (routers). Until 2012, relaying nodes needed to keep their radio on all the time (100% duty cycle).
2 At the time of writing, more than 70k networks are deployed in the world using the SmartMesh IP solution, which includes this technique. The technique is also included in WirelessHART, a protocol used in many industries for factory automation.
In 6TiSCH, the chosen MAC layer is the IEEE802.15.4-2015 with the TSCH mode. Using the TSCH mode allows both low duty cycle (<1%) and high reliability (>99.999%) [15]. IEEE802.15.4-2015 describes the way TSCH works and its typology, but does not define how the schedule is built.
The way cells are organized in time and channels is called the schedule. The management of the schedule is a crucial task, as it has a direct impact on latency, reliability and energy consumption. The number of allocated cells in TSCH is directly related to bandwidth: the more cells a node has, the more opportunities it has to transmit or receive. Scheduling with TSCH can be done following five paradigms: Static Scheduling, Neighbor-to-Neighbor Scheduling, Remote Scheduling, Hop-by-hop Scheduling, and Id-based Scheduling.
In Static Scheduling, a fixed schedule is defined for the entire network, regardless of the bandwidth required by each node and of the number of nodes. This means that multiple transmissions can occur in the same cell, resulting in potential interference and data loss. Static Scheduling is often used to bootstrap the network or as a fall-back mode (when other scheduling techniques fail to operate).
In Neighbor-to-neighbor Scheduling, nodes negotiate cells in a distributed manner. Each node exchanges messages with its neighbors to allocate or deallocate cells between them.
In Remote Scheduling, a central entity (e.g. the gateway) computes a schedule for each mote. The advantage of that approach is that one entity has a global view of the network and can thus take decisions that optimize the overall performance. The drawback is that it takes more time and network usage, as each node needs to send information to the scheduling entity. The central entity might therefore work with information that is not up to date.
In Hop-by-hop Scheduling, nodes can reserve cells along a path to a destination. In 6TiSCH this is called a Track. A Track is the 6TiSCH instantiation of the concept of a Deterministic Path [16].
In Id-based Scheduling, nodes decide which cells to allocate depending on a unique identifier (MAC address or other). If this identifier is obtained from the sender, it is called Sender-based Scheduling, and if it is obtained from the receiver, it is called Receiver-based Scheduling. When two nodes want to communicate and know their identifiers, they choose one of the two identifiers and translate it into cell coordinates (i.e. slot offset and channel offset). How to translate a unique identifier into cell coordinates is not defined. The only constraint is to make sure the mapping to a cell coordinate is unique. Cells allocated using Id-based Scheduling are shared. This approach is presented in Orchestra [17] and is now being merged into the 6TiSCH Minimal Scheduling Function (MSF). This technique is efficient, as it does not require any wireless communication or negotiation.
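Since the translation from identifier to cell coordinates is left undefined, the following sketch shows one possible way to do it. It is an assumption made for illustration only: SHA-256 is used here simply because it is deterministic and available in the standard library, not because Orchestra or MSF specify it, and the function name, slotframe length and example address are hypothetical.

```python
import hashlib
from typing import Tuple


def cell_from_id(node_id: bytes, slotframe_length: int,
                 number_channels: int) -> Tuple[int, int]:
    """Map an identifier (e.g. a MAC address) to (slot_offset, channel_offset).

    Every node that knows `node_id` computes the same coordinates locally,
    so no message needs to be exchanged to agree on the cell.
    """
    digest = hashlib.sha256(node_id).digest()
    value = int.from_bytes(digest[:4], "big")
    slot_offset = value % slotframe_length
    channel_offset = (value // slotframe_length) % number_channels
    return slot_offset, channel_offset


if __name__ == "__main__":
    # Hypothetical 8-byte address, for illustration only.
    receiver = bytes.fromhex("0012740000000001")
    print(cell_from_id(receiver, slotframe_length=101, number_channels=16))
```

A hash-based mapping like this one is deterministic for a given identifier, but two different identifiers can still land on the same cell, which is consistent with such cells being treated as shared.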
While the standard does not define how the schedule is built and oper-ated, scheduling has major impacts on the network performance. In their survey [18], Teles Hermeto et al. provide an extensive description of the diﬀerent scheduling techniques available. They classify the scheduling tech-niques according to their paradigms (i.e. centralized or distributed) and opti-mization goal (e.g. low latency or reliability). They point out that centralized schedules are ideal for static topologies with periodic traﬃc and distributed scheduling is more suited for mobile topologies and traﬃc that is not deter-mined in advance.
As we mainly use centralized scheduling in this thesis, I will now describe the corresponding state of the art. One of the pioneer work in centralized scheduling algorithm over TSCH proposed by the research community is the Traﬃc Aware Scheduling Algorithm (TASA) [19]. In TASA, the schedule is managed by a centralized entity named the Path Computation Element (PCE). TASA is built for periodic convergecast traﬃc (the entire traﬃc is addressed to the gateway) and works over DODAG where node have only one parent. TASA aims at building compact schedules, that is, minimizing the oﬀset of the last cell in the schedule. It starts to allocate bandwidth (cells) to the most constrained nodes (the nodes that carry the most traﬃc). TASA then uses matching and coloring heuristics to find the smallest schedule taking into account the traﬃc and the topology. TASA guarantees optimal schedule com-pactness but does not provide any guarantees in terms of reliability, mainly because it assume perfect links and no retransmissions.
Gaillard et al. [20] proposed an extension of TASA (T ASART X) to take into account retransmissions and fragmented packets. They build a schedule that complies with reliability expectation by adding extra cells that are used in case of consecutive retransmissions. They take into account link quality, packet fragmentation and traﬃc changes.
The same authors further extend their research by proposing Kausa, a KPI-aware scheduling function [21]. The authors consider that multiple ap-plications use the same network, and that each application has its own set of requirement and traﬃc flow. They design Kausa, a centralized scheduling algorithm that builds resource paths that guaranty per flow QoS.
Khoufi et al. [22] use centralized scheduling to build a schedule that is compliant with both latency and reliability requirements. To do so, they introduce the concept of debt-based scheduling, that is, a mechanism that schedules the node with the highest debt first. They define the debt of a node based on the amount of traffic it needs to carry and its depth in the network (the number of hops to the DAG root). We reuse this concept in Chapter 6.
In 6TiSCH, the LLC sublayer is embodied by the 6TiSCH Operation Sublayer (6top). 6top provides a management interface, named the 6top Protocol (6P), that enables an external management entity to schedule cells and slotframes, as well as a structure and formalism for the scheduling mechanisms, called the Scheduling Function (SF). 6top defines two types of cells: Hard cells, which cannot be modified (i.e. read-only), and Soft cells, which can be modified.
The scheduling tasks are done by the Scheduling Function (SF) components. A node may support multiple SFs at the same time; each SF is identified by an SFID. 6TiSCH only defines a set of requirements an SF must meet (e.g. having an SFID) but does not define how the SF computes the schedule.
As an example, a scheduling function named Minimal Scheduling Function (MSF) is proposed in RFC8180 [6], defined in [23] and described in [24]. MSF defines both the behavior of a node when joining the network and how the schedule is managed in a distributed manner. MSF uses a combination of neighbor-to-neighbor scheduling and ID-based scheduling. During the join process, MSF defines a set of steps a node follows to allocate the minimal set of cells it will use to start communicating with the network. After the join process, MSF dynamically modifies the schedule to continuously adapt to application and routing changes, as well as to handle schedule collisions.
The Scheduling Function (SF) decides locally which cells to allocate or deallocate, but does not apply those changes alone. To do so, it uses the 6top Protocol (6P). 6P defines the commands nodes send to each other to add, delete or relocate cells (redefine the location of a cell in the schedule) [25]. When the SF takes a scheduling decision, it triggers a 6P mechanism called a “6P Transaction”. A “6P Transaction” is a series of messages a node and its neighbor exchange to negotiate the modification of their schedules. An example of a 2-step 6P Transaction is depicted in Fig. 2.7. After a 6P Transaction, if both nodes agree, they modify their schedules. It is important that the two nodes keep their schedules consistent. If node A has a transmit cell and node B does not have the corresponding reception cell, communication fails.
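The consistency requirement can be illustrated with a minimal sketch of a 2-step negotiation. This is not the 6P message format defined in [25]: the Node class, the two_step_add and consistent functions, and the representation of a cell as a (slot offset, channel offset) pair are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy model of a node's schedule (illustrative, not the 6P data model)."""
    name: str
    tx: dict = field(default_factory=dict)   # neighbor name -> set of transmit cells
    rx: dict = field(default_factory=dict)   # neighbor name -> set of receive cells
    busy: set = field(default_factory=set)   # cells already used with any neighbor


def two_step_add(a, b, candidates, num_cells):
    """Node a asks node b for num_cells cells chosen among candidates."""
    # Step 1: a only proposes cells that are free in its own schedule.
    proposal = [cell for cell in candidates if cell not in a.busy]
    # Step 2: b answers with the cells it accepts among the proposal.
    accepted = [cell for cell in proposal if cell not in b.busy][:num_cells]
    if len(accepted) < num_cells:
        return []  # no agreement: neither schedule is modified
    # Both sides commit the same cells, so the schedules stay consistent.
    a.tx.setdefault(b.name, set()).update(accepted)
    a.busy.update(accepted)
    b.rx.setdefault(a.name, set()).update(accepted)
    b.busy.update(accepted)
    return accepted


def consistent(a, b):
    """True if every transmit cell from a to b has a matching receive cell on b."""
    return a.tx.get(b.name, set()) == b.rx.get(a.name, set())
```

In this sketch, a failed negotiation leaves both schedules untouched, which is one simple way to avoid the mismatch described above. It does not model lost messages or timeouts, which a real protocol also has to handle.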

Network

The Internet Protocol version 6 (IPv6) is the latest version of the Internet Protocol (IP), which provides device addressing across the Internet. Devices are often considered part of the IoT domain when they are able to use IP. Having IPv6 capabilities simplifies the integration of a technology into a production system.
The IETF “IPv6 over Low-Power Wireless Personal Area Networks” (6LoWPAN) working group was created in 2005 to provide an adaptation layer for IPv6 to work over IEEE 802.15.4 and resulted in a set of mechanisms known as 6LoWPAN [4]. Adapting IPv6 to constrained device networks is not straightforward, and the main obstacle is size. IPv6 requires links to support a Maximum Transmission Unit (MTU) of at least 1280 B, while the IEEE 802.15.4 maximum frame size is 127 B. 6LoWPAN thus defines mechanisms to compress and fragment datagrams.
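To give an order of magnitude for this size mismatch, the following sketch estimates how many IEEE 802.15.4 frames are needed to carry one datagram. The per-frame overhead (MAC header, security and fragmentation headers) is an assumed parameter, and the 40 B default is only a placeholder, not a value taken from the standards. The function name frames_needed is hypothetical, and the sketch ignores header compression and fragment alignment rules.

```python
import math


def frames_needed(datagram_size, frame_size=127, per_frame_overhead=40):
    """Rough number of link-layer frames needed to carry one datagram.

    per_frame_overhead is an assumption: the bytes of each frame that are
    not available to carry a piece of the datagram.
    """
    payload = frame_size - per_frame_overhead
    if payload <= 0:
        raise ValueError("overhead leaves no room for payload")
    return math.ceil(datagram_size / payload)


# A full 1280 B datagram, with and without the assumed overhead.
print(frames_needed(1280))
print(frames_needed(1280, per_frame_overhead=0))
```

Even with no overhead at all, a 1280 B datagram cannot fit in fewer than 11 frames of 127 B, which is why both compression and fragmentation are needed.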
not define how neighbors are selected and how the network is formed. In 6TiSCH, the default routing protocol is the “Routing Protocol for Low-Power and Lossy Networks” (RPL) [26]. RPL supports a wide variety of datalink layers and is not designed only for WSN. RPL can build network routes, adapt to a changing topology and distribute routing knowledge among nodes. RPL forms a Destination-Oriented Directed Acyclic Graph (DODAG), that is, a multi-hop routing graph rooted at a central point (named gateway, root or sink). The routing graph is usually created by accounting for link quality and node attributes, using a gradient-based approach. The parameters to take into account are defined in an Objective Function (OF). The OF chooses the parameters to use based on the routing objective (e.g. create short paths, build redundant paths). Each node thus obtains a Rank that denotes its virtual distance to the DODAG root.
The DODAG construction starts at the root. The root periodically broadcasts a control message named DODAG Information Object (DIO). A DIO contains the Rank of the node that sent it, as well as other routing configuration parameters. When a joining node receives a DIO, it adds the DIO transmitter to its list of potential parents. After receiving a few DIOs (the number of DIOs to wait for is defined in the OF), the joining node selects one of the potential parents and defines it as its preferred parent. How to select the preferred parent is also defined in the OF. For instance, a node can select its parent based on the parent's Rank, or on the Rank that the parent would provide to the joining node. Once the joining node has selected its preferred parent, it forwards all its packets addressed to the sink to that parent. The joining node then computes its own Rank and starts broadcasting its own DIOs. Fig. 2.8 depicts the steps of a DODAG construction using hop count as the routing metric.
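The construction steps above can be sketched for the hop-count case. This is a simplified model, not the RPL specification [26]: the root is given Rank 0, each hop adds 1 to the Rank, a node joins through the first DIO transmitter it hears, and DIOs are assumed never to be lost. The function name build_dodag and the link list are hypothetical.

```python
from collections import deque


def build_dodag(links, root):
    """Return (rank, preferred_parent) dictionaries for a hop-count DODAG.

    links is a list of bidirectional (node, node) pairs. Nodes closer to the
    root broadcast their DIO first, so the first DIO a node hears comes from
    a neighbor with the smallest hop count.
    """
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    rank = {root: 0}
    parent = {}
    queue = deque([root])
    while queue:
        sender = queue.popleft()  # this node broadcasts its DIO
        for node in sorted(neighbors.get(sender, ())):
            if node not in rank:  # joining node: select the sender as parent
                parent[node] = sender
                rank[node] = rank[sender] + 1
                queue.append(node)  # it will broadcast its own DIO later
    return rank, parent


links = [("R", "A"), ("R", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
rank, parent = build_dodag(links, "R")
print(rank)
print(parent)
```

Node C hears DIOs from both A and B at the same hop count. The sketch breaks the tie by name, whereas a real Objective Function would typically use link quality or other node attributes.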
Once the routing and scheduling are settled, the application can start.

Application and Transport

As the goal is for IoT devices to be able to talk with the rest of the Internet, IPv6 is not sufficient. IPv6 provides a way to address other devices, but does not specify the type of data it carries or how this data is formatted. This is the task of the Application layer. The most famous Application layer protocol is the HyperText Transfer Protocol (HTTP). Two applications that use HTTP are able to communicate because they know in advance the format of the data they will exchange and the communication steps they must follow for the conversation to succeed. Classical wired networks do not have tight restrictions in terms of packet size or energy consumption, and their protocols are thus not optimized for minimum packet overhead. To interoperate with such networks, the 6TiSCH stack needs techniques to compress communications without compromising their content.
The Constrained Application Protocol (CoAP) over User Datagram Protocol (UDP) is the default choice for the Application and Transport layers in 6TiSCH. CoAP can be seen as a compressed version of HTTP with extra features built for IoT, such as IP multicast for group communication. CoAP integrates well with IPv6 and 6LoWPAN, which are described in Section 2.3.4.
In this section, we saw that the 6TiSCH working group assembled what I believe to be the right protocols and mechanisms to build a stack for the Industrial Internet of Things that is Highly Reliable, Low-Power and Internet-Enabled. We will now see what the next steps are for this technology, and what the open issues and challenges are.

Open Issues and Challenges

Benchmarking IoT

To design and validate an IoT communication protocol, one needs to test its capabilities in different scenarios (e.g. number of nodes, position of the nodes, data rate, etc.) and environments (indoor/outdoor, with or without external interference, etc.). There are three ways this can be carried out: simulation, testbed, and real-world deployment. Unfortunately, each solution has its drawbacks. Simulation is fast, flexible and does not require hardware, but rarely represents reality well. Testbeds better reflect reality, especially in terms of radio propagation and interference, but are static in the sense that they represent only one radio environment. Real-world deployments present the same pros and cons as testbeds, with the advantage of being tied to an application. There is no need to make the network behave to fit one application, as it is already designed for it (e.g. node positions, temperature and humidity changes, data rate, etc.). Ideally, a protocol needs to be tested in as many scenarios and environments as possible. Unfortunately, validation is often done with only a subset of the possible application configurations [28]. There is a need for a consistent way to test, validate and compare deployments and their results. The IoT Benchmark Initiative (IoTBench) was recently created to provide a set of tools and best practices to enable fair comparison and repeatability of experimental results. It defines three types of parameters and metrics an IoT benchmark should have: Inputs (e.g. number of nodes, data rate), Outputs (e.g. the nodes’ energy consumption) and Observed (e.g. external interference). I strongly believe this benchmarking initiative is the right approach for providing reliable solutions for the IoT.
During my work, I participated in the deployment of multiple real-world solutions and ran testbed experiments. In order to compare them, I adopted a way of formatting “traces” (network statistics and radio connectivity) and extracted Key Performance Indicators (KPIs) that I present in Chapter 5. This trace format is generic enough to be reused for deployment benchmarking. I then went further and “replayed” those traces to combine the advantages of simulation and experimentation with Trace-Based Simulation. I explain this work in Chapter 6. My goal is for the community to adopt a standard trace format and build an extensive set of traces to be able to test and validate protocols in a wide range of scenarios.

Determinism in IoT

Determinism is the theory according to which, given a system state, if an event occurs it results in an expected system state. In computer networking, this translates into the ability to predict the performance of a network. Key metrics are typically the lifetime of a network (i.e. the time before one node runs out of battery), its reliability (i.e. how many messages were lost), and its latency (i.e. how much time messages need to go from source to destination). When a network is said to be deterministic, it means that it can provide guarantees. That is, a network operator can commit to contracts named “service-level agreements” (SLAs) to assure a client that the system will provide the desired quality of service (QoS) [29], [30].
Network determinism has been studied for years. Formed in 2012, the Time-Sensitive Networking (TSN) IEEE Task Group aims at providing deterministic services (not only time-related) through IEEE802 networks. The Deterministic Networking (DetNet) IETF working group then started in 2014 and aims at providing network-layer determinism [16]. It works with the TSN task group to define a common architecture for both layers. The DetNet working group also works with other IETF working groups such as 6TiSCH.
We saw that TSCH can provide high reliability together with low energy consumption. As TSCH uses a time-slotted structure, nodes know when to sleep, transmit or receive; thus, most of the network events are known in advance. The uncertainty comes from variations in the radio environment or potential traffic changes. We will now see how performance guarantees can be obtained when using TSCH-based networks in terms of reliability, latency and energy consumption.

Reliability

End-to-end (E2E) reliability is evaluated as the number of messages that reach their destination over the number of messages sent. For instance, a 90% E2E reliability means that, out of 100 messages that were generated in the network, only 90 reached their destination. Message loss usually occurs for three reasons:
• Queue full. When a node receives a message and the node’s message queue is full, the node cannot handle more messages and thus drops the new message.
• Maximum number of retransmissions. When a node tries to transmit a message, if the transmission fails, the node increases a retransmission counter. If a maximum number of retransmissions is set and the counter reaches that limit, the node discards the message.
• Disconnection. If a node has messages in its queue and it disconnects (e.g. due to high desynchronization) or runs out of battery, the messages are dropped.
A high E2E reliability (>99.99%) can be obtained by reserving enough resources (no queue full), allowing an infinite number of retransmissions (no retransmission limit reached), and ensuring low-power consumption.
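The effect of the retransmission limit alone can be sketched with a simple model. It assumes that all hops have the same Packet Reception Rate, that losses are independent from one attempt to the next (real links are bursty, so this is optimistic), and that no message is lost to a full queue or a disconnection. The function name e2e_reliability and its parameters are hypothetical.

```python
def e2e_reliability(prr, max_tx, hops):
    """Probability that a message crosses `hops` identical links.

    Assumes independent losses and at most max_tx transmission attempts
    per hop (the first transmission plus the retransmissions).
    """
    if not 0.0 <= prr <= 1.0:
        raise ValueError("prr must be between 0 and 1")
    per_hop = 1.0 - (1.0 - prr) ** max_tx  # at least one attempt succeeds
    return per_hop ** hops


# Reliability over 3 hops with 80% links, for growing retransmission limits.
for max_tx in (1, 2, 4, 8):
    print(max_tx, round(e2e_reliability(0.8, max_tx, hops=3), 6))
```

Under these assumptions, the loss caused by the retransmission limit shrinks geometrically as the limit grows, which is why removing the limit removes this cause of loss.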
In practice, reserving enough resources is a complex task, especially when the number of retransmissions is unlimited. The number of resources (cells) required depends on the number of messages a node needs to transmit (locally generated messages and forwarded messages) but also on the expected number of retransmissions. Let’s consider the scenario depicted in Fig. 2.9, where node A needs to transmit a message to node B at a rate of one message per slotframe, and there is one cell allocated from A to B. If the first transmission fails, node A can save the message in its queue and wait for the next available slot. As only one slot is available per slotframe, node A waits for the next slotframe and now has two messages to transmit and only one cell available. If retransmissions occur multiple times, the node’s transmission queue might fill up and newly generated messages will be dropped. One way of solving this problem is to allocate more cells, to anticipate retransmissions that may occur.
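The scenario above can be replayed with a small sketch. Node A generates one message per slotframe and tries to send the oldest queued message in each of its cells. The link outcome of each cell is given as input so that the result is reproducible. The function name simulate_queue, the queue size of 2 and the link patterns are hypothetical choices, not values from Fig. 2.9.

```python
def simulate_queue(link_ok, cells_per_slotframe=1, queue_size=2):
    """Return (delivered, dropped, still_queued) for node A.

    link_ok lists, in time order, whether a transmission in each cell would
    succeed. One message is generated at the start of every slotframe.
    """
    queued = delivered = dropped = 0
    num_slotframes = len(link_ok) // cells_per_slotframe
    for slotframe in range(num_slotframes):
        if queued < queue_size:
            queued += 1    # the new message is stored in the queue
        else:
            dropped += 1   # queue full: the new message is lost
        for cell in range(cells_per_slotframe):
            ok = link_ok[slotframe * cells_per_slotframe + cell]
            if queued > 0 and ok:
                queued -= 1
                delivered += 1
    return delivered, dropped, queued


# Every other transmission fails: one cell per slotframe is not enough...
print(simulate_queue([False, True] * 3, cells_per_slotframe=1))
# ...while a second cell per slotframe absorbs the retransmissions.
print(simulate_queue([False, True] * 6, cells_per_slotframe=2))
```

With one cell per slotframe, the queue fills up and new messages are dropped. With two cells, every message is delivered within its slotframe, at the cost of twice as many reserved cells.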
The question is: how to estimate the number of retransmissions? This boils down to: how to estimate the quality of a link? Nouha Baccour et al. [31] provide an extensive survey of the existing techniques to estimate the quality of a link. Rather than reproducing this survey, we list here the different paradigms and give some metric examples. First, we can distinguish between passive and active estimation.
In passive estimation, a node simply listens to the radio medium and estimates the quality of its different links from what it hears. This is the case when using the Received Signal Strength Indicator (RSSI), which denotes the amount of power a node receives when another node transmits. Using passive measurement is handy as it does not require a node to transmit any information. It does, however, rely on other nodes transmitting frequently in order to have up-to-date link information.
In active estimation, a node transmits frames to estimate the probability of success or failure of a transmission. The Packet Reception Rate (PRR) is the ratio of the number of packets successfully received over the number of packets sent. A similar metric is the Expected Transmission Count (ETX) [32], which represents the expected number of transmissions needed to successfully transmit a frame. It takes into account the probability of delivery and the probability of “reverse delivery” (i.e. the probability that the ACK is received). Such metrics usually provide a more precise understanding of the number of transmissions needed on average, but do not take into account the maximum number of consecutive successes or failures.
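Both metrics are straightforward to compute. The sketch below uses the usual definition of ETX as the inverse of the product of the forward and reverse delivery ratios; the function and parameter names are hypothetical.

```python
def prr(received, sent):
    """Packet Reception Rate: packets received over packets sent."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return received / sent


def etx(forward_delivery, reverse_delivery):
    """Expected number of transmissions for a frame to be sent and ACKed.

    forward_delivery is the probability that the frame is received,
    reverse_delivery the probability that the ACK is received.
    """
    product = forward_delivery * reverse_delivery
    return float("inf") if product == 0 else 1.0 / product


print(prr(90, 100))
print(etx(forward_delivery=0.5, reverse_delivery=1.0))
```

A link where half of the frames are lost and every ACK gets through needs two transmissions per frame on average. This average says nothing about how the losses are grouped in time.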
Kannan Srinivasan et al. [33] show that links are bursty, that is, they fluctuate between low and high delivery ratios, and that existing metrics such as PRR do not capture those fluctuations. They introduce the β-factor, a metric to quantify the burstiness of a link. Knowing the estimated number of consecutive failures allows estimating the number of cells needed in worst-case conditions.
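The limitation of average metrics can be seen on two made-up transmission traces with the same delivery ratio. The sketch below does not compute the β-factor of [33]; it only measures the longest run of consecutive failures, and the function names and traces are hypothetical.

```python
def delivery_ratio(trace):
    """Fraction of successful transmissions in a trace of booleans."""
    return sum(trace) / len(trace)


def longest_failure_run(trace):
    """Largest number of consecutive failed transmissions in the trace."""
    longest = current = 0
    for ok in trace:
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return longest


alternating = [True, False] * 4        # losses spread out over time
bursty = [True] * 4 + [False] * 4      # same losses, grouped in one burst

for trace in (alternating, bursty):
    print(delivery_ratio(trace), longest_failure_run(trace))
```

Both traces deliver half of the frames, but the bursty one would need several consecutive retransmission opportunities to get a frame through during the burst, which an average metric does not reveal.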
Pöttner et al. [34] use Bmax, which indicates the maximum number of retransmissions required to successfully transmit a frame. Bmax is obtained by exchanging data in the network; thus, Bmax is not representative during the first exchanges of data but gets more and more accurate as time goes on. The authors use this metric to create a schedule for time-critical applications by allocating enough cells for the schedule to
Allowing an infinite number of retransmissions can result in very high reliability, but it has an impact on energy consumption, as more cells are needed to enable retransmissions.

Energy Consumption and Latency

If the data traffic is periodic and we know the network topology, average link quality, and payload size, estimating the lifetime and latency of the network is straightforward [35]. The energy consumption is directly linked to the number of cells used, as the main source of consumption is the radio. The latency depends on how the schedule is built. As the schedule is typically built as a function of the amount of data to carry (bandwidth needed) and the link quality (number of retransmissions needed), if those parameters are fixed and known in advance, we can estimate latency (as is done by Ines Khoufi et al. [22]) and energy consumption (using models such as the one proposed by Xavier Vilajosana et al. [12]). However, those conditions may not be met.
The first condition is to have periodic traffic. This depends entirely on the type of application. We can distinguish two types of application: Constant Bit Rate, where each node periodically generates data, and Event-based, where events are generated sporadically. In the latter case, for instance if alert information is needed (e.g. button pressed, value threshold reached) and no resource (i.e. cells) is reserved in the schedule, the Scheduling Function needs to dynamically adapt the schedule (by triggering 6P events) to increase the number of allocated cells. This takes time, and the alert might no longer be relevant when it reaches the destination. The only way to reduce the resource reservation delay is to consider the worst-case scenario and over-allocate resources, which increases the energy consumption of the devices.
The second condition is to have fixed link quality. Channel hopping averages out the per-channel link qualities, resulting in very stable links [2]. As an example, we observed links over multiple days in our deployments [36] and showed that a highly reliable WSN can operate with at most 5 routing parent changes per day.

Toward Time-Critical Applications

Wire-like E2E reliability is usually the first requirement in industrial applications. Today, commercial products provide such reliability guarantees; we thus consider it a solved issue. Wireless sensor networks are now being studied in applications that require time-critical data collection.
Pöttner et al. study time-critical applications over TSCH networks in an oil refinery [34]. They propose a metric called Bmax to quantify the maximum number of consecutive transmission failures to expect on a link, and based on that metric, they estimate the end-to-end (E2E) upstream latency of the packets. The drawback of this technique is that it requires several hours to have an accurate Bmax for each link.
Chang et al. propose a low-latency scheduling function (LLSF) for fast delivery in WSN [37]. They show that it is possible to achieve an average E2E upstream latency of 320 ms over a 5-hop network.
Schindler et al. implement what they claim to be the first closed-loop wireless feedback control network using completely standards-compliant IEEE802.15.4 TSCH technology [38]. They study the trade-off between duty cycle and latency in various combinations over a 4-hop network.
Finally, Yang et al. build an event-detection system using the 6TiSCH stack and study the latency distribution [39] between two motes using different duty cycles. Their results are promising and show that low per-link latency (tens of ms) is achievable using 6TiSCH.
There is still a lack of understanding about how TSCH behaves in different environments and what the trade-offs are between reliability, latency and network lifetime. In Chapter 6, we estimate the trade-offs between latency and energy consumption, taking those conditions into account.

Summary

The IoT world is vast and its applications are flourishing. In this thesis, we focus on Industrial IoT, that is, applications that should provide very high reliability even in harsh environments. Battery-powered wireless technologies are increasingly used in IIoT as they are easy to deploy and cost-effective. To allow devices constrained in energy and computation power to operate in such harsh environments, the right technologies need to be selected. A wide range of standards exists, but until recently no fully standardized networking stack existed to answer the IIoT requirements. The 6TiSCH stack was created to bridge that gap and will soon be fully approved. I believe we are at a turning point for Industrial IoT and that in the coming years we will see 6TiSCH adopted by a large number of network operators. Yet, some challenges remain. To be able to provide performance guarantees, the 6TiSCH technology needs to be studied in depth over a wide range of application scenarios and environments. In Chapter 6, I explain how I study the limits and trade-offs of TSCH, the mechanism at the heart of the 6TiSCH stack.

Table of contents :

1 Introduction
1.1 Preliminaries
1.1.1 The Internet of Things
1.1.2 Wireless Sensor Networks
1.1.3 The Wireless Impairment
1.1.4 Applications and Market Opportunities
1.2 Contributions
1.2.1 Network Deployments and Data Collection
1.2.2 Data Analysis and Comparisons
1.2.3 Determinism in IIoT
1.3 Thesis outline
2 State of the Art and Challenges
2.1 The IoT Standards
2.1.1 A Diversity of Applications
2.1.2 Standardization and Interoperability
2.2 Time Slotted Channel Hopping
2.2.1 History & Description
2.2.2 A Slotted Structure
2.2.3 Time Synchronization
2.3 Industrial IoT Stack
2.3.1 The 6TiSCH Networking Stack
2.3.2 Physical
2.3.3 DataLink
2.3.4 Network
2.3.5 Application and Transport
2.4 Open Issues and Challenges
2.4.1 Benchmarking IoT
2.4.2 Determinism in IoT
2.5 Summary
3 Methodology and Assumptions
3.1 Real-World Deployments
3.2 Characterizing Networks
3.3 TSCH Limits and Trade-offs
3.4 Summary
4 Real-World Deployments
4.1 SmartMesh IP
4.1.1 An IIoT World Leader
4.1.2 Low-power Wireless Motes
4.1.3 Low-power Wireless Manager
4.2 SolSystem
4.2.1 solmanager
4.2.2 solserver
4.2.3 SOL
4.3 PEACH
4.3.1 Context and objectives
4.3.2 Related applications
4.3.3 Deployment
4.3.4 Hardware Integration
4.3.5 Performance of the Network
4.3.6 Performance of the Motes
4.3.7 After 3 Months
4.3.8 Intuitive Results
4.3.9 Lessons Learned
4.4 SnowHow
4.4.1 Context
4.4.2 Related Work
4.4.3 Deployment
4.4.4 Performance of the Network
4.4.5 Lessons Learned
4.5 EvaLab
4.5.1 Context
4.5.2 Related Work
4.5.3 Deployment
4.5.4 Intuitive Results
4.5.5 Not so Intuitive Results
4.5.6 Conclusion and Lessons Learnt
4.6 SmartMarina
4.6.1 Context
4.6.2 Related Work
4.6.3 Deployment
4.6.4 Results
4.6.5 Lessons Learned
4.7 Conclusion
4.7.1 Summary
4.7.2 Lessons Learned
4.7.3 Challenges & Contributions
5 Characterizing Networks
5.1 Introduction
5.2 Related Work
5.3 Mercator: Dense Connectivity Datasets
5.3.1 Methodology and Terminology
5.3.2 IoT-LAB
5.3.3 Mercator: Testbed Datasets
5.3.4 Deployments
5.3.5 K7: Formating Traces
5.4 Published Datasets
5.5 Observations from the Datasets
5.5.1 Node Degree
5.5.2 Witnessing External Interference
5.5.3 Witnessing Instantaneous Multi-Path Fading
5.5.4 Witnessing Dynamics in the Environment
5.6 Discussion
5.6.1 What is Realistic?
5.6.2 A Word about Output Power Tuning
5.6.3 Waterfall Plot
5.6.4 Directions for Future Work
5.7 Summary
6 TSCH Limits and Trade-offs
6.1 Theoretical Limits
6.1.1 Assumptions
6.1.2 Key Performance Indicators
6.1.3 Objectives
6.1.4 A Canonical Case
6.1.5 With Retransmissions
6.1.6 Conclusion
6.2 Simulating the IIoT
6.2.1 Related Work
6.2.2 6TiSCH Simulator
6.3 Real-World vs. Simulation
6.3.1 Experiment Description
6.3.2 Replaying the Experiment
6.3.3 Simulating the Limits
6.3.4 E2E Upstream Latency
6.3.5 Network Lifetime
6.3.6 Discussion & Future Work
6.3.7 Conclusions
6.4 6TiSCH Performance Estimator
6.4.1 Trace-based Simulation
6.4.2 Inputs and Outputs
6.4.3 Status
6.5 Summary
7 Conclusion and Perspectives
7.1 Contributions
7.2 Perspectives