Configuration management for the UDec architecture


Recursive Systematic Convolutional encoders

Convolutional codes have been widely used in wireless telecommunication standards due to their low complexity. The most common form of convolutional encoder is the non-recursive, non-systematic convolutional encoder presented in Figure 1.3(a). This type of encoder cannot be used for Turbo encoding since it is not systematic. A second form of non-recursive encoder (Figure 1.3(b)) introduces a systematic output, but it is not suitable for Turbo decoding because of the poor distance properties of the resulting code. Finally, the Recursive Systematic Convolutional (RSC) encoders shown in Figure 1.4 introduce a feedback of one of the outputs. The encoder shown in Figure 1.4(a) encodes, at each instant i, one bit of the input data stream. Figure 1.4(b) presents a double binary RSC encoder in which two bits of the input stream are encoded at each instant i. These encoders have a very simple structure that can be implemented with a set of flip-flops and XOR operators. The number of states of the encoder is 2^p when p flip-flops are implemented. Moreover, the value p + 1 is known as the constraint length of the code.
The code rate of a convolutional code is defined by the ratio n/l, where n is the number of bits that compose the input symbol D_i and l is the number of bits of the coded symbol, with l > n. In the example of Figure 1.3(a), the code rate is 1/2 since, at each instant i, the input bit D_i is encoded into a two-bit coded symbol that consists of P0_i and P1_i. In the example of Figure 1.3(b), the code rate is 1/3 since, at each instant i, the input bit D_i is encoded into a three-bit coded symbol that consists of P0_i, P1_i and S_i. The code rate of both RSC encoders presented in Figure 1.4 is 1/2.
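As an illustration of the flip-flop and XOR structure described above, the following is a minimal sketch of a single binary, rate-1/2 RSC encoder with p = 3. The function name rsc_encode and the tap choices (feedback 1 + D^2 + D^3, parity 1 + D + D^3) are assumptions made for this example; they are not taken from Figure 1.4.

```python
def rsc_encode(bits, feedback_taps=(0, 1, 1), forward_taps=(1, 0, 1)):
    """Encode a list of bits with a rate-1/2 recursive systematic code.

    Taps are given for delays D^1..D^p (illustrative values, assumed here).
    Returns one (systematic, parity) pair per input bit.
    """
    p = len(feedback_taps)
    state = [0] * p              # state[0] is the most recent flip-flop (D^1)
    coded = []
    for d in bits:
        # Recursive part: the input is XORed with the fed-back register bits.
        a = d
        for tap, s in zip(feedback_taps, state):
            a ^= tap & s
        # Parity: XOR of the recursive bit and the selected register bits.
        parity = a
        for tap, s in zip(forward_taps, state):
            parity ^= tap & s
        coded.append((d, parity))    # systematic bit is the input itself
        state = [a] + state[:-1]     # shift the flip-flop chain
    return coded


if __name__ == "__main__":
    p = 3
    print("states:", 2 ** p, "constraint length:", p + 1)
    print(rsc_encode([1, 0, 0, 0]))
```

Each input bit yields two output bits, which matches the 1/2 code rate quoted for the RSC encoders. Because of the feedback, a single 1 at the input keeps producing parity bits on later instants, which is the recursive behaviour the text refers to.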
In order to adapt the code rate, the puncturing technique [2, 4] can be used. It consists in removing some of the parity bits after encoding in order to increase the code rate.
Another commonly used representation of convolutional encoding is the trellis diagram [5], which consists of nodes and branches. A node represents a state S of the code, while a branch represents a transition from one state to another caused by an input bit, or by an input bit pair in the case of a double binary convolutional code. An example of a trellis diagram corresponding to the double binary RSC encoder presented in Figure 1.4(b) is given in Figure 1.5. In this example, the constraint length of the code is 4, i.e. p = 3. Thus, the number of states of the encoder is 2^p = 8. It can be noted that each state has 2^b = 4 possible transitions, where b = 2 is the number of bits per symbol at the input of the encoder.
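The effect of puncturing on the code rate, and the trellis dimensions quoted above, can be checked with a short sketch. The helper names puncture, code_rate and trellis_size, and the periodic keep pattern (1, 0), are hypothetical choices for this example and do not come from any standard.

```python
from fractions import Fraction


def puncture(systematic, parity, keep_pattern):
    """Keep every systematic bit; keep a parity bit only where the
    periodic keep_pattern holds a 1 (pattern is an assumed example)."""
    kept = []
    for i, (s, par) in enumerate(zip(systematic, parity)):
        kept.append(s)
        if keep_pattern[i % len(keep_pattern)]:
            kept.append(par)
    return kept


def code_rate(num_info_bits, num_coded_bits):
    """Code rate n/l as an exact fraction."""
    return Fraction(num_info_bits, num_coded_bits)


def trellis_size(p, b):
    """Return (states, branches per state, branches per trellis section)."""
    states = 2 ** p
    per_state = 2 ** b
    return states, per_state, states * per_state


if __name__ == "__main__":
    sys_bits = [1, 0, 1, 1]
    par_bits = [1, 1, 0, 1]
    sent = puncture(sys_bits, par_bits, keep_pattern=(1, 0))
    print(sent, code_rate(len(sys_bits), len(sent)))
    print(trellis_size(p=3, b=2))
```

Dropping every second parity bit of a rate-1/2 code sends 6 bits for 4 information bits, so the rate rises from 1/2 to 2/3. For p = 3 and b = 2 the sketch gives 8 states with 4 branches each, i.e. 32 branches per trellis section.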

Parallelism in Turbo decoding

Turbo decoding provides an efficient solution to reach very low error rates, at the cost of a high processing time for data retrieval. Research on exploiting parallelism has been conducted in order to achieve high throughput. The available parallelism levels can be categorized in three groups: metric level, SISO decoder level, and Turbo decoder level. Metric level parallelism concerns the processing of all metrics involved in the decoding of each received symbol inside a MAP-SISO decoder. For that purpose, the inherent parallelism of the trellis structure [11, 12] and the parallelism of the MAP computation [11, 12, 13] can be exploited. SISO decoder level parallelism consists in duplicating the SISO decoders in the natural and interleaved domains, each executing the MAP algorithm on a sub-block of the frame to decode. Finally, Turbo decoder level parallelism proposes to duplicate whole Turbo decoders to process iterations and/or frames in parallel. However, this level of parallelism is not relevant due to the huge area overhead of such an approach (all memories and computation resources are duplicated). Moreover, this solution presents no gain in frame decoding latency.
SISO decoder level parallelism hugely impacts the configuration process of a multiprocessor Turbo decoder. Indeed, the number of SISO decoders that have to be configured and the configuration parameters associated with each SISO decoder both depend on this parallelism level. At this level, three techniques are available: frame sub-blocking, windowing, and shuffled decoding.
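Frame sub-blocking can be pictured as splitting the frame indices among the available SISO decoders. The sketch below assumes a balanced split in which the first sub-blocks absorb any remainder; the function name sub_block_bounds and this balancing rule are illustrative assumptions, not the scheme used by any particular decoder.

```python
def sub_block_bounds(frame_length, num_siso):
    """Split symbol indices 0..frame_length-1 into num_siso contiguous
    sub-blocks, returned as (start, end) pairs with end exclusive."""
    if num_siso <= 0 or num_siso > frame_length:
        raise ValueError("num_siso must be between 1 and frame_length")
    base, extra = divmod(frame_length, num_siso)
    bounds = []
    start = 0
    for k in range(num_siso):
        size = base + (1 if k < extra else 0)   # spread the remainder
        bounds.append((start, start + size))
        start += size
    return bounds


if __name__ == "__main__":
    # Each pair is the slice of the frame handled by one SISO decoder.
    for k, (lo, hi) in enumerate(sub_block_bounds(10, 4)):
        print(f"SISO {k}: symbols {lo}..{hi - 1}")
```

The number of sub-blocks and their boundaries are examples of per-decoder parameters that change with the parallelism level, which is why this level weighs on the configuration process.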

Dynamic configuration in embedded systems

In the context of telecommunications, the multiplication of wireless standards is introducing the need for flexible and dynamically reconfigurable multi-mode and multi-standard baseband receivers. There are multiple methods to reconfigure an architecture, and they can be organized in three main categories.
The first one corresponds to architectures that are configured through a stream of configuration bits spread over the architecture components to configure the data and control paths. For instance, the configured components can be multiplexers, Lookup Tables (LUTs), Arithmetic Logic Units (ALUs), etc. This configuration method is typically applied to Field-Programmable Gate Arrays (FPGAs), in which LUTs, multiplexers and programmable routing switches are configured through a bitstream loaded at power-up. Recent FPGA technology also offers dynamic configuration techniques allowing hardware reconfiguration at run-time. The configuration load of recent FPGAs represents several megabytes of information that can be loaded from various sources such as an external memory, a host PC, a microcontroller, etc. Figure 1.10 shows the bitstream chain in an FPGA: the bitstream is sent from outside and is then spread inside the component. The configuration of the SRAM points of the FPGA can be seen as a huge shift register (in practice, the configuration chain is divided into frames and latches are used).
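The shift-register view of a configuration chain can be sketched in a few lines. The ConfigChain class below is a toy model written for this illustration, assuming one configuration bit is shifted in per clock cycle; it does not model the frames and latches used in real devices.

```python
class ConfigChain:
    """Toy model of a configuration chain seen as one long shift register."""

    def __init__(self, length):
        self.cells = [0] * length    # one cell per configuration point
        self.cycles = 0              # clock cycles spent loading

    def shift_in(self, bit):
        # The new bit enters at cell 0 and every stored bit moves one cell on.
        self.cells = [bit & 1] + self.cells[:-1]
        self.cycles += 1

    def load(self, bitstream):
        for bit in bitstream:
            self.shift_in(bit)


if __name__ == "__main__":
    chain = ConfigChain(4)
    chain.load([1, 0, 1, 1])
    print(chain.cells, "loaded in", chain.cycles, "cycles")
```

In this model the first bit sent ends up in the cell farthest from the input, and the load time grows linearly with the bitstream length, which gives an intuition for why loading a full bitstream is slow compared with changing a few parameters.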

Dynamic configuration in multi-mode and multi-standard scenario

When a Turbo decoder is designed to support several communication standards, the decoder behavior has to be adapted in order to respect the application requirements and to take into account the communication channel quality. In this thesis work, the scenario presented in Figure 1.12 is considered as the worst-case configuration scenario that should be met by a multi-mode and multi-standard Turbo decoder in mobility.
In this scenario, the Turbo decoder deals with input frames that have to be decoded for multiple applications that use different communication standards or modes. Each application is associated with throughput and BER objectives. Moreover, considering a mobile terminal, the configuration associated with an application has to be adapted over time depending on the evolution of the communication channel quality. Consequently, as shown in Figure 1.12, each frame received by the Turbo decoder is associated with a specific configuration which takes into account the application requirements and the channel quality. In order to avoid extra delays between two frames associated with different configurations, the configuration process for the next frame (i.e. computing and loading the new configuration) can be performed during the processing of the current frame. Thus, the Maximum Configuration Latency (MCL) for a frame k ensuring a null extra delay between two frames is evaluated using Equation (1.11).
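Equation (1.11) is not reproduced in this excerpt. As a rough sketch only, assume that the configuration budget for a frame is the time spent decoding the frame before it, approximated as that frame's length in bits divided by the decoding throughput. The function names and the numerical values below are hypothetical and serve only to show the order of magnitude involved.

```python
def max_configuration_latency(prev_frame_bits, prev_throughput_bps):
    """Time budget in seconds for configuring the next frame, ASSUMING it
    equals the decoding time of the frame currently being processed."""
    if prev_throughput_bps <= 0:
        raise ValueError("throughput must be positive")
    return prev_frame_bits / prev_throughput_bps


def fits_budget(config_latency_s, prev_frame_bits, prev_throughput_bps):
    """True if the configuration completes before the current frame ends."""
    return config_latency_s <= max_configuration_latency(
        prev_frame_bits, prev_throughput_bps
    )


if __name__ == "__main__":
    # Hypothetical figures: a 6144-bit frame decoded at 150 Mbit/s.
    budget = max_configuration_latency(6144, 150e6)
    print(f"budget: {budget * 1e6:.2f} microseconds")
    print(fits_budget(10e-6, 6144, 150e6))
```

Under this assumption, short frames decoded at high throughput leave the smallest budget, so they are the cases that constrain the configuration process the most.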

Table of contents :

Introduction
1 Turbo codes and state of the art in channel decoder design
1.1 Context of channel coding
1.1.1 Communication system
1.1.2 Channel code performance
1.1.3 Turbo encoding
1.1.3.1 Recursive Systematic Convolutional encoders
1.1.3.2 Turbo codes interleavers
1.1.4 Turbo decoding
1.1.4.1 The MAP Algorithm
1.1.4.2 Parallelism in Turbo decoding
1.2 Dynamic configuration of flexible Turbo Decoders
1.2.1 Dynamic configuration in embedded systems
1.2.2 Dynamic configuration in multi-mode and multi-standard scenario
1.3 State of the art in flexible Turbo decoding architectures
1.4 Initial multi-ASIP architecture for turbo decoding
1.4.1 Overview of the DecASIP processor
1.4.2 Interleaved/deinterleaved address generator
1.4.3 NoC messages
1.4.4 ASIC synthesis results
1.5 Summary
2 RDecASIP: optimized DecASIP for an efficient reconfiguration
2.1 Initial DecASIP configuration
2.1.1 Configuration memory
2.1.2 Program memory
2.2 Proposed optimizations for an efficient dynamic configuration
2.2.1 Configuration parameters storage
2.2.2 Configuration memory organization
2.2.3 Unified program
2.2.4 Multi-configuration storage
2.3 RDecASIP Implementation
2.3.1 ASIC synthesis results
2.3.2 Dynamic reconfiguration performance
2.4 Summary
3 Reconfigurable multi-ASIP UDec architecture
3.1 Flexible UDec architecture
3.1.1 ASIP number and location
3.1.1.1 Ring buses adaptation
3.1.1.2 Butterfly topology NoCs adaptation
3.1.2 Platform controller
3.2 UDec configuration infrastructure
3.2.1 Main challenges for an efficient configuration infrastructure
3.2.1.1 Low complexity
3.2.1.2 Multicast, broadcast and selection mechanisms
3.2.1.3 Incremental data burst transfer
3.2.2 Configuration infrastructure
3.2.2.1 Architecture overview
3.2.2.2 Addressing
3.2.2.3 Transfer protocol
3.2.2.4 Selection
3.2.3 SystemC/VHDL mixed Validation
3.2.3.1 Platform model
3.2.3.2 Model evaluation
3.2.4 FPGA prototype
3.3 Summary
4 Configuration management for the UDec architecture
4.1 Parallelism impact on decoding performance
4.1.1 Sub-block parallelism
4.1.2 Shuffled decoding
4.2 Pre-computed configuration management
4.3 Run-time configuration generation management
4.3.1 Restricted configuration management
4.3.2 Oversized configuration management
4.3.2.1 Oversized configuration principle
4.3.2.2 Oversized configuration generation
4.3.3 Oversized configuration management scenario
4.3.3.1 1 frame – 1 configuration
4.3.3.2 Decoding of multiple frames
4.4 Configuration management discussion
4.5 Summary
Conclusion and perspectives
Glossary
Bibliography