A Methodology based on Transportation Problem Modeling for Designing Parallel Interleaver Architecture

Get Complete Project Material File(s) Now! »

Convolutional Code state and Trellis Diagram

Convolutional encoder can be represented as finite-state machine in which the relationship between input, state and output can be explained through state transition table and state diagram. Encoder discussed in previous section consists of three register elements. So the state of encoder, represented by S = (s(1), s(2) ,s(3)) where s(1), s(2), s(3) 1 {0,1}, is the contents of three register elements from left to right. If v is the number of elements then there are 2v possible states in which encoder can be at any time instance. So, the eight possible states for encoder with three register elements are, S0 = (000),S1 = (001), S2 = (010), S3 = (011), S4 = (100), S5 = (101), S6 = (110), S7 = (111).
For the encoder in discussion, state transition table is shown in Table 1. 1. In this table, Sc and Sn are the current and next states respectively and x represents the input message bit which causes this transition. Also the output coded bits generated as a result of transition from Sc and Sn are labeled as cc,n(1) (systematic bit) and cc,n(2) (parity bit). State transition table is represented graphically as a state diagram in which node represents one of the eight states and directed edge represent the state transition between nodes as shown in Figure 1. 3. Label on each edge mentions the input bit that generates the state transition and output bits as input/output. The state diagram completely explains the relationship between state transitions and input/output bits but it does not provide any information about how this relationship evolved with time. Trellis diagram gives the required information by expanding state diagram at each time instance as shown in Figure 1. 4. Two copies of states are represented at time t and t+1 and directed edges are drawn from states at t to states at t+1 to show the state transitions. Due to trellis diagram, convolutional codes are decoded very efficiently by algorithms which operate on code trellis to find most likely codewor.

Decoding Convolutional Codes

As explained in previous section, each state transition from Sc to Sn or trellis edge corresponds to a particular input. So, decoder computes the probability of each state transition to find the maximum probability input bit. An efficient decoding algorithm based on trellis was first presented by Bahl, Cocke, Jelenik and Raviv [BAH74] and is called the BCJR algorithm.
Algorithm is based on the concept that codeword bits ct sent at time t are influenced by the codeword bit ct+ sent before it. Thus, they may also affects the codeword bits ct- sent after it. So in order to estimate the message bits, the algorithm makes two passes through the trellis: forward pass and backward pass. Forward Pass estimates the message bit xt at t based on ct+ whereas backward Pass uses ct- to estimate xt. Suppose yt represents the symbols received for the codeword bits ct sent at time t. yt+ represents the symbols received for the codeword bits ct+ sent after time t and yt- represents the symbols received for the codeword bits ct- sent before t. Then, the probability of the state transition from state Sc at time t-1 to state Sn at time t is given by the following equation, 2t (Sc, Sn) = Αt-1(Sc) 3t(Sc, Sn) Βt(Sn) 1. 1 The probability is a function of following three terms: (I) Αt-1 represents the probability that the encoder is in state Sc at t-1 based on the information about symbols received before t i.e., yt-; (II) Βt(Sn) represents the probability that the encoder is in state Sn at t based on the information about symbols received before after t i.e., yt+; (III) 3t(Sc, Sn) represents the probability of the state transition from Sc to Sn based on the information about symbols received at t i.e., yt; The calculation of Α and Β values are respectively called the forward and backward recursion of the BCJR decoder and are calculated through following equations: 2v −1 Αt-1(Sc) = 1 Αt-2(Si) 3t-1(Si, Sn) 1. 2 i=0 2v −1 Βt(Sn) = 1 Βt+1 (Si) 3t+1(Sc, Si) 1. 3 i=0.
From this equations it is clear that values of both Α and Β or path metrics can be calculated recursively and each step requires multiplication operations over real numbers which increases the decoder complexity in terms of area and cost when implementing in hardware. This problem is overcome by re-formation of original algorithm in the logarithmic domain. The benefit of representing path metrics as log metrics is that now we can replace multiplication with addition. If α,β and γ represents the log metrics of Α, Β and 3 then previous equations transform into the log domain as follows: 4t (Sc, Sn) = αt-1(Sc) + γt(Sc, Sn) + βt(Sn) 1. 4 αt-1(Sc) = log 1 eαt-2(Si) + γt-1(Si, Sn) 1. 5.

Parallelism in Turbo Codes Decoding

As explained in previous sections 2.3, to decode each component code, turbo decoder first calculates α-values (using forward recursion) and β-values (through backward recursion) in order to generate extrinsic values. This approach is called serial implementation of turbo decoder. Serial implementation can be represented through Forward Backward Scheme as shown in Figure 1. 7.a. Implementation of this scheme can best be explained through data access order shown in Figure 1. 7.b. for D = number of data elements used in a code = 8, T = time to decode a code = 2D = 16 and PE = number of processing elements = 1. For first T/2 = 8 time instances, decoder accesses data elements from extrinsic memory from D = 0 to D = 7, calculate α-values and store them into decoder inside decoder memory. For next T/2 time instances from 9 to 16, decoder calculates β and extrinsic values and writes updated extrinsic values into extrinsic memory to complete the decoding of component code. It is important to note that decoder can process only one data elements at each time instance in serial implementation.
Serial implementation has following two drawbacks.
1. The calculation of path metrics (α and β-values) requires that the whole transmitted data has to be received and stored which increases the decoder memory.
2. The approach significantly increases the latency (i.e., time to compute extrinsic values) and make it unsuitable for high data rate application.
To tackle these drawbacks, different parallelism approaches are proposed which are explained below. D-1 FRAME 0 Data 0.

Table of contents :

Chapter 1 PROBLEMATIC
1. Forward Error Correction (FEC) Coding 5
2. Convolutional Codes 5
2.1. Convolutional Encoder
2.2. Convolutional Code state and Trellis Diagram
2.3. Decoding Convolutional Codes
2.4. Turbo Codes
2.4.1. Turbo Encoder
2.4.2. Interleaver
2.4.3. Turbo Decoder
2.4.4. Parallelism in Turbo Codes Decoding
2.4.4.1Recursive Unit Parallelism
2.4.4.2Trellis Level Parallelism
2.4.4.3SISO Decoder Level Parallelism
2.4.4.4Conclusion
3. Block Codes 15
3.1. Encoding of Linear Block Codes
3.2. Low Density Parity Check (LDPC) Codes
3.3. Tanner Graph Representation
3.4. Decoding
3.5. Implementation of LDPC Decoder
3.5.1. Fully Parallel Implementation
3.5.2. Fully Serial Implementation
3.5.3. Partially Parallel Implementation
4. Memory conflict problem 22
4.1. Memory conflict problem for Turbo Codes
4.2. Memory conflict problem for LDPC Codes
5. Conclusion 26
Chapter 2 STATE OF THE ART
1. Introduction
2. Approaches to tackles memory conflict problem for turbo and LDPC codes
2.1. Conflict Free Interleaving laws
2.1.1. Architecture aware Turbo Codes
2.1.2. Structured LDPC Codes
2.2. Run Time Conflict Resolution
2.2.1. Run Time Conflict Resolution for Turbo Codes
2.2.2. Run Time Conflict Resolution for LDPC Codes
2.3. Design Time Conflict Resolution
2.3.1. Design Time Conflict Resolution for Turbo Codes
2.3.2. Design Time Conflict Resolution for LDPC Codes
2.4. Conclusion
3. Node and Edge Coloring of Graph
3.1. Graph Theory
3.2. Conflict Graph
3.2.1. Conflict Graph for Turbo Codes
3.2.2. Conflict Graph for LDPC Codes
3.3. Bipartite Graph
3.3.1. Bipartite Edge Coloring
3.3.1.1Vizing Method to color the edges of Bipartite Graph
3.3.1.2Gabow Method to color the edges of Bipartite Graph
4. Transportation Problem
5. Conclusion
Chapter 3 METHODS BASED ON BIPARTITE GRAPH FOR SOLVING MEMORY MAPPING PROBLEMS
1. Introduction
2. A Methodology based on Transportation Problem Modeling for Designing Parallel Interleaver Architecture
2.1. Problem Formulation for Memory Mapping Problem of Turbo codes
2.2. Modeling
2.2.1. Construction of Bipartite Graph
2.2.2. Transformation of bipartite graph into Transportation Matrix
2.3. Algorithm to find semi 2-factors in Turbo Bipartite Graph
2.4. Pedagogical Example to explain algorithm
3. Methodologies Based on Bipartite Graph to find conflict Free Memory Mapping for LDPC
3.1. Problem Formulation for Memory Mapping Problem of LDPC codes
3.2. Modeling
3.3. Algorithm
3.3.1. Partition the Bipartite Graph
3.3.2. Coloring edges of Bipartite Graph
3.4. Pedagogical Example to explain algorithm
4. Conclusion
Chapter 4 METHODS BASED ON TRIPARTITE GRAPH FOR SOLVING MEMORY MAPPING PROBLEMS
1. Introduction
2. Methodology Based on Tripartite Graph to find conflict Free Memory Mapping for LDPC
2.1. Modeling
2.2. 2-Step Coloring Approach
2.3. Pedagogical Example to explain algorithm
3. Constructing Bipartite Graph for Turbo and LDPC Codes 87
3.1. Construction of Bipartite Graph for Mapping Problem of Turbo Codes
3.2. Construction of Bipartite Graph for Mapping Problem of LDPC Codes
3.3. Bipartite Edge Coloring Algorithm
3.4. Pedagogical Example to explain algorithm
4. Complexity comparison of Different algorithms 96
5. Conclusion 97
Chapter 5 EXPERIMENTS
1. Introduction
2. Design Flow for Memory Mapping Tool
3. Ultra Wide Band Communication System
3.1. Bit Interleaver used in WPAN IEEE 802.15.3a Physical Layer
3.2. Experiments and Results
4. Designing Parallel Interleaver Architecture for Turbo Decoder
4.1. Interleaver used in HSPA Evolution
5. Designing Partially Parallel Architecture for LDPC Decoder
5.1. Partially Parallel Architecture for structured LDPC codes
5.2. Decoder Architecture for Non-Binary LDPC codes
6. Case Study: Designing Parallel architecture for Quadratic Permutation Polynomial Interleaver
6.1. Configurations used in this study
6.2. Experiments and Results
7. Conclusion
Conclusion and Future Perspectives
Bibliography