Previous Works on hierarchical architectures


Case Study: Xilinx Virtex 5

Virtex-5 devices, introduced by Xilinx in 2006, were the world's first FPGAs manufactured in 65 nm technology. The Virtex-5 is based on an architectural approach that Xilinx created to reduce the cost of developing multiple FPGA platforms, each with a different combination of feature sets. Xilinx calls this approach the Advanced Silicon Modular Block (ASMBL) architecture. For each application domain, such as digital signal processing, Xilinx determines the optimum mixture (ratio) of logic, memory, DSP slices, and so forth. Then, for each application domain, Xilinx creates a set of components, all based on the same "mix" but with varying proportions. The initial release of the Virtex-5 includes devices that offer a choice of four new platforms, each one delivering an optimized balance of high-performance logic, serial connectivity, signal processing, and embedded processing:
• LX: Optimized for high-performance logic.
• LXT: Optimized for high-performance logic with low-power serial connectivity.
• SXT: Optimized for DSP and memory-intensive applications, with low-power serial connectivity.
• FXT: Optimized for embedded processing and memory-intensive applications, with highest-speed serial connectivity.
FPGA designs have made considerable progress in recent years. Today's designs often feature wide data paths, especially in digital signal processing (DSP) applications. Implementing these designs with 4-input LUTs can require many levels of logic, which reduces performance. To address this issue, the ExpressFabric employed by the Virtex-5 family features LUTs with 6 independent inputs, which can significantly reduce the number of logic levels required to implement large functions. Virtex-5 devices are also based on a new diagonal interconnect architecture that facilitates shorter, faster routing. An overview of some of the more significant Virtex-5 architectural features follows:
Configurable Logic Block: The Configurable Logic Blocks (CLBs) are the main logic resources for implementing sequential or combinatorial circuits. Each CLB element is connected to a switch matrix to access the general routing matrix (shown in Figure 1.8). A CLB element contains a pair of slices. These two slices do not have direct connections to each other, and each slice is organized as a column. Each slice in a column has an independent carry chain.
Every slice contains 4 logic-function generators (or look-up tables), 4 storage elements, wide-function multiplexers, and carry logic. Each function generator can be used as a true 6-input LUT or as two 5-input LUTs that share their five inputs. In addition to its four 6-input LUTs, a Virtex-5 slice also includes fast flip-flops to speed up pipelined designs and an improved carry chain architecture to speed up arithmetic operations. Moreover, some slices support two additional functions: storing data using distributed RAM and shifting data with 32-bit registers. Slices that support these additional functions are called SLICEM; the others are called SLICEL. Overall, the Virtex-5 family provides up to 207,360 6-input LUTs, equivalent to 330,000 logic cells (the industry defines a logic cell as a 4-input look-up table and a flip-flop).
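The dual use of a function generator described above can be illustrated with a small behavioural sketch. It is not Xilinx's implementation: the class and function names (FracturableLUT6, pack_two_lut5, truth_table) are hypothetical, and the sketch assumes that the 64 truth-table bits are split into a lower and an upper half, with the sixth input selecting between them.

```python
def truth_table(func, k: int) -> int:
    """Encode a k-input Boolean function as an integer; bit i is func(i)."""
    bits = 0
    for value in range(1 << k):
        if func(value):
            bits |= 1 << value
    return bits


class FracturableLUT6:
    """Toy model of a 6-input LUT usable as two 5-input LUTs (assumed layout)."""

    def __init__(self, init: int):
        # init holds 64 truth-table bits; bit i is the output for input value i.
        if not 0 <= init < (1 << 64):
            raise ValueError("init must fit in 64 bits")
        self.init = init

    def o6(self, a: int) -> int:
        """Full 6-input function; a is the 6-bit input value."""
        return (self.init >> (a & 0x3F)) & 1

    def o5(self, a: int) -> int:
        """Lower 5-input function; only the five low input bits are used."""
        return (self.init >> (a & 0x1F)) & 1


def pack_two_lut5(lower: int, upper: int) -> FracturableLUT6:
    """Pack two 32-bit truth tables that share the same five inputs."""
    return FracturableLUT6((upper << 32) | lower)


# Example: 5-input parity on o5 and 5-input AND on o6 (sixth input tied high).
xor5 = truth_table(lambda v: bin(v).count("1") % 2 == 1, 5)
and5 = truth_table(lambda v: v == 0b11111, 5)
lut = pack_two_lut5(lower=xor5, upper=and5)
```

With the sixth input held at 1, o6 reads the upper table while o5 reads the lower one, so both 5-input functions are evaluated on the same five shared inputs.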

Interconnection Networks and FPGA architectures alternatives

Because performance requirements and cost metrics vary, many network topologies have been designed for specific applications. Originally developed for telephony, interconnection networks have since been adopted in computer system networks, parallel computing, and graph theory. It is inspiring and informative to look back on the multiplicity of interconnection networks.
Networks can be classified into direct networks and indirect networks [J.Duato et al., 1997]. In a direct network, terminal nodes are connected directly to one another by the network. In an indirect network, terminal nodes are connected through one or more intermediate switches. The switching nodes perform the routing. Therefore, indirect networks are also often referred to as multistage interconnect networks.
Direct networks and indirect networks can have different topologies [J.Duato et al., 1997]. It is not the objective of this chapter to discuss the functionalities and performance metrics of these different networks. Rather, we give only a brief description of some of the well-known network topologies. We use these topologies as examples to formulate the FPGA network problems in later chapters.
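The distinction between the two classes can be sketched with two small adjacency structures. This is only an illustration under simplifying assumptions: a 2D mesh stands in for a direct topology, a single switch stands in for an indirect one, and the helper names (mesh_2d, single_switch_network, is_direct) are hypothetical.

```python
def mesh_2d(rows: int, cols: int) -> dict:
    """Direct network: every node is a terminal linked to its grid neighbours."""
    adj = {(r, c): set() for r in range(rows) for c in range(cols)}
    for r, c in adj:
        for dr, dc in ((0, 1), (1, 0)):
            nb = (r + dr, c + dc)
            if nb in adj:
                adj[(r, c)].add(nb)
                adj[nb].add((r, c))
    return adj


def single_switch_network(n_terminals: int) -> dict:
    """Indirect network: terminals attach only to a switch, never to each other."""
    adj = {"sw0": set()}
    for t in range(n_terminals):
        name = f"t{t}"
        adj[name] = {"sw0"}
        adj["sw0"].add(name)
    return adj


def is_direct(adj: dict, terminals) -> bool:
    """A network is treated as direct here if it has no dedicated switch nodes."""
    return set(adj) == set(terminals)


mesh = mesh_2d(3, 3)
star = single_switch_network(4)
print(is_direct(mesh, mesh.keys()))                 # every node is a terminal
print(is_direct(star, [f"t{i}" for i in range(4)])) # sw0 is not a terminal
```

A multistage network would replace the single switch node with several layers of switches; the classification test stays the same.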

FPGA characteristics and challenges

An overall view of conventional reconfigurable devices shows that 80-90% of the area is dedicated to the reconfigurable interconnect, which includes the switches, wires, and configuration memory reserved for interconnect. The remaining area is dedicated to reconfigurable logic blocks.
To illustrate the magnitude of this problem, an area profile is given in table 1.1 [G.Lemieux and D.Lewis, 2004]. This table shows that approximately 80-90% of FPGA area goes to routing and only 10% of the area is used to implement logic directly. The inability to meet timing, power consumption, or logic capacity constraints makes it technically infeasible to use FPGAs in the most demanding applications. Despite their design cost advantage, FPGAs impose large area overheads compared to custom silicon alternatives. The interconnect and configuration overhead is responsible for the 40x density disadvantage [I.Kuon and J.Rose, 2007] incurred by reconfigurable devices compared to hardwired logic.

Architecture Modelisation

In our approach, an island-style FPGA is defined by 3 main elements: Logic Block (LB), Switch Block (SB) and Input/Output Block (IOB). We use an architecture file to describe the connection topology of every block as follows:
Logic Block: This block can be considered as a black box with a fixed number of inputs and outputs distributed on the 4 sides of the box and connected to the adjacent routing channels, as shown in figure 2.3. Every input or output pin has an associated connection vector giving the fraction of tracks to which it can be connected. Pins connected to a global network, such as the clock, are distinguished by the keyword global.
An example of an LB interface and its corresponding description file is presented in figure 2.3. The structural netlist of the LB interface is generated automatically and corresponds to programmable multiplexers for the inputs and a set of tristates for the outputs that drive the routing channels. Figure 2.4 shows an example of input and output vectors.
The internal HDL logic description (netlist) of the LB is written by the designer. Generally it contains a number of different or similar subblocks interconnected by the internal network. A subblock is generally a K-input Basic Logic Element (BLE) that contains a K-LUT, a flip-flop and a bypass multiplexer. Figure 2.5 shows an example of a 4-BLE that is then instantiated 4 times in the clustered LB, with a crossbar for local interconnections.
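The BLE structure described above (K-LUT, flip-flop, bypass multiplexer) can be sketched behaviourally as follows. This is not the designer-written HDL netlist mentioned in the text; the class name BLE and its fields are hypothetical, and the sketch assumes the bypass multiplexer selects between the LUT output and the flip-flop output.

```python
from dataclasses import dataclass


@dataclass
class BLE:
    """Behavioural sketch of a K-input Basic Logic Element (assumed structure)."""
    k: int             # number of LUT inputs
    lut_bits: int      # 2**k truth-table bits; bit i is the output for input i
    use_ff: bool       # bypass multiplexer select: True = registered output
    state: int = 0     # current flip-flop content

    def lut(self, inputs: int) -> int:
        """Combinational K-LUT lookup."""
        return (self.lut_bits >> (inputs & ((1 << self.k) - 1))) & 1

    def output(self, inputs: int) -> int:
        """Bypass multiplexer: pick the flip-flop or the raw LUT output."""
        return self.state if self.use_ff else self.lut(inputs)

    def clock(self, inputs: int) -> None:
        """Rising clock edge: the flip-flop captures the LUT output."""
        self.state = self.lut(inputs)


# A 4-input AND configured once as combinational and once as registered.
and4_bits = 1 << 0b1111
comb = BLE(k=4, lut_bits=and4_bits, use_ff=False)
reg = BLE(k=4, lut_bits=and4_bits, use_ff=True)
```

A clustered LB would hold several such elements plus a crossbar choosing each BLE input among the cluster inputs and the BLE outputs; that part is left out of the sketch.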


Generic mesh FPGA model

The first approach to generating a mesh-based FPGA netlist is to break the problem into two levels of hierarchy. At the bottom level, the main blocks are defined in the architecture file as shown previously, and the black boxes corresponding to the IOB and LB are written by the designer in a hardware description language such as VHDL. The top netlist is composed only of IOBs, SBs and LBs and is determined automatically by duplicating these building blocks, as shown in figure 2.8.
To simplify the problem for the physical layout, we adopt a second approach, which consists in composing the FPGA as a regular array, as shown in figure 2.8. Each tile of the array has a square area and comprises a logic block core, a switch block, and one set of adjacent horizontal and vertical routing tracks.
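The tile-based composition can be sketched as a generator that replicates one tile description over a regular grid. This is a minimal illustration, not the thesis's generator: the function name build_tile_array, the instance naming scheme, and the neighbour fields are assumptions, and the peripheral IOBs are omitted.

```python
def build_tile_array(width: int, height: int) -> list:
    """Return one tile record per grid position (row-major order)."""
    tiles = []
    for y in range(height):
        for x in range(width):
            tiles.append({
                "name": f"tile_{x}_{y}",
                # Each tile groups a logic block core and a switch block...
                "blocks": [f"lb_{x}_{y}", f"sb_{x}_{y}"],
                # ...with one horizontal and one vertical routing channel.
                "channels": [f"chanx_{x}_{y}", f"chany_{x}_{y}"],
                # Neighbours to abut when stitching the top-level array.
                "east": f"tile_{x + 1}_{y}" if x + 1 < width else None,
                "north": f"tile_{x}_{y + 1}" if y + 1 < height else None,
            })
    return tiles


tiles = build_tile_array(3, 2)
print(len(tiles), tiles[0]["east"], tiles[0]["north"])
```

Because every tile has the same content and footprint, the layout of one tile can be drawn once and abutted, which is the simplification the second approach aims for.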

Robustness of the FPGAs Configuration Memory

If a soft error occurs in a memory element, it is difficult to recover the original data. SRAM hardening can be achieved through redundancy, resistor decoupling, or shielding. Hardening technologies such as CMOS substrate epitaxy, CMOS on insulator substrate [S.Hareland et al., 2001] and resistive or capacitive hardening [W.Wang, 2004] degrade performance. A modified storage cell called the Dual Interlocked Cell (DICE) [T.Calin et al., 1996] avoids those drawbacks and errors, thus achieving upset immunity.

Basic SRAM Cell

Each configuration bit in an SRAM-based FPGA is stored in four transistors forming two cross-coupled inverters. Two additional access transistors control access to the storage cell during read and write operations. This is the typical SRAM cell with six MOSFETs to store one memory bit, shown in figures 3.2 and 3.3.
First, standard SRAM memory is examined in order to evaluate its immunity and its operating limits under SEU impact. To evaluate the maximum current that the SRAM can tolerate, the error duration is fixed and the current amplitude limit is measured. In this work, the upset-induced charge is simulated by a time-dependent current source with a triangular shape. The current waveform depends on two factors: fault duration and current amplitude.
By SPICE simulation, we found that the critical current (i.e., the minimal pulse amplitude that provokes a positive upset) for the standard SRAM cell presented in figure 3.3 depends on environment characteristics and may be:
• 180 µA for a duration of 200 ps, or
• 220 µA for a duration of 100 ps.
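Since the upset is modelled as a triangular current pulse, the charge injected at each operating point quoted above can be estimated as the area of the triangle. The short sketch below does this arithmetic; it assumes that the stated duration is the full base width of the triangle, which the text does not specify, and the function name triangular_pulse_charge is hypothetical.

```python
def triangular_pulse_charge(peak_current_a: float, duration_s: float) -> float:
    """Charge in coulombs under a triangular pulse: half of base times height."""
    return 0.5 * peak_current_a * duration_s


# Operating points from the text: (peak current in uA, duration in ps).
for peak_ua, width_ps in ((180, 200), (220, 100)):
    q = triangular_pulse_charge(peak_ua * 1e-6, width_ps * 1e-12)
    print(f"{peak_ua} uA over {width_ps} ps -> {q * 1e15:.1f} fC")
```

Under this assumption the two points correspond to about 18 fC and 11 fC respectively, so the shorter pulse upsets the cell with less total charge despite its higher peak.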

Table of contents :

Introduction
1 Overview and Synopsis
2 Research Goals and Motivations
3 Thesis Organization
1 Background
1.1 Introduction
1.2 Field Programmable Gate Array
1.3 FPGA structures
1.3.1 Case Study: Altera Stratix III
1.3.2 Case Study: Xilinx Virtex
1.4 Interconnection Networks and FPGA architectures alternatives
1.4.1 Direct Network Topologies
1.4.2 Indirect Network Topologies
1.5 Design Automation for FPGA
1.6 FPGA characteristics and challenges
1.7 Conclusion
2 Automating Layout of Mesh Based FPGA
2.1 Introduction
2.2 Adaptive VLSI CAD Platform
2.3 Circuit Design: Architecture generator
2.3.1 Architecture Modelisation
2.3.2 Generic mesh FPGA model
2.3.3 FPGA Tiles
2.3.4 Programming access
2.4 VLSI Layout generator
2.4.1 Tile Layout
2.4.2 FPGA layout
2.5 Embedded FPGA
2.6 Conclusion
3 Redundant FPGA Core
3.1 Context
3.2 Robustness of the FPGAs Configuration Memory
3.2.1 Basic SRAM Cell
3.2.2 The Dual Interlocked Cell (DICE) structure
3.2.3 Testing the DICE: Error Injection
3.3 Error Detection and Correction
3.3.1 Parity Check Technique
3.3.2 Hamming Code
3.4 Architecture Features
3.4.1 Motivations
3.4.2 REDFPGA architecture overview
3.4.3 SEU detection and correction in REDFPGA
3.5 Tape Out
3.5.1 Simulation
3.5.2 Netlist layout comparison
3.5.3 Electric simulation
3.6 Configuration flow
3.7 Conclusion
4 MFPGA Architecture
4.1 Issues in Reconfigurable Network Design
4.2 Previous Works on hierarchical architectures
4.2.1 Rent’s Rule
4.2.2 Analytical comparison: k-HFPGA and Mesh
4.3 Proposed Architecture
4.3.1 Downward Network
4.3.2 The Upward Network
4.3.3 Connections with the Outside
4.3.4 Interconnect Depopulation
4.4 Rent’s Rule MFPGA based model
4.4.1 Wires growth in MFPGA Rent model
4.4.2 Switch growth in Rent MFPGA model
4.4.3 Analysis and comparison with Mesh Model
4.5 Architecture exploration methodologies
4.5.1 Experimental platform for MFPGA
4.5.2 Area Model
4.5.3 Mesh-based candidate architecture
4.5.4 Benchmark circuits
4.6 Experimental Results
4.6.1 Architecture optimization
4.6.2 Area Efficiency
4.6.3 Clusters Arity Effect
4.6.4 LUT Size Effect
4.7 Conclusion
5 Physical Planning of the Tree-Based MFPGA
5.1 Challenge for MFPGA layout design
5.2 MFPGA Wiring requirement
5.3 Problem Formulation
5.4 Network Floorplan
5.5 MFPGA Full Custom Layout
5.5.1 4-LUT based logic block
5.5.2 Programmable interconnect
5.5.3 Physical placement and routing
5.5.4 Configuration Storage and Distribution
5.6 Timing analysis
5.6.1 Delay Model
5.6.2 Critical path extraction and speed performances
5.6.3 Speed performances
5.7 The area gap between MFPGA and ASIC
5.8 Conclusion
Conclusion
1 Summary of contributions
1.1 Automating layout generation of specific Mesh-based FPGA
1.2 Multilevel Hierarchical FPGA architectures
2 Future work
2.1 Tree-based MFPGA architecture improvements
2.2 Delay and power models
2.3 Large tree-based FPGA
Bibliography