Automating layout generation of specific Mesh-based FPGA 


Case Study: Altera Stratix III

Based on a 65 nm all-layer copper SRAM process, the Stratix-III FPGA family was developed by Altera Corporation [Stratix-III, 2008]; it offers two family variants, optimized to meet different application needs:
• The Stratix-IIIL family provides balanced logic, memory, and multiplier ratios for mainstream applications.
• The Stratix-IIIE family is rich in memory and multipliers, targeting data-centric applications such as wireless, medical imaging, and military applications.
The Stratix-III FPGA is an island-style device containing logic array blocks (LABs), embedded memories, embedded DSP and multiplier blocks, multi-function I/O elements (IOEs), and up to 12 phase-locked loops (PLLs). All these functional elements are connected to each other by a two-dimensional multi-track interconnect fabric.

Logic Array Blocks and Adaptive Logic Modules

The Logic Array Block (LAB) shown in Figure 1.4 is composed of basic building blocks known as Adaptive Logic Modules (ALMs), which can be configured to implement logic, arithmetic, and register functions.
Each LAB consists of ten ALMs, carry chains, shared arithmetic chains, LAB control signals, local interconnect, and register chain connection lines. The Stratix-III LAB has a derivative called the Memory LAB (MLAB), which adds SRAM memory capability to the LAB. The MLAB is a superset of the LAB and includes all LAB features. MLABs support a maximum of 320 bits of simple dual-port Static Random Access Memory (SRAM). Each ALM in an MLAB can be configured as a 16×2 memory block, so the ten ALMs together form a 16×20 simple dual-port SRAM block. MLAB and LAB blocks always co-exist as pairs in all Stratix-III families, allowing up to 50% of the logic (LABs) to be traded for memory (MLABs).
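As a quick sanity check of these numbers, the short Python sketch below (an illustration only, not Altera tooling; the constant names are my own) derives the 16×20 organisation and the 320-bit total from the per-ALM 16×2 slice and the ten ALMs per LAB.

# Illustrative sketch: derive the MLAB capacity quoted above from the
# per-ALM memory slice.  Constant names are hypothetical, not Altera's.
ALMS_PER_LAB = 10       # ten ALMs per LAB (Stratix-III)
SLICE_DEPTH = 16        # each ALM slice stores 16 words...
SLICE_WIDTH = 2         # ...of 2 bits each (a 16x2 block)

mlab_depth = SLICE_DEPTH                  # 16 words
mlab_width = SLICE_WIDTH * ALMS_PER_LAB   # 20 bits per word
mlab_bits = mlab_depth * mlab_width       # 320 bits in total

print(f"MLAB organisation: {mlab_depth}x{mlab_width} = {mlab_bits} bits")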

Interconnection Networks and FPGA architecture alternatives

Due to various performance requirements and cost metrics, many network topologies have been designed for specific applications. Originally developed for telephony, interconnection networks have since been adopted in computer systems, parallel computing, and graph theory. It is inspiring and informative to look back on the multiplicity of interconnection networks.
Networks can be classified into direct networks and indirect networks [J.Duato et al., 1997]. In a direct network, terminal nodes are connected directly to one another by point-to-point links. In an indirect network, terminal nodes are connected through one (or more) intermediate switches, and these switching nodes perform the routing. Indirect networks are therefore also often referred to as multistage interconnection networks.
Direct and indirect networks can have different topologies [J.Duato et al., 1997]. It is not the objective of this chapter to discuss the functionalities and performance metrics of these different networks; rather, we give only a brief description of some well-known network topologies. We use these topologies as examples to formulate the FPGA network problems in later chapters.

Direct Network Topologies

Orthogonal Topology:

Nodes in orthogonal networks are connected in k-ary n-dimensional mesh (k-ary n-mesh) or k-ary n-dimensional torus (k-ary n-cube) formations, as shown in Figure 1.11. Thanks to the simple connections and easy routing provided by adjacency, mesh networks are widely used in most FPGAs, as described in this chapter. Orthogonal networks are highly regular; the interconnect length between nodes is therefore expected to be uniform, which ensures uniform performance across logic nodes.
We can use only local connections within the array, between adjacent or nearby array elements. The bisection bandwidth in a mesh topology with n elements is O(√n) and hence never dominates the logical array element size. However, communicating a piece of data between two points in the array requires a switching delay proportional to the Manhattan distance between the source and the destination, which is O(√n) in the worst case. This makes distant communication slow and expensive and can make the interconnect delay quite high, easily dominating the delay through the logic element.
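To make the two O(√n) figures concrete, here is a minimal Python sketch (my own illustration, assuming a square 2-D mesh; the function names are hypothetical) that returns the bisection link count and the worst-case Manhattan hop count for an n-element array.

import math

def mesh_bisection_links(n):
    """Links cut when bisecting a square 2-D mesh of n elements:
    one per row, i.e. sqrt(n) links -> O(sqrt(n))."""
    return math.isqrt(n)

def manhattan_hops(src, dst):
    """Hop count between two array positions; the switching delay grows
    with this Manhattan distance, up to O(sqrt(n)) corner to corner."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])

n = 1024                                   # a 32x32 array
print(mesh_bisection_links(n))             # 32 links across the bisection
print(manhattan_hops((0, 0), (31, 31)))    # 62 hops, the worst case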

Cube-Connected-Cycles Topology:

The cube-connected-cycles (CCC) topology was proposed as an alternative to orthogonal topologies in order to reduce the degree of each node [F.P.Preparata and J.Vuillemin, 1981], as shown in Figure 1.12. Each node has a connectivity degree of 3, compared to 2n for a node of an n-dimensional mesh or torus. CCC networks have a hierarchical structure: the 3 nodes at each corner of the cube form a local ring.
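The following Python sketch (an illustration of the topology, not code from any referenced work) builds CCC(3), connecting the 3-node ring at each corner of a 3-dimensional cube plus one cube link per node, and then checks that every node ends up with degree 3.

def cube_connected_cycles(d=3):
    """Build CCC(d): each corner of a d-dimensional hypercube is replaced
    by a ring of d nodes; node (corner, i) also links to the corner that
    differs in bit i.  Illustrative sketch only."""
    nodes = [(corner, i) for corner in range(2 ** d) for i in range(d)]
    edges = set()
    for corner, i in nodes:
        # ring (local cycle) edge inside the corner
        edges.add(frozenset({(corner, i), (corner, (i + 1) % d)}))
        # cube edge along dimension i
        edges.add(frozenset({(corner, i), (corner ^ (1 << i), i)}))
    return nodes, edges

nodes, edges = cube_connected_cycles(3)
degree = {v: 0 for v in nodes}
for edge in edges:
    for v in edge:
        degree[v] += 1
print(sorted(set(degree.values())))   # [3]: every node has exactly degree 3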


FPGA characteristics and challenges

An overall view of conventional reconfigurable devices shows that 80-90% of the area is dedicated to the reconfigurable interconnect, that is, to the switches, wires, and configuration memory reserved for routing. The remaining area is dedicated to reconfigurable logic blocks.
To illustrate the magnitude of this problem, an area profile is given in Table 1.1 [G.Lemieux and D.Lewis, 2004]. From this table, it can be seen that FPGA area profiles devote approximately 80-90% of the area to routing and only about 10% to logic. The inability to meet timing, power consumption, or logic capacity constraints makes it technically infeasible to use FPGAs in the most demanding applications. Despite their design-cost advantage, FPGAs impose large area overheads compared to custom silicon alternatives: the interconnect and configuration overhead is responsible for the roughly 40x density disadvantage incurred by reconfigurable devices relative to hardwired logic [I.Kuon and J.Rose, 2007].
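The back-of-envelope Python sketch below illustrates this kind of area profile; the relative areas are placeholder values chosen to be consistent with the 80-90% figure quoted above, not the numbers of Table 1.1.

# Hypothetical per-tile area breakdown (arbitrary relative units, NOT the
# values of Table 1.1) used to illustrate the routing/logic imbalance.
tile_area = {
    "routing switches": 45,
    "routing wires": 25,
    "configuration memory": 18,
    "logic block": 12,
}

total = sum(tile_area.values())
interconnect = total - tile_area["logic block"]
print(f"interconnect share: {100 * interconnect / total:.0f}%")              # ~88%
print(f"logic share:        {100 * tile_area['logic block'] / total:.0f}%")  # ~12%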

Table of contents:

Introduction
1 Overview and Synopsis
2 Research Goals and Motivations
3 Thesis Organization
1 Background 
1.1 Introduction
1.2 Field Programmable Gate Array
1.3 FPGA structures
1.3.1 Case Study: Altera Stratix III
1.3.2 Case Study: Xilinx Virtex
1.4 Interconnection Networks and FPGA architecture alternatives
1.4.1 Direct Network Topologies
1.4.2 Indirect Network Topologies
1.5 Design Automation for FPGA
1.6 FPGA characteristics and challenges
1.7 Conclusion
2 Automating Layout of Mesh Based FPGA 
2.1 Introduction
2.2 Adaptive VLSI CAD Platform
2.3 Circuit Design: Architecture generator
2.3.1 Architecture Modelling
2.3.2 Generic mesh FPGA model
2.3.3 FPGA Tiles
2.3.4 Programming access
2.4 VLSI Layout generator
2.4.1 Tile Layout
2.4.2 FPGA layout
2.5 Embedded FPGA
2.6 Conclusion
3 Redundant FPGA Core 
3.1 Context
3.2 Robustness of the FPGAs Configuration Memory
3.2.1 Basic SRAM Cell
3.2.2 The Dual Interlocked CEll (DICE) structure
3.2.3 Testing the DICE: Error Injection
3.3 Error Detection and Correction
3.3.1 Parity Check Technique
3.3.2 Hamming Code
3.4 Architecture Features
3.4.1 Motivations
3.4.2 REDFPGA architecture overview
3.4.3 SEU detection and correction in REDFPGA
3.5 Tape Out
3.5.1 Simulation
3.5.2 Netlist layout comparison
3.5.3 Electrical simulation
3.6 Configuration flow
3.7 Conclusion
4 MFPGA Architecture 
4.1 Issues in Reconfigurable Network Design
4.2 Previous Works on hierarchical architectures
4.2.1 Rent’s Rule
4.2.2 Analytical comparison: k-HFPGA and Mesh
4.3 Proposed Architecture
4.3.1 Downward Network
4.3.2 The Upward Network
4.3.3 Connections with the Outside
4.3.4 Interconnect Depopulation
4.4 Rent’s Rule MFPGA based model
4.4.1 Wires growth in MFPGA Rent model
4.4.2 Switch growth in Rent MFPGA model
4.4.3 Analysis and comparison with Mesh Model
4.5 Architecture exploration methodologies
4.5.1 Experimental platform for MFPGA
4.5.2 Area Model
4.5.3 Mesh-based candidate architecture
4.5.4 Benchmark circuits
4.6 Experimental Results
4.6.1 Architecture optimization
4.6.2 Area Efficiency
4.6.3 Clusters Arity Effect
4.6.4 LUT Size Effect
4.7 Conclusion
5 Physical Planning of the Tree-Based MFPGA 
5.1 Challenge for MFPGA layout design
5.2 MFPGA Wiring requirement
5.3 Problem Formulation
5.4 Network Floorplan
5.5 MFPGA Full Custom Layout
5.5.1 4-LUT based logic block
5.5.2 Programmable interconnect
5.5.3 Physical placement and routing
5.5.4 Configuration Storage and Distribution
5.6 Timing analysis
5.6.1 Delay Model
5.6.2 Critical path extraction and speed performances
5.6.3 Speed performances
5.7 The area gap between MFPGA and ASIC
5.8 Conclusion
Conclusion
1 Summary of contributions
1.1 Automating layout generation of specific Mesh-based FPGA
1.2 Multilevel Hierarchical FPGA architectures
2 Future work
2.1 Tree-based MFPGA architecture improvements
2.2 Delay and power models
2.3 Large tree-based FPGA
List of Publications
Bibliography 
