Network-on-Chips architectures for network monitoring

General overview of the proposed design flow

For bitstream relocation to become a viable technique, it needs an automated design flow that makes it transparent to the user. Our design flow aims to be user friendly: no prior knowledge of relocation is needed, and it requires even fewer skills than a traditional DPR design flow. It is also worth noting that once all inputs have been provided and the configuration script has been run, no user intervention is required: a single build command based on standard make is enough to produce all the outputs needed to run the design on the target.

Inputs

To use our design flow, a designer has to provide several specific files. An RTL description of the static part is required, in which all reconfigurable instances must be instantiated as black boxes. To prevent the synthesis tool from trimming these instances, the following attribute has to be specified (in VHDL in this example):
attribute box_type of reconfigurable_module : component is "black_box";
To be able to constrain the locations of the interfaces with the relocatable regions, this black_box attribute must also be applied to the added_LUTs component (see section 3.2.2), together with a lock_pins attribute:
attribute lock_pins of added_luts_8_v7 : component is "true";
An RTL description of every dynamic module also has to be provided. The port list of that description must be identical to that of the reconfigurable component previously declared as a black box in the static part. As a consequence, all reconfigurable modules must have the same ports: if a reconfigurable module requires fewer inputs (or outputs) than another one, the unused ones still have to be declared.
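The identical-port-list requirement lends itself to a quick check before the flow is launched. The following is a minimal sketch, not part of the design flow itself: the port representation (name, direction, width) and the module names are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# A port is modelled as (name, direction, width). This representation is an
# assumption made for the example, not something the design flow defines.
Port = Tuple[str, str, int]


def mismatched_modules(component_ports: List[Port],
                       modules: Dict[str, List[Port]]) -> List[str]:
    """Return the names of modules whose port list differs from the black box."""
    reference = sorted(component_ports)
    return sorted(name for name, ports in modules.items()
                  if sorted(ports) != reference)


# Hypothetical black-box declaration and two hypothetical dynamic modules.
black_box = [("clk", "in", 1), ("data_in", "in", 8), ("data_out", "out", 8)]
modules = {
    # Declares every port of the black box.
    "filter_a": [("clk", "in", 1), ("data_in", "in", 8), ("data_out", "out", 8)],
    # Needs no input data, but omits the port instead of declaring it unused.
    "counter_b": [("clk", "in", 1), ("data_out", "out", 8)],
}
print(mismatched_modules(black_box, modules))  # ['counter_b']
```

Here counter_b is reported because it drops data_in; declaring the port and leaving it unconnected inside the module would make it pass.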
A User Constraints File (UCF) is required. It contains only the same information as in a traditional non-reconfigurable design flow, i.e. physical-to-logical port bindings and timing constraints. It is important to note that, while UCFs usually allow the designer to use location constraints to fix instances of the design to specific locations, doing so is strongly discouraged with our design flow: our floorplanning algorithm (see section 3.3.1) adds location constraints of its own without taking the existing ones into consideration, potentially leading to conflicts. It is also worth noting that, while traditional DPR design flows require the designer to provide floorplanning information in the constraint file, this is not required with our design flow, as the floorplanning is automated.
Finally, a configuration file has to be filled in with several parameters. Those parameters include:
• the name of the top module of the static part;
• the names of the reconfigurable components and their instances as stated in the static part;
• the number of relocatable regions in the design;
• the name of the UCF of the design;
• the part, package and speed grade of the targeted FPGA;
• the clock period of the design (as explained in section 3.3.1, only single clock designs are supported for now);
• the name of the hard macro used at the static/dynamic interfaces, as well as its required number for both inputs and outputs;
• finally, the list of every reconfigurable module on which a relocation will have to be performed.
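The parameters listed above can be pictured as a single record with a few sanity checks. This is only an illustrative sketch: the actual format of the configuration file is not described here, so every field name, the example values and the checks themselves are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FlowConfig:
    """Hypothetical in-memory view of the configuration file parameters."""
    top_module: str
    reconfigurable_components: Dict[str, List[str]]  # component -> instance names
    relocatable_regions: int
    ucf_file: str
    part: str
    package: str
    speed_grade: str
    clock_period_ns: float       # single-clock designs only
    interface_macro: str         # hard macro used at static/dynamic interfaces
    macro_inputs: int
    macro_outputs: int
    relocated_modules: List[str]

    def problems(self) -> List[str]:
        """Return human-readable problems; an empty list means no issue found."""
        issues = []
        if not self.top_module:
            issues.append("top module name is empty")
        if self.relocatable_regions < 1:
            issues.append("at least one relocatable region is needed")
        if self.clock_period_ns <= 0:
            issues.append("clock period must be positive")
        if self.macro_inputs < 0 or self.macro_outputs < 0:
            issues.append("macro counts cannot be negative")
        if not self.relocated_modules:
            issues.append("no module listed for relocation")
        return issues


# Example values are placeholders chosen for illustration only.
config = FlowConfig(
    top_module="top",
    reconfigurable_components={"reconfigurable_module": ["inst_0", "inst_1"]},
    relocatable_regions=2,
    ucf_file="top.ucf",
    part="xc7vx485t",
    package="ffg1761",
    speed_grade="-2",
    clock_period_ns=10.0,
    interface_macro="added_luts_8_v7",
    macro_inputs=1,
    macro_outputs=1,
    relocated_modules=["filter_a", "counter_b"],
)
print(config.problems())  # []
```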

Automated design flow

Once all inputs have been provided, a configuration script has to be run in order to set every parameter in all other scripts according to the ones given by the designer in the configuration file. Then, a simple make command is enough to launch the design flow. The general overview of the design flow can be found in figure 3.1. In that figure, the steps that are integrated in the traditional Xilinx DPR flow are highlighted in blue. The steps that are available in the literature and that we have scripted are highlighted in green with dashed lines. Finally, the steps that have not been addressed before are highlighted in red with thick lines.
First, the static and dynamic parts are synthesized separately, similarly to a usual DPR flow. In our case, the tool we use for that step is Xilinx Synthesis Technology (XST) [4].
A floorplanning algorithm then uses the information found both in the synthesis reports of the dynamic modules and in the configuration file to find a valid floorplan for the design. The main goal of a floorplanning algorithm is to find a suitable placement of the reconfigurable regions on the targeted fabric. Although many algorithms exist for floorplanning traditional reconfigurable designs (see [16, 53, 54, 55, 45, 18, 60, 47, 59]), two specificities of relocatable designs prevent us from using them for relocation. First, these algorithms are not constrained to identify identical regions, which means that their output is unlikely to be suitable for relocation. Second, most of these floorplanning algorithms tend to limit the fragmentation of the static part in order to reduce resource waste. This leads to floorplans where reconfigurable regions are located right next to each other. However, as relocation forces us to apply the PRIVATE constraint to relocatable regions (see below), the static part cannot use any resource inside them. This can be problematic when regions are adjacent, as it can cause severe congestion problems for the static part if some of its elements have to be placed between the relocatable regions. Thus, we decided to develop a new floorplanning algorithm specifically designed for bitstream relocation. More details about that algorithm can be found in section 3.3.1.

New algorithms and techniques for previously missing steps

While existing work allows relocation to be performed, some tasks have not yet been addressed in terms of automation. This means that, while designers can use this technique, it is still quite tedious and requires considerable effort, especially since its most difficult part, i.e. the floorplanning, has still not been automated.

Floorplanning algorithm dedicated to bitstream relocation

As no algorithm is currently available for floorplanning dedicated to bitstream relocation, we had to develop a new one. As all regions must be identical, we can divide our algorithm into two steps: pattern choice (i.e. shape and resource arrangement), and region selection (i.e. selecting which occurrences of the chosen pattern will be used as relocatable regions).
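To make the two steps more concrete, here is a deliberately simplified, one-dimensional sketch. It is not the algorithm developed in this work: it assumes the fabric can be written as a string of column types along one row of clock regions ('C' for CLB, 'B' for BRAM, 'D' for DSP), that the pattern is already chosen, and that occurrences are kept greedily with a gap of free columns between them so that the static part is not squeezed between adjacent regions. All names and the encoding are hypothetical.

```python
from typing import List


def find_occurrences(fabric: str, pattern: str) -> List[int]:
    """Return every column index where `pattern` starts, overlaps included."""
    width = len(pattern)
    return [i for i in range(len(fabric) - width + 1)
            if fabric[i:i + width] == pattern]


def select_regions(starts: List[int], width: int, count: int,
                   gap: int = 1) -> List[int]:
    """Greedily keep `count` occurrences separated by at least `gap` columns."""
    chosen: List[int] = []
    for start in starts:
        if not chosen or start >= chosen[-1] + width + gap:
            chosen.append(start)
        if len(chosen) == count:
            return chosen
    raise ValueError("not enough non-adjacent occurrences of the pattern")


# Toy fabric row and pattern (illustrative values only).
fabric = "CCBCCDCCBCCCCBCC"
pattern = "CCBC"

starts = find_occurrences(fabric, pattern)
regions = select_regions(starts, len(pattern), count=2)
print(starts)   # [0, 6, 11]
print(regions)  # [0, 6]
```

Because every selected occurrence has exactly the same column arrangement, a bitstream built for one of them targets the same resource layout in the others, which is the property the pattern step is meant to guarantee.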
General pattern requirements. In order for a pattern to allow bitstream relocation, it has to fulfill several requirements due to limitations of current FPGA fabrics. On Xilinx FPGAs, the reconfigurable fabric is divided into clock regions. Clock regions are groups of resources that are all connected to the same dedicated clock network. This means that all the synchronous resources located in the same clock region have to share the same clock. This is also the reason why our design flow only supports single-clock designs: our floorplanning algorithm does not yet allow specific clock regions to be excluded from hosting relocatable regions, which prevents the designer from having any control over clock domains. It is also important to note that every clock region has the same height.
On these FPGAs, the smallest reconfigurable element is called a frame, which is one clock region high (see figure 3.4). A frame actually corresponds to a single word in the configuration SRAM. Since frames are the smallest reconfigurable elements, if a part of a frame is located in a reconfigurable region, the whole frame has to be reconfigured (see figure 3.5). In traditional reconfigurable designs, including only parts of frames in a reconfigurable region is not a problem, since the part of the frame that does not belong to the region is rewritten with the configuration it already had (assuming the reconfiguration is glitch-free, and that the involved LUTs are not used in a carry state). However, in the case of a relocation, the part of the frame that does not belong to the relocatable region would be rewritten with a different configuration, likely causing malfunctions in the design. By forcing each frame to belong entirely either to the static part or to reconfigurable regions, we can ensure that relocation does not alter the state of static resources. As a result, relocation-oriented design flows should only consider regions that span entire clock regions.
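The whole-clock-region rule reduces to simple arithmetic on row indices. The sketch below assumes rows are numbered from 0 at the bottom of the fabric and that every clock region is region_height rows tall; the height of 20 used in the example is a placeholder, and the function names are hypothetical.

```python
from typing import Tuple


def spans_whole_clock_regions(first_row: int, last_row: int,
                              region_height: int) -> bool:
    """True if rows [first_row, last_row] cover only complete clock regions."""
    return first_row % region_height == 0 and (last_row + 1) % region_height == 0


def snap_to_clock_regions(first_row: int, last_row: int,
                          region_height: int) -> Tuple[int, int]:
    """Grow the row range to the enclosing clock-region boundaries."""
    snapped_first = (first_row // region_height) * region_height
    snapped_last = (last_row // region_height + 1) * region_height - 1
    return snapped_first, snapped_last


REGION_HEIGHT = 20  # placeholder height, in rows, of one clock region

# A region covering rows 25..55 cuts through frames at both ends.
print(spans_whole_clock_regions(25, 55, REGION_HEIGHT))  # False
aligned = snap_to_clock_regions(25, 55, REGION_HEIGHT)
print(aligned)                                           # (20, 59)
print(spans_whole_clock_regions(*aligned, REGION_HEIGHT))  # True
```

After snapping, no frame is shared between the relocatable region and the static part, so writing the region's frames cannot change static resources.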

Table of contents:

A Abstract
B Résumé
1 Introduction
1.1 Context
1.2 Objectives
1.3 Thesis structure
2 Background and state of the art on flexible designs on FPGAs
2.1 Network monitoring
2.2 Reconfigurable designs on FPGAs
2.2.1 Structure of FPGAs
2.2.2 Typical design flow for FPGAs
2.2.3 Dynamic Partial Reconfiguration
2.3 Bitstream relocation on Xilinx FPGAs
2.4 Network-On-Chips
2.4.1 Topologies
2.4.2 Routing algorithms
2.5 Conclusion
3 Automated design flow for bitstream relocation on Xilinx FPGAs
3.1 Motivation
3.2 General overview of the proposed design flow
3.2.1 Inputs
3.2.2 Automated design flow
3.3 New algorithms and techniques for previously missing steps
3.3.1 Floorplanning algorithm dedicated to bitstream relocation
3.3.2 New timing constraining technique
3.4 Tests and implementation status
3.4.1 Floorplanning results
3.4.2 Layout description and supported targets
3.5 Conclusion
4 Using Network-on-Chips for network monitoring
4.1 Choosing the NoC characteristics and dimensions
4.1.1 Overview of the project board
4.1.2 NoC characteristics
4.1.3 Generation tool
4.1.4 Dimensioning the NoC
4.2 Protocol overlay
4.2.1 Motivation
4.2.2 Providing the sequence of treatments to packets
4.2.3 Parameterization information
4.2.4 Multipackets
4.3 NoC/Functional units interface
4.3.1 Interface overview
4.3.2 Buffers
4.3.3 Acquisition module
4.3.4 Unit management module
4.3.5 Release module
4.4 Conclusion
5 Test case: traffic generator
5.1 Model presentation
5.2 Design presentation
5.3 Implementation
5.3.1 Overview
5.3.2 Senders
5.3.3 Outputs
5.4 Results
5.4.1 Static version
5.4.2 Issues with relocation
5.4.3 Validation of relocation on other designs
5.5 Conclusion
6 Conclusion
6.1 Automated design flow for bitstream relocation
6.2 Network-on-Chips architectures for network monitoring
6.3 Traffic generator test case
6.4 Perspectives and future work
Glossary
Bibliography