Hardware monitoring applied to traffic classification


Boards for traffic monitoring

An FPGA is just a computation unit. It receives binary data, processes it, and generates binary data. It has a large number of input and output wires. This is why it is usually integrated on a board, which connects the FPGA to other electronic components.
Some boards are specialized for network applications. They mainly connect the FPGA to network interfaces and memories. Most of these boards are designed to be embedded into a computer, and are able to communicate with the CPU through a PCIe port. Among these boards are:
The NetFPGA 1G [LMW+07] embeds a Xilinx Virtex-II Pro 50 FPGA, 4×1 Gb/s Ethernet interfaces, 4.5 MB of SRAM and 64 MB of DRAM.
The NetFPGA 10G [Net12] embeds a Xilinx Virtex 5 TX240T FPGA, 4×10 Gb/s Ethernet interfaces, 27 MB of SRAM and 288 MB of DRAM.
The Combo 20G [IT13] embeds a Xilinx Virtex 5 LX155T FPGA, 2×10 Gb/s Ethernet interfaces and 144 MB of SRAM.
The NetFPGA 10G costs $1,675 for academics. The only other hardware required to use it is an ordinary computer that communicates with the board and can reconfigure it. The most obvious difference between these three boards is the speed of their interfaces, but the FPGA models differ too. Different models have different amounts of logic resources (LUTs, registers, inputs and outputs, memory and DSP units). The most powerful FPGA is on the NetFPGA 10G, and the least powerful on the NetFPGA 1G. The available on-board memory varies as well. Much larger FPGA models exist, but they were not chosen for these boards in order to keep costs low.
The data rate supported by a board depends only on the speed of its interfaces. Network interfaces are connected to the FPGA very directly. For example, on the NetFPGA 10G, each 10 Gb/s optical link is connected to an enhanced Small Form-factor Pluggable (SFP+) module, which converts the optical signal into an electrical signal. This signal then goes through an electronic dispersion compensation equalizer, which cleans it, and reaches four RocketIO GTX transceivers, which are special transceivers integrated directly into the Xilinx Virtex 5 FPGA. Inside the FPGA, each interface is thus seen as two parallel buses that run at the FPGA clock frequency and support the 10 Gb/s rate: one bus receives data and the other sends data. The bus capacity is actually higher than 10 Gb/s, which leaves room to add internal headers to each packet in order to carry information between the different entities implemented on the FPGA. The architecture of the NetFPGA 10G board is described in more detail in [BEM+10]. To support the maximum rate, an application only has to avoid slowing down the buses. This also benefits the reliability of the platforms: the boards are designed to sustain the maximum data rate of their interfaces even with the smallest packets, and application developers control every factor that may slow down processing, since they decide how each electronic component is used.
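The phrase "the smallest packets" can be made concrete with standard Ethernet framing figures: a minimum-size 64-byte frame occupies 84 bytes on the wire once the 8-byte preamble and start-of-frame delimiter and the 12-byte minimum inter-frame gap are counted. The Python sketch below derives the worst-case packet rate of a 10 Gb/s link and the number of clock cycles an entity has to handle each packet. The 160 MHz clock and the 64-bit bus width are illustrative assumptions, not figures taken from the board documentation.

```python
# Standard Ethernet framing overheads (IEEE 802.3), in bytes.
MIN_FRAME = 64        # minimum frame size, including the 4-byte FCS
PREAMBLE_SFD = 8      # preamble + start-of-frame delimiter
INTER_FRAME_GAP = 12  # minimum inter-frame gap


def worst_case_packet_rate(line_rate_bps: float) -> float:
    """Packets per second when the link is saturated with minimum-size frames."""
    bits_on_wire = (MIN_FRAME + PREAMBLE_SFD + INTER_FRAME_GAP) * 8
    return line_rate_bps / bits_on_wire


def cycles_per_packet(line_rate_bps: float, clock_hz: float) -> float:
    """Clock cycles available to process each minimum-size packet."""
    return clock_hz / worst_case_packet_rate(line_rate_bps)


def bus_capacity_bps(bus_width_bits: int, clock_hz: float) -> float:
    """Raw throughput of an internal bus that transfers one word per cycle."""
    return bus_width_bits * clock_hz


if __name__ == "__main__":
    LINE_RATE = 10e9   # one 10 Gb/s interface
    CLOCK = 160e6      # assumed FPGA clock frequency (illustrative)
    BUS_WIDTH = 64     # assumed internal bus width in bits (illustrative)
    print(f"{worst_case_packet_rate(LINE_RATE) / 1e6:.2f} Mpps")
    print(f"{cycles_per_packet(LINE_RATE, CLOCK):.2f} cycles per packet")
    print(f"bus capacity: {bus_capacity_bps(BUS_WIDTH, CLOCK) / 1e9:.2f} Gb/s")
```

Under these assumptions, a saturated 10 Gb/s link carries about 14.88 million packets per second, so an entity that cannot accept a new packet roughly every 10.75 cycles would slow down the bus. The assumed bus capacity of 10.24 Gb/s also shows how an internal bus can run faster than the line rate.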
To guarantee the future scalability of applications developed on these boards, different solutions are possible:
Use FPGA boards with faster interfaces: for example, a Combo 100G board is in development [Pus12]. It will have one 100 Gb/s Ethernet interface.
Reuse the FPGA development to design an Application-Specific Integrated Circuit (ASIC). These specialized circuits are built from basic logic and arithmetic gates to perform a specific function. Unlike FPGAs, they cannot be reconfigured after manufacturing. This means that all the reconfiguration logic that takes up a lot of space on an FPGA can be removed, and that the basic logic gates can be optimized. ASICs therefore reach higher speeds and are more compact, which would help support higher data rates. An FPGA design is usually used as the starting point for an ASIC design. ASICs cost more to design, but if many units are sold, each unit costs less than an FPGA.
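The cost argument for ASICs can be expressed as a break-even volume. In a simplified model where designing the ASIC has a one-time (non-recurring engineering, NRE) cost and each chip then costs c_asic, the average cost per chip for N units is NRE / N + c_asic, which drops below the unit price c_fpga of an FPGA once N > NRE / (c_fpga - c_asic). The sketch below encodes only this model; every figure in the usage example is hypothetical.

```python
def asic_average_cost(nre: float, asic_unit_cost: float, units: int) -> float:
    """Average cost per ASIC once the one-time design (NRE) cost is spread over all units."""
    return nre / units + asic_unit_cost


def break_even_volume(nre: float, asic_unit_cost: float, fpga_unit_cost: float) -> float:
    """Number of units above which an ASIC is cheaper per unit than an FPGA."""
    if fpga_unit_cost <= asic_unit_cost:
        raise ValueError("the FPGA must cost more per unit than the ASIC for a break-even point to exist")
    return nre / (fpga_unit_cost - asic_unit_cost)


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    NRE, ASIC_UNIT, FPGA_UNIT = 1_000_000.0, 50.0, 550.0
    print(f"break-even volume: {break_even_volume(NRE, ASIC_UNIT, FPGA_UNIT):.0f} units")
    for units in (500, 2_000, 10_000):
        print(units, asic_average_cost(NRE, ASIC_UNIT, units))
```

The model leaves out other factors such as design risk; it only captures why large production volumes favor ASICs.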

Development principles

Development on FPGA is usually based on one of two languages: VHDL or Verilog. Both are hardware description languages. They look similar to classical programming languages, but do not work the same way. The main difference is that statements written one after the other do not execute sequentially but concurrently. A file usually describes one entity: its input and output wires, and the processing performed by this entity. Entities can include other entities as components by connecting to their inputs and outputs.
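The difference between sequential and concurrent execution can be illustrated with a register swap. Inside a clocked VHDL process (or with Verilog nonblocking assignments), the two assignments "a <= b;" and "b <= a;" both read the values the signals held before the clock edge, so the registers exchange their contents. Executed one after the other, as in a software language, the same two lines would leave both variables equal. The Python sketch below models this behavior; it is a simplified model of register updates, not a hardware simulator.

```python
from typing import Callable, Dict

Registers = Dict[str, int]


def clock_edge(regs: Registers, logic: Callable[[Registers], Registers]) -> Registers:
    """Model one rising clock edge: every register's next value is computed
    from the values held before the edge, then all registers update at once."""
    before = dict(regs)  # snapshot read by every assignment
    return {**regs, **logic(before)}


def swap_logic(r: Registers) -> Registers:
    # Concurrent assignments, as in a clocked process:  a <= b;  b <= a;
    return {"a": r["b"], "b": r["a"]}


def sequential_swap(r: Registers) -> Registers:
    # The same two lines executed one after the other, as in software.
    r = dict(r)
    r["a"] = r["b"]
    r["b"] = r["a"]  # reads the value that was just written
    return r


if __name__ == "__main__":
    print(clock_edge({"a": 1, "b": 2}, swap_logic))  # registers exchanged
    print(sequential_swap({"a": 1, "b": 2}))         # both end up equal
```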
Xilinx and Altera both provide full-featured development suites. Each suite works only with its own manufacturer's FPGAs, but both accept either VHDL or Verilog, so code developed for one platform can be reused on the other. The important tools for FPGA development are:
A simulator, which is used to check the proper behavior of an entity described in VHDL or Verilog. The developer only has to write a testbench, in VHDL or Verilog, which manipulates the input values of the entity during the simulation time. The graphical simulation shows everything that happens on each wire inside the entity. Verification can also be automated by checking the output values against predicted ones. This tool is essential to save time and to make sure that no obvious development mistakes were made before testing the entity on the FPGA.
A synthesizer, which transforms an entity written in VHDL or Verilog into a bitfile that can be used to configure an FPGA. This process involves many steps and can take a long time. The code is first transformed into a list of basic operations that are available on the FPGA. These operations are then mapped onto the actual layout of the FPGA, and the tool tries to route all connecting signals on the chip. This is a heavy optimization problem: each bit manipulated by the entity is a wire, and if any wire between two registers is too long, the global clock frequency of the FPGA has to be lowered, which reduces the computation speed.
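The link between wire length and clock frequency can be sketched as follows: the slowest register-to-register path (the critical path) bounds the clock period, so the maximum frequency is the inverse of its delay, and the throughput of a bus is that frequency multiplied by its width. The path names and delays below are hypothetical, and real timing analysis also accounts for setup times and clock skew, which this sketch ignores.

```python
def max_clock_hz(path_delays_ns: dict[str, float]) -> tuple[str, float]:
    """Find the critical (slowest) register-to-register path and the highest
    clock frequency it allows: f_max = 1 / delay of the critical path."""
    critical = max(path_delays_ns, key=lambda path: path_delays_ns[path])
    return critical, 1e9 / path_delays_ns[critical]


def bus_throughput_bps(clock_hz: float, bus_width_bits: int) -> float:
    """Throughput of a bus that moves one word of bus_width_bits per cycle."""
    return clock_hz * bus_width_bits


if __name__ == "__main__":
    # Hypothetical delays after placement and routing, in nanoseconds.
    delays = {"parser->counter": 4.0, "counter->compare": 6.25, "compare->output": 3.5}
    path, f_max = max_clock_hz(delays)
    print(f"critical path {path}: f_max = {f_max / 1e6:.0f} MHz")
    print(f"64-bit bus at f_max: {bus_throughput_bps(f_max, 64) / 1e9:.2f} Gb/s")
    # A longer route on the critical path lowers f_max, and the throughput with it.
    delays["counter->compare"] = 8.0
    path, f_max = max_clock_hz(delays)
    print(f"after a longer route: f_max = {f_max / 1e6:.0f} MHz, "
          f"{bus_throughput_bps(f_max, 64) / 1e9:.2f} Gb/s")
```

In this hypothetical case, a single longer route drops a 64-bit bus from 10.24 Gb/s to 8 Gb/s, below the 10 Gb/s line rate, which is why the routing step matters so much for throughput.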
Once the bitfile is loaded onto the FPGA, its functionality can be tested. But this is a very slow and complicated process. The main problem is that it is difficult to access the internal values of wires inside the FPGA. This is why simulation must be done very thoroughly, to avoid having to debug on the hardware.
Development time on FPGA is longer than on processors because development is very low-level: even simple operations can take a long time to implement. Software developers also need time to get used to the different development paradigms of FPGAs. Debugging can also be very long, especially if the bug is not visible during simulation, because synthesis takes a long time and access to the internal wire values of the FPGA is complicated. A comparison of the development of the same application on FPGA and GPU [CLS+08] shows that FPGA development takes longer.
To support the specific features they offer, NetFPGA and Combo boards come with a development framework. The NetFPGA framework is open-source and free. Some design choices differ between the NetFPGA 1G and the NetFPGA 10G [ASGM13]. The Combo framework, called NetCOPE, is closed-source. It is compatible with all current Combo boards and, more recently, with the NetFPGA 10G [KKZ+11]. Each framework contains the description of the global entity to be used on the board, with the configuration of the connections to all components on the board. This global entity connects to the network interfaces and manages incoming and outgoing traffic. An application developer only has to design an entity that connects to the buses communicating with the network interfaces and respects the specified bus protocol. A simulation environment is also provided to simulate the developed entity in a context as close as possible to the actual board. Platform openness is better for NetFPGA because its framework is fully open-source, and open-source projects using the platform are encouraged and listed on the project website. But the NetCOPE platform is compatible with more boards. Note also that, thanks to the use of VHDL or Verilog on both platforms and to the inherent modularity of hardware description code, migrating from one platform to the other is not very difficult: it essentially requires adapting the input and output bus interfaces. The existence of an adaptation of the NetCOPE platform for the NetFPGA 10G is evidence of the similarity of their designs [KKZ+11].
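The bus protocol itself is specific to each framework and is not detailed here. As an illustration only, the sketch below assumes a generic valid/ready handshake, the style used for instance by AXI4-Stream with its TVALID and TREADY signals, to show how an entity that stalls its input bus reduces throughput regardless of the link speed. The function name and the cyclic ready pattern are hypothetical.

```python
from collections import deque


def stream_transfer(words: list[int], ready_pattern: list[bool]) -> tuple[list[int], int]:
    """Simulate a generic valid/ready streaming handshake, one loop iteration per clock cycle.

    The source keeps 'valid' high while it still has words to send. The sink
    drives 'ready' following ready_pattern, repeated cyclically. A word moves
    only on cycles where valid and ready are both high.
    Returns the received words and the number of cycles used."""
    if not any(ready_pattern):
        raise ValueError("a sink that is never ready would stall the bus forever")
    pending = deque(words)
    received: list[int] = []
    cycle = 0
    while pending:  # valid is high as long as words remain
        ready = ready_pattern[cycle % len(ready_pattern)]
        if ready:   # valid and ready both high: one word is transferred
            received.append(pending.popleft())
        cycle += 1
    return received, cycle


if __name__ == "__main__":
    words = list(range(8))
    print(stream_transfer(words, [True]))         # sink always ready
    print(stream_transfer(words, [True, False]))  # sink ready every other cycle
```

With a sink that is ready only every other cycle, moving 8 words takes 15 cycles instead of 8, so throughput nearly halves: this is what slowing down the bus means for an application entity.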
FPGAs are not very simple to update. As generating the bitfile (the equivalent of an executable for an FPGA) takes a long time, even small changes take time. And as development is more complex, the validation phase should be more thorough. The workflow is described in more detail in [RAMV07]. An FPGA can be reconfigured remotely if its reconfiguration interface is connected to a computer on the network. FPGA implementations can also be made much more flexible: it is even possible to implement a programmable CPU on an FPGA, enabling simple software updates. For example, [LSS+09] implements a specialized CPU for forwarding traffic on a NetFPGA 1G. This is a custom NPU with hardware accelerators specifically designed for the application. Ensuring the security of an application on an FPGA is simpler than on a CPU because less external code runs on it. Any code that does not belong to the application developers belongs to the framework used. It can be open-source code, but there are also closed-source blocks, called Intellectual Properties (IPs). For these parts, developers have to trust the providers.

DDoS detection algorithms

Depending on the technique used, DDoS detection can be deployed at different levels:
At the server level, only one machine is protected. All inbound and outbound traffic of the server is watched, which allows very fine-grained monitoring. Knowing the role of the server, one can list the expected types of communication and describe how each client is expected to behave for each type of communication. If a client behaves unexpectedly, for example by trying to reach unexpected ports or by sending an unusual amount of traffic, an alarm can be raised, as this signals a malfunction or an attack.
An example of such a system is described in [PYR+13]. It is located on a router close to the server it protects. It checks all inbound and outbound packets by maintaining state machines: one state machine for each communication (for example a Secure Shell (SSH) session or a HyperText Transfer Protocol (HTTP) file download). If a communication brings its state machine into an unusual state, an alarm is raised.
This technique is very accurate, but it is difficult for network administrators to manage: each legitimate service provided by each protected server must be listed and precisely analyzed. Protecting a server under attack is also a very heavy task for the router, because it receives all the attack traffic targeted at the server and has to maintain state machines for this traffic. So there is a significant risk that the router itself succumbs to the DDoS attack while trying to protect the server. If this happens, the server becomes unreachable, and the attack has succeeded.
At the access network level, a whole set of machines on the same subnetwork can be protected at once. The main advantage of staying near the network edge is that attacks are easier to detect. Indeed, DDoS attacks come from many locations on the Internet, so a router in the core network sees only a small part of the attack: other packets take different paths to reach their target. Another advantage of the access network is that both inbound and outbound traffic can be watched for each protected machine. As routing in the core network is not necessarily symmetric, there is no guarantee that a core router sees both inbound and outbound packets.
Techniques designed to protect a large number of machines from DDoS attacks monitor traffic more globally than at the server level. This makes these approaches more scalable.
At the core network level, the goal is to detect all transiting attacks, whatever their target. Although attacks are easier to detect in the access network, because they are more focused there, they are also more dangerous and more difficult to handle there. So detecting and mitigating them in the core network is very attractive, because the load is distributed among many routers. Detection at the core network level requires fewer capture points, but each one handles a higher data rate. The detection probes are more complex to design, but mitigation is simpler at this level.
As the amount of traffic transiting in the core network is huge, approaches at this level must be highly scalable. As the attack is distributed, collaboration between core routers located in different places can help detect attacks. It can even be worthwhile for different operators to share data about the attacks they detect. This is called inter-domain collaboration. It can be very powerful because operators could act globally. It is also very challenging, because operators are usually unwilling to share data about their network with potential competitors. They will not share their number of customers or the details of their transiting traffic, but they might be willing to share aggregated data if they have strong enough incentives. One such incentive could be the ability to offer a protection service to their customers.
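The argument that a single core router sees only part of a distributed attack, and the idea of sharing aggregated data, can be illustrated with per-destination packet rates. In the hypothetical scenario below, three routers each see a moderate rate towards the same target, below a local alarm threshold, while the sum of their reports exceeds it. The addresses (taken from documentation ranges), the rates and the threshold are made up for illustration, and real collaborative detection schemes are considerably more elaborate.

```python
from collections import Counter


def aggregate(reports: list[dict[str, int]]) -> Counter:
    """Sum the per-destination packet rates reported by several routers."""
    total: Counter = Counter()
    for report in reports:
        total.update(report)  # Counter.update adds counts instead of replacing them
    return total


def flag_targets(rates: dict[str, int], threshold: int) -> set[str]:
    """Destinations whose packet rate exceeds the alarm threshold."""
    return {dst for dst, rate in rates.items() if rate > threshold}


if __name__ == "__main__":
    THRESHOLD = 100_000  # hypothetical alarm threshold, in packets per second
    reports = [  # hypothetical per-router views of the same time interval
        {"203.0.113.10": 40_000, "198.51.100.7": 5_000},
        {"203.0.113.10": 45_000, "198.51.100.7": 4_000},
        {"203.0.113.10": 38_000, "198.51.100.7": 6_000},
    ]
    for i, report in enumerate(reports):
        print(f"router {i}: {flag_targets(report, THRESHOLD) or 'no alarm'}")
    print("aggregated:", flag_targets(aggregate(reports), THRESHOLD))
```

No router raises an alarm on its own view, but the aggregated rate towards 203.0.113.10 (123,000 packets per second) crosses the threshold. Only per-destination totals are exchanged, not individual packets.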
Depending on the level and on the desired features, two categories of detection mechanisms can be considered [AR12]. The first mechanism consists of detecting attacks using signatures that are known a priori. Past attacks are observed and discriminating features are extracted. These features are called signatures, and when they are seen again in the traffic, an alarm is raised. This technique cannot detect new attacks that have never been observed, but it has a very low false positive rate, because signatures match only very specific behavior known to be an attack.
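A minimal sketch of signature matching, assuming that a signature is a protocol, an optional destination port and a byte pattern that must appear in the payload. Real signature languages, such as those used by intrusion detection systems, are far richer; the signature, the packet fields and the function name below are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Packet:
    protocol: str
    dst_port: int
    payload: bytes


@dataclass(frozen=True)
class Signature:
    name: str
    protocol: str
    dst_port: Optional[int]  # None means "any destination port"
    pattern: bytes           # byte string that must appear in the payload


def matching_signatures(packet: Packet, signatures: list[Signature]) -> list[str]:
    """Names of all known signatures that the packet matches."""
    return [
        sig.name
        for sig in signatures
        if sig.protocol == packet.protocol
        and (sig.dst_port is None or sig.dst_port == packet.dst_port)
        and sig.pattern in packet.payload
    ]


if __name__ == "__main__":
    # Hypothetical signature extracted from a past attack.
    known = [Signature("hypothetical-flood-tool", "udp", 53, b"\xde\xad\xbe\xef")]
    suspicious = Packet("udp", 53, b"\x00\x01\xde\xad\xbe\xef\x02")
    benign = Packet("udp", 53, b"\x00\x01\x02\x03")
    print(matching_signatures(suspicious, known))  # the signature matches: alarm
    print(matching_signatures(benign, known))      # no match, no alarm
```

An attack whose payload differs by a single byte from the stored pattern would no longer match, which illustrates why signature-based detection misses new attacks.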
The second mechanism consists of finding anomalies in the traffic. It learns the usual features of the traffic when there is no attack, and detects when these features vary in an unusual way. This technique usually detects both attacks and network malfunctions. It can detect previously unknown attacks, simply because they do not resemble normal traffic. The drawback is that normal changes in the traffic, like a sudden surge in the popularity of a server, might trigger alerts [LZL+09].
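The anomaly-based approach can be illustrated with the classic one-sided CUSUM (cumulative sum) change detector applied to a per-interval packet count. It is a generic textbook detector chosen for illustration, not necessarily the algorithm of [LZL+09] or of this work; the training data, the allowance k and the threshold h are hypothetical.

```python
from statistics import mean, pstdev


def learn_baseline(normal_counts: list[float]) -> tuple[float, float]:
    """Learn the usual mean and spread of a traffic feature (here, packets per
    interval) from a period known to be free of attacks."""
    return mean(normal_counts), pstdev(normal_counts)


def cusum_alarms(counts: list[float], mu: float, sigma: float,
                 k: float = 0.5, h: float = 5.0) -> list[int]:
    """One-sided CUSUM change detector.

    Accumulates how far each observation exceeds mu + k*sigma (small
    fluctuations are absorbed by the allowance k) and raises an alarm at
    interval t when the cumulative sum exceeds h*sigma."""
    s = 0.0
    alarms: list[int] = []
    for t, x in enumerate(counts):
        s = max(0.0, s + x - mu - k * sigma)
        if s > h * sigma:
            alarms.append(t)
            s = 0.0  # restart accumulation after raising an alarm
    return alarms


if __name__ == "__main__":
    mu, sigma = learn_baseline([100, 102, 98, 101, 99, 100])  # hypothetical training data
    print(cusum_alarms([101, 99, 100, 150, 160, 100], mu, sigma))
```

Here the surge at intervals 3 and 4 raises alarms while ordinary fluctuations do not. Raising h reduces false alarms at the cost of slower detection; like any anomaly detector, this one would also fire on a legitimate surge in popularity, which is the drawback mentioned above.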
As attackers are very inventive and try to evade all detection mechanisms, attacks are rarely repeated in exactly the same way, so signature-based mechanisms are not very effective. This is why current research focuses on anomaly detection.

Table of contents

A Abstract
B Summary
B.1 Introduction
B.2 Choosing a development platform
B.3 Software monitoring for security
B.4 Hardware monitoring for traffic classification
B.5 Hardware-accelerated test platform
B.6 Conclusion
1 Introduction
1.1 Context
1.2 Objectives
1.3 Traffic monitoring
1.3.1 Topology
1.3.2 Time constraints
1.3.3 Traffic features
1.3.4 Detection technique
1.3.5 Calibration
1.4 Acceleration challenges
1.4.1 Large data storage
1.4.2 Test conditions
1.5 Thesis structure
2 Choosing a development platform
2.1 Criteria
2.1.1 Supported data rate
2.1.2 Computation power
2.1.3 Flexibility
2.1.4 Reliability
2.1.5 Security
2.1.6 Platform openness
2.1.7 Development time
2.1.8 Update simplicity
2.1.9 Future scalability
2.1.10 Hardware cost
2.2 Commodity hardware
2.2.1 Handling traffic
2.2.2 CPU computation
2.2.3 GPU computation
2.3 Network processors
2.3.1 Principles
2.3.2 Development platforms
2.3.3 Use cases
2.4 FPGAs
2.4.1 Composition of an FPGA
2.4.2 Boards for traffic monitoring
2.4.3 Development principles
2.5 Conclusion
3 Software monitoring applied to security
3.1 State of the art on DDoS detection implementation
3.1.1 Monitoring platforms
3.1.2 DDoS attacks
3.1.3 DDoS detection algorithms
3.2 Flexible anomaly detection
3.2.1 Problem statement
3.2.2 Algorithm for DDoS detection
3.3 A flexible framework: BlockMon
3.3.1 Principles
3.3.2 Performance mechanisms
3.3.3 Base blocks and compositions
3.4 Implementing DDoS detection in BlockMon
3.4.1 Algorithm libraries
3.4.2 Single-node detector implementation
3.4.3 Alternative compositions
3.5 Results
3.5.1 Accuracy
3.5.2 Performance
3.5.3 Going further
3.6 Conclusion
4 Hardware monitoring applied to traffic classification
4.1 State of the art on traffic classification
4.1.1 Port-based classification
4.1.2 Deep Packet Inspection (DPI)
4.1.3 Statistical classification
4.1.4 Behavioral classification
4.2 Using SVM for traffic classification
4.2.1 Proposed solution
4.2.2 Background on Support Vector Machine (SVM)
4.2.3 Accuracy of the SVM algorithm
4.3 SVM classification implementation
4.3.1 Requirements
4.3.2 The SVM classification algorithm
4.3.3 Parallelism
4.4 Adaptation to hardware
4.4.1 Architecture
4.4.2 Flow reconstruction
4.4.3 The RBF kernel
4.4.4 The CORDIC algorithm
4.4.5 Comparing the two kernels
4.5 Performance of the hardware-accelerated traffic classifier
4.5.1 Synthesis results
4.5.2 Implementation validation
4.6 Conclusion
5 Hardware-accelerated test platform
5.1 State of the art on traffic generation
5.1.1 Traffic models
5.1.2 Commercial generators
5.1.3 Software-based generators
5.1.4 Hardware-accelerated generators
5.2 An open-source FPGA traffic generator
5.2.1 Requirements
5.2.2 Technical constraints
5.2.3 Global specifications
5.3 Software architecture
5.3.1 The configuration interface
5.3.2 The configuration format
5.3.3 The control tool
5.4 Hardware architecture
5.4.1 Main components
5.4.2 Inside the stream generator
5.5 Generator use cases
5.5.1 Design of a new modifier
5.5.2 Synthesis on the FPGA
5.5.3 Performance of the traffic generator
5.6 Conclusion
6 Conclusion
6.1 Main contributions
6.1.1 Development platform
6.1.2 Software monitoring applied to security
6.1.3 Hardware monitoring applied to traffic classification
6.1.4 Hardware-accelerated test platform
6.2 Acceleration solutions comparison
6.3 Perspectives
Glossary
Bibliography