Chapter 3 Method for Securing FPGA Configurations on the Virtex 4 Platform
This chapter outlines the design and implementation of the platform-dependent configuration integrity checking core (PDCIC) presented in . It covers the core's configuration readback controller, hash generator/comparator, and challenge-response components. These components are later extended to support the cross-platform integrity checker presented in this thesis, so this chapter provides insight into the origins of the multi-platform configuration integrity checker's design outlined in Chapter 4.
FPGA Configuration Readback Control
The FPGA configuration data, which must be monitored for malicious modifications, is obtained using active readback from the FPGA's internal configuration access port (ICAP). The ICAP module provided by the FPGA manufacturer (in this case Xilinx) provides an interface to the FPGA's internal status and configuration registers [10,11,14]. To begin active readback of the FPGA's configuration, several commands must be written to a number of different configuration registers. These commands specify the parameters of the data that is to be read back, and correspond to commands 0 through 19 in Table 3.1. Once they are written, the ICAP begins to clock out the internal configuration data of the FPGA, starting at the address written to the frame address register (FAR) in command 17. The ICAP then reads back the number of bytes specified by the write to the FDRO register in command 19. Once all the desired configuration data has been read back, the ICAP is shut down using the commands through 25 outlined in Table 3.1.
A finite state machine (FSM) was designed to write the commands shown in Table 3.1, one byte at a time, to the ICAP's input data port. This FSM is also used to process the readback configuration data that the ICAP produces on its output data port. The layout of the internal registers written to and read from by these commands can be found in . The operation of this FSM is outlined in Figure 3.1, which was described in .
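The three phases of this FSM (write the initiation commands, collect the readback data, write the shutdown commands) can be sketched in software. The following is only an illustrative model, not the hardware design of Figure 3.1: the `FakeIcap` class, the state names, and the command byte values are hypothetical placeholders, and the actual command words are those listed in Table 3.1.

```python
from enum import Enum, auto


class State(Enum):
    WRITE_INIT = auto()      # write the readback initiation commands
    READ_DATA = auto()       # collect configuration bytes from the ICAP
    WRITE_SHUTDOWN = auto()  # write the shutdown commands
    DONE = auto()


class FakeIcap:
    """Hypothetical stand-in for the ICAP's byte-wide input and output ports."""

    def __init__(self, config_bytes):
        self.written = []
        self._config = list(config_bytes)

    def write_byte(self, value):
        self.written.append(value)

    def read_byte(self):
        return self._config.pop(0)


def run_readback(icap, init_cmds, shutdown_cmds, n_bytes):
    """Step through the readback sequence and return the bytes read back."""
    state = State.WRITE_INIT
    data = bytearray()
    while state is not State.DONE:
        if state is State.WRITE_INIT:
            for command in init_cmds:
                for value in command:      # one byte at a time
                    icap.write_byte(value)
            state = State.READ_DATA
        elif state is State.READ_DATA:
            if len(data) < n_bytes:
                data.append(icap.read_byte())
            else:
                state = State.WRITE_SHUTDOWN
        elif state is State.WRITE_SHUTDOWN:
            for command in shutdown_cmds:
                for value in command:
                    icap.write_byte(value)
            state = State.DONE
    return bytes(data)
```

In the real core the read phase would also hand each byte to the hash generator as it arrives; the sketch simply accumulates the bytes so the sequencing is easy to follow.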
Detection of Malicious Configuration Modifications
The strategy employed by the PDCIC for detecting when and where a malicious attack on the FPGA's configuration has occurred involves continuously computing hash values on blocks of configuration data. As these hash values are computed, they are compared against "trusted" hash values for the same blocks of configuration data, which are known to represent the unaltered configuration of the device. Any difference between the two hash values indicates a maliciously altered portion of the FPGA's configuration data. To compute these hash values, a hash function first needed to be selected. This hash function must not only have acceptable latency and resource utilization characteristics, but must also provide excellent resistance against brute force and design-based attacks.
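The comparison of current and trusted block hashes can be illustrated with a short software sketch. It assumes the readback data is available as a byte string and is split into fixed-size blocks; the function names and the block size used in the example are illustrative and are not taken from the PDCIC implementation.

```python
import hashlib


def block_hashes(config, block_size):
    """Return one MD5 digest per fixed-size block of configuration data."""
    return [
        hashlib.md5(config[start:start + block_size]).digest()
        for start in range(0, len(config), block_size)
    ]


def altered_blocks(config, trusted, block_size):
    """Return the indices of blocks whose hash differs from the trusted hash."""
    current = block_hashes(config, block_size)
    return [i for i, (c, t) in enumerate(zip(current, trusted)) if c != t]


# Example: flip one bit in a known-good image and locate the affected block.
golden = bytes(range(256)) * 4
trusted = block_hashes(golden, 128)
tampered = bytearray(golden)
tampered[300] ^= 0x01
print(altered_blocks(bytes(tampered), trusted, 128))
```

The index returned identifies which block was modified, which is the information the hash comparator reports when an alteration is detected.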
MD5 Hashing Algorithm
The hash algorithm selected for the PDCIC is the MD5 hash computation algorithm. The MD5 algorithm takes an input message of arbitrary length, processes it in 512-bit chunks, and produces a fixed-length 128-bit representation of that message. The characteristics one would look for in selecting such an algorithm are speed of computation, low resource requirements, and the level of security provided by the algorithm. For hashing algorithms, there are two main security-related properties that are desirable. The first is the algorithm's "one-wayness". The second, and the most important with respect to the security of the components in the PDCIC, is the ability of the hash function to minimize collisions. A collision is defined as two distinct input messages that produce the same hash value. The MD5 algorithm exhibits both of these properties, and because it also demonstrates sufficient slice utilization and computation latency characteristics, it was chosen for this design. The MD5 module used in the PDCIC was obtained from . Its operation was controlled by an FSM, whose operation is outlined in Figure 3.2, described in .
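The block and digest sizes quoted above can be checked against a software MD5 implementation. The sketch below uses Python's standard `hashlib` module purely as a reference model; it is not the hardware MD5 module used in the PDCIC.

```python
import hashlib

md5 = hashlib.md5()
block_bits = md5.block_size * 8    # size of one internal input block
digest_bits = md5.digest_size * 8  # size of the resulting hash value
print(block_bits, digest_bits)

# Messages of very different lengths still yield digests of the same length.
short_digest = hashlib.md5(b"a").digest()
long_digest = hashlib.md5(b"a" * 10_000).digest()
print(len(short_digest), len(long_digest))
```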
Hash Value Granularity Considerations
When choosing the granularity at which hash values should be computed, several factors must be considered. These factors include, but are not limited to, the precision with which the location of an attack can be detected, the overhead incurred by the hash computation algorithm, and the amount of memory consumed by the system.
The smallest granularity that could be selected on the Virtex 4 platform would be an n-length partition of the readback bitstream, where n is the minimum size of an input message to the MD5 hashing algorithm (512 bits in this case). Choosing such a small granularity would allow one to precisely determine the area(s) of the FPGA's configuration that were modified, and would provide fast configuration restoration times. The price a system designer would pay for choosing such a small granularity would be the large overhead incurred from hash computations. If the granularity were instead set at one hash per readback configuration frame, the memory requirement would also be very large. On the LX25 device, the readback configuration bitstream contains 6256 frames, so all 6256 of the corresponding 128-bit trusted hash values would have to be stored in memory. Conversely, choosing the largest possible granularity would result in one hash value representing the entire CLB section of the readback bitstream. Such a granularity would minimize the overhead of hash computations and reduce the amount of memory needed to store the resulting hash values. Unfortunately, it would also make run-time repair of malicious alterations difficult to perform, because the entire FPGA would have to be reconfigured to correct alterations as small as a few bits. A granularity this large would also make it difficult to determine exactly which portion of the FPGA is being attacked, which would prevent the system designer from making design alterations to combat the attacks being employed. To achieve a middle ground, the PDCIC set the granularity at a block of 20 frames for the Virtex 4 platform. This corresponds to 1 block of 16 CLBs per hash, resulting in 168 hashes being used to represent the entire CLB configuration.
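The memory trade-off described above can be made concrete with a short calculation. The figures used (6256 frames, 128-bit hashes, 20 frames per block, 168 blocks) come from the text; the constant and function names are illustrative, and the calculation counts only the storage for one set of hash values.

```python
HASH_BITS = 128          # MD5 digest size
FRAMES_TOTAL = 6256      # readback frames on the LX25
FRAMES_PER_BLOCK = 20    # granularity chosen by the PDCIC
NUM_BLOCKS = 168         # hashes covering the CLB configuration


def hash_storage_bytes(num_hashes):
    """Bytes needed to store the given number of 128-bit hash values."""
    return num_hashes * HASH_BITS // 8


per_frame = hash_storage_bytes(FRAMES_TOTAL)  # one hash per frame
per_block = hash_storage_bytes(NUM_BLOCKS)    # one hash per 20-frame block
single = hash_storage_bytes(1)                # one hash for everything
clb_frames_covered = NUM_BLOCKS * FRAMES_PER_BLOCK

print(per_frame, per_block, single, clb_frames_covered)
```

Under these figures, one hash per frame needs roughly 98 KiB per set of hashes, whereas the 20-frame granularity needs under 3 KiB.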
Attack Locality Determination
When a malicious alteration of the FPGA's configuration is detected by the hash comparator of the PDCIC, the index of the hash covering the altered configuration data is output. To identify the attacked location relative to the layout of the FPGA, it is necessary to understand how the hash values correspond to physical portions of the FPGA. From Figure B.4, it can be seen that frame addresses increase from left to right and bottom to top on the FPGA. As a result, frames are read back starting at the lower left corner of the FPGA and ending in the upper right corner, and the indices of the resulting hash values increase in the same manner. The layout of the physical dimensions of the chip, in proportion to hash value indices, is outlined in Figure B.5.
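Because each hash covers a fixed block of 20 frames, a reported hash index can be converted to the range of frames it covers. The sketch below assumes zero-based hash indices and frame numbers counted from the start of the CLB section of the readback bitstream; both conventions are assumptions made for illustration, and the mapping from frames to physical position is the one given in Figures B.4 and B.5.

```python
def frames_for_hash(index, frames_per_block=20, num_blocks=168):
    """Return the range of relative frame numbers covered by a hash index."""
    if not 0 <= index < num_blocks:
        raise ValueError("hash index out of range")
    first = index * frames_per_block
    return range(first, first + frames_per_block)


# Example: the frames covered by the hash with index 5.
covered = frames_for_hash(5)
print(covered[0], covered[-1])
```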
Ensuring Reliable Operation
An attacker with the ability to maliciously alter the configuration data of an FPGA can potentially disrupt the operation of the PDCIC's components, and thereby leave the configuration of the FPGA and any design it contains exposed. Thus, a method was put in place that allows an entity external to the FPGA to poll the health of the configuration integrity checker. The method selected was a classic challenge-response subsystem. In the challenge-response component of the PDCIC, an external entity issues a challenge containing an input message of arbitrary length, prepended with the length of the message. This message is then concatenated with a secret key to which only the entity issuing the challenge and the challenge-response subsystem have access. The message formed by this concatenation is then processed by the MD5 hash function, and the result is returned to the challenger as the response. In the PDCIC, the secret key used by the challenge-response subsystem was selected to be a hash of all 168 current hash values. This requires the PDCIC to store the current hash values as 168 128-bit words. In addition, the 168 trusted hash values are also stored. Storing both the current and trusted hash values not only gives the challenger the ability to poll the system's health, but also allows this external entity to determine whether the current FPGA configuration has been altered.
This is possible because the secret key used in forming the response is a hash of the 168 current hash values. Any divergence between these values and the trusted hash values would result in a different secret key being produced. This would then result in a different response, which is received by the challenger. An outline of this system can be seen in Figure 3.3, which was taken from .
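The challenge-response exchange can be sketched in a few lines. This is a minimal model under stated assumptions: the 4-byte big-endian length prefix, the order in which the challenge and key are concatenated, and all function names are illustrative choices, not details confirmed by the PDCIC design.

```python
import hashlib


def make_challenge(message):
    """Prepend the message length (assumed here to be 4 bytes, big-endian)."""
    return len(message).to_bytes(4, "big") + message


def secret_key(current_hashes):
    """Derive the secret key as a hash of all current block hashes."""
    return hashlib.md5(b"".join(current_hashes)).digest()


def respond(challenge, current_hashes):
    """Hash the challenge concatenated with the secret key (order assumed)."""
    return hashlib.md5(challenge + secret_key(current_hashes)).digest()


# The challenger computes the expected response from the trusted hashes.
trusted = [hashlib.md5(bytes([i])).digest() for i in range(168)]
challenge = make_challenge(b"example nonce")
expected = respond(challenge, trusted)

# An unaltered device matches; a device with one altered block does not.
altered = list(trusted)
altered[42] = hashlib.md5(b"tampered block").digest()
print(respond(challenge, trusted) == expected)
print(respond(challenge, altered) == expected)
```

Because the key depends on every current hash value, a change to any monitored block changes the response, which is how the challenger learns that the configuration has diverged from the trusted state.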
Coverage of Future Work
In , several areas of potential future work to improve the design of the PDCIC were outlined. These areas include improving upon the initial conditions the system is dependent upon, increasing the frequency at which the system operates, and increasing the portability/parameterizability of the system. The work presented in this thesis focuses primarily on increasing the portability and parameterizability of the components that were inherited from the PDCIC. However, adjustments to conditional stipulations put on portions of the design, such as the challenge-response subsystem, were addressed as well.
Chapter 4 Multi-Platform Configuration Integrity Checker Design and Implementation
This chapter describes the design methodologies used in developing the components of the cross-platform configuration integrity checker presented in this thesis. First, the dynamic data identification and masking strategy is outlined. The methodology for automating this process and extending it across multiple FPGA platforms is then presented. Next, the approach used to extend the configuration readback, hash generation/comparison, and challenge-response components of the design, outlined in , across multiple platforms is described. The chapter concludes with a discussion of the design and implementation of the serial I/O subsystem employed by the cross-platform integrity checker.
Dynamic Data Identification and Masking
As with any hardware design, a design running on an FPGA contains two parts: a static portion and a dynamic portion. The dynamic portion of a design contains values that are constantly changing according to the operation of the static portion of the design. As a result, this dynamic portion must not be considered when checking the design's readback configuration for malicious alterations. If the dynamic portions of the configuration were considered, it would be extremely difficult to determine the source of a detected change in the design's configuration. Moreover, it would be impossible to tell whether the configuration changed due to these dynamic components or due to a malicious attack. Once the dynamic portions of a design are masked from the configuration data being monitored, one can reliably determine when a design's configuration data has been maliciously altered.
Dynamic Data Identification Strategy
To determine where dynamic data is located in the configuration bitstream of a Xilinx FPGA, an approach was developed that takes advantage of Xilinx logic allocation (.ll) files. These logic allocation files can be generated using the Xilinx Bitgen software (with the "-l" option specified). This software is typically used to take a user's design that has been synthesized, mapped, and routed, and generate a bitfile (.bit), which can be used to configure the target FPGA with the user's design. Logic allocation files provide the frame address and bitstream offsets of all data that is considered to be dynamic in FPGA bitstreams. As a result, they are a critical component in the dynamic data identification strategy outlined.
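A logic allocation file can be processed with a small script to extract the frame address and in-frame offset of each dynamic bit. The sketch below assumes that each entry is a line of the form `Bit <bitstream offset> <frame address in hex> <frame offset> <attributes>` and that flip-flop state bits are marked with a `Latch=` attribute; this reflects the commonly seen .ll layout but should be checked against the files produced by the Bitgen version in use. The function name and sample data are illustrative.

```python
import re

# Assumed entry layout: "Bit <offset> 0x<frame addr> <frame offset> <attrs>"
_ENTRY = re.compile(r"^Bit\s+(\d+)\s+0x([0-9a-fA-F]+)\s+(\d+)\s+(.*)$")


def parse_ll(text):
    """Return (frame_address, frame_offset) pairs for flip-flop state bits."""
    locations = []
    for line in text.splitlines():
        match = _ENTRY.match(line.strip())
        if match is None:
            continue                      # header or comment line
        _, frame_addr, frame_offset, attrs = match.groups()
        if "Latch=" in attrs:             # keep flip-flop entries only
            locations.append((int(frame_addr, 16), int(frame_offset)))
    return locations


# Hypothetical sample entries, for illustration only.
sample = """Revision 3
Bit 100 0x00020a40 44 Block=SLICE_X0Y0 Latch=XQ Net=q<0>
Bit 200 0x00020a41 12 Block=SLICE_X0Y1 Ram=F:1
"""
print(parse_ll(sample))
```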
In the method developed, a design was created that occupies one column of configurable logic blocks (CLBs) on the FPGA being mapped for dynamic data. This design was developed so as to occupy all flip-flop resources in this CLB column. Because these resources are utilized, their relative locations, and therefore the locations of all dynamic data in this column, are listed in the resulting logic allocation file. This provides the information necessary to mask out all possible dynamic data locations in this column of CLBs. It should be noted that not every design uses every flip-flop in every column, so masking all possible flip-flop locations leaves small static portions of the configuration out of the hash computations. When these flip-flop locations are left unutilized, it typically means that they are not part of the design being implemented, and therefore it is a valid assumption that leaving them unprotected by the configuration integrity checker does not present a security threat to the system.
In this strategy it is assumed that LUTs in the column being analyzed are not configured to act as RAM modules. Without this assumption, the bitstream locations of the dynamic data in these modules would need to be determined as well. This would be advantageous if a dynamic data masking strategy customized to a particular design were being developed. However, it is not advantageous to mask every possible location of this data in the design-independent version of the dynamic data masking strategy, because LUTs in each CLB column would be masked even when they have not been configured as portions of a RAM module. This would result in large static portions of the FPGA's configuration being excluded from the hash value computation for each block of hash data, leaving those static portions of the design unprotected from malicious attacks.
After the locations of all dynamic data in this particular column were determined, the design was iteratively constrained to occupy each CLB column on the FPGA being mapped. The resulting absolute frame addresses, as well as the bitstream offsets relative to the frame being addressed, were copied from the generated logic allocation file for each column. The frame addresses, which correspond to locations of the dynamic data in each CLB column, are by default in the Xilinx-specific format used to read from and write to the FPGA's frame address register. An example of this format for the Xilinx Virtex 4 family is shown in Figure 4.1.
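A Xilinx-formatted frame address can be split into its fields with simple shifts and masks. The field positions below are an assumption, stated as recalled from the Virtex 4 configuration documentation (top/bottom bit, block type, row, column, and minor address), and should be verified against Figure 4.1 before use; the table and function names are illustrative. This sketch only decodes the fields and does not implement the conversion of Equation 4.1.

```python
# Assumed Virtex 4 FAR layout as (shift, width); verify against Figure 4.1.
V4_FAR_FIELDS = {
    "top_bottom": (22, 1),
    "block_type": (19, 3),
    "row": (14, 5),
    "column": (6, 8),
    "minor": (0, 6),
}


def decode_far(far, fields=V4_FAR_FIELDS):
    """Split a frame address register value into its named fields."""
    return {
        name: (far >> shift) & ((1 << width) - 1)
        for name, (shift, width) in fields.items()
    }


# Example: build an address from known field values and decode it again.
example_far = (1 << 22) | (0 << 19) | (2 << 14) | (5 << 6) | 3
print(decode_far(example_far))
```

Passing the layout as a parameter keeps the sketch usable for other device families whose frame address formats differ.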
The frame addresses and bitstream offsets obtained from the method outlined in Section 4.1.1 were then converted to frame addresses relative to the configuration bitstream that is read back from the FPGA's ICAP. The conversion method is outlined in Equation 4.1. This conversion process was automated using lightweight software; a detailed description of this automation process is contained in Section 4.2.2. An example of both the Xilinx-formatted frame addresses and the resulting relative frame addresses for the LX25 device on the Virtex 4 platform can be seen in Figures B.1 and B.2, respectively. (4.1)
Once all the relative frame addresses that contain dynamic data were obtained, all that remained was to find the bit locations inside these frames that correspond to bits of dynamic data that must be masked out. These bits are provided in the logic allocation file, and form regular patterns inside frames that contain dynamic data. These patterns are uniform within each platform, and differ in length only for platforms, such as the Virtex II and Virtex II Pro, that do not have uniform frame lengths. To demonstrate the difference in bit patterns across multiple platforms, examples of the bits that must be masked for both the Virtex II XC2VP30 and Virtex 4 LX25 devices can be seen in Figures A.2 and B.3, respectively.
After all frame addresses relative to the FPGA's readback configuration bitstream were determined, an FSM was created to mask the corresponding data. This FSM systematically determines when data being read back belongs to a frame that must be masked. When it does, the dynamic data is selectively masked from the readback configuration data before that data is used in hash value computation.
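The effect of the masking step can be illustrated in software. The sketch below assumes that masking forces the dynamic bits to zero, that bit offsets within a frame are numbered most-significant-bit first, and that the mask is held as a table from relative frame number to bit offsets; these are illustrative assumptions, not details of the hardware FSM.

```python
import hashlib


def mask_frame(frame, bit_offsets):
    """Force the given bit positions (MSB-first numbering) of a frame to 0."""
    masked = bytearray(frame)
    for bit in bit_offsets:
        masked[bit // 8] &= ~(0x80 >> (bit % 8)) & 0xFF
    return bytes(masked)


def mask_readback(frames, mask_table):
    """Mask only the frames listed in mask_table; pass the rest through."""
    return [
        mask_frame(frame, mask_table[i]) if i in mask_table else frame
        for i, frame in enumerate(frames)
    ]


def block_hash(frames):
    """Hash a block of frames after masking has been applied."""
    return hashlib.md5(b"".join(frames)).digest()


# Two readbacks that differ only in one dynamic bit (frame 0, bit 3).
readback_a = [bytes([0xFF, 0xFF]), bytes([0x12, 0x34])]
readback_b = [bytes([0xEF, 0xFF]), bytes([0x12, 0x34])]
mask_table = {0: [3]}

hash_a = block_hash(mask_readback(readback_a, mask_table))
hash_b = block_hash(mask_readback(readback_b, mask_table))
print(hash_a == hash_b)
```

With the dynamic bit masked, both readbacks hash to the same value, so a later mismatch can be attributed to a change in the static configuration.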