Integrated Circuit and Logic Design
In 1948, the most significant step in modern electronics happened with the invention of the transistor by Bell Laboratories. Together with the solid-state diode, the transistor opened the door to microelectronics. The term microelectronics itself is defined as the area of technology applied to the realization of electronic systems made of extremely small electronic parts or components, and it is often related to the term integrated circuit (IC).
Gordon Moore, co-founder of Intel, published an article in 1965 [Moo00] stating that the number of components per integrated circuit would double approximately every two years. This observation is broadly known as Moore’s Law. This law has held since 1965 and, equally importantly, has guided the microelectronics industry’s research and development.
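As an illustrative sketch only (the 1965 baseline count below is hypothetical, not a historical figure), the two-year doubling rule can be expressed as:

```python
# Illustration of Moore's Law: component count doubling every two years.
# The baseline of 64 components in 1965 is an invented example value.
def components(year, base_year=1965, base_count=64):
    """Project the number of components per IC under a two-year doubling."""
    return base_count * 2 ** ((year - base_year) // 2)

for year in (1965, 1975, 1985):
    print(year, components(year))  # each decade multiplies the count by 32
```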
Most engineering fields involve tradeoffs between performance, power and price. Digital circuits are a notable exception: as technology nodes shrink, transistors become faster, consume less power and are typically cheaper to manufacture [WH10]. This unique characteristic of integrated circuits has not only revolutionized electronics, but also shaped our lives. The fast-growing pattern of integrated circuits is depicted in Table 2.1.
More than 80% of the microelectronics industry is composed of digital circuits [Raz06]. Commercially, the majority of cryptographic systems are built from digital chips. This chapter traces the path from the basic transistor and its power characteristics to the theory behind the logic gates that constitute a cryptosystem.
Digital VLSI design is commonly divided into five abstraction levels [WH10]:
— architecture design, that describes how the system operates;
— microarchitecture design, that details how the system architecture is partitioned into registers and functional units;
— logic design, that specifies how functional units and sub-modules are constructed;
— circuit design, that describes how transistors are used to implement the logic; and
— physical design, that contains the final layout of the chip.
An alternative way of seeing the digital design partitioning is shown in Fig. 2.1. The radial lines on the Y-chart represent three design domains: behavioral, structural and physical. These domains can be used to describe the different phases of a digital design, creating levels of design abstraction that start at a very high level and eventually descend to the individual components that form the top-level abstraction.
The Behavioral Domain describes the system’s functionality and contains static and dynamic components. The static component describes the operations, while the dynamic portion describes their sequencing and timing. Thus, differences in algebraic functionality, pipelining, and timing are all changes in behavior. For example, at the Functional Block Level, the Behavioral Domain describes the design in terms of register transfers, although the behavioral description might also contain timing information.
The Structural Domain describes the design’s abstract implementation, usually represented by a structural interconnection of a set of abstract modules. For example, at the Functional Block Level, the Structural Domain describes the design in terms of ALUs, MUXes, and registers. Nevertheless, this domain also describes the digital system’s control logic portion.
The Physical Domain describes the design’s physical implementation, turning the abstract structural components into physical components. This domain deals with constraints on the physical partitioning and the design’s layout, as well as physical component geometry. As an example, at the Functional Block Level, the Physical Domain describes the floorplanning that the circuit must have in order to implement the required logic from upper-level descriptions. In this domain, speed, power and area constraints are measured and evaluated more accurately than in higher domains.
The growing adoption of higher levels of abstraction and the increasing complexity of digital designs pushed academia and industry to create Hardware Description Language (HDL) tools that describe both the behavior and the structure of the design, including physical and geometrical information as well [WT85].
Another way of structuring the different types of integrated circuit design is by categorizing according to the design methodology (also known as design styles) [GBC+96]. These styles are primarily based on the viability of the IC design and its target application. The following parameters typically define the choice of a particular design style [WH10]:
— performance – speed, power, system flexibility;
— size of die (hence, die cost);
— time to design (hence, engineering cost);
— ease of verification and test generation (hence, engineering cost).
Since the whole IC design process is complex and long, VLSI designers have to choose the design style that best matches their requirements and specifications. The design constraints listed above help the designer opt for the best silicon approach. The design methodology of a digital VLSI circuit is commonly divided into two main categories: Standard Circuit and Application-Specific Integrated Circuit (ASIC). The most important (and most commonly used) design styles are listed as a tree in Fig. 2.2.
For several reasons, the best approach to solve a system design problem might be to use a standard architecture, such as a microprocessor or digital signal processor (DSP). Off-the-shelf solutions with built-in RAM or EPROM are broadly available. Microprocessors provide a great level of flexibility, essentially transforming the hardware design into a software design. This methodology is called hardwired and represents one type of standard circuit design style. The second standard circuit type is called mask-programmable logic and represents the type of read-only memory whose contents are programmed by the IC manufacturer rather than the VLSI designer. This design style typically uses rewritable non-volatile memory (such as EPROM) during the project’s development phase, switching to the masked version of the design when the code is finalized.
Often, the cost, performance or power consumption of a microprocessor does not meet the system’s requirements. The next VLSI circuit family, providing denser and faster circuits, is application-specific programmable logic. In this family, the most important design style is the Programmable Logic Array (PLA). This type of chip implements two-level sum-of-products programmable logic arrays with limited routing capability, consisting of an AND plane and an OR plane that can compute any function expressed as a sum of products. Each transistor of the AND and OR planes can be programmed to be present or absent. In addition, a PLA cell contains a NOR structure programmable by a floating-gate transistor, a fusible link, or a RAM-controlled transistor.
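The two-level sum-of-products structure can be sketched in software. The following Python model is illustrative only: the `pla_eval` helper and the example personalization f = a·b + ā·c are assumptions made for this sketch, not features of any real PLA device.

```python
# Sketch of a two-level PLA: the AND plane forms product terms, and the
# OR plane selects which product terms are summed into the output.
def pla_eval(inputs, and_plane, or_plane):
    """inputs: dict mapping input name -> bit (0 or 1).
    and_plane: list of product terms, each a list of (name, polarity) pairs.
    or_plane: indices of the product terms connected to the OR gate."""
    products = [all(inputs[name] == pol for name, pol in term)
                for term in and_plane]
    return int(any(products[i] for i in or_plane))

# Example personalization: f = a.b + (not a).c
and_plane = [[("a", 1), ("b", 1)], [("a", 0), ("c", 1)]]
or_plane = [0, 1]
print(pla_eval({"a": 1, "b": 1, "c": 0}, and_plane, or_plane))  # -> 1
```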
The second type of application-specific programmable logic family is the Field Programmable Gate Array (FPGA). FPGAs are completely reprogrammable ICs, even after the chip is shipped to the final customer. They consist of an array of logic cells, or blocks, surrounded by programmable routing resources. Although first-generation FPGAs used fuses or anti-fuses to program the internal blocks and personalize their logic, second-generation FPGAs use static RAM or flash memory to configure the routing and logic functions. The first generation is one-time programmable, while the latest FPGA families allow reprogramming.
Second-generation FPGAs are composed of configurable logic blocks (CLBs). Between consecutive CLBs, metal tracks are placed vertically and horizontally, forming the configurable routing paths between blocks. CLBs use programmable lookup tables to compute any logic function of several inputs and outputs. Configurable I/O cells surround the core array of CLBs. The main benefit of choosing an FPGA is that it provides the latest technology node with millions of logic gates that easily operate at rates of gigabits per second. FPGAs frequently embed extra components for system integration, such as microprocessors, external memory, and hardware accelerators.
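The lookup-table principle behind a CLB can be sketched as follows. This is an illustrative Python model, not the behavior of any vendor's FPGA; `make_lut` and its bit ordering are assumptions of the sketch.

```python
# A k-input lookup table (LUT): any k-input Boolean function is stored as
# 2**k configuration bits, indexed by the input pattern.
def make_lut(truth_bits):
    """truth_bits[i] is the output for the input pattern with binary value i."""
    def lut(*inputs):
        index = 0
        for bit in inputs:  # the first input is the most significant index bit
            index = (index << 1) | bit
        return truth_bits[index]
    return lut

# Configure a 2-input LUT as XOR: patterns 00, 01, 10, 11 -> 0, 1, 1, 0.
xor = make_lut([0, 1, 1, 0])
print(xor(1, 0))  # -> 1
```

Reprogramming the FPGA amounts to rewriting the `truth_bits` (and the routing configuration), which is why SRAM- or flash-based devices can be reconfigured in the field.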
The logic styles presented so far do not require a fabrication run. Instead, the VLSI designer can focus on design only, diminishing the non-recurring engineering (NRE) cost. Another way of reducing NRE cost is to use a semi-custom logic style called gate array (or sea of gates). This method constructs a chip with a common base array of nMOS and pMOS transistors, later personalized by altering only the metallization (the metal and via masks) placed on top of the transistors. Unlike FPGAs, gate-array logic is not field-reprogrammable, but since only the top metal masks are personalized, the VLSI designer can correct logic errors at low cost and reduce exposure to process variability that might cause late bugs and delay the project.
Standard cell logic style represents another semi-custom IC type. In this design, a digital IC technology library provided by a foundry or by library vendors supplies the basic building blocks of the chip. Also called cell-based design, this design methodology achieves smaller, faster and lower-power chips than FPGAs, but incurs higher NRE costs to produce the custom mask set. Standard cell design is therefore only suitable for high-volume fabrication.
Finally, a full-custom design is another type of application-specific IC, where all circuit details have to be designed, all transistors have to be dimensioned, and the layout has to be carefully drawn and then verified with an analog simulator. In other words, a full-custom design and its production are fully controlled by the VLSI designer. Even though it allows optimal results, the design effort is prohibitive for large ICs. A classical example of a full-custom design is the commercial microprocessor, where top-notch performance is a requirement.
Given all the IC families described above, a VLSI design choice must be based on the most cost-effective approach, taking into account speed, power and area figures. If off-the-shelf solutions (microprocessors or microcontrollers, for example) meet the designer’s requirement targets, they should always be preferred. If not, an FPGA is the next logical candidate, especially for low-volume (below 100 000 units) applications. When the production volume is high enough to justify the costs, or when low power is at stake, standard cell design is the preferable option. Mixed-signal, radio-frequency (RF) and high-speed digital designs are examples of ICs that require a cell-based design approach. Table 2.2 summarizes the costs of the design methodologies according to several different criteria, from low to very high.
The CMOS Transistor
A transistor can be viewed as a device with 3 terminals, or a 2-input/1-output switch. One input, the control, acts by enabling or disabling the current transfer from the other input to the output, depending on the voltage applied to the control terminal. Early integrated circuits first adopted the bipolar junction transistor, developed by Bell Labs. The next generation came to production in 1960, replacing the bipolar transistor: the Metal Oxide Semiconductor Field Effect Transistor (MOSFET). MOSFETs have three major advantages:
— they drain almost zero current while idle;
— they can be more easily miniaturized; and
— they consume only nanowatts of power, six orders of magnitude less than their bipolar counterparts.
MOSFETs come in two sub-types, differing by the silicon substrate type: n-type (called nMOS transistors) and p-type (called pMOS transistors). This is why the technology came to be called Complementary Metal Oxide Semiconductor, or CMOS.
Each CMOS transistor consists of a conducting gate, an insulating layer of silicon dioxide (SiO2, also called glass) and the silicon substrate, also known as body or bulk, which is usually grounded. An nMOS transistor, depicted in Fig. 2.3a, is built on a p-type body and has regions of n-type semiconductor adjacent to the gate called the source and drain. A pMOS transistor, as shown in Fig. 2.3b, is the opposite, having p-type source and drain regions on an n-type body. In a CMOS technology where both transistor types are present, the substrate is either n- or p-type. To build the other transistor type, a special well with dopant atoms has to be added to form the opposite body type. The n+ and p+ regions indicate heavily doped n- or p-type silicon.
A voltage applied to the gate of the CMOS device controls the current flowing between source and drain. More specifically, the nMOS transistor creates a current flow from source to drain when a positive voltage is applied to the gate. Otherwise it acts as an open switch, i.e., no current flows through the two opposite terminals, and the nMOS transistor is OFF. The gate functions as a control input: it acts on the electrical current flow between the source and drain. As the gate voltage rises, it creates an electric field that attracts free electrons to the underside of the Si–SiO2 interface. If the voltage rises enough, the electrons outnumber the holes and a thin region between the two terminals is inverted into an n-type semiconductor. Electrons then conduct from source to drain through this channel, and we say that the transistor is ON.
In the pMOS, the behavior is again the opposite. By applying a positive voltage, the body is held reverse-biased and no channel is created, so the transistor remains OFF. When the gate voltage is lowered, positive charges are attracted to the underside of the Si–SiO2 interface. A sufficiently low gate voltage inverts the pMOS channel, a conducting channel is formed from source to drain, and the transistor is ON.
The voltage value at which a conducting channel is created between source and drain is called the threshold voltage. The positive supply voltage is usually called VDD or POWER, and its value varies depending on the technology library. In popular logic families of the 1970s and 1980s, VDD was usually set to 5 volts, whereas nowadays voltages around 1 volt predominate. The low voltage is called GROUND or VSS and its value is usually 0 volts. In digital terms, VDD is referred to as logic (or bit) 1, and VSS represents logic 0. Fig. 2.4 (where g, s, d stand for gate, source and drain, respectively) shows the symbol of each CMOS transistor and its “switch” behavior depending on the voltage applied to the gate.
Basic and complex CMOS logic gates are constructed with two networks: the upper part, called the pull-up network, connects to the power supply (VDD); the lower part, called the pull-down network, connects to ground (VSS). Both are connected together at the output of the logic gate. For the logic to work as expected, the pull-up network is composed of pMOS transistors only, while the pull-down counterpart contains only nMOS transistors. Fig. 2.5 depicts the general schematic of a CMOS logic gate.
The pull-up and pull-down networks in the inverter each consist of a single transistor. The NAND gate uses a series pull-down network and a parallel pull-up network. More elaborate networks are used for more complex gates. Two or more transistors in series conduct only if all of the series transistors are ON. Two or more transistors in parallel conduct if any of the parallel transistors is ON. By combining series and parallel structures, any CMOS combinational gate can be constructed.
In general, when we join a pull-up network to a pull-down network to form a logic gate as shown in Fig. 2.5, both will attempt to exert a logic level at the output. The output of a CMOS logic gate can thus be in one of four states. The 1 and 0 levels have already been encountered with the inverter and NAND gates, where either the pull-up or the pull-down network is OFF and the other structure is ON. When both pull-up and pull-down networks are OFF, the high-impedance or floating Z output state results. This is of importance in multiplexers, memory elements, and tri-state bus drivers. The crowbarred (or contention) X level exists when both pull-up and pull-down networks are simultaneously turned ON. Contention between the two networks results in an indeterminate output level and dissipates static power. It is usually an unwanted condition.
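These four states, and the series/parallel rules above, can be summarized in a rough behavioral sketch. The `gate_output` and `nand` helpers below are illustrative names invented for this sketch, not a circuit simulator.

```python
# Sketch of the four possible CMOS gate output states, assuming the pull-up
# and pull-down networks are each reduced to a boolean "conducting" flag.
def gate_output(pull_up_on, pull_down_on):
    if pull_up_on and pull_down_on:
        return "X"  # contention: both networks fight, static power is wasted
    if pull_up_on:
        return "1"  # conducting path to VDD only
    if pull_down_on:
        return "0"  # conducting path to VSS only
    return "Z"      # both OFF: high-impedance (floating) output

# 2-input NAND: parallel pMOS pull-up, series nMOS pull-down.
def nand(a, b):
    pull_up = (not a) or (not b)   # any pMOS ON pulls the output up
    pull_down = a and b            # both series nMOS must be ON to pull down
    return gate_output(pull_up, pull_down)

print(nand(1, 1), nand(1, 0))  # -> 0 1
```

Note that for a well-formed static gate such as this NAND, exactly one network conducts for every input combination, so the X and Z states never appear.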
The simplest CMOS logic circuit is the inverter, which implements the logic function out = NOT in and consists of a pMOS pull-up transistor and an nMOS pull-down transistor, as depicted in Fig. 2.6.
When the input voltage in = 0, the gate of the p-channel transistor sits at a potential VDD below its source (which is tied to VDD), i.e., the pMOS is ON. No current flows through the n-channel transistor, which is OFF. If we increase in past the threshold voltage and up to VDD, the operation reverses: the n-channel transistor conducts while the p-channel is now OFF. It is easy to see that the output of the CMOS inverter always takes the opposite voltage of in.
The NAND gate is an important metric when evaluating the actual complexity of a design. The complexity of a digital integrated circuit is commonly measured by its number of logic gates, which eventually reflects the size of the overall circuit. The problem arises from the fact that the raw area of an ASIC circuit is difficult to compare with that of another ASIC design targeted at a different technology node. Even among FPGA implementations, the internal look-up tables and surrounding logic are shrunk to a given process technology. The size of the NAND gate at a given technology node gives a technology-independent way of measuring design complexity, and also a fair comparison between two circuit designs. Instead of comparing logic instances, the total circuit area (technology-dependent) is divided by the area of the smallest 2-input NAND gate of the digital IC library. The result can be fairly compared to any other circuit. Some engineers prefer to use the total circuit area (logic + routing), while others do not use routing information, as it can vary from one CAD tool to another. In addition, a fair comparison is achieved by comparing ASIC designs and FPGA counterparts separately.
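The gate-equivalent computation itself is straightforward. In the sketch below, both cell areas are invented for illustration and do not come from any real technology library.

```python
# Sketch of the NAND-gate-equivalent (GE) complexity metric: total cell area
# divided by the area of the library's smallest 2-input NAND gate.
def gate_equivalents(total_area_um2, nand2_area_um2):
    """Technology-independent complexity, in NAND2-gate equivalents."""
    return total_area_um2 / nand2_area_um2

design_area = 12500.0   # hypothetical logic area, in um^2
nand2_area = 1.25       # hypothetical NAND2 cell area, in um^2
print(f"{gate_equivalents(design_area, nand2_area):.0f} GE")  # -> 10000 GE
```

Two designs synthesized in different technologies can then be compared by their GE figures rather than by raw square micrometers.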
The flexibility of CMOS logic is such that a compound gate can be created from different connections and arrangements of the pull-up and pull-down transistors. This allows more complex logic gates to be tailored to the best possible circuit layout. If a certain logic pattern is highly likely to be used in a digital design, the digital library may incorporate this logic into a basic digital cell. For example, the derivation of the circuit for the function Y = NOT((A AND B) OR (C AND D)) is shown in Fig. 2.8. This function is called AND-OR-INVERT-22 (AOI22) and is usually provided by commercial digital library foundries. It performs the NOR of a pair of 2-input ANDs.
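A quick truth-table check confirms this equivalence. The `AND` and `NOR` helpers below are defined only for this illustrative Python sketch.

```python
from itertools import product

def AND(x, y):
    return x & y

def NOR(x, y):
    return 1 - (x | y)

def aoi22(a, b, c, d):
    """AOI22: Y = NOT((A AND B) OR (C AND D))."""
    return int(not ((a and b) or (c and d)))

# Exhaustively verify that AOI22 is the NOR of two 2-input ANDs.
for a, b, c, d in product((0, 1), repeat=4):
    assert aoi22(a, b, c, d) == NOR(AND(a, b), AND(c, d))
print(aoi22(1, 1, 0, 0))  # -> 0
```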
In addition to the logic levels 0 and 1, it is possible to construct a CMOS logic gate whose output takes a third value, denoted Z. The enable input EN controls whether the primary input is passed to the output or not. If EN = 1, the gate acts as a normal buffer. If EN = 0, the input is effectively disconnected from the circuit, leaving the output of the gate “floating”. A tri-state buffer is usually connected to a bus, which allows multiple signals to travel along the same connection as long as exactly one buffer is enabled at a time. When disconnected, the tri-state buffer output holds neither logic 0 nor 1, but instead presents a state of very high impedance. As a result, no current is drawn from the power supply.
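The shared-bus discipline can be sketched as follows. The `tristate` and `bus` helpers are illustrative names for this sketch, with "Z" standing for high impedance and "X" for bus contention.

```python
# Sketch of a tri-state buffer: when EN = 1 the input is passed through,
# when EN = 0 the output floats at high impedance, denoted "Z".
def tristate(data, en):
    return str(data) if en else "Z"

# Several tri-state buffers may share one bus, but only one may drive it.
def bus(drivers):
    """drivers: list of (data, en) pairs, one per tri-state buffer."""
    values = [tristate(d, en) for d, en in drivers]
    driven = [v for v in values if v != "Z"]
    if len(driven) > 1:
        return "X"  # more than one enabled driver: contention
    return driven[0] if driven else "Z"

print(bus([(1, 1), (0, 0)]))  # -> 1 (only the first buffer is enabled)
```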
A first, logical approach to a tri-state buffer is the transmission gate shown in Fig. 2.9. It is as simple as two CMOS transistors, but it is imperfect: when EN = 1, the output receives the input, but the signal is not restored. If the input is noisy or degraded, the output receives the same noise, which should be avoided. If several of these gates lie along the same logic path, the overall path delay increases considerably.
CMOS I-V Characteristics
A signal’s strength is a measure of how closely the signal approximates an ideal voltage source, i.e., the stronger the signal, the more current it can drive. In CMOS technology, the strongest signals are VDD (logic 1) and VSS (logic 0).
By analyzing the two basic CMOS components, the pMOS and the nMOS transistors, we can determine which passes each logic level best. In fact, each is both a good and a bad voltage source. This comes from the fact that the nMOS transistor is an almost perfect switch when transmitting a logic 0, but imperfect at passing a logic 1. Therefore we say that the nMOS transistor passes a strong 0, but a weak 1. The situation is the opposite for the pMOS transistor: the pMOS is capable of being an optimal source of a strong logic 1, but passes a weak logic 0.
In all our examples so far, the CMOS logic gates are composed of nMOS transistors in the pull-down network and pMOS transistors in the pull-up network. Therefore, nMOS components only have to pass logic 0 to the output, and pMOS components only pass the logic 1, so the output is strongly driven. This considerably simplifies the design layout compared to other forms of logic, where the pull-up and the pull-down switch networks have to be ratioed. Instead, CMOS gates operate in the same manner, no matter the technology node size. As a drawback, the design of a CMOS component must be inverting – the nMOS pull-down network turns ON when inputs are at logic 1, delivering logic 0 to the output.
Nevertheless, the characteristic that made static CMOS technology extremely successful is undoubtedly electrical: because there is never a path through ON transistors from VDD to VSS (in contrast to technologies such as single-channel MOS, GaAs and bipolar), CMOS gates dissipate very little static power.
CMOS Electrical Properties
The CMOS transistor is a majority-carrier component in which the gate controls the current flowing through a conducting channel from the source to the drain. In an nMOS transistor, the majority carriers are electrons; in a pMOS transistor, the majority carriers are holes. Between the gate pin and the transistor’s doped silicon body there is a thin layer of insulating film of SiO2 called the gate oxide. The most important property of the gate oxide is that it is a very good insulator, so almost zero current flows from the gate to the body.
While the nMOS source and drain contain free electrons, its body contains free holes but no free electrons. When a gate-to-source voltage Vgs smaller than the threshold voltage Vt is applied, the junctions between the body and the source or drain are zero-biased or reverse-biased, so little or no current flows. This transistor state is called cutoff. For a single component, the small current flowing through a cutoff transistor is insignificant, but it becomes significant when summed over several million transistors on a chip. Moreover, the smaller the technology node, the more dominant this leakage becomes.
When Vgs becomes larger than Vt, an inversion region of electrons creates a conductive channel between source and drain. Now a small potential difference Vds between drain and source makes a current Ids flow through the channel. This state is called linear, resistive, triode, nonsaturated or unsaturated. The current increases with both the gate voltage and the drain voltage. In this mode, the transistor acts as a linear resistor in which the current flow is proportional to Vds.
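Under the standard first-order transistor model found in textbooks such as [WH10] (an assumption here, since the text above does not state the equations explicitly), the drain current in the operating regions can be written as:

```latex
I_{ds} =
\begin{cases}
0, & V_{gs} < V_t \quad \text{(cutoff)}\\[4pt]
\beta \left( V_{gs} - V_t - \dfrac{V_{ds}}{2} \right) V_{ds}, & V_{gs} > V_t,\ V_{ds} < V_{gs} - V_t \quad \text{(linear)}\\[4pt]
\dfrac{\beta}{2} \left( V_{gs} - V_t \right)^2, & V_{gs} > V_t,\ V_{ds} \ge V_{gs} - V_t \quad \text{(saturation)}
\end{cases}
\qquad \beta = \mu C_{ox} \frac{W}{L}
```

The first two cases correspond to the cutoff and linear states described above; the saturation case, where the current becomes independent of Vds, anticipates the remaining operating region.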