CHAPTER 3 Leaf Cell Layout
The main responsibility of an SRAM generator is to instantiate basic components in an array of the given size. The basic components, called leaf cells, are critical in determining the final performance of the generated SRAM circuit. Therefore, leaf cell design must be optimized both locally and globally for area, power, and speed. Whenever possible, the leaf cell layout must use cell abutting. This technique reduces unnecessary routing by placing adjacent cells so that their shared connections meet at the cell boundaries.
In this chapter, we will discuss the design and layout of basic components in the RAM architecture. The basic components (as discussed in Chapter 2) are as follows:
6-transistor core of SRAM
Bit-line conditioning circuit
Before we discuss each component in detail, we review the overall SRAM structure. Figure 3.1 shows a block diagram of an SRAM and the connections of the basic components. A RAM cell is connected to two bit lines through word-select pass transistors. Since the pass transistors are NFETs, they are slow when pulling a line up to logic ‘1’. Charging the bit-lines to a high value reduces the time it takes the pass transistors to pull up. Although charging the bit-lines speeds up access, it degrades the bit-line signal difference. Therefore, a sense amplifier is needed to increase the difference and provide a good data output during a read.
Three decoders are activated or deactivated by three active-low control signals: CEN’, WEN’, and OEN’. CEN’ indicates that the SRAM is currently being accessed and controls the word-line (row) decoders. Thus, when CEN’ is off (high), none of the word-lines are on. Likewise, the WEN’ signal, which specifies that the SRAM is being written to, controls the write-line decoder. Similarly, the OEN’ signal indicates that the SRAM is being read from and controls the read-line decoder.
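The relationship between the active-low control pins and the decoders can be summarized in a small behavioral sketch. The function and key names below are illustrative and not part of the design; each signal is modeled as independently gating its own decoder, as described above.

```python
def decoder_enables(cen_n: int, wen_n: int, oen_n: int) -> dict:
    """Map the three active-low control inputs to decoder enables.

    Each argument is the logic level on the pin: 0 = asserted, 1 = off.
    The dictionary keys are hypothetical labels for the three decoders.
    """
    for level in (cen_n, wen_n, oen_n):
        if level not in (0, 1):
            raise ValueError("control inputs must be 0 or 1")
    return {
        "row_decoder": cen_n == 0,           # CEN' gates the word lines
        "write_column_decoder": wen_n == 0,  # WEN' gates the write path
        "read_column_decoder": oen_n == 0,   # OEN' gates the read path
    }
```

For example, `decoder_enables(0, 1, 0)` describes a read access: the row and read-column decoders are enabled and the write-column decoder is not.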
Precharge circuits and sense amplifiers are the other two major components, as shown in Figure 3.1.
Figure 3.1 shows two separate column decoders, one for read operations and one for write operations. Although a single column address decoder could serve both operations, using two decoders reduces the delay incurred by routing from a single decoder. In addition, the actual area of the decoders is small, while the routing area is significant.
The 6-transistor (6T) SRAM core shown in Figure 3.2 stores one bit of data. It is composed of a latch and 2 pass transistors. Since the core is replicated by the number of bits, optimum design and layout of this component is critical. The size of the transistors used is the primary factor that determines the performance of the SRAM cell. We determine the optimum transistor sizes through SPICE simulation. Since the most important design criterion for us is power dissipation, we minimize the sizing as much as possible without compromising performance significantly.
There are some issues to be considered when sizing the transistors. The latch inverters (M1, M2, M3, and M4) form a positive feedback loop, so the stored value is maintained as long as power is available. Since the bit lines are precharged to VDD-Vtn, the cell NFETs (M1 and M3) cannot be smaller than the pass NFETs (M5 and M6); otherwise they could not overcome the precharged value on the bit line when pulling it low. Note that although a transmission gate could be used in place of each pass transistor, only NFETs are used to keep the area of a single SRAM cell small. It will be shown later that special circuitry (bit-line conditioning and sense amplifiers) is needed to recover from the performance losses due to using just NFETs.
In an array of RAM cells, a single word line is connected to an entire row of RAM cells, forming a long word-line row. Since the word line uses polysilicon, which has high resistivity, it is necessary to keep the two pass transistors (M5 and M6) small so that each cell adds as little gate load to the line as possible. This improves signal integrity on the word lines and reduces power dissipation.
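A first-order way to see why small pass transistors help is the Elmore delay of an RC ladder, a standard textbook estimate that is not taken from this chapter's simulations. The sketch below assumes each cell contributes one identical series resistance and one shunt capacitance (wire plus two pass-transistor gates); all parameter names and values are hypothetical.

```python
def wordline_elmore_delay(n_cells: int,
                          r_per_cell: float,
                          c_wire_per_cell: float,
                          c_gate_per_transistor: float) -> float:
    """Elmore delay to the far end of a word line modeled as an RC ladder.

    Each cell adds series resistance r_per_cell and shunt capacitance
    c_wire_per_cell + 2 * c_gate_per_transistor (two pass-transistor gates).
    For N identical sections the delay is r * c * N * (N + 1) / 2.
    Units are up to the caller (for example ohms and farads give seconds).
    """
    if n_cells < 0:
        raise ValueError("n_cells must be non-negative")
    c_per_cell = c_wire_per_cell + 2 * c_gate_per_transistor
    return r_per_cell * c_per_cell * n_cells * (n_cells + 1) / 2


# Illustrative comparison with made-up, unitless numbers: halving the gate
# capacitance of the pass transistors lowers the estimated delay.
small_gates = wordline_elmore_delay(64, 1.0, 1.0, 0.5)
large_gates = wordline_elmore_delay(64, 1.0, 1.0, 1.0)
```

Because the estimate grows roughly with the square of the number of cells on the line, any reduction in per-cell gate load matters more as rows get longer.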
We set all transistor lengths to the minimum, which is 2λ (= 0.35 µm) for the target 0.35 µm process. Based on simulation, we set the widths of all transistors to 0.4 µm, the minimum width for the target process.
The next step is to lay out the leaf cell. The schematic diagram corresponding to the placement of transistors and the layout for an SRAM cell are given in Figure 3.3. The placement of the transistors is intended for cell abutting.
Note that all the I/O signals of the cell (word, bit+, bit-, VDD, and GND) use abutting. The layout allows both horizontal and vertical cell abutting. Vertically, the cell above this one will be flipped on the x-axis so that the n-well and VDD lines are shared. Similarly, the cell below will also be mirrored on the x-axis so that the n-diffusion and bit lines can be shared. This overlap of layers makes the layout more compact.
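The abutting scheme described above can be sketched as a simple tiling routine of the kind a generator might use. This is a minimal illustration, not the generator's actual code: the `Placement` type, the `tile_array` function, and the cell dimensions are all hypothetical, and the coordinates are taken to be the lower-left corner of each cell's bounding box after any mirroring.

```python
from typing import List, NamedTuple


class Placement(NamedTuple):
    row: int
    col: int
    x: float          # lower-left x of the cell bounding box
    y: float          # lower-left y of the cell bounding box
    mirrored_x: bool  # True if the cell is flipped about the x-axis


def tile_array(rows: int, cols: int,
               cell_w: float, cell_h: float) -> List[Placement]:
    """Place rows x cols leaf cells edge to edge with no routing gaps.

    Every other row is mirrored about the x-axis so that vertically
    adjacent cells face each other and can share supply and well geometry.
    """
    placements = []
    for r in range(rows):
        for c in range(cols):
            placements.append(
                Placement(r, c, c * cell_w, r * cell_h, mirrored_x=(r % 2 == 1))
            )
    return placements
```

Calling `tile_array(4, 8, cell_w, cell_h)` with the measured leaf cell dimensions would yield 32 placements in which rows 1 and 3 are mirrored.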
Figure 3.4 shows the schematic and layout of the bit-line conditioning circuit. The gates of the NFETs are tied to Vdd, so the bit-line conditioning circuit is always turned on. This avoids the complexity of generating a precharge signal. It also allows the bit-lines to be equalized to Vdd-Vtn between two access cycles, when the column is deselected. When two RAM cells in the same column that hold opposite values are accessed in succession, the output has to switch first to an equalized state and then to the opposite logic state. Since the capacitance on the bit lines is quite large, the time required to switch the differential from one state to the other becomes a significant portion of the overall access time. Equalizing the bit-lines between accesses reduces this time. The charge transistors must be as small as possible, so that they do not override the value in the latch during read and write operations. Simulation showed that the charge transistors performed optimally with W = 0.4 µm and L = 0.35 µm. The layout of the leaf cell allows cell abutting of the bit lines.
During a read operation, the selected latch outputs the stored value onto the two bit-lines. Since the bit-lines are always precharged, the bit-line differential voltage degrades. We use sense amplifiers to improve the differential voltage from the bit-lines. The main advantage of using differential bit-lines is common-mode rejection, which reduces noise effects and signal degradation.
Typically, a single amplifier is used for each column of bit-lines, as shown in Figure 3.5. In our SRAM design, however, a single sense amplifier is shared among multiple columns by inserting the column decoder pass-transistors between the bit-lines and the amplifier. This results in area savings and power reduction.
Simulation showed that this configuration performed better than having a sense amplifier for each column, since it reduces the drive load of the sense amplifier. This configuration also allows the sense amplifier to be isolated from the bit-lines at all times except during a read operation. Because the sense amplifiers are not driven by the bit-lines at all times, the switching activity on the sense outputs is reduced.
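The sharing arrangement can be illustrated with a small sketch. The function names and the `columns_per_amp` parameter are hypothetical; the chapter does not state how many columns share one amplifier.

```python
from typing import Optional


def sense_amp_count(n_columns: int, columns_per_amp: int) -> int:
    """Number of sense amplifiers when each one serves a group of columns."""
    if columns_per_amp <= 0 or n_columns % columns_per_amp != 0:
        raise ValueError("n_columns must be a positive multiple of columns_per_amp")
    return n_columns // columns_per_amp


def connected_column(amp_index: int, column_select: int,
                     columns_per_amp: int, read_enabled: bool) -> Optional[int]:
    """Column whose bit-lines reach the given amplifier, or None if isolated.

    The column decoder pass transistors connect exactly one column of the
    group to the amplifier, and only while a read is in progress.
    """
    if not read_enabled:
        return None  # amplifier is isolated from all bit-lines
    if not 0 <= column_select < columns_per_amp:
        raise ValueError("column_select out of range")
    return amp_index * columns_per_amp + column_select
```

With 32 columns and a hypothetical grouping of 8 columns per amplifier, `sense_amp_count(32, 8)` gives 4 amplifiers instead of 32.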
Figure 3.6 shows the schematic and layout of a sense amplifier. A cross-coupled amplifier is used for the sense amplifier. Once a memory cell is selected for the read operation, the voltage on one of the complementary bit lines will start to drop slightly. Suppose that bit+ is higher than bit-. As a result, one of the NFETs, M3, is turned on, causing sense- to be pulled low. Consequently, one of the PFETs, M2, is turned on, pulling the sense+ output up to a high value. The positive feedback of the cross-coupled PFETs accelerates sensing by reinforcing the low value that M3 places on M2’s gate (sense-).
The sense amplifier is the key component that limits the read time. Since transistor sizing affects the speed of the sense amplifier, simulation was performed for different transistor sizes. The fastest configuration is obtained when the two NFETs (M1 and M4) are set to W = 0.7 µm, L = 0.4 µm and the remaining transistors are set to W = 0.4 µm. The layout shown in Figure 3.5 is the fastest configuration and also uses cell abutting of VDD and GND.
Leaf Cell Simulation Results
After custom layout of the leaf-cells in Cadence Virtuoso, the design rule checker (DRC) was used to verify that all leaf-cell layouts met the TSMC 0.35 µm design rules. The leaf-cells were first used to create a layout for a small test circuit to verify operation and measure preliminary performance results. The test circuit consisted of an SRAM cell core connected to the bit-line conditioning circuit through the bit-lines, with a sense amplifier to amplify the read output. The sense amplifier is isolated from the bit-lines when the cell is not being read. Following DRC verification, parasitic capacitances were extracted from the layout. From the extracted circuit, a SPICE netlist was generated using Analog Artist, and simulations were performed using Avanti HSPICE. The extracted netlist was simulated for the following test case.
Write 0: word = 1, write = 1, write_data = 1 -> 0
Read 0: word = 1, write = 0, write_data = 0 -> 1 -> 0 (shouldn’t affect contents)
Write 1: word = 1, write = 1, write_data = 0 -> 1
Read 1: word = 1, write = 0, write_data = 1
Turn-off RAM Cell: word = 0, write = 0
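The intended logical outcome of this test case can be captured in a purely behavioral sketch. It says nothing about timing or voltage levels, which is what the HSPICE run measures; the `SramBit` class and its `step` method are hypothetical names used only for illustration, and each step applies the final value of write_data for that phase.

```python
from typing import Optional


class SramBit:
    """Logic-level model of one cell with its write and read paths."""

    def __init__(self, initial: int = 0) -> None:
        self.cell = initial  # value held in the latch

    def step(self, word: int, write: int, write_data: int = 0) -> Optional[int]:
        """Apply one set of inputs; return the sensed value on a read, else None."""
        if not word:
            return None             # cell isolated from the bit lines; latch holds
        if write:
            self.cell = write_data  # write path overrides the latch
            return None
        return self.cell            # read: write_data must not affect the cell


bit = SramBit(initial=1)
bit.step(word=1, write=1, write_data=0)              # Write 0
read0_a = bit.step(word=1, write=0, write_data=1)    # Read 0 while write_data toggles
read0_b = bit.step(word=1, write=0, write_data=0)
bit.step(word=1, write=1, write_data=1)              # Write 1
read1 = bit.step(word=1, write=0, write_data=1)      # Read 1
off = bit.step(word=0, write=0)                      # Turn off the RAM cell
```

In this model both reads of 0 return 0 regardless of write_data, the read of 1 returns 1, and the latch still holds 1 after the cell is turned off.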
Figure 3.7 presents the simulation graph for a read/write sequence on a single bit. The cell trace represents the value stored in the latch, while the bit trace represents the value on the bit lines. The output from the sense amplifier is labeled as sense. From the plot, we can see that the value in the cell node is driving the bit line. Because of the bit-line conditioning circuitry (and also the bit-line capacitance), the bit line is not pulled to a good ‘0’ during a read of ‘0’. However, the sense amplifier recovers the original value after some delay. After 20 ns, the RAM cell is turned off, so the bit-line conditioning circuit drives the bit value, while the sense node floats towards an equalized value. Note that the cell still maintains the stored ‘1’, regardless of CEN being off.
Characteristics for this cell are provided in Table 3.1. Power dissipation was obtained using HSPICE’s .measure statement. Static power dissipated was obtained by taking the average of the two power dissipations, under the sense output at a high and the sense output at a low. Dynamic power was taken as the average of power dissipated during a change in the output due to an input change. For this example, the dynamic power dissipation included the average of dynamic power dissipation from both R/W’ and write_data changes. Nodal capacitances for the cell were obtained from HSPICE by adding the captab (capacitance table) option to the .option statement.
Decoders are needed to generate the word and column select signals for the SRAM. The input to the decoder is the address of the selected cell and the control signals. All decoders are implemented in a tree structure, as shown in Figure 3.8. Minimum-sized NFETs (W = 0.4 µm, L = 0.35 µm) are used as pass transistors in the decoder.
When Enable is active (low), the selected decoded output is pulled down to a good logic ‘0’ value due to the use of NFETs. All unselected outputs are floating. When Enable is disabled (high), the selected output is at a poor logic ‘1’, and all unselected outputs are floating. To prevent unselected outputs from floating, pull-up buffers are necessary at each output. The design of pull-up buffers is explained next.
Pull-up Buffers for the Decoder
In addition to pulling up unselected lines, a buffer is also needed to produce a good ‘1’ or ‘0’ for the select lines. A buffer is therefore responsible both for pulling up unselected lines and for buffering the output so that the drive strength is increased. As shown in Figure 3.8, a buffer, in fact an inverter, is added at every decoded output in our design. Note that the selected output is at ‘1’ due to the inversion. The result is that all decoder outputs are zero except for the output that is selected by the input address. The buffer is placed at the output of the decoder, as shown in Figure 3.9.
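At the logic level, the combined behavior of the pass-transistor tree, the pull-ups, and the inverting buffers can be sketched as follows. This is an idealized model that treats a poor ‘1’ as a ‘1’ and ignores the analog levels discussed in the simulations; the function name and signature are illustrative.

```python
from typing import List


def decode(address: int, n_bits: int, enable_n: int) -> List[int]:
    """Buffered outputs of an n-bit tree decoder with active-low enable.

    The NFET tree passes the Enable level to the addressed line only.
    Every other line floats and is restored to 1 by its pull-up.
    The inverter at each output then flips all lines.
    """
    n_outputs = 1 << n_bits
    if not 0 <= address < n_outputs:
        raise ValueError("address out of range")
    if enable_n not in (0, 1):
        raise ValueError("enable_n must be 0 or 1")
    raw = [1] * n_outputs        # unselected lines held high by the pull-ups
    raw[address] = enable_n      # selected line follows Enable through the tree
    return [1 - level for level in raw]  # inverting buffer on every output
```

With the decoder enabled, `decode(5, 3, 0)` is one-hot at index 5; with it disabled, `decode(5, 3, 1)` is all zeros.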
Two types of pull-up transistors as shown in Figure 3.10 are considered for the buffer design. Both designs require a pull-up transistor for an unselected line. To compare performance, HSPICE simulation for the two designs was performed.
The design in Figure 3.10 (a) uses an NFET as a pull-up resistor to pull up an unselected line. However, in order not to pull up the selected line, the driving capability of the pull-up transistor needs to be low. Therefore, the width is set to the minimum size of W = 0.4 µm and the length to L = 0.75 µm.
Figure 3.11 shows the simulation results of the buffer on a 1 kB SRAM. When the decoder is disabled, the decoder outputs (equivalently, the inverter outputs) are at 0 V for both selected and unselected lines. However, when the decoder is enabled, the selected decoder output is at only 0.4 V and fails to pull up high. This is because the pull-up transistor is too strong for the selected line to be pulled down to a sufficiently low value. Therefore, this buffer design cannot function properly unless the driving capability of the pull-up transistor is reduced further. Doing so requires increasing the length (since the width is already at the minimum), which increases area, so this configuration is not adopted in our design.
Alternatively, the design in Figure 3.10 (b) uses a PFET with W = 0.4 µm and L = 0.75 µm. In this case, the gate of the PFET samples the value from the line. If an unselected line is floating to ‘0’, then it will be pulled up by the PFET. Figure 3.12 shows the simulation results for the buffer design in Figure 3.10 (b) on a 1 kB SRAM.
Figure 3.12 shows that when the decoder is enabled, the selected decoder (i.e., inverter) output is pulled up to Vdd (= 3.3 V). Note that the inverter input of the selected line is sufficiently low (= 0.9 V) to drive the inverter output to Vdd. Since this design works well, it is adopted for our final design. Figure 3.13 shows the schematic and layout of the final decoder buffer design, which includes a pull-up PFET with L = 0.75 µm, W = 0.4 µm and an inverter with L = 0.35 µm, W = 0.4 µm.
Leaf cell layout is critical to the performance of the SRAM. In this chapter we examined the design and layout of each leaf cell. The performance of each cell has been measured and verified through SPICE simulations.