Static Random Access Memory (SRAM)



Table of Contents
Abstract 
Introduction 
Organization 
Operation 
Circuit Features 
Results 

Abstract-This report describes two embedded 8K CMOS SRAMs designed to provide on-chip memory for an MP3 decoder chip.  The first array, which stores the data memory, is 8K x 16 bits, while the second array, which stores the program memory, is 8K x 24 bits.  The access time for both arrays is 1.56 ns using a proposed 10 ns cycle time.  A TSMC 0.25 um process was used to implement both SRAMs.  The organization of these arrays, the sense-amp detection scheme, and examples of the read and write operations are described below.

Introduction-The use of on-chip SRAM in an application such as a DSP provides a fast and reliable way to store data that remains valid across random read and write operations.  The program memory provides enough space for the chip to load the entire decoding program, so the software is available on chip, allowing for faster program execution.  The data memory is designed so that the processor can cache its data operations during execution.
   SRAMs of these sizes become difficult to design because of the large wire capacitances that are inherent to long bit lines and word lines.  Various layout and organizational techniques must be used to minimize these capacitances and speed up access times.  To reduce the parasitic delays due to wire capacitance, both arrays were divided into four blocks.  This technique shortens the bit lines, greatly reducing the access time for both the read and write operations.  Column multiplexing was used to further reduce the length of the word lines, allowing for even faster access.  Furthermore, the use of column decoders reduces the complexity of the row decoder.
    In addition to the organizational techniques used to improve the access time, differential sense amplifiers were used to quickly sense the voltage difference that develops on the bit lines during a read.  This greatly improves the read access time, since there is no need to wait for a bit line to fully discharge before determining the output.
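    As a rough illustration of the benefit of blocking, the sketch below compares the RC time constant of a bit line spanning all 4 x 128 = 512 rows of a monolithic array with one confined to a single 128-row block.  The per-cell resistance and capacitance values are assumptions chosen only to show the scaling, not figures from the actual layout.

    # Rough sketch: bit-line delay of a monolithic array vs. one 128-row block.
    # C_CELL and R_CELL are assumed per-cell values, not extracted from layout.
    C_CELL = 2e-15      # assumed bit-line capacitance contributed per cell (F)
    R_CELL = 0.5        # assumed bit-line wire resistance per cell (ohms)

    def bitline_tau(rows):
        """Approximate distributed-RC time constant of a bit line spanning `rows` cells."""
        return 0.38 * (R_CELL * rows) * (C_CELL * rows)

    full_array = bitline_tau(512)   # bit line running the full 4 x 128 rows
    one_block  = bitline_tau(128)   # bit line confined to one block
    print(f"monolithic: {full_array*1e12:.2f} ps, blocked: {one_block*1e12:.2f} ps "
          f"(~{full_array/one_block:.0f}x shorter)")

    Because both the resistance and the capacitance scale with the length of the line, cutting the bit line to a quarter of its length reduces this first-order delay by roughly a factor of sixteen.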
 
Organization-Both arrays consist of four blocks.  Each block in the data memory consists of 128 wordlines (rows) and 256 columns.  The program memory has blocks that contain 128 rows and 384 columns.  Aside from the difference in the number of columns, the two arrays are identical.  In fact, all of the peripheral circuitry was sized to accommodate the larger array so that the same layouts could be used in both designs.  The block selection was implemented with simple static logic gates using two address bits to select one of the four blocks.  The block select signals were used to qualify the wordlines, in addition to enabling the read/write circuitry associated with each block.
    The row decoding was accomplished using eight four-input dynamic NOR decoders.  Each row decoder was enabled by a pre-decode signal.  The pre-decoder was designed with static logic and used three address signals.  Each NOR decoder provided 16 unique outputs, which across the eight decoders gives 128 signals.  These signals were then ANDed with the block select signals to qualify the wordlines.
    Finally, the column decoding was implemented using a four-input dynamic NOR decoder.  This decoder controls the 16-to-1 column multiplexers placed at the output side of each block.
 
Organization Summary

  Block Select     2 Address Bits
  Pre-Decode       3 Address Bits
  Row Decode       4 Address Bits
  Column Decode    4 Address Bits

  Total = 13 Address Bits (2^13 = 8K)
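A minimal sketch of how a 13-bit address could be split into these fields is shown below.  The particular bit ordering (block select in the top bits, column select in the bottom bits) is an assumption for illustration; only the field widths come from the design.

    # Hypothetical split of a 13-bit address into the fields in the summary above.
    def decode_address(addr):
        assert 0 <= addr < 2 ** 13         # 8K addressable words
        block  = (addr >> 11) & 0x3        # 2 bits -> one of 4 blocks
        predec = (addr >> 8) & 0x7         # 3 bits -> one of 8 pre-decode groups
        row    = (addr >> 4) & 0xF         # 4 bits -> one of 16 rows in the group
        column = addr & 0xF                # 4 bits -> one of 16 mux columns
        wordline = predec * 16 + row       # 8 groups x 16 rows = 128 wordlines
        return block, wordline, column

    print(decode_address(0x1ABC))          # -> (3, 43, 12)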

 
Operation-The SRAMs support two operations, read and write. Using the clock, chip enable, and write signals, in addition to a 13-bit address, the arrays are able to either read or write a word anywhere in their addressable space. The clock is used both to precharge the bit lines and to enable a read or write operation. During the half of the cycle when the clock is low the bit lines are precharged high, and during the half when the clock is high a read or write takes place.
The arrays are active when the clock is high and can be thought of as transparent latches. When the clock is high and both the write and chip enable signals are also high, a write operation occurs. Similarly, when the clock is high while chip enable is high and write is low, a read operation occurs. The 13-bit address simply specifies where in the array the data is stored.
Timing is an important concern in the operation of SRAMs. First, the address must be valid for the entire duration of the read or write. Second, the access half of the clock cycle must be long enough for the SRAM cells to change state when writing to the array. In addition, the precharge half of the clock cycle must be long enough to fully precharge the bit lines. Using a 10 ns clock cycle, we found that both halves of the clock cycle were adequate for proper operation.
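The sketch below is a behavioral model of this protocol, with a dictionary standing in for the cell array; the signal names (clk, ce, we) and the class name are illustrative placeholders, not the names used on the chip.

    # Behavioral model of the read/write protocol: precharge while the clock is
    # low, access while it is high.  Names and storage are placeholders.
    class SramModel:
        def __init__(self, words=8192, width=16):
            self.mem = {}
            self.words, self.width = words, width

        def cycle(self, clk, ce, we, addr, data_in=0):
            if not clk or not ce:                # clock low (precharge phase) or
                return None                      # chip not enabled: no access
            assert 0 <= addr < self.words
            if we:                               # clk high, ce high, we high -> write
                self.mem[addr] = data_in & ((1 << self.width) - 1)
                return None
            return self.mem.get(addr, 0)         # clk high, ce high, we low -> read

    sram = SramModel()
    sram.cycle(clk=1, ce=1, we=1, addr=0x0123, data_in=0xBEEF)
    print(hex(sram.cycle(clk=1, ce=1, we=0, addr=0x0123)))    # -> 0xbeef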

Circuit Features-The bit lines are precharged using NMOS transistors to provide a precharge voltage of (Vdd - Vt). This has the following advantages: the N-channel mux does not incur a delay in transmitting a level of (Vdd - Vt) from its drain terminal to its source terminal, and the common-mode requirements of the sense amps are less stringent since the maximum common-mode level on the bit lines is now (Vdd - Vt). Hence the sense amps are biased within their active region throughout the read cycle.

        For each bit/bit_bar line, two sets of precharge devices were necessary in order to precharge both sides of the column multiplexer simultaneously. This was required because of charge sharing that occurred at the mux output node.
        The sense amp and skewed inverter were designed to signal a valid high or low by detecting a 100 mV difference between the bit and bit_bar lines. The differential sensing scheme also provides immunity from common-mode noise picked up through parasitic coupling. From the timing diagram it can be seen that the majority of the delay during a read or write operation is due to the delay in driving the wordline.
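        For illustration, the sketch below models the sensing decision, assuming a Vdd of 2.5 V and a Vt of 0.5 V for the 0.25 um process; these values and the function name are placeholders, not measured circuit parameters.

    # Precharge level and differential sensing decision (illustrative values).
    VDD, VT = 2.5, 0.5            # assumed supply and NMOS threshold voltages
    PRECHARGE = VDD - VT          # NMOS precharge leaves bit lines at Vdd - Vt
    SENSE_MARGIN = 0.100          # output declared valid at a 100 mV split

    def sense(v_bit, v_bit_bar):
        """Return the read value once the bit-line differential exceeds 100 mV."""
        diff = v_bit - v_bit_bar
        if abs(diff) < SENSE_MARGIN:
            return None           # not enough separation yet; keep waiting
        return 1 if diff > 0 else 0

    # bit stays near the precharge level while bit_bar discharges slightly
    print(sense(PRECHARGE, PRECHARGE - 0.12))   # -> 1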

Timing Diagram

Results-To obtain the timing results we constructed a bit slice of the SRAM. A bit slice is simply one bit of memory accessed through the worst-case RC delays. Pi models were used within the schematic to approximate these delays, and HSPICE was used to simulate the circuit (see waveform). The reason for using a bit slice, as opposed to simulating the entire array, is that the full array is simply too large for the simulator to handle. In addition, using a bit slice simplifies the debugging process, since there are fewer nodes to examine.
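A minimal sketch of the kind of single pi-segment, first-order delay estimate used for these worst-case wires is given below; the driver, wire, and load values are placeholders rather than the extracted parasitics.

    # One pi segment: the distributed wire RC is lumped as C/2 - R - C/2, and a
    # first-order (Elmore) delay is computed through it into a load capacitance.
    def elmore_delay(r_driver, r_wire, c_wire, c_load):
        c_near, c_far = c_wire / 2, c_wire / 2
        return r_driver * (c_near + c_far + c_load) + r_wire * (c_far + c_load)

    # placeholder values: 1 kohm driver, 200 ohm / 300 fF wire, 50 fF load
    print(f"{elmore_delay(1e3, 200, 300e-15, 50e-15) * 1e12:.0f} ps")   # -> 390 ps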
As you can see from the waveform below, the access time is 1.56 ns. Here we have defined the access time as the delay from when the clock goes high to when the data is valid on the output.

 


Simulated Access Time