FIFOs Made Easy



by Peter Alfke, Director of Applications Engineering, Xilinx, Inc.
peter.alfke@xilinx.com (1/15/05)


Virtex-4 FPGAs have a complete FIFO controller in each block RAM.


A FIFO is a memory subsystem where a data sequence can be written and retrieved in exactly the same order. No explicit addressing is required, and the write and read operations can be completely independent, using unrelated clocks.

“First-In First-Out” has been used in accounting for hundreds of years, as well as in data queues since the early days of computers. In 1970, Fairchild Semiconductor introduced the first integrated FIFO, the 3341.

Today, dedicated and much larger FIFO ICs are available, and mid-sized FIFOs are often implemented in Xilinx® FPGAs using the dual-ported block RAMs supported by soft cores for addressing and control.

A FIFO is an ideal subsystem: simple and user-friendly on the outside but complex and demanding in its implementation details. The design seems trivial: a RAM with two independently clocked ports (one for writing, one for reading) plus two independent address counters to steer write and read data.

It looks easy, but the difficulty appears when you look deeper: specifically, in decoding and synchronizing the obligatory status outputs that indicate the extreme conditions of EMPTY and FULL. Even experienced designers have had problems decoding these two conditions in a fail-safe way, especially when the FIFO operates with two independent clocks of several hundred megahertz.

Because fast asynchronous design is notoriously difficult, Virtex-4™ FPGAs now have a dedicated FIFO addressing and control circuit right inside each block RAM. Using the Virtex-4 block RAM FIFO option, you can be assured of reliable operation at a clock rate up to 500 MHz, without using any logic slices in the Virtex-4 fabric.

Virtex-4 FIFO
The FIFO shown in Figure 1 behaves like a “black box.” You supply the data (4, 9, 18, or 36 bits wide), a continuously running write clock and its enable signal, and a continuously running read clock and read clock enable. Output data has the same width as the input data, unlike the basic block RAM where the two widths can be different.

As the last data entry is read, EMPTY goes high on the same read clock edge that reads the final data. You should then keep the read operation disabled until the EMPTY output has gone inactive again.

Note that both the rising and falling edge of the EMPTY status signal are made synchronous with the read clock, giving you a totally synchronous interface. If read clock enable stays active after the FIFO is empty, the read error flag is activated, but FIFO content and addressing are not disturbed.
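The flag behavior described above can be illustrated with a small behavioral model. This is a minimal single-clock sketch in Python, not a description of the Virtex-4 circuit: the class name `FifoModel`, the `depth` parameter, and the rule that the error flag clears on the next successful read are all assumptions made for illustration.

```python
from collections import deque

class FifoModel:
    """Single-clock behavioral sketch of the EMPTY and read-error flags (not the hardware)."""

    def __init__(self, depth):
        self.depth = depth          # hypothetical capacity in words
        self.words = deque()
        self.read_error = False

    @property
    def empty(self):
        return len(self.words) == 0

    @property
    def full(self):
        return len(self.words) == self.depth

    def write(self, word):
        """Store one word; return False and store nothing if the FIFO is full."""
        if self.full:
            return False
        self.words.append(word)
        return True

    def read(self):
        """Return the oldest word. Reading while empty only raises the error flag."""
        if self.empty:
            self.read_error = True   # content and addressing stay untouched
            return None
        self.read_error = False      # assumption: flag clears on a valid read
        return self.words.popleft()

fifo = FifoModel(depth=4)
for word in (10, 20, 30):
    fifo.write(word)
out = [fifo.read() for _ in range(3)]   # words come back in the order written
extra = fifo.read()                     # read attempted while EMPTY is active
```

After the three reads, `fifo.empty` is true; the extra read returns nothing and sets `fifo.read_error`, while a later write and read work normally.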

ALMOST EMPTY and ALMOST FULL are programmable status outputs, available as a warning to slow down the read or write process, or as an indication of the data level in the FIFO (“dipstick”).

Implementation Details
Understanding FIFO design details is not necessary. It is all “under the hood,” and works without user intervention. But for the curious reader, let’s briefly explain.

Detecting FULL and EMPTY requires detecting identity of the write and read address pointers, which generally do not share a common clock. Binary counters would generate unacceptable glitches on the comparator output; using Gray-coded counters is the well-known solution to this problem.
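To see why Gray code helps, it is enough to count how many bits change from one address to the next. The sketch below does this for a 4-bit counter (the width is an arbitrary choice for illustration). A binary counter sometimes changes several bits at once, so a comparator clocked in the other domain can sample a mixture of old and new bits; a Gray counter changes exactly one bit per step, including at wraparound.

```python
def to_gray(n):
    """Convert a binary value to its reflected Gray code."""
    return n ^ (n >> 1)

def bits_changed(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

WIDTH = 4                 # illustrative counter width
SIZE = 1 << WIDTH

# Bits that toggle on each increment, including the wrap from SIZE-1 back to 0.
binary_flips = [bits_changed(i, (i + 1) % SIZE) for i in range(SIZE)]
gray_flips = [bits_changed(to_gray(i), to_gray((i + 1) % SIZE)) for i in range(SIZE)]

worst_binary = max(binary_flips)   # all WIDTH bits toggle at 7 -> 8 and 15 -> 0
worst_gray = max(gray_flips)       # never more than one bit
```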

The simplest way to build Gray counters is to start with a binary counter and synchronously convert its content into Gray code. The binary address counter values can then be used to calculate the programmable offset for detecting ALMOST FULL and ALMOST EMPTY.
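The scheme in this paragraph can be sketched as follows. The depth of 512, the names `Pointer`, `fill_level`, `almost_empty`, and `almost_full`, and the exact threshold comparisons are assumptions for illustration; the hardware's threshold semantics are defined in the user guide, not here. The sketch also subtracts the two binary pointers directly, which ignores the clock-domain crossing that the real circuit must handle, and a plain modulo difference cannot tell a completely full FIFO from an empty one.

```python
DEPTH = 512   # hypothetical FIFO depth in words (a power of two)

class Pointer:
    """Binary address counter with a Gray-coded copy updated on the same clock edge."""

    def __init__(self):
        self.binary = 0
        self.gray = 0

    def advance(self):
        self.binary = (self.binary + 1) % DEPTH
        self.gray = self.binary ^ (self.binary >> 1)   # binary -> Gray conversion

def fill_level(write_ptr, read_ptr):
    """Words currently stored, computed from the binary counter values."""
    return (write_ptr.binary - read_ptr.binary) % DEPTH

def almost_empty(write_ptr, read_ptr, offset):
    return fill_level(write_ptr, read_ptr) <= offset

def almost_full(write_ptr, read_ptr, offset):
    return fill_level(write_ptr, read_ptr) >= DEPTH - offset

wr, rd = Pointer(), Pointer()
for _ in range(500):
    wr.advance()          # 500 words written
for _ in range(10):
    rd.advance()          # 10 words read back
level = fill_level(wr, rd)
```

With these numbers `level` is 490, so an offset of 32 makes `almost_full` true and `almost_empty` false. The Gray-coded copies are what would be compared across clock domains; the binary values serve the offset arithmetic.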

Synchronization Issues
Because EMPTY can only be caused by a read operation, the leading edge is naturally synchronous with the read clock. But the trailing edge is caused by a write operation and is thus synchronous with the “wrong” clock. Moving the trailing edge of EMPTY over onto the read clock domain needs some flip-flops and invites the specter of metastability.

Virtex-4 FPGAs use a conservative synchronizer design that has been demonstrated to work reliably at a 500 MHz read clock rate. We ran a week-long test with ~200 and ~500 MHz asynchronous clock rates, generating EMPTY more than 10^14 times without a single failure. The synchronizer delays the trailing edge of EMPTY by a few read clock periods. This latency is acceptable, since it does not affect top performance.
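The latency on the trailing edge of EMPTY follows from passing the flag through a chain of flip-flops clocked by the read clock. The sketch below models such a chain in Python. The article does not state how many stages the Virtex-4 synchronizer uses, so the two stages here are an assumption, as is the class name `Synchronizer`. The model shows only the resulting delay; it does not simulate metastability itself.

```python
class Synchronizer:
    """Chain of flip-flops clocked by the destination (read) clock."""

    def __init__(self, stages=2, initial=1):
        self.chain = [initial] * stages   # assumption: 2 stages, EMPTY starts high

    def clock(self, async_in):
        """One read clock edge: sample the input and shift the chain by one stage."""
        self.chain = [async_in] + self.chain[:-1]
        return self.chain[-1]             # value seen by the read-side logic

sync = Synchronizer(stages=2, initial=1)

# Raw EMPTY as sampled on successive read clock edges: a write clears it
# just before the third edge.
raw_empty = [1, 1, 0, 0, 0, 0]
seen = [sync.clock(value) for value in raw_empty]
```

The raw flag is low from the third sample on, but the synchronized output in `seen` does not go low until one edge later, which is the kind of added latency the text describes.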

In a similar way, the trailing edge of FULL is synchronized to the write clock. The software default is for FULL to have one write clock latency. We therefore recommend using ALMOST FULL instead.

A well-designed FIFO buffer should never go FULL, and should go EMPTY only when you want to drain the last word from the buffer.

Conclusion
The hard-coded FIFO controller is available in every Virtex-4 block RAM, and uses no additional resources in the fabric. It also saves you from making any complex, time-consuming, and risky design decisions.

For a detailed description of the Virtex-4 FIFO controller, visit the Virtex-4 User Guide on the Xilinx website at www.xilinx.com/bvdocs/userguides/ug070.pdf.

Verifying the EMPTY Flag Synchronization
The only tricky detail in a FIFO with unrelated read and write clocks is the proper synchronization of the EMPTY and FULL flags that cross clock boundaries. Any design that might thus be exposed to metastability problems deserves special attention and scrutiny.

At Xilinx, we tested the EMPTY logic exhaustively by writing data into the FIFO at 200 MHz and reading it out at 500 MHz, which makes it go EMPTY soon after each write cycle (Figure 2). The detection logic was thus exercised, and the trailing edge of the EMPTY flag was re-synchronized to the read clock 200 million times a second.

More specifically, we wrote an ascending data sequence at 200 MHz and read it out at 500 MHz. We wrote the output data directly into a second FIFO at the same 500 MHz. We then read the second FIFO out at the original 200 MHz rate.

The combined dual FIFO forms a synchronous system, but with asynchronous data transfer between the two halves. When we synchronously subtracted the input data from the output data, the difference was constant, indicating flawless transfer at the 500 MHz read/write rate and no flag synchronization problem – even at this high rate.
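The structure of this test can be mimicked in software. The sketch below is a toy model only: the two clocks are integer periods on a common time grid (so they are not truly asynchronous), flag synchronizer latency is ignored, and the function name `run_dual_fifo` and the `start_delay` parameter are made up for illustration. It shows the checking idea: compare the word being written with the word coming out on the same slow clock edge, and look for a constant difference.

```python
from collections import deque

SLOW_PERIOD = 10   # 200 MHz clock period, in units of 0.5 ns
FAST_PERIOD = 4    # 500 MHz clock period, in units of 0.5 ns
FAST_PHASE = 1     # odd offset, so fast and slow edges never coincide here

def run_dual_fifo(n_words, start_delay=2):
    """Push an ascending sequence through two FIFOs; return input minus output per slow edge."""
    fifo_a = deque()   # written at 200 MHz, read at 500 MHz
    fifo_b = deque()   # written at 500 MHz, read at 200 MHz
    diffs = []
    next_in = 0
    for t in range(SLOW_PERIOD * n_words):
        if t % SLOW_PERIOD == 0:
            # Slow edge: read the second FIFO (after an initial fill delay),
            # then write the next ascending word into the first FIFO.
            if next_in >= start_delay and fifo_b:
                diffs.append(next_in - fifo_b.popleft())
            fifo_a.append(next_in)
            next_in += 1
        if t % FAST_PERIOD == FAST_PHASE and fifo_a:
            # Fast edge: move one word across whenever the first FIFO is not empty.
            fifo_b.append(fifo_a.popleft())
    return diffs

diffs = run_dual_fifo(1000)
```

In this model every entry of `diffs` equals `start_delay`. A lost or duplicated word anywhere in the chain would show up as a change in that difference.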

When the two clock frequencies are uncorrelated, each read clock cycle has a different phase relationship with respect to the write clock. During any second, the active read clock edge steps across the ~5 ns write clock period in ~200 million different phase orientations, thus creating a timing granularity of 0.025 femtoseconds (a femtosecond is one quadrillionth of a second). This resolution is millions of times better than any conventional deterministic test methodology can possibly achieve.

We ran this design for a whole week, with more than 10^14 operations, without any error.
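The figures quoted in this sidebar can be checked with simple arithmetic. The short Python calculation below uses the nominal 200 MHz write rate and the ~200 million phase positions per second given in the text; the variable names are illustrative.

```python
WRITE_HZ = 200_000_000               # nominal write clock rate
SECONDS_PER_WEEK = 7 * 24 * 3600     # 604,800 s

# Each write makes EMPTY go inactive once, so this is the weekly event count.
ops_per_week = WRITE_HZ * SECONDS_PER_WEEK

WRITE_PERIOD_S = 1 / WRITE_HZ        # about 5 ns
PHASES_PER_SECOND = 200_000_000      # figure quoted in the text

# Spacing between distinct phase positions across one write clock period.
granularity_s = WRITE_PERIOD_S / PHASES_PER_SECOND
granularity_fs = granularity_s / 1e-15
```

This gives about 1.2 x 10^14 operations in a week, consistent with "more than 10^14", and a phase spacing of about 0.025 femtoseconds.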
