A FIFO is a memory subsystem where a data
sequence can be written and retrieved in
exactly the same order. No explicit addressing
is required, and the write and read operations
can be completely independent, using
unrelated clocks.
“First-In First-Out” has been used in
accounting for hundreds of years, as well as in
data queues since the early days of computers.
In 1970, Fairchild Semiconductor introduced
the first integrated FIFO, the 3341.
Today, dedicated and much larger FIFO
ICs are available, and mid-sized FIFOs are
often implemented in Xilinx® FPGAs using
the dual-ported block RAMs supported by
soft cores for addressing and control.
A FIFO is an ideal subsystem: simple
and user-friendly on the outside but complex
and demanding in its implementation
details. The design seems trivial: a RAM
with two independently clocked ports (one
for writing, one for reading) plus two
independent address counters to steer
write and read data.
It may look easy, but the difficulty
appears on closer inspection,
specifically in the decoding and
synchronization of the obligatory status
outputs indicating the extreme conditions
of EMPTY and FULL. Even experienced
designers have had problems decoding
these two conditions in a fail-safe way,
especially when the FIFO operates with
two independent clocks of several hundred
megahertz.
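The "trivial" part of the design can be sketched in a few lines of Python. This is only a software model written for illustration: the class name `SimpleFifo` and its methods are hypothetical, both counters live in one thread, and the model therefore skips the hard part described above, which is decoding EMPTY and FULL across two unrelated clocks.

```python
class SimpleFifo:
    """Toy FIFO: a RAM plus independent write and read address counters."""

    def __init__(self, depth):
        self.depth = depth
        self.ram = [None] * depth
        self.wr_count = 0  # total words written so far
        self.rd_count = 0  # total words read so far

    @property
    def empty(self):
        # EMPTY: the read counter has caught up with the write counter.
        return self.wr_count == self.rd_count

    @property
    def full(self):
        # FULL: the write counter is exactly one RAM depth ahead.
        return self.wr_count - self.rd_count == self.depth

    def write(self, word):
        """Store one word; return False (and store nothing) when FULL."""
        if self.full:
            return False
        self.ram[self.wr_count % self.depth] = word
        self.wr_count += 1
        return True

    def read(self):
        """Return the oldest word, or None when EMPTY."""
        if self.empty:
            return None
        word = self.ram[self.rd_count % self.depth]
        self.rd_count += 1
        return word


fifo = SimpleFifo(depth=4)
for word in (10, 20, 30, 40):
    fifo.write(word)
print(fifo.full)                         # FULL after four writes
print([fifo.read() for _ in range(4)])   # words come back in write order
```

In this model the two comparisons are exact because both counters are read at the same instant. In hardware, each counter changes on its own clock, so the comparison can catch a counter in mid-transition.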
Because fast asynchronous design is
notoriously difficult, Virtex-4™ FPGAs
now have a dedicated FIFO addressing
and control circuit right inside each block
RAM. Using the Virtex-4 block RAM
FIFO option, you can be assured of reliable
operation at a clock rate up to 500
MHz, without using any logic slices in
the Virtex-4 fabric.
Virtex-4 FIFO
The FIFO shown in Figure 1 behaves like
a “black box.” You supply the data (4, 9,
18, or 36 bits wide), a continuously running
write clock and its enable signal, and
a continuously running read clock and
read clock enable. Output data has the
same width as the input data, unlike the
basic block RAM where the two widths
can be different.
As the last data entry is being read,
EMPTY goes high as a result of the read
clock that reads the final data. You should
then disable the read operation until the
EMPTY output has gone inactive again.
Note that both the rising and falling
edges of the EMPTY status signal are
made synchronous with the read clock,
giving you a totally synchronous interface.
If read clock enable stays active after
the FIFO is empty, the read error flag is
activated, but FIFO content and addressing
are not disturbed.
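The read-side behavior described above can be illustrated with a small Python model. It is a sketch under assumptions, not the Virtex-4 logic: the class `ReadPort`, the attribute `read_error`, and the choice to model the error flag as lasting one read clock are all hypothetical.

```python
class ReadPort:
    """Toy model of the read side of a FIFO preloaded with some words."""

    def __init__(self, words):
        self.words = list(words)
        self.rd_addr = 0          # read address counter
        self.read_error = False   # assumed to last for one read clock

    @property
    def empty(self):
        return self.rd_addr == len(self.words)

    def read_clock(self, read_enable):
        """Advance one read clock; return the word read, or None."""
        self.read_error = False
        if not read_enable:
            return None
        if self.empty:
            # Reading while EMPTY is flagged, but the address counter
            # and the stored data are left untouched.
            self.read_error = True
            return None
        word = self.words[self.rd_addr]
        self.rd_addr += 1
        return word


port = ReadPort([10, 20])
print(port.read_clock(True), port.read_clock(True))  # the two stored words
print(port.empty)                                    # EMPTY after the last read
print(port.read_clock(True), port.read_error)        # extra read: error flagged
```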
ALMOST EMPTY and ALMOST
FULL are programmable status outputs,
available as a warning to slow down the
read or write process, or as an indication of
the data level in the FIFO (“dipstick”).
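As an illustration of the "dipstick" idea, the four status outputs can be modeled as simple comparisons against the fill level. The function name, the parameter names, and the exact threshold comparisons (`<=`) are assumptions made for this sketch; the precise flag semantics are defined in the Virtex-4 User Guide, not here.

```python
def status_flags(level, depth, almost_empty_offset, almost_full_offset):
    """Return the four status flags for a FIFO holding `level` words.

    The offsets are the programmable distances from the two extremes
    at which the warning flags switch on (assumed semantics).
    """
    return {
        "EMPTY": level == 0,
        "ALMOST_EMPTY": level <= almost_empty_offset,
        "ALMOST_FULL": depth - level <= almost_full_offset,
        "FULL": level == depth,
    }


# Hypothetical 512-word FIFO with both warning offsets set to 4 words.
for level in (0, 3, 256, 509, 512):
    print(level, status_flags(level, 512, 4, 4))
```

A writer that throttles on ALMOST FULL rather than FULL keeps a margin of a few words, which is useful when the flag itself reaches the writer with some delay.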
Implementation Details
Understanding FIFO design details is not
necessary. It is all “under the hood,” and
works without user intervention. But for
the curious reader, let’s briefly explain.
Detecting FULL and EMPTY requires
detecting identity of the write and read
address pointers, which generally do not share
a common clock. Binary counters would generate
unacceptable glitches on the comparator
output; using Gray-coded counters is the
well-known solution to this problem.
The simplest way to build Gray counters
is to start with a binary counter and synchronously
convert its content into Gray
code. The binary address counter values can
then be used to calculate the programmable
offset for detecting ALMOST FULL and
ALMOST EMPTY.
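The reason Gray code helps can be shown with a short Python sketch. The conversion below is the standard binary-to-Gray formula (each bit XORed with its higher neighbor); the helper names are chosen for this example. The point is that a Gray-coded counter changes exactly one bit per increment, so a comparator sampling it from another clock domain sees either the old or the new value, never a mixture of several changing bits.

```python
def binary_to_gray(n):
    """Convert a binary count to Gray code: XOR each bit with the bit above it."""
    return n ^ (n >> 1)


def gray_to_binary(g):
    """Convert Gray code back to binary by XOR-folding successive right shifts."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def bits_changed(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


# 4-bit counter: compare how many bits flip on each increment.
for i in range(16):
    nxt = (i + 1) % 16
    print(
        f"{i:2d}->{nxt:2d}  binary flips {bits_changed(i, nxt)}  "
        f"Gray flips {bits_changed(binary_to_gray(i), binary_to_gray(nxt))}"
    )
```

The binary column shows up to four bits flipping at once (for example from 7 to 8), while the Gray column always shows one, including the wrap from 15 back to 0.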
Synchronization Issues
Because EMPTY can only be caused by a
read operation, the leading edge is naturally
synchronous with the read clock. But the
trailing edge is caused by a write operation
and is thus synchronous with the “wrong”
clock. Moving the trailing edge of EMPTY
over onto the read clock domain needs
some flip-flops and invites the specter of
metastability.
Virtex-4 FPGAs use a conservative synchronizer
design that has been demonstrated
to work reliably at a 500 MHz read clock
rate. We ran a week-long test with ~200 and
~500 MHz asynchronous clock rates, generating
EMPTY more than 10^14 times without
a single failure. The synchronizer delays the
trailing edge of EMPTY by a few read clock
periods. This latency is acceptable, since it
does not affect top performance.
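The asymmetry between the two edges of EMPTY can be sketched as a small Python model. This is an illustration only: the class name `EmptyFlag`, the three-stage depth, and the shift-register structure are assumptions, since the text says only that the delay is "a few" read clock periods, and metastability itself is not modeled.

```python
class EmptyFlag:
    """Toy model: EMPTY asserts at once, deasserts after a synchronizer delay."""

    def __init__(self, stages=3):
        # Shift register carrying the "data available" indication
        # from the write domain into the read clock domain.
        self.pipe = [False] * stages

    def read_clock(self, has_data_raw):
        """Advance one read clock and return EMPTY as the reader sees it.

        has_data_raw is the unsynchronized "pointers differ" result.
        """
        if not has_data_raw:
            # Going empty is caused by a read, so the leading edge needs
            # no synchronization: flush the pipeline immediately.
            self.pipe = [False] * len(self.pipe)
        else:
            # Leaving empty is caused by a write: shift it through.
            self.pipe = [True] + self.pipe[:-1]
        return not self.pipe[-1]


flag = EmptyFlag(stages=3)
raw = [False, True, True, True, True, False]
print([flag.read_clock(x) for x in raw])
```

In this model EMPTY stays high for a few extra read clocks after data arrives, which errs on the safe side: the reader may wait slightly longer than necessary, but it never reads a word that is not yet there.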
In a similar way, the trailing edge of
FULL is synchronized to the write clock.
The software default is for FULL to have
one write clock latency. We therefore recommend
using ALMOST FULL instead.
A well-designed FIFO buffer should
never go FULL, and should go EMPTY
only when you want to drain the last word
from the buffer.
Conclusion
The hard-coded FIFO controller is available
in every Virtex-4 block RAM, and uses no
additional resources in the fabric. It also
saves you from making any complex, time-consuming,
and risky design decisions.
For a detailed description of the Virtex-4
FIFO controller, visit the Virtex-4 User
Guide on the Xilinx website at www.xilinx.com/bvdocs/userguides/ug070.pdf.
Verifying the EMPTY Flag Synchronization
The only tricky detail in a FIFO with
unrelated read and write clocks is the
proper synchronization of the
EMPTY and FULL flags that cross
clock boundaries. Any design that
might thus be exposed to metastability
problems deserves special attention
and scrutiny.
At Xilinx, we tested the EMPTY
logic exhaustively by writing data into
the FIFO at 200 MHz and reading it
out at 500 MHz, which makes it go
EMPTY soon after each write cycle
(Figure 2). The detection logic was
thus exercised, and the trailing edge
of the EMPTY flag was re-synchronized
to the read clock 200 million
times a second.
More specifically, we wrote an
ascending data sequence at 200 MHz
and read it out at 500 MHz. We
wrote the output data directly into a
second FIFO at the same 500 MHz.
We then read the second FIFO out at
the original 200 MHz rate.
The combined dual FIFO forms a
synchronous system, but with asynchronous
data transfer between the
two halves. When we synchronously
subtracted the input data from the
output data, the difference was constant,
indicating flawless transfer at
the 500 MHz read/write rate and no
flag synchronization problem – even
at this high rate.
When the two clock frequencies
are uncorrelated, each read clock
cycle has a different phase relationship
with respect to the write clock.
During any second, the active read
clock edge steps across the ~5 ns
write clock period in ~200 million
different phase orientations, thus creating
a timing granularity of 0.025
femtoseconds (one quadrillionth of a
second). This resolution is millions
of times better than any conventional
deterministic test methodology can
possibly achieve.
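The granularity figure follows from dividing the write clock period by the number of distinct phase positions visited per second. The short Python check below reproduces the arithmetic using the approximate values quoted above; the variable names are chosen for this example, and exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction

write_clock_hz = 200_000_000       # ~200 MHz write clock
phases_per_second = 200_000_000    # ~200 million phase orientations per second

write_period_s = Fraction(1, write_clock_hz)        # 5 ns
granularity_s = write_period_s / phases_per_second  # spacing between phases
granularity_fs = granularity_s * 10**15             # seconds -> femtoseconds

print(float(write_period_s * 10**9), "ns write period")
print(float(granularity_fs), "fs timing granularity")
```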
We ran this design for a whole
week, with more than 10^14 operations,
without any error.
Printable PDF version of this article with graphics. (1/15/05) 318 KB