You’d like to use DDR-SDRAM as the storage
medium for OC-192 test pattern generation
to dramatically increase the length of
patterns available to you. However, when
you take the standard approach to architecting
this interface and attempt implementation
in an FPGA, you find that the
FIFO read clock enable signals don’t meet
the timing requirements. Your project
budget can’t afford an ASIC. How can you
get around this issue and make the FPGA
work as the needed interface?
Fortunately, the Xilinx digital clock
manager (DCM) provides the answer. As
long as you’ve got sufficient global clock
resources available, you can use the
DCM’s quadrature-phase outputs to clock
the outputs of the four FIFO groups
directly. This eliminates the need for clock
enable signals in the 320 MHz clock
domain and yields a design that will
achieve timing closure at that speed.
Standard Architecture
Figure 1 shows the standard architecture
used to design a streaming data interface
between DDR-SDRAM and OC-192 serializers.
All of the design blocks shown in
this figure, with the exception of the
FIFO/MUX controller, are available as
directly instantiated elements (the DDR
flip-flop), Xilinx CORE Generator™
modules (MUX x and FIFO n), or Xilinx-provided
reference designs (DDR-SDRAM
controller).
DDR-SDRAM Controller
The DDR-SDRAM controller is a Xilinx-provided
reference design that I modified
to provide a continuously streaming mode
of operation. The controller provides the
necessary address and control signals for
driving a standard PC-1600/2100 DDR-SDRAM
module and also converts the data
bus from a 64-bit DDR mode to a 128-bit
SDR mode to facilitate a simpler clocking
scheme inside the FPGA.
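
The width conversion can be pictured with a short behavioral sketch in Python. It is only an illustration, not the logic of the reference design: the function names are hypothetical, and the choice to place the rising-edge word in the upper 64 bits is an assumption made for this example.

```python
# Behavioral sketch: pair the 64-bit words captured on the rising and falling
# edges of the DDR clock into one 128-bit SDR word per clock cycle.
# Assumption: the rising-edge word occupies the upper 64 bits.

MASK64 = (1 << 64) - 1

def ddr_to_sdr(rising_word, falling_word):
    """Combine two 64-bit DDR transfers into one 128-bit SDR word."""
    return ((rising_word & MASK64) << 64) | (falling_word & MASK64)

def split_into_16bit_words(word128):
    """Slice a 128-bit word into eight 16-bit words, most significant first."""
    return [(word128 >> (16 * i)) & 0xFFFF for i in range(7, -1, -1)]

# Two DDR transfers whose 16-bit fields are numbered 0..7 for readability.
sdr = ddr_to_sdr(0x0000000100020003, 0x0004000500060007)
words = split_into_16bit_words(sdr)
print(words)
```

Each of the eight 16-bit slices would feed one of the eight FIFOs described in the next section.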
The controller stops fetching data from
memory and disables the write enable signal
to the FIFOs when it detects that
FIFO_0’s high watermark signal has gone
active (not shown in Figure 1).
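
A toy model may help show why the watermark keeps the FIFOs from overflowing. The FIFO depth, the watermark threshold, and the 2.5 ns simulation tick below are hypothetical values chosen for illustration; the article does not state them, and DRAM refresh is not modeled.

```python
# Toy model of high-watermark flow control on a single FIFO.
# Assumptions (not from the article): DEPTH, HIGH_WATERMARK, 2.5 ns tick.
DEPTH = 16            # hypothetical FIFO depth, in words
HIGH_WATERMARK = 12   # hypothetical threshold at which writes are disabled

def simulate_fifo(ticks):
    """Return (max occupancy, words written, words read) after `ticks` ticks.

    One tick is 2.5 ns, so the 100 MHz write side fires every 4 ticks and
    the read side, which takes one word every four 320 MHz cycles (80 MHz),
    fires every 5 ticks.
    """
    occupancy = written = read = max_occupancy = 0
    for t in range(ticks):
        # Write enable is gated off once the watermark is reached.
        if t % 4 == 0 and occupancy < HIGH_WATERMARK:
            occupancy += 1
            written += 1
        if t % 5 == 0 and occupancy > 0:
            occupancy -= 1
            read += 1
        max_occupancy = max(max_occupancy, occupancy)
    return max_occupancy, written, read

max_occupancy, written, read = simulate_fifo(1000)
print(max_occupancy, written, read)
```

Because the write side is faster than the read side, the occupancy climbs to the watermark and then hovers there, never reaching the full depth.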
The frequency of clock 1 is 100 MHz,
yielding a burst data rate of 12.8 Gbps.
This data rate is 28% higher than that
required by the serializer, leaving room for
overhead tasks such as DRAM refresh.
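
The arithmetic behind these figures can be checked with a few lines of Python. The 28% margin appears to be measured against the nominal OC-192 line rate of 9.95328 Gbps; that value is an assumption here, since the article does not state it. Measured against the 10.24 Gbps stream that the output stage delivers to the serializer, the margin works out to 25%.

```python
# Bandwidth check for the clock 1 domain (values from the article unless noted).
CLOCK1_HZ = 100_000_000        # clock 1 frequency
SDR_BUS_BITS = 128             # width of the SDR data bus inside the FPGA

# Assumption: nominal OC-192 line rate, not stated in the article.
OC192_LINE_RATE_BPS = 9_953_280_000
# Rate delivered to the serializer by the 320 MHz DDR output stage.
SERIALIZER_FEED_BPS = 10_240_000_000

burst_rate_bps = CLOCK1_HZ * SDR_BUS_BITS

def headroom(available_bps, required_bps):
    """Fractional excess of available bandwidth over required bandwidth."""
    return available_bps / required_bps - 1.0

print(f"burst rate: {burst_rate_bps / 1e9:.1f} Gbps")
print(f"headroom over OC-192 line rate: "
      f"{headroom(burst_rate_bps, OC192_LINE_RATE_BPS):.1%}")
print(f"headroom over 10.24 Gbps feed:  "
      f"{headroom(burst_rate_bps, SERIALIZER_FEED_BPS):.1%}")
```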
FIFOs
The eight FIFO blocks represent standard,
asynchronous FIFO elements generated
using the CORE Generator tool. The
FIFOs serve a dual purpose. First, they provide
elastic buffering between two asynchronous
clock domains. Second, they
work in conjunction with the MUX x and
FIFO/MUX controller blocks to provide
the data bus width reduction. This is necessary
to convert from the 128-bit data bus
presented by the DDR-SDRAM controller
to the 16-bit data bus required by typical
OC-192 serializers. I explain this functionality
in more detail in the Timing Discussion section.
MUXs
The two multiplexer blocks represent standard
4-to-1 synchronous 16-bit bus multiplexers
also generated using the CORE
Generator tool.
DDR Flip-flop
The DDR flip-flop is a directly instantiated
element of the Virtex-II™ architecture.
Its two inputs are multiplexed onto its single
output through successive clocking at
both clock edges. In this way, the data A
input is clocked through to data out at the
rising edge of the clock, while data B is
clocked through at the falling edge.
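
As a behavioral illustration only (this is not a model of the Virtex-II primitive itself, and the function name is hypothetical), the interleaving performed by the DDR flip-flop can be sketched in Python as follows.

```python
def ddr_flip_flop(data_a, data_b):
    """Interleave two per-cycle input streams onto one double-rate output.

    For each clock cycle, the data A value appears at the rising edge and
    the data B value appears at the falling edge.
    """
    out = []
    for a, b in zip(data_a, data_b):
        out.append(a)   # rising edge: data A
        out.append(b)   # falling edge: data B
    return out

# Three clock cycles of input produce six output values.
print(ddr_flip_flop([0, 2, 4], [1, 3, 5]))
```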
FIFO/MUX Controller
The FIFO/MUX controller is the only
fully custom element in the architecture
(also shown in Figure 1). Its operation is
straightforward and you can easily understand
it by looking at the timing diagram
of Figure 2.
Note that while the signals generated by
the FIFO/MUX controller are not explicitly
shown in the timing diagram, their
behavior is implicitly depicted there. For
instance, the outputs of FIFOs 0 and 1
only change when the enable A signal from
the FIFO/MUX controller is active.
Likewise, whenever a multiplexer output
changes, it is the respective select X signal
from the FIFO/MUX controller that
dictates which of the multiplexer’s four
inputs gets clocked through to its output.
Timing Discussion
As mentioned previously, Figure 2 depicts a
timing diagram that describes the dynamic
behavior of the architecture shown in
Figure 1. The clock signal shown is clock 2
from Figure 1. (The clock 1 domain does
not contain any novel design features.)
The clock period “T” is 3.125 ns
(1/[320 MHz]). The numbers identifying
the various data intervals correspond to the
index of a particular 16-bit word in a
sequence of data, which is at least 24 16-bit
words in length. The FIFO outputs are
updated once every four clock cycles in
quadrature succession, as shown in the timing
diagram.
The two multiplexers select from among
the four FIFO groups in the same succession.
However, a delay is imposed on the
multiplexer selection sequence such that the
FIFO outputs are allowed
three clock cycles to propagate
to the multiplexer
inputs before being selected,
as shown.
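
The sequencing described so far can be approximated by a cycle-level sketch in Python. Several details are assumptions, because the figures are not reproduced here: FIFO k is taken to hold word k of each 128-bit word, the even-numbered FIFOs are taken to feed one multiplexer and the odd-numbered FIFOs the other, and the register latency of the multiplexers and the DDR flip-flop is collapsed to zero.

```python
NUM_GROUPS = 4      # FIFO pairs 0/1, 2/3, 4/5, 6/7
SELECT_DELAY = 3    # clock cycles between a group's update and its selection

def simulate(num_cycles):
    """Return the 16-bit word indices seen at the DDR flip-flop output."""
    fifo_out = [None] * (2 * NUM_GROUPS)  # word index on each FIFO output
    reads = [0] * NUM_GROUPS              # 128-bit words read so far, per group
    stream = []
    for cycle in range(num_cycles):
        # The multiplexers pick the group that was updated SELECT_DELAY
        # cycles ago, sampling its outputs before this cycle's update.
        sel = (cycle - SELECT_DELAY) % NUM_GROUPS
        mux_a, mux_b = fifo_out[2 * sel], fifo_out[2 * sel + 1]
        if mux_a is not None:             # skip the start-up cycles
            stream += [mux_a, mux_b]      # A on rising edge, B on falling edge
        # The group whose turn it is presents its next pair of words.
        g = cycle % NUM_GROUPS
        base = 8 * reads[g] + 2 * g
        fifo_out[2 * g], fifo_out[2 * g + 1] = base, base + 1
        reads[g] += 1
    return stream

print(simulate(15))
```

Under these assumptions, the words emerge in their original order even though each FIFO output is held steady for four clock cycles, which is what makes the relaxed timing on the FIFO outputs possible.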
By designing this way,
I can apply a multi-cycle
delay constraint to the
FIFO outputs and thus
ease the task of the place
and route engine. This is
very helpful when designing
in a 320 MHz clock
domain. The DDR flip-flop
selects from among
its two inputs in standard
fashion, as shown. The
result is a 640 Mword/s
data stream at the output,
yielding a data rate to the
serializer of 10.24 Gbps.
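
The figures in this section follow from simple arithmetic, sketched below in Python. The three-cycle path budget assumes that the multi-cycle constraint matches the three clock cycles of propagation time described above; the article does not give the exact constraint value.

```python
CLOCK2_HZ = 320_000_000   # clock 2 frequency
WORD_BITS = 16            # serializer data bus width
MULTICYCLE_CYCLES = 3     # assumption: constraint equals the 3-cycle allowance

period_ps = 10**12 // CLOCK2_HZ               # clock period in picoseconds
multicycle_ps = MULTICYCLE_CYCLES * period_ps  # path budget for FIFO outputs
word_rate = 2 * CLOCK2_HZ                     # DDR: two words per clock cycle
bit_rate = word_rate * WORD_BITS              # bits per second to the serializer

print(f"clock period:       {period_ps / 1000} ns")
print(f"multi-cycle budget: {multicycle_ps / 1000} ns")
print(f"output word rate:   {word_rate / 1e6:.0f} Mword/s")
print(f"output bit rate:    {bit_rate / 1e9:.2f} Gbps")
```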
Implementation Results
Up to this point, the proposed
architecture appears
to satisfy the design
requirements. However,
when I tried to implement
this approach in a Virtex-II XC2V1000-6BG575 part, I couldn’t get the propagation
delays of the FIFO read clock enable signals
under the 3.125 ns period constraint
imposed by the 320 MHz clock. Therefore,
I came up with the following modification
to the basic architecture (Figure 3), which
resulted in successful timing closure.
Modification Description
In the modified architecture shown in
Figure 3, I eliminated the clock 1 domain
section of the design, as it is irrelevant to
the discussion. The only design block that
changed in the modified architecture is the
FIFO/MUX controller; all other design
blocks remain unchanged.
Instead of generating clock-enable signals,
the controller block generates four 80
MHz clocks in a quadrature phase arrangement.
This is very easy to accomplish when
designing for the Xilinx Virtex-II architecture,
as the DCMs in that architecture have
quadrature phase outputs. All I had to do
was divide down the 320 MHz clock by a
factor of four, using an additional DCM, to
generate the original 80 MHz clock signal.
When I apply these quadrature-phased
80 MHz clocks directly to the four FIFO
groups, respectively – without using any
clock enable signals – the data at all FIFO
outputs is exactly as required by the original
architecture. You can see this fairly easily by
envisioning these four clocks overlaid atop
the four FIFO n/n signals in the timing diagram
of Figure 2.
Notice that the rising edges of the four
clocks line up perfectly with the changes
in the outputs of the four FIFO groups.
You no longer need to use clock enables to
govern the FIFO read clocking and have,
therefore, eliminated the one group of signals
that failed to achieve timing closure.
Of course, you must have two additional
DCMs and five additional global clock
buffers available to make use of this
approach.
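
The equivalence between the two schemes can be checked numerically. The Python sketch below compares the 320 MHz rising edges on which each group's clock enable would be active with the rising edges of an 80 MHz clock shifted by the corresponding quadrature phase. It assumes that the 80 MHz clocks are phase-aligned to the 320 MHz clock; the function names are hypothetical.

```python
T_PS = 3125                 # 320 MHz clock period, in picoseconds
SLOW_PERIOD_PS = 4 * T_PS   # 80 MHz clock period: 12500 ps

def enabled_edges(group, num_reads):
    """320 MHz rising edges on which the group's clock enable is active."""
    return [(4 * n + group) * T_PS for n in range(num_reads)]

def quadrature_edges(group, num_reads):
    """Rising edges of the 80 MHz clock shifted by group * 90 degrees."""
    phase_offset_ps = SLOW_PERIOD_PS * (90 * group) // 360
    return [n * SLOW_PERIOD_PS + phase_offset_ps for n in range(num_reads)]

for group in range(4):
    same = enabled_edges(group, 6) == quadrature_edges(group, 6)
    print(f"group {group}: phase {90 * group:3d} deg, edges match: {same}")
```

In this idealized model the edge times are identical for all four groups, so the FIFO outputs change at the same instants whether they are read with clock enables or with the phase-shifted clocks.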
Conclusion
By taking advantage of the design
technique presented here, OC-192
test pattern generation hardware
designers can avail themselves of
the low cost and large capacity of
standard DDR-SDRAM modules,
thereby making possible the use of
extremely long test patterns or
automated testing with many
shorter patterns, all without incurring
the cost of ASIC design and
production.
The technique presented here
will also find applicability in the
area of direct digital synthesis
(DDS) of arbitrary waveforms,
where a high-speed digital waveform
is used, in conjunction with
PWM or sigma-delta modulation
and subsequent low-pass filtering,
to produce arbitrary
waveforms with great precision
and repeatability.
If you have any questions or suggestions,
please contact me, David
Banas, at (415) 846-5837, or e-mail
at dbanas@taoofdigital.com.
Printable PDF version of this article with graphics. (8/1/04) 285 KB