Streaming Data at 10 Gbps



by David Banas, President, Tao of Digital, Inc.
dbanas@taoofdigital.com  (8/1/04)


Using a Virtex-II FPGA to stream data from DDR-SDRAM to OC-192 serializers.


You’d like to use DDR-SDRAM as the storage medium for OC-192 test pattern generation to dramatically increase the length of patterns available to you. However, when you take the standard approach to architecting this interface and attempt to implement it in an FPGA, you find that the FIFO read clock enable signals don’t meet the timing requirements. Your project budget can’t afford an ASIC. How can you get around this issue and make the FPGA work as the needed interface?

Fortunately, the Xilinx digital clock manager (DCM) provides the answer. As long as you’ve got sufficient global clock resources available, you can use the DCM’s quadrature-phase outputs to clock the outputs of the four FIFO groups directly. This eliminates the need for clock enable signals in the 320 MHz clock domain and yields a design that will achieve timing closure at that speed.

Standard Architecture
Figure 1 shows the standard architecture used to design a streaming data interface between DDR-SDRAM and OC-192 serializers. All of the design blocks shown in this figure, with the exception of the FIFO/MUX controller, are available as directly instantiated elements (the DDR flip-flop), Xilinx CORE Generator™ modules (MUX x and FIFO n), or Xilinx-provided reference designs (DDR-SDRAM controller).

DDR-SDRAM Controller
The DDR-SDRAM controller is a modified Xilinx-provided reference design. I modified it to provide a continuously streaming mode of operation. The controller provides the necessary address and control signals for driving a standard PC-1600/2100 DDR-SDRAM module and also converts the data bus from a 64-bit DDR mode to a 128-bit SDR mode to facilitate a simpler clocking scheme inside the FPGA.

The controller stops fetching data from memory and disables the write enable signal to the FIFOs when it detects that FIFO_0’s high watermark signal has gone active (not shown in Figure 1).
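The flow control described above can be sketched as a toy Python model. This is only an illustration under stated assumptions: the FIFO depth and watermark threshold are hypothetical values (the article gives neither), DRAM refresh is not modeled, and the latency of passing the watermark flag between clock domains is ignored. One tick is 2.5 ns, so a write can occur every 4 ticks (100 MHz) and a read every 5 ticks (80 MHz, the per-FIFO read rate implied by updating each FIFO once every four 320 MHz cycles).

```python
DEPTH = 16           # hypothetical FIFO depth, not from the article
HIGH_WATERMARK = 12  # hypothetical threshold, not from the article


def simulate(ticks):
    """Return (max_fill, writes) after running the toy model for `ticks` ticks."""
    fill = 0
    max_fill = 0
    writes = 0
    for tick in range(ticks):
        # Read side: one entry leaves every 5 ticks (80 MHz) if data is present.
        if tick % 5 == 0 and fill > 0:
            fill -= 1
        # Write side: one entry arrives every 4 ticks (100 MHz), unless the
        # high watermark is active, in which case the controller holds off.
        if tick % 4 == 0 and fill < HIGH_WATERMARK:
            fill += 1
            writes += 1
        max_fill = max(max_fill, fill)
    return max_fill, writes


max_fill, writes = simulate(2000)
print(f"peak fill level: {max_fill} of {DEPTH} entries")
```

Because the write side is faster than the read side, the fill level climbs until the watermark throttles it; the margin between the watermark and the full depth is what a real design would size to cover the flag's latency.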

The frequency of clock 1 is 100 MHz, yielding a burst data rate of 12.8 Gbps. This data rate is 28% higher than that required by the serializer, leaving room for overhead tasks such as DRAM refresh.
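The arithmetic behind these figures can be checked with a short Python sketch. The 28% margin appears to be measured against the nominal OC-192 line rate of 9.95328 Gbps; that reference point is an assumption here, since the article does not state it. Measured against the 10.24 Gbps that the 320 MHz clock 2 domain delivers (see the Timing Discussion), the margin is 25%.

```python
CLOCK_1_HZ = 100e6               # clock 1 frequency (from the article)
DDR_BUS_BITS = 64                # DDR data bus width (from the article)
CLOCK_2_HZ = 320e6               # clock 2 frequency (from the article)
SERIALIZER_BUS_BITS = 16         # serializer data bus width (from the article)
OC192_LINE_RATE_BPS = 9.95328e9  # nominal OC-192 line rate (assumption: not in the article)

# DDR transfers data on both clock edges, so two 64-bit transfers per cycle.
burst_rate_bps = CLOCK_1_HZ * 2 * DDR_BUS_BITS

# The DDR flip-flop at the output also delivers two 16-bit words per cycle.
clock_2_rate_bps = CLOCK_2_HZ * 2 * SERIALIZER_BUS_BITS

headroom_vs_line_rate = burst_rate_bps / OC192_LINE_RATE_BPS - 1
headroom_vs_clock_2 = burst_rate_bps / clock_2_rate_bps - 1

print(f"burst rate:            {burst_rate_bps / 1e9:.2f} Gbps")
print(f"headroom vs line rate: {headroom_vs_line_rate:.1%}")
print(f"headroom vs clock 2:   {headroom_vs_clock_2:.1%}")
```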

FIFOs
The eight FIFO blocks represent standard, asynchronous FIFO elements generated using the CORE Generator tool. The FIFOs serve a dual purpose. First, they provide elastic buffering between two asynchronous clock domains. Second, they work in conjunction with the MUX x and FIFO/MUX controller blocks to provide the data bus width reduction. This is necessary to convert from the 128-bit data bus presented by the DDR-SDRAM controller to the 16-bit data bus required by typical OC-192 serializers. I explain this functionality in more detail in the Timing Discussion section.
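As a rough illustration of the width reduction, the following Python sketch splits one 128-bit word into the eight 16-bit lanes that the eight FIFOs would carry. The lane ordering (least significant 16 bits to FIFO 0) is an assumption made for the example; the article does not specify it.

```python
LANE_BITS = 16
NUM_LANES = 8  # 128-bit bus / 16-bit serializer bus = eight FIFOs


def split_wide_word(word):
    """Split a 128-bit integer into eight 16-bit lanes, lane 0 = least significant."""
    mask = (1 << LANE_BITS) - 1
    return [(word >> (LANE_BITS * lane)) & mask for lane in range(NUM_LANES)]


# Build a wide word whose lane i simply holds the value i.
wide_word = sum(lane << (LANE_BITS * lane) for lane in range(NUM_LANES))
lanes = split_wide_word(wide_word)
print(lanes)
```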

MUXs
The two multiplexer blocks represent standard 4-to-1 synchronous 16-bit bus multiplexers also generated using the CORE Generator tool.

DDR Flip-flop
The DDR flip-flop is a directly instantiated element of the Virtex-II™ architecture. Its two inputs are multiplexed onto its single output through successive clocking at both clock edges. In this way, the data A input is clocked through to data out at the rising edge of the clock, while data B is clocked through at the falling edge.
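A behavioral sketch in Python may make the interleaving concrete. It models only the ordering of the output (A on the rising edge, B on the falling edge) and is not a model of the Virtex-II primitive itself; register latency and setup/hold timing are ignored.

```python
def ddr_flip_flop(a_words, b_words):
    """Interleave two input streams: A at each rising edge, B at each falling edge."""
    out = []
    for a, b in zip(a_words, b_words):
        out.append(a)  # rising clock edge: data A appears at the output
        out.append(b)  # falling clock edge: data B appears at the output
    return out


stream = ddr_flip_flop([0, 2, 4], [1, 3, 5])
print(stream)
```

Each clock cycle therefore carries two words, which is how a 320 MHz clock yields 640 Mword/s at the output.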

FIFO/MUX Controller
The FIFO/MUX controller is the only fully custom element in the architecture (also shown in Figure 1). Its operation is straightforward and you can easily understand it by looking at the timing diagram of Figure 2.

Note that while the signals generated by the FIFO/MUX controller are not explicitly shown in the timing diagram, their behavior is implicitly depicted there. For instance, the outputs of FIFOs 0 and 1 only change when the enable A signal from the FIFO/MUX controller is active.

Likewise, whenever a multiplexer output changes, it is the respective select X signal from the FIFO/MUX controller that dictates which of the multiplexer’s four inputs gets clocked through to its output.

Timing Discussion
As mentioned previously, Figure 2 depicts a timing diagram that describes the dynamic behavior of the architecture shown in Figure 1. The clock signal shown is clock 2 from Figure 1. (The clock 1 domain does not contain any novel design features.)

The clock period “T” is 3.125 ns (1/[320 MHz]). The numbers identifying the various data intervals correspond to the index of a particular 16-bit word in a sequence of data, which is at least 24 16-bit words in length. The FIFO outputs are updated once every four clock cycles in quadrature succession, as shown in the timing diagram.

The two multiplexers select from among the four FIFO groups in the same succession. However, a delay is imposed on the multiplexer selection sequence such that the FIFO outputs are allowed three clock cycles to propagate to the multiplexer inputs before being selected, as shown.
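The relationship between the read enables and the multiplexer selects can be sketched as a simple schedule in Python. This is a steady-state model under assumptions: the group numbering 0 to 3 and the use of a free-running cycle counter are illustrative choices, not details taken from the figures.

```python
NUM_GROUPS = 4
MUX_DELAY_CYCLES = 3  # cycles a FIFO output is given before it is selected (from the article)


def controller_schedule(num_cycles):
    """Yield (cycle, enabled_group, selected_group) for each 320 MHz clock cycle."""
    for cycle in range(num_cycles):
        enabled_group = cycle % NUM_GROUPS                        # group whose FIFOs are read
        selected_group = (cycle - MUX_DELAY_CYCLES) % NUM_GROUPS  # group the muxes pass through
        yield cycle, enabled_group, selected_group


schedule = list(controller_schedule(8))
for cycle, enabled_group, selected_group in schedule:
    print(f"cycle {cycle}: read group {enabled_group}, select group {selected_group}")
```

In this model the selected group always trails the enabled group by three cycles, so a group's outputs are sampled in the last cycle before they change again. The selections printed for the first three cycles are start-up artifacts, since no data has been read for those groups yet.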

By designing this way, I can apply a multi-cycle delay constraint to the FIFO outputs and thus ease the task of the place and route engine. This is very helpful when designing in a 320 MHz clock domain. The DDR flip-flop selects from among its two inputs in standard fashion, as shown. The result is a 640 Mword/s data stream at the output, yielding a data rate to the serializer of 10.24 Gbps.
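The whole clock 2 data path can be exercised with a small Python simulation to confirm that the words leave in order. It is a sketch under assumptions: FIFO group g is taken to be FIFOs 2g and 2g+1 (the article confirms this only for FIFOs 0 and 1), the first FIFO of each pair is assumed to feed the DDR flip-flop's A input and the second its B input, and FIFO i is assumed to hold the i-th 16-bit word of each 128-bit word.

```python
from collections import deque

NUM_GROUPS = 4
MUX_DELAY_CYCLES = 3
CLOCK_2_HZ = 320e6


def stream_words(num_wide_words, num_cycles):
    """Simulate the FIFOs, muxes, and DDR flip-flop; return the serial word order."""
    # FIFO i holds 16-bit word index 8*n + i for each 128-bit word n (assumed mapping).
    fifos = [deque(8 * n + i for n in range(num_wide_words)) for i in range(8)]
    outputs = [None] * 8  # registered FIFO outputs
    serial = []
    for cycle in range(num_cycles):
        # Muxes pass the group that was read three cycles earlier.
        sel = (cycle - MUX_DELAY_CYCLES) % NUM_GROUPS
        a, b = outputs[2 * sel], outputs[2 * sel + 1]
        if a is not None:
            serial += [a, b]  # DDR flip-flop: A on rising edge, B on falling edge
        # Read enable for one group per cycle updates that group's outputs.
        grp = cycle % NUM_GROUPS
        for i in (2 * grp, 2 * grp + 1):
            if fifos[i]:
                outputs[i] = fifos[i].popleft()
    return serial


serial = stream_words(num_wide_words=3, num_cycles=15)
output_rate_bps = CLOCK_2_HZ * 2 * 16  # two 16-bit words per clock cycle
print(serial)
print(f"{output_rate_bps / 1e9:.2f} Gbps")
```

Three 128-bit words give the 24-word sequence used in the timing diagram; 15 cycles are enough because the last group is read in cycle 11 and selected three cycles later.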

Implementation Results
Up to this point, the proposed architecture appears to satisfy the design requirements. However, when I tried to implement this approach in a Virtex-II XC2V1000-6BG575 part, I couldn’t get the propagation delays of the FIFO read clock enable signals under the 3.125 ns period constraint imposed by the 320 MHz clock. Therefore, I came up with the following modification to the basic architecture (Figure 3), which resulted in successful timing closure.

Modification Description
Figure 3 shows the modified architecture. I omitted the clock 1 domain section of the design from the figure, as it is irrelevant to the discussion. The only design block that changed in the modified architecture is the FIFO/MUX controller; all other design blocks remain unchanged.

Instead of generating clock-enable signals, the controller block generates four 80 MHz clocks in a quadrature phase arrangement. This is very easy to accomplish when designing for the Xilinx Virtex-II architecture, as the DCMs in that architecture have quadrature phase outputs. All I had to do was divide down the 320 MHz clock by a factor of four, using an additional DCM, to generate the original 80 MHz clock signal.

When I apply these quadrature-phased 80 MHz clocks directly to the four FIFO groups, respectively – without using any clock enable signals – the data at all FIFO outputs is exactly as required by the original architecture. You can see this fairly easily by envisioning these four clocks overlaid atop the four FIFO n/n signals in the timing diagram of Figure 2.

Notice that the rising edges of the four clocks line up perfectly with the changes in the outputs of the four FIFO groups. You no longer need to use clock enables to govern the FIFO read clocking and have, therefore, eliminated the one group of signals that failed to achieve timing closure. Of course, you must have two additional DCMs and five additional global clock buffers available to make use of this approach.
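The edge alignment described above can be checked numerically. The Python sketch below works in integer picoseconds and assumes ideal clocks with phase offsets of exactly 0, 90, 180, and 270 degrees and no skew; the function name is made up for the example.

```python
T_320_PS = 3125          # 320 MHz clock period in picoseconds
T_80_PS = 4 * T_320_PS   # 80 MHz clock period: 12,500 ps


def rising_edges_ps(phase_index, count):
    """Rising-edge times of the 80 MHz clock shifted by phase_index * 90 degrees."""
    offset = phase_index * T_80_PS // 4  # 90 degrees = one quarter of the 80 MHz period
    return [offset + k * T_80_PS for k in range(count)]


edge_cycles = {}
for group in range(4):
    edges = rising_edges_ps(group, count=3)
    # Index of the 320 MHz cycle in which each rising edge falls.
    edge_cycles[group] = [t // T_320_PS for t in edges]
    print(f"group {group}: edges at {edges} ps, 320 MHz cycles {edge_cycles[group]}")
```

Every rising edge lands exactly on a 320 MHz cycle boundary, and the clock for group g fires on cycles g, g+4, g+8, and so on. Those are the same cycles in which a clock enable for that group would have been active, which is why the enables become unnecessary.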

Conclusion
By taking advantage of the design technique presented here, designers of OC-192 test pattern generation hardware can exploit the low cost and large capacity of standard DDR-SDRAM modules. This makes possible the use of extremely long test patterns, or automated testing with many shorter patterns, all without incurring the cost of ASIC design and production.

The technique presented here also applies to direct digital synthesis (DDS) of arbitrary waveforms, where a high-speed digital waveform is used, in conjunction with PWM or sigma-delta modulation and subsequent low-pass filtering, to produce arbitrary waveforms with great precision and repeatability.

If you have any questions or suggestions, please contact me, David Banas, at (415) 846-5837, or e-mail at dbanas@taoofdigital.com.
