Home : Documentation : Xcell Journal Online : Article
Virtex-4 Memory Interfaces



by Maria George, Senior Product Applications Engineer, Xilinx, Inc.
maria.george@xilinx.com (1/15/05)


Virtex-4 devices make challenging memory interface requirements simple.


Xilinx® Virtex-4™ devices have a 64-tap absolute delay element built in each I/O, making high-speed memory interface read data capture very easy. This feature also provides the flexibility to adopt different read data capture schemes where clock/strobe or data can be delayed.

During a write to the external memory device, the clock/strobe must be transmitted center-aligned with respect to data. A memory write is easy to implement with Virtex-4 devices by means of the quadrature phase outputs of the DCM (CLK0, CLK90, CLK180, CLK270), ensuring that the clock/strobe is center-aligned with data. Figure 1 illustrates the clock/strobe and data phase relationship during read and write transactions.

For most memory interfaces, such as DDR2 SDRAM, RLDRAM II, FCRAM II, and QDR II SRAM, the data rate is twice the clock rate because data is received and transmitted on both the rising and falling edges of the forwarded clock/strobe. Virtex-4 devices have both input and output DDR flip-flops, making DDR operation extremely simple.

Write Data and Clock/Strobe Transmission
During a write operation, the clock/strobe is generated using the output DDR registers clocked by a DCM clock output (CLK0) on the global clock network. The write data is transmitted using the output DDR registers clocked by a DCM clock output (CLK270) that leads the clock used to generate the clock/strobe by 90 degrees. Because the data changes a quarter period before each clock/strobe edge, the clock/strobe is centered in the data window, which meets the memory vendor specification.
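The phase relationship above can be checked with a little arithmetic. The following is a minimal sketch, not part of any Xilinx reference design: the function names and the 200 MHz example are assumptions, and the model is ideal (no skew, jitter, or clock-to-out delay).

```python
def dcm_edge_ps(phase_deg, period_ps):
    """Rising-edge time of a DCM phase output, measured from the CLK0 rising edge."""
    return (phase_deg % 360) / 360 * period_ps


def write_alignment(period_ps):
    """Ideal setup/hold margin of the strobe edge inside the DDR data window.

    Data is launched on the CLK270 rising edge and stays valid for half a
    period (DDR). The strobe edge is the next CLK0 rising edge.
    """
    half_period = period_ps / 2
    window_start = dcm_edge_ps(270, period_ps)   # data changes here
    window_end = window_start + half_period      # next data change
    strobe_edge = period_ps                      # next CLK0 rising edge
    return {
        "setup_ps": strobe_edge - window_start,
        "hold_ps": window_end - strobe_edge,
    }


if __name__ == "__main__":
    # Hypothetical 200 MHz interface: 5000 ps clock period.
    print(write_alignment(5000))
```

Equal setup and hold values indicate that the strobe edge sits in the middle of the data window, which is the center alignment the memory device expects.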

Another innovative feature of the output DDR registers is the SAME_EDGE mode of operation. In this mode, a third register clocked by a rising edge is placed on the input of the falling edge register (Figure 2). Using this mode, both rising edge and falling edge data can be presented to the output DDR registers on the same clock edge (CLK270), thereby allowing higher DDR performance with minimal register-to-register delay.
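As a rough illustration of SAME_EDGE operation, the behavioral sketch below models the three registers described above. It is an idealized software model with hypothetical names, not HDL and not the actual primitive; it only shows the order in which the bits leave the pin.

```python
def oddr_same_edge(pairs):
    """Idealized model of an output DDR register in SAME_EDGE mode.

    `pairs` holds one (rising_bit, falling_bit) tuple per clock cycle, both
    presented on the same rising edge. Returns the serialized DDR bit stream.
    """
    stream = []
    for rising_bit, falling_bit in pairs:
        # Rising edge: the rising-edge register takes its bit, and the third
        # (staging) register holds the falling-edge bit.
        rise_reg = rising_bit
        stage_reg = falling_bit
        stream.append(rise_reg)       # driven while the clock is high
        # Falling edge: the falling-edge register loads from the staging register.
        fall_reg = stage_reg
        stream.append(fall_reg)       # driven while the clock is low
    return stream


if __name__ == "__main__":
    print(oddr_same_edge([(1, 0), (1, 1), (0, 0)]))
```

The point of the staging register is that the fabric logic only ever needs to meet timing against one clock edge, even though the pin toggles on both.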

Read Data Capture
Most memory interfaces are source-synchronous interfaces, where the clock/strobe is received edge-aligned with data during a read from the external memory device. This makes read data capture challenging because either the clock/strobe or the data must be delayed so that the capturing clock edge falls inside the valid data window.

The traditional technique to capture read data is to register it in the delayed memory clock/strobe domain. This entails:

  • Ensuring that the memory clock/strobe and the associated data have matched PCB trace delays between the memory device and the FPGA
  • Delaying the clock/strobe signals such that the edges of the clock/strobe center in the valid data window, as shown in Figure 3
  • Registering the read data with the delayed memory clock/strobe
  • Synchronizing registered read data to the system (FPGA) clock domain

An alternate and simpler technique, currently used in Xilinx reference designs, is to capture read data directly in the system (FPGA) clock domain. This entails:

  • Ensuring that the memory clock/strobe and the associated data have matched PCB trace delays between the memory device and the FPGA
  • Determining the phase difference between the memory clock/strobe and the system (FPGA) clock by detecting two memory clock/strobe transitions in the system clock domain
  • Detecting transitions of memory clock/strobe after the memory initialization sequence by delaying memory clock/strobe with respect to the system (FPGA) clock in unit increments
  • Delaying read data based on memory clock/strobe to system (FPGA) phase information such that the system (FPGA) clock is centered in the valid data window
Both techniques require delay elements to delay the clock/strobe or data.
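The second technique can be sketched in software. The model below is an assumption-laden illustration, not the reference design: it assumes an ideal 50% duty cycle strobe, 80 ps taps, tap settings 0 through 63, and hypothetical function names. It sweeps the strobe delay one tap at a time, records where the sampled value changes, and picks the midpoint as the data delay.

```python
TAP_PS = 80      # assumed tap resolution
NUM_TAPS = 64    # assumed tap settings 0..63


def sample_strobe(tap, offset_ps, period_ps):
    """Strobe value seen at the system clock edge (t = 0).

    `offset_ps` is the unknown arrival time of a strobe rising edge relative
    to the system clock edge; the strobe is then delayed by `tap` taps.
    """
    phase = (-(offset_ps + tap * TAP_PS)) % period_ps
    return 1 if 2 * phase < period_ps else 0


def find_transitions(offset_ps, period_ps):
    """Tap settings at which the sampled strobe changes value (first two)."""
    samples = [sample_strobe(t, offset_ps, period_ps) for t in range(NUM_TAPS)]
    edges = [t for t in range(1, NUM_TAPS) if samples[t] != samples[t - 1]]
    return edges[:2]


def data_delay_taps(offset_ps, period_ps):
    """Data delay that centers the system clock edge between two strobe edges."""
    edges = find_transitions(offset_ps, period_ps)
    if len(edges) < 2:
        raise ValueError("fewer than two transitions within the tap range")
    first, second = edges
    return (first + second) // 2


if __name__ == "__main__":
    # Hypothetical case: 5000 ps period, strobe arriving 1000 ps after the clock edge.
    print(find_transitions(1000, 5000), data_delay_taps(1000, 5000))
```

Because read data is edge-aligned with the strobe, the two detected transitions mark the boundaries of a data bit as seen by the system clock, so the midpoint setting places the clock edge near the center of the valid window. Note that both transitions must fit within the available delay range, which in this model limits how slow the interface clock can be.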

The 64-tap, 80 ps absolute delay element available in each Virtex-4 I/O allows center alignment of memory clock/strobe in the data window or data centering with the system (FPGA) clock. Each Virtex-4 I/O also has input DDR flip-flops that are required for read data capture, either in the delayed memory strobe domain or the system (FPGA) clock domain.
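For the first technique, the number of taps needed to center the strobe follows directly from the clock period. The sketch below is illustrative only: the helper names and the 200 MHz example are assumptions, and it assumes tap settings run from 0 to 63 at a fixed 80 ps per tap.

```python
TAP_PS = 80      # tap resolution quoted in the article
NUM_TAPS = 64    # assumed tap settings 0..63


def taps_for_delay(delay_ps):
    """Nearest tap setting for a requested delay; raises if out of range."""
    taps = round(delay_ps / TAP_PS)
    if not 0 <= taps < NUM_TAPS:
        raise ValueError("requested delay is outside the delay element range")
    return taps


def quarter_period_taps(freq_mhz):
    """Taps that shift the strobe by 90 degrees at the given clock frequency."""
    period_ps = 1e6 / freq_mhz
    return taps_for_delay(period_ps / 4)


if __name__ == "__main__":
    taps = quarter_period_taps(200)           # hypothetical 200 MHz interface
    print(taps, taps * TAP_PS)                # tap count and resulting delay in ps
    print((NUM_TAPS - 1) * TAP_PS)            # maximum delay under these assumptions
```

The rounding error is at most half a tap (40 ps here), which is the residual misalignment a fixed quarter-period setting would leave before any board-level skew is considered.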

You can use the input DDR flip-flops in the SAME_EDGE or SAME_EDGE_PIPELINED modes. In the SAME_EDGE mode, the falling edge data is output on the following rising edge of the clock (Figure 4). In the SAME_EDGE_PIPELINED mode, both the rising edge and falling edge data are output together on the same rising edge of the clock (Figure 5). With these modes you can achieve higher design performance by avoiding half-clock cycle data paths in the FPGA fabric.
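The difference between the two input modes is easiest to see as a sequence of output pairs. The following behavioral sketch uses hypothetical function names and `None` for values that are not yet valid; it models only the ordering described above, not the timing of the actual primitive.

```python
def iddr_same_edge(stream):
    """(Q1, Q2) at each rising edge in SAME_EDGE mode.

    `stream` is the DDR input ordered rising bit, falling bit, rising bit, ...
    Q2 carries the falling-edge bit of the previous cycle, so the last
    falling-edge bit would only appear one edge after the list ends.
    """
    outputs = []
    prev_falling = None
    for rising, falling in zip(stream[0::2], stream[1::2]):
        outputs.append((rising, prev_falling))
        prev_falling = falling
    return outputs


def iddr_same_edge_pipelined(stream):
    """(Q1, Q2) at each rising edge in SAME_EDGE_PIPELINED mode.

    Each rising/falling pair is presented together, one clock cycle later.
    """
    pairs = list(zip(stream[0::2], stream[1::2]))
    return [(None, None)] + pairs


if __name__ == "__main__":
    bits = ["r0", "f0", "r1", "f1", "r2", "f2"]
    print(iddr_same_edge(bits))
    print(iddr_same_edge_pipelined(bits))
```

In the pipelined mode the fabric sees matched pairs at the cost of one extra cycle of latency; in SAME_EDGE mode the downstream logic has to account for the one-cycle offset between Q1 and Q2.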

In the first technique, read data is captured in the delayed memory clock/strobe domain and must be re-captured in the system (FPGA) clock domain. The transfer of captured read data from the delayed memory clock/strobe domain to the internal system (FPGA) clock domain is defined as read data re-capture. Read data is re-captured within the I/O block.

Using the second technique, implemented in the Xilinx reference designs, you can directly capture read data in the system (FPGA) clock domain by delaying read data to meet the setup/hold time of the flip-flops in the system (FPGA) clock domain. A simple state machine is sufficient to implement the center alignment of the delayed read data with respect to the system (FPGA) clock after the initialization period.

This “run time” adjustment after the memory initialization sequence has significant advantages over other methods that set the required delay or phase shift during “compile time.” The 64-tap absolute delay element compensates for variations in process, temperature, or voltage, and hence increases the timing margins – resulting in a more reliable system.

The read data is re-captured and stored directly into the block RAM FIFO, a Virtex-4 feature that saves additional logic resources.

Conclusion
Virtex-4 architectural features enable you to easily and reliably implement high-speed memory interfaces. You can use the 64-tap, 80 ps absolute delay elements to capture read data by delaying either the memory clock/strobe or the data. Because they are built into each I/O, the 64-tap absolute delay elements give you the flexibility to select any I/O for memory interfaces. The “run time” adjustment after memory initialization improves design margins.

The input and output DDR registers enable you to receive and transmit clock/strobe and data at high frequencies; the differential clocking resource provides higher performance with better duty cycle and lower global clock buffer utilization; and the block RAM FIFO feature enables you to store transmitted or received data without additional logic resources.

For more information about the implementation and design details of different memory interfaces in Virtex-4 devices, visit the following websites:

  • DDR2 SDRAM (XAPP701 and XAPP702) and DDR SDRAM (XAPP709):
    www.xilinx.com/products/design_resources/mem_corner/resource/xaw_dram_ddr.htm
  • RLDRAM (XAPP710):
    www.xilinx.com/products/design_resources/mem_corner/resource/rldram.htm
  • QDR II SRAM (XAPP703):
    www.xilinx.com/products/design_resources/mem_corner/resource/xaw_sram_qdr.htm
