Xilinx® Virtex-4™ devices have a 64-tap
absolute delay element built into each I/O,
which simplifies read data capture in
high-speed memory interfaces. This feature also
provides the flexibility to adopt different
read data capture schemes where
clock/strobe or data can be delayed.
During a write to the external memory
device, the clock/strobe must be transmitted
center-aligned with respect to data. A
memory write is easy to implement with
Virtex-4 devices by means of the quadrature
phase outputs of the DCM (CLK0,
CLK90, CLK180, CLK270), ensuring that
the clock/strobe is center-aligned with
data. Figure 1 illustrates the clock/strobe
and data phase relationship during read
and write transactions.
For most memory interfaces, such as
DDR2 SDRAM, RLDRAM II, FCRAM
II, and QDR II SRAM, the data rate is
twice the clock rate because data is
received and transmitted on both the rising
and falling edges of the forwarded
clock/strobe. Virtex-4 devices have both
input and output DDR flip-flops, making
DDR operation extremely simple.
Write Data and Clock/Strobe Transmission
During a write operation, the clock/strobe is
generated using the output DDR registers
clocked by a DCM clock output (CLK0) on
the global clock network. The write data is
transmitted using the output DDR registers
clocked by a DCM clock output that is 90
degrees phase ahead (CLK270) of the clock
used to generate clock/strobe. This meets the
memory vendor specification of centering the
clock/strobe in the data window.
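
The centering can be checked with a little arithmetic. The following Python sketch is illustrative only: the function names and the 5000 ps (200 MHz) example period are assumptions, not part of any Xilinx reference design. It treats each DCM output as a time offset from CLK0 and reports how far a clock/strobe edge sits from the neighboring data transitions.

```python
def phase_to_ps(phase_deg, period_ps):
    """Time offset of a DCM phase output relative to CLK0, in picoseconds."""
    return (phase_deg % 360) / 360.0 * period_ps


def strobe_margins(period_ps, data_phase_deg=270.0, strobe_phase_deg=0.0):
    """Return (time since previous data transition, time until next one)
    measured at a clock/strobe edge. Assumes ideal, skew-free outputs."""
    bit_ps = period_ps / 2  # DDR: data changes on both clock edges
    data_edge = phase_to_ps(data_phase_deg, period_ps)
    strobe_edge = phase_to_ps(strobe_phase_deg, period_ps)
    since_last = (strobe_edge - data_edge) % bit_ps
    until_next = bit_ps - since_last
    return since_last, until_next


# Data on CLK270, clock/strobe on CLK0: equal margins, so the edge is centered.
print(strobe_margins(5000.0))            # (1250.0, 1250.0)
# Data and clock/strobe both on CLK0: edge-aligned, no margin on one side.
print(strobe_margins(5000.0, 0.0, 0.0))  # (0.0, 2500.0)
```

With data launched from CLK270 and the clock/strobe from CLK0, each strobe edge falls a quarter period after one data transition and a quarter period before the next.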
Another innovative feature of the output
DDR registers is the SAME_EDGE mode of
operation. In this mode, a third register
clocked by a rising edge is placed on the input
of the falling edge register (Figure 2). Using
this mode, both rising edge and falling edge
data can be presented to the output DDR registers
on the same clock edge (CLK270),
thereby allowing higher DDR performance
with minimal register-to-register delay.
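
As a rough behavioral illustration of SAME_EDGE mode, the Python sketch below models the three registers described above. The class and method names are hypothetical, and the model ignores all timing; it only shows that both data bits are accepted on the rising edge and leave the pin on successive half cycles.

```python
class OddrSameEdge:
    """Behavioral model of an output DDR register in SAME_EDGE mode."""

    def __init__(self):
        self.q1 = 0       # rising-edge output register
        self.d2_hold = 0  # third register: holds D2, clocked on the rising edge
        self.q2 = 0       # falling-edge output register

    def rising(self, d1, d2):
        """Both inputs are sampled on the same rising edge."""
        self.q1 = d1
        self.d2_hold = d2
        return self.q1    # pin value during the first half cycle

    def falling(self):
        """The falling-edge register forwards the held D2 value."""
        self.q2 = self.d2_hold
        return self.q2    # pin value during the second half cycle


def transmit(pairs):
    """Drive (d1, d2) pairs through the model; return one value per half cycle."""
    oddr = OddrSameEdge()
    out = []
    for d1, d2 in pairs:
        out.append(oddr.rising(d1, d2))
        out.append(oddr.falling())
    return out


print(transmit([(1, 0), (0, 0), (1, 1)]))  # [1, 0, 0, 0, 1, 1]
```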
Read Data Capture
Most memory interfaces are source-synchronous
interfaces, where the clock/strobe
is received edge-aligned with data during a
read from the external memory device. This
makes read data capture challenging: because
the read data and the incoming clock/strobe
arrive edge-aligned, the clock/strobe (or the
data) must be delayed before the data can be
captured reliably.
The traditional technique to capture
read data is to register it in the delayed
memory clock/strobe domain. This entails:
- Ensuring that the memory clock/strobe
and the associated data have matched
PCB trace delays between the memory
device and the FPGA
- Delaying the clock/strobe signals such
that the edges of the clock/strobe center
in the valid data window, as shown
in Figure 3
- Registering the read data with the
delayed memory clock/strobe
- Synchronizing registered read data to the
system (FPGA) clock domain
An alternate and simpler technique,
currently used in Xilinx reference designs,
is to capture read data directly in the system
(FPGA) clock domain. This entails:
- Ensuring that the memory clock/strobe
and the associated data have matched
PCB trace delays between the memory
device and the FPGA
- Determining the phase difference between
the memory clock/strobe and the system
(FPGA) clock by detecting two memory
clock/strobe transitions in the system
clock domain
- Detecting transitions of memory
clock/strobe after the memory initialization
sequence by delaying memory
clock/strobe with respect to the system
(FPGA) clock in unit increments
- Delaying read data based on the memory
clock/strobe-to-system (FPGA) clock phase
information such that the system
(FPGA) clock is centered in the valid
data window
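
The steps of the second technique can be sketched in Python as follows. This is a simplified model under stated assumptions, not the reference design's logic: `sample_strobe` is a hypothetical stand-in for the hardware (an ideal 50% duty-cycle strobe with a fixed skew), and choosing the midpoint between the two detected transitions is one plausible way to center the system clock in the data window.

```python
TAP_PS = 80    # per-tap delay quoted in this article
NUM_TAPS = 64  # taps per I/O delay element


def sample_strobe(tap, period_ps, skew_ps):
    """Hypothetical hardware stand-in: level of the delayed strobe as
    sampled by a system-clock rising edge at t = 0."""
    t = (-skew_ps - tap * TAP_PS) % period_ps
    return 1 if t < period_ps / 2 else 0


def find_transitions(sample, num_taps=NUM_TAPS):
    """Step the delay one tap at a time; return the first two tap settings
    at which the sampled strobe level changes."""
    edges = []
    prev = sample(0)
    for tap in range(1, num_taps):
        cur = sample(tap)
        if cur != prev:
            edges.append(tap)
            if len(edges) == 2:
                break
        prev = cur
    return edges


def data_delay_taps(edges):
    """Data is edge-aligned with the strobe, so the midpoint between two
    strobe transitions places the system clock mid-window."""
    if len(edges) < 2:
        raise ValueError("need two strobe transitions within the tap range")
    return (edges[0] + edges[1]) // 2


edges = find_transitions(lambda tap: sample_strobe(tap, 5000, 700))
print(edges, data_delay_taps(edges))  # [23, 54] 38
```

In this example (5000 ps period, 700 ps skew, both assumed values), the sweep sees transitions at taps 23 and 54, about half a period apart, and selects tap 38 for the data delay.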
Both techniques require delay elements
to delay the clock/strobe or data.
The 64-tap, 80 ps absolute delay element
available in each Virtex-4 I/O
allows center alignment of memory
clock/strobe in the data window or data
centering with the system (FPGA) clock.
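
To give a feel for the numbers, the short Python sketch below converts a quarter-period delay into tap counts. It assumes the nominal 80 ps per tap quoted above and tap settings numbered 0 to 63; the function name and the 200 MHz example are illustrative.

```python
TAP_PS = 80    # nominal per-tap delay from this article
NUM_TAPS = 64  # assumed tap settings 0..63

MAX_DELAY_PS = (NUM_TAPS - 1) * TAP_PS  # 5040 ps under the assumption above


def quarter_period_taps(clock_mhz):
    """Return (tap setting, residual error in ps) for a delay of one
    quarter of the clock period, rounded to the nearest tap."""
    period_ps = 1_000_000 / clock_mhz
    target_ps = period_ps / 4
    taps = int(target_ps / TAP_PS + 0.5)  # round half up
    if taps > NUM_TAPS - 1:
        raise ValueError("requested delay exceeds the tap range")
    return taps, taps * TAP_PS - target_ps


print(MAX_DELAY_PS)              # 5040
print(quarter_period_taps(200))  # (16, 30.0)
```

At 200 MHz the quarter period is 1250 ps, so the nearest setting is 16 taps (1280 ps), leaving a 30 ps residual that is smaller than one tap.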
Each Virtex-4 I/O also has input DDR
flip-flops that are required for read data
capture, either in the delayed memory strobe domain or the system (FPGA)
clock domain.
You can use the input DDR flip-flops in
the SAME_EDGE or SAME_EDGE_
PIPELINED modes. In the SAME_EDGE
mode, the falling edge data is output on the
following rising edge of the clock (Figure 4).
In the SAME_EDGE_PIPELINED mode,
both the rising edge and falling edge data
are output together on the same rising edge
of the clock (Figure 5). With these modes
you can achieve higher design performance
by avoiding half-clock cycle data paths in
the FPGA fabric.
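
The difference between the two modes is easiest to see in a small behavioral model. The Python sketch below is an illustration with hypothetical function names; it ignores timing and simply lists which rising-edge and falling-edge samples the fabric sees after each rising clock edge, with `None` marking a value that is not yet valid.

```python
def iddr_same_edge(samples):
    """samples = [r0, f0, r1, f1, ...] as seen at the pin.
    Returns the (Q1, Q2) pair visible after each rising edge: the new
    rising-edge sample and the previous cycle's falling-edge sample."""
    rising, falling = samples[0::2], samples[1::2]
    out = []
    prev_fall = None
    for n, r in enumerate(rising):
        out.append((r, prev_fall))
        prev_fall = falling[n] if n < len(falling) else None
    return out


def iddr_same_edge_pipelined(samples):
    """As above, but Q1 is delayed by one more register so that the
    rising- and falling-edge samples of one cycle appear together."""
    rising, falling = samples[0::2], samples[1::2]
    out = []
    prev_pair = (None, None)
    for n, r in enumerate(rising):
        out.append(prev_pair)
        prev_pair = (r, falling[n] if n < len(falling) else None)
    return out


stream = ["r0", "f0", "r1", "f1", "r2", "f2"]
print(iddr_same_edge(stream))
# [('r0', None), ('r1', 'f0'), ('r2', 'f1')]
print(iddr_same_edge_pipelined(stream))
# [(None, None), ('r0', 'f0'), ('r1', 'f1')]
```

In SAME_EDGE mode the pair presented to the fabric mixes samples from adjacent cycles, while SAME_EDGE_PIPELINED presents matched pairs at the cost of one extra clock of latency.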
In the first technique, read data is captured
in the delayed memory clock/strobe domain and must be re-captured in the system
(FPGA) clock domain. The transfer of
captured read data from the delayed memory
clock/strobe domain to the internal
system (FPGA) clock domain is defined as
read data re-capture. Read data is re-captured
within the I/O block.
Using the second technique, implemented
in the Xilinx reference designs,
you can directly capture read data in the
system (FPGA) clock domain by delaying
read data to meet the setup/hold time of
the flip-flops in the system (FPGA) clock
domain. A simple state machine is sufficient
to implement the center alignment
of the delayed read data with respect to the system (FPGA) clock after the initialization
period.
This “run time” adjustment after the
memory initialization sequence has significant
advantages over other methods that set
the required delay or phase shift during
“compile time.” The 64-tap absolute delay
element compensates for variations in
process, temperature, or voltage, and hence
increases the timing margins – resulting in a
more reliable system.
The read data is re-captured and stored
directly into the block RAM FIFO, a Virtex-4
feature that saves additional logic resources.
Conclusion
Virtex-4 architectural features enable you to
easily and reliably implement high-speed
memory interfaces. You can use the 64-tap, 80
ps absolute delay elements to capture read data
by either delaying the memory clock/strobe or
the data. Built into each I/O, the 64-tap absolute
delay elements give you the flexibility to
select any I/O for memory interfaces. The
“run time” adjustment after memory initialization
improves design margins.
The input and output DDR registers
enable you to receive and transmit
clock/strobe and data at high frequencies; the
differential clocking resource provides higher
performance with better duty cycle and lower
global clock buffer utilization; and the block
RAM FIFO feature enables you to store
transmitted or received data without additional
logic resources.
For more information about the implementation
and design details of different memory interfaces in Virtex-4 devices, visit
the following websites:
- DDR2 SDRAM (XAPP701 and XAPP702) and DDR SDRAM
(XAPP709): www.xilinx.com/products/design_resources/mem_corner/resource/xaw_dram_ddr.htm
- RLDRAM (XAPP710):
www.xilinx.com/products/design_resources/mem_corner/resource/rldram.htm
- QDR II SRAM (XAPP703):
www.xilinx.com/products/design_resources/mem_corner/resource/xaw_sram_qdr.htm