Summary

Virtex™-E devices provide dedicated on-chip differential receivers between adjacent user I/O pins, which are ideal for receiving LVDS signals at speeds of up to 622 Mb/s in the –7 speed grade. This application note describes how to design a high-speed, low-voltage differential signaling (LVDS) transmitter and receiver in a Virtex-E FPGA suitable for point-to-point data transmission at a data rate of 622 Mb/s.

Introduction

LVDS is a leading standard for differential signaling between boards, chassis, and peripherals. For the first time, FPGAs are able to receive and drive data between boards at speeds of 622 Mb/s with no external buffering. The reference designs described in this application note implement a complete point-to-point link using LVDS at 622 Mb/s per data channel.

For an introduction to LVDS techniques, refer to the following application notes. These application notes provide a starting point for designers new to LVDS.

- XAPP230, The LVDS I/O Standard, describes the basic signaling levels and requirements for LVDS signaling.

The reference design for the LVDS 622 Mb/s receiver uses two data channels and one clock channel. The system relies on double data rate (DDR) clocking: new data is present on every transition of the clock signal. Clock and data lines have identical bandwidth requirements, making this approach attractive for high-speed systems. This LVDS receiver design extends the philosophy of source-synchronous signaling beyond the board level and onto the FPGA itself. Source-synchronous signaling is a technique used to drive clock and data from a single device and forward the clock along with the data to the destination. Clock and data propagate along adjacent paths with matched propagation delays.
Figure 1 shows a complete LVDS link, with two data channels running at 622 Mb/s and one clock channel running at 311 MHz. The transmitters on the left use both edges of the clock to transmit data driven by a clock multiplexer onto the LVDS channel. Clock and data delays are well matched since identical multiplexers generate both clock and data. Clock and data pass through the Virtex-E LVDS driver circuitry and off chip.

The source termination network adjusts the levels to be fully LVDS compliant and also provides source termination of the transmission lines to 50Ω, attenuating any reflected signals that arrive back at the driver. The transmission lines can be microstrip or stripline at 50Ω to ground, or twisted pair with 100Ω differential impedance. The parallel terminator generally consists of a bank of 100Ω resistors, one resistor across each differential pair. For maximum timing margin at 622 Mb/s, the clock should be delayed by 1.0 ns relative to the data, using either additional trace delay or a driver with well-characterized propagation delay.

The signals are received by the differential LVDS receivers and pass to flip-flops that sample the data on both the rising and falling edges of the forwarded 311 MHz clock. Data can be further demultiplexed down to 78 MHz as shown later in this application note.
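The clock and data rates quoted above follow from the 622 Mb/s line rate by simple division. The short calculation below is only a sketch of that arithmetic, using the nominal 622 Mb/s figure; the variable names are illustrative and not part of the reference design.

```python
# Rate arithmetic for one data channel, using the nominal line rate from the text.
BIT_RATE_MBPS = 622.0                  # per data channel

ddr_clock_mhz = BIT_RATE_MBPS / 2      # DDR: two bits per forwarded clock cycle
bit_period_ns = 1e3 / BIT_RATE_MBPS    # duration of one bit on the line
word_clock_mhz = ddr_clock_mhz / 4     # clock after the final demultiplexing stage

print(f"forwarded clock: {ddr_clock_mhz:.2f} MHz")
print(f"bit period:      {bit_period_ns:.3f} ns")
print(f"word clock:      {word_clock_mhz:.2f} MHz")
```

The results round to the 311 MHz, 1.6 ns, and 78 MHz figures used throughout this application note.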
Figure 2 shows the physical structure of a single data line passing from one Virtex-E device to another. The internal structure of the termination packs is also shown.

Figure 2: Virtex-E LVDS Line Driver and Receiver

Figure 3 shows an LVDS transmitter connected to an LVDS receiver on separate Virtex-E FPGAs.

The receiver accepts two data channels at 622 Mb/s each and one clock channel at 311 MHz. To equalize the clock’s propagation delay to the two data channels, the clock channel is located between the two data channels. Centering the clock signal between the data channels provides equal distance to both sampling channels in the FPGA. Figure 4 and Figure 5 show the transmitter and receiver block symbols, respectively, as they appear in the reference design. Figure 4 includes the time control symbols and attributes used to guarantee 622 Mb/s operation.

Figure 3: LVDS System Diagram
The LVDS transmitter design accepts a 16-bit word at 78 MHz and transmits two 622 Mb/s double-data-rate streams and a 311 MHz clock signal. The floorplanned portion of the transmitter design occupies a three-row by seven-column CLB footprint placed immediately adjacent to the LVDS pin pairs with direct routing to the output stage CLBs. Adhere to this pattern to maintain the timing balance between the clock and data pairs.

The transmitter design, Figure 6, consists of four parallel-in/serial-out shift registers feeding two double-data-rate registers. A third double-data-rate register with inputs tied to logic "1" and "0" provides the LVDS source clock signal. The load generator (LOAD_GEN) module provides load pulses to the shift registers for the rising and falling edges of the 311 MHz global clock. The input data bits are distributed to the shift registers such that word alignment is achieved in the receiver with a simple barrel shifter.
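The data path through the four shift registers and two double-data-rate registers can be illustrated with a small behavioral model. This is a sketch only: the bit-to-register mapping below (low byte on the first channel, high byte on the second, even-numbered bits on the rising edge) is an assumption chosen for clarity, not the distribution used in the reference design, and the function names are hypothetical.

```python
def transmit_word(word):
    """Split a 16-bit word into two 8-bit DDR serial streams (assumed mapping)."""
    bits = [(word >> i) & 1 for i in range(16)]    # bits[0] is the LSB
    channels = []
    for ch in range(2):
        chunk = bits[8 * ch: 8 * ch + 8]           # eight bits per data channel
        rising_piso = chunk[0::2]                  # 4 bits sent on rising edges
        falling_piso = chunk[1::2]                 # 4 bits sent on falling edges
        serial = []
        for r, f in zip(rising_piso, falling_piso):
            serial += [r, f]                       # DDR register interleaves both edges
        channels.append(serial)
    return channels

def receive_word(channels):
    """Inverse of transmit_word for the same assumed mapping."""
    bits = channels[0] + channels[1]
    return sum(b << i for i, b in enumerate(bits))

d1, d2 = transmit_word(0xA5C3)
print(d1, d2, hex(receive_word([d1, d2])))
```

Each channel carries eight bits per 16-bit word, four from each shift register, which is why a 78 MHz word clock corresponds to a 622 Mb/s line rate.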
A clock input circuit is detailed in Figure 7. A DLL produces the 311 MHz global clock and divides the input clock by four to produce a 78 MHz clock for the 16-bit input data. Although the Virtex-E High-Performance I/O Demonstration Board (used for testing this reference design) uses a common LVPECL clock, the designer may use any clock source.
The parallel-in/serial-out shift register (PISO) shown in Figure 9 accepts a 4-bit word when a load signal is asserted during the active edge of the clock. Subsequent clock pulses shift the bits to the output. Figure 8 shows the load/shift timing.
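The load/shift behavior described above can be modeled in a few lines. The class below is a behavioral sketch, not the schematic implementation; the class name, the shift direction, and the zero fill value are assumptions made for illustration.

```python
class Piso4:
    """Behavioral model of a 4-bit parallel-in/serial-out shift register."""

    def __init__(self):
        self.reg = [0, 0, 0, 0]      # reg[0] drives the serial output

    def clock(self, load=False, data=(0, 0, 0, 0)):
        """Apply one active clock edge and return the serial output after it."""
        if load:
            self.reg = list(data)            # load the 4-bit word in parallel
        else:
            self.reg = self.reg[1:] + [0]    # shift one position toward the output
        return self.reg[0]

piso = Piso4()
out = [piso.clock(load=True, data=(1, 0, 1, 1))]   # load edge presents the first bit
out += [piso.clock() for _ in range(3)]            # three shift edges present the rest
print(out)
```

In this model a new load must arrive on every fourth clock edge to keep the serial output continuous.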

Input data must be loaded into the shift registers with 1.6 ns (622 MHz) precision. LOAD_GEN accomplishes this by using both edges of the 311 MHz clock and a transparent latch as shown in Figure 9. Note that the 78 MHz and the 311 MHz clocks must be phase-aligned (the rising edge of both must occur simultaneously) to guarantee this precision. Use of the DLL as shown in Figure 7 accomplishes this.
Figure 9: PISO and LOAD_GEN

Figure 10: DDRFD
The DDRFD register, Figure 10, forms the output stage of the transmitter. The rising and falling edge data are registered on the appropriate clock edges, while the multiplexer (driven by the regenerated clock signal, QSEL) selects the appropriate line to present as the output signal.

The circuit that generates QSEL (the regenerated clock) drives the output multiplexer. It divides the clock into two half-frequency signals 90 degrees apart and applies them to the input of an exclusive OR-gate. The output of the exclusive OR-gate is a close approximation of the global clock. An FMAP is used to guarantee the combination of the exclusive OR-gate with the multiplexer into a single LUT.

A key feature of the clock regeneration circuit is that it uses two transparent latches to divide the clock by two. Latches are used because of their superior speed and their ability to initialize with a predictable phase relationship to the clock.
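The principle behind the QSEL circuit, that the exclusive OR of two half-frequency signals 90 degrees apart reproduces the original clock, can be checked with an idealized zero-delay model. The sketch below uses edge-triggered toggles rather than the transparent latches of the actual design, and its initial states are an assumption; with a different initial phase the result would be inverted.

```python
def regenerate_clock(half_cycles):
    """Return (clk, qsel) traces sampled once per half clock period."""
    q_rise = 0    # toggles on each rising edge of clk: half frequency
    q_fall = 0    # toggles on each falling edge of clk: half frequency, 90 degrees later
    clk_trace, qsel_trace = [], []
    for n in range(half_cycles):
        clk = 1 if n % 2 == 0 else 0     # clk is high during even half-cycles
        if clk == 1:
            q_rise ^= 1                  # a rising edge starts this half-cycle
        else:
            q_fall ^= 1                  # a falling edge starts this half-cycle
        clk_trace.append(clk)
        qsel_trace.append(q_rise ^ q_fall)
    return clk_trace, qsel_trace

clk, qsel = regenerate_clock(16)
print(clk)
print(qsel)
```

In real hardware the regenerated signal lags the global clock by the latch and LUT delays, which is why the text calls it a close approximation.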

Data is received at 622 Mb/s from the two data pins into the high-speed receiver (HSRX) module. It is demultiplexed to 78 MHz and written, 16 bits wide, into a block RAM, where it crosses into the designer's clock domain. It is read out 16 bits wide in bursts at any speed greater than 78 MHz. Figure 11 shows the internal elements of the LVDS receiver, including the two-channel demultiplexer and the block RAM buffer.

In Figure 12, double-data-rate registers (DDREG) reduce each data channel's (D1 and D2) data stream to two parallel single-data-rate streams. Subsequent DDREG pairs clocked at 155 MHz treat the two single-data-rate streams as half-frequency, double-data-rate streams and further reduce them to four parallel streams. These streams are finally reduced by four DDREG modules to eight single-data-rate streams at 78 MHz. The sixteen resulting streams (from both channels) are finally registered by a single 78 MHz clock edge for presentation to the FIFO.
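The three-stage demultiplexing of one data channel can be sketched as repeated even/odd splitting. The model below is illustrative only: it ignores clock phases and register timing, and the function names are hypothetical. Each serial bit is labeled with its arrival index so that the resulting order is visible.

```python
def ddr_split(stream):
    """One DDREG stage: separate the samples taken on each of the two clock edges."""
    return stream[0::2], stream[1::2]

def demux_1_to_8(stream):
    """Apply three DDREG stages: 1 stream -> 2 -> 4 -> 8 parallel streams."""
    streams = [stream]
    for _ in range(3):
        streams = [half for s in streams for half in ddr_split(s)]
    return streams

serial = list(range(16))            # label each serial bit with its arrival index
parallel = demux_1_to_8(serial)
first_word = [s[0] for s in parallel]
print(first_word)
```

In this model the eight outputs appear in an interleaved order rather than arrival order, so the mapping of bits to output positions has to be taken into account when the 16-bit word is assembled.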

The incoming 311 MHz LVDS clock enters a two-phase clock divider (CKDIV) which provides two half-frequency signals 90 degrees apart. These signals are further divided by two CKDIV modules to provide four phases of the clock at 78 MHz. The differently-phased clock signals are routed to the double-data-rate registers in such a way as to clock them at the most stable portion of their respective input signals.
Figure 11: RX2BIT

Figure 12: HSRX

Receiver Timing Analysis

A timing analysis of the receiver design indicates that the optimal timing margin is achieved when the clock is delayed relative to the data by 1.0 ns. With the clock delayed 1.0 ns, the design can tolerate almost ±500 ps of clock/data jitter. The floorplan file and constraints within the design also guarantee that the 155 MHz and 78 MHz clock signals have appropriate setup and hold times relative to their data.

Block RAM Usage

The HSRX design, shown in Figure 12, reduces the data rate to 77.78 (nominally 78) megawords per second, or one 16-bit word approximately every 12.9 ns. The data is written into a block RAM that is used as part of a FIFO (see Figure 11). An 8-bit counter that is clocked by the 78 MHz word clock (WCLK) provides write addressing on the first port of the RAM, while a second 8-bit counter, clocked by the designer’s system clock (SCLK), provides read addressing on the second port. The second counter is enabled by the designer’s read enable (RE) signal.

The FIFO level is maintained by the logic inside the FULL_MT module of Figure 11. The designer evaluates the BUFSTAT[3:0] bus to determine the level of the FIFO. Because BUFSTAT consists of the upper 4 bits of an 8-bit value, a value of zero indicates a FIFO within 16 words of being empty, while a hex value of “F” indicates a FIFO within 16 words of being full.

Figure 14 shows the detail of the FULL_MT module. ADSU4 operates as a subtractor, comparing the upper 4 bits of the FIFO write address with the upper 4 bits of the FIFO read address. The 4-bit difference output serves as the BUFSTAT value. Since the FIFO read and write address counters operate in different clock domains, the BUFSTAT value is sampled twice and the samples are compared. The third and final sample is taken only when the two prior consecutive samples are identical. This ensures that the write address counter was stable when the sample was taken.

The designer’s RE signal should be generated as a function of the fullness of the FIFO. The designer chooses an almost full value of hex “D” or less and an almost empty value of hex “2” or greater, and uses these values to generate the RE signal. Almost full and almost empty values are chosen by the designer to accommodate the speed and latency constraints of the designer’s system.
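The BUFSTAT computation and the sample-and-compare step can be sketched in software. This is an interpretation of the description above, not a translation of the FULL_MT schematic; the function names are hypothetical, and the modulo-16 wraparound of the 4-bit subtractor output is an assumption.

```python
def bufstat(write_addr, read_addr):
    """Difference of the upper 4 bits of two 8-bit addresses, modulo 16."""
    return ((write_addr >> 4) - (read_addr >> 4)) & 0xF

def stable_sample(samples):
    """Return the sample that follows the first two identical consecutive samples.

    Returns None if no two consecutive samples agree.
    """
    for first, second, third in zip(samples, samples[1:], samples[2:]):
        if first == second:          # the two prior samples agree
            return third             # accept the third and final sample
    return None

print(bufstat(0x25, 0x10))           # write pointer one 16-word block ahead
print(bufstat(0x05, 0xF0))           # same distance after the write pointer wraps
print(stable_sample([3, 4, 4, 4]))
```

Because only the upper 4 bits are compared, the reported level has a granularity of 16 words, which is why the almost full and almost empty thresholds are kept away from the extremes.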

In asynchronous rate-matching applications the designer must read the FIFO in bursts at a frequency above 78 MHz (which is just fast enough to keep the FIFO from overflowing). For example, the designer’s system clock might run at 103 MHz. When the FIFO level gets high enough for BUFSTAT to reach the almost full value, about hex “D”, the designer asserts RE until BUFSTAT reaches the almost empty value, about hex “2”. Then, the designer deasserts RE until the FIFO is almost full again.
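The burst-read scheme can be checked with a coarse simulation stepped once per SCLK cycle. The sketch below is idealized: BUFSTAT is taken as the upper 4 bits of the word count with no synchronization latency, and the 103 MHz read clock and the thresholds are the example values from the text. All names are illustrative.

```python
WRITE_MHZ = 77.78     # WCLK word rate from the text
READ_MHZ = 103.0      # example SCLK frequency from the text
DEPTH = 256           # addressed by the 8-bit counters
ALMOST_FULL = 0xD     # BUFSTAT value that starts a read burst
ALMOST_EMPTY = 0x2    # BUFSTAT value that ends a read burst

def simulate(sclk_cycles):
    """Return (peak level, lowest level after the first burst starts)."""
    level = 0                 # words currently stored
    credit = 0.0              # fraction of a WCLK period elapsed
    re = False
    burst_seen = False
    peak, lowest = 0, DEPTH
    for _ in range(sclk_cycles):
        credit += WRITE_MHZ / READ_MHZ
        if credit >= 1.0:             # a write lands during this SCLK cycle
            credit -= 1.0
            level += 1
        peak = max(peak, level)
        status = level >> 4           # idealized BUFSTAT
        if status >= ALMOST_FULL:
            re, burst_seen = True, True
        elif status <= ALMOST_EMPTY:
            re = False
        if re:
            level -= 1                # one word read per SCLK cycle while RE is high
        if burst_seen:
            lowest = min(lowest, level)
    return peak, lowest

peak, lowest = simulate(5000)
print(f"peak level {peak}, lowest level after first burst {lowest}")
```

In this model the FIFO level stays well inside the 256-word buffer; a real system must also allow for the latency between a BUFSTAT change and the designer's response on RE.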

Reference Designs

The two-channel transmitter and receiver reference designs were developed and floorplanned in two XCV1000E-7-FG900 devices. The reference designs are available at: ftp://ftp.xilinx.com/pub/applications/xapp/xapp233.zip.

The ZIP file includes:

- source schematics and library files drawn in Workview Office™ ViewDraw for Windows 7.5.5
- extracted EDIF netlists (*.edn)
- floorplan files (top.fnf and top.mfp)
- implementation files (top.ncd).

Implementing the Reference Designs

To implement either reference design, use the following procedure:

1. Unpack the ZIP file.
2. Locate the top.edn file in either the \TRANSMITTER or \RECEIVER directory, depending on the design you want to implement.
3. In the Alliance Series™ 3.1i software “New Project” dialog box, use top.edn as the source file with the following settings:
   a. Set the part to XCV1000E-7-FG900.
   b. Set the floorplan file to top.fnf.
   c. Set the floorplan guide file to top.mfp.
4. Run the implementation using the default place and route settings.
5. Verify that the timing constraints are met.

Notes:

1. The floorplan files are essential to achieving 622 Mb/s operation.
2. The BitGen warnings in the receiver refer to unused outputs on the block RAM and counters; these are normal and can be ignored.

Design Implementation Notes

The transmitter and receiver modules are floorplanned to fit within a 3x7 CLB footprint which aligns to the pitch of the block RAM structure. In principle, it is possible to tile multiple LVDS dual-channel receiver modules, one for every block RAM, near the left or right edges of the Virtex-E device. Using this method, an XCV300E device can accommodate 24 LVDS channels (16 data, 8 clock) at 622 Mb/s. Adhere to the floorplans provided for the transmitter and receiver modules as direct routing to/from IOBs is especially critical at high data rates.

The floorplan provided in the reference design only constrains component placement in the critical section near the pins. The high-speed routing is automatically implemented by the Xilinx software. The reference designs were implemented using Alliance Series 3.1i software with the provided floorplan files and all timing constraints were met.

For board-level considerations, see XAPP230, The LVDS I/O Standard, and XAPP232, Virtex-E LVDS Drivers and Receivers: Interface Guidelines.

Simulation Results

Figure 2 shows the complete schematic of the Virtex-E LVDS line driver and receiver. When driving a Virtex-E LVDS line receiver, connect the LVDS_OUT node in Figure 2 to a Virtex-E input and the LVDS_OUTX node to the complementary Virtex-E input of the true-differential input.

The SPICE simulation included parasitic package effects for the BG432 package, running bursts of alternating data to measure response time and inter-symbol interference.

Figure 15 shows the pulse and 622 Mb/s burst data response of the Virtex-E LVDS line driver circuit in Figure 2 driving a Virtex-E LVDS receiver in the BG432 package with long (5 ns) transmission lines. Voltages are measured at the on-die differential input. Notice that the differential reflections (LVDS_OUT - LVDS_OUTX on the second waveform down) are negligible, confirming that the matched source impedance of the Virtex-E LVDS driver absorbs nearly all differential reflections. The well-matched source impedance of the Virtex-E LVDS driver results in no undershoot or signal swing reduction when driving pulsed data at 622 Mb/s or clocks at 311 MHz, as seen on the LVDS_OUT - LVDS_OUTX graph at the bottom of Figure 15.

Because its source and destination terminations are matched, the Virtex-E LVDS driver can deliver better signal integrity than standard off-the-shelf LVDS drivers.

Figure 16 shows actual test results measured on a Virtex-E High-Performance I/O Demonstration Board. These results were measured using a high-speed oscilloscope sampling at 8 GS/s. Residual ripple and inter-symbol interference are comparable in the test results and in the simulation. These traces are measured at the LVDS receiver after 11 inches of stripline trace.
Figure 16: Results Measured on Virtex-E High-Performance I/O Demo Board
Figure 17 shows similar results for 20 inches of twisted pair ribbon cable going between two PC boards. The clock pair runs adjacent to the data pair. Inter-channel and inter-symbol interference at 622 Mb/s are negligible.

Figure 17: Results Between Two PC Boards

Conclusion

Virtex-E devices transmit and receive LVDS at 622 Mb/s in a –7 speed grade. Reliable data transmission is possible over electrical lengths exceeding 5 ns (30 inches), limited only by cable attenuation due to skin effect. Virtex-E devices using LVDS reliably transfer high-speed data and clocks over long distances between boards, chassis, and peripherals.

Revision History

The following table shows the revision history for this document.

<table>
<thead>
<tr>
<th>Date</th>
<th>Version</th>
<th>Revision</th>
</tr>
</thead>
<tbody>
<tr>
<td>12/21/99</td>
<td>1.0</td>
<td>Initial Xilinx release.</td>
</tr>
<tr>
<td>7/30/00</td>
<td>1.1</td>
<td>Revised figures and technical information.</td>
</tr>
<tr>
<td>01/06/01</td>
<td>1.2</td>
<td>Updated Block RAM Usage, page 11, Figure 11 and Figure 14.</td>
</tr>
</tbody>
</table>