|
We've all heard the phrase "timing is
everything," and this is certainly the case
for the majority of digital outputs on
modern FPGAs. Timing-calculation
errors of 10 or 20 percent were fine at 20
MHz, but at 200 MHz and above, they're
absolutely unacceptable.
As Xilinx Senior Field Applications
Engineer Jerry Chuang points out, "The
toughest case usually is a memory or
processor bus interface. Most designers
know that they have to account for Tco
(clock-to-output) as it relates to flight
time, but don't really know how."
Another signal integrity engineering
manager who preferred to remain anonymous
explains, "We've got lots of things
that hang on the hairy edge of working.
That's one of the reasons why they give
you so many knobs to turn on newer
memory interfaces."
To complicate matters, manufacturer
datasheets and application notes use
multiple, often-conflicting definitions of
many of the variables and procedures
involved, requiring you to investigate the
conventions used by manufacturer A versus
manufacturer B. Most of the recently
published signal integrity books either
gloss over the subject or avoid it altogether.
We hope that this article will
serve to blow away some of the fog and
reinforce some standard definitions.
System Timing for Synchronous Signals
An FPGA team will typically place and
route an FPGA according to their specific
timing requirements, leaving system-level
timing issues to be negotiated later with the
system-design team. With the sub-nanosecond
timing margins associated with many
signals, it's common for the system side to
be faced with PCB floor-planning changes,
part rotation, and sometimes the need to
negotiate pin swaps with the FPGA team to
accommodate timing goals. Proactive, pre-layout
timing analysis and some careful
accounting can keep both the FPGA and
system teams from spending a month or
more chasing timing problems.
Two classes of signals pose problems for
FPGA designers and their downstream counterparts
at the system level: timing-sensitive
synchronous signals and asynchronous,
multi-gigabit serial I/Os. We'll concentrate on
parallel, synchronous designs in this article.
Margins
The system-timing spreadsheet for synchronous
designs is based on two "classic"
timing equations:
Tco_test(Max) + Jitter + TFlight(Max) + TSetup < TCycle
Tco_test(Min) + TFlight(Min) > THold
Or, once Tco_test is corrected to become
Tco_sys, as outlined in this article:
Tco_sys(Max) + Jitter + Tpcb_delay(Max) + TSetup < TCycle
Tco_sys(Min) + Tpcb_delay(Min) > THold
Each net's timing is initially set up with
a small, positive timing margin. This margin
is allocated to the TFlight(Max) and
TFlight(Min) values (or Tpcb_delay[Max]
and Tpcb_delay[Min], respectively) in the
preceding equations; these are timing contributions
of the PCB interconnect between
each net's driver and receivers.
If there is insufficient margin left to
design the interconnects, either the silicon
numbers need to be retargeted and
redesigned, or the system speed must be
slowed. Figure 1 shows how timing margins
shrink relative to frequency.
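As a rough illustration of the two classic equations, here is a minimal Python sketch that turns them into setup and hold margins and shows how the setup margin shrinks as the clock rate rises. The function names and all per-net numbers are hypothetical, chosen only for illustration; they do not come from any datasheet.

```python
def setup_margin_ns(tco_max, jitter, tflight_max, tsetup, tcycle):
    """Time left in the cycle after the slowest path; must be > 0."""
    return tcycle - (tco_max + jitter + tflight_max + tsetup)


def hold_margin_ns(tco_min, tflight_min, thold):
    """How far the fastest path clears the hold requirement; must be > 0."""
    return (tco_min + tflight_min) - thold


# Hypothetical per-net values in nanoseconds (assumed, not from a datasheet).
net = dict(tco_max=2.0, jitter=0.1, tflight_max=1.5, tsetup=0.5)

for freq_mhz in (20, 100, 200):
    tcycle = 1000.0 / freq_mhz  # cycle time in ns
    margin = setup_margin_ns(tcycle=tcycle, **net)
    print(f"{freq_mhz:>4} MHz: cycle {tcycle:5.1f} ns, setup margin {margin:5.1f} ns")

hold = hold_margin_ns(tco_min=0.8, tflight_min=0.6, thold=0.5)
print(f"hold margin: {hold:.1f} ns")
```

The same delays that leave tens of nanoseconds of slack at 20 MHz leave less than a nanosecond at 200 MHz, which is why a Tco error that was once harmless can now break the budget.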
There are two ways to come up with the
interconnect values for the timing spreadsheet.
Some signal integrity tools automatically
make calculations that produce a single
"flight-time" value. However, especially for
designers just learning about the timing
challenges of high-speed systems, a two-step
approach is more instructive. First, you
learn how to correct a datasheet's driver
Tco value to match the behavior in your real
system; second, you add the additional delay
between the driver and each of its receivers.
Data Book Values
Initially, timing spreadsheets are populated
with values from the silicon vendor's data
book. You'll need first-order estimates from
silicon designers on the values of Tco and
setup and hold times for each system component.
You can usually obtain this data
from the component datasheet.
Test and Simulation Reference Loads
To arrive at the datasheet value for your
drivers' Tco, standard simulation test loads
(or reference loads) provide an artificial
interface between the silicon designer and
the system designer.
You'd prefer, of course, to have Tco specified
into the actual transmission-line
impedance you're driving on your PCB, but
the silicon provider has no way of knowing what
that will be. Knowing what loading the
vendor assumed when publishing Tco is critical
so that you can adjust for the difference
between that load and your real one.
The Recipe for a Problem
As shown in Figure 2, if the reference load is
significantly different from the actual load
that the output buffer will see in your
design, the sum of the datasheet and PCB-interconnect
timing values will not represent
actual system timing. Actual or total
delay may be represented as:
Total Delay = Tco_sys + Tpcb_delay ≠ Tco_test + Tpcb_delay
where Tpcb_delay is the extra interconnect
delay between the time at which the
driver switches high or low until a given
receiver switches.
Note that this "PCB delay" is not just
the time it takes for a signal to travel along
the trace (sometimes called "copper delay" or "propagation delay"). Here, Tpcb_delay
accounts for effects such as ringing at the
receiver, as shown in Figure 3. Its value could
(on a poorly terminated net) easily be longer
than the simple copper delay.
Calculating accurate timing involves
more than finding Tpcb_delay. If the difference
between Tco_sys and Tco_test is
significant — even in the neighborhood of
100 ps — your board may not function
properly if you don't account for the difference.
But because Tco_test is a value created
with an assumed test load, it almost
never matches Tco_sys, the clock-to-output
delay you'll see in your actual system.
For example, Lee Ritchey, author of "Get
it Right the First Time" and founder of the
consulting firm Speeding Edge, was hired to
resolve a timing problem on a 200 MHz
memory system. After digging into the
design, he found that unadjusted datasheet
values were used, based on Tco values that
were measured on a 50 pF load rather than
something resembling the design's 50 Ohm
transmission-line load. As a result, this
improper accounting "threw timing off by
just over one nanosecond," he says. "That's
20 percent of the total timing budget, a
major error."
In the following sections, we'll see how
you can correct Tco_test to become Tco_sys,
avoiding this type of error altogether.
The Process
Measuring Tco_test
To measure Tco_test, you need to set up a
simulation with just the driver model and
the datasheet test load. Though they're an
optional sub-parameter in the IBIS specification,
most IBIS models (including Xilinx
IBIS models) contain a record of the test
load (Cref, Rref, Vref) and the measurement
voltage (Vmeas) to use with these values.
Figure 4 shows these values for the
LVTTL8F buffer in the Virtex-II Pro IBIS
model, as well as a generic reference load diagram
taken from the IBIS specification.
Once you've gathered these load values
from the IBIS model, you simulate rising
and falling edges, and for each, measure
the time from the beginning of switching
until the driver pin crosses the Vmeas
threshold. These are the Tco_test values.
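If your simulator can export the driver-pin waveform as time and voltage samples, the Vmeas crossing can be located with simple linear interpolation. The sketch below is one way to do that in Python; the function name, the sample waveform, and the 1.5 V threshold are all hypothetical, and time zero is assumed to be the start of switching.

```python
def first_crossing_time(times, volts, vmeas):
    """Return the interpolated time of the first crossing of vmeas, or None."""
    for i in range(1, len(times)):
        v0, v1 = volts[i - 1], volts[i]
        if v0 == v1:
            continue  # flat segment cannot cross the threshold
        if min(v0, v1) <= vmeas <= max(v0, v1):
            frac = (vmeas - v0) / (v1 - v0)
            return times[i - 1] + frac * (times[i] - times[i - 1])
    return None


# Hypothetical rising and falling edges into the test load (ns, volts).
t = [0.0, 1.0, 2.0, 3.0]
rising = [0.0, 0.0, 3.0, 3.0]
falling = [3.0, 3.0, 0.0, 0.0]
VMEAS = 1.5  # assumed; read the real value from the IBIS model

print("rising edge crossing:", first_crossing_time(t, rising, VMEAS), "ns")
print("falling edge crossing:", first_crossing_time(t, falling, VMEAS), "ns")
```

The function works for either edge direction because it only checks whether Vmeas lies between two adjacent samples.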
Obtaining "Tcomp," the
Timing-Correction Value
Now you need to calculate a compensation
value, Tcomp, that will convert the datasheet
Tco value into the actual Tco you'll see in
your system. Tcomp is the delay between the
time the driving signal, probed at the output,
crosses Vmeas into the silicon manufacturer's
standard reference load, and the time it
crosses Vmeas for your actual system load.
Tcomp is then used as a modification to the
Tco value from the vendor datasheet, as
shown in Figure 5.
The revised computation of actual delay
from the previous equation is then:
Total Delay = Tco_sys + Tpcb_delay
= (Tco_test + Tcomp) + Tpcb_delay
Note that Tcomp may be negative or positive,
depending on whether the actual load
in your system is smaller or larger than the
standard test load. Traditionally, silicon vendors
used capacitive test loads (like 35 pF) to
measure Tco; almost all real PCB transmission
lines do not present as heavy a load, so
Tcomp is usually negative in this situation.
Xilinx, for its current generation of
FPGAs, uses a 0 pF test load for output
driver wave shape accuracy. Real transmission
lines will represent a different load —
some mixture of inductance, capacitance,
and resistance. Because the transmission-line
load is heavier than a 0 pF "open load,"
Tcomp will be positive. Simulation is the
only way to accurately predict the exact
value of Tcomp.
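Once you have the two Vmeas crossing times from simulation, the correction itself is simple arithmetic. The Python sketch below shows the sign convention for both kinds of test load; the function names and the crossing times are hypothetical values chosen for illustration, not simulation results.

```python
def tcomp_ns(t_cross_test_load, t_cross_system_load):
    """Tcomp: Vmeas crossing into the real load minus crossing into the test load."""
    return t_cross_system_load - t_cross_test_load


def tco_sys_ns(tco_test, tcomp):
    """Datasheet Tco corrected for the actual system load."""
    return tco_test + tcomp


# Hypothetical crossing times in ns, both probed at the driver output.
# Heavy capacitive test load: the real line is lighter, so Tcomp is negative.
cap_case = tcomp_ns(t_cross_test_load=2.0, t_cross_system_load=0.8)

# 0 pF test load: the real line is heavier, so Tcomp is positive.
open_case = tcomp_ns(t_cross_test_load=0.3, t_cross_system_load=0.7)

print(f"capacitive test load: Tcomp = {cap_case:+.1f} ns")
print(f"0 pF test load:       Tcomp = {open_case:+.1f} ns")
print(f"Tco_sys for a 4.0 ns datasheet Tco: {tco_sys_ns(4.0, cap_case):.1f} ns")
```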
Simulating Tpcb_delay
At this point in the process, you've completed
the first step in finding accurate delays for
your timing spreadsheet: compensating
the datasheet Tco to match your real
system load. Next, you need to determine
Tpcb_delay, the additional delay caused by
the interconnect from driver to receiver.
A signal integrity simulator is the only
way to accurately do this, because only a
simulator can account for subtle effects like
reflections, receiver input capacitance, line
loss, and so forth.
From here, we'll explore some detailed
examples based on Xilinx-provided IBIS
models: the process of calculating Tcomp
and then using the HyperLynx simulator
to determine an interconnect's Tpcb_delay
through pre-layout topology analysis. You
could enter the values that we come up with
directly into your system-timing spreadsheet.
The process using Mentor Graphics'
HyperLynx product is straightforward. You
look up the manufacturer's test load in the
IBIS model (see Figure 4), enter it in the
LineSim schematic, set up your actual interconnect
topology just below the reference
load, and begin a simulation, probing at
both drivers so that you can measure Tcomp
and Tpcb_delay, as shown in Figure 6.
Running the Numbers on a Real Problem
An important design for an electronic equipment
manufacturer had a Xilinx FPGA talking
to a bank of SRAMs at 125 MHz,
meaning the cycle time (Tcycle) was 8 ns. The Xilinx datasheet specified Tco as 4 ns (i.e.,
Tco_test). The SRAM's setup time was 2 ns.
Some of the traces connecting the FPGA
to an SRAM were six inches long; a signal
integrity simulation showed a worst-case
maximum PCB delay (to the receiver's "far"
threshold) of 2.5 ns. In the design's timing
spreadsheet, this yielded a total time of
4 + 2.5 + 2 = 8.5 ns (Tco_test + Tpcb_delay
+ Tsetup), violating the 8 ns cycle time.
However, the Tco value, when corrected for the
actual design load with a Tcomp of -1.2 ns, was 4 - 1.2 = 2.8 ns
(Tco_sys = Tco_test + Tcomp), meaning
that the actual total delay value was
2.8 + 2.5 + 2 = 7.3 ns (Tco_sys + Tpcb_delay
+ Tsetup), leaving an acceptable timing
margin of 700 ps.
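The arithmetic for this example can be laid out as a short Python sketch. The values are the ones quoted above (with Tcomp taken as the -1.2 ns implied by the corrected Tco); the variable names are our own, and jitter is left out because the example does not quote a value for it.

```python
FREQ_MHZ = 125
TCYCLE = 1000.0 / FREQ_MHZ   # 8 ns cycle time
TCO_TEST = 4.0               # datasheet Tco, ns
TCOMP = -1.2                 # correction for the actual load, ns
TPCB_DELAY_MAX = 2.5         # worst-case simulated PCB delay, ns
TSETUP = 2.0                 # SRAM setup time, ns

# Uncorrected: datasheet Tco used directly.
uncorrected_total = TCO_TEST + TPCB_DELAY_MAX + TSETUP
uncorrected_margin = TCYCLE - uncorrected_total

# Corrected: Tco_sys = Tco_test + Tcomp.
tco_sys = TCO_TEST + TCOMP
corrected_total = tco_sys + TPCB_DELAY_MAX + TSETUP
corrected_margin = TCYCLE - corrected_total

print(f"uncorrected: total {uncorrected_total:.1f} ns, margin {uncorrected_margin:+.1f} ns")
print(f"corrected:   total {corrected_total:.1f} ns, margin {corrected_margin:+.1f} ns")
```

A negative margin flags a setup violation; the corrected row shows the same net passing once the datasheet Tco is adjusted for the real load.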
Note that in this calculation, we measured
to the time at which the receiver signal
crossed the farthest-away threshold to get
the worst-case, longest possible Tpcb_delay.
For a rising edge, we measured to the last
crossing of Vih; for a falling edge, to the last
crossing of Vil.
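On a net that rings, the receiver waveform can cross a threshold several times, and the worst-case delay is set by the last crossing. The following Python sketch finds that last crossing by interpolating between exported samples. The function name, the ringing waveform, and the 2.0 V Vih value are hypothetical; the same function applies to a falling edge if you pass Vil instead.

```python
def last_crossing_time(times, volts, threshold):
    """Return the interpolated time of the final threshold crossing, or None."""
    last = None
    for i in range(1, len(times)):
        v0, v1 = volts[i - 1], volts[i]
        if v0 == v1:
            continue  # flat segment cannot cross the threshold
        if min(v0, v1) <= threshold <= max(v0, v1):
            frac = (threshold - v0) / (v1 - v0)
            last = times[i - 1] + frac * (times[i] - times[i - 1])
    return last


# Hypothetical rising edge at the receiver with ring-back through Vih.
t = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]          # ns
v = [0.0, 2.4, 1.8, 2.6, 3.0, 3.0]          # volts
VIH = 2.0                                    # assumed receiver threshold

print(f"last Vih crossing: {last_crossing_time(t, v, VIH):.2f} ns")
```

Here the signal first reaches Vih before 1 ns, but ring-back pushes the last crossing out past 2 ns, which is the time that belongs in the worst-case column of the spreadsheet.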
Conclusion
For seamless interaction between the FPGA
designer and the system designer, it's prudent
to do as much pre-layout, "what-if" analysis
as possible. And, though not covered explicitly
in this article, you can also verify that your
laid-out printed circuit boards meet your
timing requirements using a post-layout simulator
with batch analysis capabilities.
Some Mentor products that perform this
type of analysis are HyperLynx, ICX, and
XTK. With these simulations, you can revise
simulated representations of interconnect
circuits in minutes, compared to the weeks
required to spin actual PCB prototypes.
The new HyperLynx Tco simulator is
available on Mentor Graphics' website,
www.mentor.com/hyperlynx/tco/. Included with
the Tco simulator are the Virtex-II Pro,
Virtex-II, and Spartan IBIS models;
boilerplate schematics that will help you make
adjustments to data book Tco values; and a
detailed tutorial on Tco and flight-time correction
that parallels this article.
| What is "Flight Time"? |
| In this article, we've shown conceptually
how Tco values specified into a
silicon vendor's test load can be corrected
on a per-net basis to give the
actual clock-to-output (Tco) timing
you'll see on your PCB, and then
added to the additional trace delays
between drivers and receivers to give
accurate timing values. However, signal
integrity (SI) tools actually deal
with corrected timing values in a different
(but equivalent) way.
The most convenient output
from an SI tool is a single number —
called "flight time" — shown in
Figure 5 as (Total Delay – Tco_test)
or (Tpcb_delay + Tcomp). You can
add this value to the standard data
book Tco values in your timing spreadsheet
to give the same effect as the two-step
process described in this article.
When an SI tool calculates timing
values, it 1) simulates each driver model
into the vendor's test load, measures the
time for the output to cross the Vmeas
threshold, and stores the value
(Tco_test); 2) simulates the actual nets
in the design and measures the time at
which each receiver switches (Total
Delay); and 3) for each receiver, subtracts
the driver-switching-into-test-load
time from the receiver time (Total Delay
– Tco_test). The resulting flight time is
a single number that can be added to
each net's row in a timing spreadsheet,
and that both compensates Tco_test for
actual system loading and accounts for
the interconnect delay between driver
and receiver.
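The three steps can be sketched in a few lines of Python. Because Total Delay = (Tco_test + Tcomp) + Tpcb_delay, subtracting Tco_test leaves Tcomp + Tpcb_delay, so the single-number and two-step approaches agree. The function name and all numeric values below are hypothetical illustrations, not tool output.

```python
def flight_time_ns(total_delay, tco_test):
    """Flight time as an SI tool reports it: receiver time minus test-load time."""
    return total_delay - tco_test


# Hypothetical values in ns.
tco_test = 4.0      # step 1: driver into the vendor test load
tcomp = -1.2        # correction for the real load (capacitive test load)
tpcb_delay = 2.5    # extra interconnect delay to the receiver

total_delay = tco_test + tcomp + tpcb_delay        # step 2: receiver switches
flight = flight_time_ns(total_delay, tco_test)     # step 3: subtract

print(f"flight time: {flight:.1f} ns")
print(f"datasheet Tco + flight time: {tco_test + flight:.1f} ns")

# With a heavy test load and a short net, flight time can go negative.
short_net = flight_time_ns(total_delay=4.0 - 1.5 + 1.0, tco_test=4.0)
print(f"short net flight time: {short_net:+.1f} ns")
```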
The term "flight time" is somewhat
unfortunate, although it's
become the industry standard. The
name suggests the total propagation
delay between driver and receiver, but
the value calculated is actually the
delay derated to compensate for the
reference load. For old-style capacitive
reference loads (e.g., 50 pF),
flight time can even be negative. |
Printable PDF version of this article with graphics. (3/25/04) 300 KB |