Advances in process technology have led
to dramatic increases in FPGA device
densities. Several Xilinx® Virtex™ families
have devices exceeding 1 million system
gates. This increase in device density
and the use of 300 mm wafers have made
FPGAs affordable for volume production.
Designs that were once exclusively
targeted at ASICs are now being implemented
in programmable devices. The
largest 90 nm Virtex-4 device provides
more than 200,000 logic cells, 6 Mb of
block RAM, and nearly 100 DSP
blocks. Creating a design to efficiently
utilize the available resources in these
devices and meet performance requirements
can be challenging. Fortunately,
today's EDA software tools have evolved
to meet these challenges.
Logic optimization, efficient placement,
and minimized interconnection delays are
all important for achieving maximum performance.
Timing-driven synthesis technology
has provided a significant
improvement in design performance. Its
effectiveness, however, is limited by the accuracy
of routing-delay estimates made before placement.
Physical synthesis (the use of physical
placement and routing information during
synthesis) has been at the forefront of
efforts to address this problem. Physical
synthesis and optimization further
expands on this technology by involving
synthesis in implementation decisions
after the netlist is generated. This allows
for dynamic re-examination of synthesis
mapping and packing decisions based on
actual placement and routing information
during implementation.
Benefits of Physical Synthesis and Optimization
Interconnection delays between logic levels
are affected by the proximity of placement
for the logic elements, routing
congestion, and local competition between
nets for the fastest routing resources. Because
these factors cannot be determined accurately until implementation,
the answer is to revisit synthesis
decisions during mapping, placement,
and routing. During the mapping phase,
the netlist can be reoptimized, packed, and
placed based on the urgency of individual
timing paths. This approach reduces the
number of implementation cycles required
for timing closure.
Physical Synthesis and Optimization Flows
Xilinx ISE software provides several
options to enable physical synthesis
and optimization. You can use these
options individually or together, depending
on the specific needs of your design.
Define Timing Requirements
The most important step for effective
physical synthesis is to set up accurate,
comprehensive timing constraints. With
these constraints in place, the implementation
tools can make more informed
decisions that will improve your overall results. Constrain the clocks and I/O pins
that have firm requirements so that the
rest of the design can be relaxed.
The easiest way to define these timing
constraints is to use the Constraints
Editor. This graphical tool allows you to
enter clock frequencies, multi-cycle and
false path constraints, I/O timing requirements,
and a host of other clarifying
requirements. Constraints are written to a
user constraints file (UCF), which can
also be edited in any text editor.
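If you edit the UCF directly, a few common timing constraints have roughly the shape shown below. This minimal Python sketch builds them as text. It is an illustration under stated assumptions and not output from the Constraints Editor. The helper functions and every net and group name (sys_clk, data_in, slow_regs, fast_regs, cfg_regs) are hypothetical. The groups used in FROM/TO constraints would also need their own TNM definitions, which are not shown. Check the exact syntax your ISE release accepts against the Xilinx Constraints Guide.

```python
from __future__ import annotations

# Minimal sketch: emit UCF-style timing constraints as plain text.
# All net and group names passed in below are hypothetical placeholders.

def period_constraint(clock_net: str, period_ns: float) -> list[str]:
    """Group the elements clocked by clock_net and give the group a PERIOD."""
    return [
        f'NET "{clock_net}" TNM_NET = "{clock_net}";',
        f'TIMESPEC "TS_{clock_net}" = PERIOD "{clock_net}" {period_ns:g} ns HIGH 50%;',
    ]

def input_offset(data_net: str, clock_net: str, before_ns: float) -> str:
    """I/O requirement: data is valid before_ns ahead of the clock edge."""
    return f'NET "{data_net}" OFFSET = IN {before_ns:g} ns BEFORE "{clock_net}";'

def multicycle(src_group: str, dst_group: str, base_spec: str, factor: int) -> str:
    """Multi-cycle path: allow `factor` times the base TIMESPEC between two groups."""
    return (f'TIMESPEC "TS_{src_group}_to_{dst_group}" = '
            f'FROM "{src_group}" TO "{dst_group}" {base_spec} * {factor};')

def false_path(src_group: str, dst_group: str) -> str:
    """False path: exclude paths between two groups from analysis (TIG)."""
    return (f'TIMESPEC "TS_{src_group}_to_{dst_group}_tig" = '
            f'FROM "{src_group}" TO "{dst_group}" TIG;')

if __name__ == "__main__":
    ucf = period_constraint("sys_clk", 5.0)  # 5 ns period (200 MHz)
    ucf.append(input_offset("data_in", "sys_clk", 2.5))
    ucf.append(multicycle("slow_regs", "fast_regs", "TS_sys_clk", 2))
    ucf.append(false_path("cfg_regs", "fast_regs"))
    print("\n".join(ucf))
```

The generated lines could be appended to the design's .ucf file. TIMESPEC identifiers are arbitrary as long as they begin with "TS".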
If user-defined timing constraints are
not provided, a new feature in ISE 8.1i
software will automatically generate timing
constraints for each internal clock. In
Performance Evaluation Mode (PEM),
you can get the high-performance results
of physical synthesis and optimization
without having to provide timing targets.
Run Global Optimization
For designs containing IP cores or other
netlists, the NGD file available after the
translate (NGDBuild) phase of implementation
represents the first time that
the entire design has been completely
assembled. Global optimization, a new
feature added to the 7.1.01i version of
Map, will take the fully assembled design
and attempt to improve design performance
by re-optimizing the combinatorial
and register logic. Global optimization
(the -global_opt switch on the Map command line)
has been shown to increase design clock
frequencies by an average of 7%.
Two other options let you further control
the optimization performed during
this phase. Retiming (map -retiming) moves
registers forward and backward across
combinatorial logic to balance its delays, and
equivalent register removal (map -equivalent_register_removal) removes registers
with redundant functionality.
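The Python sketch below shows each of these two transformations in a deliberately simplified form. The retiming part assumes a single path with one movable intermediate register and hypothetical per-element delays in picoseconds. It slides that register to the position that minimizes the slower of the two pipeline stages. Real retiming must also preserve behavior across fanout and reconvergent paths and handle register initial values, and this sketch ignores all of that. The equivalent-register part treats two registers as redundant when they share the same data input, clock, clock enable, reset, and initial value. The Register type and its fields are assumptions made for illustration and are not a Map data structure.

```python
from dataclasses import dataclass
from typing import Optional

# --- Retiming (simplified: one path, one movable register) ---

def stage_delays(delays_ps, reg_pos):
    """Delays of the two stages when reg_pos elements precede the register."""
    return sum(delays_ps[:reg_pos]), sum(delays_ps[reg_pos:])

def retime_single_register(delays_ps, reg_pos):
    """Move the register forward or backward to minimize the worst stage delay."""
    best_pos, best_worst = reg_pos, max(stage_delays(delays_ps, reg_pos))
    for candidate in range(1, len(delays_ps)):
        worst = max(stage_delays(delays_ps, candidate))
        if worst < best_worst:
            best_pos, best_worst = candidate, worst
    return best_pos, best_worst

# --- Equivalent register removal ---

@dataclass(frozen=True)
class Register:
    name: str
    d: str              # net driving the D input
    clk: str
    ce: Optional[str]   # clock enable net, None if unused
    rst: Optional[str]  # reset net, None if unused
    init: int = 0       # power-up value

def remove_equivalent_registers(regs):
    """Keep one register per functional signature and map removed names to survivors."""
    survivors, renamed = {}, {}
    for reg in regs:
        key = (reg.d, reg.clk, reg.ce, reg.rst, reg.init)
        if key in survivors:
            renamed[reg.name] = survivors[key].name  # its loads move to the survivor
        else:
            survivors[key] = reg
    return list(survivors.values()), renamed

if __name__ == "__main__":
    delays = [1200, 800, 1500, 900, 1100]      # hypothetical LUT + routing delays (ps)
    print(stage_delays(delays, 1))             # (1200, 4300): unbalanced
    print(retime_single_register(delays, 1))   # (2, 3500)

    regs = [
        Register("q_a", d="n1", clk="clk", ce=None, rst="rst"),
        Register("q_b", d="n1", clk="clk", ce=None, rst="rst"),  # duplicate of q_a
        Register("q_c", d="n2", clk="clk", ce=None, rst="rst"),
    ]
    kept, renamed = remove_equivalent_registers(regs)
    print([r.name for r in kept], renamed)     # ['q_a', 'q_c'] {'q_b': 'q_a'}
```

In the example, moving the register one element forward shrinks the worst stage from 4300 ps to 3500 ps.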
Enable Timing-Driven Packing and Placement
Timing-driven packing and placement is
at the heart of the physical synthesis capabilities
available within the implementation
flow. When you enable this option
(map -timing), the placement phase of
place and route is done within Map,
allowing packing decisions to be revisited
when initial results are less than optimal.
This iterative flow does away with unrelated
logic packing.
Different levels of optimization exist
in Xilinx physical synthesis and optimization.
The first level was introduced
in ISE 6.1i software and began with logic
transformations, including fanout control,
logic replication, congestion control,
and improved delay estimation.
These routines led to much more efficient
packing and placement of designs,
resulting in faster clock frequencies and
denser logic utilization.
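Logic replication, one of the first-level transformations listed above, can be modeled with the small Python sketch below. The sketch makes several assumptions. It uses a hypothetical grid placement. It estimates routing delay linearly, at delay_per_tile_ps per tile of Manhattan distance. It applies a per-net routing budget in picoseconds. Any loads the original driver cannot reach within budget are handed to a single replica placed at their centroid. The article does not describe Map's actual cost functions, so this models the idea only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Load:
    name: str
    x: int  # hypothetical placement column
    y: int  # hypothetical placement row

def est_delay_ps(ax, ay, bx, by, delay_per_tile_ps):
    """Linear routing-delay estimate from Manhattan distance (an assumption)."""
    return (abs(ax - bx) + abs(ay - by)) * delay_per_tile_ps

def replicate_driver(src, loads, delay_per_tile_ps, budget_ps):
    """Split loads between the original driver and one replica near the far loads.

    Returns (loads kept on the original, loads moved to the replica,
    replica location or None if no replica is needed).
    """
    sx, sy = src
    near = [ld for ld in loads
            if est_delay_ps(sx, sy, ld.x, ld.y, delay_per_tile_ps) <= budget_ps]
    far = [ld for ld in loads if ld not in near]
    if not far:
        return near, [], None
    # Place the replica at the integer centroid of the loads it will drive.
    rx = sum(ld.x for ld in far) // len(far)
    ry = sum(ld.y for ld in far) // len(far)
    return near, far, (rx, ry)

if __name__ == "__main__":
    loads = [Load("a", 3, 2), Load("b", 2, 4), Load("c", 20, 18), Load("d", 22, 20)]
    near, far, replica = replicate_driver((2, 2), loads,
                                          delay_per_tile_ps=100, budget_ps=800)
    print([ld.name for ld in near], [ld.name for ld in far], replica)
    # ['a', 'b'] ['c', 'd'] (21, 19)
```

A fuller model would also check that the replica's own input connections still meet timing. Replication shortens the output routes but lengthens the routes feeding the copy.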
The next level added logic and register
optimization; Map can now rearrange elements
to improve critical path delays.
These transformations give much greater
flexibility to meet the timing requirements
of the design. A number of different
techniques (including pin swapping,
basic element switching, and logic recombination)
are used to massage the physical
elements into a different yet logically
identical structure that will meet the
design requirements.
ISE 8.1i software introduces one more
level of physical synthesis: combinatorial
logic optimization. The -logic_opt switch
enables a flow that examines all of the
combinatorial logic in the design. Given
placement and timing information, Map
can make more informed decisions about
optimizing LUT structures to improve the
overall design.
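The Map switches named in this article can be combined in a single run. The Python sketch below only assembles an argument list and does not launch Map. The file names design.ngd and design_map.ncd are hypothetical. The switch names come from the text above. The explicit "on" values after -global_opt, -retiming, -equivalent_register_removal, and -logic_opt are an assumption about the ISE 8.1i syntax. Confirm them against the Development System Reference Guide (or Map's built-in help) for your release.

```python
import shlex

def build_map_command(ngd_file, ncd_file, *, timing=False, logic_opt=False,
                      global_opt=False, retiming=False,
                      equivalent_register_removal=False):
    """Assemble a Map command line for the physical synthesis options in the text.

    ASSUMPTION: the "on" values are guessed syntax; -timing is a bare switch.
    """
    cmd = ["map"]
    if global_opt:
        cmd += ["-global_opt", "on"]
    if retiming:
        cmd += ["-retiming", "on"]
    if equivalent_register_removal:
        cmd += ["-equivalent_register_removal", "on"]
    if timing:
        cmd.append("-timing")        # timing-driven packing and placement
    if logic_opt:
        cmd += ["-logic_opt", "on"]  # combinatorial logic optimization
    cmd += ["-o", ncd_file, ngd_file]
    return cmd

if __name__ == "__main__":
    cmd = build_map_command("design.ngd", "design_map.ncd",
                            timing=True, logic_opt=True)
    print(shlex.join(cmd))
    # map -timing -logic_opt on -o design_map.ncd design.ngd
    # To actually run it: subprocess.run(cmd, check=True)
```

Keeping the command as a list rather than a single string avoids shell-quoting problems if it is later passed to subprocess.run.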
Examples of Physical Synthesis and Optimization
- Logic Duplication: If a LUT or flip-flop
drives multiple loads, and the placement of
one or more of those loads is too far away
from the source to meet timing, the LUT
or flip-flop can be replicated and placed
close to that group of loads, thus reducing
routing delays (Figure 1).
- Logic Recombination: If the critical path
passes through multiple LUTs in
multiple slices, the logic can be reassembled
into fewer slices using a more
timing-efficient combination of LUTs and
muxes, reducing the routing resources
needed for that path (Figure 2).
- Basic Element Switching: If a function is
built with LUTs and muxes within a slice,
physical synthesis and optimization can
rearrange the function to give the fastest
path (usually through the mux select pin)
to the most critical signal (Figure 3).
- Pin Swapping: Each input pin of a LUT
may have a different delay, so Map can
swap pins (and the associated
LUT equation) so that the most critical
signal is placed on the fastest pin (Figure 4).
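Pin swapping is easy to check in software because a LUT is just a truth table. The Python sketch below models a 4-input LUT as a 16-bit INIT value. Bit n holds the output for input address n, with I0 as the least significant address bit. The sketch swaps two input pins and permutes the truth table so the function stays the same. It then uses this to move the critical signal onto the pin with the smallest delay. The per-pin delay figures are hypothetical placeholders, not Virtex-4 data.

```python
def swap_bits(value, i, j):
    """Exchange bits i and j of an integer."""
    if ((value >> i) & 1) != ((value >> j) & 1):
        value ^= (1 << i) | (1 << j)
    return value

def swap_lut_pins(init, i, j, k=4):
    """Return the INIT for a k-input LUT after exchanging the signals on pins i and j."""
    new_init = 0
    for addr in range(1 << k):
        # The new address sees pins i and j exchanged relative to the old one.
        if (init >> swap_bits(addr, i, j)) & 1:
            new_init |= 1 << addr
    return new_init

def eval_lut(init, pins):
    """Evaluate a LUT; pins[n] is the logic value on input In."""
    addr = sum(bit << n for n, bit in enumerate(pins))
    return (init >> addr) & 1

def critical_to_fastest_pin(init, critical_pin, pin_delays_ps):
    """Move the critical signal to the fastest pin, adjusting the LUT equation."""
    fastest = min(range(len(pin_delays_ps)), key=lambda p: pin_delays_ps[p])
    if fastest == critical_pin:
        return init, fastest
    return swap_lut_pins(init, critical_pin, fastest, len(pin_delays_ps)), fastest

if __name__ == "__main__":
    print(hex(swap_lut_pins(0xAAAA, 0, 3)))  # 0xff00: "O = I0" becomes "O = I3"

    delays = [420, 380, 310, 450]  # hypothetical per-pin delays (ps)
    init = 0x1EA7                  # arbitrary 4-input function
    new_init, pin = critical_to_fastest_pin(init, 0, delays)
    # Exhaustive check: same output once the signals on pins 0 and `pin` are exchanged.
    for addr in range(16):
        sig = [(addr >> n) & 1 for n in range(4)]
        phys = sig[:]
        phys[0], phys[pin] = sig[pin], sig[0]
        assert eval_lut(new_init, phys) == eval_lut(init, sig)
    print("critical signal moved to pin", pin)
```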
Conclusion
The physical synthesis and optimization
capabilities within the Xilinx toolset will continue
to mature and expand with each software
release. Along with improved quality of
results, you can expect to see greater control
over the types of optimizations. Other
planned enhancements include considering
more design elements in the reoptimization
phase (such as allowing registers to move
into and out of the I/O blocks or
dedicated functions like block RAM and
DSP blocks) and bringing the routing
phase into the iterative physical synthesis
and optimization system.
The physical synthesis and optimization
tools in Xilinx ISE software have been created
to re-examine the structure of your FPGA
design during the packing and placement
phases of implementation. With knowledge
of timing constraints and physical layout,
revisiting synthesis decisions during
Map and place and route can significantly
improve your results.
Printable PDF version of this article with graphics. (12/1/05) 280 KB