Synopsys® Design Compiler® FPGA (DC FPGA) helps you meet high-performance design goals with a powerful set of optimization algorithms and features tuned specifically for the Xilinx® Virtex-4™ architecture. These algorithms use special Virtex-4 resources such as the DSP48 block and block RAM to achieve the lowest overall area utilization and the best circuit timing performance.

Design Compiler FPGA Overview
Designs that target complex devices such as Virtex-4 FPGAs require the same power and flexibility in synthesis that, in the past, only ASIC designers had access to. DC FPGA is built on Design Compiler's industry-leading ASIC synthesis technology and customized with FPGA-specific optimizations to handle even the most challenging designs. These optimizations map designs efficiently onto basic FPGA primitives such as LUTs, as well as onto complex components such as RAM, multipliers, and DSP blocks.

DC FPGA includes innovative Adaptive Optimization™ (AO) technology, which dynamically tunes the synthesis algorithms based on the design context and the timing constraints to provide faster synthesis runtime and better timing. DC FPGA also inherits the reliability of Design Compiler, proven through the development of more than 125,000 ASIC designs, and brings that ASIC-strength synthesis to FPGA designs.

In addition to AO technology, DC FPGA deploys a rich set of optimizations to achieve the best timing quality of results (QoR) for FPGA devices. These include:
- Constraint-driven synthesis and design space exploration
- Automatic finite state machine (FSM) extraction and optimization
- Automatic inference of special FPGA resources, such as RAM, ROM, multipliers, DSP blocks, shift registers, and global clock buffers
- Advanced datapath optimizations and module generation
- Logic and register duplication
- Register retiming and pipelining
- Critical path re-synthesis
- Across-boundary optimization
- Automatic gated-clock transformation

DC FPGA is part of a family of Synopsys products that work in conjunction with the Xilinx ISE™ tools to streamline the FPGA design process. In this article, we'll show how DC FPGA optimizes for high performance in Xilinx Virtex-4 FPGAs.

Constraint-Driven Synthesis
DC FPGA uses a true timing-driven synthesis engine, so the timing and design-specific constraints you specify during synthesis strongly influence the final implementation. We therefore recommend that you drive DC FPGA synthesis with the same set of constraints you use in the Xilinx ISE tools.

At a minimum, you should specify the design timing constraints: clock frequency, I/O offsets, and any timing exceptions that apply to your design (such as multicycle and false paths). You can also specify other design-specific constraints, such as those controlling the use of special FPGA resources. For best results, avoid overconstraining the design; overconstraint can cause unnecessary increases in area.

Without timing constraints, DC FPGA performs area-based optimizations that still give good timing results. With proper timing constraints, DC FPGA applies AO technology to explore the area-timing tradeoffs of various optimizations and selects the implementation that best fits your constraints.

For example, your timing goals let DC FPGA decide whether distributed RAM, block RAM, or an implementation built from LUTs and registers is sufficient for an inferred memory in your design. Without timing goals, DC FPGA optimizes for the lowest possible area utilization.
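As a hedged illustration of what an "inferred memory" can look like in RTL, here is a generic synchronous single-port RAM in Verilog-2001. The module name `ram_sp` and its parameters are hypothetical, and the article does not say which coding templates DC FPGA recognizes. The point is only that the code describes behavior and leaves the choice of distributed RAM, block RAM, or LUTs and registers to the synthesis tool.

```verilog
// Hypothetical example: a generic synchronous single-port RAM.
// Nothing here forces block RAM or distributed RAM; that mapping is
// assumed to be chosen by the synthesis tool from area/timing goals.
module ram_sp #(
    parameter DATA_WIDTH = 16,
    parameter ADDR_WIDTH = 6
) (
    input                       clk,
    input                       we,    // write enable
    input      [ADDR_WIDTH-1:0] addr,
    input      [DATA_WIDTH-1:0] din,
    output reg [DATA_WIDTH-1:0] dout
);
    // Storage: 2**ADDR_WIDTH words of DATA_WIDTH bits each
    reg [DATA_WIDTH-1:0] mem [0:(1<<ADDR_WIDTH)-1];

    always @(posedge clk) begin
        if (we)
            mem[addr] <= din;  // synchronous write
        dout <= mem[addr];     // registered read; returns the old word on a write (read-first)
    end
endmodule
```

The registered read matches the synchronous read port of Virtex-4 block RAM; a small instance of the same code could equally be built from distributed RAM followed by an output flip-flop.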
Table 1 shows two implementations of a small submodule under two different clock constraints. This module is the timing-critical part of a larger design of about 8,600 slices. The design has a single clock domain, and only one clock period constraint was specified in DC FPGA.

In the first case, the module is constrained at 10 ns. DC FPGA's area-based implementation already meets the timing requirement, so the timing optimization phase is not invoked. The critical path of the design runs through a chain of carry logic.

In the second case, a much tighter constraint (3 ns) is applied. DC FPGA performs aggressive timing optimizations and replaces the carry logic on its critical paths with parallel circuit structures built from LUTs. The result is slightly larger in area but meets the new timing requirement, which the carry-logic structure could not achieve. For the design as a whole, timing improves by 29% at a cost of only 11 additional slices.

Flexible FSM Support
DC FPGA contains sophisticated FSM extraction and optimization algorithms that produce high-performance state logic. Once an FSM is detected and extracted from the RTL code, DC FPGA's state machine optimization engine applies optimizations such as removing unreachable states and removing duplicate states, producing the logic implementation best suited to meeting timing.

You can also select a different FSM encoding style, such as one-hot, binary, Gray, or zero-one-hot, for an individual state machine, for a whole design, or globally. This flexibility lets you explore FSM encodings and customize the synthesis script to address design bottlenecks.

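In an FPGA implementation, one-hot encoding typically gives the best timing QoR for most designs, at the expense of a higher register-to-LUT ratio. One-hot encoding uses one register per state, so an N-state machine needs N flip-flops rather than ceil(log2 N), but the next-state logic for each flip-flop usually depends on only a few signals and therefore needs fewer LUT levels. The extra registers are rarely a problem because FPGA architectures are register-rich.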
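A minimal sketch of FSM-friendly RTL, assuming a tool that performs FSM extraction as described above: the states are named constants and all transitions live in one case statement, so the binary codes written here are only a starting point that the tool may replace with one-hot, Gray, or another encoding. The module, state, and signal names are hypothetical.

```verilog
// Hypothetical example: a three-state Moore FSM coded symbolically so a
// synthesis tool with FSM extraction can re-encode the states.
module handshake_fsm (
    input      clk,
    input      rst,    // synchronous, active-high reset
    input      req,
    input      done,
    output reg busy,
    output reg ack
);
    // Symbolic state codes (binary here; the tool may choose another encoding)
    localparam [1:0] S_IDLE = 2'd0,
                     S_WORK = 2'd1,
                     S_ACK  = 2'd2;

    reg [1:0] state, next_state;

    // State register
    always @(posedge clk)
        if (rst) state <= S_IDLE;
        else     state <= next_state;

    // Next-state logic: stay in the current state unless a transition fires
    always @* begin
        next_state = state;
        case (state)
            S_IDLE:  if (req)  next_state = S_WORK;
            S_WORK:  if (done) next_state = S_ACK;
            S_ACK:             next_state = S_IDLE;
            default:           next_state = S_IDLE;  // unused code 2'd3 returns to idle
        endcase
    end

    // Moore outputs depend only on the current state
    always @* begin
        busy = (state == S_WORK);
        ack  = (state == S_ACK);
    end
endmodule
```

Under one-hot encoding these three states would occupy three flip-flops instead of two; the unused binary code 2'd3 is the kind of unreachable state that FSM optimization can remove.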
High-Performance DSP Inference Capability
The special FPGA resources available, such as block RAM, dedicated DSP slices, and carry logic, combined with your design and timing constraints, guide DC FPGA's specialized optimization algorithms toward the best circuit implementation.

DC FPGA infers complex circuit topologies from the structure of your RTL code and chooses the implementation that best exploits the resources of the target FPGA, minimizing overall resource usage while providing the best possible circuit performance.

This capability lets DC FPGA infer complex logic configurations and map them into special resources such as the dedicated Virtex-4 DSP48 slice. As an illustration, Figure 1 shows a simple multiply-accumulate (MAC) structure in which the registered inputs A and B are multiplied. The registered product is then accumulated in the final adder stage, which drives the registered output Q.

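The RTL code for this simple MAC function is:

module test ( Q, A, B, clk );
  output [47:0] Q;          // 48-bit accumulator output (DSP48 P width)
  input  [16:0] A, B;       // 17-bit unsigned operands fit the signed 18-bit multiplier inputs
  input         clk;

  reg [47:0] Q;             // accumulator register (PREG); no reset, so it starts unknown in simulation
  reg [16:0] A_reg, B_reg;  // input pipeline registers (AREG, BREG)
  reg [33:0] mult;          // product pipeline register (MREG)

  always @( posedge clk )
  begin
    A_reg <= A;
    B_reg <= B;
    mult  <= A_reg * B_reg; // 17 x 17 -> 34-bit product
    Q     <= Q + mult;      // accumulate
  end
endmodule
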
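To make the pipeline explicit, write s[t] for the value of signal s during clock cycle t, so a register r <= d satisfies r[t] = d[t-1]. Tracing the three register stages of the MAC code gives the recurrence below; this is a worked reading of the RTL, not output from DC FPGA.

$$
\begin{aligned}
A_{\text{reg}}[t] &= A[t-1], \qquad B_{\text{reg}}[t] = B[t-1] \\
\text{mult}[t] &= A_{\text{reg}}[t-1]\,B_{\text{reg}}[t-1] = A[t-2]\,B[t-2] \\
Q[t] &= Q[t-1] + \text{mult}[t-1] = Q[t-1] + A[t-3]\,B[t-3]
\end{aligned}
$$

Each input pair reaches the accumulator three cycles after it is applied, one cycle per register stage, which lines up with the input, multiplier, and output pipeline registers of a DSP48 slice.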
DC FPGA implements the logic configuration shown in Figure 1 in a single DSP48 slice, recognizing and exploiting the slice's embedded 18 × 18 signed multiplier, its adder in accumulate mode, and its integrated pipeline registers to achieve the highest system clock speed.

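Figure 2 shows the final DC FPGA implementation: a single DSP48 slice with no other logic resources. The DSP48 OPMODE control input is set to "0100101" to realize the MAC function intended by the circuit topology. Reading the bits as Z = 010 (OPMODE[6:4]), Y = 01 (OPMODE[3:2]), and X = 01 (OPMODE[1:0]), the Z multiplexer selects the P register and the X and Y multiplexers together select the multiplier output M, so the adder computes P = P + A × B. The AREG, BREG, MREG, and PREG attributes are each set to "1", placing one pipeline register on the A and B inputs, on the multiplier output, and on the P output.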
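The OPMODE value selects the operands of the DSP48 adder through three multiplexers: X (bits 1:0), Y (bits 3:2), and Z (bits 6:4). The sketch below is a simplified, hypothetical behavioral model of that selection for a handful of modes only, assuming SUBTRACT = 0 and CARRYIN = 0. It is not the Xilinx DSP48 primitive or its simulation model, and the module and port names are invented for illustration.

```verilog
// Hypothetical, simplified model of DSP48 operand selection for a SUBSET of
// OPMODE values (SUBTRACT = 0, CARRYIN = 0 assumed). Not the Xilinx primitive.
module dsp48_opmode_subset (
    input             [6:0]  opmode,
    input      signed [35:0] m,       // multiplier product M
    input      signed [47:0] c,       // C port
    input      signed [47:0] p,       // current P register value
    output reg signed [47:0] p_next   // adder result that P would load
);
    reg signed [47:0] xy, z;

    always @* begin
        // X and Y multiplexers: X = 01 together with Y = 01 selects M
        case (opmode[3:0])
            4'b0000: xy = 48'sd0;        // X = 0, Y = 0
            4'b0101: xy = m;             // X = M, Y = M (sign-extended to 48 bits)
            4'b1100: xy = c;             // X = 0, Y = C
            default: xy = {48{1'bx}};    // other modes not modeled here
        endcase
        // Z multiplexer
        case (opmode[6:4])
            3'b000:  z = 48'sd0;
            3'b010:  z = p;              // feedback from P: accumulate
            3'b011:  z = c;
            default: z = {48{1'bx}};     // PCIN and shifted modes not modeled
        endcase
        p_next = z + xy;
    end
endmodule
```

With OPMODE = 0100101 the model adds M to the fed-back P, which is the accumulate behavior of the MAC; 0000101 drops the P feedback and leaves a plain multiplier.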
DC FPGA's high-performance DSP inference also supports very complex design topologies, such as the digital FIR filters used extensively in DSP-intensive applications like wireless communications.

Figure 3 shows the schematic of a four-tap systolic FIR filter. Using advanced DSP inference, DC FPGA implements this design in only four DSP48 slices with no external logic resources, and it exploits the slices' integrated pipeline registers to raise the achievable clock rate for this filter structure.

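The RTL code for the systolic FIR filter is:

module test ( Yn, Xn, h0, h1, h2, h3, clk );
  output [47:0] Yn;
  input  [15:0] Xn, h0, h1, h2, h3;  // input sample and four coefficients
  input         clk;

  reg  [15:0] X [7:1];      // sample delay line: one register for tap 0, two per later tap
  wire [15:0] h [3:0];      // coefficients gathered into an array
  reg  [32:0] mult [3:0];   // per-tap product registers
  reg  [47:0] pcout [3:0];  // per-tap partial sums, cascaded from tap to tap
  wire [47:0] Yn;
  integer i;

  assign h[3] = h3, h[2] = h2, h[1] = h1, h[0] = h0;

  always @( posedge clk )
  begin
    // Tap 0: register the sample, multiply, register the product as the first partial sum
    X[1]     <= Xn;
    mult[0]  <= h[0] * X[1];
    pcout[0] <= mult[0];
    // Taps 1-3: two-stage sample delay, multiply, add the previous tap's partial sum
    for (i=1; i <= 3; i=i+1)
    begin: my_for_loop_block0
      X[2*i]   <= X[2*i-1];
      X[2*i+1] <= X[2*i];
      mult[i]  <= h[i] * X[2*i+1];
      pcout[i] <= pcout[i-1] + mult[i];
    end //my_for_loop_block0
  end

  assign Yn = pcout[3];
endmodule
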
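Tracing the FIR code register by register shows what it computes. Write s[t] for the value of signal s during clock cycle t, so a register r <= d satisfies r[t] = d[t-1]. The sample used by tap i has passed through 2i + 1 delay-line registers, so X[2i+1][t] = Xn[t-2i-1], and the product register adds one more cycle. This is a worked reading of the RTL above, not tool output:

$$
\begin{aligned}
\text{pcout}_0[t] &= h_0\,X_n[t-3] \\
\text{pcout}_i[t] &= \text{pcout}_{i-1}[t-1] + h_i\,X_n[t-2i-3], \qquad i = 1, 2, 3 \\
Y_n[t] = \text{pcout}_3[t] &= \sum_{k=0}^{3} h_k\,X_n[t-6-k]
\end{aligned}
$$

The filter is therefore an ordinary four-tap FIR, y[n] = h_0 x[n] + h_1 x[n-1] + h_2 x[n-2] + h_3 x[n-3], delayed by six clock cycles. The sample moves two registers per tap while the partial sum moves one, so each tap sees the sample one cycle older than the previous tap, which is what lines the products up correctly.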
DC FPGA can also implement other complex logic configurations in a DSP48 slice. Table 2 shows a sample of these structures.

The designs shown in Table 2 were synthesized with DC FPGA and placed and routed with Xilinx ISE 6.3i Service Pack 2, targeting an XC4VFX20-11 Virtex-4 device. Each design was synthesized both with and without DSP inference enabled, to show the performance and area improvements that DC FPGA's advanced DSP inference provides.

Conclusion
Complex devices such as Virtex-4 require a flexible, ASIC-strength synthesis solution. The advanced optimization engine in Synopsys Design Compiler FPGA efficiently uses the special resources available in Virtex-4 devices to deliver the highest possible design performance.

DC FPGA gives you the freedom to modify synthesis scripts to address design bottlenecks, try different FSM encoding styles, or explore other design optimizations to reach your design goals. You can now bring the power and flexibility of Design Compiler to your complex FPGA designs.

DC FPGA is an integral part of the complete ASIC-strength prototyping solution from Synopsys. Other tools supported in the Xilinx flow are Formality™ for formal verification, DesignWare® Library IP, Leda® for RTL design and code checking, PrimeTime® for static timing analysis, VCS® for simulation, Module Compiler™ for datapath synthesis, and HSPICE™ for analysis of multigigabit serial I/Os.

DC FPGA has a rapidly growing base of more than 100 customers. For more information about Design Compiler FPGA, visit www.synopsys.com/products/dcfpga/dcfpga.html.

Printable PDF version of this article with graphics (1/15/05).