All low-cost FPGAs provide basic logic
capability at attractive prices and serve a
broad range of general-purpose design
requirements. When you consider embedding
DSP functions in an FPGA fabric,
however, you may believe that you must
choose high-end FPGAs to get platform
features such as embedded multipliers and
distributed memory.
With Spartan-3™ FPGAs, the landscape
for embedded DSP has changed.
Spartan-3 devices may be low cost, but
they also have the platform features
required for DSP designs. These features
enable area-efficient implementations of
signal processing functions, letting you
reach significantly lower price points.
Spartan-3 devices are ideal as
coprocessors or pre-/post-processors that
offload compute-intensive functions
from a programmable DSP to
enhance system performance.
Optimized for DSP
The Spartan-3 family from Xilinx uses 90
nm process technology in conjunction with
300 mm wafers to dramatically lower the
cost of FPGAs. At the same time, the
devices incorporate key DSP resources such
as embedded 18 x 18-bit multipliers, 18 Kb
block RAMs, distributed RAM, and
shift-register logic. This
advanced feature set means that you can
use Spartan-3 FPGAs to implement DSP
algorithms at a significantly lower cost than
competing FPGAs. The specific features
that help in efficiently implementing DSP
are shown in Figure 1.
In addition to increasing the basic performance
of systems, these embedded features
improve device utilization. For
instance, an 18 x 18-bit multiplier
would take 300-400 logic elements (LEs) if
implemented in the logic fabric; the embedded
multiplier leaves those resources free. Because
the embedded multiplier is adjacent to the logic
fabric, augmenting its functionality (for example,
building accumulators or combining several
multipliers into complex arithmetic
functions) is straightforward.
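As an illustration of pairing embedded multipliers with adjacent fabric logic, the sketch below models a complex multiply built from four 18 x 18-bit signed multipliers plus one adder and one subtractor. It is a behavioral Python model under our own assumptions (the names `mult18x18` and `complex_mult` are hypothetical), not HDL or a Xilinx core, and it ignores the pipelining and rounding a real datapath would apply to the 36-bit products.

```python
# Behavioral sketch only: four embedded 18 x 18-bit signed multipliers plus
# an adder and a subtractor in the neighboring fabric form a complex multiply.
# Function names are hypothetical.

MULT_WIDTH = 18  # operand width of one embedded multiplier (two's complement)
MIN_VAL = -(1 << (MULT_WIDTH - 1))
MAX_VAL = (1 << (MULT_WIDTH - 1)) - 1


def mult18x18(a: int, b: int) -> int:
    """Model one 18 x 18-bit signed multiplier with a full 36-bit product."""
    for x in (a, b):
        if not MIN_VAL <= x <= MAX_VAL:
            raise ValueError(f"{x} does not fit in {MULT_WIDTH} signed bits")
    return a * b


def complex_mult(a_re: int, a_im: int, b_re: int, b_im: int) -> tuple[int, int]:
    """(a_re + j*a_im) * (b_re + j*b_im) using four multipliers."""
    re = mult18x18(a_re, b_re) - mult18x18(a_im, b_im)  # fabric subtractor
    im = mult18x18(a_re, b_im) + mult18x18(a_im, b_re)  # fabric adder
    return re, im


if __name__ == "__main__":
    print(complex_mult(3, 4, 5, -2))  # (3 + 4j)(5 - 2j) = 23 + 14j -> (23, 14)
```

The same pattern (embedded multipliers feeding a few LUT-based adders) is what keeps such functions compact: the fabric only supplies the additions.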
Many DSP functions are best implemented
in pipelines with time multiplexing
for efficiency. This allows you to create
faster systems with higher bandwidth, but
it comes at the expense of requiring more
interim storage elements. For example, a
time-multiplexed filter would store the
results of individual multiply-accumulate
cells in shift registers. Such designs can run
out of registers or memory before they run
out of logic resources. The Spartan-3
FPGA family is unique in providing a
mode where a single look-up table (LUT)
is capable of implementing logic functions
or acting as a 16-bit shift register.
As shown in Figure 2, this architecture
enhancement allows you to use a single
LUT in place of 16 registers – maximizing
area efficiency when implementing time-multiplexed
DSP functions.
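The shift-register mode can be pictured with a small behavioral model. In this sketch (the class `SRL16` and function `delay_bus` are hypothetical names), each LUT holds 16 one-bit stages, a 4-bit address selects which stage drives the output so the delay is address + 1 clocks, and a 16-bit data path is delayed by using one LUT per bit. Only behavior is modeled; timing, clock enables, and cascading are left out.

```python
class SRL16:
    """One LUT in shift-register mode: 16 one-bit stages, addressable output."""

    DEPTH = 16

    def __init__(self) -> None:
        self.stages = [0] * self.DEPTH  # stages[0] holds the most recent bit

    def q(self, addr: int) -> int:
        """Output of stage `addr` (0-15): the bit shifted in addr + 1 clocks ago."""
        if not 0 <= addr < self.DEPTH:
            raise ValueError("SRL16 address must be 0..15")
        return self.stages[addr]

    def clock(self, d: int) -> None:
        """Shift a new bit in; the oldest bit falls off the end."""
        self.stages = [d & 1] + self.stages[:-1]


def delay_bus(samples: list[int], delay: int, width: int = 16) -> list[int]:
    """Delay unsigned `width`-bit words by `delay` clocks, one SRL16 per bit."""
    if not 1 <= delay <= SRL16.DEPTH:
        raise ValueError("a single SRL16 provides a delay of 1 to 16 clocks")
    srls = [SRL16() for _ in range(width)]
    out = []
    for word in samples:
        # Read the addressed tap before the clock edge, then shift each bit in.
        out.append(sum(srl.q(delay - 1) << bit for bit, srl in enumerate(srls)))
        for bit, srl in enumerate(srls):
            srl.clock((word >> bit) & 1)
    return out


print(delay_bus([0x1234, 0xFFFF, 0x8001, 0x0042], delay=2))  # [0, 0, 4660, 65535]
```

Changing `delay` only changes the address fed to the LUTs, so the same 16 LUTs serve any delay from 1 to 16 clocks.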
Many DSP functions are also extremely
memory-intensive, requiring scratch-pad
memory for storing coefficients, implementing
FIFOs, and building large buffers. As
shown in Figure 3, Spartan-3 devices provide
more memory bits than other low-cost
FPGAs available today.
For many DSP designs, the critical
resource is the embedded memory within
the FPGA – not logic or multipliers.
Because of insufficient memory, designers
using competing low-cost devices may have
to migrate to a larger device or use external
memory for systems that would fit into a
single, small Spartan-3 FPGA.
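Because memory is so often the binding constraint, a rough block RAM count for each buffer is a useful early check. The helper below (`block_rams_needed` is a hypothetical name) assumes the 18 Kb block RAM aspect ratios 16K x 1, 8K x 2, 4K x 4, 2K x 9, 1K x 18, and 512 x 36, tiles the buffer with one ratio, and returns the cheapest count. It ignores distributed RAM, dual-port sharing, and the multiplexing logic needed when RAMs are stacked in depth.

```python
import math

# Aspect ratios (depth, width) assumed for one 18 Kb block RAM; the x9, x18,
# and x36 widths count the parity bits as ordinary data bits.
BRAM_SHAPES = [(16384, 1), (8192, 2), (4096, 4), (2048, 9), (1024, 18), (512, 36)]


def block_rams_needed(depth: int, width: int) -> int:
    """Rough count of 18 Kb block RAMs for a depth x width buffer.

    Tiles the buffer with a single aspect ratio and keeps the cheapest
    tiling; cascade multiplexers and mixed-ratio tilings are not modeled.
    """
    return min(math.ceil(depth / d) * math.ceil(width / w) for d, w in BRAM_SHAPES)


print(block_rams_needed(64, 16))    # 64 coefficients of 16 bits
print(block_rams_needed(4096, 16))  # a 4K-sample, 16-bit buffer
```

Here 64 coefficients of 16 bits fit in one block RAM, while a 4K-sample, 16-bit buffer needs four, which is why buffer-heavy designs can exhaust memory long before logic.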
Common DSP Functions
Let’s see how these features affect device
utilization by looking at two implementation
examples of a finite impulse response
(FIR) filter. One is a multiply-accumulate (MAC)
implementation, while the other is a multichannel
distributed arithmetic (DA)
implementation.
FIR filters are commonly used in base
stations, digital video, wireless LANs,
xDSL, and cable modems. Our benchmark
is a 64-tap MAC
FIR filter with 16-bit data and coefficients
running at 130 MHz in a Spartan-3
XC3S400 FPGA. The first implementation
uses a single MAC; the second implementation
uses four MACs. Figure 4 shows the
device utilization section of the report file
for both implementations.
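The one-MAC and four-MAC versions compute the same filter; they differ only in how many taps are processed per clock. The behavioral sketch below (function names are hypothetical, and the round-robin assignment of taps to MACs is our own choice) checks that a K-MAC schedule matches a direct-form FIR and reports the clocks needed per output sample. It is not the structure of any particular Xilinx core.

```python
def fir_reference(x: list[int], h: list[int]) -> list[int]:
    """Direct-form FIR: y[n] = sum_k h[k] * x[n - k], zero initial state."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def fir_multi_mac(x: list[int], h: list[int], num_macs: int) -> tuple[list[int], int]:
    """Model a time-multiplexed FIR with `num_macs` MAC units.

    Taps are dealt round-robin to the MACs; each MAC does one multiply-
    accumulate per clock, so one output takes ceil(len(h) / num_macs) clocks.
    Returns (outputs, clocks_per_output).
    """
    taps = len(h)
    cycles = -(-taps // num_macs)  # ceiling division
    delay_line = [0] * taps  # delay_line[k] holds x[n - k]
    y = []
    for sample in x:
        delay_line = [sample] + delay_line[:-1]
        acc = [0] * num_macs  # one accumulator per MAC
        for cycle in range(cycles):
            for m in range(num_macs):
                k = cycle * num_macs + m  # tap handled by MAC m on this clock
                if k < taps:
                    acc[m] += h[k] * delay_line[k]
        y.append(sum(acc))  # final adder combines the MAC results
    return y, cycles


if __name__ == "__main__":
    h = [((k * 7) % 23) - 11 for k in range(64)]     # arbitrary 64-tap test set
    x = [((n * 37) % 101) - 50 for n in range(200)]  # arbitrary test input
    for macs in (1, 4):
        y, clocks = fir_multi_mac(x, h, macs)
        assert y == fir_reference(x, h)
        print(f"{macs} MAC(s): {clocks} clocks per output")
```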
Going from a one-MAC to a four-MAC
implementation dramatically increases the
performance of the FIR filter, because four
taps are processed in each clock cycle instead
of one. Yet the number of LUTs only doubles
and remains at just 4% of the total available
logic. The four-MAC implementation uses four
block RAMs and four multipliers to implement
the FIR filter efficiently with minimal
device logic resources.
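Under the simple assumption that each MAC completes one tap per clock (no symmetric-coefficient folding, single-rate filter), the sustainable input rate is the clock rate divided by the clocks per output. For the 64-tap benchmark at 130 MHz that gives about 2.03 million samples per second with one MAC and 8.125 million with four. The helper name is hypothetical.

```python
import math


def fir_sample_rate(clock_hz: float, taps: int, num_macs: int) -> float:
    """Highest input sample rate if every MAC finishes one tap per clock."""
    clocks_per_output = math.ceil(taps / num_macs)
    return clock_hz / clocks_per_output


for macs in (1, 4):
    rate = fir_sample_rate(130e6, 64, macs)
    print(f"{macs} MAC(s): {rate / 1e6:.3f} Msamples/s")
```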
Another interesting implementation is
a multichannel FIR function. In
this case, we can look at how device utilization
changes when we go from a one-channel
FIR to an eight-channel FIR filter.
As shown in Figure 5, a single-channel
distributed arithmetic FIR filter uses 29%
of the logic resources and 39% of the registers
of an XC3S1000 Spartan-3 device.
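Distributed arithmetic replaces multipliers with table lookups: for fixed coefficients, every possible sum of coefficients is precomputed, and the input samples are consumed one bit per clock. The sketch below is a generic bit-serial DA model of our own (names are hypothetical; it is not the Xilinx DA FIR core). It computes one filter output from the current delay-line contents, splits the taps into groups of four so each partial table has 16 entries (the size of one 4-input LUT), and subtracts the sign-bit term as two's-complement weighting requires.

```python
def da_table(coeffs: list[int]) -> list[int]:
    """Entry `addr` is the sum of coeffs[k] for every bit k set in `addr`."""
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << len(coeffs))]


def da_inner_product(samples: list[int], coeffs: list[int],
                     width: int, group: int = 4) -> int:
    """Bit-serial sum(coeffs[k] * samples[k]) for `width`-bit two's-complement samples."""
    if len(samples) != len(coeffs):
        raise ValueError("one sample per coefficient is required")
    tables = [da_table(coeffs[i:i + group]) for i in range(0, len(coeffs), group)]
    acc = 0
    for b in range(width):  # one input bit per clock, LSB first
        partial = 0
        for g, table in enumerate(tables):
            addr = 0
            for k, s in enumerate(samples[g * group:(g + 1) * group]):
                addr |= ((s >> b) & 1) << k  # bit b of each sample forms the address
            partial += table[addr]
        # The MSB of a two's-complement number carries negative weight.
        acc += -(partial << b) if b == width - 1 else (partial << b)
    return acc


coeffs = [3, -5, 7, 2, 11, -1]
samples = [100, -200, 32767, -32768, 5, -1]
print(da_inner_product(samples, coeffs, width=16))
print(sum(c * s for c, s in zip(coeffs, samples)))  # direct check, same value
```

Note that the DA datapath needs no multipliers at all; its cost is LUTs for the tables and registers for the bit-serial shifting, which matches the logic- and register-heavy utilization reported above.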
When implementing an eight-channel version
of the same filter, we would normally
time-multiplex the different channels to
conserve logic. But this would use many
registers, or a significant amount of on-chip
memory, to store the intermediate results.
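The storage cost of time multiplexing shows up in the delay line. In the generic sketch below (names are hypothetical), C channels arrive interleaved on one stream and share a single FIR datapath. The delay line must then hold C x (taps - 1) + 1 samples instead of taps samples, and each tap reads every C-th position so a channel only sees its own history. For clarity it uses a direct-form MAC datapath rather than the DA structure of the article's example; the growth in storage is the point.

```python
def multichannel_fir(stream: list[int], h: list[int],
                     channels: int) -> tuple[list[int], int]:
    """One FIR datapath shared by `channels` interleaved channels.

    `stream` is ch0[0], ch1[0], ..., ch{C-1}[0], ch0[1], ... Reading the
    delay line every `channels` positions gives each channel its own history.
    Returns the interleaved outputs and the delay-line length in samples.
    """
    taps = len(h)
    line = [0] * (channels * (taps - 1) + 1)
    out = []
    for sample in stream:
        line = [sample] + line[:-1]  # shift the newest sample in
        out.append(sum(h[k] * line[k * channels] for k in range(taps)))
    return out, len(line)


h = [1, 2, 3]
ch = [[1, 0, 0, 4], [10, 20, 30, 40]]  # two channels, four samples each
stream = [s for pair in zip(*ch) for s in pair]  # interleave: 1, 10, 0, 20, ...
y, stored = multichannel_fir(stream, h, channels=2)
print(y[0::2], y[1::2], stored)  # [1, 2, 3, 4] [10, 40, 100, 160] 5
```

Every extra stored sample costs one register per data bit, which is exactly the storage that LUTs in shift-register mode can absorb.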
With Spartan-3 FPGAs, the intermediate
results are stored in LUTs configured as
16-bit shift registers (SRL-16). This allows
the eight-channel version of the same filter
to be implemented using only 10% more
of the available logic and only 7% more of
the available registers – 8x more channels
for only 25% more device resources (see
Figure 6).
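The 25% figure is consistent with reading the 10% and 7% increases as percentage points of the XC3S1000 and comparing the combined increase with the combined single-channel utilization (29% logic, 39% registers):

```latex
\frac{\Delta_{\text{logic}} + \Delta_{\text{reg}}}{U_{\text{logic}} + U_{\text{reg}}}
  = \frac{10 + 7}{29 + 39} = \frac{17}{68} = 0.25
```

On that reading, the eight-channel filter occupies about 39% of the logic and 46% of the registers.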
This dramatic saving is directly related
to the use of the SRL-16s available in the
Spartan-3 device. In the report file, you
can see that an additional 1,343 LUTs are
used in SRL-16 mode for the eight-channel
implementation.
Implementing this design in an FPGA
without SRL-16 capability would require
an additional 10,744 (1,343 x 8) flip-flops
used as storage elements, forcing a move to
a much larger device just to supply the
registers and likely leaving much of its
combinatorial logic unused.
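The 1,343 x 8 estimate counts eight flip-flops for each LUT used as an SRL-16. More generally, a delay line that is `width` bits wide and `depth` stages long needs `width x depth` flip-flops, but only `width x ceil(depth / 16)` LUTs in shift-register mode, assuming longer delays are built by cascading SRL-16s. The hypothetical helper below captures that bookkeeping; it ignores any extra LUTs or routing that cascade or tap-selection logic might need.

```python
import math

SRL_DEPTH = 16  # bits of storage in one LUT configured as a shift register


def delay_line_cost(width: int, depth: int) -> dict[str, int]:
    """Storage for a `width`-bit, `depth`-stage delay line, two ways."""
    return {
        "flip_flops": width * depth,
        "srl16_luts": width * math.ceil(depth / SRL_DEPTH),
    }


print(delay_line_cost(width=16, depth=8))   # 128 flip-flops vs 16 LUTs
print(delay_line_cost(width=16, depth=64))  # 1024 flip-flops vs 64 LUTs
print(1343 * 8)  # the flip-flop estimate quoted above: 10744
```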
Conclusion
The Spartan-3 architecture is optimized
to give you very high area efficiency when
implementing signal processing functions.
By combining these DSP-friendly
system features with low unit costs,
Spartan-3 FPGAs enable the industry’s
lowest price points for high-performance
DSP functions. This allows a Spartan-3
device to act as a low-cost but highly efficient,
high-performance coprocessor
to a programmable DSP.
Printable PDF version of this article with graphics. (1/15/05) 305 KB