FPGAs have been used in DSP applications
for years as logic aggregators, bus bridges,
and peripherals. More recently, FPGAs
have gained considerable traction in high-performance
DSP applications and have
also emerged as ideal co-processors for
standard DSP devices.
In these latter roles, FPGAs provide
tremendous computational throughput by
using highly parallel architectures. Because
the hardware is re-configurable, you can
develop customized architectures for ideal
implementation of your algorithms.
The new generation of Spartan-3™
low-cost FPGAs, developed using 90 nm
process technology, not only offers an
effective way to implement high-performance
DSP functions but also makes doing so
more economical. Their low cost
means that you can use them to implement
high-performance DSP co-processing
functions in conjunction with a conventional
DSP device – typically integrating
pre- and post-processing functions in a
cost-effective manner.
Key Advantages
FPGA architectures are well suited for
highly parallel implementations of DSP
functions, allowing for very high performance.
And user programmability allows you
to trade off device area versus performance
by selecting the appropriate level of parallelism
to implement your functions.
FPGAs are essentially arrays of uncommitted
logic and signal processing
resources. These signal processing
resources allow you to implement DSP
functions using highly scalable, parallel
processing techniques.
For example, whereas a traditional
DSP solution would implement multiple
multiply-accumulate (MAC) functions in
a serial manner, an FPGA allows you to
implement these in parallel using dedicated
multipliers and registers that are now
available in the Spartan-3 family.
As another example, consider a 256-tap
finite impulse response (FIR) filter. By
using resources available in the FPGA fabric,
you can design a highly parallel implementation
and achieve higher performance (Figure 1).
Because FPGAs are completely hardware-configurable,
you have the flexibility
to use only the resources that the
algorithm demands.
Figure 2 shows the different ways of
implementing four MAC functions. By
using four embedded multipliers within
the FPGA fabric, you can complete these
implementations at maximum speed.
Alternatively, you can conserve area
and implement the same function at
lower performance by using only one multiplier,
one accumulator, and a register, or
take a semi-parallel approach that falls between the two.
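The area-versus-performance trade-off among the serial, semi-parallel, and fully parallel options can be sketched in a few lines of Python. This is a simplified model, not a description of any Xilinx tool: the function name `mac_cycles` is hypothetical, and the model assumes each multiplier completes one MAC per clock cycle and ignores pipeline latency.

```python
from math import ceil


def mac_cycles(num_macs: int, num_multipliers: int) -> int:
    """Clock cycles needed to finish num_macs multiply-accumulates.

    Assumption (not from the article): every multiplier completes one
    MAC per clock cycle, and pipeline fill latency is ignored.
    """
    if num_multipliers < 1:
        raise ValueError("need at least one multiplier")
    return ceil(num_macs / num_multipliers)


# Four MAC functions, as in the Figure 2 discussion.
for label, multipliers in [("serial", 1), ("semi-parallel", 2), ("parallel", 4)]:
    cycles = mac_cycles(4, multipliers)
    print(f"{label:14s} multipliers={multipliers}  cycles={cycles}")
```

Under these assumptions, halving the number of multipliers doubles the cycle count, which is the trade-off the three implementations represent.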
Although FPGAs bring significant
benefits to DSP, it is important to analyze
the effective cost of implementing DSP
functions within the FPGA fabric. For
the purpose of this analysis, the new
Spartan-3 FPGA family is considered
because of its low cost and system features
for DSP.
Spartan-3 Devices: Optimized for DSP
Spartan-3 FPGAs use 90 nm manufacturing
technology to achieve low silicon die
costs. These devices are also the only low-cost
FPGAs that have all of the features
required for efficiently implementing
DSP functions – features that were once
the exclusive domain of high-end FPGAs
(Table 1).
Table 1 – These Spartan-3 features enable DSP functions in an area-efficient manner.
| Spartan-3 Silicon Features | Customer Benefits |
| Embedded 18 x 18 Multipliers | Area-efficient implementation of multiply function |
| Distributed RAM | Local storage for DSP coefficients, small FIFOs |
| Shift Register Logic | 16-bit shift register ideal for capturing high-speed or burst mode data and to store data in DSP applications |
| Up to 104 18 Kb Block RAM | Video line buffers, cache tag memory, scratch-pad memory, packet buffers, large FIFOs |
With the Spartan-3 family, you can
implement high-performance, complex
DSP functions in a small portion of the
total device, leaving the rest of the device
free to implement system logic or interfacing functions – providing both lower costs
and higher system integration.
Table 2 demonstrates how the combination
of advanced features and low cost
delivers inexpensive DSP capability.
The table shows a sampling
of available Spartan-3 parts, their throughput
in millions of multiply-accumulate operations per second
(MMAC/s), and the cost per
MMAC/s for each device.
Table 2 – Calculating the cost per MMAC/s
| Device | Embedded Mults (18 x 18) | MMAC/s (Number of Mults x 150 MHz) | Cost per MMAC/s |
| XC3S50 | 4 | 600 | $0.0055 |
| XC3S200 | 12 | 1,800 | $0.0024 |
| XC3S400 | 16 | 2,400 | $0.0030 |
| XC3S1000 | 24 | 3,600 | $0.0037 |
| XC3S1500 | 32 | 4,800 | $0.0044 |
We calculated the MMAC/s
column by multiplying the number
of multipliers by their operating
frequency, which for
Spartan-3 FPGAs is 150 MHz in
the slowest speed grade.
Then, using the published
50,000-unit price for the slowest
speed grade of each
device, we divided that price by
the MMAC/s figure to get the cost per
MMAC/s. Cost per MMAC/s is a commonly quoted
industry benchmark, and here it
falls as low as about a
quarter of a cent.
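The Table 2 arithmetic can be reproduced with a short Python sketch. The multiplier counts and cost-per-MMAC/s values are copied from the table. The article does not list the unit prices themselves, so the "implied price" below is only a back-calculation from rounded figures and should be read as approximate. The names `mmacs` and `implied_unit_price` are hypothetical.

```python
CLOCK_MHZ = 150  # slowest speed grade, as stated in the text

# device: (embedded 18 x 18 multipliers, cost per MMAC/s in USD from Table 2)
TABLE_2 = {
    "XC3S50": (4, 0.0055),
    "XC3S200": (12, 0.0024),
    "XC3S400": (16, 0.0030),
    "XC3S1000": (24, 0.0037),
    "XC3S1500": (32, 0.0044),
}


def mmacs(multipliers: int, clock_mhz: int = CLOCK_MHZ) -> int:
    """Peak throughput in MMAC/s: one MAC per multiplier per clock."""
    return multipliers * clock_mhz


def implied_unit_price(multipliers: int, cost_per_mmac: float) -> float:
    """Back-calculated price (assumption: the table costs are rounded)."""
    return mmacs(multipliers) * cost_per_mmac


for device, (mults, cost) in TABLE_2.items():
    price = implied_unit_price(mults, cost)
    print(f"{device:9s} {mmacs(mults):5d} MMAC/s  implied price ~ ${price:.2f}")

# Device with the lowest cost per MMAC/s in the table.
cheapest = min(TABLE_2, key=lambda d: TABLE_2[d][1])
print("lowest cost per MMAC/s:", cheapest)
```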
How to Achieve the Lowest DSP Function Cost
No standard currently exists for estimating the
actual cost of implementing DSP functions
in FPGAs. For the purposes of this
analysis, however, we assume that the
effective cost is the percentage
of silicon area utilized, multiplied by
the unit device cost. This is a fair calculation,
since the remainder of the FPGA is
available for other system functions.
To calculate the effective cost of a DSP
function when implemented in an FPGA,
we considered the Spartan-3 XC3S1000
device, which is a mid-range member of the
Spartan-3 family. In many cases, a given
DSP function uses not only the FPGA logic
but also embedded multipliers and block
RAMs. In those cases, we included the estimated
amount of die space taken by these
embedded functions and added that to the
die area used by the logic.
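Written as a formula, the estimate described above might look like the following. The symbols are introduced here for illustration and do not appear in the original analysis: $C_{\mathrm{eff}}$ is the effective cost, $C_{\mathrm{device}}$ is the unit device cost, and the $A$ terms are the die areas of the logic, embedded multipliers, and block RAMs used, and of the whole die.

```latex
C_{\mathrm{eff}} = \frac{A_{\mathrm{logic}} + A_{\mathrm{mult}} + A_{\mathrm{BRAM}}}{A_{\mathrm{die}}} \times C_{\mathrm{device}}
```

A function occupying 3.0% of the die, for example, is charged 3.0% of the unit device cost.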
Table 3 shows some of these functions
and the cost of implementing these within
the Spartan-3 silicon. (We have not
included the cost of the programming
PROM, because in many cases you can use
an existing on-board EPROM to program
the FPGA.)
Table 3 – Effective costs of various DSP functions in a Spartan-3 device
| Functions | % of the XC3S1000 Device Utilized | Effective Cost (50K Units) | Key Specification | Other Specifications |
| 1024-point complex FFT | 24.1% | $3.23 | 20 µs transform | 20 µs transform, burst I/O, 16-bit input and phase factor |
| Single-channel 64-tap FIR filter | 3.0% | $0.41 | 8.1 MSPS | 16-bit data and coefficient, MAC implementation, 8.1 MSPS |
| Digital down converter per channel | 18.6% | $2.49 | Sample rate 100 MSPS | |
| Digital up converter per channel | 18.6% | $2.49 | Sample rate 100 MSPS | |
| Viterbi decoder | 37.8% | $5.06 | 1.9 MSPS per channel | Parallel mode, trace-back 42, constraint length = 7, 32-channel, 1.9 MSPS per channel |
| Reed Solomon G.709 encoder | 1.3% | $0.17 | 120 MHz | |
| Reed Solomon G.709 decoder | 6.9% | $0.92 | 60 MHz | |
Some of the most common functions used
in DSP applications are fast Fourier transforms
(FFTs) and FIR filters. A single-channel
64-tap MAC FIR filter running at 8.1 megasamples
per second (MSPS) can be implemented
for an effective cost of $0.41. Note
that this filter uses 200 logic slices
and four embedded multipliers – approximately 3% of the die area.
You can also implement simple
forward error correction DSP
cores such as Viterbi and Reed
Solomon functions at a low cost
within the Spartan-3 device. A 32-channel, parallel mode Viterbi
decoder running at 1.9 MSPS per
channel has an effective cost of
$5.06, or $0.16 per channel. A
Reed Solomon G.709 decoder
function running at 60 MHz takes only 6.9% of the same device (with
an effective cost of $0.92).
Complex functions such as a digital
down converter (DDC) or a digital up converter
(DUC) – commonly used in wireless
base stations – take less than 20% of the
Spartan-3 XC3S1000 device (with an
effective cost of $2.49).
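As a cross-check, the Table 3 figures can be reproduced from the utilization percentages with a short Python sketch. The article does not state the XC3S1000 unit price, so `ASSUMED_UNIT_PRICE` below is a hypothetical value back-calculated from the table (roughly $13.40 fits every row to within a cent). The function names are also illustrative.

```python
# Hypothetical: back-calculated from Table 3, not a published price.
ASSUMED_UNIT_PRICE = 13.40

# (function, fraction of XC3S1000 used, published effective cost in USD)
TABLE_3 = [
    ("1024-point complex FFT", 0.241, 3.23),
    ("Single-channel 64-tap FIR filter", 0.030, 0.41),
    ("Digital down converter per channel", 0.186, 2.49),
    ("Digital up converter per channel", 0.186, 2.49),
    ("Viterbi decoder", 0.378, 5.06),
    ("Reed Solomon G.709 encoder", 0.013, 0.17),
    ("Reed Solomon G.709 decoder", 0.069, 0.92),
]


def effective_cost(fraction_used: float, unit_price: float) -> float:
    """Effective cost = fraction of die area used x unit device cost."""
    return fraction_used * unit_price


def per_channel_cost(total_cost: float, channels: int) -> float:
    """Spread the cost of a multi-channel core evenly over its channels."""
    return total_cost / channels


for name, fraction, published in TABLE_3:
    estimate = effective_cost(fraction, ASSUMED_UNIT_PRICE)
    print(f"{name:36s} estimate ${estimate:.2f}  published ${published:.2f}")

# The 32-channel Viterbi decoder, expressed per channel.
viterbi_per_channel = round(per_channel_cost(5.06, 32), 2)
print("Viterbi cost per channel:", viterbi_per_channel)
```

Small differences between the estimate and the published figure are expected, because both the percentages and the costs in the table are rounded.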
Development Tool Flow
With Xilinx, you can use industry-standard
development tools for your DSP
designs. Using MATLAB™ and
Simulink™ from The MathWorks, coupled
with Xilinx System Generator for
DSP, you can now model, simulate, and
verify your signal processing algorithms
on your target hardware platform without
leaving the Simulink environment.
The design flow typically involves the
following steps:
- A DSP designer develops and verifies
the hardware model using
industry-standard tools from The
MathWorks in conjunction with
Xilinx System Generator for DSP.
- With a push of a button, Xilinx
System Generator generates an
HDL circuit representation that is
bit- and cycle-true, meaning that
the behavior is guaranteed to match
the functionality seen in the
Simulink/System Generator model.
- The ISE design tools synthesize the
design and produce a bitstream that
can be used to program the FPGA.
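To give a feel for what "bit-true" means in the steps above, here is a minimal Python sketch of a fixed-point MAC FIR model. It is not System Generator output and does not use any Xilinx or MathWorks API. The function names, the Q15 number format, and the test values are all assumptions chosen for illustration. The point is only that a model built on integer arithmetic produces exactly the bits a hardware MAC with the same word lengths would produce.

```python
def to_q15(x: float) -> int:
    """Quantize a value in [-1, 1) to a signed 16-bit Q15 integer, saturating."""
    return max(-32768, min(32767, round(x * 32768)))


def fir_mac_fixed(samples, coeffs):
    """Direct-form FIR filter using integer multiply-accumulate.

    samples and coeffs are lists of Q15 integers. Each output is the
    full-precision accumulator (Q30), with no rounding or truncation,
    so the result is exact and repeatable bit for bit.
    """
    history = [0] * len(coeffs)  # delay line, newest sample first
    outputs = []
    for sample in samples:
        history = [sample] + history[:-1]
        acc = 0
        for tap, coeff in zip(history, coeffs):
            acc += tap * coeff  # one 16 x 16 multiply-accumulate
        outputs.append(acc)
    return outputs


# Illustrative values: a 3-tap filter driven by a scaled impulse.
coeffs = [to_q15(0.25), to_q15(-0.5), to_q15(0.125)]
impulse = [to_q15(0.5), 0, 0]
response = fir_mac_fixed(impulse, coeffs)
print(response)
```

Because the input is an impulse, each output is simply the impulse amplitude times one coefficient, which makes the model easy to check by hand.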
The error-prone and time-consuming
step of having an FPGA designer translate
the system engineer’s design into
HDL is thus eliminated. Figure 3 shows
a typical design flow using the Xilinx
System Generator. With recent advances
in this product, DSP designers can now
generate an FPGA bitstream directly
using Simulink/System Generator.
Conclusion
With their combination of low unit cost
and architecture optimized for DSP
functions, Spartan-3 FPGAs have the
industry’s lowest price points for high-performance
DSP functions. Xilinx further
enables embedded DSP functions by
providing design tools that fit within
your tool flow and enhance your productivity
by automating the FPGA implementation
process.
With the availability of Spartan-3
devices, associated design tools, and the
increasing number of off-the-shelf DSP
functions optimized for this fabric, you
should evaluate embedding DSP functions
within Spartan-3 FPGAs as a viable
option.
For more information, visit www.xilinx.com/spartan3/,
www.xilinx.com/dsp/,
and www.xilinx.com/ipcenter/.
Printable PDF version of this article with graphics. (4/15/04) 303 KB