Xcell Journal Online
Using Spartan-3 FPGAs to Implement High-Performance DSP



by Suhel Dhanani, Sr. Marketing Manager, Spartan Solutions, Xilinx, Inc.
suhel.dhanani@xilinx.com (1/15/05)


Spartan-3 FPGAs provide breakthrough cost points for embedded DSP.


All low-cost FPGAs provide basic logic capability at attractive prices and serve a broad range of general-purpose design requirements. When you consider embedding DSP functions in an FPGA fabric, however, you may believe that you must choose high-end FPGAs to get platform features such as embedded multipliers and distributed memory.

With Spartan-3™ FPGAs, the landscape for embedded DSP has changed. Spartan-3 devices may be low cost, but they also have the platform features required for DSP designs. These features allow area-efficient implementation of signal processing functions, letting you reach significantly lower price points.

Spartan-3 devices are ideal as coprocessors or pre-/post-processors, offloading computationally intensive functions from a programmable DSP to enhance system performance.

Optimized for DSP
The Spartan-3 family from Xilinx uses 90 nm process technology in conjunction with 300 mm wafers to dramatically lower the cost of FPGAs. At the same time, the devices incorporate key DSP resources such as embedded 18 x 18-bit multipliers, large (18 Kb) block RAMs, distributed RAM, and shift-register logic. This feature set means that you can use Spartan-3 FPGAs to implement DSP algorithms at a significantly lower cost than competing FPGAs. Figure 1 shows the specific features that help implement DSP efficiently.

In addition to increasing the basic performance of systems, these embedded features improve device utilization. For instance, the embedded Spartan-3 multiplier would take 300-400 logic elements (LEs) if implemented in the logic fabric. And because the embedded multiplier is adjacent to the logic fabric, augmenting its functionality (such as building accumulators or combining multipliers to create complex arithmetic functions) is fairly straightforward.
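Building an accumulator next to the multiplier mostly comes down to choosing word widths. The following Python is a bit-accurate behavioral sketch, not HDL and not Xilinx code. It assumes signed two's-complement operands, and the names (`mac`, `accumulator_bits`, `fits`) are hypothetical. A full-precision 18 x 18 product needs 36 bits, and summing N products needs ceil(log2 N) extra guard bits.

```python
# Behavioral sketch of an 18 x 18-bit signed multiplier feeding an accumulator.
# Assumes two's-complement operands; widths are illustrative, not vendor-specified.

MULT_IN_BITS = 18   # operand width of the embedded multiplier
MULT_OUT_BITS = 36  # full-precision product width (18 + 18)


def fits(value, bits):
    """True if value is representable as a `bits`-wide two's-complement word."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def accumulator_bits(n_products):
    """Worst-case accumulator width for summing n_products full products:
    36 bits plus ceil(log2(n_products)) guard bits."""
    return MULT_OUT_BITS + (n_products - 1).bit_length()


def mac(samples, coeffs):
    """Multiply-accumulate with checks that mimic fixed hardware word widths."""
    acc_bits = accumulator_bits(len(coeffs))
    acc = 0
    for x, h in zip(samples, coeffs):
        assert fits(x, MULT_IN_BITS) and fits(h, MULT_IN_BITS)
        product = x * h
        assert fits(product, MULT_OUT_BITS)  # product never overflows 36 bits
        acc += product
        assert fits(acc, acc_bits)           # guard bits absorb the growth
    return acc, acc_bits
```

For example, `mac([-(1 << 17)] * 64, [-(1 << 17)] * 64)` exercises the worst case for a 64-tap filter and reports a 42-bit accumulator.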

Many DSP functions are best implemented in pipelines with time multiplexing for efficiency. This allows you to create faster systems with higher bandwidth, but it comes at the expense of requiring more interim storage elements. For example, a time-multiplexed filter would store the results of individual multiply-accumulate cells in shift registers. Such designs can run out of registers or memory before they run out of logic resources. The Spartan-3 FPGA family is unique in providing a mode where a single look-up table (LUT) is capable of implementing logic functions or acting as a 16-bit shift register.

As shown in Figure 2, this architecture enhancement allows you to use a single LUT in place of 16 registers – maximizing area efficiency when implementing time-multiplexed DSP functions.
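The LUT-for-registers equivalence can be checked with a small behavioral model. The Python below is an illustrative sketch that models behavior, not timing, and the class and function names are hypothetical. It assumes the SRL16 shifts on every clock and uses a 4-bit address to select the output stage, so the delay is address + 1 clocks and one LUT reproduces a chain of up to 16 flip-flops.

```python
class SRL16Model:
    """Behavioral model of one LUT used as a 16-bit shift register (SRL16).

    Each clock shifts a new bit in; the 4-bit address selects which stage
    drives the output, so the delay is address + 1 clock cycles.
    """

    DEPTH = 16

    def __init__(self, address):
        if not 0 <= address < self.DEPTH:
            raise ValueError("address must fit in 4 bits (0..15)")
        self.address = address
        self.stages = [0] * self.DEPTH  # stages[0] holds the newest bit

    def clock(self, d):
        """Shift in bit d and return the bit at the selected stage."""
        self.stages = [d & 1] + self.stages[:-1]
        return self.stages[self.address]


def flip_flop_chain(bits, length):
    """Reference: the same delay built from `length` discrete registers."""
    regs = [0] * length
    out = []
    for b in bits:
        regs = [b & 1] + regs[:-1]
        out.append(regs[-1])
    return out


bits = [(i * 7 + 3) % 5 % 2 for i in range(100)]
srl = SRL16Model(address=15)
assert [srl.clock(b) for b in bits] == flip_flop_chain(bits, 16)
```

The final assertion compares a single SRL16 at address 15 against a 16-register chain over the same input stream.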

Many DSP functions are also extremely memory-intensive, requiring scratch-pad memory for coefficient storage, FIFOs, and large buffers. As shown in Figure 3, Spartan-3 devices provide more memory bits than other low-cost FPGAs available today. For many DSP designs, the critical resource is the embedded memory within the FPGA, not logic or multipliers. Lacking sufficient memory, designers using competing low-cost devices may have to migrate to a larger device or add external memory for systems that would fit into a single, small Spartan-3 FPGA.
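To see why memory often runs out first, it helps to count block RAMs for a given buffer. The sketch below is a rough estimator, not a vendor tool, and `block_rams_needed` is a hypothetical name. It assumes each 18 Kb block can be configured as 16K x 1, 8K x 2, 4K x 4, 2K x 9, 1K x 18, or 512 x 36 (the Spartan-3 RAMB16 aspect ratios, where the 9-, 18-, and 36-bit widths include parity bits). It also ignores the extra logic needed to cascade blocks in depth.

```python
# Assumed Spartan-3 block RAM aspect ratios as (depth, width) pairs.
ASPECT_RATIOS = [(16384, 1), (8192, 2), (4096, 4), (2048, 9), (1024, 18), (512, 36)]


def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def block_rams_needed(depth, width):
    """Fewest 18 Kb block RAMs holding a depth x width buffer, tiling a single
    aspect ratio in both directions and ignoring any cascade logic."""
    return min(ceil_div(depth, d) * ceil_div(width, w) for d, w in ASPECT_RATIOS)


print(block_rams_needed(64, 16))     # 64 x 16-bit coefficients
print(block_rams_needed(16384, 16))  # a 16K-sample, 16-bit buffer
```

A 64-entry coefficient table fits in one block, while a 16K x 16-bit buffer needs 16. Once a buffer reaches this size, block RAM, rather than logic, usually limits device choice.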

Common DSP Functions
Let’s see how these features impact device utilization by looking at two implementation examples of a finite impulse response (FIR) filter. One is a MAC-based implementation, while the other is a multichannel distributed arithmetic (DA) implementation.

FIR filters are commonly used in base stations, digital video, wireless LANs, xDSL, and cable modems. Our benchmark is a 64-tap MAC-based FIR filter with 16-bit data and coefficients running at 130 MHz in a Spartan-3 XC3S400 FPGA. The first implementation uses a single MAC; the second uses four MACs. Figure 4 shows the device utilization section of the report file for both implementations.

Going from a one-MAC to a four-MAC implementation dramatically increases the throughput of the FIR filter, because the 64 taps are shared among four MACs instead of one. Yet the number of LUTs only doubles and remains at just 4% of the total available logic. The four-MAC implementation uses four block RAMs and four multipliers, keeping its use of general logic resources to a minimum.
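The article does not quote output sample rates, but the idealized arithmetic is simple: with one MAC, each output of a 64-tap filter takes 64 clocks; with four MACs, each handling 16 taps in parallel, it takes 16. The Python below is a sketch under those assumptions. It ignores pipeline latency and control overhead, and the function names are hypothetical. It checks that splitting the taps across MACs gives the same result as a direct-form FIR.

```python
def fir_direct(x, h):
    """Reference direct-form FIR: y[n] = sum_k h[k] * x[n - k]."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def fir_parallel_macs(x, h, n_macs):
    """Model a FIR whose taps are split evenly across n_macs MAC units.

    Each MAC handles len(h) // n_macs taps, one per clock, and the partial
    sums are added at the end. Returns the outputs and clocks per output.
    """
    taps = len(h)
    if taps % n_macs:
        raise ValueError("this sketch assumes taps divide evenly among MACs")
    per_mac = taps // n_macs
    y = []
    for n in range(len(x)):
        partials = [0] * n_macs
        for cycle in range(per_mac):      # all MACs work during the same clock
            for m in range(n_macs):
                k = m * per_mac + cycle   # tap handled by MAC m this clock
                if n - k >= 0:
                    partials[m] += h[k] * x[n - k]
        y.append(sum(partials))
    return y, per_mac


def output_rate_msps(clock_mhz, taps, n_macs):
    """Idealized output sample rate, ignoring latency and overhead."""
    return clock_mhz / (taps // n_macs)


print(output_rate_msps(130, 64, 1))  # one MAC
print(output_rate_msps(130, 64, 4))  # four MACs
```

At 130 MHz this works out to roughly 2 MSPS for one MAC versus roughly 8 MSPS for four, an idealized fourfold gain.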

Another interesting case is a multichannel FIR filter. Here we look at how device utilization changes when we go from a one-channel to an eight-channel FIR filter.

As shown in Figure 5, a single-channel distributed arithmetic FIR filter uses 29% of the logic resources and 39% of the registers of an XC3S1000 Spartan-3 device. To implement an eight-channel version of the same filter, we would normally time multiplex the channels to conserve logic. However, storing the intermediate results would then consume many registers or a significant amount of on-chip memory.
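The article does not describe the internals of the DA core, so the following is only a generic sketch of the distributed arithmetic technique. Instead of multiplying, DA precomputes every possible sum of a small group of coefficients. It then processes the input samples one bit position at a time, using that bit of every sample as a table address and shift-adding the looked-up values. The sketch assumes B-bit two's-complement samples and a group of four taps, giving a 16-entry table, which matches a 4-input LUT per table output bit. The function names are hypothetical.

```python
def build_da_table(coeffs):
    """Precompute the sum of the coefficients selected by each address pattern.

    With 4 coefficients the table has 16 entries.
    """
    n = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << n)]


def da_dot(samples, coeffs, bits):
    """Bit-serial distributed-arithmetic dot product of two's-complement samples.

    Each iteration looks up one table entry addressed by bit j of every
    sample, then shift-adds it; the sign bit's contribution is subtracted.
    """
    table = build_da_table(coeffs)
    mask = (1 << bits) - 1
    words = [s & mask for s in samples]  # two's-complement encodings
    acc = 0
    for j in range(bits):
        addr = 0
        for k, w in enumerate(words):
            addr |= ((w >> j) & 1) << k   # bit j of sample k -> address bit k
        term = table[addr] << j
        acc += -term if j == bits - 1 else term  # MSB carries negative weight
    return acc


coeffs = [5, -1, 4, 2]
samples = [3, -2, 7, -8]
assert da_dot(samples, coeffs, 8) == sum(s * c for s, c in zip(samples, coeffs))
```

One table lookup per input bit replaces four multiplications. Real DA cores split longer filters into several such tables and may process more than one bit per clock.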

With Spartan-3 FPGAs, the intermediate results are stored in LUTs configured as 16-bit shift registers (SRL16). This allows the eight-channel version of the same filter to be implemented using only 10% more of the available logic and only 7% more of the available registers: eight times as many channels for only 25% more device resources (see Figure 6).
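Shift registers make extra channels cheap because of how interleaving works. When C channels share one datapath in an interleaved stream, every unit delay of the single-channel filter becomes a delay of C samples. The Python below models only this delay-line view; the article does not detail the core's internals, and the names are hypothetical. With eight channels each delay is eight samples deep, which fits in a single SRL16.

```python
from collections import deque


def fir_single(x, h):
    """Reference single-channel FIR: y[n] = sum_k h[k] * x[n - k]."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n >= k)
            for n in range(len(x))]


def fir_interleaved(stream, h, channels):
    """Filter a channel-interleaved stream with one shared datapath.

    Sample i belongs to channel i % channels. Each unit delay of the
    single-channel filter becomes a shift register `channels` deep, so
    tap k sees the same channel's sample from k frames earlier.
    """
    depth = channels * (len(h) - 1) + 1
    line = deque([0] * depth, maxlen=depth)  # line[0] is the newest sample
    out = []
    for s in stream:
        line.appendleft(s)                    # oldest sample falls off the end
        out.append(sum(h[k] * line[k * channels] for k in range(len(h))))
    return out


h = [3, -1, 4, 1, -5]
chans = [[(c * 10 + n) % 7 - 3 for n in range(20)] for c in range(8)]
stream = [chans[c][n] for n in range(20) for c in range(8)]  # interleave 8 channels
out = fir_interleaved(stream, h, 8)
assert all(out[c::8] == fir_single(chans[c], h) for c in range(8))
```

The arithmetic is identical to the single-channel filter. Only the delays grow, and SRL16s absorb that growth without consuming flip-flops.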

These dramatic savings come directly from the SRL16s available in the Spartan-3 device. The report file shows that the eight-channel implementation uses an additional 1,343 LUTs in SRL16 mode.

Implementing this design in an FPGA without SRL16 capability would require an additional 10,744 (1,343 x 8) flip-flops as storage elements. Meeting that register count would demand a massive device, likely leaving much of its associated combinatorial logic unused.
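The flip-flop figure is simple arithmetic: each LUT in SRL16 mode stands in for as many flip-flops as the delay it implements. The sketch below assumes each SRL16 holds an eight-stage delay (one stage per channel), as the multiplier of 8 in the text implies. The helper name is hypothetical, and the second call is only an illustrative upper bound, not a figure from the article.

```python
SRL16_MAX_DEPTH = 16  # one LUT holds at most 16 shift-register stages


def flip_flops_replaced(srl_luts, stages_per_lut):
    """Flip-flops needed to build the same delays without SRL16 mode."""
    if not 1 <= stages_per_lut <= SRL16_MAX_DEPTH:
        raise ValueError("an SRL16 holds between 1 and 16 stages")
    return srl_luts * stages_per_lut


print(flip_flops_replaced(1343, 8))   # eight stages, one per channel
print(flip_flops_replaced(1343, 16))  # upper bound if every SRL16 were full
```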

Conclusion
The Spartan-3 architecture is optimized to give you very high area efficiency when implementing signal processing functions. By combining these DSP-friendly system features with low unit costs, Spartan-3 FPGAs enable the industry’s lowest price points for high-performance DSP functions. As a result, a Spartan-3 device can serve as a low-cost yet efficient, high-performance coprocessor to a programmable DSP.
