AR# 29555

LogiCORE Fast Fourier Transform (FFT) v4.1 - Why do FFTs with large complex multipliers fail in PAR when targeting a Spartan-3A DSP device?

Description

Why do FFTs with large complex multipliers fail in PAR when targeting a Spartan-3A DSP device?

This can also happen with Virtex parts, but it is more common in Spartan-3A DSP devices because they have fewer DSP48As per column.

Solution

This is due to the size of the complex multiplier, which produces a DSP48 cascade too long to be placed in a single column.

This often happens with the streaming architecture, which can create large complex multipliers and therefore long DSP48 cascades.

For example, consider an FFT core with 18-bit input data and 18-bit twiddle factors that uses the streaming architecture. The implementation is decimation in frequency, so the complex multiplier comes after the butterfly, and the streaming architecture places a complex multiplier after every second butterfly. The data path grows by 1 bit in each butterfly, so it is 20 bits wide by the time it reaches the complex multiplier. The 18-bit twiddle factors are internally widened by 1 bit so that +1 can be represented exactly, so the second input to the complex multiplier is 19 bits wide. Each complex multiplier is therefore 20 x 19 bits.

Because the DSP48 inputs are 18 bits wide, each real multiplier must be built by cascading several DSP48s; in this case, 4 are required. With "Optimize complex multipliers for speed using DSP48s" selected, the complex multiplier uses 4 real multipliers, for a total of 16 DSP48As per complex multiplier. These must be cascaded in two groups of 8 DSP48As to produce the separate real and imaginary outputs.
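The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration, not part of the FFT core or any Xilinx tool. The function names are hypothetical, and the rule used to count DSP48s per real multiplier (tiling each operand into 18-bit pieces) is a simplifying assumption chosen because it reproduces the counts quoted above.

```python
import math

DSP48_INPUT_BITS = 18  # DSP48 multiplier input width, as stated in this Answer Record


def dsp48s_per_real_multiplier(a_bits, b_bits):
    """Assumed tiling model: one DSP48 per 18-bit x 18-bit partial product."""
    return (math.ceil(a_bits / DSP48_INPUT_BITS)
            * math.ceil(b_bits / DSP48_INPUT_BITS))


def streaming_complex_multiplier(input_bits, twiddle_bits):
    """Estimate the 4-real-multiplier complex multiplier described above."""
    data_bits = input_bits + 2     # 1 bit of growth in each of two butterflies
    coeff_bits = twiddle_bits + 1  # extra bit so +1 can be represented exactly
    per_real = dsp48s_per_real_multiplier(data_bits, coeff_bits)
    total = 4 * per_real           # 4 real multipliers per complex multiplier
    return {
        "data_bits": data_bits,
        "coeff_bits": coeff_bits,
        "dsp48s_per_real_multiplier": per_real,
        "total_dsp48s": total,
        # Two cascades: one for the real output, one for the imaginary output.
        "cascade_length": total // 2,
    }


result = streaming_complex_multiplier(input_bits=18, twiddle_bits=18)
print(result)
```

For 18-bit data and 18-bit twiddle factors, this gives a 20 x 19 multiplier, 16 DSP48s in total, and cascades of 8. The cascade length is the figure to compare against the number of DSP48As available in one column of the target device.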

There are three possible work-arounds:

- Use a larger part that has more DSP48s per column.

- Uncheck the "Optimize complex multipliers for speed using DSP48s" check box to get a 3-real-multiplier complex multiplier instead. This might impact your maximum clock frequency, but there is no change to data precision. It will also reduce your DSP48 count to 75% of its current level.

- Reduce your input data width to 16 bits, your twiddle factor width to 17 bits, or both. Changing only one of these gives a complex multiplier that uses 8 DSP48s instead of 16. Changing both gives a complex multiplier that uses only 4 DSP48s. There is no impact on clock frequency, but there is a small reduction in data precision.
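The DSP48 counts quoted in the second and third work-arounds can be compared with a short Python sketch. The helper name `dsp48_count` is hypothetical. The counting rule (18-bit tiling of each operand) is an assumption that matches the figures in this Answer Record, and the 3-multiplier estimate simply applies the 75% figure stated above. Actual resource usage is determined by the core generator.

```python
import math


def dsp48_count(input_bits, twiddle_bits, real_multipliers=4):
    """Rough DSP48 estimate for one streaming complex multiplier (assumed model)."""
    data_bits = input_bits + 2     # two butterflies, 1 bit of growth each
    coeff_bits = twiddle_bits + 1  # extra bit so +1 is exactly representable
    per_real = math.ceil(data_bits / 18) * math.ceil(coeff_bits / 18)
    return real_multipliers * per_real


options = {
    "baseline (18-bit data, 18-bit twiddle)": dsp48_count(18, 18),
    "3-real-multiplier structure": dsp48_count(18, 18, real_multipliers=3),
    "16-bit data only": dsp48_count(16, 18),
    "17-bit twiddle only": dsp48_count(18, 17),
    "16-bit data and 17-bit twiddle": dsp48_count(16, 17),
}

for name, count in options.items():
    print(f"{name}: {count} DSP48s")
```

Under these assumptions, narrowing either operand brings one side of the multiplier down to 18 bits, which halves the DSP48s needed per real multiplier. Narrowing both reduces each real multiplier to a single DSP48.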

For a detailed list of LogiCORE Fast Fourier Transform (FFT) Release Notes and Known Issues, see (Xilinx Answer 29209).

Date Created 10/28/2007
Last Updated 12/15/2012
Status Active
Type General Article