Virtex-4 FXT FPU v2.1 for the PowerPC: Performance and Size

The Virtex®-4 FPU for PowerPC® 405 is a Xilinx implementation that supports only single-precision floating-point operations and is not PowerPC compliant. With Xilinx-provided compiler modifications, single-precision floating-point instructions can be executed in hardware for higher performance than software emulation. Refer to the Virtex-4 data sheet for specific details.

The following data shows the FPGA resources consumed by the Floating Point Unit (FPU) and the clock frequency the PowerPC® 405 can achieve with an FPU.

Device Support:  Virtex-4 FX

Single Precision FPU Type   Resources                     Clock Frequency, PowerPC / FPU (MHz)
                            Slices   DSP48   Block RAMs   -10 Speed Grade   -12 Speed Grade
Lite                        1100     4       2            275 / 137.5       340 / 170
Full (with div / sqrt)      1250     4       2            275 / 137.5       340 / 170

Virtex-4 FXT FPU Performance and Acceleration Data:

All benchmarks below were run on a Xilinx ML403 board with a 200 MHz PowerPC and a 100 MHz FPU. The data was then scaled where appropriate to reflect the system each table describes.
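
The scaling step is not spelled out on this page. As a rough sketch only, the following C helpers assume the simplest model, in which run time is inversely proportional to clock frequency and throughput is directly proportional to it. The function names and the linear model are assumptions for illustration, not the documented Xilinx method.

```c
#include <assert.h>

/* Scale a run time measured at one clock frequency to another frequency.
 * ASSUMPTION: run time is inversely proportional to clock frequency. */
double scale_time_ms(double measured_ms, double measured_mhz, double target_mhz)
{
    assert(measured_mhz > 0.0 && target_mhz > 0.0);
    return measured_ms * (measured_mhz / target_mhz);
}

/* Scale a throughput figure (for example MFLOPS) to another frequency.
 * ASSUMPTION: throughput is directly proportional to clock frequency. */
double scale_mflops(double measured_mflops, double measured_mhz, double target_mhz)
{
    assert(measured_mhz > 0.0 && target_mhz > 0.0);
    return measured_mflops * (target_mhz / measured_mhz);
}
```

Under this model, a hypothetical 100 ms measurement taken at 200 MHz would scale to roughly 72.7 ms at 275 MHz. Real systems with memory-bound code may scale less favorably.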

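Three tables are provided below: peak sustained performance, FPU acceleration over software at equivalent frequency, and FPU acceleration over software at maximum frequency in the -10 speed grade.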

Virtex-4 FXT FPU Peak Sustained Performance

The following performance data is representative of a PowerPC with FPU system running at its maximum clock frequency in each speed grade.

Algorithm     Performance
              Single Precision FPU (-10)    Single Precision FPU (-12)
FIR Filter    78.7 MFLOPS                   97.4 MFLOPS
Whetstone     18.3 MFLOPS                   22.6 MFLOPS

FPU Acceleration over Software (Equivalent Frequency)

The following performance data is representative of a 275 MHz PowerPC system for software emulation and a 275 MHz PowerPC with a 137.5 MHz FPU system for FPU acceleration.

Algorithm                 Performance                                                   Acceleration
                          Software Emulation of Floating Pt *   Single Precision FPU
FIR Filter                846.09 ms                             63.94 ms                13x
Video Editing Algorithm   248.38 ms                             27.93 ms                9x
PID Loop                  1.66 us                               0.37 us                 4x
1024pt FFT                19.48 ms                              5.42 ms                 4x

FPU Acceleration over Software (Maximum Frequency in -10)

The following performance data is representative of a 350 MHz PowerPC system for software emulation and a 275 MHz PowerPC with a 137.5 MHz FPU system for FPU acceleration (these systems reflect the respective maximum clock frequency in the -10 speed grade device).

Algorithm                 Performance                                                   Acceleration
                          Software Emulation of Floating Pt *   Single Precision FPU
FIR Filter                664.00 ms                             63.94 ms                10x
Video Editing Algorithm   195.16 ms                             27.93 ms                7x
PID Loop                  1.30 us                               0.37 us                 3x
1024pt FFT                15.31 ms                              5.42 ms                 3x
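
The acceleration column appears to be the software-emulation time divided by the FPU time, listed as a whole number. A minimal sketch of that ratio, with a hypothetical function name, looks like this:

```c
#include <assert.h>

/* Acceleration factor: software-emulation time divided by FPU time.
 * Both arguments must use the same unit (ms or us). */
double acceleration(double software_time, double fpu_time)
{
    assert(software_time > 0.0 && fpu_time > 0.0);
    return software_time / fpu_time;
}
```

For the FIR Filter row above, acceleration(664.00, 63.94) gives about 10.4, listed as 10x. For the Video Editing Algorithm row, acceleration(195.16, 27.93) gives about 6.99, listed as 7x.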

The speedup over software floating-point execution depends heavily on the type of application and on how much of its run time the algorithm spends performing floating-point arithmetic.
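
One common way to reason about this dependence is Amdahl's law, which this page does not name and which is used here only as an illustrative model. The sketch below estimates the overall speedup when only a fraction of the software run time is floating-point work. The function and parameter names are hypothetical.

```c
#include <assert.h>

/* Overall application speedup under Amdahl's law.
 * fp_fraction: share of the software run time spent in floating-point
 *              arithmetic, in the range 0..1.
 * fp_speedup:  how much faster the FPU executes that share. */
double overall_speedup(double fp_fraction, double fp_speedup)
{
    assert(fp_fraction >= 0.0 && fp_fraction <= 1.0);
    assert(fp_speedup > 0.0);
    return 1.0 / ((1.0 - fp_fraction) + fp_fraction / fp_speedup);
}
```

With an assumed floating-point speedup of 13, an application that spends 95% of its time in floating-point arithmetic would speed up by about 8.1x overall, while one that spends 50% there would gain only about 1.9x.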

Furthermore, the largest speedups are achieved with C-Code that takes full advantage of the FPU. Guidance for code improvements can be found in the data sheet.
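
The data sheet's specific guidance is not reproduced here. As a general C illustration only, and because this FPU supports single precision alone, one likely consideration is keeping every operand in float so that the compiler never promotes arithmetic to double. The FIR filter sketch below is hypothetical code, not taken from the data sheet.

```c
#include <assert.h>
#include <stddef.h>

/* Single-precision FIR filter: y[i] = sum over k of h[k] * x[i - k].
 * Every operand is float, so the multiply-accumulate stays in single
 * precision and is never promoted to double. */
void fir_f32(const float *x, size_t n, const float *h, size_t taps, float *y)
{
    assert(x && h && y);
    for (size_t i = 0; i < n; ++i) {
        float acc = 0.0f;  /* the 'f' suffix makes the constant a float */
        /* Samples before x[0] are treated as zero. */
        for (size_t k = 0; k < taps && k <= i; ++k) {
            acc += h[k] * x[i - k];
        }
        y[i] = acc;
    }
}
```

In standard C an unsuffixed constant such as 0.5 has type double, so an expression like x[i] * 0.5 is evaluated in double precision. Writing 0.5f, and calling sqrtf rather than sqrt, avoids that promotion.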

* C-Code is compiled using IBM Performance Libs delivered with EDK 8.2i  
 