The Virtex®-4 FPU for PowerPC® 405 is a Xilinx implementation that supports only single-precision floating-point operations and is not PowerPC compliant. With Xilinx-provided compiler modifications, single-precision floating-point instructions can be executed on the FPU to achieve increased performance over software emulation. Refer to the Virtex-4 data sheet for specific details.
The following data shows the FPGA resources consumed by the Floating Point Unit (FPU) and the clock frequency the PowerPC® 405 can achieve with an FPU.
Device Support: Virtex-4 FX
| Single Precision FPU Type | Slices | DSP48 | Block RAMs | PowerPC / FPU Clock, -10 Speed Grade (MHz) | PowerPC / FPU Clock, -12 Speed Grade (MHz) |
|---|---|---|---|---|---|
| Lite | 1100 | 4 | 2 | 275 / 137.5 | 340 / 170 |
| Full (with div / sqrt) | 1250 | 4 | 2 | 275 / 137.5 | 340 / 170 |
All benchmarks provided below were performed on a Xilinx ML403 Board with a 200 MHz PowerPC and 100 MHz FPU. The data is then scaled where appropriate to accurately reflect the respective system being measured.
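The exact scaling method is not stated here. A minimal sketch in C, assuming the results scale linearly with clock frequency (execution time inversely, throughput directly), could look like the following. The function names `scale_time_ms` and `scale_throughput` are hypothetical and chosen only for illustration.

```c
#include <assert.h>

/* Scale an execution time measured at one clock frequency to another,
   ASSUMING run time is inversely proportional to clock frequency. */
double scale_time_ms(double measured_ms, double measured_mhz, double target_mhz)
{
    assert(measured_mhz > 0.0 && target_mhz > 0.0);
    return measured_ms * (measured_mhz / target_mhz);
}

/* Scale a throughput figure (for example MFLOPS) to another clock,
   ASSUMING throughput is directly proportional to clock frequency. */
double scale_throughput(double measured, double measured_mhz, double target_mhz)
{
    assert(measured_mhz > 0.0 && target_mhz > 0.0);
    return measured * (target_mhz / measured_mhz);
}
```

Under this assumption, `scale_time_ms(248.38, 275.0, 350.0)` gives about 195.16 ms, which agrees with the Video Editing Algorithm software-emulation times reported later in this section for 275 MHz and 350 MHz systems. Linear scaling ignores memory and bus effects, so it is only a first approximation.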
Three tables are provided below.
The following performance data is representative of the maximum-performance PowerPC with FPU system for each speed grade (275 MHz PowerPC / 137.5 MHz FPU at -10, 340 MHz PowerPC / 170 MHz FPU at -12).
| Algorithm | Single Precision FPU (-10) | Single Precision FPU (-12) |
|---|---|---|
| FIR Filter | 78.7 MFLOPS | 97.4 MFLOPS |
| Whetstone | 18.3 MFLOPS | 22.6 MFLOPS |
The following performance data is representative of a 275 MHz PowerPC system for software emulation and a 275 MHz PowerPC with a 137.5 MHz FPU system for FPU acceleration.
| Algorithm | Software Emulation of Floating Pt * | Single Precision FPU | Acceleration |
|---|---|---|---|
| FIR Filter | 846.09 ms | 63.94 ms | 13x |
| Video Editing Algorithm | 248.38 ms | 27.93 ms | 9x |
| PID Loop | 1.66 us | 0.37 us | 4x |
| 1024pt FFT | 19.48 ms | 5.42 ms | 4x |
The following performance data is representative of a 350 MHz PowerPC system for software emulation and a 275 MHz PowerPC with a 137.5 MHz FPU system for FPU acceleration (these systems reflect the respective maximum clock frequency in the -10 speed grade device).
| Algorithm | Software Emulation of Floating Pt * | Single Precision FPU | Acceleration |
|---|---|---|---|
| FIR Filter | 664.00 ms | 63.94 ms | 10x |
| Video Editing Algorithm | 195.16 ms | 27.93 ms | 7x |
| PID Loop | 1.30 us | 0.37 us | 3x |
| 1024pt FFT | 15.31 ms | 5.42 ms | 3x |
The speedup over software floating-point execution depends heavily on the type of application and on the fraction of its run time the algorithm spends performing floating-point arithmetic.
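This dependence on the floating-point fraction can be illustrated with Amdahl's law. The sketch below is a general model, not a Xilinx-provided formula. The function name `overall_speedup` and the parameter values in the usage note are hypothetical.

```c
#include <assert.h>

/* Amdahl's law: overall speedup when a fraction fp_fraction of the
   original run time is accelerated by a factor fp_speedup and the
   rest of the program runs at its original speed. */
double overall_speedup(double fp_fraction, double fp_speedup)
{
    assert(fp_fraction >= 0.0 && fp_fraction <= 1.0);
    assert(fp_speedup > 0.0);
    return 1.0 / ((1.0 - fp_fraction) + fp_fraction / fp_speedup);
}
```

For an assumed 20x speedup of the floating-point portion, a program that spends 50% of its time in floating-point arithmetic would gain about 1.9x overall, while one that spends 95% would gain about 10.3x. This is consistent with the spread of acceleration factors across algorithms in the tables above, although the actual floating-point fractions of those benchmarks are not given.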
The largest speedups are achieved with C code that takes full advantage of the FPU. Guidance for code improvements can be found in the data sheet (PDF).
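The data sheet's guidance is not reproduced here. As a general C illustration, one common way to keep arithmetic within reach of a single-precision-only FPU is to avoid implicit promotion to `double`, on the assumption that `double` operations cannot be handled by this FPU and would fall back to software emulation. The functions `fir_sample` and `halve` below are hypothetical examples, not code from Xilinx.

```c
#include <assert.h>
#include <stddef.h>

/* One output sample of a FIR filter, kept entirely in single precision:
   float operands and a float accumulator, so no step is promoted to double. */
float fir_sample(const float *coeffs, const float *samples, size_t taps)
{
    assert(coeffs != NULL && samples != NULL);
    float acc = 0.0f;                 /* 0.0f is a float literal */
    for (size_t i = 0; i < taps; ++i) {
        acc += coeffs[i] * samples[i];
    }
    return acc;
}

/* The 'f' suffix matters: x * 0.5 would convert x to double, multiply
   in double precision, and convert the result back to float. */
float halve(float x)
{
    return x * 0.5f;
}
```

The same reasoning applies to math library calls: the `float` variants declared in `<math.h>`, such as `sqrtf`, avoid the conversion to `double` that `sqrt` implies. Whether a given toolchain maps them to FPU instructions depends on the compiler and its options.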