The Virtex®-4 FPU for PowerPC® 405 is a Xilinx implementation that supports only single-precision floating-point operations and is not PowerPC compliant. With Xilinx-provided compiler modifications, single-precision floating-point instructions can be executed in hardware for higher performance than software emulation. Refer to the Virtex-4 data sheet for specific details.
The following data shows the FPGA resources consumed by the Floating Point Unit (FPU) and the clock frequencies that the PowerPC® 405 and the FPU can achieve in each speed grade.
Device Support: Virtex-4 FX
| Single Precision FPU Type | Slices | DSP48 | Block RAMs | PowerPC / FPU Clock Frequency (MHz), -10 Speed Grade | PowerPC / FPU Clock Frequency (MHz), -12 Speed Grade |
|---|---|---|---|---|---|
| Lite | 1100 | 4 | 2 | 275 / 137.5 | 340 / 170 |
| Full (with div / sqrt) | 1250 | 4 | 2 | 275 / 137.5 | 340 / 170 |
Virtex-4 FXT FPU Performance and Acceleration Data:
All benchmarks provided below were performed on a Xilinx
ML403 Board with a 200 MHz PowerPC and 100 MHz FPU. The data is
then scaled where appropriate to accurately reflect the respective
system being measured.
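The source does not spell out how the measured data is scaled. A minimal sketch, assuming the cycle count of a benchmark stays constant so that execution time is inversely proportional to clock frequency and throughput is directly proportional to it, could look like the following. The function names `scale_time` and `scale_rate` are hypothetical and not part of any Xilinx tool.

```c
/* Host-side helpers for rescaling benchmark figures between clock rates.
 * ASSUMPTION: the cycle count is unchanged, so time scales as 1/f and
 * throughput (e.g. MFLOPS) scales as f. Workloads limited by memory or
 * bus speed would not follow this rule exactly. */

/* Scale an execution time measured at measured_mhz to target_mhz. */
double scale_time(double measured_time, double measured_mhz, double target_mhz)
{
    return measured_time * measured_mhz / target_mhz;
}

/* Scale a throughput figure measured at measured_mhz to target_mhz. */
double scale_rate(double measured_rate, double measured_mhz, double target_mhz)
{
    return measured_rate * target_mhz / measured_mhz;
}
```

Under this assumption, `scale_time(248.38, 275.0, 350.0)` gives about 195.16, which agrees with the two software-emulation times quoted for the Video Editing Algorithm in this section (248.38 ms at 275 MHz and 195.16 ms at 350 MHz).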
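The three tables below show peak sustained performance, FPU acceleration over software at an equivalent frequency, and FPU acceleration over software at the maximum frequency of the -10 speed grade.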
Virtex-4 FXT FPU Peak Sustained Performance
The following performance data is representative of the highest-performance PowerPC with FPU system in each speed grade.
| Algorithm | Performance, Single Precision FPU (-10) | Performance, Single Precision FPU (-12) |
|---|---|---|
| FIR Filter | 78.7 MFLOPS | 97.4 MFLOPS |
| Whetstone | 18.3 MFLOPS | 22.6 MFLOPS |
FPU Acceleration over Software (Equivalent Frequency)
The following performance data is representative of a 275
MHz PowerPC system for software emulation and a 275 MHz PowerPC
with a 137.5 MHz FPU system for FPU acceleration.
| Algorithm | Performance, Software Emulation of Floating Pt * | Performance, Single Precision FPU | Acceleration |
|---|---|---|---|
| FIR Filter | 846.09 ms | 63.94 ms | 13x |
| Video Editing Algorithm | 248.38 ms | 27.93 ms | 9x |
| PID Loop | 1.66 us | 0.37 us | 4x |
| 1024pt FFT | 19.48 ms | 5.42 ms | 4x |
FPU Acceleration over Software (Maximum Frequency in -10)
The following performance data is representative of a 350 MHz PowerPC system for software emulation and a 275 MHz PowerPC with a 137.5 MHz FPU system for FPU acceleration (these systems reflect the respective maximum clock frequency in the -10 speed grade device).
| Algorithm | Performance, Software Emulation of Floating Pt * | Performance, Single Precision FPU | Acceleration |
|---|---|---|---|
| FIR Filter | 664.00 ms | 63.94 ms | 10x |
| Video Editing Algorithm | 195.16 ms | 27.93 ms | 7x |
| PID Loop | 1.30 us | 0.37 us | 3x |
| 1024pt FFT | 15.31 ms | 5.42 ms | 3x |
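The Acceleration column appears to be the ratio of the software-emulation time to the FPU time, reported as a whole number. A minimal sketch of that arithmetic, with a hypothetical helper name, is:

```c
/* Acceleration factor: how many times faster the FPU run is than the
 * software-emulated run. Both times must use the same unit.
 * ASSUMPTION: this ratio is how the Acceleration column was derived;
 * the source does not state the formula or its rounding rule. */
double acceleration(double software_time, double fpu_time)
{
    return software_time / fpu_time;
}
```

For the FIR Filter at equivalent frequency, `acceleration(846.09, 63.94)` is roughly 13.2, in line with the 13x shown. At the maximum -10 frequencies, `acceleration(664.00, 63.94)` is roughly 10.4, in line with the 10x shown.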
The speedup over software floating-point execution depends heavily on the type of application and on the fraction of its run time that the algorithm spends performing floating-point arithmetic.
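This dependence on the floating-point fraction can be illustrated with Amdahl's law, which is a standard model and not something the source itself cites. In the sketch below, `fp_fraction` and `fp_speedup` are hypothetical inputs: the share of the original run time spent in floating-point work, and the factor by which the FPU accelerates that share.

```c
/* Amdahl's law: overall speedup when only part of the run time is
 * accelerated.
 *   fp_fraction: share of original run time spent in floating-point
 *                work, in the range 0.0 to 1.0
 *   fp_speedup:  factor by which that share is accelerated
 * Returns overall speedup = 1 / ((1 - f) + f / s). */
double overall_speedup(double fp_fraction, double fp_speedup)
{
    return 1.0 / ((1.0 - fp_fraction) + fp_fraction / fp_speedup);
}
```

With this model, a program that spends half of its time in floating-point code gains less than 2x overall even if the floating-point portion itself runs 10x faster, since `overall_speedup(0.5, 10.0)` is about 1.82.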
Furthermore, the largest speedups are achieved with C code that takes full advantage of the FPU. Guidance for code improvements can be found in the data sheet.
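Because the FPU supports only single-precision operations, one plausible way to keep C code on the FPU is to keep all arithmetic in `float`. This is a general C observation and not the data sheet's guidance. In C, an unsuffixed constant such as `0.5` has type `double`, so `x * 0.5` is evaluated in double precision. Whether such double-precision operations fall back to software emulation with the Xilinx-provided compiler modifications is an assumption here. The function names below are hypothetical.

```c
/* Multiplies in double precision: the constant 0.5 has type double,
 * so x is converted to double before the multiply and the result is
 * converted back to float on return. */
float halve_promoted(float x)
{
    return x * 0.5;
}

/* Stays in single precision: the f suffix makes the constant a float. */
float halve_single(float x)
{
    return x * 0.5f;
}

/* One FIR output sample computed entirely in float:
 * the sum over k of h[k] * x[k], for taps coefficients. */
float fir_sample(const float *h, const float *x, int taps)
{
    float acc = 0.0f; /* float accumulator, no double intermediates */
    for (int k = 0; k < taps; k++) {
        acc += h[k] * x[k];
    }
    return acc;
}
```

Both `halve_*` functions return the same value for ordinary inputs. The difference lies only in which precision the multiply is carried out in, which is what would matter on a single-precision-only FPU.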