
Virtex-4 FXT FPU v2.1 for the PowerPC: Performance and Size

  Product Details

The Virtex®-4 FPU for the PowerPC® 405 is a Xilinx implementation that supports only single-precision floating-point operations and is not PowerPC compliant. With Xilinx-provided compiler modifications, single-precision floating-point instructions can be executed on the FPU, which increases performance over software emulation. Refer to the Virtex-4 data sheet for specific details.

The following data shows the FPGA resources consumed by the Floating Point Unit (FPU) and the clock frequency the PowerPC® 405 can achieve with an FPU.

Device Support:  Virtex-4 FX

| Single Precision FPU Type | Slices | DSP48 | Block RAMs | PowerPC / FPU Clock (MHz), -10 Speed Grade | PowerPC / FPU Clock (MHz), -12 Speed Grade |
|---|---|---|---|---|---|
| Lite | 1100 | 4 | 2 | 275 / 137.5 | 340 / 170 |
| Full (with div / sqrt) | 1250 | 4 | 2 | 275 / 137.5 | 340 / 170 |

Virtex-4 FXT FPU Performance and Acceleration Data:

All benchmarks provided below were performed on a Xilinx ML403 Board with a 200 MHz PowerPC and 100 MHz FPU. The data is then scaled where appropriate to accurately reflect the respective system being measured.
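
The page does not say how the measurements are scaled. As a minimal sketch, assuming run time is inversely proportional to the PowerPC clock frequency (a simplification that ignores memory and bus effects), a measured time could be rescaled as follows. The helper name `scale_time` is hypothetical and is not part of any Xilinx tool.

```c
/* Hypothetical helper: rescale a time measured at one clock frequency to
 * another clock frequency. Assumes run time is inversely proportional to
 * clock frequency; this is an assumption, not the documented Xilinx
 * scaling method. The result uses the same unit as measured_time. */
static double scale_time(double measured_time, double measured_mhz,
                         double target_mhz)
{
    return measured_time * measured_mhz / target_mhz;
}
```

Under this assumption, a hypothetical 100 ms measurement taken at 200 MHz would be estimated as `scale_time(100.0, 200.0, 275.0)`, roughly 72.7 ms, on a 275 MHz system.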

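Three tables follow: Virtex-4 FXT FPU Peak Sustained Performance, FPU Acceleration over Software (Equivalent Frequency), and FPU Acceleration over Software (Maximum Frequency in -10).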

Virtex-4 FXT FPU Peak Sustained Performance

The following performance data is representative of the highest-performance PowerPC system with an FPU in each speed grade.

| Algorithm | Single Precision FPU (-10) | Single Precision FPU (-12) |
|---|---|---|
| FIR Filter | 78.7 MFLOPS | 97.4 MFLOPS |
| Whetstone | 18.3 MFLOPS | 22.6 MFLOPS |

FPU Acceleration over Software (Equivalent Frequency)

The following performance data is representative of a 275 MHz PowerPC system for software emulation and a 275 MHz PowerPC with a 137.5 MHz FPU system for FPU acceleration.

| Algorithm | Software Emulation of Floating Point * | Single Precision FPU | Acceleration |
|---|---|---|---|
| FIR Filter | 846.09 ms | 63.94 ms | 13x |
| Video Editing Algorithm | 248.38 ms | 27.93 ms | 9x |
| PID Loop | 1.66 us | 0.37 us | 4x |
| 1024pt FFT | 19.48 ms | 5.42 ms | 4x |
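
The Acceleration figures can be read as the software emulation time divided by the FPU time. A minimal sketch of that arithmetic is shown here; the function names are hypothetical, and rounding to the nearest whole number is an assumption, since the page does not state its rounding rule.

```c
/* Acceleration factor: software-emulation time divided by FPU time.
 * Both arguments must use the same time unit (ms or us). */
static double acceleration(double software_time, double fpu_time)
{
    return software_time / fpu_time;
}

/* Round to the nearest whole factor for display, e.g. "13x".
 * Rounding to nearest is an assumption; the page does not state its rule. */
static int acceleration_rounded(double software_time, double fpu_time)
{
    return (int)(acceleration(software_time, fpu_time) + 0.5);
}
```

For the FIR Filter figures at equivalent frequency, `acceleration(846.09, 63.94)` is about 13.2, which rounds to the 13x listed.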

FPU Acceleration over Software (Maximum Frequency in -10)

The following performance data is representative of a 350 MHz PowerPC system for software emulation and a 275 MHz PowerPC with a 137.5 MHz FPU system for FPU acceleration (these systems reflect the respective maximum clock frequency in the -10 speed grade device).

| Algorithm | Software Emulation of Floating Point * | Single Precision FPU | Acceleration |
|---|---|---|---|
| FIR Filter | 664.00 ms | 63.94 ms | 10x |
| Video Editing Algorithm | 195.16 ms | 27.93 ms | 7x |
| PID Loop | 1.30 us | 0.37 us | 3x |
| 1024pt FFT | 15.31 ms | 5.42 ms | 3x |

The speedup over software floating-point execution depends heavily on the type of application and on the fraction of its run time that the algorithm spends performing floating-point arithmetic.
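
One common way to reason about this dependence is Amdahl's law, which is not mentioned on this page and is used here only as an illustrative model. The sketch below estimates whole-application speedup from the fraction of software run time spent in floating-point work and the speedup of that part alone. The function name and the example values are hypothetical.

```c
/* Amdahl's law estimate of whole-application speedup.
 * fp_fraction: share of the software-only run time spent in floating-point
 *              work, from 0.0 to 1.0 (an input you must measure yourself).
 * fp_speedup:  speedup of the floating-point part alone.
 * The non-floating-point part is assumed to run at unchanged speed. */
static double overall_speedup(double fp_fraction, double fp_speedup)
{
    return 1.0 / ((1.0 - fp_fraction) + fp_fraction / fp_speedup);
}
```

With a hypothetical application that spends half of its time in floating-point code and a 13x speedup of that part, `overall_speedup(0.5, 13.0)` is about 1.86, far below 13x.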

Furthermore, the largest speedups are achieved with C code that takes full advantage of the FPU. Guidance for code improvements can be found in the PDF data sheet.
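
The data sheet's guidance is not reproduced here. One point that follows from the FPU being single precision only is to keep arithmetic in `float`: in C, an unsuffixed literal such as `0.5` has type `double`, and mixing it into an expression promotes the arithmetic to double precision, which this FPU presumably cannot execute. The sketch below is an illustration under that assumption; the function name `fir_sample` is hypothetical and this is not the benchmark code used for the tables.

```c
#include <stddef.h>

/* Single-precision FIR filter step (illustrative sketch).
 * Every variable and literal is float, so the compiler has no reason to
 * promote the multiply-accumulate to double precision. */
static float fir_sample(const float *coeffs, const float *history,
                        size_t taps)
{
    float acc = 0.0f;   /* the 'f' suffix keeps the literal single precision */
    size_t i;

    for (i = 0; i < taps; i++) {
        acc += coeffs[i] * history[i];
    }
    return acc;
}
```

The same reasoning suggests preferring the single-precision math functions of standard C, such as `sqrtf`, over their `double` counterparts where the toolchain provides them.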

* C-Code is compiled using IBM Performance Libs delivered with EDK 8.2i
 
 