Xcell Journal Online

    

Achieve Breakthrough Performance in Your System



by Adrian Cosoroaba, Marketing Manager, Xilinx, Inc.
adrian.cosoroaba@xilinx.com (7/11/05)


Virtex-4 FPGAs set new records in system performance while consuming minimal power and providing superior signal integrity.


Performance in today’s systems is defined by more than FPGA clock rates. Every system has different requirements, and the maximum achievable performance is determined by factors such as logic fabric performance, I/O bandwidth, embedded processing, and DSP performance. These requirements can also be subject to power restrictions, as well as signal integrity and cost budgets.

Xilinx® developed the Virtex™-4 FPGA family after consulting hundreds of customers to address these requirements and make it easier than ever to meet system performance goals. In this article, we’ll look at how Virtex-4 FPGAs provide new and unique capabilities to help you meet diverse requirements for system performance.

System Design Challenges
With each new generation of devices, semiconductor vendors are able to offer higher clock rates, due to shrinking process geometries. However, today’s system performance challenges go beyond traditional glue logic and maximized clock rates. In a PC, for example, the real system performance bottleneck lies not in clock frequency but in how the other blocks of the system work together at the desired frequency.

Let’s consider these challenges from the perspective of applications employing high-performance FPGAs. Seemingly diverse applications like video stream processing, packet data processing, storage systems, wireless base stations, and many others incorporate similar functions, including:

  • Incoming and outgoing data streams
  • Bridging multiple connectivity standards
  • Arithmetic and DSP (signal conditioning and data processing)
  • External memory interfacing
  • State machines
  • Data buffering
  • Embedded processing (Figure 1)

To facilitate these applications, Virtex-4 FPGAs include common building blocks as embedded – yet parameterizable – hard IP. The integration of complex functions like DSP slices, embedded CPUs, dedicated I/O circuitry, and on-chip RAM (block RAM, FIFOs) provides you with unprecedented capabilities to build programmable systems within a single FPGA device.

Meeting system requirements takes the right combination of I/O bandwidth, programmable logic, on-chip RAM, DSP, and embedded processing. To provide the ideal combination of functions, Virtex-4 FPGAs come in three flavors (LX, SX, and FX platforms) comprising 17 devices.

Virtex-4 FPGAs offer not only enhanced logic fabric capabilities, but also customized XtremeDSP™ MACs and embedded PowerPC™ processors that give you enough performance headroom to reach your design performance goals.

I/O bandwidth is often the limiting factor in the quest for performance. To remove I/O bottlenecks, Virtex-4 FPGAs have unique built-in 1 Gbps ChipSync™ source-synchronous circuitry and 622 Mbps to 10.3125 Gbps serial transceivers that can help you achieve bandwidth targets.

System Performance Categories
Let’s look at various aspects of performance and Virtex-4 FPGAs in the context of seven major performance categories: logic fabric, embedded processing, DSP, on-chip RAM, high-speed serial, I/O memory bandwidth, and I/O LVDS bandwidth. Figure 2 offers a comparison with the nearest 90 nm FPGA vendor in each of these categories.

Logic Fabric Performance
Xilinx enhanced the performance of its already fast programmable logic fabric by building Virtex-4 devices with advanced 90 nm technology. A flexible look-up table (LUT) architecture (with the ability to convert any LUT into a 16-bit RAM or 16-bit shift register), a high-speed carry chain, and arithmetic blocks provide further performance gains.

The 500 MHz global clocking structure, the key driver behind logic performance, is fully differential to reduce skew, jitter, and duty-cycle distortion. Virtex-4 FPGAs also provide a hierarchical clocking structure (global and regional clocks) and clock management circuitry. Evaluations of logic fabric performance using a suite of real-world designs demonstrate a performance advantage of as much as 70% over our nearest 90 nm competitor. Averaged across this suite of designs, the Virtex-4 performance advantage is 15%. This performance boost means that Virtex-4 devices effectively provide an extra speed-grade advantage.

Embedded Processing
Virtex-4 FX platform FPGAs provide up to two enhanced PowerPC 405 cores, each delivering 702 DMIPS performance at 450 MHz, while consuming only 0.45 mW/MHz. This is more than three times the performance of the best soft microprocessor cores.

Moreover, the new Auxiliary Processor Unit (APU) controller makes it easy to reach even higher levels of performance by integrating custom co-processors and hardware accelerators. The APU controller provides a low-latency path for connecting co-processor modules implemented in the FPGA to the embedded PowerPC processor. These user-defined, configurable hardware accelerator functions operate as extensions to the PowerPC 405, offloading the CPU from demanding computational tasks. For example, implementing floating-point calculations in hardware improves performance by a factor of 20 over software emulation. A 10/100/1000 Mbps tri-mode Ethernet MAC implemented alongside a PowerPC processor enables Ethernet connectivity.
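
To put the PowerPC figures quoted above in perspective, the short Python sketch below derives the per-megahertz numbers and estimates what a 20x floating-point accelerator could mean for a whole application. The constants come from this article; the Amdahl's-law helper and the 50% floating-point workload share are illustrative assumptions, not measured Xilinx data.

```python
# Figures quoted in the article for one embedded PowerPC 405 core.
DMIPS = 702.0
CLOCK_MHZ = 450.0
MW_PER_MHZ = 0.45
FPU_SPEEDUP = 20.0  # hardware floating point vs. software emulation


def dmips_per_mhz(dmips: float, clock_mhz: float) -> float:
    """Processing efficiency in DMIPS per MHz of core clock."""
    return dmips / clock_mhz


def core_power_mw(mw_per_mhz: float, clock_mhz: float) -> float:
    """Core power in mW, assuming power scales linearly with clock rate."""
    return mw_per_mhz * clock_mhz


def overall_speedup(accelerated_fraction: float, speedup: float) -> float:
    """Amdahl's law: whole-application gain when only a fraction is accelerated."""
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / speedup)


if __name__ == "__main__":
    print(f"{dmips_per_mhz(DMIPS, CLOCK_MHZ):.2f} DMIPS/MHz")
    print(f"{core_power_mw(MW_PER_MHZ, CLOCK_MHZ):.1f} mW per core at {CLOCK_MHZ:.0f} MHz")
    # Hypothetical workload: half of the run time is floating-point emulation.
    print(f"{overall_speedup(0.5, FPU_SPEEDUP):.2f}x overall speedup")
```

The last function is a reminder that the 20x figure applies to the floating-point portion of the code; the gain for a complete application depends on how much of its run time that portion represents.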

DSP Performance
The XtremeDSP™ slice is a versatile, user-configurable block providing twice the DSP performance of previous implementations while drawing less than 1/7th the power. Each slice contains a dedicated two’s complement, signed 18 x 18 bit multiplier, and a three-input adder/subtracter/accumulator with feedback path. With as many as 512 XtremeDSP slices running at 500 MHz, a single Virtex-4 FPGA delivers 256 GigaMAC/s (18 x 18 GMACs) performance.

You can configure the XtremeDSP slices to implement multipliers, counters, multiply-accumulators, and many more functions, all without consuming logic fabric resources. The ability to implement complex systolic functions without incurring the delay of fabric routing provides significant performance gains. For example, in a 32-tap FIR implementation, the Virtex-4 FPGA outperforms competing devices by 40%.
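
The 256 GigaMAC/s figure follows directly from the slice count and clock rate given above. The Python sketch below reproduces that arithmetic and, as a rough illustration, estimates how many fully parallel 32-tap FIR filters would fit in the largest device. The one-slice-per-tap mapping is a simplifying assumption for this example, not a statement about any particular Xilinx reference design.

```python
def peak_gmacs(slices: int, clock_mhz: float) -> float:
    """Peak multiply-accumulate rate in GMAC/s: one MAC per slice per clock."""
    return slices * clock_mhz / 1000.0


def parallel_fir_filters(slices: int, taps: int) -> int:
    """Number of fully parallel FIR filters that fit.

    Assumes one slice per tap (hypothetical mapping), so each filter
    produces one output sample per clock cycle.
    """
    return slices // taps


if __name__ == "__main__":
    print(peak_gmacs(512, 500.0), "GMAC/s peak")
    print(parallel_fir_filters(512, 32), "parallel 32-tap FIR filters")
```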

On-Chip Memory Performance
The Virtex-4 family carries forward the size and basic structure of on-chip memory, 18 Kb dual-port block RAM (proven in previous generations), but adds a data-output pipeline register to increase speed to 500 MHz. The two ports still have individual width control, and in write mode you can choose between automatically reading the previously stored data or the new data. Two neighboring block RAMs, when combined, form a 32K x 1 RAM without loss of speed, or a 512-deep 64-wide RAM with automatic Hamming error correction – without using any extra logic.
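
The two combined configurations mentioned above can be checked with simple bit counting. The sketch below assumes that each 18 Kb block splits into 16 Kb of data and 2 Kb of parity storage, and that the error correction uses a standard Hamming code extended with one extra bit for double-error detection. Both are assumptions made for this illustration; the function names are hypothetical.

```python
KBIT = 1024
DATA_BITS_PER_BLOCK = 16 * KBIT    # assumed data portion of an 18 Kb block
PARITY_BITS_PER_BLOCK = 2 * KBIT   # assumed parity portion of an 18 Kb block


def secded_check_bits(data_bits: int) -> int:
    """Check bits for a Hamming code plus one overall parity bit (SEC-DED)."""
    r = 0
    # Smallest r such that 2**r can index every data bit, check bit, and "no error".
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r + 1


def fits_in_blocks(depth: int, data_width: int, check_width: int, blocks: int = 2) -> bool:
    """Bit-count check only: does the memory fit in the given number of blocks?"""
    data_ok = depth * data_width <= blocks * DATA_BITS_PER_BLOCK
    check_ok = depth * check_width <= blocks * PARITY_BITS_PER_BLOCK
    return data_ok and check_ok


if __name__ == "__main__":
    print(fits_in_blocks(32 * KBIT, 1, 0))       # 32K x 1 RAM
    ecc_bits = secded_check_bits(64)
    print(ecc_bits, fits_in_blocks(512, 64, ecc_bits))  # 512 x 64 RAM with ECC
```

Under these assumptions, the check bits for a 64-bit word fill the parity storage of two blocks at a depth of 512, which is consistent with error correction needing no extra logic or memory.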

Each block RAM also contains its own FIFO controller, a unique Virtex-4 FPGA feature that provides 500 MHz functionality without additional logic resources. Compared to competing devices, the block RAMs provide at least 20% better performance.

But getting your FPGA internal blocks to run fast is only half the battle. Maximum system performance requires efficient interaction between the FPGA and other components in your system. Virtex-4 FPGAs offer the flexibility to achieve the highest possible bandwidth for chip-to-chip, board-to-board, and box-to-box connectivity.

High-Speed Serial I/O
As designs move to faster interface speeds, serial interconnect saves power and board space while reducing design complexity and cost. Virtex-4 RocketIO™ MGTs offer performance from 622 Mbps to 10.3125 Gbps, one of the broadest ranges offered by any device. The transceivers are fully programmable and can implement a myriad of speeds and serial standards. Link-layer IP is available for such standards as PCI Express, Serial-ATA, Fibre Channel, Gigabit Ethernet, and Aurora.

Memory I/O Bandwidth
The great majority of systems today need a data buffer external to the FPGA for temporary storage. This buffer’s bandwidth can be the critical factor in determining overall performance.

Memory interfaces like DDR2 SDRAM, QDR II SRAM, or RLDRAM II are source-synchronous, with per-pin data rates of more than 533 Mbps. Memory bandwidth is determined not only by the per-pin data rate but also by the width of the bus. The ChipSync circuitry built into every I/O simplifies the physical layer interface and provides the capability to implement buses three times wider than other programmable solutions, for bandwidths as high as 260 Gbps.
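
Because bandwidth is the product of per-pin data rate and bus width, the relationship is easy to sketch in Python. The 72-bit bus below is a hypothetical example, and the pin count derived for 260 Gbps assumes every pin runs at exactly 533 Mbps, which the article states only as a lower bound.

```python
import math


def bus_bandwidth_gbps(data_pins: int, mbps_per_pin: float) -> float:
    """Aggregate bandwidth in Gbps for a parallel bus."""
    return data_pins * mbps_per_pin / 1000.0


def pins_for_bandwidth(target_gbps: float, mbps_per_pin: float) -> int:
    """Minimum number of data pins needed to reach a target bandwidth."""
    return math.ceil(target_gbps * 1000.0 / mbps_per_pin)


if __name__ == "__main__":
    # Hypothetical 72-bit memory interface at 533 Mbps per pin.
    print(f"{bus_bandwidth_gbps(72, 533.0):.1f} Gbps")
    # Data pins implied by 260 Gbps if each pin runs at 533 Mbps.
    print(pins_for_bandwidth(260.0, 533.0), "data pins")
```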

To enable reliable data capture, ChipSync circuitry also includes built-in delay elements, adjustable in 75 ps increments, to ensure the proper alignment between clock and data signals. The unique capability to calibrate timing at run time, rather than at design time, substantially improves design margins. Xilinx also provides hardware-verified reference designs, development systems, and software tools to further speed up the implementation of memory interfaces.
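
One common way to use such delay elements at run time is to sweep the delay taps, record which settings capture a known training pattern correctly, and then park the delay in the middle of the widest passing window. The Python sketch below models that idea only. The 75 ps step comes from this article; the tap count of 64 and the `capture_ok` callback are assumptions made for illustration, not the interface of any Xilinx primitive or reference design.

```python
from typing import Callable, Optional

TAP_STEP_PS = 75   # delay increment quoted in the article
NUM_TAPS = 64      # assumed number of delay taps (hypothetical)


def find_center_tap(capture_ok: Callable[[int], bool],
                    num_taps: int = NUM_TAPS) -> Optional[int]:
    """Return the tap at the center of the widest passing window, or None."""
    best_start, best_len = None, 0
    run_start = None
    # One extra iteration closes a passing run that reaches the last tap.
    for tap in range(num_taps + 1):
        ok = tap < num_taps and capture_ok(tap)
        if ok and run_start is None:
            run_start = tap
        elif not ok and run_start is not None:
            run_len = tap - run_start
            if run_len > best_len:
                best_start, best_len = run_start, run_len
            run_start = None
    if best_start is None:
        return None
    return best_start + (best_len - 1) // 2


def tap_delay_ps(tap: int) -> int:
    """Delay added by a given tap setting, in picoseconds."""
    return tap * TAP_STEP_PS


if __name__ == "__main__":
    # Hypothetical link where taps 10 through 20 capture data correctly.
    tap = find_center_tap(lambda t: 10 <= t <= 20)
    print(tap, tap_delay_ps(tap), "ps")
```

Centering in the passing window leaves roughly equal margin on both sides, which is why calibrating on the actual board at run time can yield more margin than a fixed delay chosen at design time.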

LVDS I/O Bandwidth
ChipSync technology simplifies the design of differential parallel bus interfaces, with embedded SERDES blocks that serialize and de-serialize parallel interfaces to match the data rate to the speed of the internal FPGA circuits. Additionally, this technology provides per-bit and per-channel de-skew for increased design margins, simplifying the design of interfaces such as SPI-4.2, XSBI, and SFI-4, as well as RapidIO.

Virtex-4 FPGAs incorporate ChipSync technology into every I/O, providing the most flexible I/O solution available. This enables wider 1 Gbps LVDS buses for up to 480 Gbps bandwidth, 60% higher than the competition.
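
The LVDS numbers above can be related with a few lines of arithmetic. In the Python sketch below, the 8:1 deserialization ratio is an assumption chosen for illustration; the line rate, aggregate bandwidth, and 60% advantage are the figures quoted in this article.

```python
def fabric_clock_mhz(line_rate_mbps: float, serdes_ratio: int) -> float:
    """Internal clock rate needed after deserializing one LVDS channel."""
    return line_rate_mbps / serdes_ratio


def aggregate_gbps(channels: int, line_rate_mbps: float) -> float:
    """Total bandwidth across all LVDS channels, in Gbps."""
    return channels * line_rate_mbps / 1000.0


def baseline_from_advantage(value: float, advantage_percent: float) -> float:
    """Baseline implied when `value` is `advantage_percent` higher than it."""
    return value / (1.0 + advantage_percent / 100.0)


if __name__ == "__main__":
    # Assumed 8:1 SERDES ratio: a 1 Gbps channel feeds 125 MHz fabric logic.
    print(fabric_clock_mhz(1000.0, 8), "MHz")
    print(aggregate_gbps(480, 1000.0), "Gbps across 480 channels")
    print(f"{baseline_from_advantage(480.0, 60.0):.0f} Gbps implied for the competition")
```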

Other Performance Challenges
Achieving the desired system performance with your FPGA is often impeded by signal integrity, cost, and power budget restrictions.

The innovative Application Specific Modular Block (ASMBL) architecture enables I/O, clock, power, and ground pins to be located anywhere on the silicon chip, not just along the periphery. This architecture alleviates the problems associated with I/O and array dependency, power and ground distribution, and hard-IP scaling.

Furthermore, the Virtex-4 FPGA packaging technology, SparseChevron, enables distribution of power and ground pins evenly across the package. The benefit to you is improved signal integrity. As demonstrated by Dr. Howard Johnson, Virtex-4 FPGA devices have seven times less simultaneously switching output (SSO) noise and crosstalk when compared to competing devices.

The ASMBL architecture, with its column-based implementation of programmable logic, DSP slices, block RAM, I/O columns, MGTs, clocking, and PowerPC embedded cores, provides another significant benefit in that it allows a more flexible allocation of resources. This enables Xilinx to offer three Virtex-4 FPGA platforms: the LX platform, optimized for logic resources; the SX platform, optimized for DSP; and the FX platform, optimized for embedded processing and high-speed serial applications.

Device power budgets impose an additional impediment to meeting performance goals. Because power consumption increases with clock rate, you may exceed your power budget at frequencies below your performance target, even if your chosen device has more performance on tap. Selecting a device with low power consumption will help you achieve performance goals while staying within your power budget, and can deliver the additional benefits of lower system cost and higher reliability through reduced power supply and cooling requirements.
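
A simple linear model makes this trade-off concrete: total power is static power plus a dynamic term that grows with clock rate. The Python sketch below uses that model to find the highest clock rate that fits a budget. Every number in it is hypothetical and chosen only to illustrate the effect of lower static power; none is a measured device figure.

```python
def total_power_w(static_w: float, dynamic_mw_per_mhz: float, clock_mhz: float) -> float:
    """Total power in watts: static power plus clock-proportional dynamic power."""
    return static_w + dynamic_mw_per_mhz * clock_mhz / 1000.0


def max_clock_within_budget(budget_w: float, static_w: float,
                            dynamic_mw_per_mhz: float) -> float:
    """Highest clock rate (MHz) that keeps total power within the budget."""
    headroom_w = budget_w - static_w
    if headroom_w <= 0:
        return 0.0  # static power alone already exceeds the budget
    return headroom_w * 1000.0 / dynamic_mw_per_mhz


if __name__ == "__main__":
    # Hypothetical design: 5 W budget, 10 mW/MHz of dynamic power.
    print(max_clock_within_budget(5.0, 1.0, 10.0), "MHz with 1.0 W static power")
    print(max_clock_within_budget(5.0, 0.5, 10.0), "MHz with 0.5 W static power")
```

In this hypothetical case, halving static power raises the achievable clock rate within the same budget, which is the effect described above.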

Virtex-4 FPGAs incorporate unique triple-oxide 90 nm technology that significantly reduces static power. Additionally, by implementing commonly used functions as embedded IP, Virtex-4 FPGAs further reduce dynamic power when compared to previous generations or competing devices. Measurements and analysis comparing Xilinx tools and silicon against competing offerings show that Virtex-4 FPGAs consume 1 to 5 W less than the competition’s 90 nm FPGAs.

Conclusion
Virtex-4 FPGAs incorporate innovative built-in silicon features, extensive embedded IP, triple-oxide 90 nm technology, and unique packaging to provide designers with capabilities that enable breakthrough performance at the lowest cost.

For more information about getting started with your Virtex-4 FPGA design, visit www.xilinx.com/virtex4.

© 1994-2008 Xilinx, Inc. All Rights Reserved.