Xcell Journal Online
Virtex-4: Breakthrough Performance at the Lowest Cost



by Greg Lara, Product Marketing Manager, Virtex Solutions, Xilinx, Inc.
greg.lara@xilinx.com (10/25/04)


Virtex-4 FPGAs deliver what you've been looking for.


As Xilinx began to define the capabilities of the fourth generation of Virtex™ devices, we set out to address the performance, functionality, and cost requirements of next-generation electronic systems, and to increase our customers' productivity by easing system design challenges. We interviewed more than 800 customers, including system architects and experts in logic design, embedded processing, high-performance DSP, and high-speed connectivity.

Despite the differences in their end products, these high-end FPGA users had a number of common key requirements. They asked for higher system performance to meet the demands of their leading-edge products; lower power consumption to meet stringent power budgets driven by system cost and reliability requirements; help in reducing system cost to enable them to thrive in a competitive marketplace; and solutions to simplify complex design challenges, such as building source-synchronous interfaces to the latest high-speed memories and advanced components.

We achieved these goals by enhancing the features proven popular in earlier Virtex devices and developing new capabilities never before available in FPGAs. Combining advanced processing technology with greater integrated functionality, Virtex-4™ FPGAs provide twice the density and boost performance by as much as 2x, while reducing power consumption by as much as 50% compared with previous-generation FPGAs (see "Virtex-4 FPGA Features at a Glance"). At the same time, Virtex-4 FPGAs cut the cost of programmable system platforms by more than 50%, enabling developers to adopt high-performance FPGAs in an extraordinary range of products.

Higher Performance
Virtex-4 FPGAs attack the requirements for higher performance on several fronts. First, the advanced 90 nm process and optimized FPGA fabric let designers improve overall system performance.

The second approach is to include dedicated, performance-tuned circuitry for implementing key system functions, such as integrated processors, DSP slices, Ethernet MACs, and serial transceivers. For example, the embedded Virtex-4 XtremeDSP™ slice delivers up to 500 MHz performance and the RocketIO™ serial transceiver ranges from 622 Mbps to 11.1 Gbps – unprecedented in the industry.

The third approach is the incorporation of powerful clock management capability, enabling engineers to extract the maximum performance from the programmable logic fabric. Xesium clocking technology addresses designers' demands for more flexible clocking with abundant resources – up to 32 global clocks in each device and up to 20 digital clock manager (DCM) circuits.

Xesium DCM circuits enable flexible generation of multiple clock domains with differential signaling, supporting frequencies of up to 500 MHz with 40% less jitter than previous circuitry. In addition, Virtex-4 devices are the only FPGAs to provide differential clocking networks, a key advantage in implementing precision clocks with minimal skew and jitter.

Virtex-4 FPGAs further enhance clock management with phase-matched clock dividers (PMCD) that provide improved handling of multiple synchronous clock domains. These circuits, together with enhanced software support, give designers precise edge control and frequency synthesis capabilities, enabling the generation of high-quality clock networks.

Power Advantage
Virtex-4 FPGAs reduce power with a combination of techniques. By using triple-oxide technology, in which we build transistors with different gate oxide thicknesses for configuration, interconnect, and I/O, Xilinx can make trade-offs between speed and leakage that reduce static power consumption by 40%. This technology, which is exclusive to Xilinx in the FPGA industry, enables us to offset, and even reverse, the increase in leakage current inherent in the migration to finer geometry nodes.

In addition, dynamic power consumption decreases by 50% because of lower supply voltage and lower capacitance in the 90 nm process. Finally, extensive use of abundant embedded IP provides valuable functionality in circuits optimized to consume as little as one-tenth the power of an equivalent implementation in programmable logic fabric.
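As a rough illustration of why lower supply voltage and lower capacitance both cut dynamic power, the standard switching-power relation P = C x V^2 x f can be evaluated directly. The sketch below is not taken from Xilinx data: the 1.5 V and 1.2 V core supplies and the 20% capacitance reduction are assumed example values, and the function name is hypothetical.

```python
def dynamic_power_ratio(v_old, v_new, cap_ratio, freq_ratio=1.0):
    """Return P_new / P_old for switching power P = C * V**2 * f."""
    return cap_ratio * (v_new / v_old) ** 2 * freq_ratio


# Assumed example values, NOT taken from the article:
# core supply dropping from 1.5 V to 1.2 V, switched capacitance 20% lower,
# clock frequency unchanged.
ratio = dynamic_power_ratio(v_old=1.5, v_new=1.2, cap_ratio=0.8)
print(f"new/old dynamic power: {ratio:.2f}")
```

Under these assumptions the ratio comes out near one half, which is consistent in magnitude with the 50% reduction quoted above. The voltage term dominates because it enters as a square.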

Lower System Cost
Xilinx addressed the requirements for lower system cost on three fronts:

  • 90 nm, 300 mm process leadership produces the lowest FPGA price.
    Xilinx manufactures Virtex-4 FPGAs using the same 90 nm, 300 mm processing technology we use to build the world's lowest-cost FPGAs, Spartan-3™ devices. The combination of finer geometries and larger 300 mm (12 inch) wafers produces approximately five times as many die per wafer, compared to building an equivalent chip with a 130 nm process on 200 mm (8 inch) wafers. This lowers cost per die significantly.

  • Multiple platforms deliver cost-optimized feature sets.
    With each generation of Virtex FPGAs, Xilinx has taken advantage of the latest process node to fabricate devices that offer greater capacity, higher performance, and lower price. For the Virtex-4 family, we went even further to achieve cost reduction.

    As we strive to expand the use of Virtex FPGAs into new markets and geographies, we see that our customers have different requirements that vary with the complexity and target price of the systems they are creating. Using our proprietary ASMBL (pronounced "assemble") architecture (see Figure 1 and "ASMBL Architecture Enables Cost-Optimized Platforms"), we have assembled three different platforms (Figure 2) with an initial offering of 17 devices that deliver cost-optimized solutions for the widest range of high-performance electronic systems.
    ASMBL Architecture Enables Cost-Optimized Platforms
    With traditional FPGA architectures, increasing the size of the devices to meet the demands for greater logic capacity and more memory typically results in parallel scaling of all the advanced features on the die, rapidly increasing cost.

    To solve this inefficiency, Xilinx introduced a radical new architecture that enables us to offer a new generation of Virtex FPGAs providing the broadest range of capabilities in three unique platforms with feature mixes optimized to meet the requirements of different application domains. The ASMBL (Advanced Silicon Modular Block) architecture enables Xilinx to scale the capabilities and capacity of Virtex FPGAs independently of one another and rapidly assemble multiple platforms.

  • Integrated IP reduces the customer's bill of materials and saves FPGA resources.
    Virtex-4 FPGAs reduce system cost with abundant integrated IP. By incorporating many functions that find use in a broad range of applications, Virtex-4 FPGAs replace a number of discrete components commonly found on system boards.

    Designers can take advantage of embedded PowerPC™ processors, up to 10 Mb of embedded dual-port RAM/FIFO, integrated Ethernet MACs, sophisticated DSP circuitry, and on-board serial transceivers, among other features. This helps our customers lower system cost in several ways: by reducing component count and streamlining logistics with a smaller bill of materials; by simplifying the design and manufacturing of system hardware; by easing PCB design and manufacturing; and by improving system reliability through the reduction of solder joints.

    In addition, building dedicated circuits on the FPGA provides required functionality efficiently, while preserving the programmable logic fabric for customers to add the value of their proprietary designs. The result is more capability within a single package at a given price point.
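The "approximately five times as many die per wafer" figure in the first bullet above can be sanity-checked with simple area scaling. This is only a back-of-the-envelope sketch: it assumes die area shrinks with the square of the process feature size and ignores wafer edge loss, scribe lines, and yield. The function name is hypothetical.

```python
def die_per_wafer_gain(old_wafer_mm, new_wafer_mm, old_node_nm, new_node_nm):
    """Estimate the die-per-wafer gain from a larger wafer and a finer node.

    Assumes ideal area scaling: wafer area grows with diameter squared and
    die area shrinks with feature size squared (edge loss and yield ignored).
    """
    wafer_area_gain = (new_wafer_mm / old_wafer_mm) ** 2
    die_shrink_gain = (old_node_nm / new_node_nm) ** 2
    return wafer_area_gain * die_shrink_gain


# 200 mm wafers at 130 nm versus 300 mm wafers at 90 nm, as in the article.
gain = die_per_wafer_gain(200, 300, 130, 90)
print(f"estimated die-per-wafer gain: {gain:.1f}x")
```

The wafer contributes a factor of 2.25 and the shrink a factor of roughly 2.1, giving an estimate a little under 5x, in line with the figure quoted in the text.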

Up to 80% Additional Cost Reduction with EasyPath
The EasyPath™ program further lowers system cost for customers who are ready to take their finished design to volume production. Xilinx creates customized test programs for EasyPath customers that exercise only the device resources used in the specific design. This approach shortens test time and increases yield to reduce FPGA unit price by up to 80%.

Source Synchronous Interfacing
To ensure reliable data transfer between a new generation of high-speed devices, hardware designers are turning to source-synchronous design techniques, in which the component sending the data generates and issues its own clock signal along with the data that it transmits. This technique eliminates one set of problems associated with parallel interfaces, but introduces its own circuit design challenges. ChipSync technology significantly simplifies component interface design with critical built-in circuitry that is available in every Virtex-4 I/O (see "Virtex-4 Solves Source-Synchronous Design Challenges").

Virtex-4 FPGA Features at a Glance
  • Largest logic capacity
    • Up to 200,000 logic cells
  • Largest memory capacity
    • Up to 10 Mb block RAM
  • Highest performance
    • 500 MHz Xesium clocking technology
    • Expanded clocking resources
    • Enhanced clock precision
    • Reduced clock jitter and skew
  • Simplified source-synchronous interfacing
    • ChipSync technology
  • Complete serial connectivity solution
    • 622 Mbps – 11.1 Gbps RocketIO transceivers
  • Higher performance, low-power DSP
    • 500 MHz XtremeDSP slice
  • Simplified processor acceleration
    • PowerPC 405 processor with auxiliary processor unit (APU) controller interface
  • Integrated Ethernet MAC
  • Fourth-generation design security

Embedded Processing
Embedded developers have already used Xilinx processor solutions to create thousands of designs. As we talked to these developers about the requirements for their next-generation systems, several common themes emerged.

A Full Range of Processing Solutions
Engineers need a range of processing solutions to match the requirements of different tasks, ranging from simple control functions to advanced algorithms and high-speed calculation. In addition, they want the different solutions to share a common design environment.

Xilinx satisfies these requirements with a range of processors that includes the PicoBlaze™ eight-bit microcontroller soft core, the MicroBlaze™ 32-bit general-purpose processor soft core, and the industry-standard PowerPC architecture, in the form of a performance-optimized hard core.

Efficient Hardware Acceleration
Using an FPGA with an embedded processor as a platform for programmable system design enables flexible partitioning of functionality into hardware and software.

Immersing the processor in the FPGA logic fabric opens the door to the additional flexibility of creating custom hardware to accelerate the execution of critical software. Hardware acceleration enables designers to apply logic resources to achieve performance exactly where needed.

Creating hardware (tightly coupled to the CPU) to act on a set of operands can accelerate the execution of key software by performing in a single cycle calculations that take many cycles on a processor. This performance boost is achieved by tuning the hardware design to provide the degree of parallelism required by the algorithm.

High Performance, Flexible Hardware Acceleration
Creating accelerators for FPGA-based processors requires three elements: programmable logic fabric for building the custom hardware; unassigned address space for the new instruction; and a low-latency path between the processor and the acceleration hardware. Xilinx provides the most efficient integration of microprocessor and FPGA fabric with dedicated interfaces that save clock cycles by eliminating bus overhead; are decoupled from the CPU to enable implementation of multiple accelerators; and do not stall the pipeline crucial to RISC performance.

All Virtex FPGAs have abundant programmable logic resources suitable for building acceleration hardware. Xilinx enables efficient accelerator integration for the MicroBlaze soft processor core with the Fast Simplex Link (FSL). The MicroBlaze processor supports up to 32 input and 32 output FSL, and code development is easy with simple programming for blocking and non-blocking instructions.

Virtex-4 FX devices include up to two PowerPC hard processor cores. Xilinx first introduced the immersed PowerPC 405 core in the Virtex-II Pro™ family. For the Virtex-4 family, Xilinx has increased processor performance to 680 DMIPS at 450 MHz and reduced power consumption to 0.44 mW/MHz while maintaining compatibility with all software and IP created for the first-generation core.

A new auxiliary processor unit (APU) controller simplifies the integration of acceleration hardware for the PowerPC core by providing a direct interface between the CPU pipeline and the FPGA logic fabric. This ultra-low-latency architecture enhances performance by reducing, by a factor of ten, the number of bus cycles needed to access the accelerator hardware. The net result is a 20-fold increase in processor-accelerator efficiency.

High-Speed Connectivity
When we asked system developers to describe their connectivity requirements, they highlighted the need for performance to support emerging standards and flexibility to upgrade today's designs to meet future bandwidth requirements. They are looking for solutions that offer bandwidth greater than 3.125 Gbps, provide complete support for multiple communication standards, and maintain the highest possible signal integrity.

Our third-generation RocketIO multi-gigabit transceiver satisfies these requirements with the industry's broadest operating range and other enhancements. Virtex-4 FX FPGAs enable bridging between just about any serial or parallel connectivity standard. For example, the third-generation RocketIO multi-gigabit transceivers provide compliance with the PCI Express standard, with support for out-of-band signaling (electrical idle and beaconing) and spread-spectrum clocking.

To address the challenges of backplane and other high-speed connectivity designs, RocketIO multi-gigabit transceivers provide comprehensive equalization techniques to ensure signal integrity in a wide variety of applications (Table 1). These advanced equalization techniques enable engineers to give new life to old systems by upgrading legacy backplanes.

Table 1 – RocketIO features at a glance
  • Third-generation multi-gigabit transceivers
  • Operating range: 622 Mbps – 11.1 Gbps
  • Channels: 24
  • Transmit pre-emphasis
  • Decision feedback equalization (DFE)
  • 8b/10b and 64b/66b encode/decode
  • SONET jitter compliant at OC-12 and OC-48 line rates
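The 8b/10b and 64b/66b line codes listed in Table 1 trade some of the raw line rate for DC balance and clock recovery. The sketch below shows the resulting payload rate for two example line rates inside the transceiver's operating range. The 3.125 Gbps and 10.3125 Gbps values are illustrative choices, not a statement of which rates a given device is configured for, and the function name is hypothetical.

```python
def payload_gbps(line_rate_gbps, data_bits, coded_bits):
    """Payload rate left after a block line code (data_bits carried per coded_bits)."""
    return line_rate_gbps * data_bits / coded_bits


# Illustrative line rates (assumptions, chosen within 622 Mbps - 11.1 Gbps).
rate_8b10b = payload_gbps(3.125, 8, 10)      # 8b/10b: 20% coding overhead
rate_64b66b = payload_gbps(10.3125, 64, 66)  # 64b/66b: about 3% coding overhead

print(f"8b/10b  at 3.125 Gbps   -> {rate_8b10b:.3f} Gbps payload")
print(f"64b/66b at 10.3125 Gbps -> {rate_64b66b:.3f} Gbps payload")
```

The comparison shows why 64b/66b is attractive at the top of the range: it spends far fewer line bits on coding than 8b/10b does.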

In addition, Virtex-4 FX devices include built-in Ethernet connectivity, enabling seamless chip-to-chip connections without consuming programmable logic resources. The Ethernet MAC core supports 10/100/1000 Mbps data rates with UNH-verified standards compliance and interoperability.

High-Performance DSP
Developers told us they need to achieve higher DSP performance targets to implement next-generation applications such as MPEG-4 video compression/decompression and multi-channel mobile communications. Scaling existing DSP implementations to meet these targets with multiple programmable DSPs or dedicated ASIC hardware can be prohibitively expensive. Designers also need to control system power consumption as they squeeze more functionality into smaller form factors.

To address new DSP performance requirements, Xilinx crafted the versatile XtremeDSP slice, providing twice the DSP performance of previous implementations while drawing less than 1/7th of the power. Although all Virtex-4 FPGAs contain XtremeDSP slices, the Virtex-4 SX platform provides the highest ratio of XtremeDSP slices to other resources.

The largest SX device, the XC4VSX55, has 512 XtremeDSP slices. Using these 500 MHz slices exclusively, each with an 18 x 18-bit multiplier and a 48-bit accumulator, this device can achieve 256 GMAC/s performance at a very aggressive price point, providing the most powerful DSP capabilities of any FPGA in the industry. Demonstrating the revolutionary flexibility of the multi-platform approach enabled by the ASMBL architecture, the DSP-optimized SX55 offers ten times the DSP value, as measured in GMACs/dollar, compared with previous-generation FPGAs.
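The 256 GMAC/s figure follows directly from the slice count and clock rate, assuming every slice completes one multiply-accumulate per clock cycle. A minimal sketch of that arithmetic, with a hypothetical function name:

```python
def peak_gmacs(dsp_slices, clock_mhz):
    """Peak multiply-accumulate rate in GMAC/s, assuming one MAC per slice per cycle."""
    return dsp_slices * clock_mhz / 1000.0


# XC4VSX55 figures quoted in the article: 512 slices at 500 MHz.
print(peak_gmacs(512, 500))
```

This is a peak number. A real design reaches it only if all slices are kept busy every cycle at the full 500 MHz.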

Xilinx is helping DSP developers close the gap between the performance of programmable single-MAC DSPs and the requirements of advanced algorithms with Virtex-4 SX platform FPGAs. Virtex-4 FPGAs can serve alongside programmable DSPs as pre-processors or co-processors to offload compute-intensive tasks.

Conclusion
To learn more about how you can take advantage of the breakthrough capabilities and performance of Virtex-4 FPGAs in your next system, please visit our website at www.xilinx.com/virtex4/.

Virtex-4 Solves Source-Synchronous Design Challenges
Source-synchronous interfaces typically send signals at data rates of 1 Gbps or higher on each channel. FPGA logic circuitry has difficulty processing incoming signals at that speed, so the frequency must be reduced by converting serial data on each channel to parallel data as it enters the device. Conversely, transmission requires converting parallel data to serial format. Traditionally, this process involves multiple stages of dividing down or multiplying up the speed. The steps required to meet the setup and hold requirements are laborious and time-consuming.

ChipSync technology simplifies design and boosts performance with an embedded SERDES that serializes and de-serializes parallel bus interfaces to match the data rate to the speed of the internal FPGA circuits. ChipSync technology enables data rates greater than 1 Gbps for differential I/O, and over 600 Mbps for single-ended I/O. This ability simplifies the design of interfaces such as SPI-4.2, XSBI, and SFI-4, as well as RapidIO and HyperTransport.
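The serial-to-parallel conversion described above can be modeled in a few lines of software. This is a behavioral sketch only, not a description of the ChipSync circuitry: the 8-bit width, the MSB-first bit order, and the function names are assumptions made for illustration.

```python
def deserialize(bits, width):
    """Group a serial bit stream into parallel words (first bit received = MSB)."""
    words = []
    for i in range(0, len(bits) - width + 1, width):
        word = 0
        for b in bits[i:i + width]:
            word = (word << 1) | b
        words.append(word)
    return words


def serialize(words, width):
    """Flatten parallel words back into a serial bit stream, MSB first."""
    bits = []
    for w in words:
        for shift in range(width - 1, -1, -1):
            bits.append((w >> shift) & 1)
    return bits


def word_rate_mhz(line_rate_mbps, width):
    """Rate at which parallel words reach the fabric for a given line rate."""
    return line_rate_mbps / width


stream = serialize([0xA5, 0x3C], 8)
print(deserialize(stream, 8))      # round trip recovers the original words
print(word_rate_mhz(1000, 8))      # 1 Gbps per channel at 8:1 gives 125 MHz words
```

The last line is the point of the exercise: at an assumed 8:1 ratio, a 1 Gbps channel presents data to the fabric at 125 MHz, a rate that internal logic can handle comfortably.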

Each channel and clock follows a slightly different route through the printed circuit board. Ensuring reliable data capture requires satisfying the setup and hold times of each channel. With communication interfaces of eight channels and higher, and with memory buses up to 144 bits wide, this can be an extremely challenging task.

ChipSync technology simplifies the implementation of communication and high-speed memory interfaces (including DDR2 SDRAM, QDR II SRAM, FCRAM II, and RLDRAM II) by compensating for routing differences that produce skew between data and clock signals. Built-in circuitry within the SelectIO™ block can delay each data and clock channel, in 78 ps increments, to meet the setup and hold requirements for reliable data capture.
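To make the 78 ps granularity concrete, the sketch below converts a desired delay into a whole number of delay steps and reports the leftover error. The 400 ps target is an arbitrary example, and the function name is hypothetical; the maximum number of steps available in hardware is not modeled.

```python
TAP_PS = 78  # delay increment quoted in the article


def taps_for_delay(target_ps, tap_ps=TAP_PS):
    """Return (number of delay steps, residual error in ps) closest to target_ps."""
    taps = round(target_ps / tap_ps)
    residual = target_ps - taps * tap_ps
    return taps, residual


# Example: a channel that needs roughly 400 ps of added delay (assumed value).
taps, residual = taps_for_delay(400)
print(f"{taps} steps, {residual} ps residual")
```

With steps of this size, the residual error after rounding is at most half a step, or 39 ps, which is small compared with the 1 ns bit interval of a 1 Gbps channel.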

When skew is extreme, the misalignment between channels can exceed one bit interval. Delay adjustment aligns the bits so that each channel can be read reliably, but some channels might still be out of step with others by one or more whole bits. To address this, ChipSync technology provides a bitslip capability, and an optional training pattern simplifies the task of aligning data words across all channels.

With source-synchronous design, each interface has its own clock. As multiple interfaces and memories are connected to the same FPGA, the need for numerous flexible clock resources grows. With clock-aware I/Os, ChipSync technology enables simultaneous implementation of multiple source-synchronous interfaces.
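Word alignment with bitslip and a training pattern can be illustrated with a simple software model. The sketch assumes that each bitslip operation rotates the deserialized word by one bit position and that the training pattern has no rotational symmetry. The 8-bit width, the 0x2C pattern, and the function names are hypothetical, and the actual hardware slip sequence may differ.

```python
def rotate_left(word, n, width):
    """Rotate a width-bit word left by n bit positions."""
    n %= width
    mask = (1 << width) - 1
    return ((word << n) | (word >> (width - n))) & mask


def find_bitslip(received, training, width):
    """Return how many one-bit slips align `received` with `training`, or None."""
    for slips in range(width):
        if rotate_left(received, slips, width) == training:
            return slips
    return None


TRAINING = 0b00101100  # assumed pattern; all 8 rotations of it are distinct
WIDTH = 8

# Simulate a channel whose word boundary is off by three bit positions.
misaligned = rotate_left(TRAINING, WIDTH - 3, WIDTH)
slips_needed = find_bitslip(misaligned, TRAINING, WIDTH)
print(slips_needed)
```

Running the same search independently on every channel yields a per-channel slip count, after which all channels present words on the same boundary.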

Xesium clocking makes this possible with up to 24 clock regions per device. Each region can have up to six I/Os acting as clock sources for data capture. Up to 95 I/Os can be clocked by a single I/O clock, providing great clock flexibility and a large number of clocks.
