|
As Xilinx began to define the capabilities of
the fourth generation of Virtex™ devices,
we set out to address the performance, functionality,
and cost requirements of next-generation
electronic systems, and to increase
our customers' productivity by easing system
design challenges. We interviewed more
than 800 customers, including system architects
and experts in logic design, embedded
processing, high-performance DSP, and
high-speed connectivity.
Despite the differences in their end
products, these high-end FPGA users had a
number of common key requirements.
They asked for higher system performance
to meet the demands of their leading-edge
products; lower power consumption to
meet stringent power budgets driven by system
cost and reliability requirements; help
in reducing system cost to enable them to
thrive in a competitive marketplace; and
solutions to simplify complex design challenges,
such as building source-synchronous
interfaces to the latest high-speed memories
and advanced components.
We achieved these goals by enhancing
the features proven popular in earlier Virtex
devices and developing new capabilities
never before available in FPGAs.
Combining advanced processing technology
with greater integrated functionality, Virtex-4™ FPGAs provide twice the density and
boost performance by as much as 2x, while
reducing power consumption by as much as
50% compared with previous-generation
FPGAs (see "Virtex-4 FPGA Features at a Glance").
At the same time, Virtex-4 FPGAs cut the
cost of programmable system platforms by
more than 50%, enabling developers to
adopt high-performance FPGAs in an
extraordinary range of products.
Higher Performance
Virtex-4 FPGAs attack the requirements
for higher performance on several fronts.
First, designers can improve system performance,
thanks to the advanced 90 nm
process and optimized FPGA fabric.
The second approach is to include dedicated,
performance-tuned circuitry for implementing
key system functions, such as
integrated processors, DSP slices, Ethernet
MACs, and serial transceivers. For example,
the embedded Virtex-4 XtremeDSP™
slice delivers up to 500 MHz performance,
and the RocketIO™ serial transceiver
operates from 622 Mbps to 11.1 Gbps, a range
unprecedented in the industry.
The third approach is the incorporation
of powerful clock management capability,
enabling engineers to extract the maximum
performance from the programmable logic
fabric. Xesium clocking technology
addresses designers' demands for more flexible
clocking with abundant resources – up
to 32 global clocks in each device and up to
20 digital clock manager (DCM) circuits.
Xesium DCM circuits enable flexible
generation of multiple clock domains with
differential signaling, supporting frequencies
of up to 500 MHz with 40%
less jitter than previous circuitry. In addition,
Virtex-4 devices are the only FPGAs
to provide differential clocking networks, a
key advantage in implementing precision
clocks with minimal skew and jitter.
Virtex-4 FPGAs further enhance clock
management with phase-matched clock
dividers (PMCD) that provide improved
handling of multiple synchronous clock
domains. These circuits, together with
enhanced software support, give designers
precise edge control and frequency synthesis
capabilities, enabling the generation of
high-quality clock networks.
Power Advantage
Virtex-4 FPGAs reduce
power with a combination
of techniques. By using a
triple oxide technology,
Xilinx can make trade-offs
between speed and leakage
that reduce static power
consumption by 40% as we
build transistors with different
gate oxide thicknesses
for configuration, interconnect,
and I/O. This technology
enables us to offset, and
even reverse, the increase in leakage current inherent in the migration to
finer geometry nodes and is exclusive to
Xilinx in the FPGA industry.
In addition, dynamic power consumption
decreases by 50% because of lower supply
voltage and lower capacitance in the 90 nm
process. Finally, extensive use of abundant
embedded IP provides valuable functionality
in circuits optimized to consume as little as
one-tenth the power of an equivalent implementation
in programmable logic fabric.
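The dynamic-power figure can be sanity-checked against the standard CMOS switching-power relation, in which power is proportional to capacitance times voltage squared times frequency. The sketch below is illustrative only: the core supply voltages (1.5 V for the previous generation, 1.2 V for Virtex-4) are assumptions that this article does not state, and the function name is hypothetical.

```python
def dynamic_power_ratio(v_old, v_new, cap_ratio=1.0, freq_ratio=1.0):
    """Return new/old dynamic power, using P proportional to C * V^2 * f."""
    return cap_ratio * (v_new / v_old) ** 2 * freq_ratio

# Assumed core voltages (not given in the article): 1.5 V -> 1.2 V.
voltage_only = dynamic_power_ratio(1.5, 1.2)

# Switched-capacitance ratio that would be needed, on top of the voltage
# change, to reach the quoted 50% reduction at the same clock frequency.
cap_needed = 0.5 / voltage_only

print(f"voltage alone: {voltage_only:.2f}x of previous dynamic power")
print(f"capacitance ratio needed for 0.50x: {cap_needed:.2f}")
```

Under these assumed voltages, the supply change alone accounts for roughly a one-third reduction, with lower capacitance making up the remainder.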
Lower System Cost
Xilinx addressed the requirements for lower
system cost on three fronts:
- 90 nm, 300 mm process leadership
produces the lowest FPGA price.
Xilinx manufactures Virtex-4 FPGAs
using the same 90 nm, 300 mm processing
technology we use to build the
world's lowest-cost FPGAs, Spartan-3™
devices. The combination of finer
geometries and larger 12 inch wafers
produces approximately five times as many die per wafer, compared with building
an equivalent chip with a 130 nm
process on 200 mm (8 inch) wafers.
This lowers cost per die significantly.
- Multiple platforms deliver cost-optimized
feature sets.
With each generation of Virtex
FPGAs, Xilinx has taken advantage of
the latest process node to fabricate
devices that offer greater capacity,
higher performance, and lower price.
For the Virtex-4 family, we went even
further to achieve cost reduction.
As we strive to expand the use of
Virtex FPGAs into new markets and
geographies, we see that our customers
have different requirements that vary
with the complexity and target price
for the systems they are creating. Using
our proprietary ASMBL (pronounced
"assemble") architecture (see Figure 1
and "ASMBL Architecture
Enables Cost-Optimized Platforms"), we have assembled three different platforms
(Figure 2) with an initial offering
of 17 devices that deliver
cost-optimized solutions for the widest
range of high-performance electronic
systems.
ASMBL Architecture Enables Cost-Optimized Platforms
With traditional FPGA architectures, increasing the size of the devices to meet the
demands for greater logic capacity and more memory typically results in parallel
scaling of all the advanced features on the die, rapidly increasing cost.
To solve this inefficiency, Xilinx introduced a radical new architecture that enables
us to offer a new generation of Virtex FPGAs providing the broadest range of capabilities
in three unique platforms with feature mixes optimized to meet the requirements
of different application domains. The ASMBL (Advanced Silicon Modular
Block) architecture enables Xilinx to scale the capabilities and capacity of Virtex
FPGAs independently of one another and rapidly assemble multiple platforms. |
- Integrated IP reduces the customer's
bill of materials and saves FPGA
resources.
Virtex-4 FPGAs reduce system cost
with abundant integrated IP. By incorporating
many functions that find use
in a broad range of applications, Virtex-4 FPGAs replace a number of discrete
components commonly found on system
boards.
Designers can take advantage of
embedded PowerPC™ processors, up
to 10 Mb of embedded dual-port
RAM/FIFO, integrated Ethernet
MACs, sophisticated DSP circuitry,
and on-board serial transceivers,
among other features. This helps our
customers lower system cost in several
ways: by reducing component count
and streamlining logistics with a smaller
bill of materials; by simplifying the
design and manufacturing of system
hardware; by easing PCB design and
manufacturing; and by improving system
reliability through the reduction
of solder joints.
In addition, building dedicated circuits
on the FPGA provides required
functionality efficiently, while preserving
the programmable logic fabric
for customers to add the value of their
proprietary designs. The result is more
capability within a single package at a
given price point.
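The "approximately five times as many die per wafer" estimate in the first cost point above can be reproduced with idealized area scaling. This is a minimal sketch that assumes die area shrinks with the square of the process node and ignores wafer-edge loss, scribe lines, and yield; the function name is hypothetical.

```python
def relative_die_per_wafer(old_wafer_mm, new_wafer_mm, old_node_nm, new_node_nm):
    """Idealized ratio of die per wafer after a wafer-size and node change."""
    # Usable wafer area scales with the square of the diameter.
    wafer_area_gain = (new_wafer_mm / old_wafer_mm) ** 2
    # Area of an equivalent die scales with the square of the feature size.
    die_area_shrink = (old_node_nm / new_node_nm) ** 2
    return wafer_area_gain * die_area_shrink

ratio = relative_die_per_wafer(200, 300, 130, 90)
print(f"{ratio:.2f}x as many die per wafer")
```

The larger wafer contributes a factor of 2.25 and the finer geometry a factor of about 2.09, which together come to roughly 4.7, in line with the quoted figure.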
Up to 80% Additional Cost Reduction with EasyPath
The EasyPath™ program further lowers
system cost for customers who are ready to
take their finished design to volume production.
Xilinx creates customized test programs
for EasyPath customers that exercise
only the device resources used in the specific
design. This approach shortens test time
and increases yield, reducing FPGA unit
price by up to 80%.
Source Synchronous Interfacing
To ensure reliable data transfer between a
new generation of high-speed devices, hardware
designers are turning to source-synchronous
design techniques, in which the
component sending the data generates and
issues its own clock signal along with the
data that it transmits. This technique eliminates
one set of problems associated with
parallel interfaces, but introduces its own
circuit design challenges. ChipSync technology
significantly simplifies component
interface design with critical built-in circuitry
that is available in every Virtex-4 I/O
(see "Virtex-4 Solves Source-Synchronous Design Challenges").
Virtex-4 FPGA Features at a Glance
- Largest logic capacity
- Up to 200,000 logic cells
- Largest memory capacity
- Highest performance
- 500 MHz Xesium clocking technology
- Expanded clocking resources
- Enhanced clock precision
- Reduced clock jitter and skew
- Simplified source-synchronous interfacing
- Complete serial connectivity solution
- 622 Mbps – 11.1 Gbps RocketIO transceivers
- Higher performance, low-power DSP
- Simplified processor acceleration
- PowerPC 405 processor with auxiliary processor unit (APU) controller interface
- Integrated Ethernet MAC
- Fourth-generation design security
|
Embedded Processing
Embedded developers have already used
Xilinx processor solutions to create thousands
of designs. As we talked to these
developers about the requirements for their
next-generation systems, several common
themes emerged.
A Full Range of Processing Solutions
Engineers need a range of processing solutions
to match the requirements of different
tasks, ranging from simple control functions
to advanced algorithms and high-speed
calculation. In addition, they want
the different solutions to share a common
design environment.
Xilinx satisfies these requirements with
a range of processors that includes the
PicoBlaze™ eight-bit microcontroller soft
core, the MicroBlaze™ 32-bit general-purpose
processor soft core, and the
industry-standard PowerPC architecture,
in the form of a performance-optimized
hard core.
Efficient Hardware Acceleration
Using an FPGA with an embedded processor
as a platform for programmable system
design enables flexible partitioning of functionality
into hardware and software.
Immersing the processor in the FPGA logic
fabric opens the door to the additional flexibility
of creating custom hardware to accelerate
the execution of critical software.
Hardware acceleration enables designers to
apply logic resources to achieve performance
exactly where needed.
Creating hardware (tightly coupled to
the CPU) to act on a set of operands can
accelerate the execution of key software by
performing in a single cycle calculations
that take many cycles on a processor. This
performance boost is achieved by tuning
the hardware design to provide the degree
of parallelism required by the algorithm.
High-Performance, Flexible Hardware Acceleration
Creating accelerators for FPGA-based
processors requires three elements: programmable
logic fabric for building the
custom hardware; unassigned address space
for the new instruction; and a low-latency
path between the processor and the acceleration
hardware. Xilinx provides the most
efficient integration of microprocessor and
FPGA fabric with dedicated interfaces that
save clock cycles by eliminating bus overhead;
are decoupled from the CPU to
enable implementation of multiple accelerators;
and do not stall the pipeline crucial
to RISC performance.
All Virtex FPGAs have abundant programmable
logic resources suitable for
building acceleration hardware. Xilinx
enables efficient accelerator integration for
the MicroBlaze soft processor core with the
Fast Simplex Link (FSL). The MicroBlaze
processor supports up to 32 input and 32
output FSLs, and code development is easy
with simple programming for blocking and
non-blocking instructions.
Virtex-4 FX devices include up to two
PowerPC hard processor cores. Xilinx first
introduced the immersed PowerPC 405
core in the Virtex-II Pro™ family. For the
Virtex-4 family, Xilinx has increased
processor performance to 680 DMIPS at
450 MHz and reduced power consumption
to 0.44 mW/MHz while maintaining
compatibility with all software and IP created
for the first-generation core.
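The quoted processor figures can be turned into per-megahertz efficiency and total core power with simple arithmetic. This is a minimal sketch; the helper names are hypothetical, and it assumes the 0.44 mW/MHz figure scales linearly up to the 450 MHz clock.

```python
def dmips_per_mhz(dmips, clock_mhz):
    """Dhrystone MIPS delivered per MHz of clock."""
    return dmips / clock_mhz

def core_power_mw(mw_per_mhz, clock_mhz):
    """Core power in mW, assuming power scales linearly with clock rate."""
    return mw_per_mhz * clock_mhz

efficiency = dmips_per_mhz(680, 450)
power = core_power_mw(0.44, 450)
print(f"{efficiency:.2f} DMIPS/MHz, about {power:.0f} mW at 450 MHz")
```

Under that assumption the core delivers about 1.51 DMIPS/MHz and draws on the order of 200 mW at full speed.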
A new auxiliary processor unit (APU)
controller simplifies the integration of
acceleration hardware for the PowerPC
core by providing a direct interface
between the CPU pipeline and the FPGA
logic fabric. This ultra-low-latency architecture
enhances performance by reducing,
by a factor of ten, the number of bus cycles
needed to access the accelerator hardware.
The net result is a 20-fold increase in
processor-accelerator efficiency.
High-Speed Connectivity
When we asked system developers to
describe their connectivity requirements,
they highlighted the need for performance
to support emerging standards and flexibility
to upgrade today's designs to meet
future bandwidth requirements. They are
looking for solutions that offer bandwidth
greater than 3.125 Gbps, provide complete
support for multiple communication
standards, and maintain the highest possible
signal integrity.
Our third-generation RocketIO multi-gigabit
transceiver satisfies these requirements
with the industry's broadest
operating range and other enhancements.
Virtex-4 FX FPGAs enable bridging
between just about any serial or parallel
connectivity standard. For example, the
third-generation RocketIO multi-gigabit
transceivers provide compliance with the
PCI Express standard, with support for
out-of-band signaling (electrical idle and
beaconing) and spread-spectrum clocking.
To address the challenges of backplane
and other high-speed connectivity designs,
RocketIO multi-gigabit transceivers provide
comprehensive equalization techniques
to ensure signal integrity in a wide
variety of applications (Table 1). These
advanced equalization techniques enable
engineers to give new life to old systems by
upgrading legacy backplanes.
Table 1 – RocketIO features at a glance
- Third-generation multi-gigabit transceivers
- Operating range: 622 Mbps – 11.1 Gbps
- Channels: 24
- Transmit pre-emphasis
- Decision feedback equalization (DFE)
- 8b/10b and 64b/66b encode/decode
- SONET jitter compliant at OC-12 and OC-48 line rates
|
In addition, Virtex-4 FX devices
include built-in Ethernet connectivity,
enabling seamless chip-to-chip connections
without consuming programmable
logic resources. The Ethernet MAC core
supports 10/100/1000 Mbps data rates
with UNH-verified standards compliance
and interoperability.
High-Performance DSP
Developers told us they need to achieve
higher DSP performance targets to implement
next-generation applications such as
MPEG-4 video compression/decompression
and multi-channel mobile communications.
Scaling existing DSP
implementations to meet these targets
with multiple programmable DSPs or
dedicated ASIC hardware can be prohibitively
expensive. Designers also need to
control system power consumption as
they squeeze more functionality into
smaller form factors.
To address new DSP performance
requirements, Xilinx crafted the versatile
XtremeDSP slice, providing twice the
DSP performance of previous implementations
while drawing less than one-seventh of
the power. Although all Virtex-4 FPGAs
contain XtremeDSP slices, the Virtex-4
SX platform provides the highest ratio of
XtremeDSP slices to other resources.
The largest SX device, the
XC4VSX55, has 512 XtremeDSP slices. Using only these
500 MHz slices, each with an 18 x
18-bit multiplier and a 48-bit accumulator,
this device can achieve 256
GMAC/s performance at a very aggressive
price point, providing the most powerful
DSP capabilities of any FPGA in
the industry. Demonstrating the revolutionary
flexibility of the multi-platform
approach enabled by the ASMBL architecture,
the DSP-optimized SX55 offers
ten times the DSP value, as measured in
GMACs/dollar, compared with previous-generation
FPGAs.
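The 256 GMAC/s figure follows directly from the slice count and clock rate. The sketch below assumes every XtremeDSP slice completes one multiply-accumulate per clock cycle at the full 500 MHz, which is a best-case bound rather than a measured result; the function name is hypothetical.

```python
def peak_gmacs(dsp_slices, clock_mhz):
    """Peak multiply-accumulates per second, in billions (GMAC/s).

    Assumes one MAC per slice per clock cycle.
    """
    macs_per_second_millions = dsp_slices * clock_mhz
    return macs_per_second_millions / 1000

print(peak_gmacs(512, 500), "GMAC/s")
```

Real designs seldom keep every slice busy at the maximum clock rate, so sustained throughput depends on the algorithm and on how well the design meets timing.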
Xilinx is helping DSP developers close
the gap between the performance of programmable
single-MAC DSPs and the
requirements of advanced algorithms with
Virtex-4 SX platform FPGAs. Virtex-4
FPGAs can serve alongside programmable
DSPs as pre-processors or co-processors to
offload compute-intensive tasks.
Conclusion
To learn more about how you can take
advantage of the breakthrough capabilities
and performance of Virtex-4 FPGAs
in your next system, please visit our website
at www.xilinx.com/virtex4/.
Virtex-4 Solves Source-Synchronous Design Challenges
Source-synchronous interfaces typically send signals at data rates of
1 Gbps or higher on each channel. FPGA logic circuitry has difficulty processing
incoming signals at that speed, so the frequency must be reduced by
converting serial data on each channel to parallel data as it enters the device.
Conversely, transmission requires converting parallel data to serial format.
Traditionally, this process involves multiple stages of dividing down or multiplying
up the speed. The steps required to meet the setup and hold requirements
are laborious and time-consuming.
ChipSync technology simplifies design and boosts performance with an
embedded SERDES that serializes and de-serializes parallel bus interfaces to
match the data rate to the speed of the internal FPGA circuits. ChipSync
technology enables data rates greater than 1 Gbps for differential I/O, and
over 600 Mbps for single-ended I/O. This ability simplifies the design of
interfaces such as SPI-4.2, XSBI, and SFI-4, as well as RapidIO and
HyperTransport.
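The rate matching that a SERDES performs can be modeled in a few lines. This is a software sketch of the idea, not of the ChipSync circuitry: the 1:8 deserialization factor, the most-significant-bit-first ordering, and the function names are assumptions chosen for illustration.

```python
def deserialize(bits, width):
    """Group a serial bit stream into parallel words of `width` bits.

    The first bit received is treated as the most significant bit.
    Trailing bits that do not fill a whole word are dropped.
    """
    usable = len(bits) - len(bits) % width
    words = []
    for start in range(0, usable, width):
        word = 0
        for bit in bits[start:start + width]:
            word = (word << 1) | bit
        words.append(word)
    return words

def fabric_clock_mhz(line_rate_mbps, width):
    """Clock rate needed inside the fabric after deserialization."""
    return line_rate_mbps / width

stream = [1, 0, 1, 0, 0, 0, 0, 1,  1, 1, 1, 1, 0, 0, 0, 0]
print([hex(w) for w in deserialize(stream, 8)])
print(fabric_clock_mhz(1000, 8), "MHz")
```

With an assumed 1:8 ratio, a 1 Gbps channel presents byte-wide data to the fabric at 125 MHz, a rate that programmable logic can handle comfortably.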
Each channel and clock follows a slightly different route through the printed
circuit board. Ensuring reliable data capture requires satisfying the setup and
hold times of each channel. With communication interfaces of eight channels
and higher, and with memory buses up to 144 bits wide, this can be an
extremely challenging task.
ChipSync technology simplifies the implementation of communication and
high-speed memory interfaces (including DDR2 SDRAM, QDR II SRAM,
FCRAM II, and RLDRAM II) by compensating for routing differences that produce
skew between data and clock signals. Built-in circuitry enables the delay of
each data and clock channel within the SelectIO™ block, in 78 ps increments,
to meet the setup and hold requirements for reliable data capture.
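To make the 78 ps delay resolution concrete, the sketch below picks a tap count for each channel so that early-arriving signals are delayed to line up with the latest one. The arrival offsets, channel names, and function names are hypothetical, and the sketch does not model any limit on the number of available taps.

```python
TAP_PS = 78  # delay increment quoted in the article

def taps_for_delay(delay_ps, tap_ps=TAP_PS):
    """Return (taps, residual_ps) for the nearest achievable delay."""
    taps = round(delay_ps / tap_ps)
    residual = delay_ps - taps * tap_ps
    return taps, residual

def align_channels(arrival_ps):
    """Delay every channel so that it matches the latest arrival."""
    latest = max(arrival_ps.values())
    return {name: taps_for_delay(latest - t) for name, t in arrival_ps.items()}

# Hypothetical board-level arrival offsets, in picoseconds.
arrivals = {"d0": 0, "d1": 150, "d2": 320}
for name, (taps, residual) in align_channels(arrivals).items():
    print(f"{name}: {taps} taps, residual skew {residual} ps")
```

Because the delay is quantized, a residual of up to about half a tap (39 ps) can remain on each channel after alignment.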
Per-bit delay aligns each channel to its clock, but when skew exceeds a bit
interval, some channels can still be one or more bits out of step with the
others. To address skew of this magnitude, ChipSync technology provides a
bitslip capability that shifts the word boundary on a channel. An optional
training pattern simplifies the task of aligning data words across all channels.
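Word alignment with a training pattern can be sketched as a search over bit rotations. This is a simplified model that treats each bitslip as a one-bit left rotation of the parallel word; the actual hardware slip sequence may differ, and the pattern value and function names are hypothetical.

```python
def bitslip(word, width):
    """Rotate a `width`-bit word left by one bit position."""
    mask = (1 << width) - 1
    return ((word << 1) | (word >> (width - 1))) & mask

def slips_to_align(received, pattern, width):
    """Count the bitslips needed until `received` equals `pattern`.

    Returns None if no rotation of the received word matches.
    """
    word = received
    for slips in range(width):
        if word == pattern:
            return slips
        word = bitslip(word, width)
    return None

TRAINING = 0b10110000          # hypothetical 8-bit training pattern
misaligned = 0b00010110        # same pattern, seen three bits out of step
print(slips_to_align(misaligned, TRAINING, 8))
```

Running the same search independently on every channel brings all word boundaries into agreement, provided the training pattern has no rotational symmetry.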
With source-synchronous design, each interface has its own clock. As multiple
interfaces and memories are connected to the same FPGA, the need for
numerous flexible clock resources grows. With clock-aware I/Os, ChipSync
technology enables simultaneous implementation of multiple source-synchronous
interfaces.
Xesium clocking makes this possible with up to 24 clock regions per device.
Each region can have up to six I/Os acting as clock sources for data capture.
Up to 95 I/Os can be clocked by a single I/O clock, providing great clock
flexibility and a large number of clocks.
|
Printable PDF version of this article with graphics. (10/25/04) 300 KB |