Today’s FPGA architectures in advanced silicon nodes at 16nm and above present designers with fantastic opportunities to integrate large amounts of logic functionality with very high performance targets. “Time to market” is often a critical factor in the success of these products, which means designers must converge quickly on timing closure to meet their performance targets. The ability to accelerate the implementation phase of the design cycle is therefore critical, and to get there designers need three things:
The Vivado Design Suite delivers the best implementation tools, with significant advantages in performance, runtime and memory consumption; we cover some of the reasons for that below. But even the best tools will struggle if given unrealistic designs, so methodology is essential to converging more quickly on larger, higher-performance designs. Xilinx has compiled an extensive list of methodology recommendations in UG949, UltraFAST Design Methodology. At the heart of these recommendations are the world-class analysis and reporting capabilities delivered by Vivado. These reports let the designer cross-probe from the report to the schematic, the device view, and the exact line of RTL code where the object was inferred. The timing analysis engine includes highly customizable queries that make it possible to debug timing closure challenges. Here is a list of important reports that will help designers accelerate implementation:
The Vivado® Design Suite Analytical Place and Route technology delivers more predictable design closure by concurrently optimizing for multiple variables: not only timing (T) but also interconnect-related metrics such as congestion (C) and wire length (W). The analytical placer is what sets the Vivado Design Suite apart and keeps it a generation ahead. The graph below illustrates an example of a multi-variable cost function solved analytically by the Vivado Design Suite.
Fig.1 Optimizing for multiple variables
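To make the idea of a multi-variable cost function concrete, here is a minimal sketch that combines the three metrics named above into one weighted number. It is not Vivado's cost function, which is not described here; the half-perimeter wire length (HPWL) estimate, the congestion-overflow and negative-slack terms, the function names, and the unit weights are all illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Net:
    # (x, y) locations of the cells connected by this net
    pins: list[tuple[float, float]]


def hpwl(net: Net) -> float:
    """Wire length (W): half-perimeter of the net's bounding box."""
    xs = [x for x, _ in net.pins]
    ys = [y for _, y in net.pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def congestion_overflow(demand: dict, capacity: float) -> float:
    """Congestion (C): routing demand above capacity, summed over grid bins."""
    return sum(max(0.0, d - capacity) for d in demand.values())


def timing_penalty(slacks: list[float]) -> float:
    """Timing (T): total negative slack in ns; passing paths contribute 0."""
    return sum(-s for s in slacks if s < 0)


def placement_cost(nets, demand, capacity, slacks, w_t=1.0, w_c=1.0, w_w=1.0):
    """Hypothetical weighted sum of T, C and W; the weights are illustrative."""
    t = timing_penalty(slacks)
    c = congestion_overflow(demand, capacity)
    w = sum(hpwl(n) for n in nets)
    return w_t * t + w_c * c + w_w * w
```

For example, two nets with an HPWL of 7 each, one routing bin that exceeds its capacity by 2, and failing paths with slacks of -0.2 ns and -0.5 ns give a cost of 14 + 2 + 0.7 = 16.7 with unit weights. A placer that optimizes only the timing term would ignore the other two contributions entirely.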
Competitive solutions are based on simulated annealing placement, a technique that starts from a random initial placement and applies random moves, trying to find a local minimum of a global metric (typically a timing cost); it cannot handle local metrics such as congestion. Only the Vivado Design Suite scales for today’s device densities and interconnect delays.
Fig.2 Traditional P&R Algorithm
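For contrast, a textbook simulated-annealing placer can be sketched as follows. This is a generic illustration, not any vendor's implementation: the grid of slots, the swap move, the total-wire-length cost, and the cooling parameters are all assumptions. Note that the only quantity it minimizes is one global number, and every decision is driven by random moves.

```python
import math
import random


def total_wirelength(placement, nets):
    """Single global cost: sum of half-perimeter wire lengths over all nets."""
    total = 0
    for net in nets:
        xs = [placement[c][0] for c in net]
        ys = [placement[c][1] for c in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def anneal(cells, nets, grid_w, grid_h, t_start=10.0, t_end=0.01,
           alpha=0.95, moves_per_t=200, seed=0):
    """Hypothetical simulated-annealing placer; cells must be a list."""
    slots = [(x, y) for x in range(grid_w) for y in range(grid_h)]
    if not 2 <= len(cells) <= len(slots):
        raise ValueError("need at least 2 cells and enough slots for all cells")
    rng = random.Random(seed)
    rng.shuffle(slots)
    # Random initial placement: each cell gets a distinct slot.
    placement = {c: slots[i] for i, c in enumerate(cells)}
    cost = total_wirelength(placement, nets)
    t = t_start
    while t > t_end:
        for _ in range(moves_per_t):
            # Random move: swap two cells (empty slots are not used in this sketch).
            a, b = rng.sample(cells, 2)
            placement[a], placement[b] = placement[b], placement[a]
            new_cost = total_wirelength(placement, nets)
            delta = new_cost - cost
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost  # accept; uphill moves are likely only while t is high
            else:
                placement[a], placement[b] = placement[b], placement[a]  # undo
        t *= alpha  # geometric cooling schedule
    return placement, cost
```

On a toy problem (a chain of four cells on a 2x2 grid) this finds the minimum total wire length of 3. On large designs, however, each move is evaluated against the single global cost, which is why a purely local concern such as a congested region has no direct way to influence the search.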
The Vivado Design Suite accelerates implementation by delivering more design turns (place-and-route iterations) per day, while reducing the number of turns needed in the first place. Vivado’s analytical placer delivers up to 4x faster runtimes and half the memory footprint of competing solutions.
Fig.3 The graph above highlights both the runtime advantage and the predictable behavior of the Vivado place and route engine. Runtimes are consistently up to 4x faster than alternative solutions, while the much smaller variance in results enables design closure with fewer iterations.
The Vivado Design Suite’s runtime advantage over competing solutions increases with design complexity, as defined by:
The Vivado Analytical Place and Route technology mathematically finds an implementation solution that optimizes both density (wire length) and routability (congestion). As a result, competitive results show:
Fig.4 Vivado Runtime advantage increases with design complexity compared with competing solutions.
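"Mathematically finds" is the key difference from random search. The classic textbook form of analytical placement is quadratic placement: minimizing the sum of weighted squared wire lengths reduces to solving a linear system, so the optimum is computed directly rather than searched for. The sketch below shows that idea in one dimension (the y axis is handled the same way). Vivado's actual formulation is not described here and also accounts for timing and congestion, so treat this only as an illustration; the function names are hypothetical.

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def quadratic_place_1d(n_movable, edges, fixed_edges):
    """
    Minimize sum w*(xi - xj)^2 over movable-movable edges (i, j, w)
    plus sum w*(xi - pad_x)^2 over edges (i, pad_x, w) to fixed pads.
    Setting the gradient to zero gives the linear system A x = b.
    Each group of connected cells needs at least one fixed pad,
    otherwise A is singular.
    """
    A = [[0.0] * n_movable for _ in range(n_movable)]
    b = [0.0] * n_movable
    for i, j, w in edges:
        A[i][i] += w
        A[j][j] += w
        A[i][j] -= w
        A[j][i] -= w
    for i, pad_x, w in fixed_edges:
        A[i][i] += w
        b[i] += w * pad_x
    return solve(A, b)
```

For two cells chained between pads at x = 0 and x = 3 with unit weights, the solver places them at x = 1 and x = 2, the evenly spaced arrangement with the shortest total squared wire length. Production analytical placers add density and congestion terms so that cells do not simply collapse onto each other.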
For illustration purposes, we selected an Ethernet Media Access Controller (MAC). The design is stamped repeatedly to gradually fill a Virtex UltraScale® VU095 FPGA device, and the results are compared with the closest competing device, a 1,115,000-LC offering:
How Vivado can push device utilization higher…
The Xilinx UltraScale™ architecture offers truly independent LUTs that Vivado can route at very high utilization rates. The software can use 99% of the LUTs and still place and route the design and meet timing! By contrast, the competing device cannot reach full LUT utilization: it stops at 64% in this example, failing to place and route long before all the LUTs in the device are used. This is not surprising, since the competitor’s physical cluster is often limited to using only one of its LUTs, leaving the other unusable, so the competitor’s LUTs can rarely be used at a satisfactory level of utilization.
In conclusion, Vivado place and route technology has been designed to handle dense and challenging designs, and it can reach high levels of LUT utilization, enabling the user to put more logic into the device. When comparing devices of similar size in terms of logic cell (LC) count, Xilinx UltraScale FPGAs can pack more logic thanks to Vivado’s advanced algorithms.
Performance depends on all 3 variables that the Vivado Analytical Place and Route optimizes for: timing, congestion and wire length.
Just as in the runtime comparison, the benchmark suite above shows that, across the 7 series devices, the performance advantage increases with design complexity. For simple to medium complexity designs, the performance advantage varies within these ranges:
Fig.6 Vivado’s Performance Advantage as a function of design complexity.
Again, for high-complexity designs, the Vivado Design Suite is the only solution that completes implementation, because the competition reaches its algorithmic limit.
Because Vivado’s Analytical Place and Route optimizes for short wire lengths, designs inherently consume less dynamic power. In addition, Vivado’s default and advanced power optimizations, combined with technology and architectural power optimization techniques, give the 7 series device family an average 35% power advantage over competing solutions.
Fig. 7 Head-to-head Application Benchmarks: ~35% Average Power Savings at the same performance.
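The link between wire length and dynamic power follows from the standard CMOS switching-power relation P = α·C·V²·f, where the capacitance C switched by a net includes its interconnect capacitance, which grows with wire length. The sketch below assumes a simple linear capacitance-per-unit-length model; the function names and all parameter values are hypothetical and do not come from Xilinx data. Some texts include an extra factor of 1/2 depending on how the activity factor α is defined.

```python
def net_dynamic_power(activity, wire_length, vdd, freq_hz, cap_per_unit, pin_cap):
    """P = alpha * C * Vdd^2 * f for one net, with C = pin_cap + cap_per_unit * length."""
    c = pin_cap + cap_per_unit * wire_length
    return activity * c * vdd ** 2 * freq_hz


def total_dynamic_power(nets, vdd, freq_hz, cap_per_unit, pin_cap):
    """Sum over nets; each net is an (activity, wire_length) pair. Result in watts."""
    return sum(
        net_dynamic_power(activity, length, vdd, freq_hz, cap_per_unit, pin_cap)
        for activity, length in nets
    )
```

For instance, a single net with α = 0.5, a wire length of 10 units at 1 fF per unit, no pin capacitance, Vdd = 1.0 V and f = 100 MHz dissipates 5e-7 W. Halving every wire length in a design lowers only the interconnect part of C; the pin capacitance is unchanged, so the total saving depends on how much of each net's load is wiring.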