Xcell Journal Online
Emerging Design Methodologies Elicit the Power of Virtex-4 FPGAs



by Darren Zacher, Technical Marketing Engineer, Mentor Graphics Corporation, Design Creation and Synthesis Division
darren_zacher@mentor.com (1/15/05)


Adopt a broader design flow methodology instead of the traditional point-tool approach.


Customers in today’s demanding communications and consumer applications need to attain unprecedented levels of capacity and performance while reducing power consumption and overall cost. With the introduction of high-end devices into the marketplace, more of these applications are being addressed by FPGA solutions.

As professional programmable logic designers, you are always searching for better ways to create value and differentiate your products. To do so effectively, you need to adopt comprehensive, high-productivity design flows instead of point tools to crack new design challenges and take advantage of the benefits of the latest programmable silicon platforms.

Multiple Platforms, Unprecedented Opportunity
With the release of Xilinx® Virtex-4™ devices, you can enjoy twice the density, twice the performance, and half the power consumption of previous Xilinx FPGA families. If you seek sheer DSP performance, you might prefer Virtex-4 SX FPGAs, which offer 256 GigaMAC/s performance for 18-bit operations. The LX family of FPGAs offers higher performance logic; with FX devices, you can explore embedded processing and high-speed serial connectivity applications. These three platforms, comprising a complete selection of 17 devices, collectively offer a compelling alternative to ASICs and ASSPs.
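A peak figure such as 256 GigaMAC/s follows from simple arithmetic: the number of multiply-accumulate units multiplied by the clock rate. The sketch below shows that calculation. The function name `peak_gmacs` is hypothetical, and the example operands in the usage note (512 MAC units at 500 MHz) are illustrative assumptions that do not come from this article.

```cpp
#include <cassert>
#include <cstdint>

// Peak multiply-accumulate throughput in GigaMAC/s, assuming every
// MAC unit completes one operation per clock cycle (a best-case bound).
inline double peak_gmacs(std::uint32_t mac_units, double clock_mhz) {
    assert(clock_mhz > 0.0);
    // units * MHz gives MegaMAC/s; divide by 1000 for GigaMAC/s.
    return mac_units * clock_mhz / 1000.0;
}
```

Under those assumed operands, `peak_gmacs(512, 500.0)` yields 256. Sustained throughput in a real design would depend on how well the algorithm keeps every unit busy.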

To fully exploit this immense potential, design teams must consider moving away from serial, iterative, point-tool approaches that involve designing or re-designing from scratch. To manage non-recurring engineering time and costs and create efficient, reliable flows, you must clearly identify which of the various “building blocks” you need to focus on when using a platform approach to successfully implement a high-end design.

Typical building blocks may include:

  • Intellectual property such as internal company, Xilinx, or third-party IP
  • Lower-level blocks used in the context of a bottom-up design flow
  • Algorithms via C or C++ or MATLAB™
  • RTL blocks
  • Embedded processors
  • I/O interfaces

By using a comprehensive, methodical design flow, you can effectively optimize these blocks in a multimillion-gate device.

As high-end FPGAs approach ASIC-level performance, designers are adapting many advanced ASIC techniques for FPGA design. The complex FPGA design flow shares some commonality with ASIC design; for instance, RTL simulation remains basically unchanged. But certain subtle differences exist under the hood, and many steps are fundamentally different. The pre-built nature of FPGAs implies a “use or lose” approach to features or capabilities, so you must match functional requirements with the device architecture. Thus, common steps such as synthesis or place and route all differ subtly in the FPGA domain.

You can use C++ synthesis techniques borrowed from ASIC flows to target FPGAs. C++ specifications are much less tied to any specific hardware than the corresponding RTL code.
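To make the point about hardware independence concrete, here is a minimal sketch of an untimed C++ description of a small FIR filter. It is plain standard C++, not the input format of any particular synthesis tool; the class name `Fir`, the constant `kTaps`, and the data widths are hypothetical choices for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kTaps = 4;  // hypothetical filter length

// Untimed FIR filter: a delay line plus a sum of products.
// Nothing here names a clock, a pipeline stage, or a number of
// multipliers; those decisions are left to the implementation step.
class Fir {
public:
    explicit Fir(const std::array<std::int16_t, kTaps>& coeffs)
        : coeffs_(coeffs), delay_{} {}

    // Consume one input sample and return one output sample.
    std::int32_t step(std::int16_t sample) {
        // Shift the delay line by one position, oldest sample drops out.
        for (std::size_t i = kTaps - 1; i > 0; --i) {
            delay_[i] = delay_[i - 1];
        }
        delay_[0] = sample;

        // Accumulate coefficient * sample products in a wider type.
        std::int32_t acc = 0;
        for (std::size_t i = 0; i < kTaps; ++i) {
            acc += static_cast<std::int32_t>(coeffs_[i]) * delay_[i];
        }
        return acc;
    }

private:
    std::array<std::int16_t, kTaps> coeffs_;
    std::array<std::int16_t, kTaps> delay_;  // zero-initialized
};
```

Feeding a single 1 followed by zeros returns the coefficients in order, which is a quick sanity check of the behavior. An equivalent RTL description would additionally have to fix the clocking, the register placement, and the degree of parallelism.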

Another technique, physical synthesis, illustrates the subtleties involved when the same general approach is used for both ASICs and FPGAs. Physical synthesis requires a detailed understanding of the FPGA’s hardware structure. At the very least, physical synthesis tools must be more specifically targeted to FPGA architectures.

A typical high-end FPGA design flow should encompass such tasks as:

  • Early design rule checking
  • Higher level design abstraction
  • Functional and system-level simulation and verification
  • Advanced physical synthesis techniques

Let’s describe each of these in more detail.

Integrated Approach to Design Creation
In terms of design entry, the need to create faster, larger, and more complex designs packed into the latest FPGA devices within the shortest possible time presents significant challenges. The high availability of configurable logic in platform FPGAs that include hard ASIC macros – such as embedded processor blocks and complex I/O standards – has truly enabled programmable SoC, where a serialized design approach would not work. Only a system-level RTL design concept, used in parallel with multiple aspects of managing and optimizing the high-level design creation process, will ensure success.

Large design projects mandate the collaboration of several engineers or engineering teams, often belonging to separate companies and typically distributed in different geographic locations worldwide. This team-based approach raises the importance of a consistent design coding style for teams to share code effectively.

Teams invariably comprise experienced project leaders and designers alongside less experienced junior engineers working on the various building blocks of a design. The resulting skill diversity makes the need for consistency critical. It is imperative that companies carefully scrutinize the planning and creation process to identify poor design styles, incorrect design rules, and syntax/semantic errors at the earliest possible stage before even attempting to tie the building blocks together or simulate/synthesize the design.

In bigger designs, it is not unusual for multidisciplinary design teams to focus on and optimize only a portion of the device. As the system is defined in RTL by combining both vendor and internal IP (and for those applications utilizing DSP functionality, RTL generated algorithmically), you will need an integrated system design approach to help synchronize the development of each specific part of a large, high-capacity FPGA.

From the configuration of the embedded processor to logic development and high-speed I/O assignment, the ideal synchronization of these teams and processes is required to deliver an optimized field-programmable SoC. The merging and management of these multiple disciplines to generate the system-level RTL and associated design files is a huge task best handled by a comprehensive and flexible environment.

To reduce development cost and time to market, 80-90% of projects may now include both rework of an existing design and reuse of previously designed components or IP, whether internal or purchased. Because this trend is expected to increase, you need to ensure that your components and subsystems are designed to be reusable and conform to established design reuse rules.

Through cooperative efforts in the design community and internal corporate standardization, the industry has developed a number of reuse methodology guidelines that can be checked using automated tools. Tools such as Mentor Graphics® HDL Designer Series™ (HDS) can help design teams successfully integrate both hard and soft IP (such as PowerPC™ and MicroBlaze™ processors).

Larger designs at higher speeds have prolonged traditional simulation cycles. Similarly, synthesis can become a protracted, iterative process in order to achieve desired performance goals. You need to maximize the productivity of potentially long EDA tool runs by ensuring that as many code errors as possible are found and fixed before the start of simulation and synthesis (Figure 1).

Equally important are integrated connections to advanced tools such as DesignAnalyst™ and Precision® Synthesis from Mentor Graphics to ensure against errors and reduce iterations, as well as integration with any third-party EDA tools through a flexible integration mechanism. Through static design checking or “linting” products, you can perform many different forms of checking during the design creation process.

Interactive HDL visualization and creation tools provide automatic documentation features and reporting as well as intelligent debug and analysis to effectively manage FPGA designs. Moreover, tight bidirectional communications with PCB tools from within the design creation process shorten design cycles by integrating and synchronizing HDL design with PCB design, eliminating time-consuming manual steps.

Higher Abstraction Levels Speed Hardware Design
For the first time, professional design engineers are struggling to keep pace with Moore’s Law, which makes it difficult to fully utilize the capacity of 90 nm ASICs or efficiently target the complex structures found in domain-specific FPGAs. Algorithmic C synthesis (Figure 2) promises to raise the abstraction of hardware design by providing a new, more abstract entry point, benefiting both ASIC and FPGA hardware designers. But to understand the need for higher abstraction languages, you must first analyze the problems with existing RTL methodologies.

The design complexity of new DSP applications has outpaced traditional RTL capabilities. To create hardware implementations for blocks of computationally intensive algorithms using RTL, design teams must iterate through several steps, including micro-architecture definition, handwritten RTL, and area/speed optimization through RTL synthesis. This manual process is slow and error-prone. In the final result, both the micro-architecture and technology characteristics become hard-coded into the RTL description. This hard coding renders the whole notion of RTL reuse or retargeting impractical in real applications.

An optimized C-to-RTL synthesis flow not only promotes a higher level of abstraction, it also gives the design team the flexibility to transition from one implementation technology to another. You can tune the hardware for high-performance parallel implementations or smaller, more serial implementations.
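The trade-off between parallel and serial implementations can be sketched with a first-order model: if a block needs a fixed number of multiplications per output and shares a given number of multipliers, the cycles needed per output is the ceiling of their ratio. The function name `cycles_per_output` is hypothetical, and the model deliberately ignores pipeline latency, memory bandwidth, and control overhead. It is not how any particular tool schedules operations.

```cpp
#include <cassert>
#include <cstdint>

// First-order resource/throughput model for a sum-of-products block.
// taps:        multiplications required to produce one output
// multipliers: hardware multipliers shared among those operations
// Returns the clock cycles needed per output sample.
inline std::uint32_t cycles_per_output(std::uint32_t taps,
                                       std::uint32_t multipliers) {
    assert(taps > 0 && multipliers > 0);
    // Integer ceiling of taps / multipliers.
    return (taps + multipliers - 1) / multipliers;
}
```

For a 64-tap block, 64 multipliers give one output per cycle (fully parallel), 8 multipliers give one output every 8 cycles, and a single multiplier gives one output every 64 cycles (fully serial). The same algorithmic source can in principle be mapped to any of these points.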

Using this approach to describe functional intent (offered in the Mentor Graphics Catapult™ C Synthesis tool), you can move up to a far more productive abstraction level for designing hardware. As hardware designers, you can reduce implementation efforts by as much as 20X while creating a more repeatable and reliable design flow.

The ability to select fundamentally superior micro-architectural alternatives allows you to create designs of better quality than traditional RTL methods. Finally, this approach closes the conceptual gap between algorithm designers modeling in C/C++ and hardware designers working at the RTL abstraction level.

Simulation and Verification Challenges
Using standard RTL verification methods in high-capacity FPGAs quickly diminishes the benefits of faster hardware creation. The current execution speeds of software validation platforms and RTL verification environments are insufficient to quickly test design functionality. Design verification takes significantly longer than design development because of the limited speed of RTL simulators and the time needed to manually create an RTL test bench.

Additionally, C/C++ simulation (although upwards of 10,000X faster than RTL) may be inadequate to validate the original algorithm given the data-intensive nature of DSP designs. These challenges are in fact opportunities for both algorithm development and system validation through the use of accelerated simulation.

High-level design verification flows now address rapid algorithm validation and verification by combining hardware acceleration with a SystemC verification environment. These flows begin with the algorithm designer validating designs in C++ and end with the hardware designer verifying the algorithm in RTL.

This method of using high-level C/C++ synthesis in combination with a SystemC verification environment provides an automated path from algorithm development to synthesized RTL running in an FPGA prototyping environment. Executing the algorithm directly in hardware gives algorithm designers the ability to validate algorithms and hardware designers the ability to validate the entire system at or near real-time speeds.

The use of SystemC as a verification environment permits both algorithm and hardware designers to use the same test bench and test vectors, eliminating the need for manual test bench creation. The combined approach of hardware acceleration of C/C++ algorithms in a SystemC verification environment provides a push-button solution for accelerated algorithm development and system validation.
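The idea of one test bench serving both the algorithm and its implementation can be illustrated in plain C++. This sketch is a stand-in for illustration only and does not use SystemC; the names `Model` and `first_mismatch` are hypothetical. Both the reference algorithm and the implementation under test are treated as callables that receive the same vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Any model that maps one input sample to one output value.
using Model = std::function<std::int32_t(std::int16_t)>;

// Apply the same test vectors to a golden model and a model under
// test. Returns the index of the first disagreement, or -1 if the
// two models agree on every vector.
inline long first_mismatch(const Model& golden, const Model& under_test,
                           const std::vector<std::int16_t>& vectors) {
    for (std::size_t i = 0; i < vectors.size(); ++i) {
        if (golden(vectors[i]) != under_test(vectors[i])) {
            return static_cast<long>(i);
        }
    }
    return -1;
}
```

In use, `golden` would wrap the original algorithm and `under_test` would wrap whatever represents the implementation, for example a bit-accurate model or an interface to a simulator or prototype. Because the vectors are shared, neither team has to write a second test bench by hand.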

Balancing the Cost/Timing Closure Equation
An essential step in realizing a high-capacity FPGA design is to optimize that design for both timing and cost. Timing closure challenges are well known. Using stand-alone logic synthesis with place and route can be non-deterministic by nature, especially for large devices.

Designers tend to write and rewrite RTL code and constraints to try to coax the place and route tool to do their bidding. Once you go down this path, you then must iterate through place and route – the most time-consuming step in FPGA design – before gaining any visibility as to whether your changes were a step in the right direction or if they only served to further exacerbate the problem.
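Whether a change was "a step in the right direction" is usually judged by timing slack: the clock period minus the delay of the longest path. A minimal sketch of that calculation follows; the function name `worst_slack_ns` is hypothetical, and real timing analysis also accounts for setup time, clock skew, and other effects omitted here.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Worst-case slack: clock period minus the longest path delay.
// A negative result means the design does not meet timing.
inline double worst_slack_ns(double clock_period_ns,
                             const std::vector<double>& path_delays_ns) {
    assert(!path_delays_ns.empty());
    const double longest =
        *std::max_element(path_delays_ns.begin(), path_delays_ns.end());
    return clock_period_ns - longest;
}
```

The difficulty described above is that the path delays feeding this calculation are only trustworthy after place and route has run, so each experiment costs a full implementation pass.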

Similar to optimization for timing, the process of achieving true “cost closure” involves a reduction in area to reduce FPGA part cost, or a reduction in the total cost of the design by increasing levels of abstraction and design reuse. The irony is that once you attain a successful implementation, any change – no matter how small – in the design or architecture threatens to obsolete that success. This unpredictability negates the reduced cost and time-to-market benefits of using programmable logic in the first place.

Increasing die sizes place additional burdens on the extant methodologies. A large die poses a significant challenge in obtaining repeatable, high-quality placements out of current placement algorithms. The larger die size is now widening the distribution curve of net delays grouped by fanout, the basis behind industry-accepted wire delay models.

This widened distribution has a degrading effect on the accuracy of fanout-based wire delay models. In larger devices, interconnect delay dominates performance for FPGA platforms. Because fanout-based delay estimates in FPGAs struggle to model even a simplified version of physical reality today, you can see why optimization decisions based on a wire-load estimate are often ineffective. Worse, physical proximity cannot always relate directly to delay, so traditional floorplanning falls painfully short. Advanced physical synthesis techniques can solve these issues in several ways.
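The weakness of a fanout-based estimate can be seen in a small sketch that contrasts it with a placement-aware estimate. All names and coefficients below are hypothetical, chosen only to show the shape of the two models. The Manhattan-distance model is itself a simplification: as noted above, proximity does not always map directly to delay, and real tools rely on the vendor's routing data.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdlib>

struct Pin {
    int x;  // column on the die, in arbitrary grid units
    int y;  // row on the die, in arbitrary grid units
};

// Fanout-based estimate: one delay per net, derived from fanout alone.
// Every sink on the net gets the same number, wherever it is placed.
inline double fanout_delay_ns(std::size_t fanout) {
    return 0.3 + 0.1 * static_cast<double>(fanout);  // hypothetical coefficients
}

// Placement-aware estimate: delay grows with the Manhattan distance
// between the driver and one specific sink.
inline double placed_delay_ns(Pin driver, Pin sink) {
    const int dist = std::abs(driver.x - sink.x) + std::abs(driver.y - sink.y);
    return 0.2 + 0.05 * dist;  // hypothetical coefficients
}
```

For a net with fanout 2, the fanout model reports a single value regardless of placement, while the placement-aware model separates a sink one grid unit away from a sink on the far side of the die. On a larger die, the spread between those two cases grows, which is the widened distribution the text describes.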

First, to improve accuracy and reduce design iterations, you must consider real interconnect delay and physical effects up front (Figure 3); combining logic and physical synthesis is critical for the design of larger, high-performance FPGAs. Some physical synthesis alternatives available today are based solely on technology borrowed from the ASIC implementation space.

In reality, forcing an ASIC methodology – and mentality – on the FPGA world cannot work. Such approaches essentially try to outsmart the vendor placement and may show promise in certain situations, but most cannot match the performance of a tool that leverages the FPGA vendor’s post-layout information to provide accurate physically aware synthesis.

Second, FPGA-oriented physical synthesis solutions need to take into account successful implementation experience that you have previously developed. For instance, when you complete a modular design and have optimized performance for a portion of it using physical synthesis, a good tool must ensure that you can take full advantage of these optimizations and reuse them on subsequent designs.

Physical synthesis in FPGAs is growing beyond the ASIC model to be a valuable part of cost minimization and component reuse strategies. When investing in a synthesis tool with a highly deterministic process for improved results, look for technologies and algorithms that not only optimize designs for cost and timing, but also enable you to translate your professional experience and previous design implementations at the physical level into faster time to market in subsequent designs.

Any tool used in professional FPGA design (including the Precision Synthesis tool from Mentor Graphics) should consider FPGA vendor placement results as soon as possible, and only then begin to manipulate the design using physical synthesis – integrated with logic synthesis in a unified data model – to converge on timing at a lower cost.

From Point Tools to ESL Design Flows
Every designer stands poised to benefit from the new standard set by Virtex-4 high-performance FPGAs. The next-generation challenge faced by mainstream FPGA EDA tool vendors is to leverage point-tool expertise and thus meld apparently contradictory trends – higher levels of abstraction on the one hand and greater dependence on specific physical characteristics on the other – into a coherent design methodology and highly productive flow.

In keeping with these advances, EDA tool companies will continue to extend and improve their comprehensive, integrated design flows spanning all levels of abstraction. Mentor Graphics continues to be a technology leader in this space. Designers must take advantage of EDA tools that now address both physical and electronic system-level (ESL) challenges of high-end FPGAs, and thus realize the unprecedented potential of these devices as ASIC replacements in new SoC designs.

To access the latest product news, application notes, and case studies, evaluate new design flows, or schedule a product demonstration, visit www.mentor.com/fpga/.
