Xcell Journal Online
  Xcell Journal Archives
   
  Writing for Xcell
  Advertising in Xcell
  FREE Subscription
   
  Partner Yellow Pages
  Reference Pages
  Contact Us

    

Home : Documentation : Xcell Journal Online : Article
Hardware/Software Co-Verification



by Ross Nelson, Seamless FPGA Product Manager, Mentor Graphics Corporation
ross_nelson@mentor.com (7/15/05)


Gain full visibility into your software and hardware – and achieve a faster design iteration loop in the process.
article link to PDF
Article PDF 275 KB


You’ve probably been there: clever detective work leads you to a small change in the HDL for your embedded processor-based design. Now you just have to run synthesis, place and route, and darn ... you suddenly realize it will be another day before you can see the result.

Large devices allow you to stuff a whole system into the FPGA, but debugging these complex systems with limited visibility – and a one-day turnaround – can consume weeks of your precious time.

Hardware/software co-verification has been successfully applied to complex ASIC designs for years. Now available to FPGA designers, Seamless FPGA from Mentor Graphics brings together the debug productivity of both a logic simulator and a software debugger. Seamless FPGA co-verification enables you to remove synthesis and place and route from the design iteration loop, while yielding performance gains 1,000 times faster than logic simulation.

Shortening the Design Iteration Loop
Because development boards are readily available, many FPGA designers incorporate them into the highly iterative design loop. Unfortunately, the development board brings major overhead to every design iteration. This overhead comes in the form of logic synthesis, followed by place and route. Although necessary to produce a final design, you can remove these time-consuming steps from the highly iterative design debug loop by targeting simulation as the verification platform.

With simulation as the verification engine, the only overhead between editing the HDL and verification becomes a relatively quick compile of your HDL. The time you can save on your next embedded FPGA is easy to calculate: How many times did you run place and route on your last FPGA design? And how long did place and route consume your PC for each run?

It’s true that simulation runs slower than the real-time speed of a development board. Seamless FPGA provides some innovative ways to dramatically increase the rate at which your embedded software simulates. The increase in a typical system is several orders of magnitude.

Improving Hardware and Software Visibility
To debug your FPGA design, you need full and clear visibility. You need to know what is happening in the hardware and what the software is doing. You need to be able to change a register, or force a signal to a different state. Sometimes you need to be able to stop time and take a closer look. The more visibility you have, the more quickly you can see the problem or prove you have resolved the bug.

Hardware Visibility
Probing inside or even on the pins of your FPGA is a challenge. The ChipScope™ Pro analyzer from Xilinx® helps with this, but in a logic simulator (in addition to viewing every signal) you can also change their values. Working from your source HDL, you can step through the code, view variables, or stop time. For detailed, immediate, and hassle-free visibility, it is hard to beat logic simulation.

Software Visibility
Software visibility in logic simulation is another item with which to contend. Running the fully functional processor model allows you to execute software, but knowing what is in R3 of the processor is almost impossible if you are given only waveforms.

Co-verification provides an enhanced processor model connected to a software debugger. In the Mentor Graphics XRAY debugger, you can view and change everything from registers to memory, stack, and variables. XRAY also provides a source code view with symbolic debug. You can step through code at the source or assembly level and use breakpoints to halt execution or run powerful macros.

If you are using the Accelerated Technology Nucleus real-time operating system (RTOS), you can view the status of tasks, mailboxes, queues, pipes, signals, events, semaphores, and the memory pool.

Much Faster Than Logic Simulation Alone
Running substantial amounts of software on a standard processor model in logic simulation is not practical; the run times are just too long. However, running this software actually turns out to be one of the most effective verification strategies available. The payoff for running diagnostics, device drivers, board support package (BSP) code, booting the RTOS, and running lowlevel application code is huge. It is not surprising that verifying hardware – by putting it through its paces the way the software will actually use it – is effective. Similarly, the software is tested against the actual design (including any external board-level components that are included in the simulation) before the board is actually built.

The challenge has always been to run enough software to really boot the system and do something interesting. Co-verification is able to speed up the run time by taking advantage of one simple observation: most of the simulation time is spent re-validating the same processor-to-memory path. Although you need to test your memory subsystem and try several dozen corner cases, you don’t need to repeat those same tests over again every time you fetch an instruction from memory. Similarly, you need to verify that the processor can push a value on the stack and pop it off again with the correct result, but repeating this test every time a software function is called would be overkill.

Accesses to hardware peripherals always generate bus cycles in the logic simulation, but instruction fetches and stack operations can typically be offloaded for faster execution. By allowing you to specify which bus cycles are run in the logic simulator and which are not, Seamless FPGA allows you to make the performance tradeoff. And you can change this specification at any time during your simulation session. You can run through reset with full cycle-accurate behavior, and then switch off instruction fetches and stack accesses to boot the RTOS.

Accessing memory through the logic simulator requires several hardware clock cycles. Each clock cycle requires significant work in the logic simulator as it drags along the heavy weight of all the other logic in your FPGA. Using a “back door” to directly access the memory contents, instead of running the bus cycle in the logic simulator, allows accesses to occur many orders of magnitude faster.

The speedup is very significant. For example, the following data is from a typical design configuration with a PowerPC™ running Nucleus on the Xilinx Virtex™-II Pro FPGA. Booting the Nucleus RTOS in logic simulation alone requires 12 hours and 13 minutes. The same task with these techniques employed accomplishes the task in only six seconds – 7,330 times faster.

Using this technique, Seamless FPGA maintains one coherent view of memory contents through a back door into Xilinx block RAM memory models or any other memory device. So if your DMA controller drops something into memory that the processor later executes, it will still all work together correctly. And if the processor generates a large data packet and instructs hardware to transmit it using DMA, there are no data inconsistencies.

Identifying Processor Bus Bottlenecks
The performance of your FPGA platform can be seriously impacted by the memory structure of the design. What should be located in cache versus block RAM or external memory? Where are the bottlenecks? Do other bus masters starve the processor? Questions like these are important, but getting the answers can be difficult without real data from your hardware/software application.

Seamless FPGA gathers performance data from the simulation and displays it graphically in the system profiler (Figure 1), enabling you to identify:

  • Which functions are consuming most of the CPU time
  • Unexpected lulls or bursts of activity
  • Cache efficiency and memory hot spots
  • Code execution and duration at the function level
  • Bus utilization and bus master contention
Ease of Use and Integration
Seamless FPGA is easy to use and set up. Using the knowledge you have already entered in Xilinx Platform Studio (XPS), Seamless FPGA automatically configures itself to co-verify your design. You may already know how to use ModelSim, and Seamless FPGA leaves the full functionality and user interface unchanged. The XRAY software debugger uses many of the same menu icons for operations like step, step over, and run.

To set up Seamless FPGA, simply choose File > Import from Xilinx Platform Studio and specify your XPS project file. The import process does all of the setup steps and in about one minute proceeds to invoke ModelSim and the XRAY debugger. If you have two or more Xilinx processors in your design, you will have additional software debugger windows, one for each processor.

Once ModelSim and XRAY have been invoked (Figure 1), you are ready to verify your design. In ModelSim, enter any stimulus commands needed – typically this is reset and clock, plus any design-specific stimulus – and then click “run.” In XRAY, click “go” or “step” to start stepping through your embedded code. By default, all bus cycles are routed to the hardware simulation.

To increase software execution speed, three icon selections are provided. These icons are labeled “optimizations” because they increase the rate of software execution by directing Seamless FPGA to access memory contents through a back door without requiring the logic simulator to run every bus cycle. The first button directs all instruction fetch cycles to use the back door. A second button allows you to specify any number of address ranges, which use the back door. When accesses use the back door, you can either choose to keep advancing the logic simulation in lock step with the software or remove that requirement.

The optimization settings can be changed at any time on the fly during a simulation session. This allows you to quickly run to a certain point in your software, and then enable all bus cycles for detailed cycle-accurate verification.

Conclusion
With large FPGA designs employing embedded processors, it’s not possible to complete a design in a few weeks. These designs are very sophisticated; unfortunately, so are the bugs that you must track down and resolve to produce an effective system on schedule.

Software content in your FPGA can bring lower system costs, higher configurability, and increased functionality. But software doesn’t execute alone – it interfaces with hardware, and the hardware/software interface often stretches across disciplines and design teams.

Seamless FPGA bridges the hardware/software gap with a productive software and hardware debug environment that provides the visibility to find bugs and performance bottlenecks efficiently. And once you have fixed them, you can quickly turn the fix and verify it, without having to wait for your PC to rumble through place and route for hours on end.

Try Seamless FPGA on your design today. For your free 30-day evaluation copy, visit www.seamlessfpga.com. The included example design and Quick Start Guide will get you up and running in no time. For more information, e-mail seamless_fpga@mentor.com.

Printable PDF version of this article with graphics. PDF logo (7/15/05) 275 KB

 
Jobs Events Webcasts News Investors Feedback Legal Privacy Trademarks Sitemap
© 1994-2008 Xilinx, Inc. All Rights Reserved.