Build Targets

The build target of the Vitis™ tool defines the nature and contents of the FPGA binary (.xclbin) created during compilation and linking. There are three build targets: two emulation targets (software emulation and hardware emulation), used for validation and debugging, and the default system hardware target, used to generate the FPGA binary (.xclbin) that is loaded into the Xilinx® device.

Compiling for an emulation target is significantly faster than compiling for the real hardware. The emulation run is performed in a simulation environment, which offers enhanced debug visibility and does not require an actual accelerator card.

Table 1. Comparison of Emulation Flows with Hardware Execution

Software Emulation | Hardware Emulation | Hardware Execution
Host application runs with a C/C++ or OpenCL™ model of the kernels. | Host application runs with a simulated RTL model of the kernels. | Host application runs with actual hardware implementation of the kernels.
Used to confirm functional correctness of the system. | Test the host / kernel integration, get performance estimates. | Confirm that the system runs correctly and with desired performance.
Fastest build time supports quick design iterations. | Best debug capabilities, moderate compilation time with increased visibility of the kernels. | Final FPGA implementation, long build time with accurate (actual) performance results.
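
The flows compared above also differ when the host application is run, not only when the binary is built. The sketch below assumes an XRT-based setup in which the XCL_EMULATION_MODE environment variable selects emulation at run time; that variable is not described in this section, and select_run_mode is a hypothetical helper name used only for illustration.

```shell
#!/bin/sh
# select_run_mode: hypothetical helper, not part of the Vitis tools.
# Assumption: XRT reads XCL_EMULATION_MODE to decide whether the host
# program talks to an emulation model or to the real device.
select_run_mode() {
    case "$1" in
        sw_emu|hw_emu)
            # Emulation run: no accelerator card is required.
            export XCL_EMULATION_MODE="$1"
            ;;
        hw)
            # Hardware run: the variable must not be set.
            unset XCL_EMULATION_MODE
            ;;
        *)
            echo "unknown build target: $1" >&2
            return 1
            ;;
    esac
}
```

Calling select_run_mode hw_emu before launching the host program would configure the current shell for a hardware emulation run; calling it with hw clears the setting again.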

Software Emulation

The main goal of software emulation (sw_emu) is to ensure functional correctness of the host program and kernels. Software emulation provides a purely functional execution, without any modeling of timing delays or latency, so it does not give any indication of accelerator performance.

The kernel code is always compiled and run natively. The application code is either:

  • Compiled and run natively on an x86 processor (Data Center platforms)
  • Cross-compiled for the Arm® processor and run in an emulator (Embedded platforms)

Thus, software emulation is typically used for algorithm refinement, debugging functional issues, and letting developers iterate quickly through the code to make improvements. The software programming model of fast compilation and run iterations is preserved.

The v++ compiler performs only the minimum transformation of the kernel code needed to create an FPGA binary that lets the host program and kernel code run together. Software emulation takes the C-based kernel code and compiles it with GCC. It runs each kernel as a separate C-thread. If there are multiple compute units (CUs) of a single kernel, each CU is run as a separate thread. In this way, software emulation mimics the parallel execution model of the hardware. Within each kernel, however, execution is modeled sequentially, even though the kernel might execute with internal parallelism when running on hardware. The software emulation driver implements the XRT API and acts as a bridge between the user application running XRT and the device process modeling the hardware components.
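
The execution model described above can be illustrated with a loose analogy. The sketch below is not how the emulator is implemented: it uses background shell jobs to stand in for the per-CU threads, and a hypothetical kernel_sum function whose loop runs strictly in order, as a kernel body does in software emulation. All names and the data ranges are assumptions made for illustration.

```shell
#!/bin/sh
# kernel_sum: hypothetical kernel body. It adds the integers from $1 to $2
# one step at a time, so the work inside a single CU is strictly sequential.
kernel_sum() {
    lo=$1
    hi=$2
    total=0
    i=$lo
    while [ "$i" -le "$hi" ]; do
        total=$((total + i))
        i=$((i + 1))
    done
    echo "$total"
}

# run_compute_units: starts two CUs of the same kernel concurrently, each on
# its own slice of the data, then combines their results.
run_compute_units() {
    tmpdir=$(mktemp -d)
    kernel_sum 1 50   > "$tmpdir/cu0" &   # CU 0 runs in the background
    kernel_sum 51 100 > "$tmpdir/cu1" &   # CU 1 runs alongside CU 0
    wait                                  # block until both CUs finish
    echo $(( $(cat "$tmpdir/cu0") + $(cat "$tmpdir/cu1") ))
    rm -rf "$tmpdir"
}
```

The two CUs overlap in time, but each one steps through its loop in order. On hardware, the loop inside a kernel could itself be pipelined or unrolled, which is the parallelism that software emulation does not model.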

TIP: For RTL kernels, software emulation can be supported if a C model is associated with the kernel. The RTL Kernel Development Flow provides an option to associate C model files with the RTL kernel for support of software emulation flows.

The following describes the software emulation limitations:

  • Global memory is limited to 16 GB; the design must not exceed this limit during software emulation.
  • Software emulation is not supported for AI Engine kernels.
  • Software emulation does not support AXI4-Stream interfaces without side-channels (see Vitis High-Level Synthesis User Guide (UG1399)).

As discussed in Vitis Compiler Command, the software emulation target is specified in the v++ command with the -t option:

v++ -t sw_emu ...

You can use the GDB debugger on both the host application and the kernel code, set breakpoints, or use printf() to print information and checkpoints. For details on how to debug the host application or the kernel during software emulation, refer to Debugging in Software Emulation.

Hardware Emulation

Hardware emulation runs an RTL simulation of the programmable logic design, where the PL kernels are integrated with a cycle-approximate model of the hardware platform.

Hardware emulation is especially useful for the following tasks:

  • Checking the functional correctness of the RTL code synthesized from the C, C++, or OpenCL kernel code
  • Testing the interactions between different kernels or multiple CUs
  • Using hardware waveforms to gain detailed visibility into internal activity of the kernels
  • Getting initial performance estimates for the application

Each kernel is compiled to a hardware model (RTL). During hardware emulation, kernels are run in the Vivado logic simulator, with a waveform viewer to examine the kernel design. Some third-party simulators are also supported as described in RTL Simulator Support. In addition, hardware emulation provides performance and resource estimates for the hardware implementation.

SystemC models are provided for the key IP used in the hardware platform, like Versal NoC/DDR memory, CIPS, PS block, AI Engine, UltraScale+ MIG DDR memory, and AXI4 SmartConnect. These IP models are used during hardware emulation to improve simulation performance and results.

In hardware emulation, compile and execution times are longer than in software emulation, but hardware emulation provides a detailed, cycle-accurate view of kernel activity. Xilinx recommends using small data sets for validation during hardware emulation to keep run times manageable.

IMPORTANT: The DDR memory model and the memory interface generator (MIG) model used in hardware emulation are high-level simulation models. These models provide good simulation performance, but they give only approximate latency values and, unlike the kernels, are not cycle-accurate. Therefore, performance numbers shown in the profile summary report are approximate, and should be used for guidance and for comparing relative performance between different kernel implementations.

As discussed in Vitis Compiler Command, the hardware emulation target is specified in the v++ command with the -t option:

v++ -t hw_emu ...

System Hardware Target

When the build target is the hardware, v++ builds the FPGA binary for the Xilinx device by running Vivado synthesis and implementation on the design. This build target normally takes longer than building either the software or hardware emulation target in the Vitis IDE. However, the final FPGA binary can be loaded into the hardware of the accelerator card, or embedded processor platform, and the application can be run in its actual operating environment.

As discussed in Vitis Compiler Command, the system hardware target is specified in the v++ command with the -t option:

v++ -t hw ...
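
Because the three targets differ only in the value passed to -t, a build script can take the target as a parameter. The following is a minimal sketch under that assumption: vpp_command is a hypothetical helper that validates the target name and prints the resulting v++ command line as a dry run, without invoking the compiler. Any remaining arguments are passed through unchanged, standing in for the options elided as "..." above.

```shell
#!/bin/sh
# vpp_command: hypothetical dry-run helper, not part of the Vitis tools.
# Usage: vpp_command [target] [other v++ arguments...]
vpp_command() {
    # hw is the default build target, per the text above.
    target=${1:-hw}
    if [ "$#" -gt 0 ]; then
        shift
    fi

    # Accept only the three build targets described in this section.
    case "$target" in
        sw_emu|hw_emu|hw) ;;
        *)
            echo "unknown build target: $target" >&2
            return 1
            ;;
    esac

    # Print the command instead of running it.
    if [ "$#" -gt 0 ]; then
        echo "v++ -t $target $*"
    else
        echo "v++ -t $target"
    fi
}
```

Rejecting unknown target names before the compiler is invoked catches typos such as hwemu early, which matters most for the hardware target, where a build is long.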