Profiling and Optimization

There are two distinct areas to be considered when performing algorithm optimization in the SDSoC™ Environment:
  • Application code optimization
  • Hardware function optimization

Most application developers are familiar with optimizing software targeted to a CPU. This usually requires programmers to analyze algorithmic complexity, overall system performance, and data locality. Many methodology guides and software tools are available to help the developer identify performance bottlenecks. The same techniques can be applied to the functions targeted for hardware acceleration in the SDSoC Environment.

As a first step, programmers should optimize their overall program performance independently of the final target. The main difference between SDSoC and general-purpose software is that, in SDSoC projects, part of the core compute algorithm is moved onto the FPGA. The developer must therefore also be aware of algorithm concurrency, data transfers, and memory usage, and must keep in mind that the code targets programmable logic.

Generally, the programmer must identify the section of the algorithm to be accelerated and determine how best to keep the hardware accelerator busy while data is transferred to and from it. The primary objective is to reduce the overall computation time of the combined hardware accelerator and data motion network compared with the CPU software-only approach.
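The comparison described above can be sketched as a small timing harness. This is a minimal illustration, not part of the SDSoC tool flow: `mac_sw`, `time_seconds`, and `speedup` are hypothetical names, and the sketch assumes that accelerator compute time and data motion time do not overlap, which is a conservative simplification.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical software-only baseline: element-wise multiply-accumulate.
void mac_sw(const int *a, const int *b, int *out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] += a[i] * b[i];
    }
}

// Wall-clock time of a single call to fn, in seconds.
template <typename F>
double time_seconds(F &&fn) {
    auto start = std::chrono::steady_clock::now();
    fn();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Speedup of the accelerated path over the software-only path.
// Assumption: accelerator compute and data motion are summed (no overlap).
double speedup(double sw_seconds, double hw_compute_seconds,
               double data_motion_seconds) {
    double accelerated = hw_compute_seconds + data_motion_seconds;
    assert(accelerated > 0.0);
    return sw_seconds / accelerated;
}
```

A value above 1.0 from `speedup` means the accelerated path, including its data transfers, is faster than the software-only baseline. If data motion dominates the denominator, the transfers rather than the accelerator are the first thing to optimize.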

Software running on the CPU must efficiently manage the hardware function(s), optimize its data transfers, and perform any necessary pre- or post-processing steps.

The SDSoC Environment is designed to support your efforts to optimize these areas, by generating reports that help you analyze the application and the hardware functions in some detail. The reports are automatically generated when you build the project, and listed in the Reports view of the SDx IDE, as shown in the following figure. Double-click on a listed report to open it.

Figure: Report View

The following figures show the two main reports: the HLS Report and the Data Motion Network Report.

To access these reports from the GUI, ensure the Reports view is visible. This view is typically below the Project Explorer view. You can use the Window > Show View > Other menu command to display the Reports view if it is not displayed. See Working with SDx for more information.

Figure: HLS Report Window

The HLS Report provides details about the High-Level Synthesis (HLS) process. This task translates the C/C++ model into a hardware description language that implements the functionality on the FPGA. The report lets you see the impact of the design on the hardware implementation, so that you can then optimize the hardware function(s) based on this information.
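As an illustration of the kind of change the HLS Report can motivate, the following sketch adds a pipelining directive to a loop in a hardware function. The function `dot_hw` and the length `kLen` are hypothetical, and the initiation interval actually achieved depends on the design and the tool version; the HLS Report is where to check it. A standard C++ compiler ignores the pragma, so the function also runs unchanged in software.

```cpp
#include <cassert>

// Hypothetical fixed length for the example.
constexpr int kLen = 64;

// Hypothetical hardware function: fixed-length dot product.
int dot_hw(const int a[kLen], const int b[kLen]) {
    assert(a != nullptr && b != nullptr);
    int acc = 0;
    for (int i = 0; i < kLen; ++i) {
// Request a pipelined loop that starts one iteration per clock cycle.
#pragma HLS PIPELINE II=1
        acc += a[i] * b[i];
    }
    return acc;
}
```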

Figure: Data Motion Network Report

The Data Motion Network Report describes the hardware/software connectivity for each hardware function. Reading from the rightmost column to the leftmost, the Data Motion Network table shows which datamover transports each hardware function argument and to which system port that datamover is attached. The Pragmas column shows any SDS-based pragmas used for the hardware function.

The Accelerator Callsites table shows the following:
  • The Accelerator instance name.
  • The Accelerator argument.
  • The name of the port on the IP that pertains to the accelerator argument. Typically this is the same as the argument name, unless multiple arguments are bundled into a single port.
  • The direction of transfer.
  • The size, in bytes, of the data to be transferred. If the compiler cannot deduce the size to be transferred, this value is set to zero.
  • A list of all pragmas related to this argument.
  • <system port>:<datamover>, if applicable. Indicates which platform port and which datamover will be used for transport of this argument.
  • Estimated CPU cycle times for configuration of data movers, and transfer of data.

Generally, the Data Motion Network Report indicates the following, in order:
  • What characteristics are specified in pragmas.
  • In the absence of a pragma, what the compiler was able to infer.

The distinction matters because the compiler might not be able to deduce certain program properties, the most important of which is cacheability. If the Data Motion Network Report indicates that data is cacheable when it is in fact uncacheable (or vice versa), correct cache behavior still occurs at runtime; it is not necessary to structure your program so that the compiler can identify data as uncacheable in order to remove flushes.
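Where the compiler cannot infer a property, a pragma can state it explicitly. The following is a minimal sketch under stated assumptions: `scale_hw` and its arguments are hypothetical, the pragma spellings should be checked against the pragma reference for your SDSoC release, and the `NON_CACHEABLE` attribute assumes the caller really allocates the buffers as uncacheable, physically contiguous memory. A standard C++ compiler ignores these pragmas, so the function also runs unchanged in software.

```cpp
#include <cassert>

// The pragmas apply to the arguments of the function that follows.
// copy: declares how many elements of each array are transferred.
// mem_attribute: states memory properties the compiler cannot infer.
#pragma SDS data copy(in[0:len], out[0:len])
#pragma SDS data mem_attribute(in:PHYSICAL_CONTIGUOUS|NON_CACHEABLE, out:PHYSICAL_CONTIGUOUS|NON_CACHEABLE)
void scale_hw(int *in, int *out, int len, int factor) {
    assert(len >= 0);
    for (int i = 0; i < len; ++i) {
        out[i] = in[i] * factor;
    }
}
```

With pragmas like these in place, the pragma entries in the Data Motion Network Report should reflect the declared attributes rather than values inferred by the compiler.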

Additional details for each report, as well as a profiling and optimization methodology and coding guidelines, can be found in the SDSoC Profiling and Optimization Guide.