Analyzing the Results of Synthesis

After synthesis completes, Vitis HLS automatically creates synthesis reports to help you understand the performance of the implementation. Examples of these reports include the Simplified Synthesis Report, Schedule Viewer, and Dataflow Viewer. You can view these reports from the Analysis perspective in the Vitis HLS GUI.

Open the Analysis perspective by clicking Analysis in the upper right corner of the Vitis HLS GUI. The Analysis perspective lets you view different elements of your project so you can evaluate the results of synthesis and the performance of your current solution.

By default, the Analysis perspective opens with the Schedule Viewer displayed. As shown in the following figure, the Analysis perspective includes multiple windows and views:
  • Schedule Viewer: Shows each operation and control step of the function, and the clock cycle that it executes in.
  • Module Hierarchy: Shows the function hierarchy and the performance characteristics of the current hierarchy.
  • Performance Profile: Shows the loops from the top-level function without any performance information.
  • Resource Profile: Shows the resource usage of different elements of the synthesized function.
  • Properties view: Shows the properties of the currently selected control step or operation in the Schedule Viewer.
Figure 1: Analysis Perspective
The Module Hierarchy view provides an overview of the entire RTL design. You can use this view to quickly navigate the hierarchy of the RTL design.
  • The Module Hierarchy view shows the resources and latency contribution for each block in the RTL hierarchy.
  • The Module Hierarchy view directly indicates any II or timing violations. For timing violations, it also shows the total negative slack observed in the affected module.
The Performance Profile view provides details on the performance of the block currently selected in the Module Hierarchy view.
  • Performance is measured in terms of latency and the initiation interval.
  • This view also includes details on whether the block was pipelined or not.

The Resource Profile view shows the resources used at the selected level of hierarchy and the control state of the operations used.

Vitis HLS Synthesis Reports

When synthesis completes, Vitis HLS generates a Simplified Synthesis report for the top-level function that opens automatically in the information pane, and a more complete Synthesis Report that can be found in the solution/syn/report folder in the Explorer view.

The Simplified Synthesis report is intended to provide a quick summary of results. It includes elements of the complete Synthesis report, such as General Information, and a Performance and Resource Estimates section that combines elements of the Performance Estimates and Utilization Estimates described in the following table.

The complete Synthesis report provides details on both the performance and resource utilization of the RTL design with the following sections:

Table 1. Synthesis Report Categories

General Information

Details on when the results were generated, the version of the software used, the project name, the solution name, and the technology details.

Performance Estimates > Timing

The target clock frequency, clock uncertainty, and the estimate of the fastest achievable clock frequency.

Performance Estimates > Latency > Summary

Reports the latency and initiation interval for this block and any sub-blocks instantiated in this block.

Each sub-function called at this level in the C source is an instance in this RTL block, unless it was inlined.

The latency is the number of cycles it takes to produce the output. The initiation interval is the number of clock cycles before new inputs can be applied.

In the absence of any PIPELINE directives, the latency is one cycle less than the initiation interval (the next input is read when the final output is written).

Performance Estimates > Latency > Detail

The latency and initiation interval for the instances (sub-functions) and loops in this block. If any loops contain sub-loops, the loop hierarchy is shown.

The min and max latency values indicate the latency to execute all iterations of the loop. The presence of conditional branches in the code might make the min and max different.

The Iteration Latency is the latency for a single iteration of the loop.

If the loop has a variable latency, the latency values cannot be determined and are shown as a question mark (?). See the text after this table.

Any specified target initiation interval is shown beside the actual initiation interval achieved.

The tripcount shows the total number of loop iterations.

Utilization Estimates > Summary

This part of the report shows the resources (LUTs, flip-flops, DSPs) used to implement the design.

Some Xilinx devices using stacked silicon interconnect (SSI) technology divide the available resources over multiple super logic regions (SLRs). In this case, the Summary table includes both total available resources, and the per-SLR resources.

IMPORTANT: When targeting an SSI device, the RTL logic created by Vitis HLS must fit within a single SLR.

Utilization Estimates > Details > Instance

The resources specified here are used by the sub-blocks instantiated at this level of the hierarchy.

If the design has no RTL hierarchy, no instances are reported.

If any instances are present, clicking on the name of the instance opens the synthesis report for that instance.

Utilization Estimates > Details > Memory

The resources listed here are those used in the implementation of memories at this level of the hierarchy.

Vitis HLS reports a single-port BRAM as using one bank of memory and reports a dual-port BRAM as using two banks of memory.

Utilization Estimates > Details > FIFO

The resources listed here are those used to implement FIFOs at this level of the hierarchy.

Utilization Estimates > Details > Shift Register

A summary of all shift registers mapped into Xilinx SRL components.

Additional mapping into SRL components can occur during RTL synthesis.

Utilization Estimates > Details > Expressions

This category shows the resources used by any expressions such as multipliers, adders, and comparators at the current level of hierarchy.

The bit-widths of the input ports to the expressions are shown.

Utilization Estimates > Details > Multiplexors

This section of the report shows the resources used to implement multiplexors at this level of hierarchy.

The input widths of the multiplexors are shown.

Utilization Estimates > Details > Register

A list of all registers at this level of hierarchy, including their bit-widths.

Interface Summary > Interface

This section shows how the function arguments have been synthesized into RTL ports.

The RTL port names are grouped with their protocol and source object: these are the RTL ports created when that source object is synthesized with the stated I/O protocol.
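The loop entries in the Latency > Detail section (latency, iteration latency, II, and tripcount) relate to each other in a predictable way. The following sketch estimates total loop latency from those values. It is an approximation that assumes a fixed iteration latency and ignores any extra cycles for loop entry and exit, so an actual report can differ slightly; the function name loop_latency_estimate is hypothetical.

```c
/* Approximate total latency of a loop, in clock cycles.
 * Assumes every iteration takes iteration_latency cycles and ignores
 * loop entry/exit overhead, so real Vitis HLS reports may differ. */
long loop_latency_estimate(long tripcount, long iteration_latency,
                           long ii, int pipelined)
{
    if (tripcount <= 0)
        return 0;
    if (!pipelined)
        /* Each iteration starts only after the previous one finishes. */
        return tripcount * iteration_latency;
    /* A new iteration starts every ii cycles; the last iteration still
     * needs its full iteration latency to complete. */
    return (tripcount - 1) * ii + iteration_latency;
}
```

For example, 100 iterations with a 3-cycle iteration latency take about 300 cycles when the loop is not pipelined, and about 102 cycles when it is pipelined with II=1.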

When latency values in the Synthesis report are displayed as a question mark (?), Vitis HLS could not determine the number of loop iterations. This occurs when the latency or throughput of the design depends on a loop with a variable bound, so the latency of that loop is reported as unknown.

In the following example, the number of iterations of the for-loop is determined by the input num_samples. The value of num_samples is not defined in the C function; it is passed into the function from outside.

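void foo (int num_samples, const int a[], const int b[], int result[]) {
   int i;
   // a, b, and result are illustrative; num_samples comes from the caller,
   // so the number of loop iterations is unknown at synthesis time.
   loop_1: for(i=0;i< num_samples;i++) {
     result[i] = a[i] + b[i];
   }
}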

In this case, you can use the LOOP_TRIPCOUNT pragma or the set_directive_loop_tripcount command to manually specify the number of loop iterations. The tripcount value does not impact the results of synthesis; it is only used to ensure the generated reports show meaningful ranges for latency and interval. This also allows a meaningful comparison between different solutions.
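As a minimal sketch, the loop from the example above can be annotated with the loop tripcount pragma (#pragma HLS LOOP_TRIPCOUNT). The function name foo_tripcount, the array arguments, and the min, max, and avg values are assumptions chosen for illustration; they change only the latency ranges in the reports, not the generated hardware. A regular C compiler ignores the pragma, so the function still runs in a C testbench.

```c
/* Same loop as foo, annotated with a trip count for reporting only.
 * The min/max/avg values below are hypothetical. */
void foo_tripcount(int num_samples, const int a[], const int b[],
                   int result[])
{
    int i;
    loop_1: for (i = 0; i < num_samples; i++) {
#pragma HLS LOOP_TRIPCOUNT min=1 max=1024 avg=512
        result[i] = a[i] + b[i];
    }
}
```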

TIP: If the C assert macro is used in the code, Vitis HLS can use it to both determine the loop limits for reporting, and create hardware that is exactly sized to these limits.
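The TIP above can be sketched as follows, assuming a hypothetical upper bound MAX_SAMPLES of 1024 and a hypothetical function name foo_assert. The assert statement declares that num_samples never exceeds MAX_SAMPLES, which gives the loop limit a known maximum; in C simulation, it also checks the bound at run time.

```c
#include <assert.h>

/* Hypothetical upper bound on num_samples, chosen for illustration. */
#define MAX_SAMPLES 1024

void foo_assert(int num_samples, const int a[], const int b[], int result[])
{
    int i;
    /* States the maximum loop limit; in C simulation this also
     * aborts if a caller passes a larger num_samples. */
    assert(num_samples <= MAX_SAMPLES);
    loop_1: for (i = 0; i < num_samples; i++) {
        result[i] = a[i] + b[i];
    }
}
```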

Schedule Viewer

The Schedule Viewer provides a detailed view of the synthesized RTL, showing each operation and control step of the function, and the clock cycle that it executes in. It helps you to identify any loop dependencies that are preventing parallelism, timing violations, and data dependencies.

The Schedule Viewer is displayed by default in the Analysis perspective. You can open it from the Module Hierarchy window by right-clicking on a module and selecting Open Schedule Viewer from the menu.

In the Schedule Viewer,
  • The left vertical axis shows the names of operations and loops in the RTL hierarchy. Operations are in topological order, implying that an operation on line n can only be driven by operations from a previous line, and will only drive an operation in a later line.
  • The top horizontal axis shows the clock cycles in consecutive order.
  • The vertical dashed line in each clock cycle shows the reserved portion of the clock period due to clock uncertainty. This time is left by the tool for the Vivado back-end processes, like place and route.
  • Each operation is shown as a gray box in the table. The box is horizontally sized according to the delay of the operation as a percentage of the total clock cycle. For function calls, the cycle information shown is equivalent to the operation latency.
  • Multi-cycle operations are shown as gray boxes with a horizontal line through the center of the box.
  • The Schedule Viewer also displays general operator data dependencies as solid blue lines. As shown in the figure below, when selecting an operation you can see solid blue arrows highlighting the specific operator dependencies. This gives you the ability to perform detailed analysis of data dependencies. The green dotted line indicates an inter-iteration data dependency.
  • Memory dependencies are displayed using golden lines.
  • In addition, lines of source code are associated with each operation in the Schedule Viewer report. Right-click an operation and use the Goto Source command to open the source code associated with it.
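To illustrate the kind of dependency the Schedule Viewer highlights, the following sketch contrasts two pipelined loops. In scale, each iteration is independent of the others. In accumulate, each addition needs the acc value from the previous iteration, which is an inter-iteration dependency. When the dependent operation takes several cycles, as a double-precision addition typically does, such a dependency can prevent the loop from reaching II=1. The function names, ARRAY_SIZE, and the PIPELINE settings are illustrative assumptions.

```c
#define ARRAY_SIZE 64  /* hypothetical array size */

/* No inter-iteration dependency: y[i] depends only on x[i], so each
 * iteration can start before the previous one has finished. */
void scale(const int x[ARRAY_SIZE], int y[ARRAY_SIZE])
{
    int i;
    scale_loop: for (i = 0; i < ARRAY_SIZE; i++) {
#pragma HLS PIPELINE II=1
        y[i] = 2 * x[i];
    }
}

/* Inter-iteration dependency: each addition reads the acc value written
 * by the previous iteration, so a multi-cycle add can force II > 1. */
double accumulate(const double x[ARRAY_SIZE])
{
    double acc = 0.0;
    int i;
    acc_loop: for (i = 0; i < ARRAY_SIZE; i++) {
#pragma HLS PIPELINE II=1
        acc += x[i];
    }
    return acc;
}
```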

In the figure below, the loop called RD_Loop_Row is selected. This is a pipelined loop, and the initiation interval (II) is explicitly stated in the loop bar. Any pipelined loop is visualized unfolded, meaning one full iteration is shown in the Schedule Viewer. Overlap, as defined by II, is marked by a thick clock boundary on the loop marker.

The total latency of a single iteration is equivalent to the number of cycles covered by the loop marker. In this case, it is three cycles.
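Because the Schedule Viewer shows only one unfolded iteration of a pipelined loop, it can help to model the overlap separately. The following hypothetical helper assumes that iteration k starts at cycle k * II and then lasts a fixed number of cycles, and counts how many iterations are active in a given clock cycle. With the three-cycle iteration latency from this example and II=1, three iterations overlap in steady state; with II=3, iterations never overlap.

```c
/* Hypothetical model of a pipelined loop schedule.
 * Iteration k occupies cycles [k * ii, k * ii + iter_latency).
 * Returns the number of iterations in flight during `cycle`. */
int iterations_in_flight(int cycle, int tripcount, int ii, int iter_latency)
{
    int k;
    int count = 0;
    for (k = 0; k < tripcount; k++) {
        int start = k * ii;
        if (cycle >= start && cycle < start + iter_latency)
            count++;
    }
    return count;
}
```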

Figure 2: Schedule Viewer
The Schedule Viewer displays a menu bar at the top right of the report that includes the following features:
  • A drop-down menu, initially labeled Focus Off, that lets you specify operations or events in the report to select
  • A text search field to search for specific operations or steps, and commands to Scroll Up or Scroll Down through the list of objects that match your search text
  • Zoom In, Zoom Out, and Zoom Fit commands
  • The Filter command lets you dynamically filter the operations that are displayed in the viewer. You can filter operations by type, or by clustered operations.
    • Filtering by type allows you to limit which operations are shown based on their functionality. For example, showing only adders, multipliers, and function calls hides all of the small operations such as AND and OR.
    • Filtering by clusters exploits the fact that the scheduler is able to group basic operations and then schedule them as one component. The cluster filter setting can be enabled to color the clusters or even collapse them into one large operation in the viewer. This allows a more concise view of the schedule.
Figure 3: Operation Causing Violation

You can quickly locate II violations using the drop-down menu in the Schedule Viewer, as shown in the figure above. You can also locate them through the context menu in the Module Hierarchy view.

To locate the operations causing the violation in the source code, right-click the operation and use the Goto Source command, or double-click the operation to open the source viewer at the location of the object in the source.

Timing violations can also be quickly found from the Module Hierarchy view context menu, or by using the drop-down menu in the Schedule Viewer. A timing violation is a path of operations requiring more time than the available clock cycle. To visualize this, the problematic operation is shown as a red box in the Schedule Viewer.

By default, all dependencies (blue lines) between the operations in the critical timing path are shown.
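A simplified way to reason about a timing violation is to compare the total delay of the operations chained in one clock cycle with the time left after the clock uncertainty margin. The following sketch does only that arithmetic; the function name and the delay values are hypothetical inputs, whereas Vitis HLS uses its own operator delay estimates. A negative result corresponds to the negative slack reported in the Module Hierarchy view.

```c
/* Simplified slack for a chain of operations scheduled in one clock
 * cycle. All values are in nanoseconds; delays are hypothetical. */
double path_slack_ns(const double delays_ns[], int n_ops,
                     double clock_period_ns, double clock_uncertainty_ns)
{
    /* Part of the period is reserved for clock uncertainty. */
    double available = clock_period_ns - clock_uncertainty_ns;
    double total = 0.0;
    int i;
    for (i = 0; i < n_ops; i++)
        total += delays_ns[i];
    /* Negative slack: the path needs more time than the cycle leaves. */
    return available - total;
}
```

For a 10 ns clock with 2 ns of uncertainty, a chain of 2.0, 3.0, and 2.5 ns operations leaves 0.5 ns of slack, while a chain of 4.0 and 4.5 ns operations has -0.5 ns of slack and would be a timing violation.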

Dataflow Viewer

The DATAFLOW optimization is a dynamic optimization that can only be fully understood after RTL co-simulation is complete. For this reason, the Dataflow viewer lets you see the dataflow structure inferred by the tool, inspect the channels (FIFO/PIPO), and examine the effect of channel depth on performance. Performance data is back-annotated to the Dataflow viewer from the co-simulation results.

IMPORTANT: You can open the Dataflow viewer without running RTL co-simulation, but the view will not contain important performance information such as read/write block times, co-sim depth, and stall times.

You must apply the DATAFLOW pragma or directive to your design for the Dataflow viewer to be populated. You can apply dataflow to the top-level function, or to specific regions of a function or loops. The Dataflow viewer displays a representation of the dataflow graph structure, showing the different processes and the underlying producer-consumer connections.
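A minimal sketch of a design that would populate the Dataflow viewer, assuming three hypothetical processes (read_in, compute, and write_out) connected by local arrays inside a hypothetical top-level function dataflow_top. With the DATAFLOW pragma applied, Vitis HLS can run the processes concurrently and implement tmp1 and tmp2 as channels (arrays typically become PIPO buffers). A regular C compiler ignores the pragma, so the code also runs sequentially in a C testbench.

```c
#define DF_SIZE 16  /* hypothetical block size */

/* Producer process: copies the input into the first channel. */
static void read_in(const int in[DF_SIZE], int out[DF_SIZE])
{
    int i;
    for (i = 0; i < DF_SIZE; i++)
        out[i] = in[i];
}

/* Middle process: consumes one channel and produces the next. */
static void compute(const int in[DF_SIZE], int out[DF_SIZE])
{
    int i;
    for (i = 0; i < DF_SIZE; i++)
        out[i] = in[i] * 3 + 1;
}

/* Consumer process: writes the last channel to the output. */
static void write_out(const int in[DF_SIZE], int out[DF_SIZE])
{
    int i;
    for (i = 0; i < DF_SIZE; i++)
        out[i] = in[i];
}

void dataflow_top(const int in[DF_SIZE], int out[DF_SIZE])
{
#pragma HLS DATAFLOW
    int tmp1[DF_SIZE];  /* channel between read_in and compute */
    int tmp2[DF_SIZE];  /* channel between compute and write_out */
    read_in(in, tmp1);
    compute(tmp1, tmp2);
    write_out(tmp2, out);
}
```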

In the Module Hierarchy view, the icon beside the function indicates that a Dataflow Viewer report is available. When you see this icon, you can right-click the function and use the Open Dataflow Viewer command.

Figure 4: Dataflow Viewer

Features of the Dataflow viewer include the following:

  • Source Code browser.
  • Automatic cross-probing from process/channel to source code.
  • Filtering of ports and channel types.
  • Process and Channel table details the characteristics of the design:
    • Channel Profiling (FIFO sizes, etc.), enabled from the Solution Settings dialog box.
    • Process Read Blocking/Write Blocking/Stalling Time reported after RTL co-simulation.
      IMPORTANT: You must use cosim_design -enable_dataflow_profiling to capture data for the Dataflow viewer, and your testbench must run at least two iterations of the top-level function.
    • Process Latency and II displayed.
    • Channel type and widths are displayed in the Channel table.
    • Automatic cross-probing from Process and Channel table to the Graph and Source browser.
    • Hover over channel or process to display tooltips with design information.

The Dataflow viewer can help you debug the performance of your designs. When your design deadlocks during RTL co-simulation, the GUI opens the Dataflow viewer and highlights the channels and processes involved in the deadlock so you can determine whether the cause is, for instance, insufficient FIFO depth.
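Insufficient FIFO depth is easier to see with a small model. The following hypothetical sketch is not how co-simulation works; it only tracks how many values each FIFO holds. The producer writes all n values to channel a before writing to channel b, while the consumer reads a and b alternately. In this model, with b of depth 1 or more and n greater than 1, the design completes only when a can hold at least n - 1 values; with a shallower FIFO, the producer blocks on a full a while the consumer blocks on an empty b, which is the kind of deadlock the Dataflow viewer highlights.

```c
/* Hypothetical occupancy model of a producer and a consumer connected
 * by two FIFOs, a and b. Returns 1 if all 2*n transfers complete and
 * 0 if both sides are blocked (deadlock). */
int producer_consumer_completes(int n, int depth_a, int depth_b)
{
    int a_count = 0, b_count = 0;  /* values currently in each FIFO */
    int produced = 0;              /* writes done so far, 0 .. 2n */
    int consumed = 0;              /* reads done so far, 0 .. 2n */

    while (consumed < 2 * n) {
        int progress = 0;

        /* Producer: the first n writes go to a, the next n go to b. */
        if (produced < n && a_count < depth_a) {
            a_count++;
            produced++;
            progress = 1;
        } else if (produced >= n && produced < 2 * n && b_count < depth_b) {
            b_count++;
            produced++;
            progress = 1;
        }

        /* Consumer: even-numbered reads come from a, odd ones from b. */
        if (consumed % 2 == 0 && a_count > 0) {
            a_count--;
            consumed++;
            progress = 1;
        } else if (consumed % 2 == 1 && b_count > 0) {
            b_count--;
            consumed++;
            progress = 1;
        }

        if (!progress)
            return 0;  /* neither side can move: deadlock */
    }
    return 1;
}
```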

When your design does not perform as expected, the Process and Channel table can help you understand why. A process can stall waiting to read input, or because it cannot write output. The channel table provides stalling percentages and identifies whether the process is "read blocked" or "write blocked."

TIP: If you use a Tcl script to create the Vitis HLS project, you can still open it in the GUI to analyze the design.