AR# 54111

2012.4 - Vivado HLS - large multipliers: XST DSP usage does not match reported DSP and timing fails

Description

The following Vivado HLS code, when targeting a 7 series device and synthesized with Vivado Synthesis, uses two DSPs and meets the timing requirement (under 4 ns).

However, when targeting an earlier FPGA family such as Virtex-6, the synthesis tool used is XST, and the synthesized netlist / implementation uses four DSPs and does not meet timing (more than 5 ns).

What is the issue?

#include "mult.h"
void mult(in_a_t in_a, in_b_t in_b, out_p_t out_p[1])
{
    out_p[0] = in_a * in_b;
}

with mult.h:

#include <ap_int.h>
typedef ap_int<18> in_a_t;
typedef ap_int<42> in_b_t;
typedef ap_int<60> out_p_t;
void mult(in_a_t in_a, in_b_t in_b, out_p_t out_p[1]);

Solution

This is due to the difference between how XST and Vivado Synthesis infer DSPs from the generated RTL.

This can be worked around for XST with the following C code:

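ap_uint<17> x = in_b;                      // lower 17 bits of in_b, unsigned
ap_int<42-17> y = in_b >> 17;              // upper 25 bits of in_b, signed
out_p[0] = in_a * x + ((in_a * y) << 17);  // recombine the two partial products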

Using this manual decomposition in Vivado HLS gives better XST performance as the shift by 17 is supported directly in hardware and will be efficiently implemented.
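As a quick check of the arithmetic behind this decomposition, the sketch below models the split with plain 64-bit integers instead of the ap_int types. This is an assumption made so that it compiles without the Vivado HLS headers; the 60-bit product fits in 64 bits. The names mult_direct, mult_split and split_matches_direct are illustrative and are not part of the attached example. The sketch multiplies by 2^17 in place of the left shift, to avoid shifting negative values in standard C++. It only checks that the two forms compute the same product; it says nothing about how either synthesis tool maps them to DSPs.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: int64_t stands in for ap_int<18>, ap_int<42> and ap_int<60>.
constexpr int kSplit = 17;  // width of the unsigned low part of in_b

// Reference form: one wide multiply (18-bit signed x 42-bit signed).
int64_t mult_direct(int64_t in_a, int64_t in_b) {
    return in_a * in_b;
}

// Decomposed form mirroring the workaround:
//   x = low 17 bits of in_b (unsigned), y = upper 25 bits of in_b (signed).
int64_t mult_split(int64_t in_a, int64_t in_b) {
    const int64_t scale = int64_t{1} << kSplit;  // 2^17
    const int64_t x = in_b & (scale - 1);        // 0 .. 2^17 - 1
    const int64_t y = (in_b - x) / scale;        // exact: in_b - x is a multiple of 2^17
    return in_a * x + (in_a * y) * scale;        // same value as in_a * x + ((in_a * y) << 17)
}

// Compares both forms at the corners of the 18-bit and 42-bit signed ranges.
bool split_matches_direct() {
    const int64_t a_vals[] = {-(int64_t{1} << 17), -1, 0, 1, (int64_t{1} << 17) - 1};
    const int64_t b_vals[] = {-(int64_t{1} << 41), -131073, -131072, -1, 0,
                              1, 131071, 131072, (int64_t{1} << 41) - 1};
    for (int64_t a : a_vals) {
        for (int64_t b : b_vals) {
            if (mult_direct(a, b) != mult_split(a, b)) {
                return false;
            }
        }
    }
    return true;
}
```

Calling split_matches_direct() from a small test program should return true for these corner values.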

This is an issue with XST: given equivalent RTL code, XST will produce the same four DSPs.

This issue with XST will not be fixed.

Try it yourself: the attached zip file contains this example.

1. At a Vivado HLS command prompt with the Xilinx tools available in the PATH, type:

vivado_hls run_hls_vanilla.tcl

This will run the default "vanilla" code on both a Virtex-6 and a Virtex-7 FPGA. You can then run Vivado HLS again using run_hls_workaround.tcl to build the workaround version.

2. Compare the results. The reports from the export process are in:

PROJ_NAME\SOL_NAME\impl\report\RTL_LANG\TOP_NAME_export.rpt

So, for example:

mult_vanilla\solution_v6\impl\report\verilog\mult_export.rpt

You can compare the two reports side by side with this command:

diff mult_vanilla\solution_v6\impl\report\verilog\mult_export.rpt mult_vanilla\solution_v7\impl\report\verilog\mult_export.rpt -y

(Please note: diff is one of the command-line tools made available by the Xilinx tools, not by the Vivado HLS command prompt tool.)

diff mult_vanilla\solution_v6\impl\report\verilog\mult_export.rpt mult_vanilla\solution_v7\impl\report\verilog\mult_export.rpt -y

Implementation tool: Xilinx ISE14.4          | Implementation tool: Xilinx Vivado v2012.4
Device target: xc6vsx315tff1759-1            | Device target: xc7vx330tffg1761-1
Report date: Mon Feb 11 15:21:38 GMTST 2013  | Report date: Mon Feb 11 15:32:19 GMTST 2013
                                             <

#=== Resource usage ===                        #=== Resource usage ===
SLICE: 15                                    | SLICE: 19
LUT: 33                                      | LUT: 13
FF: 83                                         FF: 83
DSP: 4                                       | DSP: 2
BRAM: 0                                        BRAM: 0
SRL: 9                                       | SRL: 0
#=== Final timing ===                          #=== Final timing ===
CP required: 4.000                             CP required: 4.000
CP achieved: 5.423                           | CP achieved: 2.015
Timing not met                               | Timing met

diff mult_workaround\solution_v6\impl\report\verilog\mult_export.rpt mult_workaround\solution_v7\impl\report\verilog\mult_export.rpt -y

Implementation tool: Xilinx ISE14.4          | Implementation tool: Xilinx Vivado v2012.4
Device target: xc6vsx315tff1759-1            | Device target: xc7vx330tffg1761-1
Report date: Mon Feb 11 15:21:40 GMTST 2013  | Report date: Mon Feb 11 15:33:00 GMTST 2013
                                             <

#=== Resource usage ===                        #=== Resource usage ===
SLICE: 21                                    | SLICE: 25
LUT: 48                                      | LUT: 39
FF: 111                                        FF: 111
DSP: 2                                         DSP: 2
BRAM: 0                                        BRAM: 0
SRL: 0                                         SRL: 0
#=== Final timing ===                          #=== Final timing ===
CP required: 4.000                             CP required: 4.000
CP achieved: 2.116                           | CP achieved: 2.222
Timing met                                     Timing met
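If the comparison needs to be repeated across several solutions, the key figures can also be pulled out of a report programmatically. The sketch below is one possible way to do that. It assumes the export report contains plain "key: value" lines exactly as they appear in the excerpts above; report_field and timing_met are hypothetical helper names, not part of the Xilinx tools. It works on the report text as a string, so reading the .rpt file from disk is left to the caller.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Returns the value after "key:" on the first line that starts with that key,
// trimmed of surrounding whitespace, or an empty string if the key is absent.
// Assumption: each report line has the form "key: value" with no leading spaces.
std::string report_field(const std::string& report, const std::string& key) {
    std::istringstream in(report);
    std::string line;
    const std::string prefix = key + ":";
    while (std::getline(in, line)) {
        if (line.compare(0, prefix.size(), prefix) == 0) {
            const std::string::size_type start =
                line.find_first_not_of(" \t\r", prefix.size());
            if (start == std::string::npos) {
                return "";
            }
            const std::string::size_type end = line.find_last_not_of(" \t\r");
            return line.substr(start, end - start + 1);
        }
    }
    return "";
}

// True when the achieved clock period is no larger than the required one.
// Returns false if either field is missing from the report text.
bool timing_met(const std::string& report) {
    const std::string required = report_field(report, "CP required");
    const std::string achieved = report_field(report, "CP achieved");
    if (required.empty() || achieved.empty()) {
        return false;
    }
    return std::stod(achieved) <= std::stod(required);
}
```

For example, passing the Virtex-6 "vanilla" figures shown above (CP required 4.000, CP achieved 5.423) to timing_met gives false, in line with the "Timing not met" entry in that report.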

Attachments

Associated Attachments

Name                                   File Size   File Type
AR54111_VHLS_mult_XST_workaround.zip   1 KB        ZIP
AR# 54111
Date Created 02/04/2013
Last Updated 03/05/2013
Status Active
Type Known Issues
Tools
  • Vivado