With distributed RAM (small RAMs), the tool achieves the target of 1 sample per clock cycle.
As soon as I use a block RAM, 2 clock cycles per sample are needed. Why?
This is the typical output:
@W [SCHED-68] Unable to enforce a carried dependency constraint (II = 1, distance = 1)
between 'store' operation (video_fir.c:52) of variable 'calc_0_2_load' on array 'buffer_0' and 'load' operation ('inDataNext', video_fir.c:45) on array 'buffer_0'.
Failed to meet target II: 1
The critical dependency path consists of the following.
[Operation latency] 'load' operation ('inDataNext', video_fir.c:45) on array 'buffer_0' takes 1 cycle.
[Precedence] from 'load' operation ('inDataNext', video_fir.c:45) on array 'buffer_0' to 'store' operation (video_fir.c:52) of variable 'calc_0_2_load' on array 'buffer_0' with length = 0.
[Carried Dependence] from 'store' operation (video_fir.c:52) of variable 'calc_0_2_load' on array 'buffer_0' to 'load' operation ('inDataNext', video_fir.c:45) on array 'buffer_0' with (distance, length) = (1, 1).
Total length of the dependency path is 2,
which exceeds II * total distance = 1 * 1 = 1
@I [SCHED-61] Pipelining result: Target II: 1, Final II: 2, Depth: 11.
The scheduling behavior behind these messages is described in more detail in the user guide.
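The last lines of the SCHED-68 message state the constraint the scheduler checks: along a loop-carried dependence path, the total length must not exceed II times the distance. A minimal C sketch of that arithmetic reproduces the numbers in the log. It assumes a single dependence path, and the helper names are hypothetical, not part of any HLS API.

```c
/* Sketch of the check in the SCHED-68 message: a loop-carried
 * dependence path is schedulable at a given II only if
 *     length <= II * distance
 * Helper names are hypothetical, not part of any HLS API. */

/* Returns 1 if the path fits the given II, 0 otherwise. */
int path_meets_ii(unsigned length, unsigned distance, unsigned ii)
{
    return length <= ii * distance;
}

/* Smallest II (at least 1) that satisfies the inequality.
 * distance must be >= 1 for a loop-carried dependence. */
unsigned min_ii_for_path(unsigned length, unsigned distance)
{
    unsigned ii = (length + distance - 1) / distance; /* ceil(length / distance) */
    return ii < 1 ? 1 : ii;
}
```

The path from the log is a 1-cycle load, a 0-length precedence edge and a 1-cycle carried edge, giving length 2 at distance 1. For that path, `path_meets_ii(2, 1, 1)` is 0 and `min_ii_for_path(2, 1)` is 2, which matches "Final II: 2".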
The problem arises when the read and write indices are computed from independent variables, or from expressions the tool cannot relate to each other.
The HLS scheduler cannot prove that these indices are never equal. It therefore cannot prove that it is safe to start the next iteration's read from the block RAM, which has a 1-cycle read latency, before the current iteration's write has completed.
In this case, the HLS tool sees a false loop-carried dependence on buffer and, because of the block RAM latency, stretches the II to 2, as the messages in the HLS log above show.
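The following sketch illustrates the pattern with a hypothetical delay line. It is not the original video_fir.c code. Reads and writes use two independent counters, and the delay is a run-time value, so the tool cannot relate the two addresses. The array name `buffer` matches the directive below. `DEPTH`, `delay_line` and `adjacent_collision` are illustrative assumptions.

```c
#define DEPTH 16 /* hypothetical buffer depth; a block RAM in hardware */

/* Hypothetical delay line, not the original video_fir code.
 * Read and write addresses come from two independent counters and the
 * delay is only known at run time, so the scheduler cannot prove that
 * iteration i+1 never reads the address written by iteration i.
 * Assumes a zero-initialized buffer and 2 <= delay < DEPTH. */
void delay_line(const int *in, int *out, int n, unsigned delay,
                int buffer[DEPTH])
{
    unsigned rd = 0;             /* read counter                       */
    unsigned wr = delay % DEPTH; /* write counter, 'delay' slots ahead */

    for (int i = 0; i < n; i++) {
        out[i] = buffer[rd];     /* load: 1-cycle latency in a block RAM */
        buffer[wr] = in[i];      /* store: address unrelated to rd for the tool */
        rd = (rd + 1) % DEPTH;
        wr = (wr + 1) % DEPTH;
    }
}

/* C-level check of the distance-1 dependence reported in the log:
 * returns 1 if iteration i+1 would read the address iteration i writes. */
int adjacent_collision(unsigned delay, int n)
{
    for (int i = 0; i + 1 < n; i++) {
        unsigned wr_i = ((unsigned)i + delay) % DEPTH;
        unsigned rd_next = (unsigned)(i + 1) % DEPTH;
        if (wr_i == rd_next)
            return 1;
    }
    return 0;
}
```

For any `delay` from 2 to `DEPTH - 1`, `adjacent_collision` returns 0, so the distance-1 dependence is false. With `delay` equal to 1 it is a real dependence, and overriding it would produce incorrect hardware.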
For this kind of case, Vivado HLS provides the DEPENDENCE directive.
Because the reported dependence is between loop iterations, it is an inter dependence (as opposed to an intra dependence within a single iteration).
Adding the following directive, either as a Tcl command or as the equivalent pragma placed inside the loop body, lets the tool reach the desired II=1:
set_directive_dependence -variable buffer -type inter -dependent false "video_fir/shift_loop"
#pragma HLS DEPENDENCE variable=buffer inter false
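The sketch below shows where the pragma form goes, assuming the same kind of delay line. The DEPENDENCE pragma sits inside the body of the loop labeled `shift_loop`, next to the PIPELINE pragma. The function body, `DEPTH` and the 64-sample length are hypothetical. Only the function name and loop label come from the Tcl command above.

```c
#define DEPTH 16 /* hypothetical buffer depth */

/* Hypothetical body: only the function name and loop label come from the
 * Tcl directive. A regular C compiler ignores the HLS pragmas, so the
 * function still runs in C simulation.
 * Designer's guarantee that makes the override safe: 2 <= delay < DEPTH,
 * so iteration i+1 never reads the address iteration i wrote. */
void video_fir(const int in[64], int out[64], unsigned delay)
{
    static int buffer[DEPTH]; /* assumed to map to a block RAM */
    unsigned rd = 0;
    unsigned wr = delay % DEPTH;

shift_loop:
    for (int i = 0; i < 64; i++) {
#pragma HLS PIPELINE II=1
#pragma HLS DEPENDENCE variable=buffer inter false
        out[i] = buffer[rd];
        buffer[wr] = in[i];
        rd = (rd + 1) % DEPTH;
        wr = (wr + 1) % DEPTH;
    }
}
```

The pragma changes only how the tool schedules the loop. The C behavior is identical with or without it.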
For the complete syntax and options, see the user guide or the tool's man pages.
Please note that collision warnings might be issued during C/RTL co-simulation.
Finally, it must be emphasized that, while the DEPENDENCE directive is occasionally necessary to override the conservative default behavior of HLS, it is just that: an override.
Recoding is recommended unless you are absolutely sure the accesses are independent.
It is up to the designer to ensure that the dependency reported by the tool is, in fact, a false one. Otherwise, RTL generated by HLS might not operate properly.
You also need to verify that C/RTL co-simulation still passes with the new directives.
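One way to do that is sketched below under the same hypothetical delay-line assumptions. The sketch compares the design against a golden model that does not use the circular buffer, sweeps every delay the designer guarantees, and returns 0 only when all outputs match. Returning 0 on success follows the convention a Vivado HLS testbench `main()` uses so that co-simulation can report pass or fail. C simulation alone cannot reveal a wrongly removed dependence because the C code runs sequentially, so the same testbench also has to pass in C/RTL co-simulation. `dut`, `reference` and `run_testbench` are illustrative names.

```c
#define DEPTH 16 /* hypothetical buffer depth */
#define N 64     /* hypothetical samples per call */

/* Design under test: hypothetical delay line carrying the override. */
void dut(const int in[N], int out[N], unsigned delay)
{
    int buffer[DEPTH] = {0};
    unsigned rd = 0;
    unsigned wr = delay % DEPTH;
    for (int i = 0; i < N; i++) {
#pragma HLS DEPENDENCE variable=buffer inter false
        out[i] = buffer[rd];
        buffer[wr] = in[i];
        rd = (rd + 1) % DEPTH;
        wr = (wr + 1) % DEPTH;
    }
}

/* Golden model written without the circular buffer, so it does not rely
 * on the assumption behind the DEPENDENCE directive. */
void reference(const int in[N], int out[N], unsigned delay)
{
    for (int i = 0; i < N; i++)
        out[i] = (i >= (int)delay) ? in[i - (int)delay] : 0;
}

/* Testbench core: returns 0 on pass and the mismatch count on failure,
 * so a testbench main() can return it directly. */
int run_testbench(void)
{
    int in[N], got[N], want[N];
    int errors = 0;
    for (int i = 0; i < N; i++)
        in[i] = 3 * i + 1;
    /* Sweep only the delays the designer guarantees (2 <= delay < DEPTH). */
    for (unsigned delay = 2; delay < DEPTH; delay++) {
        dut(in, got, delay);
        reference(in, want, delay);
        for (int i = 0; i < N; i++)
            if (got[i] != want[i])
                errors++;
    }
    return errors;
}
```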