Dataflow

Dataflow is another digital design technique, similar in concept to pipelining. Its goal is to express parallelism at a coarse-grained level. In terms of software execution, this transformation corresponds to the parallel execution of functions within a single program.

SDSoC extracts this level of parallelism by evaluating the interactions between different functions of a program based on their inputs and outputs. The simplest case of parallelism is when functions work on different data sets and do not communicate with each other. In this case, SDSoC allocates FPGA logic resources to each function and then runs the blocks independently. The more complex case, which is typical of software programs, is when one function provides results to another function. This case is referred to as the consumer-producer scenario.

Dataflow is performed automatically between functions that are marked for hardware acceleration. In the following code example, both load_input and store_output are marked for implementation in the hardware fabric, and a consumer-producer relationship exists through the variable buffer. SDSoC automatically applies dataflow, ensuring that the consumer hardware function store_output starts operating as soon as data is available from the producer function load_input.

int main() {
    int input_r[N], buffer[N], output_w[N];
    ...
    load_input(input_r, buffer);
    store_output(buffer, output_w);
    ...
}

When the functions marked for hardware acceleration contain sub-functions, a dataflow optimization directive is required to ensure the sub-functions execute in a dataflow manner and further enhance parallelism.

The following figure shows a conceptual view of dataflow pipelining. After synthesis, the default behavior is to execute and complete func_A, then func_B, and finally func_C. However, you can use the Vivado HLS DATAFLOW directive to schedule each function to execute as soon as data is available. In this example, the original function has a latency and interval of eight clock cycles. When you use dataflow optimization, the interval is reduced to only three clock cycles. The tasks shown in this example are functions, but you can perform dataflow optimization between functions, between functions and loops, and between loops.
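The chained tasks described above can be sketched as follows. This is a minimal illustration, not production HLS code: the stage names func_A, func_B, and func_C are taken from the figure, their bodies and the array size N are invented for the example, and the intermediate arrays tmp1 and tmp2 stand in for the channels that HLS implements between stages. The `#pragma HLS dataflow` directive is the Vivado HLS DATAFLOW directive; a standard software compiler ignores it, so the code below simply runs the stages in sequence.

```cpp
#include <cassert>

#define N 16

// Each stage reads one array and writes another; the arrays are the
// channels that HLS turns into ping-pong buffers or FIFOs.
static void func_A(const int in[N], int out[N]) {
    for (int i = 0; i < N; i++) out[i] = in[i] + 1;
}
static void func_B(const int in[N], int out[N]) {
    for (int i = 0; i < N; i++) out[i] = in[i] * 2;
}
static void func_C(const int in[N], int out[N]) {
    for (int i = 0; i < N; i++) out[i] = in[i] - 3;
}

// Top-level hardware function: the DATAFLOW directive lets HLS start
// func_B and func_C as soon as data becomes available from the
// preceding stage, instead of waiting for each stage to complete.
void top(const int in[N], int out[N]) {
#pragma HLS dataflow
    int tmp1[N], tmp2[N];
    func_A(in, tmp1);
    func_B(tmp1, tmp2);
    func_C(tmp2, out);
}
```

Because the pragma has no effect in software, simulation and hardware produce identical results; only the schedule changes.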



SDSoC supports two use models for the consumer-producer scenario in sub-functions. In the first use model, the producer creates a complete data set before the consumer can start its operation. Parallelism is achieved by instantiating a pair of BRAM memories arranged as ping and pong memory banks. Each function can access only one memory bank, ping or pong, for the duration of a function call. When a new function call begins, the HLS-generated circuit switches the memory connections for both the producer and the consumer. This approach guarantees functional correctness but limits achievable parallelism to overlap across function calls.
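The ping-pong scheme can be sketched in plain C++ as below. All names here (banks, run_call) are hypothetical, and sequential software cannot show the two halves truly overlapping; the point is the bank-switching structure: on each call the producer writes one bank while the consumer reads the bank filled on the previous call, which is why the consumer's result trails the producer by one call.

```cpp
#include <cassert>

#define N 8

// Two BRAM-like banks; zero-initialized, as statics are in C++.
static int banks[2][N];
static int call = 0;

// One "function call" of the producer-consumer pair. The producer
// fills one bank while the consumer reads the other; the HLS-generated
// circuit would run these two halves concurrently, and the memory
// connections swap automatically on the next call.
int run_call(int base) {
    int *wr = banks[call % 2];        // producer's bank this call
    int *rd = banks[(call + 1) % 2];  // consumer's bank this call
    for (int i = 0; i < N; i++)       // producer: complete data set
        wr[i] = base + i;
    int sum = 0;
    for (int i = 0; i < N; i++)       // consumer: previous call's data
        sum += rd[i];
    call++;                           // swap banks for the next call
    return sum;
}
```

On the first call the consumer sees only the zero-initialized bank; from the second call onward it consumes the data set produced one call earlier, illustrating that parallelism here exists only across function calls.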

In the second use model, the consumer can start working with partial results from the producer, extending the achievable level of parallelism to execution within a function call. The HLS-generated modules for both functions are connected through a first in, first out (FIFO) memory circuit. This memory circuit, which acts like a queue in software programming, provides data-level synchronization between the modules. At any point during a function call, both hardware modules execute concurrently; the only exception is that the consumer module waits for some data to be available from the producer before beginning computation. In HLS terminology, the wait time of the consumer module is referred to as the interval or initiation interval (II).
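The FIFO use model maps naturally onto a software queue shared by two threads. The sketch below is an analogy, not HLS code: the Fifo class and run_pipeline are invented names, and standard C++ threads stand in for the two hardware modules. The essential behavior matches the description above: the consumer blocks only until the next element arrives, so it processes partial results while the producer is still running.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// A small thread-safe FIFO modeling the data-level synchronization
// the hardware FIFO provides between producer and consumer modules.
class Fifo {
    std::queue<int> q;
    std::mutex m;
    std::condition_variable cv;
public:
    void push(int v) {
        { std::lock_guard<std::mutex> lk(m); q.push(v); }
        cv.notify_one();
    }
    int pop() {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [this] { return !q.empty(); });  // wait for data
        int v = q.front();
        q.pop();
        return v;
    }
};

// Producer and consumer run concurrently; the consumer starts
// computing as soon as the first value is available.
int run_pipeline(int n) {
    Fifo fifo;
    int sum = 0;
    std::thread producer([&] {
        for (int i = 0; i < n; i++) fifo.push(i * i);
    });
    std::thread consumer([&] {
        for (int i = 0; i < n; i++) sum += fifo.pop();
    });
    producer.join();
    consumer.join();
    return sum;
}
```

The consumer's initial blocking wait in pop() corresponds to the wait time described above, after which both sides stream data concurrently through the queue.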