Asynchronous Function Execution

The async and wait pragmas are paired to give the program manual control over hardware function synchronization.

The syntax of these pragmas is:
#pragma SDS async(ID)
#pragma SDS wait(ID)

The async pragma is specified immediately preceding a call to a hardware function, directing the compiler not to automatically generate the wait based on data flow analysis.

The wait pragma must be inserted at an appropriate point in the program to direct the CPU to wait until the associated async function call (same ID) has completed.

  • The ID must be a compile-time unsigned integer constant.
  • In the presence of an async pragma, the SDSoC system compiler does not generate an sds_wait() in the stub function for the associated call. The program must contain the matching sds_wait(ID) or #pragma SDS wait(ID) at an appropriate point to synchronize the controlling thread running on the CPU with the hardware function thread. An advantage of #pragma SDS wait(ID) over the sds_wait(ID) function call is that the source code can still be compiled by compilers other than sdscc, such as gcc, which do not interpret the async or wait pragmas.
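The portability point above can be illustrated with a minimal sketch. The function names vadd_accel and vadd_total and the length N are hypothetical and chosen only for illustration. It assumes that a compiler other than sdscc ignores the SDS pragmas (possibly with an unknown-pragma warning), so the call simply runs synchronously as ordinary software:

```c
#define N 4  /* hypothetical vector length, for illustration only */

/* Hypothetical function. Under sdscc it could be selected for hardware;
 * under any other compiler it is an ordinary software function. */
void vadd_accel(const int a[N], const int b[N], int out[N])
{
    for (int i = 0; i < N; i++)
        out[i] = a[i] + b[i];
}

/* Adds two vectors and returns the sum of all result elements. */
int vadd_total(const int a[N], const int b[N])
{
    int out[N];

    /* Under sdscc: start the call and return without waiting.
     * Under other compilers: pragma ignored, the call completes here. */
    #pragma SDS async(1)
    vadd_accel(a, b, out);

    /* Independent CPU work that does not touch out could go here. */

    /* Under sdscc: block until the call tagged with ID 1 has completed.
     * Under other compilers: pragma ignored, out is already valid. */
    #pragma SDS wait(1)

    int total = 0;
    for (int i = 0; i < N; i++)
        total += out[i];
    return total;
}
```

Had the wait been written as a call to sds_wait(1), the same file would need the SDSoC library declaration and would no longer build with a compiler that lacks it.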

Example 1

The following code snippet shows an example of using these pragmas with the same ID to pipeline the data transfer and accelerator execution:
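     /* Ramp up: start pipeline_depth calls without waiting. */
     for (int i = 0; i < pipeline_depth; i++) {
          #pragma SDS async(1)
          mmult_accel(A[i%NUM_MAT], B[i%NUM_MAT], C[i%NUM_MAT]);
     }

     /* Steady state: wait for one outstanding call, then start the next. */
     for (int i = pipeline_depth; i < NUM_TESTS-pipeline_depth; i++) {
          #pragma SDS wait(1)
          #pragma SDS async(1)
          mmult_accel(A[i%NUM_MAT], B[i%NUM_MAT], C[i%NUM_MAT]);
     }

     /* Ramp down: wait for the remaining outstanding calls. */
     for (int i = 0; i < pipeline_depth; i++) {
          #pragma SDS wait(1)
     }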

In the above example, the first loop ramps up the pipeline to a depth of pipeline_depth, the second loop executes the pipeline, and the third loop ramps down the pipeline. The hardware buffer depth (discussed in Hardware Buffer Depth) should be set to the same value as pipeline_depth. The goal of this pipeline is to transfer data to the accelerator for the next execution while the current execution is still in progress. Refer to Increasing System Parallelism and Concurrency for more information.
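The ramp-up, steady-state, and ramp-down behavior can be checked with a small software model of the schedule. This is only a counting sketch, not SDSoC code: the struct and function names are hypothetical, and it assumes each wait retires exactly one outstanding call that was started with the same ID.

```c
/* Counters describing one run of the three-loop schedule (hypothetical model). */
struct pipe_stats {
    int launched;       /* number of async calls started */
    int waited;         /* number of waits executed */
    int in_flight;      /* calls started but not yet waited for */
    int max_in_flight;  /* peak value of in_flight during the run */
};

/* Models "#pragma SDS async(1)" followed by the accelerator call. */
static void model_async(struct pipe_stats *s)
{
    s->launched++;
    s->in_flight++;
    if (s->in_flight > s->max_in_flight)
        s->max_in_flight = s->in_flight;
}

/* Models "#pragma SDS wait(1)": one outstanding call is retired. */
static void model_wait(struct pipe_stats *s)
{
    s->waited++;
    s->in_flight--;
}

/* Replays the loop structure of Example 1 with the same loop bounds. */
struct pipe_stats model_schedule(int num_tests, int pipeline_depth)
{
    struct pipe_stats s = {0, 0, 0, 0};

    for (int i = 0; i < pipeline_depth; i++)              /* ramp up */
        model_async(&s);

    for (int i = pipeline_depth; i < num_tests - pipeline_depth; i++) {
        model_wait(&s);                                   /* steady state */
        model_async(&s);
    }

    for (int i = 0; i < pipeline_depth; i++)              /* ramp down */
        model_wait(&s);

    return s;
}
```

For example, model_schedule(10, 2) reports 8 calls started, 8 waits, a peak of 2 calls in flight, and none outstanding at the end. The peak equals pipeline_depth, which is consistent with sizing the hardware buffer depth to that value. With the loop bounds as written, the schedule starts NUM_TESTS - pipeline_depth calls in total.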

Example 2

The following code snippet shows an example of using these pragmas with different IDs:
{
    #pragma SDS async(1)
    mmult(A, B, C);
    #pragma SDS async(2)
    mmult(D, E, F);
    ...
    #pragma SDS wait(1)
    #pragma SDS wait(2)
}

The program running on the CPU first transfers A and B to the mmult hardware and returns immediately. Then the program transfers D and E to the mmult hardware and returns immediately. When the program later reaches #pragma SDS wait(1), it waits for the output C to be ready. When the program later reaches #pragma SDS wait(2), it waits for the output F to be ready.