xcl_array_partition

Description

Important: Currently only one-dimensional arrays can be partitioned using this attribute.

One advantage of the FPGA over other compute devices for OpenCL™ programs is that the application programmer can customize the memory architecture throughout the system, down into the compute unit. By default, the SDAccel™ compiler generates a memory architecture within the compute unit that maximizes local and private memory bandwidth, based on static analysis of the kernel code. You can optimize these memories further with attributes in the kernel source code, which specify the physical layout and implementation of local and private memories. The SDAccel compiler attribute that controls the physical layout of memories in a compute unit is xcl_array_partition.

For one-dimensional arrays, the xcl_array_partition attribute implements an array declared in kernel code as multiple physical memories instead of a single physical memory. Which partitioning scheme to use depends on the application and its performance goals. The SDAccel compiler provides three array partitioning schemes: cyclic, block, and complete.

Syntax

Place the attribute with the definition of the array variable:

__attribute__((xcl_array_partition(<partition_type>, <partition_factor>, <array_dimension>)))

Where:

  • <partition_type>: Specifies one of the following partition types:
    • cyclic: Cyclic partitioning implements an array as a set of smaller physical memories that the logic in the compute unit can access simultaneously. Elements are distributed round-robin. One element goes into each memory in turn, and the cycle then returns to the first memory and repeats until the whole array has been placed.
    • block: Block partitioning implements an array as a set of smaller physical memories that the logic in the compute unit can access simultaneously. Each memory is filled with a contiguous block of elements from the array before the next memory is used.
    • complete: Complete partitioning decomposes the array into individual elements. For a one-dimensional array, this corresponds to resolving a memory into individual registers.
    • The default partition_type is complete.
  • <partition_factor>: For cyclic partitioning, partition_factor specifies the number of physical memories into which the original array is partitioned. For block partitioning, partition_factor specifies the number of elements from the original array stored in each physical memory.
    Important: For complete type partitioning, the partition_factor is not specified.
  • <array_dimension>: Specifies which array dimension to partition, as an integer from 1 to N. SDAccel supports arrays of N dimensions and can partition an array along any single dimension.
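The cyclic and block layouts can be made concrete with a small C model of where element i of a one-dimensional array ends up. This is a sketch, not SDAccel output. The helper names are hypothetical. The block helpers follow the description above, in which partition_factor is the number of elements per physical memory.

```c
#include <assert.h>

/* Hypothetical helpers modeling the layouts described above for a
 * one-dimensional array; indices are assumed to be non-negative. */

/* cyclic: partition_factor = number of physical memories. Elements are
 * dealt round-robin, so element i goes to memory i % factor, at
 * position i / factor within that memory. */
int cyclic_mem(int i, int factor)    { assert(factor > 0); return i % factor; }
int cyclic_offset(int i, int factor) { assert(factor > 0); return i / factor; }

/* block: partition_factor = elements per physical memory. Each memory is
 * filled with consecutive elements before the next one is used, so
 * element i goes to memory i / factor, at position i % factor. */
int block_mem(int i, int factor)     { assert(factor > 0); return i / factor; }
int block_offset(int i, int factor)  { assert(factor > 0); return i % factor; }
```

Consider a 16-element array with a factor of 4. Under cyclic partitioning, element 6 lands in memory 2 at offset 1. Under block partitioning, it lands in memory 1 at offset 2.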

Example 1

For example, consider the following array declaration:

int buffer[16];

The integer array buffer stores 16 values, each 32 bits wide. Cyclic partitioning can be applied to this array with the following declaration:

int buffer[16] __attribute__((xcl_array_partition(cyclic,4,1)));

In this example, the cyclic partition type with a factor of 4 tells SDAccel to distribute the contents of the array among four physical memories. This increases the immediate memory bandwidth available to operations that access buffer by a factor of four.

Within an SDAccel compute unit, each array (that is, each physical memory) can sustain at most two concurrent accesses. Dividing the original array into four physical memories therefore lets the resulting compute unit sustain up to eight concurrent accesses to buffer: four memories with two accesses each.
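The port arithmetic can be checked with a short C sketch. It relies on two assumptions taken from the text above. First, each physical memory serves at most two accesses at once. Second, element i of a cyclically partitioned array lives in memory i % factor. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* From the text above: each physical memory sustains at most two
 * concurrent accesses. */
#define PORTS_PER_MEMORY 2
#define MAX_MEMORIES 64

/* Returns true if all accesses in idx[0..count-1] stay within the port
 * limit of every memory when the array is cyclically partitioned into
 * `factor` memories. Indices are assumed to be non-negative. */
bool fits_in_one_cycle_cyclic(const int *idx, int count, int factor) {
    assert(factor > 0 && factor <= MAX_MEMORIES);
    int uses[MAX_MEMORIES] = {0};
    for (int k = 0; k < count; k++) {
        int mem = idx[k] % factor;      /* cyclic: round-robin placement */
        if (++uses[mem] > PORTS_PER_MEMORY)
            return false;               /* this memory is oversubscribed */
    }
    return true;
}

/* Peak concurrent accesses across all memories: factor * ports each. */
int max_concurrent_accesses(int factor) {
    return factor * PORTS_PER_MEMORY;
}
```

Eight consecutive elements, buffer[0] through buffer[7], touch each of the four memories exactly twice, so they fit. By contrast, buffer[0], buffer[4], and buffer[8] all map to memory 0 and exceed its two accesses.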

Example 2

Using the same integer array as in Example 1, block partitioning can be applied with the following declaration:

int buffer[16] __attribute__((xcl_array_partition(block,4,1)));

Because each block holds four elements, SDAccel generates four physical memories for the 16-element array. It fills them sequentially: elements 0 through 3 go to the first memory, 4 through 7 to the second, and so on.
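Block partitioning favors a different access pattern than cyclic partitioning. The C sketch below reports the heaviest load that any single memory sees for a set of simultaneous accesses. It follows the description of block partitioning above, so element i lives in memory i / elems_per_mem. The function name is hypothetical, and this is a model rather than SDAccel behavior.

```c
#include <assert.h>

#define MAX_MEMORIES 64

/* Block partitioning as described above: `elems_per_mem` consecutive
 * elements share one physical memory, so element i lives in memory
 * i / elems_per_mem. Returns the largest number of accesses that land on
 * a single memory when the indices in idx[0..count-1] are accessed
 * together. */
int worst_memory_load_block(const int *idx, int count, int elems_per_mem) {
    assert(elems_per_mem > 0);
    int uses[MAX_MEMORIES] = {0};
    int worst = 0;
    for (int k = 0; k < count; k++) {
        int mem = idx[k] / elems_per_mem;
        assert(idx[k] >= 0 && mem < MAX_MEMORIES);
        if (++uses[mem] > worst)
            worst = uses[mem];
    }
    return worst;
}
```

Take buffer[16] with four elements per memory. Reading buffer[0] through buffer[3] together puts all four accesses on one memory. Reading buffer[0], buffer[4], buffer[8], and buffer[12] puts one access on each memory. Only the second pattern stays within the two-access limit described in Example 1.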

Example 3

Using the same integer array as in Example 1, complete partitioning can be applied with the following declaration:

int buffer[16] __attribute__((xcl_array_partition(complete, 1)));

In this example, the array is completely partitioned into distributed RAM, that is, 16 independent registers in the programmable logic of the kernel. Because complete is the default partition type, the same effect can also be achieved with the following declaration:

int buffer[16] __attribute__((xcl_array_partition));

Complete partitioning creates the implementation with the highest possible memory bandwidth, but it is not suited to every application. Whether the kernel code indexes the array with constant or data-dependent indexes determines how much supporting logic SDx must build around each register to preserve functional equivalence with the original code. As a general best-practice guideline, complete partitioning is best suited to arrays in which at least one dimension is accessed through constant indexes.
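A software analogy in C can illustrate the cost of data-dependent indexing. It is an illustration under stated assumptions, not a description of what SDx actually generates. In the model, reg[k] stands for one of the 16 registers. A constant index such as reg[3] reads one register directly. A data-dependent index must be able to select any of the 16, which in hardware resembles a 16-input multiplexer. The function names are hypothetical.

```c
#include <assert.h>

#define NUM_REGS 16 /* buffer[16] after complete partitioning */

/* Analogy only: reading reg[idx] with a data-dependent idx. Any register
 * could be the target, so the model compares idx against every register
 * number (one mux select input per register) and reports how many
 * inputs it had to consider. */
int mux_read(const int reg[NUM_REGS], int idx, int *select_inputs) {
    assert(idx >= 0 && idx < NUM_REGS);
    int out = 0;
    *select_inputs = 0;
    for (int k = 0; k < NUM_REGS; k++) {
        ++*select_inputs;
        if (k == idx)
            out = reg[k];
    }
    return out;
}

/* A constant index needs no selection: it names one register directly. */
int const_read_3(const int reg[NUM_REGS]) {
    return reg[3];
}
```

In this model, every data-dependent read considers all 16 registers, while the constant-index read touches exactly one. This mirrors why the guideline above favors constant indexes for completely partitioned arrays.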

See Also