Introduction

The DPU IP provides user-configurable parameters to optimize resource usage and customize features. Different configurations can be selected for DSP slice, LUT, block RAM, and UltraRAM usage based on the amount of available programmable logic resources. There are also options for additional functions, such as channel augmentation, average pooling, depthwise convolution, and softmax. An additional option sets the number of DPU cores instantiated in a single DPU IP.

The deep neural network features and the associated parameters supported by the DPU are shown in the following table.

All configured parameters are recorded in the HWH file generated by the Vivado tool. The Vitis™ AI compiler creates matching models based on the HWH file. For more information, see the Vitis AI User Guide in the Vitis AI User Documentation (UG1431).

Table 1. Deep Neural Network Features and Parameters Supported by the DPU
| Features | Parameters | Description |
|----------|------------|-------------|
| Convolution | Kernel Sizes | W: 1–16, H: 1–16 |
|  | Strides | W: 1–4, H: 1–4 |
|  | Padding_w | 0–(kernel_w-1) |
|  | Padding_h | 0–(kernel_h-1) |
|  | Input Size | Arbitrary |
|  | Input Channel | 1–256 * channel_parallel |
|  | Output Channel | 1–256 * channel_parallel |
|  | Activation | ReLU, ReLU6, and LeakyReLU |
|  | Dilation | dilation * input_channel ≤ 256 * channel_parallel && stride_w == 1 && stride_h == 1 |
| Depthwise Convolution | Kernel Sizes | W: 1–16, H: 1–16 |
|  | Strides | W: 1–4, H: 1–4 |
|  | Padding_w | 0–(kernel_w-1) |
|  | Padding_h | 0–(kernel_h-1) |
|  | Input Size | Arbitrary |
|  | Input Channel | 1–256 * channel_parallel |
|  | Output Channel | 1–256 * channel_parallel |
|  | Activation | ReLU, ReLU6 |
|  | Dilation | dilation * input_channel ≤ 256 * channel_parallel && stride_w == 1 && stride_h == 1 |
| Deconvolution | Kernel Sizes | W: 1–16, H: 1–16 |
|  | Stride_w | stride_w * output_channel ≤ 256 * channel_parallel |
|  | Stride_h | Arbitrary |
|  | Padding_w | 0–(kernel_w-1) |
|  | Padding_h | 0–(kernel_h-1) |
|  | Input Size | Arbitrary |
|  | Input Channel | 1–256 * channel_parallel |
|  | Output Channel | 1–256 * channel_parallel |
|  | Activation | ReLU, ReLU6, and LeakyReLU |
| Max Pooling | Kernel Sizes | W: 1–8, H: 1–8 |
|  | Strides | W: 1–4, H: 1–4 |
|  | Padding | W: 0–4, H: 0–4 |
| Average Pooling | Kernel Sizes | Square sizes only, from 2x2 and 3x3 up to 8x8 |
|  | Strides | W: 1–4, H: 1–4 |
|  | Padding | W: 0–7, H: 0–7 |
| Elementwise-sum | Input Channel | 1–256 * channel_parallel |
|  | Input Size | Arbitrary |
| Concat | Output Channel | 1–256 * channel_parallel |
| Reorg | Strides | stride * stride * input_channel ≤ 256 * channel_parallel |
| BatchNormal | - | - |
| FC | Input_channel | Input_channel ≤ 2048 * channel_parallel |
|  | Output_channel | Arbitrary |
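As a quick sanity check against the convolution limits in the table, the range and dilation rules can be expressed in a few lines. This is an illustrative sketch, not part of the Vitis AI tooling; `channel_parallel` is an assumed input taken from the DPU configuration (for example, 16 for the B4096).

```python
def conv_supported(kernel_w, kernel_h, stride_w, stride_h,
                   input_channel, dilation=1, channel_parallel=16):
    """Check a convolution layer against the DPU limits in Table 1.

    Illustrative sketch only; channel_parallel depends on the DPU
    configuration (e.g., 12 for B1152, 16 for B4096).
    """
    if not (1 <= kernel_w <= 16 and 1 <= kernel_h <= 16):
        return False  # kernel sizes W/H must be in 1-16
    if not (1 <= stride_w <= 4 and 1 <= stride_h <= 4):
        return False  # strides W/H must be in 1-4
    if not (1 <= input_channel <= 256 * channel_parallel):
        return False  # input channel limited to 256 * channel_parallel
    if dilation > 1:
        # dilated convolution additionally requires unit strides and
        # dilation * input_channel <= 256 * channel_parallel
        if dilation * input_channel > 256 * channel_parallel:
            return False
        if stride_w != 1 or stride_h != 1:
            return False
    return True
```

For example, a 3x3 convolution with dilation 2 on 512 input channels passes on a B4096 (2 * 512 = 1024 ≤ 4096), but the same layer with stride 2 is rejected because dilation requires unit strides.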
  1. The parameter channel_parallel is determined by the DPU configuration. For example, channel_parallel is 12 for the B1152 and 16 for the B4096 (see the Parallelism for Different Convolution Architectures table in the Configuration Options section).
  2. In some neural networks, the FC layer is connected to a Flatten layer. The Vitis™ AI compiler automatically combines the Flatten+FC pair into a global CONV2D layer whose kernel size equals the input feature map size of the Flatten layer. In this case, the input feature map size must not exceed the kernel size limit of CONV; otherwise, an error is generated during compilation.

     This limitation applies only to the Flatten+FC case and will be optimized in future releases.
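To illustrate note 2: because the fused global CONV2D takes its kernel size directly from the Flatten layer's input feature map, the map's width and height must stay within the 1–16 convolution kernel range. A minimal sketch of that check (names here are hypothetical; the real check happens inside the Vitis AI compiler):

```python
def flatten_fc_compilable(fmap_w, fmap_h, max_kernel=16):
    """Flatten+FC is fused into a global CONV2D whose kernel size equals
    the input feature map size, so the map must fit within the CONV
    kernel limit (1-16). Illustrative sketch, not a Vitis AI API.
    """
    return 1 <= fmap_w <= max_kernel and 1 <= fmap_h <= max_kernel
```

For instance, a 7x7 feature map in front of an FC layer fuses without issue, while a 32x32 map exceeds the 16-wide kernel limit and would trigger a compilation error.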