VAI_C Kernel

VAI_C reports the following information for each kernel it generates. This information helps users deploy models on the edge DPU.

Kernel ID
The ID that VAI_C assigns to each kernel after compilation. Every kernel has a unique ID. A neural network model is compiled into one or more kernels, depending on which of its operators the DPU supports.
Kernel Topology
When compilation finishes, the kernel topology description file shows the kernels as a kernel graph. The kernel_graph file is saved in standard JPEG format, with the file extension .jpg, in the output directory specified by the VAI_C --output_dir option. If Graphviz is not installed on the host system, VAI_C outputs a DOT (graph description language) file with the extension .gv instead. After installing Graphviz, which provides the dot tool, you can convert the .gv file to a JPEG file using the following command:
dot -Tjpg -o kernel_graph.jpg kernel_graph.gv
Kernel Name
The name of the current kernel. For each DPU kernel, VAI_C produces one corresponding ELF object file named dpu_kernelName.elf. For example, dpu_resnet50_0.elf and dpu_resnet50_2.elf are the files for the DPU kernels resnet50_0 and resnet50_2, respectively. Use the kernel name in Vitis AI programs so that the DPU runtime can correctly distinguish between DPU kernels. As the container for a DPU kernel, the DPU ELF file encapsulates the DPU instruction code and parameters of the network model.
Kernel Type
The type of the kernel. VAI_C supports three kernel types: DPUKernel, CPUKernel, and ParamKernel, which are described at the end of this section.
Code Size
The size of the DPU instruction code for the DPU kernel, in MB, KB, or bytes.
Param Size
The size of the parameters for the DPU kernel, in MB.
Workload MACs
The total computation workload of the DPU kernel, in MOPS.
Mean Value
The mean values for the DPU kernel.
I/O Memory Space
Available only for a DPU kernel compiled with the unique memory model. It is the total size of the input tensors, intermediate feature maps, and output tensors, in MB. For the split IO memory model, refer instead to the three fields described below: Input Mem Size, Output Mem Size, and Feature Map Mem Size.
Input Mem Size
The total size of all the input tensors, in MB (or bytes). Available only for a DPU kernel compiled with the split IO memory model.
Output Mem Size
The total size of all the output tensors, in MB (or bytes). Available only for a DPU kernel compiled with the split IO memory model.
Feature Map Mem Size
The total size of the intermediate feature maps, in MB (or bytes). Available only for a DPU kernel compiled with the split IO memory model.
Total Node Count
The number of DPU nodes for the DPU kernel.
Total Tensor Count
The number of DPU tensors for the DPU kernel.
Boundary Input Tensors
Lists all input tensors of the kernel together with their shapes in HWC format (height*width*channel). The input tensor name can be used to retrieve a DPUTensor via the dpuGetBoundaryIOTensor() API. For ResNet-50, the input tensor is data:0.
Boundary Output Tensors
Lists all output tensors of the kernel together with their shapes in HWC format (height*width*channel). The output tensor name can be used to retrieve a DPUTensor via the dpuGetBoundaryIOTensor() API. For ResNet-50, the output tensor is fc1000:0. Note that, for historical reasons in the edge DPU design, the VAI_C compiler always pads an output tensor that has an odd number of channels to an even number of channels. The additional channel is always filled with zeros.
Input nodes
Lists all input nodes of the current DPU kernel together with the shape of each node, in the format height*width*channel. For kernels not supported by the DPU, the user must read the output of the preceding kernel from its output nodes and feed it into the input nodes of the current kernel, using the APIs provided by N2Cube.
Output nodes
Lists all output nodes of the current DPU kernel together with the shape of each node, in the format height*width*channel. The address and size of the output nodes can be retrieved using the APIs provided by N2Cube.
Note: The Code Size, Param Size, Workload MACs, Mean Value, Total Node Count, and Total Tensor Count fields in the VAI_C compilation log are available only for DPU kernels.
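The size fields and the odd-channel padding rule above can be related with a little arithmetic. The following Python sketch assumes INT8 tensors (one byte per element) stored in HWC order, and assumes that the zero-filled padding channel is appended as the last channel of each pixel; these are assumptions, not documented VAI_C behavior. All helper names are hypothetical and are not part of any Vitis AI or N2Cube API.

```python
from typing import List, Sequence, Tuple

# Shape of a tensor as listed by VAI_C: (height, width, channel).
Shape = Tuple[int, int, int]

BYTES_PER_ELEMENT = 1  # Assumption: INT8-quantized tensors, one byte per element.


def padded_channels(channels: int) -> int:
    """Channel count of an output tensor after VAI_C pads odd counts to even."""
    return channels + 1 if channels % 2 else channels


def tensor_bytes(shape: Shape) -> int:
    """Size in bytes of a single HWC tensor."""
    h, w, c = shape
    return h * w * c * BYTES_PER_ELEMENT


def total_bytes(shapes: Sequence[Shape]) -> int:
    """Sum of tensor sizes, e.g. all boundary inputs (Input Mem Size)."""
    return sum(tensor_bytes(s) for s in shapes)


def unique_io_bytes(inputs: Sequence[Shape],
                    feature_maps: Sequence[Shape],
                    outputs: Sequence[Shape]) -> int:
    """I/O Memory Space (unique memory model): inputs + feature maps + outputs."""
    return total_bytes(inputs) + total_bytes(feature_maps) + total_bytes(outputs)


def strip_padding_channel(buf: Sequence[int], h: int, w: int,
                          channels: int) -> List[int]:
    """Remove the zero-filled padding channel from a flattened HWC buffer.

    Assumption (hypothetical): the padding channel is the last one per pixel.
    """
    c_pad = padded_channels(channels)
    if len(buf) != h * w * c_pad:
        raise ValueError("buffer size does not match the padded HWC shape")
    if c_pad == channels:
        return list(buf)  # even channel count: nothing was padded
    out: List[int] = []
    for pixel in range(h * w):
        start = pixel * c_pad
        out.extend(buf[start:start + channels])  # keep only the real channels
    return out
```

For example, under these assumptions a hypothetical output tensor with 5 real channels occupies 6 channels per pixel in the DPU buffer, and strip_padding_channel() returns only the 5 real values for each pixel.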
For ResNet-50, the kernel graph in JPEG format is shown in the following figure. Each node in the kernel graph shows a kernel ID and its type, and each edge shows the relationship between two kernels as a pair of tuples. The first tuple represents the output tensor of the source kernel, and the second represents the input tensor of the destination kernel. Each tuple contains two parts: the name of the input/output node bound to the tensor, and the tensor index of that node. Using the node name and index in a tuple, users can call the APIs provided by N2Cube to get the address of the input or output tensor.
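The edge tuples can be modeled as plain data. The following Python sketch records each edge as a (node name, tensor index) pair on each side, looks up which output of an upstream kernel feeds a given input of a downstream kernel, and orders kernels so that producers run before consumers (Kahn's topological sort, chosen here for illustration). The class, field, and function names are hypothetical; they do not reflect the actual label syntax of the kernel graph or any N2Cube API.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# (name of the node bound to the tensor, tensor index of that node)
TensorRef = Tuple[str, int]


@dataclass(frozen=True)
class KernelEdge:
    src_kernel: int  # ID of the kernel producing the tensor
    dst_kernel: int  # ID of the kernel consuming the tensor
    src: TensorRef   # first tuple: output node and index in the source kernel
    dst: TensorRef   # second tuple: input node and index in the destination kernel


def producer_of(edges: List[KernelEdge], dst_kernel: int,
                dst: TensorRef) -> Optional[Tuple[int, TensorRef]]:
    """Return (source kernel ID, output tensor ref) feeding the given input."""
    for e in edges:
        if e.dst_kernel == dst_kernel and e.dst == dst:
            return e.src_kernel, e.src
    return None


def execution_order(kernel_ids: List[int], edges: List[KernelEdge]) -> List[int]:
    """Order kernels so every producer runs before its consumers."""
    indegree: Dict[int, int] = {k: 0 for k in kernel_ids}
    successors: Dict[int, List[int]] = defaultdict(list)
    for e in edges:
        successors[e.src_kernel].append(e.dst_kernel)
        indegree[e.dst_kernel] += 1
    ready = deque(k for k in kernel_ids if indegree[k] == 0)
    order: List[int] = []
    while ready:
        k = ready.popleft()
        order.append(k)
        for nxt in successors[k]:
            indegree[nxt] -= 1  # one incoming edge satisfied
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(kernel_ids):
        raise ValueError("kernel graph contains a cycle")
    return order
```

In practice, the source tuple returned by producer_of() names the output node and index to read with N2Cube, and the destination tuple names the input node and index to write.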
Figure 1: DPU Kernel Graph for ResNet-50

For details of the operations supported by the edge DPU, refer to the Zynq DPU v3.1 IP Product Guide (PG338). After VAI_C compilation, a network model is normally transformed into the following three kinds of kernels.

DPUKernel
Kernel that runs on the edge DPU.
CPUKernel
Kernel that runs on the CPU. It consists of the layers/operators that the DPU does not support, which the user must deploy on the CPU.
ParamKernel
Same as CPUKernel, except that VAI_C also generates the weight and bias parameters for the DPU-unsupported layers/operators.
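The kernel type determines where each kernel is deployed. The following Python sketch encodes that mapping together with the dpu_kernelName.elf naming rule from the Kernel Name field. The KernelType, KernelInfo, and helper names are hypothetical and are not part of the Vitis AI API; the CPU-side implementation itself is left to the user.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class KernelType(Enum):
    DPU = "DPUKernel"      # runs on the edge DPU
    CPU = "CPUKernel"      # DPU-unsupported operators, deployed by the user on the CPU
    PARAM = "ParamKernel"  # like CPUKernel, plus VAI_C-generated weights and biases


@dataclass(frozen=True)
class KernelInfo:
    kernel_id: int
    name: str
    kind: KernelType


def elf_file_name(kernel: KernelInfo) -> Optional[str]:
    """VAI_C emits one ELF file, dpu_<kernel name>.elf, for each DPU kernel only."""
    if kernel.kind is KernelType.DPU:
        return f"dpu_{kernel.name}.elf"
    return None


def deployment_target(kernel: KernelInfo) -> str:
    """Describe where the user deploys a kernel of this type."""
    if kernel.kind is KernelType.DPU:
        return "DPU"
    if kernel.kind is KernelType.PARAM:
        return "CPU (uses parameters generated by VAI_C)"
    return "CPU"
```

For the DPU kernel resnet50_0, for example, elf_file_name() returns dpu_resnet50_0.elf, while CPU and parameter kernels have no ELF file.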