I/O Bandwidth Requirements

The I/O bandwidth requirement of the DPU depends on which neural network is being executed, and it also differs between layers within a single network. The requirements of several neural networks, averaged by layer, were measured with one DPU core running at full speed. The table below lists the peak and average I/O bandwidth requirements of four neural networks. Only the figures for two commonly used DPU architectures (B1152 and B4096) are shown.
Note: When multiple DPU cores run in parallel, each core might not be able to run at full speed due to the I/O bandwidth limitations.
Table 1. I/O Bandwidth Requirements for DPU-B1152 and DPU-B4096

Network Model        DPU-B1152                      DPU-B4096
                     Peak (MB/s)   Average (MB/s)   Peak (MB/s)   Average (MB/s)
Inception-v1         1704          890              4626          2474
ResNet50             2052          1017             5298          3132
SSD ADAS VEHICLE     1516          684              5724          2049
YOLO-V3-VOC          2076          986              6453          3290

For one DPU core to run at full speed, the peak I/O bandwidth requirement must be met. The I/O bandwidth is mainly used for accessing data through the AXI master interfaces (DPU0_M_AXI_DATA0 and DPU0_M_AXI_DATA1).
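
The multi-core note above can be turned into a rough budget check. The following Python sketch sums the per-core peak figures from Table 1 and compares the total with the memory bandwidth available to the DPU. The function names and the 10000 MB/s figure are illustrative assumptions, not values from this guide, and summing peaks is a deliberately conservative estimate because it assumes the peaks of all cores coincide.

```python
# (peak, average) I/O bandwidth in MB/s for one core, copied from Table 1.
BANDWIDTH_MB_S = {
    "B1152": {
        "Inception-v1": (1704, 890),
        "ResNet50": (2052, 1017),
        "SSD ADAS VEHICLE": (1516, 684),
        "YOLO-V3-VOC": (2076, 986),
    },
    "B4096": {
        "Inception-v1": (4626, 2474),
        "ResNet50": (5298, 3132),
        "SSD ADAS VEHICLE": (5724, 2049),
        "YOLO-V3-VOC": (6453, 3290),
    },
}


def aggregate_peak(arch, networks):
    """Sum the peak bandwidth of the listed networks, one network per core.

    Worst-case estimate: assumes every core hits its peak at the same time.
    """
    return sum(BANDWIDTH_MB_S[arch][name][0] for name in networks)


def fits_budget(arch, networks, available_mb_s):
    """Return True if the summed peak demand is within the available bandwidth.

    available_mb_s is a hypothetical figure supplied by the caller; it must
    come from the memory subsystem of the actual target device.
    """
    return aggregate_peak(arch, networks) <= available_mb_s


if __name__ == "__main__":
    ASSUMED_AVAILABLE_MB_S = 10000  # hypothetical budget, not from this guide
    workload = ["ResNet50", "ResNet50"]  # two cores, each running ResNet50
    for arch in ("B1152", "B4096"):
        demand = aggregate_peak(arch, workload)
        ok = fits_budget(arch, workload, ASSUMED_AVAILABLE_MB_S)
        print(f"{arch}: peak demand {demand} MB/s, within budget: {ok}")
```

Under this assumed budget, two B4096 cores running ResNet50 would exceed the available bandwidth while two B1152 cores would not, which is the situation the note describes: the cores still run, but not all of them at full speed.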