Overview

Introduction

The Xilinx® Deep Learning Processing Unit (DPU) is a programmable engine optimized for convolutional neural networks. It is composed of a high-performance scheduler module, a hybrid computing array module, an instruction fetch unit module, and a global memory pool module. The DPU uses a specialized instruction set, which allows for the efficient implementation of many convolutional neural networks. Deployed convolutional neural networks include VGG, ResNet, GoogLeNet, YOLO, SSD, MobileNet, and FPN, among others.

The DPU IP can be implemented in the programmable logic (PL) of the selected Zynq®-7000 SoC or Zynq® UltraScale+™ MPSoC device with direct connections to the processing system (PS). The DPU requires instructions to implement a neural network and accessible memory locations for input images as well as temporary and output data. A program running on the application processing unit (APU) is also required to service interrupts and coordinate data transfers.

The top-level block diagram of the DPU is shown in the following figure.

Figure 1: DPU Top-Level Block Diagram
where:
  • APU - Application Processing Unit
  • PE - Processing Engine
  • DPU - Deep Learning Processing Unit
  • RAM - Random Access Memory

Navigating Content by Design Process

Xilinx® documentation is organized around a set of standard design processes to help you find relevant content for your current development task. This document covers the following design processes:

System and Solution Planning
Identifying the components, performance, I/O, and data transfer requirements at a system level. Includes application mapping for the solution to PS, PL, and AI Engine. Topics in this document that apply to this design process include:
Hardware, IP, and Platform Development
Creating the PL IP blocks for the hardware platform, creating PL kernels, subsystem functional simulation, and evaluating the Vivado® timing, resource use, and power closure. Also involves developing the hardware platform for system integration. Topics in this document that apply to this design process include:
System Integration and Validation
Integrating and validating the system functional performance, including timing, resource use, and power closure. Topics in this document that apply to this design process include:

Development Tools

Two flows are supported for integrating the DPU into your project: the Vivado flow and the Vitis flow.

The Xilinx Vivado® Design Suite is required to integrate the DPU into your project for the Vivado flow. Vivado Design Suite 2020.2 or a later version is recommended. Contact your local Xilinx sales representative if your project requires an older version of Vivado.

The Vitis™ unified software platform 2020.2 or later is required to integrate the DPU for the Vitis flow.

Device Resources

The DPU logic resource usage is scalable across Zynq® UltraScale+™ MPSoC and Zynq-7000 devices. For more information on resource utilization, see the DPU Configuration section.

DPU Development Flow

The DPU requires a device driver which is included in the Xilinx Vitis™ AI development kit.

Free developer resources are available from the Vitis AI repository on GitHub: https://github.com/Xilinx/Vitis-AI.

The Vitis AI User Guide (UG1414) describes how to use the DPU with the Vitis AI tools. The basic development flow is shown in the following figure. First, use the Vivado or Vitis flow to generate the bitstream. Then, download the bitstream to the target board and install the related driver. For instructions on installing the related driver and dependent libraries, see the Vitis AI User Guide (UG1414).

Figure 2: HW/SW Stack

Example System with DPU

The following figure shows an example system block diagram with the Zynq® UltraScale+™ MPSoC using a camera input. The DPU is integrated into the system through an AXI interconnect to perform deep learning inference tasks such as image classification, object detection, and semantic segmentation.

Figure 3: Example System with Integrated DPU
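In a system like this, the DPU typically returns raw score tensors that the application running on the APU post-processes. The sketch below is illustrative only and is not part of the DPU or Vitis AI API: it shows, in pure Python with hypothetical class scores and labels, how a classification result might be derived from raw scores via softmax and top-1 selection.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    m = max(logits)                       # subtract max to avoid overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top1(logits, labels):
    """Return the most likely label and its probability."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return labels[idx], probs[idx]

# Hypothetical raw scores for a 3-class classifier
logits = [2.0, 0.5, -1.0]
labels = ["cat", "dog", "bird"]
label, prob = top1(logits, labels)        # → ("cat", ~0.79)
```

Real deployments would use the Vitis AI runtime and library routines for this step; the point here is only the shape of the host-side post-processing.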

Vitis AI Development Kit

The Vitis™ AI development environment is used for AI inference on Xilinx® hardware platforms. It consists of optimized IP cores, tools, libraries, models, and example designs.

As shown in the following figure, the Vitis AI development kit consists of the AI Compiler, AI Quantizer, AI Optimizer, AI Profiler, AI Library, and Xilinx Runtime Library (XRT).

Figure 4: Vitis AI Stack

For more information about the Vitis AI development kit, see the Vitis AI User Guide in the Vitis AI User Documentation (UG1431).

The Vitis AI development kit can be freely downloaded from the Vitis AI repository on GitHub (https://github.com/Xilinx/Vitis-AI).
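One component of the kit, the AI Quantizer, reduces model precision from 32-bit floating point to low-bit fixed point to cut memory bandwidth and compute cost. The pure-Python sketch below illustrates only the general idea of symmetric INT8 quantization; it is not the AI Quantizer's actual algorithm, which additionally performs calibration and accuracy-preserving refinement. All values are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to INT8 codes."""
    scale = max(abs(v) for v in values) / 127.0   # one scale for the tensor
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Map INT8 codes back to approximate float values."""
    return [v * scale for v in q]

# Hypothetical weight values
weights = [0.53, -1.27, 0.004, 0.91]
q, scale = quantize_int8(weights)   # e.g., [53, -127, 0, 91]
approx = dequantize(q, scale)       # close to the original weights
```

Values small relative to the tensor's maximum (such as 0.004 here) round to zero, which is one reason production quantizers calibrate scales carefully.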

Licensing and Ordering

This Xilinx® LogiCORE™ IP module is provided at no additional cost with the Xilinx Vivado® Design Suite under the terms of the Xilinx End User License.

Note: To verify that you need a license, check the License column of the IP Catalog. Included means that a license is included with the Vivado® Design Suite; Purchase means that you have to purchase a license to use the core.

Information about other Xilinx® LogiCORE™ IP modules is available at the Xilinx Intellectual Property page. For information about pricing and availability of other Xilinx LogiCORE IP modules and tools, contact your local Xilinx sales representative.