Product Specification
Hardware Architecture
The detailed hardware architecture of the DPU is shown in the following figure. After startup, the DPU fetches instructions from the off-chip memory to control the operation of the computing engine. The instructions are generated by the Vitis™ AI compiler, which performs substantial optimizations.
On-chip memory is used to buffer input, intermediate, and output data to achieve high throughput and efficiency. The data is reused as much as possible to reduce the external memory bandwidth. A deeply pipelined design is used for the computing engine. The processing elements (PE) take full advantage of the fine-grained building blocks such as multipliers, adders, and accumulators in Xilinx devices.
DPU with Enhanced Usage of DSP
A DSP Double Data Rate (DDR) technique is used to improve the performance achieved with the device. Therefore, two input clocks for the DPU are needed: one for general logic and another at twice the frequency for DSP slices. The difference between a DPU not using the DSP DDR technique and a DPU with the enhanced-usage architecture is shown here.
Port Descriptions
The DPU top-level interfaces are shown in the following figure.
The DPU I/O signals are listed and described in the table below.
Signal Name  Interface Type  Width  I/O  Description 

S_AXI  Memory mapped AXI slave interface  32  I/O  32-bit memory mapped AXI interface for registers. 
s_axi_aclk  Clock  1  I  AXI clock input for S_AXI 
s_axi_aresetn  Reset  1  I  Active-Low reset for S_AXI 
dpu_2x_clk  Clock  1  I  Input clock used for DSP blocks in the DPU. The frequency is twice that of m_axi_dpu_aclk. 
dpu_2x_resetn  Reset  1  I  Active-Low reset for DSP blocks 
m_axi_dpu_aclk  Clock  1  I  Input clock used for DPU general logic. 
m_axi_dpu_aresetn  Reset  1  I  Active-Low reset for DPU general logic 
DPUx_M_AXI_INSTR  Memory mapped AXI master interface  32  I/O  32-bit memory mapped AXI interface for DPU instructions. 
DPUx_M_AXI_DATA0  Memory mapped AXI master interface  64 or 128  I/O  64-bit AXI interface for Zynq-7000 series and 128-bit for Zynq UltraScale+ MPSoC series. 
DPUx_M_AXI_DATA1  Memory mapped AXI master interface  64 or 128  I/O  64-bit AXI interface for Zynq-7000 series and 128-bit for Zynq UltraScale+ MPSoC series. 
dpu_interrupt  Interrupt  1~4  O  Active-High interrupt output from DPU. The data width is determined by the number of DPU cores. 
SFM_M_AXI (optional)  Memory mapped AXI master interface  128  I/O  128-bit memory mapped AXI interface for softmax data. 
sfm_interrupt (optional)  Interrupt  1  O  Active-High interrupt output from softmax module. 
dpu_2x_clk_ce (optional)  Clock enable  1  O  Clock enable signal for controlling the input DPU 2x clock when DPU 2x clock gating is enabled. 

Register Space
The DPU IP implements registers in programmable logic. The following tables show the DPU IP registers. These registers are accessible from the APU through the S_AXI interface.
reg_dpu_reset
The reg_dpu_reset register controls the resets of all DPU cores integrated in the DPU IP. The lower four bits of this register control the reset of up to four DPU cores. All the reset signals are active-High. The details of reg_dpu_reset are shown in the following table.
Register  Address Offset  Width  Type  Description 

reg_dpu_reset  0x004  32  R/W  [n] – DPU core n reset 
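The bit layout above can be illustrated with a small helper that builds the value to write to reg_dpu_reset. This is only a sketch: the function name reset_mask and the num_cores parameter are hypothetical, and the code computes the register value without touching hardware.

```python
REG_DPU_RESET = 0x004  # address offset taken from the table above


def reset_mask(cores, num_cores=4):
    """Return the reg_dpu_reset value that asserts reset for the given cores.

    Bit n controls DPU core n, and the reset is active-High, so a 1 in
    bit n holds core n in reset.
    """
    mask = 0
    for n in cores:
        if not 0 <= n < num_cores:
            raise ValueError(f"core index {n} out of range 0..{num_cores - 1}")
        mask |= 1 << n
    return mask
```

For example, reset_mask([0, 2]) yields 0b0101, which would place core0 and core2 in reset while leaving core1 and core3 running.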
reg_dpu_isr
The reg_dpu_isr register represents the interrupt status of all cores in the DPU IP. The lower four bits of this register show the interrupt status of up to four DPU cores. The details of reg_dpu_isr are shown in the following table.
Register  Address Offset  Width  Type  Description 

reg_dpu_isr  0x608  32  R  [n] – DPU core n interrupt status 
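As an illustration of the per-core status bits, the following sketch decodes a value read from reg_dpu_isr into a list of core indices. The function name cores_with_interrupt is hypothetical, and how the value is read from the S_AXI interface is left out.

```python
REG_DPU_ISR = 0x608  # address offset taken from the table above


def cores_with_interrupt(isr_value, num_cores=4):
    """Return the indices of DPU cores whose interrupt status bit is set.

    Bit n of reg_dpu_isr reflects the interrupt status of DPU core n.
    Bits above num_cores are ignored.
    """
    return [n for n in range(num_cores) if (isr_value >> n) & 1]
```

A value of 0b10 decodes to [1], meaning that only core1 has a pending interrupt.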
reg_dpu_start
The reg_dpu_start register is the start signal for a DPU core. There is one start register for each DPU core. The details of reg_dpu_start are shown in the following table.
Register  Address Offset  Width  Type  Description 

reg_dpu0_start  0x220  32  R/W  DPU core0 start signal. 
reg_dpu1_start  0x320  32  R/W  DPU core1 start signal. 
reg_dpu2_start  0x420  32  R/W  DPU core2 start signal. 
reg_dpu3_start  0x520  32  R/W  DPU core3 start signal. 
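The four offsets in the table are spaced 0x100 apart, so the start register of a given core can be computed rather than looked up. The sketch below assumes that spacing, which is inferred from the listed offsets and is not a documented parameter. The names CORE_STRIDE and start_reg_offset are hypothetical.

```python
REG_DPU0_START = 0x220  # offset of reg_dpu0_start from the table above
CORE_STRIDE = 0x100     # assumption: spacing inferred from the listed offsets


def start_reg_offset(core):
    """Return the address offset of reg_dpu<core>_start for core 0..3."""
    if not 0 <= core <= 3:
        raise ValueError("the DPU IP contains at most four cores (0..3)")
    return REG_DPU0_START + core * CORE_STRIDE
```

With this assumption, start_reg_offset(3) gives 0x520, matching the reg_dpu3_start entry.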
reg_dpu_instr_addr
The reg_dpu_instr_addr register is used to indicate the instruction address of a DPU core. Each DPU core has a reg_dpu_instr_addr register. Only the lower 28 bits are valid. In the DPU processor, the real instruction-fetch address is a 40-bit signal which consists of the lower 28 bits of reg_dpu_instr_addr followed by 12 zero bits. Instruction addresses are therefore aligned to 4 KB, and the available instruction address for the DPU ranges from 0x1000 to 0xFF_FFFF_F000, the largest 4 KB-aligned value that fits in 40 bits. The details of reg_dpu_instr_addr are shown in the following table.
Register  Address Offset  Width  Type  Description 

reg_dpu0_instr_addr  0x20C  32  R/W  Start address in external memory for DPU core0 instructions. Only the lower 28 bits are valid. 
reg_dpu1_instr_addr  0x30C  32  R/W  Start address in external memory for DPU core1 instructions. Only the lower 28 bits are valid. 
reg_dpu2_instr_addr  0x40C  32  R/W  Start address in external memory for DPU core2 instructions. Only the lower 28 bits are valid. 
reg_dpu3_instr_addr  0x50C  32  R/W  Start address in external memory for DPU core3 instructions. Only the lower 28 bits are valid. 
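The relationship between the 28-bit register field and the 40-bit fetch address can be sketched as a pair of conversion helpers. This assumes the fetch address is formed exactly as described, with the 28 valid register bits followed by 12 zero bits. The function names are hypothetical.

```python
INSTR_ADDR_FIELD_MASK = 0x0FFF_FFFF  # lower 28 bits are valid
INSTR_ADDR_SHIFT = 12                # 12 zero bits are appended


def encode_instr_addr(addr):
    """Convert a 40-bit instruction address to the reg_dpu_instr_addr value."""
    if addr & ((1 << INSTR_ADDR_SHIFT) - 1):
        raise ValueError("instruction address must be 4 KB aligned")
    if not 0x1000 <= addr < (1 << 40):
        raise ValueError("instruction address outside the 40-bit range")
    return addr >> INSTR_ADDR_SHIFT


def decode_instr_addr(reg_value):
    """Return the 40-bit fetch address that a register value expands to."""
    return (reg_value & INSTR_ADDR_FIELD_MASK) << INSTR_ADDR_SHIFT
```

For instance, an instruction buffer placed at 0x1000 is written to the register as 1, and a register value with all 28 valid bits set expands to 0xFF_FFFF_F000.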
reg_dpu_base_addr
The reg_dpu_base_addr register is used to indicate the address of input image and parameters for each DPU in external memory. The width of a DPU base address is 40 bits, so it can support an address space up to 1 TB. All registers are 32 bits wide, so two registers are required to represent a 40-bit wide base address. reg_dpu0_base_addr0_l represents the lower 32 bits of base_address0 in DPU core0 and reg_dpu0_base_addr0_h represents the upper eight bits of base_address0 in DPU core0.
There are eight groups of DPU base addresses for each DPU core and thus 32 groups of DPU base addresses for up to four DPU cores. The details of reg_dpu_base_addr are shown in the following table.
Register  Address Offset  Width  Type  Description 

reg_dpu0_base_addr0_l  0x224  32  R/W  The lower 32 bits of base_address0 of DPU core0. 
reg_dpu0_base_addr0_h  0x228  32  R/W  The lower 8 bits in the register represent the upper 8 bits of base_address0 of DPU core0. 
reg_dpu0_base_addr1_l  0x22C  32  R/W  The lower 32 bits of base_address1 of DPU core0. 
reg_dpu0_base_addr1_h  0x230  32  R/W  The lower 8 bits in the register represent the upper 8 bits of base_address1 of DPU core0. 
reg_dpu0_base_addr2_l  0x234  32  R/W  The lower 32 bits of base_address2 of DPU core0. 
reg_dpu0_base_addr2_h  0x238  32  R/W  The lower 8 bits in the register represent the upper 8 bits of base_address2 of DPU core0. 
reg_dpu0_base_addr3_l  0x23C  32  R/W  The lower 32 bits of base_address3 of DPU core0. 
reg_dpu0_base_addr3_h  0x240  32  R/W  The lower 8 bits in the register represent the upper 8 bits of base_address3 of DPU core0. 
reg_dpu0_base_addr4_l  0x244  32  R/W  The lower 32 bits of base_address4 of DPU core0. 
reg_dpu0_base_addr4_h  0x248  32  R/W  The lower 8 bits in the register represent the upper 8 bits of base_address4 of DPU core0. 
reg_dpu0_base_addr5_l  0x24C  32  R/W  The lower 32 bits of base_address5 of DPU core0. 
reg_dpu0_base_addr5_h  0x250  32  R/W  The lower 8 bits in the register represent the upper 8 bits of base_address5 of DPU core0. 
reg_dpu0_base_addr6_l  0x254  32  R/W  The lower 32 bits of base_address6 of DPU core0. 
reg_dpu0_base_addr6_h  0x258  32  R/W  The lower 8 bits in the register represent the upper 8 bits of base_address6 of DPU core0. 
reg_dpu0_base_addr7_l  0x25C  32  R/W  The lower 32 bits of base_address7 of DPU core0. 
reg_dpu0_base_addr7_h  0x260  32  R/W  The lower 8 bits in the register represent the upper 8 bits of base_address7 of DPU core0. 
reg_dpu1_base_addr0_l  0x324  32  R/W  The lower 32 bits of base_address0 of DPU core1. 
reg_dpu1_base_addr0_h  0x328  32  R/W  The lower 8 bits in the register represent the upper 8 bits of base_address0 of DPU core1. 
reg_dpu1_base_addr1_l  0x32C  32  R/W  The lower 32 bits of base_address1 of DPU core1. 
reg_dpu1_base_addr1_h  0x330  32  R/W  The lower 8 bits in the register represent the upper 8 bits of base_address1 of DPU core1. 
reg_dpu1_base_addr2_l  0x334  32  R/W  The lower 32 bits of base_address2 of DPU core1. 
reg_dpu1_base_addr2_h  0x338  32  R/W  The lower 8 bits in the register represent the upper 8 bits of base_address2 of DPU core1. 
reg_dpu1_base_addr3_l  0x33C  32  R/W  The lower 32 bits of base_address3 of DPU core1. 
reg_dpu1_base_addr3_h  0x340  32  R/W  The lower 8 bits in the register represent the upper 8 bits of base_address3 of DPU core1. 
reg_dpu1_base_addr4_l  0x344  32  R/W  The lower 32 bits of base_address4 of DPU core1. 
reg_dpu1_base_addr4_h  0x348  32  R/W  The lower 8 bits in the register represent the upper 8 bits of base_address4 of DPU core1. 
reg_dpu1_base_addr5_l  0x34C  32  R/W  The lower 32 bits of base_address5 of DPU core1. 
reg_dpu1_base_addr5_h  0x350  32  R/W  The lower 8 bits in the register represent the upper 8 bits of base_address5 of DPU core1. 
reg_dpu1_base_addr6_l  0x354  32  R/W  The lower 32 bits of base_address6 of DPU core1. 
reg_dpu1_base_addr6_h  0x358  32  R/W  The lower 8 bits in the register represent the upper 8 bits of base_address6 of DPU core1. 
reg_dpu1_base_addr7_l  0x35C  32  R/W  The lower 32 bits of base_address7 of DPU core1. 
reg_dpu1_base_addr7_h  0x360  32  R/W  The lower 8 bits in the register represent the upper 8 bits of base_address7 of DPU core1. 
reg_dpu2_base_addr0_l  0x424  32  R/W  The lower 32 bits of base_address0 of DPU core2. 
reg_dpu2_base_addr0_h  0x428  32  R/W  The lower 8 bits in the register represent the upper 8 bits of base_address0 of DPU core2. 
reg_dpu2_base_addr1_l  0x42C  32  R/W  The lower 32 bits of base_address1 of DPU core2. 
reg_dpu2_base_addr1_h  0x430  32  R/W  The lower 8 bits in the register represent the upper 8 bits of base_address1 of DPU core2. 
reg_dpu2_base_addr2_l  0x434  32  R/W  The lower 32 bits of base_address2 of DPU core2. 
reg_dpu2_base_addr2_h  0x438  32  R/W  The lower 8 bits in the register represent the upper 8 bits of base_address2 of DPU core2. 
reg_dpu2_base_addr3_l  0x43C  32  R/W  The lower 32 bits of base_address3 of DPU core2. 
reg_dpu2_base_addr3_h  0x440  32  R/W  The lower 8 bits in the register represent the upper 8 bits of base_address3 of DPU core2. 
reg_dpu2_base_addr4_l  0x444  32  R/W  The lower 32 bits of base_address4 of DPU core2. 
reg_dpu2_base_addr4_h  0x448  32  R/W  The lower 8 bits in the register represent the upper 8 bits of base_address4 of DPU core2. 
reg_dpu2_base_addr5_l  0x44C  32  R/W  The lower 32 bits of base_address5 of DPU core2. 
reg_dpu2_base_addr5_h  0x450  32  R/W  The lower 8 bits in the register represent the upper 8 bits of base_address5 of DPU core2. 
reg_dpu2_base_addr6_l  0x454  32  R/W  The lower 32 bits of base_address6 of DPU core2. 
reg_dpu2_base_addr6_h  0x458  32  R/W  The lower 8 bits in the register represent the upper 8 bits of base_address6 of DPU core2. 
reg_dpu2_base_addr7_l  0x45C  32  R/W  The lower 32 bits of base_address7 of DPU core2. 
reg_dpu2_base_addr7_h  0x460  32  R/W  The lower 8 bits in the register represent the upper 8 bits of base_address7 of DPU core2. 
reg_dpu3_base_addr0_l  0x524  32  R/W  The lower 32 bits of base_address0 of DPU core3. 
reg_dpu3_base_addr0_h  0x528  32  R/W  The lower 8 bits in the register represent the upper 8 bits of base_address0 of DPU core3. 
reg_dpu3_base_addr1_l  0x52C  32  R/W  The lower 32 bits of base_address1 of DPU core3. 
reg_dpu3_base_addr1_h  0x530  32  R/W  The lower 8 bits in the register represent the upper 8 bits of base_address1 of DPU core3. 
reg_dpu3_base_addr2_l  0x534  32  R/W  The lower 32 bits of base_address2 of DPU core3. 
reg_dpu3_base_addr2_h  0x538  32  R/W  The lower 8 bits in the register represent the upper 8 bits of base_address2 of DPU core3. 
reg_dpu3_base_addr3_l  0x53C  32  R/W  The lower 32 bits of base_address3 of DPU core3. 
reg_dpu3_base_addr3_h  0x540  32  R/W  The lower 8 bits in the register represent the upper 8 bits of base_address3 of DPU core3. 
reg_dpu3_base_addr4_l  0x544  32  R/W  The lower 32 bits of base_address4 of DPU core3. 
reg_dpu3_base_addr4_h  0x548  32  R/W  The lower 8 bits in the register represent the upper 8 bits of base_address4 of DPU core3. 
reg_dpu3_base_addr5_l  0x54C  32  R/W  The lower 32 bits of base_address5 of DPU core3. 
reg_dpu3_base_addr5_h  0x550  32  R/W  The lower 8 bits in the register represent the upper 8 bits of base_address5 of DPU core3. 
reg_dpu3_base_addr6_l  0x554  32  R/W  The lower 32 bits of base_address6 of DPU core3. 
reg_dpu3_base_addr6_h  0x558  32  R/W  The lower 8 bits in the register represent the upper 8 bits of base_address6 of DPU core3. 
reg_dpu3_base_addr7_l  0x55C  32  R/W  The lower 32 bits of base_address7 of DPU core3. 
reg_dpu3_base_addr7_h  0x560  32  R/W  The lower 8 bits in the register represent the upper 8 bits of base_address7 of DPU core3. 
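The split of a 40-bit base address across the _l and _h registers, and the regular layout of the table, can be sketched as follows. The offset formula assumes the pattern visible in the table (0x100 per core, 8 bytes per base-address group, starting at 0x224). The function names are hypothetical.

```python
BASE_ADDR0_L_CORE0 = 0x224  # offset of reg_dpu0_base_addr0_l from the table
CORE_STRIDE = 0x100         # assumption: spacing between cores in the table
GROUP_STRIDE = 0x8          # one _l and one _h register per group


def split_base_addr(addr):
    """Split a 40-bit base address into (_l value, _h value)."""
    if not 0 <= addr < (1 << 40):
        raise ValueError("base address must fit in 40 bits")
    low = addr & 0xFFFF_FFFF   # lower 32 bits go in the _l register
    high = (addr >> 32) & 0xFF  # upper 8 bits go in the low byte of _h
    return low, high


def base_addr_reg_offsets(core, group):
    """Return the (_l, _h) register offsets for a core (0..3) and group (0..7)."""
    if not 0 <= core <= 3 or not 0 <= group <= 7:
        raise ValueError("core must be 0..3 and group must be 0..7")
    low = BASE_ADDR0_L_CORE0 + core * CORE_STRIDE + group * GROUP_STRIDE
    return low, low + 4
```

For example, the address 0x12_3456_7890 splits into 0x34567890 for the _l register and 0x12 for the _h register, and base_addr_reg_offsets(3, 7) returns (0x55C, 0x560), the last two entries in the table.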
Interrupts
The DPU generates an interrupt to signal the completion of a task. A high state on reg_dpu0_start signals the start of a DPU task for DPU core0. At the end of the task, the DPU generates an interrupt and bit0 in reg_dpu_isr is set to 1. The position of the active bit in the reg_dpu_isr depends on the number of DPU cores. For example, when DPU core1 finishes a task while DPU core0 is still working, reg_dpu_isr would maintain 2'b10.
The width of the dpu_interrupt signal is determined by the number of DPU cores. When the parameter DPU_NUM is set to 2, the DPU IP contains two DPU cores and the dpu_interrupt signal is two bits wide. The lower bit represents the DPU core0 interrupt and the higher bit represents the DPU core1 interrupt.
The interrupt connection between the DPU and the PS is described in the device tree file, which indicates the interrupt number of the DPU connected to the PS. Any interrupt pin may be used if the device tree file and Vivado assignments match. The reference connection is shown here.
If the softmax option is enabled, then the softmax interrupt should be correctly connected to the PS according to the device tree description.
irq7~irq0 correspond to pl_ps_irq0[7:0].
irq15~irq8 correspond to pl_ps_irq1[7:0].
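The irq-to-pin correspondence stated above can be expressed as a small lookup, which may help when checking that a Vivado connection matches the device tree entry. The function name pl_ps_irq_pin is hypothetical, and the sketch covers only the irq0 to irq15 range described here.

```python
def pl_ps_irq_pin(irq):
    """Map an irq index (0..15) to its PL-to-PS interrupt pin name.

    irq0..irq7 map to pl_ps_irq0[0..7] and irq8..irq15 map to
    pl_ps_irq1[0..7].
    """
    if not 0 <= irq <= 15:
        raise ValueError("irq index must be in the range 0..15")
    port, bit = divmod(irq, 8)
    return f"pl_ps_irq{port}[{bit}]"
```

For example, pl_ps_irq_pin(10) returns "pl_ps_irq1[2]".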