Product Specification

Hardware Architecture

The detailed hardware architecture of the DPU is shown in the following figure. After start-up, the DPU fetches instructions from the off-chip memory to control the operation of the computing engine. The instructions are generated by the Vitis™ AI compiler, where substantial optimizations have been performed.

On-chip memory is used to buffer input, intermediate, and output data to achieve high throughput and efficiency. The data is reused as much as possible to reduce the memory bandwidth. A deeply pipelined design is used for the computing engine. The processing elements (PEs) take full advantage of the fine-grained building blocks such as multipliers, adders, and accumulators in Xilinx devices.

Figure 1: DPU Hardware Architecture

DPU with Enhanced Usage of DSP

A DSP Double Data Rate (DDR) technique is used to improve the performance achieved with the device. The DPU therefore needs two input clocks: one for general logic and another, at twice the frequency, for the DSP slices. The following figure shows the difference between a DPU that does not use the DSP DDR technique and a DPU with the enhanced usage architecture.

Note: All DPU architectures referred to in this document refer to DPU enhanced usage, unless otherwise specified.

Figure 2: Difference between DPU without DSP DDR and DPU Enhanced Usage
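
The clock relationship described above can be checked with a few lines of Python. This is a minimal sketch: the function names are hypothetical helpers, not part of any Xilinx tool, and the 300 MHz value in the usage note is only an illustrative assumption.

```python
def dpu_2x_clk_hz(m_axi_dpu_aclk_hz: int) -> int:
    """Return the DSP-slice clock frequency for a given general-logic clock."""
    if m_axi_dpu_aclk_hz <= 0:
        raise ValueError("clock frequency must be positive")
    # The DSP slices run at twice the general-logic frequency.
    return 2 * m_axi_dpu_aclk_hz


def clocks_match(m_axi_dpu_aclk_hz: int, dpu_2x_clk_hz_value: int) -> bool:
    """Check that the DSP clock is exactly twice the general-logic clock."""
    return dpu_2x_clk_hz_value == 2 * m_axi_dpu_aclk_hz
```

For example, a general-logic clock of 300 MHz would call for a 600 MHz DSP clock under this rule.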

Port Descriptions

The core top-level interfaces are shown in the following figure.

Figure 3: Core Ports

The DPU I/O signals are listed and described in the table below.

Table 1. DPU Signal Description
Signal Name Interface Type Width I/O Description
S_AXI Memory mapped AXI slave interface 32 I/O 32-bit memory mapped AXI interface for registers.
s_axi_aclk Clock 1 I AXI clock input for S_AXI
s_axi_aresetn Reset 1 I Active-Low reset for S_AXI
dpu_2x_clk Clock 1 I Input clock used for DSP blocks in the DPU. The frequency is twice that of m_axi_dpu_aclk.
dpu_2x_resetn Reset 1 I Active-Low reset for DSP blocks
m_axi_dpu_aclk Clock 1 I Input clock used for DPU general logic.
m_axi_dpu_aresetn Reset 1 I Active-Low reset for DPU general logic
DPUx_M_AXI_INSTR Memory mapped AXI master interface 32 I/O 32-bit memory mapped AXI interface for DPU instructions.
DPUx_M_AXI_DATA0 Memory mapped AXI master interface 64 or 128 I/O 64-bit AXI interface for the Zynq-7000 series and 128-bit for the Zynq UltraScale+ MPSoC series.
DPUx_M_AXI_DATA1 Memory mapped AXI master interface 64 or 128 I/O 64-bit AXI interface for the Zynq-7000 series and 128-bit for the Zynq UltraScale+ MPSoC series.
dpu_interrupt Interrupt 1~4 O Active-High interrupt output from DPU. The data width is determined by the number of DPU cores.
SFM_M_AXI (optional) Memory mapped AXI master interface 128 I/O 128-bit memory mapped AXI interface for softmax data.
sfm_interrupt (optional) Interrupt 1 O Active-High interrupt output from softmax module.
dpu_2x_clk_ce (optional) Clock enable 1 O Clock enable signal for controlling the input DPU 2x clock when DPU 2x clock gating is enabled.
  1. The softmax interface only appears when the softmax option in the DPU is enabled.

Register Space

The DPU IP implements registers in programmable logic. The following tables show the DPU IP registers. These registers are accessible from the APU through the S_AXI interface.

reg_dpu_reset

The reg_dpu_reset register controls the resets of all DPU cores integrated in the DPU IP. The lower four bits of this register control the reset of up to four DPU cores. All the reset signals are active-High. The details of reg_dpu_reset are shown in the following table.

Table 2. reg_dpu_reset
Register Address Offset Width Type Description
reg_dpu_reset 0x004 32 R/W [n] – DPU core n reset
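
As an illustration of the bit layout in the table above, the following sketch builds the value to write to reg_dpu_reset for a chosen set of cores. The helper name `reset_mask` is hypothetical, and the sketch only computes the mask; how long a reset must be held and how it is released are not specified here.

```python
REG_DPU_RESET_OFFSET = 0x004  # address offset from Table 2
MAX_DPU_CORES = 4             # the lower four bits map to cores 0..3


def reset_mask(cores):
    """Return the reg_dpu_reset value that asserts reset for the given cores."""
    mask = 0
    for n in cores:
        if not 0 <= n < MAX_DPU_CORES:
            raise ValueError(f"DPU core index out of range: {n}")
        mask |= 1 << n  # bit n is the active-High reset of DPU core n
    return mask
```

Calling `reset_mask([0, 2])` yields 0b0101, which asserts reset for DPU core0 and DPU core2 only.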

reg_dpu_isr

The reg_dpu_isr register represents the interrupt status of all cores in the DPU IP. The lower four bits of this register show the interrupt status of up to four DPU cores. The details of reg_dpu_isr are shown in the following table.

Table 3. reg_dpu_isr
Register Address Offset Width Type Description
reg_dpu_isr 0x608 32 R [n] – DPU core n interrupt status
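
The status bits can be decoded in software as sketched below. The function name `cores_with_interrupt` is a hypothetical helper, and the value passed in is assumed to be a 32-bit word already read from reg_dpu_isr.

```python
REG_DPU_ISR_OFFSET = 0x608  # address offset from Table 3


def cores_with_interrupt(isr_value, num_cores=4):
    """Return the indices of DPU cores whose interrupt status bit is set."""
    # Bit n of reg_dpu_isr reports the interrupt status of DPU core n;
    # bits above the configured number of cores are ignored.
    return [n for n in range(num_cores) if (isr_value >> n) & 1]
```

With this helper, a register value of 0b10 decodes to `[1]`, meaning that only DPU core1 has a pending interrupt.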

reg_dpu_start

The reg_dpu_start register is the start signal for a DPU core. There is one start register for each DPU core. The details of reg_dpu_start are shown in the following table.

Table 4. reg_dpu_start
Register Address Offset Width Type Description
reg_dpu0_start 0x220 32 R/W DPU core0 start signal.
reg_dpu1_start 0x320 32 R/W DPU core1 start signal.
reg_dpu2_start 0x420 32 R/W DPU core2 start signal.
reg_dpu3_start 0x520 32 R/W DPU core3 start signal.
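
The offsets in the table above follow a regular pattern: each core's start register sits 0x100 above the previous one. A minimal sketch of that calculation follows; the stride is inferred from the listed offsets rather than stated in the text, and the helper name is hypothetical.

```python
REG_DPU0_START = 0x220   # start register of DPU core0 (Table 4)
DPU_CORE_STRIDE = 0x100  # spacing between cores, inferred from Table 4


def start_reg_offset(core):
    """Return the address offset of reg_dpu<core>_start."""
    if not 0 <= core <= 3:
        raise ValueError(f"DPU core index out of range: {core}")
    return REG_DPU0_START + core * DPU_CORE_STRIDE
```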

reg_dpu_instr_addr

The reg_dpu_instr_addr register indicates the instruction address of a DPU core. Each DPU core has its own reg_dpu_instr_addr register, in which only the lower 28 bits are valid. In the DPU processor, the real instruction-fetch address is a 40-bit signal that consists of the lower 28 bits of reg_dpu_instr_addr followed by 12 zero bits. The available instruction address for the DPU therefore ranges from 0x1000 to 0xFF_FFFF_F000 in 4 KB steps. The details of reg_dpu_instr_addr are shown in the following table.

Table 5. reg_dpu_instr_addr
Register Address Offset Width Type Description
reg_dpu0_instr_addr 0x20C 32 R/W Start address in external memory for DPU core0 instructions. Only the lower 28 bits are valid.
reg_dpu1_instr_addr 0x30C 32 R/W Start address in external memory for DPU core1 instructions. Only the lower 28 bits are valid.
reg_dpu2_instr_addr 0x40C 32 R/W Start address in external memory for DPU core2 instructions. Only the lower 28 bits are valid.
reg_dpu3_instr_addr 0x50C 32 R/W Start address in external memory for DPU core3 instructions. Only the lower 28 bits are valid.
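
The relationship between the register value and the 40-bit fetch address can be sketched as follows. The function names are hypothetical; the sketch assumes only what is stated above, namely that the fetch address is the lower 28 register bits followed by 12 zero bits.

```python
INSTR_ADDR_SHIFT = 12             # 12 zero bits are appended to the register value
INSTR_ADDR_MASK = (1 << 28) - 1   # only the lower 28 bits are valid


def instr_addr_to_reg(fetch_addr):
    """Convert a 40-bit instruction-fetch address to a reg_dpu_instr_addr value."""
    if fetch_addr % (1 << INSTR_ADDR_SHIFT):
        raise ValueError("instruction address must be 4 KB aligned")
    reg_value = fetch_addr >> INSTR_ADDR_SHIFT
    if not 1 <= reg_value <= INSTR_ADDR_MASK:
        raise ValueError("instruction address outside the available range")
    return reg_value


def reg_to_instr_addr(reg_value):
    """Convert a reg_dpu_instr_addr value to the 40-bit fetch address."""
    return (reg_value & INSTR_ADDR_MASK) << INSTR_ADDR_SHIFT
```

Under this encoding, the smallest usable register value, 1, corresponds to address 0x1000, and the largest, 0x0FFF_FFFF, to the highest address a 40-bit fetch signal of this form can take, 0xFF_FFFF_F000.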

reg_dpu_base_addr

The reg_dpu_base_addr register is used to indicate the address of input image and parameters for each DPU in external memory. The width of a DPU base address is 40 bits so it can support an address space up to 1 TB. All registers are 32 bits wide, so two registers are required to represent a 40-bit wide base address. reg_dpu0_base_addr0_l represents the lower 32 bits of base_address0 in DPU core0 and reg_dpu0_base_addr0_h represents the upper eight bits of base_address0 in DPU core0.

There are eight groups of DPU base addresses for each DPU core and thus 32 groups of DPU base addresses for up to four DPU cores. The details of reg_dpu_base_addr are shown in the following table.

Table 6. reg_dpu_base_addr
Register Address Offset Width Type Description
reg_dpu0_base_addr0_l 0x224 32 R/W The lower 32 bits of base_address0 of DPU core0.
reg_dpu0_base_addr0_h 0x228 32 R/W The lower 8 bits in the register represent the upper 8 bits of base_address0 of DPU core0.
reg_dpu0_base_addr1_l 0x22C 32 R/W The lower 32 bits of base_address1 of DPU core0.
reg_dpu0_base_addr1_h 0x230 32 R/W The lower 8 bits in the register represent the upper 8 bits of base_address1 of DPU core0.
reg_dpu0_base_addr2_l 0x234 32 R/W The lower 32 bits of base_address2 of DPU core0.
reg_dpu0_base_addr2_h 0x238 32 R/W The lower 8 bits in the register represent the upper 8 bits of base_address2 of DPU core0.
reg_dpu0_base_addr3_l 0x23C 32 R/W The lower 32 bits of base_address3 of DPU core0.
reg_dpu0_base_addr3_h 0x240 32 R/W The lower 8 bits in the register represent the upper 8 bits of base_address3 of DPU core0.
reg_dpu0_base_addr4_l 0x244 32 R/W The lower 32 bits of base_address4 of DPU core0.
reg_dpu0_base_addr4_h 0x248 32 R/W The lower 8 bits in the register represent the upper 8 bits of base_address4 of DPU core0.
reg_dpu0_base_addr5_l 0x24C 32 R/W The lower 32 bits of base_address5 of DPU core0.
reg_dpu0_base_addr5_h 0x250 32 R/W The lower 8 bits in the register represent the upper 8 bits of base_address5 of DPU core0.
reg_dpu0_base_addr6_l 0x254 32 R/W The lower 32 bits of base_address6 of DPU core0.
reg_dpu0_base_addr6_h 0x258 32 R/W The lower 8 bits in the register represent the upper 8 bits of base_address6 of DPU core0.
reg_dpu0_base_addr7_l 0x25C 32 R/W The lower 32 bits of base_address7 of DPU core0.
reg_dpu0_base_addr7_h 0x260 32 R/W The lower 8 bits in the register represent the upper 8 bits of base_address7 of DPU core0.
reg_dpu1_base_addr0_l 0x324 32 R/W The lower 32 bits of base_address0 of DPU core1.
reg_dpu1_base_addr0_h 0x328 32 R/W The lower 8 bits in the register represent the upper 8 bits of base_address0 of DPU core1.
reg_dpu1_base_addr1_l 0x32C 32 R/W The lower 32 bits of base_address1 of DPU core1.
reg_dpu1_base_addr1_h 0x330 32 R/W The lower 8 bits in the register represent the upper 8 bits of base_address1 of DPU core1.
reg_dpu1_base_addr2_l 0x334 32 R/W The lower 32 bits of base_address2 of DPU core1.
reg_dpu1_base_addr2_h 0x338 32 R/W The lower 8 bits in the register represent the upper 8 bits of base_address2 of DPU core1.
reg_dpu1_base_addr3_l 0x33C 32 R/W The lower 32 bits of base_address3 of DPU core1.
reg_dpu1_base_addr3_h 0x340 32 R/W The lower 8 bits in the register represent the upper 8 bits of base_address3 of DPU core1.
reg_dpu1_base_addr4_l 0x344 32 R/W The lower 32 bits of base_address4 of DPU core1.
reg_dpu1_base_addr4_h 0x348 32 R/W The lower 8 bits in the register represent the upper 8 bits of base_address4 of DPU core1.
reg_dpu1_base_addr5_l 0x34C 32 R/W The lower 32 bits of base_address5 of DPU core1.
reg_dpu1_base_addr5_h 0x350 32 R/W The lower 8 bits in the register represent the upper 8 bits of base_address5 of DPU core1.
reg_dpu1_base_addr6_l 0x354 32 R/W The lower 32 bits of base_address6 of DPU core1.
reg_dpu1_base_addr6_h 0x358 32 R/W The lower 8 bits in the register represent the upper 8 bits of base_address6 of DPU core1.
reg_dpu1_base_addr7_l 0x35C 32 R/W The lower 32 bits of base_address7 of DPU core1.
reg_dpu1_base_addr7_h 0x360 32 R/W The lower 8 bits in the register represent the upper 8 bits of base_address7 of DPU core1.
reg_dpu2_base_addr0_l 0x424 32 R/W The lower 32 bits of base_address0 of DPU core2.
reg_dpu2_base_addr0_h 0x428 32 R/W The lower 8 bits in the register represent the upper 8 bits of base_address0 of DPU core2.
reg_dpu2_base_addr1_l 0x42C 32 R/W The lower 32 bits of base_address1 of DPU core2.
reg_dpu2_base_addr1_h 0x430 32 R/W The lower 8 bits in the register represent the upper 8 bits of base_address1 of DPU core2.
reg_dpu2_base_addr2_l 0x434 32 R/W The lower 32 bits of base_address2 of DPU core2.
reg_dpu2_base_addr2_h 0x438 32 R/W The lower 8 bits in the register represent the upper 8 bits of base_address2 of DPU core2.
reg_dpu2_base_addr3_l 0x43C 32 R/W The lower 32 bits of base_address3 of DPU core2.
reg_dpu2_base_addr3_h 0x440 32 R/W The lower 8 bits in the register represent the upper 8 bits of base_address3 of DPU core2.
reg_dpu2_base_addr4_l 0x444 32 R/W The lower 32 bits of base_address4 of DPU core2.
reg_dpu2_base_addr4_h 0x448 32 R/W The lower 8 bits in the register represent the upper 8 bits of base_address4 of DPU core2.
reg_dpu2_base_addr5_l 0x44C 32 R/W The lower 32 bits of base_address5 of DPU core2.
reg_dpu2_base_addr5_h 0x450 32 R/W The lower 8 bits in the register represent the upper 8 bits of base_address5 of DPU core2.
reg_dpu2_base_addr6_l 0x454 32 R/W The lower 32 bits of base_address6 of DPU core2.
reg_dpu2_base_addr6_h 0x458 32 R/W The lower 8 bits in the register represent the upper 8 bits of base_address6 of DPU core2.
reg_dpu2_base_addr7_l 0x45C 32 R/W The lower 32 bits of base_address7 of DPU core2.
reg_dpu2_base_addr7_h 0x460 32 R/W The lower 8 bits in the register represent the upper 8 bits of base_address7 of DPU core2.
reg_dpu3_base_addr0_l 0x524 32 R/W The lower 32 bits of base_address0 of DPU core3.
reg_dpu3_base_addr0_h 0x528 32 R/W The lower 8 bits in the register represent the upper 8 bits of base_address0 of DPU core3.
reg_dpu3_base_addr1_l 0x52C 32 R/W The lower 32 bits of base_address1 of DPU core3.
reg_dpu3_base_addr1_h 0x530 32 R/W The lower 8 bits in the register represent the upper 8 bits of base_address1 of DPU core3.
reg_dpu3_base_addr2_l 0x534 32 R/W The lower 32 bits of base_address2 of DPU core3.
reg_dpu3_base_addr2_h 0x538 32 R/W The lower 8 bits in the register represent the upper 8 bits of base_address2 of DPU core3.
reg_dpu3_base_addr3_l 0x53C 32 R/W The lower 32 bits of base_address3 of DPU core3.
reg_dpu3_base_addr3_h 0x540 32 R/W The lower 8 bits in the register represent the upper 8 bits of base_address3 of DPU core3.
reg_dpu3_base_addr4_l 0x544 32 R/W The lower 32 bits of base_address4 of DPU core3.
reg_dpu3_base_addr4_h 0x548 32 R/W The lower 8 bits in the register represent the upper 8 bits of base_address4 of DPU core3.
reg_dpu3_base_addr5_l 0x54C 32 R/W The lower 32 bits of base_address5 of DPU core3.
reg_dpu3_base_addr5_h 0x550 32 R/W The lower 8 bits in the register represent the upper 8 bits of base_address5 of DPU core3.
reg_dpu3_base_addr6_l 0x554 32 R/W The lower 32 bits of base_address6 of DPU core3.
reg_dpu3_base_addr6_h 0x558 32 R/W The lower 8 bits in the register represent the upper 8 bits of base_address6 of DPU core3.
reg_dpu3_base_addr7_l 0x55C 32 R/W The lower 32 bits of base_address7 of DPU core3.
reg_dpu3_base_addr7_h 0x560 32 R/W The lower 8 bits in the register represent the upper 8 bits of base_address7 of DPU core3.
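
The table above is regular enough to compute rather than look up. The sketch below derives the register offsets for a given core and base-address group and splits a 40-bit base address into its two register values. The stride values are inferred from the listed offsets, and both function names are hypothetical.

```python
REG_DPU0_BASE_ADDR0_L = 0x224  # first base-address register of DPU core0
DPU_CORE_STRIDE = 0x100        # spacing between cores, inferred from Table 6
GROUP_STRIDE = 0x8             # each group uses two 32-bit registers (_l, _h)


def base_addr_reg_offsets(core, group):
    """Return the (_l, _h) register offsets for one base-address group."""
    if not 0 <= core <= 3 or not 0 <= group <= 7:
        raise ValueError("core must be 0..3 and group must be 0..7")
    low = REG_DPU0_BASE_ADDR0_L + core * DPU_CORE_STRIDE + group * GROUP_STRIDE
    return low, low + 4


def split_base_addr(base_addr):
    """Split a 40-bit base address into the values for the _l and _h registers."""
    if not 0 <= base_addr < (1 << 40):
        raise ValueError("base address must fit in 40 bits")
    # _l holds bits [31:0]; the lower 8 bits of _h hold bits [39:32].
    return base_addr & 0xFFFFFFFF, (base_addr >> 32) & 0xFF
```

For instance, `base_addr_reg_offsets(1, 2)` gives the pair (0x334, 0x338), which matches the entries for reg_dpu1_base_addr2_l and reg_dpu1_base_addr2_h.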

Interrupts

The DPU generates an interrupt to signal the completion of a task. A high state on reg_dpu0_start signals the start of a DPU task for DPU core0. At the end of the task, the DPU generates an interrupt and bit 0 in reg_dpu_isr is set to 1. The position of the active bit in reg_dpu_isr corresponds to the index of the DPU core that raised the interrupt. For example, when DPU core1 finishes a task while DPU core0 is still working, reg_dpu_isr contains 2'b10.
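
The completion signalling described above can be observed from software by reading reg_dpu_isr. The following is a polling sketch for illustration only: `read_reg` is a hypothetical callable that returns the 32-bit value at a given S_AXI offset, and a real driver would normally rely on the interrupt line instead of polling.

```python
REG_DPU_ISR = 0x608  # address offset of reg_dpu_isr


def wait_for_core(read_reg, core, max_polls=1000):
    """Poll reg_dpu_isr until the bit for `core` is set; return the polls used."""
    for poll in range(1, max_polls + 1):
        # Bit n of reg_dpu_isr is set when DPU core n has finished its task.
        if (read_reg(REG_DPU_ISR) >> core) & 1:
            return poll
    raise TimeoutError(f"DPU core{core} did not signal completion")
```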

The width of the dpu_interrupt signal is determined by the number of DPU cores. When the parameter DPU_NUM is set to 2, the DPU IP contains two DPU cores and the dpu_interrupt signal is two bits wide. The lower bit represents the DPU core0 interrupt and the higher bit represents the DPU core1 interrupt.

The interrupt connection between the DPU and the PS is described in the device tree file, which indicates the interrupt number of the DPU connected to the PS. Any interrupt pin may be used if the device tree file and Vivado assignments match. The reference connection is shown here.

Figure 4: Reference Connection for DPU Interrupt

Note:
  1. If the softmax option is enabled, then the softmax interrupt should be correctly connected to the PS according to the device tree description.
  2. irq7~irq0 corresponds to pl_ps_irq0[7:0].
  3. irq15~irq8 corresponds to pl_ps_irq1[7:0].
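
The mapping in notes 2 and 3 can be written out as a small lookup. This is a sketch with a hypothetical function name; it covers only the sixteen irq indices named in the notes.

```python
def irq_to_pl_ps_pin(irq):
    """Map an irq index (0..15) to its PL-to-PS interrupt bus and bit position."""
    if not 0 <= irq <= 15:
        raise ValueError(f"irq index out of range: {irq}")
    # irq7..irq0 map to pl_ps_irq0[7:0]; irq15..irq8 map to pl_ps_irq1[7:0].
    bus = "pl_ps_irq0" if irq < 8 else "pl_ps_irq1"
    return bus, irq % 8
```

For example, irq10 resolves to bit 2 of pl_ps_irq1.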