Lab 3: Optimize the Application Code

This tutorial demonstrates how you can modify your code to optimize the hardware-software system generated by the SDx environment. You will also learn how to find more information about build errors so that you can correct your code.

Note: This tutorial is organized into steps. Each step gives a general instruction followed by supplementary detailed steps, so you can choose how much guidance to use based on your skill level. If you need help completing a general instruction, follow the detailed steps; if you are ready, skip the step-by-step directions and move on to the next general instruction.
Note: You can complete this lab even if you do not have a ZC702 board. When creating the SDSoC environment project, select your board and one of the available applications if the suggested template Matrix Multiplication and Addition is not found. For example, boards such as the MicroZed with smaller Zynq-7000 devices offer the Matrix Multiplication and Addition (area reduced) application as an available template. In this tutorial you are not asked to run the application on the board, and you can complete the tutorial following the steps for the ZC702 to satisfy the learning objectives.

Introduction to System Ports and DMA

In Zynq®-7000 All Programmable SoC device systems, the memory seen by the ARM Cortex-A9 processors has two levels of on-chip cache followed by a large off-chip DDR memory. From the programmable logic side, the SDx IDE creates a hardware design that might contain a Direct Memory Access (DMA) block to allow a hardware function to directly read from and/or write to the processor system memory via the system interface ports.

As shown in the simplified diagram below, the processing system (PS) block in Zynq devices has three kinds of system ports that are used to transfer data from processor memory to the Zynq device programmable logic (PL) and back:
  • Accelerator Coherency Port (ACP), which allows the hardware to directly access the L2 cache of the processor in a coherent fashion.
  • High Performance ports 0-3 (HP0-3), which provide direct buffered access to the DDR memory or the on-chip memory from the hardware, bypassing the processor cache, using the Asynchronous FIFO Interface (AFI).
  • General-Purpose IO ports (GP0/GP1), which allow the processor to read/write hardware registers.

Figure: Simplified Zynq + DDR Diagram Showing Memory Access Ports and Memories

When the software running on the ARM Cortex-A9 processor “calls” a hardware function, it actually invokes a stub function generated by sds++. The stub in turn calls underlying drivers to send data from processor memory to the hardware function and to bring results back from the hardware function to processor memory over the three types of system ports shown: GPx, ACP, and AFI.

The table below shows the different system ports and their properties. The sds++ compiler automatically chooses the best possible system port to use for any data transfer, but allows you to override this selection by using pragmas.

System Port    Properties
ACP            Hardware functions have cache-coherent access to DDR via the PS L2 cache.
AFI (HP)       Hardware functions have fast, non-cache-coherent access to DDR via the PS memory controller.
GP             Processor directly writes/reads data to/from hardware function. Inefficient for large data transfers.
MIG            Hardware functions access DDR from PL via a MIG IP memory controller.
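
The port override described above can be sketched as follows. This is a minimal sketch, assuming the `#pragma SDS data sys_port` form placed immediately before the hardware function declaration. The function name madd, the dimension N, and the argument names are illustrative only, and the port names accepted by the pragma may differ between SDx releases. A standard C++ compiler ignores the pragma (possibly with a warning), so the function can still be built and checked on a host machine.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical matrix dimension, chosen only for illustration.
constexpr std::size_t N = 32;

// Ask sds++ to move A over the ACP port and B over an AFI (HP) port.
// Arguments without an entry (here C) are left to the compiler's own choice.
// Compilers other than sds++ ignore this pragma.
#pragma SDS data sys_port(A:ACP, B:AFI)
void madd(const float A[N * N], const float B[N * N], float C[N * N]);

// Element-wise matrix addition: C = A + B.
void madd(const float A[N * N], const float B[N * N], float C[N * N]) {
    for (std::size_t i = 0; i < N * N; ++i) {
        C[i] = A[i] + B[i];
    }
}
```

Keeping the pragma next to the declaration means the override travels with the function prototype, and removing the line returns the choice of port to the compiler.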

Learning Objectives

After you complete the tutorial (lab3), you should be able to:
  • Use pragmas to select ACP or AFI ports for data transfer.
  • Observe the error detection and reporting capabilities of the SDSoC environment.
If you go through the additional exercises, you can also learn to:
  • Use pragmas to select different data movers for your hardware function arguments.
  • Understand the use of sds_alloc().
  • Use pragmas to control the number of data elements that are transferred to/from the hardware function.
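
The three additional topics listed above can be previewed together in one small example. This is a sketch under stated assumptions, not the lab's actual source: the function names madd_n and run_madd_n and the argument dim are hypothetical, the data mover name AXIDMA_SIMPLE is spelled differently in some SDx releases, and the host-side stand-ins for sds_alloc() and sds_free() exist only so the code builds without the SDSoC tools. Simple DMA is generally documented as needing physically contiguous buffers, which is the usual reason for pairing it with sds_alloc().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

#ifdef __SDSCC__
#include "sds_lib.h"  // sds_alloc() and sds_free() when building with sds++
#else
// Host-only stand-ins (assumption): plain heap memory, no contiguity guarantee.
static void *sds_alloc(std::size_t bytes) { return std::malloc(bytes); }
static void sds_free(void *p) { std::free(p); }
#endif

// Select a data mover per argument, and tell the compiler how many elements
// of each array to transfer. dim is a scalar argument of the same function.
#pragma SDS data data_mover(A:AXIDMA_SIMPLE, B:AXIDMA_SIMPLE, C:AXIDMA_SIMPLE)
#pragma SDS data copy(A[0:dim*dim], B[0:dim*dim], C[0:dim*dim])
void madd_n(const float *A, const float *B, float *C, int dim);

// Element-wise addition of two dim x dim matrices stored as flat arrays.
void madd_n(const float *A, const float *B, float *C, int dim) {
    for (int i = 0; i < dim * dim; ++i) {
        C[i] = A[i] + B[i];
    }
}

// Hypothetical helper: allocate buffers with sds_alloc(), call the function,
// verify the result, and release the buffers. Returns true on success.
bool run_madd_n(int dim) {
    const std::size_t count = static_cast<std::size_t>(dim) * dim;
    float *A = static_cast<float *>(sds_alloc(count * sizeof(float)));
    float *B = static_cast<float *>(sds_alloc(count * sizeof(float)));
    float *C = static_cast<float *>(sds_alloc(count * sizeof(float)));
    bool ok = (A != nullptr) && (B != nullptr) && (C != nullptr);
    if (ok) {
        for (std::size_t i = 0; i < count; ++i) {
            A[i] = static_cast<float>(i);
            B[i] = 1.0f;
        }
        madd_n(A, B, C, dim);
        for (std::size_t i = 0; i < count; ++i) {
            if (C[i] != A[i] + B[i]) {
                ok = false;
            }
        }
    }
    if (A) sds_free(A);
    if (B) sds_free(B);
    if (C) sds_free(C);
    return ok;
}
```

Calling run_madd_n(8) on a host exercises only the software path; the pragmas take effect when madd_n is marked for hardware and the project is built with sds++.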