Lab 4: Optimize the Accelerator Using Directives

In this exercise, you modify the source file in the project to observe the effects of Vivado HLS pragmas on the performance of generated hardware. See Introduction for more information on this topic.

  1. Create a new project in the SDx™ environment (lab4) for the ZC702 Platform and Linux System Configuration using the design template for Matrix Multiplication and Addition.
  2. Click the tab labeled lab4 to view the SDx Project Settings. If the tab is not visible, double-click the project.sdx file under the lab4 project in the Project Explorer.
  3. In the HW Functions panel, observe that the madd and mmult functions already appear in the list of functions marked for hardware acceleration.
  4. To get the best runtime performance, switch to the Release configuration by clicking the Active Build Configuration option and selecting Release. You can also select Release from the Build icon, or right-click the project and select Build Configuration > Set Active > Release. The Release build configuration uses a higher compiler optimization setting than the Debug build configuration.
  5. Double-click mmult.cpp in the Project Explorer view to open it in the source editor view.
  6. Find the lines where the pragmas HLS pipeline and HLS array_partition are located.
  7. Remove these pragmas by commenting out the lines.

  8. Save your file.
  9. Right-click the top-level folder for the project and click Build Project in the menu.
  10. After the build completes, copy the lab4/Release/sd_card folder to an SD card.
  11. Insert the SD card into the ZC702 board and power on the board.
  12. Connect to the board from a serial terminal in the SDx Terminal tab of the SDx IDE. Click the + icon to open the settings.
  13. After the board boots up, you can execute the application at the Linux prompt. Type /mnt/lab4.elf.
    Observe the performance and compare it with the performance achieved when the pragmas you commented out are present (the results of lab1). Note that the array_partition pragmas increase the memory bandwidth for the inner loop by allowing array elements to be read in parallel. The pipeline pragma, on the other hand, pipelines the loop so that multiple iterations of the loop can overlap and run in parallel.
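
To make the role of the two pragmas concrete, the following is a minimal sketch of a matrix multiplication kernel written in the style this lab describes. It is not the source of the design template: the function name mmult_sketch, the matrix size N = 32, the local buffer names, and the partitioning type and factor are all assumptions chosen for illustration. Commenting out the three kinds of #pragma HLS lines corresponds to steps 6 and 7.

```cpp
#include <cassert>

constexpr int N = 32;  // assumed matrix dimension, for illustration only

// Hypothetical kernel sketch (not the template's mmult.cpp).
// A, B, and C are N x N matrices stored in row-major order.
void mmult_sketch(const float A[N * N], const float B[N * N], float C[N * N]) {
    float A_tmp[N][N];
    float B_tmp[N][N];

    // Split the local buffers across several memories so that the inner
    // product can read many elements in the same clock cycle.
    // The cyclic/block types and factor=16 are assumed values.
#pragma HLS array_partition variable=A_tmp cyclic factor=16 dim=2
#pragma HLS array_partition variable=B_tmp block factor=16 dim=1

    // Copy the inputs into the partitioned local buffers.
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
#pragma HLS pipeline
            A_tmp[i][j] = A[i * N + j];
            B_tmp[i][j] = B[i * N + j];
        }
    }

    // Compute C = A * B. Pipelining the j loop lets successive (i, j)
    // iterations overlap in the generated hardware.
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
#pragma HLS pipeline
            float sum = 0.0f;
            for (int k = 0; k < N; k++) {
                sum += A_tmp[i][k] * B_tmp[k][j];
            }
            C[i * N + j] = sum;
        }
    }
}

// Host-side sanity check: multiplying by the identity matrix must
// return A unchanged.
void check_identity() {
    static float A[N * N], I[N * N], C[N * N];
    for (int i = 0; i < N * N; i++) {
        A[i] = static_cast<float>(i % 7);
        I[i] = (i / N == i % N) ? 1.0f : 0.0f;
    }
    mmult_sketch(A, I, C);
    for (int i = 0; i < N * N; i++) {
        assert(C[i] == A[i]);
    }
}
```

A standard C++ compiler ignores the HLS pragmas (possibly with an unknown-pragma warning), so the sketch computes the same result on the host with or without them. Only the hardware generated by the HLS tool changes, which is why the effect shows up as a difference in measured performance rather than in the output values.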