What's New in Vitis™

2020.2

Vitis Software Platform 2020.2 Release Highlights:

  • Vitis 2020.2 supports application acceleration and embedded software development for Versal ACAP Platforms
  • Vitis Core Development Kit now includes the AI Engine Compiler to compile C/C++ applications for Versal AI Engines. AI Engine, part of Versal AI Core Series, is a vector processor for compute-intensive applications
  • Vitis HLS is the default for both accelerated-kernel compilation (Vitis) and the C/C++ to RTL IP creation flow (Vivado)
  • 600+ FPGA-accelerated functions across 13 performance-optimized libraries. 2020.2 introduces the new Vitis HPC library for accelerating high-performance computing applications and several enhancements & additions to the Data Analytics, Graph, BLAS, Sparse, Security & Database libraries
  • Support for evaluating multiple implementation strategies for final FPGA binary creation & enhancements for easier RTL-kernel integration within Vitis applications
  • Other enhancements in this release include support for AI Engine application profiling, Git version control for Vitis projects, Vitis AI profiler data integration within Vitis Analyzer, and enhancements for emulation modes.
  • Add-on for MATLAB® and Simulink®: Unification of Xilinx Model Composer and System Generator for DSP. AI Engine is a new domain in the Add-on for MATLAB and Simulink.

Vitis What's New by Category

The sections below describe the new features and enhancements in Vitis 2020.2. For information on supported platforms, changed behavior, and known issues, refer to the Vitis 2020.2 Release Notes for the Application Acceleration Flow and the Embedded Software Development Flow.

Note: Vitis Accelerated Libraries are available as a separate download. They can be downloaded from GitHub or directly from within the Vitis IDE as well.

  • FPGA-accelerated library for HPC workloads. Initial release focuses on Seismic Imaging & Geophysics Simulation use-cases
    • Reverse Time Migration (RTM): Seismic imaging technique for accurate representation of the subsurface
    • High-precision Multi-layer Perceptron (MLP): Reconstruction of subsurface properties using seismic reflection data (Seismic Inversion)
  • Optimized for the single-precision floating-point data type (FP32), which is a key requirement in HPC applications
  • Version 1 of the library offers the following:
    • L1 Stencil primitive, L1 MLP activation functions including Sigmoid, Relu, and Prelu
    • L2 2D RTM forward kernel, 2D RTM backward kernels, and 3D RTM forward kernel
    • L3 2D RTM APIs for supporting shot parallelism
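
The three L1 activation functions named above have simple scalar definitions. The following is a minimal Python reference sketch of those definitions, not the library's HLS implementation. The function names, the default `alpha` value, and the use of Python floats (double precision rather than FP32) are assumptions made for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, arranged so exp() never overflows."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # x < 0 here, so exp(x) stays in (0, 1)
    return e / (1.0 + e)


def relu(x: float) -> float:
    """Rectified linear unit: passes positive inputs, clamps the rest to 0."""
    return x if x > 0.0 else 0.0


def prelu(x: float, alpha: float = 0.25) -> float:
    """Parametric ReLU: negative inputs are scaled by alpha (assumed default)."""
    return x if x > 0.0 else alpha * x
```

For example, `sigmoid(0.0)` evaluates to 0.5 and `prelu(-2.0, 0.25)` to -0.5.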

New Functions and Features

  • 2020.2 ISP Pipeline example design supports pixel depths up to 16 bits
  • Local tone mapping
  • Auto Exposure Correction
  • Quantization & Dithering
  • Color Correction Matrix
  • Black Level Correction
  • Lens Shading Correction
  • Brute Force Feature Matching
  • Mode Filter
  • blobFromImage
  • Laplacian Operator
  • Distance Transform

Library Infrastructure & Other Enhancements

  • All library functions support Alveo U50 platform
  • GUI support for both Edge and Data Center platforms
  • Color Conversion: Support for RGBX (a fourth channel)
  • Line Stride support in Data Converters
  • Removed xf_axi_sdata.hpp file. Axiconverter functions now use the HLS ap_axi_sdata.h file instead.

Ready-to-Evaluate Apps in New Xilinx App Store

The following FPGA-accelerated applications, developed using the Vitis Vision library, are now available on the new Xilinx App Store as containers for easy evaluation and deployment on Alveo accelerator cards, either on the Nimbix cloud or on-premises:

  • Image Classification using the ML-inference engine from the Vitis AI Library and a Vitis Vision pre-processing function
  • Image Sensor Processing (ISP) Pipeline
  • Stereo Block Matching
  • Text Processing APIs. Two major APIs are included: regular expression match and geo-IP lookup. The former can be used to extract content from unstructured data like logs, while the latter is often used when processing web logs to annotate them with geographic information by IP address. A demo tool that batch-converts Apache HTTP server logs into a JSON file is provided with the library.
  • DataFrame APIs for in-memory Data Abstraction: DataFrame is widely used for in-memory data abstraction in the data analytics domain. The DataFrame write and read APIs should enable data analytics kernel developers to store temporal data or interact with open-source software using Apache Arrow DataFrame more easily.
  • Tree Ensemble Method. Random forest is extended to include regression. Gradient boost tree, based on the boosting method, is added to support both classification and regression. Support for XGBoost on classification and regression is also included to exploit the second-order derivative of the loss function and regularization.
  • Single-Source Shortest Path API (singleSourceShortestPath): 2020.2 version now supports the Alveo U50 platform and provides a new output ‘pred32’ for the shortest path information.
  • Page Rank APIs: The 2020.2 version now supports the Alveo U50 platform and includes two APIs, both named ‘pageRankTop’: one that leverages a single memory channel and another that utilizes multi-bank memories.
  • Similarity APIs: 3 new APIs to cover different applications: ‘denseSimilarityKernel’ for dense graph applications, ‘sparseSimilarityKernel’ for sparse graph applications, and ‘generalSimilarityKernel’ for both types of applications with a single kernel.
  • The following APIs now support the Alveo U50 platform:
    • Breadth-first search API (bfs)
    • Degree calculation API (calcuDegree)
    • Connected component API (connectedComponents)
    • CSC-to-CSR format conversion API (convertCsrCsc)
    • Label propagation API (labelPropagation)
    • Strongly connected component API (stronglyConnectedComponents)
    • Triangle count API (triangleCount)
  • New L2 GEMM Kernel
  • For FP32 data types, the L3 GEMM performance has been improved from 280 GFLOPS to 340 GFLOPS
  • Introduced an FP32 L2 CSCMV kernel (sparse matrix-vector multiplication for matrices in CSC, Compressed Sparse Column, format) that uses 16 HBM channels on the Alveo U280 accelerator card.
  • The 2020.2 release brings major enhancements and updates to the General Query Engine (GQE) kernel design, and brand-new Level 3 APIs for JOIN and GROUP-BY AGGREGATE.
    • Columns as Input Buffers: The GQE kernels treat each column as an input buffer, simplifying the data preparation in the host code. Additionally, allocating multiple buffers on host side will reduce out-of-memory issues compared to big contiguous memory allocations, especially when the server is under heavy load.
    • Command Classes for Generating Configuration Bits: The L2 layer now provides command classes to generate the configuration bits for GQE kernels. Developers no longer have to dive into the bitmap table to understand which bit(s) to toggle to enable or disable a function in the GQE pipeline. As a result, the host code can be more maintainable and less error-prone.
    • New Level-3 APIs: New experimental L3 APIs for JOIN and GROUP-BY AGGREGATE are built to scale the problem size that GQE can handle. They can break the tables down into parts based on hash and call the GQE kernels over multiple rounds in a well-scheduled fashion. The execution strategy is separated from the execution itself, so database gurus can fine-tune the strategy based on table statistics without touching the OpenCL execution part.
  • LIBZ Library Acceleration using Alveo U50
    • Seamless acceleration of libz standard APIs: deflate, compress2 and uncompress
    • Ready-to-use libz.so library to accelerate any host code without any code change
    • xzlib standalone executable for both gzip/zlib compress & decompress
  • ZSTD Decompression: New implementation of the Facebook ZSTD algorithm available
  • Snappy Dual Core Kernel: New dual-core implementation of the Google Snappy decompression algorithm achieves a 2x throughput improvement for single-file decompression.
  • GZIP Compress Kernel: New GZIP Quad Core Compress Kernel implementation available (built-in LZ77, TreeGen, and Huffman encoder). More than 20% reduction in overall resources and 50% reduction in DDR bandwidth requirement.
  • GZIP Compress Streaming Kernel: Fully standard-compliant GZIP implementation (including header & footer) available as streaming, free-running kernels.
  • GZIP/ZLIB L3 Application on Alveo U50: GZIP/ZLIB application available as an L3 API, optimized for Alveo U50 (HBM) and Alveo U250 cards. A single FPGA binary (xclbin) supports both zlib and gzip formats for compress and uncompress.
  • Support for Alveo U50: Library functions (LZ4, Snappy, GZIP, ZLIB) ported to support the Alveo U50 platform.
  • Low Latency GZIP/ZLIB Decompress : Initial decompression latency reduced from 5K to 2.5K for 4KB/8KB/16KB block sizes
  • APIs revised to fully support Vitis HLS compiler
  • New Signature Generation and Verification Algorithms: DSA, ECC, ECDSA (secp256k1) and EdDSA (ed25519)
  • New Checksum Algorithms: Adler32 and CRC32.
  • Verifiable delay function (VDF) evaluation and verification: Pietrzak's VDF and Wesolowski's VDF.
  • Commercial Cryptography constituted by CAS: SM2, SM3 and SM4.
  • Stream Cipher: XChaCha20.
  • Optimization on RSA, GMAC, AES-GCM and SHA3 to improve their performance and resource utilization.
  • Argument parser (Beta): Parses the options and flags passed from command line and offers automatic help information generation enabling developers to create unified experience on test cases and user applications.
  • FIFO multiplexer: This module wraps around a FIFO (implemented through hls::stream in kernel code) to enable passing data of different types through the same hardware resource. When the data is too wide, it is automatically transferred over multiple cycles. This module is expected to make the dataflow code more compact and readable.
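
The idea behind the FIFO multiplexer, sending values of different widths through one fixed-width FIFO and splitting anything too wide across several transfers, can be modeled in a few lines. The following is a behavioral Python sketch only, not the library's C++ module; the function names, the least-significant-word-first ordering, and the use of a `deque` as the FIFO are all assumptions made for illustration.

```python
from collections import deque


def beats_needed(value_bits: int, fifo_bits: int) -> int:
    """Number of FIFO words needed to carry a value_bits-wide value."""
    return -(-value_bits // fifo_bits)  # ceiling division


def push(fifo: deque, value: int, value_bits: int, fifo_bits: int) -> int:
    """Split value into fifo_bits-wide words, least significant word first."""
    mask = (1 << fifo_bits) - 1
    n = beats_needed(value_bits, fifo_bits)
    for i in range(n):
        fifo.append((value >> (i * fifo_bits)) & mask)
    return n  # one "cycle" per word in this model


def pop(fifo: deque, value_bits: int, fifo_bits: int) -> int:
    """Reassemble a value_bits-wide value from consecutive FIFO words."""
    value = 0
    for i in range(beats_needed(value_bits, fifo_bits)):
        value |= fifo.popleft() << (i * fifo_bits)
    return value & ((1 << value_bits) - 1)
```

In this model, pushing an 8-bit value through a 32-bit FIFO takes one word, while a 48-bit value takes two; popping with the same widths in the same order returns the original values.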

ADF: Adaptive Data Flow

  • Compiler:
    • Event tracing on PLIO or GMIO
    • Event tracing also on Hardware
    • Heat map generation: percent utilization of all AI Engines
    • Supports different PL frequencies for PL kernels and PLIOs
  • Vitis IDE for AI Engines
    • Pipeline view
    • Vector register view
    • Internal memory views: East, West, North, South
    • External memory
  • Vitis HLS replaces Vivado HLS in Vivado (it was already default for Vitis and C based kernel compilation in 2020.1)
    • Adds array reshape and partitioning pragmas for top function ports
  • The tool is now installed in its own directory ./Vitis_HLS/2020.2 alongside Vitis and Vivado
  • HLS design migration information has been updated in UG1391
  • The Vitis HLS user guide is UG1399; the full content is also available in HTML
  • Updated design examples are on GitHub; they can also be loaded automatically from the Vitis HLS GUI (from the "Git Repositories" sub-window) for direct access
  • Support for SIMD programming
  • Support for on-chip block RAM ECC flags via the bind_storage pragma (Vivado flow only) to monitor error correction logic generated by the RAM blocks
  • GUI has a simplified toolbar icon layout, new reporting sections for interfaces and AXI4 including bursts
  • Non-default options can be filtered for quick review under "Solution Settings"→"General" by ticking "Show only non-defaults"
  • Users can create and open a project in the GUI directly from Tcl by using the -p option and passing the Tcl file as an argument: vitis_hls -p <file>.tcl
  • Interactive FIFO depth sizing in GUI
  • Constrained random testing for AXI interfaces now visible in the GUI

Versal Only Features

  • Vitis HLS now infers the dedicated single-clock-cycle floating-point accumulation (adder or multiplier) of the DSP58 block to implement efficient, high-throughput accumulation
  • Timing libraries updated for Versal production target devices
  • Improved RTL-Kernel Integration:  Enhancements for packaging & integrating RTL IPs as kernels within Vitis applications, including support for user-managed RTL kernels (not controlled by XRT APIs) and improvements to IP Packager within Vivado to support this flow.
  • Multiple Implementation Strategies for Timing Closure: Vitis compiler & linker (v++) now supports launching & running multiple Vivado implementation strategies at the same time during hardware builds. This enables users to explore & assess all results and select the best strategy for final FPGA binary (xclbin) creation.
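
Once several implementation strategies have finished, choosing among them is a comparison of their timing results. The following is a minimal Python sketch of one possible selection rule, picking the run with the largest worst negative slack (WNS). The rule itself, the function names, and the strategy names and slack values in the usage note are illustrative assumptions; they are not produced by or taken from v++.

```python
def pick_best_strategy(wns_by_strategy: dict) -> str:
    """Return the strategy whose worst negative slack (WNS, in ns) is largest."""
    if not wns_by_strategy:
        raise ValueError("no strategy results to compare")
    return max(wns_by_strategy, key=wns_by_strategy.get)


def meets_timing(wns_ns: float) -> bool:
    """A run meets timing when its worst slack is not negative."""
    return wns_ns >= 0.0
```

With placeholder results such as `{"strategy_a": -0.120, "strategy_b": 0.035, "strategy_c": -0.004}`, the sketch selects `strategy_b`, which is also the only run that meets timing.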

Versal Only Features

  • In 2020.2, as long as the hardware design stays the same, aiecompiler only recompiles and updates the software when the AIE program is modified. The v++ linking stage is not re-run; the flow goes directly to the package step. This allows users to iterate quickly on the AIE program after the hardware has been fixed.
  • System Level template will be provided which includes AIE, PL and PS design files.
  • AIE tools features integrated into Vitis IDE, such as displaying pipeline information, storage view, parallel compilation etc.
  • Version Control for Vitis Projects: Integration with Git version control for Vitis Projects enables collaboration across multiple developers and teams.
  • Improvements to Project Hierarchy: Acceleration kernel and host applications are now separate projects under top-level System Project enabling a user to compile the host application and hardware kernels separately.
  • Improvements to Board Support Package (BSP) Build Times: For platform projects with standalone domains, the Board Support Package (BSP) drivers compile in parallel to speed up application build time.
  • Ease-of-Use for Host Application Debug: Processing System registers can now be exported as a file from the Vitis GUI for debug.
  • Profiling System Projects: Top-level System Projects now offer more control over specifying profiling features via the Vitis GUI for the Vitis application acceleration flows.
  • Improved Support for Platform Creation with Hardware Emulation: In addition to the Block Diagram as the top level, hardware emulation mode now also supports RTL sources in the platform as the top module, or RTL referenced inside the block diagram without packaging. You can add an RTL testbench as in Vivado. This offers more flexibility for validating designs before deployment.
  • Save Signals during Emulation for Debug: Save signals to Xilinx Simulator (XSIM) waveform file during emulation. User can pass -wcfg-file-path to launch_hw_emu.sh when rerunning hardware emulation.
  • Emulation Support for Slave Bridge Feature (Alveo Platforms): Please refer to the Alveo Platform Documentation for more details on Slave Bridge features.
  • Python/C++ APIs for Emulating AXI Stream IOs: Mimic data streaming through IO ports on the platform using simple Python or C++ APIs while emulating AXI Stream kernels. This enables you to emulate and debug the complete system with programmed traffic patterns much earlier in the design cycle.
  • Questa Simulator Support for U250 Alveo Platform: In addition to the Xilinx Simulator (XSIM), hardware emulation in Vitis for U250 Alveo platforms now also supports Questa. Setup is done via v++ configuration files or the Vitis IDE.
  • HLS kernel deadlock detection: Deadlock or livelock code in HLS kernel can be detected during hardware emulation by compiling HLS kernel with v++ config param=compiler.deadlockDetection=true
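
To make concrete what a dataflow deadlock looks like, the following Python toy model runs a producer and a consumer connected by two bounded FIFOs and reports a deadlock when neither can make progress. It is only a conceptual sketch: it does not reflect how v++ or the emulator detects deadlocks, and the `Fifo` class, the `run` scheduler, and the access pattern in `example` are all hypothetical.

```python
from collections import deque


class Fifo:
    """Bounded FIFO; writes block when full, reads block when empty."""

    def __init__(self, depth: int):
        self.depth = depth
        self.items = deque()

    def can_write(self) -> bool:
        return len(self.items) < self.depth

    def can_read(self) -> bool:
        return len(self.items) > 0


def run(processes, max_rounds: int = 1000) -> str:
    """Each process is a list of ("w", fifo) / ("r", fifo) operations.

    Every round, each unfinished process attempts its next operation.
    Returns "done", "deadlock" (a round with no progress), or "timeout".
    """
    pcs = [0] * len(processes)
    for _ in range(max_rounds):
        progressed = False
        for i, ops in enumerate(processes):
            if pcs[i] >= len(ops):
                continue  # this process has finished
            kind, fifo = ops[pcs[i]]
            if kind == "w" and fifo.can_write():
                fifo.items.append(0)
                pcs[i] += 1
                progressed = True
            elif kind == "r" and fifo.can_read():
                fifo.items.popleft()
                pcs[i] += 1
                progressed = True
        if all(pc >= len(ops) for pc, ops in zip(pcs, processes)):
            return "done"
        if not progressed:
            return "deadlock"
    return "timeout"


def example(depth_a: int) -> str:
    """Producer fills FIFO a before b; consumer wants b first."""
    a, b = Fifo(depth_a), Fifo(1)
    producer = [("w", a)] * 4 + [("w", b)] * 4
    consumer = [("r", b), ("r", a)] * 4
    return run([producer, consumer])
```

Here `example(2)` returns `"deadlock"`, because the producer stalls on a full FIFO while the consumer waits on an empty one, and `example(4)` returns `"done"`. This also illustrates why FIFO depth sizing matters for dataflow designs.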

Versal Only Features

  • 3rd Party Simulator Support (Questa, Xcelium, VCS): In addition to the Xilinx Simulator (XSIM), hardware emulation in Vitis for Versal embedded platforms now also supports 3rd party simulators such as Questa and Xcelium on Linux. VCS support is at the Early Access stage. Setup is done via v++ configuration files or the Vitis IDE.
  • Vitis AI Profiler Data Integration: For applications that use the Deep Learning Processing Unit (DPU) for AI inference, you can access Vitis AI profiler information including DPU throughput, DDR read/write rates and timeline trace information within Vitis Analyzer to assess end-to-end application acceleration.
  • View Package Summary Report: View the Package Summary Report within Vitis Analyzer for an overall view of the application’s status from a performance and optimization perspective. The package summary is created by the v++ command after linking, when it builds a package that can be run in software or hardware emulation, or booted and run on the hardware device.
  • Integrated Host & Kernel Profiling: Vitis 2020.2 adds the capability to provide user event API profiling. Beyond the profiling capabilities inherently available for accelerated kernels, you can call Xilinx Runtime Library (XRT) APIs in your host code to profile arbitrary sections of the design and make decisions on overall application performance optimization.
  • Other Enhancements: Global Search across all reports accessible within Vitis Analyzer, flexibility to save/restore custom user layouts for viewing performance reports, Intuitive grouping of guidance messages to view related information in one place, Improvements to utilization reports enabling visibility into statistics on a per Super Logic Region (SLR) basis for deeper insight.

Versal Only Features

  • The profile summary report will have a specific AIE design entry. More AIE-related data will be shown in the compile/run summary reports, such as the AIE heatmap, which displays the active/stall cycles of kernels running on hardware.
  • Improved Visibility for Debug: An AXI-S transaction-level view is available in the Xilinx Simulator (XSIM) Transaction Viewer for SystemC portions of hardware emulation designs, providing better visibility into the design at a transaction level for debug.
  • View FIFO Status in Live Waveform Viewer: The status of user-level FIFOs (denoted as hls::stream in kernel code) can be viewed in the Live Waveform Viewer during hardware emulation, providing visibility into static FIFO depths, FIFO elements, and FIFO usage to identify performance bottlenecks for acceleration kernels

Versal Only Features

  • Event trace enhancements: Vitis 2020.2 incorporates a couple of enhancements to AIE event trace features, such as support for offloading by XRT, enhanced support for multiple trace streams, and the ability to monitor the PL/AIE boundary even when a PL kernel is defined in the graph. In addition, the PL, PS, and AIE event traces are combined into a common timeline to provide better visualization of the whole design.

Note: Xilinx Runtime Library (XRT) is available as a separate download. Please refer to the Getting Started information for download and install instructions.

  • Improved Support for HBM-enabled Platforms: Leverage the benefits of high-bandwidth memory (HBM) enabled platforms by specifying kernel port connections to HBM banks through v++ --sp HBM[#:#]. Xilinx Runtime Library (XRT) APIs can also automatically assign the HBM banks and enable the host application to allocate arbitrarily sized buffers of one or more HBM segments (256MB+), on HBM segment bounds.
  • Next Generation Xilinx Board Management Utilities (Preview): Next generation Xilinx Board Management utilities (xbutil, xbmgmt) are available for preview. They can enable the Slave Bridge and DDR retention features for Xilinx platforms that support them. Note: Current generation of board management utilities will be moved to maintenance mode in 2021.1 & new features will only be added to next generation utilities.

Versal Only Features

  • AIE support is added for RTP, error handling, full array reconfiguration, and the graph API.

Access the latest Vitis Target Platforms for Alveo Accelerator cards from the Alveo Packages Download Tab.

Please refer to UG1120 - Alveo Data Center Accelerator Card Platforms User Guide for more details and to keep up to date on the latest Vitis Target Platform releases as they become available.

U200/U250 XDMA Platforms

  • Alveo Platform U200 XDMA 2RP - Production
    • Features: ERT, CMC, PLRAM, DRM capable floorplan, XDMA, 2RP, P2P, M2M, GT Kernel, PCIe Slave Bridge, DDR Self-Refresh
  • Alveo Platform U250 XDMA 2RP - Production
    • Features: ERT, CMC, PLRAM, DRM capable floorplan, XDMA, 2RP, P2P, M2M, GT Kernel, PCIe Slave Bridge, DDR Self-Refresh

Shell Upgrade DFX - 2RP (2 Reconfiguration Partitions)

  • Small size of static region: Base 
    • PCIe functionality
    • In-band FPGA partial reconfiguration 
  • New reconfiguration partition: Shell
    • Update DMA and utility functions
    • Dynamic swapping between platforms without rebooting the server
  • 2nd reconfiguration partition: User Logic
    • Accelerator kernel functions

AXI Slave Bridge

  • Direct host memory access by the kernel 
  • DMA bypass capability with a 512-bit AXI slave interface; users can provide their own data mover

Data Retention - DDR4 self-refresh

  • Data context retained in FPGA memory using DDR4 self-refresh during reconfiguration
  • Eliminates copying to host RAM as a temporary storage for different XCLBINs
  • Minimizes movement of large data sets

Note: Vitis Target Platforms for Embedded Platforms (including pre-built Linux kernels, root file system and sysroot) are available as a separate download on the Vitis Embedded Platforms Tab.

  • ZYNQ-7000 and ZYNQ UltraScale+ MPSoC base platform functions are kept the same, but the platform source code has been restructured. Directories are renamed for easier understanding, and source files common to multiple platforms are grouped together. This makes it easier to reuse the platform source code and port it to a new platform.
  • When building a platform from source code, in addition to compiling PetaLinux from scratch, a new end-to-end build method is available for users who have downloaded the common software components. Users can point to those components and skip the PetaLinux compile when building a platform.

The VCK190 platform has a flexible DDR + LPDDR memory subsystem and supports 63 interrupts for acceleration kernels. It is available for use with the Vitis core development kit, for both application acceleration and embedded processor software development, as described in the Versal AI Engine Programmers Guide (UG1076). The platform enables development of designs that include:

  • AI Engine graphs and kernels
  • Programmable Logic kernels
  • Host application targeting Linux or a bare-metal OS running on the Arm processor in the Versal device.
  • Please refer to Getting Started with Vitis and Versal ACAP platforms to learn more.
  • Support for Kubernetes (K8s) clusters: Xilinx FPGA Resource Manager (XRM) can now be used together with Kubernetes to run and manage compute units (CUs) across a pool of multiple Alveo accelerator cards attached to a server, and to scale applications to multiple servers with Alveo cards.
2020.1