What's New in Vitis™

2021.1

Vitis Software Platform 2021.1 Release Highlights:

  • Support for the Xilinx Kria System-on-Modules (SOMs) KV260 Vision AI Starter Kit, including the full Vitis flow for ML (DPU inference engine) + X (RTL kernels and Vitis HLS-based computer vision kernels).
  • Support for new C/C++ Vision, DSP, Graph (Louvain Modularity), Codec in image processing, compression (GZIP, Facebook ZSTD, ZLIB whole application acceleration) performance-optimized libraries on FPGA and/or Versal ACAP over CPU/GPUs
  • Enhanced Vitis™  core development kit design flow on Versal ACAP devices: visualization improvements for AI engine design trace report, AI engine event tracing via GMIO, incremental recompile, new boot image wizard, and encrypted AI engine source file support
  • The new Vitis Model Composer tool enables rapid design exploration and verification within the MathWorks MATLAB and Simulink® environment, enabling co-simulation of blocks targeting AI Engines and Programmable Logic, code generation, and test bench creation.
  • New Vitis HLS Flow Navigator GUI for quick access to flow phases and reports. Merge synthesis, analysis, and debug views into a general default context

Vitis What's New by Category

Expand the sections below to learn more about the new features and enhancements in Vitis 2021.1. For information on Supported Platforms, Changed Behavior & Known Issues, please refer to Vitis 2021.1 Release Notes for Application Acceleration Flow and Embedded Software Development Flow.

Note: Vitis Accelerated Libraries are available as a separate download. They can be downloaded from GitHub or directly from within the Vitis IDE as well.

  • AIE DSP
    • DSPLib published as part of the Vitis Acceleration Library set on GitHub
    • DSPLib contains common parameterizable DSP functions used in many advanced signal processing applications. All functions currently support window interfaces with streaming interface support.
      • FIR Filters

        Function                                 Namespace
        Single rate, asymmetrical                dsplib::fir::sr_asym::fir_sr_asym_graph
        Single rate, symmetrical                 dsplib::fir::sr_sym::fir_sr_sym_graph
        Interpolation, asymmetrical              dsplib::fir::interpolate_asym::fir_interpolate_asym_graph
        Decimation, halfband                     dsplib::fir::decimate_hb::fir_decimate_hb_graph
        Interpolation, halfband                  dsplib::fir::interpolate_hb::fir_interpolate_hb_graph
        Decimation, asymmetric                   dsplib::fir::decimate_asym::fir_decimate_asym_graph
        Interpolation, fractional, asymmetric    dsplib::fir::interpolate_fract_asym::fir_interpolate_fract_asym_graph
        Decimation, symmetric                    dsplib::fir::decimate_sym::fir_decimate_sym_graph
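
As a point of reference for the filter families listed above, the following is a minimal plain-Python sketch of single-rate, decimating, and interpolating FIR filtering. It is not DSPLib or AI Engine graph code; the function names (`fir_single_rate`, `fir_decimate`, `fir_interpolate`) are hypothetical and only illustrate the arithmetic each filter type performs.

```python
from typing import List, Sequence


def fir_single_rate(samples: Sequence[float], taps: Sequence[float]) -> List[float]:
    """Direct-form FIR: y[n] = sum_k taps[k] * x[n - k], with x[n] = 0 for n < 0."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * samples[n - k]
        out.append(acc)
    return out


def fir_decimate(samples: Sequence[float], taps: Sequence[float], factor: int) -> List[float]:
    """Filter at the input rate, then keep every `factor`-th output sample."""
    return fir_single_rate(samples, taps)[::factor]


def fir_interpolate(samples: Sequence[float], taps: Sequence[float], factor: int) -> List[float]:
    """Insert `factor - 1` zeros after each input sample, then filter."""
    stuffed: List[float] = []
    for s in samples:
        stuffed.append(s)
        stuffed.extend([0.0] * (factor - 1))
    return fir_single_rate(stuffed, taps)
```

Feeding a unit impulse through `fir_single_rate` returns the taps themselves, which is a quick sanity check for any coefficient set. A production implementation would skip the multiplications by zero in the interpolator and the discarded outputs in the decimator.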

         

      • FFT/iFFT - The DSPLib contains one FFT/iFFT solution. This is a single-channel, single-kernel decimation-in-time (DIT) implementation with configurable point size, complex data types, cascade length, and FFT/iFFT function.

        Function                                 Namespace
        Single Channel FFT/iFFT                  dsplib::fft::fft_ifft_dit_1ch_graph
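
To illustrate what "decimation in time" with a selectable FFT/iFFT direction means, here is a minimal recursive radix-2 sketch in plain Python. It is a textbook formulation, not the DSPLib kernel; the function name `fft_dit` and its `inverse` flag are assumptions made for this example, and the point size is assumed to be a power of two.

```python
import cmath
from typing import List, Sequence


def _fft_recursive(x: List[complex], inverse: bool) -> List[complex]:
    n = len(x)
    if n == 1:
        return x
    # Decimation in time: split into even- and odd-indexed samples.
    even = _fft_recursive(x[0::2], inverse)
    odd = _fft_recursive(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: combine the two half-size transforms with a twiddle factor.
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def fft_dit(x: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Radix-2 DIT FFT; with inverse=True computes the iFFT (scaled by 1/N)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("point size must be a power of two")
    result = _fft_recursive([complex(v) for v in x], inverse)
    if inverse:
        result = [v / n for v in result]
    return result
```

Running the forward transform followed by the inverse should reproduce the input to within floating-point rounding.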

      • Matrix Multiply (GeMM) - The DSPLib contains one Matrix Multiply/GEMM (GEneral Matrix Multiply) solution. This supports the Matrix Multiplication of 2 Matrices A and B with configurable input data types resulting in a derived output data type.

        Function                                 Namespace
        Matrix Mult / GeMM                       dsplib::blas::matrix_mult::matrix_mult_graph
      • Widget Utilities - These widgets support converting between windows and streams on the input to a DSPLib function and between streams and windows on the output of a DSPLib function where desired. An additional widget converts between real and complex data types.

        Function                                 Namespace
        Stream to Window / Window to Stream      dsplib::widget::api_cast::widget_api_cast_graph
        Real to Complex / Complex to Real        dsplib::widget::real2complex::widget_real2complex_graph
      • DSP Library functions are supported in Vitis Model Composer, enabling users to easily plug these functions into the MATLAB/Simulink environment to ease AI Engine DSP Library evaluation and overall AI Engine ADF graph development.
  • Vitis HPC Library release introduces HLS primitives, prebuilt kernels and software APIs for HPC applications on FPGAs. These applications are:

    • 2D Acoustic RTM (Reverse Time Migration) FDTD (Finite Difference Time Domain) algorithm, including forward kernel and backward kernel

    • 3D Acoustic RTM (Reverse Time Migration) FDTD (Finite Difference Time Domain) algorithm, including forward kernel

    • MLP (Multi-Layer Perceptron) components: activation functions and fully connected network kernels

    • PCG (Preconditioned Conjugate Gradient) Solvers for both dense matrix and sparse matrix

  • First release of selected vision functions for Versal AI Engines: 
  • Functions available 

    • Filter2D

    • absdiff

    • accumulate

    • accumulate_weighted

    • addweighted

    • blobFromImage

    • colorconversion

    • convertscaleabs

    • erode

    • gaincontrol

    • gaussian

    • laplacian

    • pixelwise_mul

    • threshold

    • zero

  • xfcvDataMovers : Utility datamovers to facilitate easy tiling of high resolution images and transfer to local memory of AI Engines cores. Two flavors

    • Using PL kernel : higher throughput at the expense of additional PL resources. 
    • Using GMIO : lower throughput than PL kernel version but uses Versal NOC (Network on chip) and no PL resources. 
  • New Programmable Logic (PL) functions and features 
  • ISP pipeline and functions:
    • Updated 2020.2 Non-HDR Pipeline 
      • Support for changing several ISP parameters at runtime: gain parameters for the red and blue channels, AWB enable/disable option, gamma tables for R, G, and B, and the percentage of pixels used to compute the min and max for AWB normalization.
      • Gamma Correction and Color Space conversion (RGB2YUYV) made part of the pipeline.
    • New 2021.1 HDR Pipeline : 2020.2 Pipeline + HDR support
      • HDR merge for 2 exposures which supports sensors with digital overlap between short exposure frame and long exposure frame. 
        • Four Bayer patterns supported: RGGB, BGGR, GRBG, GBRG
      • HDR merge + isp pipeline with runtime configurations, which returns RGB output.
      • Extraction function: the HDR extraction function is a preprocessing function that takes a single digital-overlapped stream as input and returns the two exposure frames (SEF, LEF) as output.
    • 3DLUT : provides input-output mapping to control complex color operators, such as hue, saturation, and luminance.
    • CLAHE: Contrast Limited Adaptive Histogram Equalization limits the contrast while performing adaptive histogram equalization so that it does not over-amplify the contrast in near-constant regions. It also reduces the problem of noise amplification.
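
The contrast-limiting step described for CLAHE can be sketched for a single tile as follows. This is a simplified plain-Python illustration, not the Vitis Vision implementation: the function name `clipped_equalization_lut` and its parameters are hypothetical, the excess is redistributed in one uniform pass, and the bilinear interpolation between neighbouring tiles that a full CLAHE performs is omitted.

```python
from typing import List, Sequence


def clipped_equalization_lut(tile: Sequence[int], clip_limit: int, levels: int = 256) -> List[int]:
    """Build a pixel look-up table for one tile using a clipped histogram."""
    if not tile:
        raise ValueError("tile must contain at least one pixel")
    hist = [0] * levels
    for p in tile:
        hist[p] += 1

    # Clip each bin at clip_limit and collect the excess counts.
    excess = 0
    for i in range(levels):
        if hist[i] > clip_limit:
            excess += hist[i] - clip_limit
            hist[i] = clip_limit

    # Redistribute the excess uniformly; any remainder goes to the first bins.
    share, rem = divmod(excess, levels)
    for i in range(levels):
        hist[i] += share + (1 if i < rem else 0)

    # Map each level through the cumulative distribution of the clipped histogram.
    lut = []
    running = 0
    for i in range(levels):
        running += hist[i]
        lut.append((running * (levels - 1)) // len(tile))
    return lut
```

With a large `clip_limit` the function degenerates to ordinary histogram equalization; lowering the limit flattens the mapping in regions dominated by a single intensity, which is the effect the text attributes to CLAHE.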
  • Flip : Flips the image along horizontal and vertical line.
  • Custom CCA: Custom version of the Connected Component Analysis algorithm for defect detection in fruits. Apart from computing the defective portion of the fruit, it also computes the number of defective pixels and the total number of fruit pixels.
  • Canny updates : Canny function now supports any image resolution.

Library Related Changes

  • All tests have been upgraded from using OpenCV 3.4.2 to OpenCV 4.4
  • Added support for Versal Edge series (VCK190) 
  • A new benchmarking section with benchmarking collateral for selected pipeline/functions published.
  • The 2021.1 release provides Two-Gram text analytics:

    • Two Gram Predicate (TGP) is a search of the inverted index with a term of 2 characters. For a dataset with an established inverted index, it finds the matching IDs in each record of the inverted index.
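
The idea of searching an inverted index keyed by 2-character terms can be sketched in plain Python as below. This is an illustration of the concept only, not the library's API; `build_two_gram_index` and `two_gram_search` are hypothetical names, and the search returns candidate record IDs that contain every two-gram of the query.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set


def build_two_gram_index(records: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map every 2-character term to the set of record IDs containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for rec_id, text in records.items():
        for i in range(len(text) - 1):
            index[text[i:i + 2]].add(rec_id)
    return index


def two_gram_search(index: Dict[str, Set[int]], query: str) -> List[int]:
    """Return IDs of records that contain all two-grams of the query."""
    grams = {query[i:i + 2] for i in range(len(query) - 1)}
    if not grams:
        return []
    ids: Optional[Set[int]] = None
    for gram in grams:
        posting = index.get(gram, set())
        ids = set(posting) if ids is None else ids & posting
    return sorted(ids)
```

Because only two-gram membership is checked, the result is a candidate set; a caller that needs exact substring matches would verify each candidate against the original record.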

  • Community Detection: Louvain Modularity
  • 2-Hop Search
  • Adds double-precision SpMV (Sparse Matrix dense Vector multiplication) implementation with L2 kernels
  • In the 2021.1 release, GQE receives early-access support for the following features:

    • 64-bit join support: the gqeJoin kernel and its companion gqePart kernel have been extended to 64-bit keys and payloads, so that larger-scale data can be supported.

    • Initial Bloom-filter support: the gqeJoin kernel now ships with a mode in which it executes Bloom-filter probing. This improves efficiency on certain multi-node flows where minimizing data size in the early stage is important.

    • Both features are currently offered as pure-software L3 APIs; please check the corresponding L3 test cases.
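
To show why Bloom-filter probing reduces data size early in a flow, here is a minimal plain-Python sketch. It is not the gqeJoin kernel or the L3 API; the `BloomFilter` class, `bloom_prefilter` helper, filter size, and hash construction are all assumptions chosen for illustration.

```python
import hashlib
from typing import Iterable, Iterator, List, Tuple


class BloomFilter:
    """Fixed-size bit array probed with several hash functions."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: int) -> Iterator[int]:
        # Derive independent bit positions by salting the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, key: int) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: int) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def bloom_prefilter(build_keys: Iterable[int], probe_rows: Iterable[Tuple[int, str]]) -> List[Tuple[int, str]]:
    """Drop probe rows whose key cannot match any build-side key."""
    bf = BloomFilter()
    for key in build_keys:
        bf.add(key)
    return [row for row in probe_rows if bf.might_contain(row[0])]
```

A Bloom filter never produces false negatives, so every row that would join survives the prefilter; a small fraction of non-matching rows may also survive and is removed by the actual join.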

  • GZIP Multi Core Compression:
    • New GZIP Multi-Core Compress Streaming Accelerator, a purely stream-only solution (free-running kernel). It comes in several variants supporting block sizes of 4KB, 8KB, 16KB, and 32KB.
  • Facebook ZSTD Compression Core:
    • New Facebook ZSTD Single Core Compression accelerator with block size 32KB. Multi-core ZSTD compression is in progress (for higher throughput).
  • GZIP low latency Decompression:
    • A new version of GZIP decompress with improved latency for each block, fewer resources (35% lower LUT, 83% lower BRAM), and improved FMax.
  • ZLIB Whole Application Acceleration using U50: 
    • L3 GZIP solution for the U50 platform, containing 6 compression cores to saturate the full PCIe bandwidth. It is provided with an efficient GZIP software solution to accelerate the CPU libz.so library, which provides seamless inflate and deflate API-level integration into end-customer software without recompiling.
  • Versal platform support.
 
  • Add AIE Support - See above
  • The 2021.1 release provides support for:
    • RIPEMD160
    • Initial support for BLS (not complete)
  • In the 2021.1 release, Data-Mover is added to this library. Unlike the other C++-based APIs, this addition targets people who are less experienced in HLS-based kernel design and just want to test their stream-based designs. The Data-Mover is actually a kernel source code generator that creates a list of common helper kernels to drive or validate designs, such as those on AIE devices.
  • Produce QoR metrics (Vitis QoR Generation API)
    • Cycles taken by the application kernel
    • Stall cycles (computed from VCD file)
    • Measure overhead cycles in the wrapper (time spent in other functions than the kernel itself)
    • Throughput
  • 3 levels of optimization XLOPT=0, 1 (default), 2
  • New functionalities for xlopt=2:
    • loop fusion, flatten single iteration outer loops, enhance loop peeling heuristics
  • Analyze "__restrict" usage and give guidance
  • Incremental recompile: when the graph does not change, recompile only kernels that've been modified
  • Packet Switched data → up to 32-split (was limited to 4)
  • New DMA FIFO location constraint (mapper/router changes between releases do not impact performance)
  • Use mapping solution as a constraint in the new compilation: prevent future mapping variations that impact performance
  • Bring x86sim feature support to aiesim level
  • Start of deprecation of PL kernels in ADF graphs (complete deprecation in 2021.2)
  • New “Flow Navigator” in GUI for quick access to flow phases and reports.  The contextual "synthesis, analysis, debug" views are merged into a general default context
  • New synthesis report section for the BIND_OP and BIND_STORAGE directives
  • A new post-synthesis text report reflects the information provided in the GUI synthesis report
  • The IP export and Vivado implementation run widgets have been redesigned with options to pass settings and constraint files to Vivado
  • New function call graph viewer to visualize functions and loops which can be highlighted with an optional heatmap to detect II, latency, or DSP/BRAM utilization hot spots
  • Versal timing calibration and new controls for DSP block native floating-point operations (the -precision option for config_op)
  • The Vitis HLS Migration guide (former UG1391) is now a chapter in UG1399
  • New methodology sections in user guide (UG1399 and web)
  • Alternate flushable pipeline option has been improved (free-running pipeline aka "frp")
  • In Vitis, a top port pointer can now simply be mapped onto the axi-lite adapter rather than a global memory
  • The aggregate directive now provides a "-compact bit" option for maximum packing
  • Adds back a "Leave Feedback" entry in Help menu with optional survey
  • Fixed bug for "Man Pages" tab not displaying information on some Linux systems
  • In Vitis, reshaping m_axi interfaces should be done via the hls::vector types
  • New customization options for s_axilite and m_axi data storage, which can be "auto", "uram", "bram" or "lutram", allowing you to tweak RAM utilization in your design
  • In Vitis, introducing a new continuously (aka "never-ending") running mode for kernel
  • The axi_lite secondary clock option has been re-instated
  • Enhance support for RTL kernel packaging in Vivado IP packager
    • public and productized feature with proper methodology and documentation.
    • XRT managed kernel is the default flow.

  • Support encrypted AIE source files as input

    • AIE compiler can accept encrypted AIE source file and v++ supports the rest of the flow.

  • Add Create Boot Image Wizard support for Versal devices
  • Multiple improvements for AI Engine programming and debugging
    • Being able to turn on and off micro code labels
    • Static Cross-probing between the source code and the microcode
    • Full view of the microcode
    • Bringing the last PC in the visible area whenever Pipeline view updates the data
    • Aligning the instruction data in Pipeline view
    • Adding "Single Instruction Mode" action to disassembly view.
  • Be able to generate a default BIF file for a platform project
  • Program Flash for SD and eMMC adds raw mode support
  • In-context help messages are added to AI Engine development flow
  • Upgraded GCC toolchain version to 10.2
  • Users can emulate AXI-MM master/slave through an external process such as Python / C++. This may help users to emulate design with quick design time of AXI Master / Slave, without investing resources in developing AXI Master or VIP. AXI-MM Inter-process communication can also help to emulate the Chip-to-Chip connection between two FPGAs.
  • Enabling compilation of Versal models for VCS.
  • Platform developers can run hardware emulation on the platform with standalone applications to test the platform in the early stage.
  • User range profiling information and user event information are aggregated into profile summary report
  • Vitis Analyzer shows a critical timing path.

    • Vitis Analyzer will display a simplified version of the Vivado GUI timing report, without the need to open a Vivado project or netlist. This allows users to quickly navigate to the failing timing path.

  • Vitis Analyzer multiple strategies support

    • Results from multiple strategies run can be visualized in Vitis Analyzer.

  • New xrt.ini switches for profiling and debug
  • Reduce memory and loading time for large applications

    • The new profile tool uses fewer resources when processing large CSV files, which reduces loading time and the occurrence of crashes.

  • PL continuous trace offloading improvement

    • Use DDR or HBM as memory resource to store trace data

    • Circular buffer support for large data offloading

    • Trace buffer size and offloading interval can be set in xrt.ini

  • Improvements to the visualization of AIE design’s trace report

    • All AIE inputs will be displayed (window, stream, cascaded stream, etc.)

    • Support all IO data types

  • Stable native XRT API, with C++ APIs for AIE graph control and execution, Software Emulation and tracing support.
  • XRT provides new helper APIs to help users to move from OpenCL API to XRT native API in $XILINX_XRT/include/CL/cl2xrt.hpp.
  • New XRT API xrt::device.get_info() can extract device properties
  • Greatly improved next generation xbutil and xbmgmt utilities are now the default.
  • xbutil can report power status
  • xbmgmt can support runtime clk scale and setup user power threshold to protect board and server.
  • sysfs, xbmgmt and xbutil can report MAC address of Alveo board
  • KDS scheduler in xocl has been refactored to significantly improve the throughput across hundreds of processes exercising multiple compute units across multiple devices concurrently. For legacy shells you may notice a small percentage of throughput degradation. Please see the AR for the proper solution.
  • XRT driver debug trace support through debugfs /sys/kernel/debug/xclmgmt/ and /sys/kernel/debug/xocl/

Access the latest Vitis Target Platforms for Alveo Accelerator cards at www.xilinx.com/alveo. Please refer to the Getting Started section of the accelerator card you want to deploy your applications on.

Please refer to UG1120 - Alveo Data Center Accelerator Card Platforms User Guide for more details and to keep up-to-date on the latest Vitis Target Platform releases, as they become available.

New Platforms 

  • Alveo U200 Gen3x16 XDMA 1RP
    • Name: xilinx_u200_gen3x16_xdma_1_202110_1
    • Features: Slave Bridge, P2P, GT Kernel, DDR Self-Refresh
  • Alveo U50 Gen3x16 noDMA 1RP 
    • Name: xilinx_u50_gen3x16_nodma_1_202110_1
    • Features: Slave Bridge, P2P, GT Kernel, Clock Throttling
  • VCK190 Base Platform enables ECC on DDR and LPDDR; constraints become concise.
  • MPSoC base platforms increased CMA size to 1536M. All Vitis-AI models can run with this CMA size.
  • Embedded platform creation flow gets simplified: Device Tree Generator can automatically generate a ZOCL node; XSCT can generate BIF files. Base platform source files are reduced.
  • Support for Kubernetes (K8s) clusters: Xilinx FPGA Resource Manager (XRM) can now be used together with Kubernetes to run and manage compute units (CUs) across a pool of multiple Alveo accelerator cards attached to a server and scale applications to multiple servers with Alveo cards.
  • A comprehensive constraint editor enables users to specify any constraint for AI Engine kernels in Vitis Model Composer. The generated ADF graph will contain these constraints. 
  • Addition of AI Engine FFT and IFFT blocks to the library browser. 
  • Users now have access to many variations of AI Engine FIR blocks in the library browser. 
  • Ability to specify filter coefficients using input ports for FIR filters. 
  • Addition of two new utility blocks "RTP Source" and "To Variable Size".
  • Enhanced AIE Kernel import block now also supports importing templatized AI Engine functions. 
  • Ability to specify Xilinx platforms for AI Engine designs in the Hub block.
  • Through the Hub block, users can relaunch Vitis Analyzer at any time after running AIE Simulation. 
  • Users can now plot cycle approximate outputs and see estimated throughput for each output using Simulink Data Inspector. 
  • Enhanced usability to import a graph as a block using only the graph header file. 
  • Revamping of the progress bar with cancel button
  • Usability improvement during importing an AI Engine kernel or simulation of a design when MATLAB working directory and model directory are not the same. 
  • New TX Chain 200MHz example. 
  • New 2D FFT examples showcasing designs with HLS, HDL, and AI Engine blocks.
  • Simulation speed enhancement for SSR FIR (more than 10x improvement), and SSR FFT.
  • Simulation speed enhancement for memory blocks like RAMs, and FIFOs
  • Questa Simulator updated with VHDL 2008 in the Black-box import flow
  • Vitis Model Composer now contains the functionality of Xilinx System Generator for DSP.  Users who have been using Xilinx System Generator for DSP can continue development using Vitis Model Composer.
  • MATLAB Support - R2020a, R2020b & R2021a

 

2020.2

Vitis Software Platform 2020.2 Release Highlights:

  • Vitis 2020.2 supports application acceleration and embedded software development for Versal ACAP Platforms
  • Vitis Core Development Kit now includes the AI Engine Compiler to compile C/C++ applications for Versal AI Engines. AI Engine, part of Versal AI Core Series, is a vector processor for compute-intensive applications
  • Vitis HLS is default for both accelerated-kernel compilation (Vitis) and C/C++ to RTL IP creation flow (Vivado)
  • 600+ FPGA-accelerated functions across 13 performance-optimized libraries. 2020.2 introduces the new Vitis HPC library for accelerating high-performance computing applications and several enhancements & additions to the Data Analytics, Graph, BLAS, Sparse, Security & Database libraries
  • Support for evaluating multiple implementation strategies for final FPGA binary creation & enhancements for easier RTL-kernel integration within Vitis applications
  • Other enhancements in this release include support for AI Engine application profiling, Git version control for Vitis projects, Vitis AI profiler data integration within Vitis Analyzer and enhancements for emulation modes.
  • Add-on for MATLAB® and Simulink® : Unification of Xilinx Model Composer and System Generator for DSP. AI Engine is a new domain in Add-On for MATLAB and Simulink.

Vitis What's New by Category

Expand the sections below to learn more about the new features and enhancements in Vitis 2020.2. For information on Supported Platforms, Changed Behavior & Known Issues, please refer to Vitis 2020.2 Release Notes for Application Acceleration Flow and Embedded Software Development Flow.

Note: Vitis Accelerated Libraries are available as a separate download. They can be downloaded from GitHub or directly from within the Vitis IDE as well.

  • FPGA-accelerated library for HPC workloads. Initial release focuses on Seismic Imaging & Geophysics Simulation use-cases
    • Reverse Time Migration (RTM) – Seismic imaging technique for accurate representation of subsurface 
    • High-precision Multi-layer Perceptron (MLP) - Reconstruction of subsurface properties using seismic reflection data (Seismic Inversion)
  • Optimized for single precision floating point data types (FP32) which is a key requirement within HPC applications
  • Version 1 of the library offers the following:
    • L1 Stencil primitive, L1 MLP activation functions including Sigmoid, Relu, and Prelu
    • L2 2D RTM forward kernel, 2D RTM backward kernels, and 3D RTM forward kernel
    • L3 2D RTM APIs for supporting shot parallelism

New Functions and Features

  •  2020.2 ISP Pipeline example design supports pixel depths up to 16 bits
  • Local tone mapping
  • Auto Exposure Correction
  • Quantization & Dithering
  • Color Correction Matrix
  • Black Level Correction
  • Lens Shading correction
  • Brute Force Feature Matching
  • Mode Filter
  • blobFromImage
  • Laplacian Operator
  • Distance Transform

Library Infrastructure & Other Enhancements

  • All library functions support Alveo U50 platform
  • GUI support for both Edge and Data Center platforms
  • Color Conversion: Support for RGBX (a fourth channel)
  • Line Stride support in Data Converters
  • Removed xf_axi_sdata.hpp file. Axiconverter functions now use the HLS ap_axi_sdata.h file instead.

Ready-to-Evaluate Apps in New Xilinx App Store

The following FPGA-accelerated applications, developed using the Vitis Vision library, are now available on the new Xilinx App Store as containers for easy evaluation and deployment on Alveo accelerator cards on the Nimbix cloud or on-premises:

  • Image Classification using ML-inference engine from Vitis AI Library and Vitis Vision Pre Processing Function
  • Image Sensor Processing (ISP) Pipeline
  • Stereo Block Matching
  • Text Processing APIs. Two major APIs included - the regular expression match and geo-IP lookup. The former API can be used to extract content from unstructured data like logs, while the latter is often used in processing web logs, to annotate with geographic information by IP address. A demo tool that converts Apache HTTP server log in batch into JSON file is provided with the library.
  • DataFrame APIs for in-memory Data Abstraction: DataFrame is widely popular for in-memory data abstraction in data analytics domain, the DataFrame write and read APIs should enable data analytics kernel developers to store temporal data or interact with open-source software using Apache Arrow DataFrame more easily.
  • Tree Ensemble Method. Random forest is extended to include regression. Gradient boost tree, based on boosting method, is added to support both classification and regression. Support for XGBoost on classification and regression is also included to exploit 2nd order derivative of loss function and regularization.
  • Single-Source Shortest Path API (singleSourceShortestPath): 2020.2 version now supports the Alveo U50 platform and provides a new output ‘pred32’ for the shortest path information.
  • Page Rank APIs: 2020.2 version now supports the Alveo U50 platform and includes two APIs, both named ‘pageRankTop’: one to leverage a single memory channel and the other to utilize multi-bank memories.
  • Similarity APIs: 3 new APIs to cover different applications: ‘denseSimilarityKernel’ is for dense graph applications, ‘sparseSimilarityKernel’ is for sparse graph applications, and ‘generalSimilarityKernel’ covers both types of applications with a single kernel.
  • The following APIs now support Alveo U50 platform:
    • Breadth-First search bfs API (bfs)
    • Degree calculation API (calcuDegree)
    • Connected component API (connectedComponents)
    • Converting format from CSC to CSR API (convertCsrCsc)
    • Label propagation API (labelPropagation)
    • Strongly connected component API (stronglyConnectedComponents)
    • Triangle count API (triangleCount)
  • New L2 GEMM Kernel
  • For FP32 data types, the L3 GEMM performance has been improved from 280 GFLOPS to 340 GFLOPS
  • Introduced FP32 L2 CSCMV kernel (sparse matrix vector multiplication for CSC - Compressed Sparse Column - format matrices) that utilizes 16 HBM channel support on the Alveo U280 accelerator card.
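
For readers unfamiliar with the CSC layout mentioned above, the following plain-Python sketch shows sparse matrix-vector multiplication over a Compressed Sparse Column matrix. It is a reference formulation only, not the L2 kernel; the function name `cscmv` and its argument names are hypothetical.

```python
from typing import List, Sequence


def cscmv(n_rows: int, col_ptr: Sequence[int], row_idx: Sequence[int],
          values: Sequence[float], x: Sequence[float]) -> List[float]:
    """Compute y = A @ x where A is stored in CSC form.

    col_ptr[j]..col_ptr[j + 1] delimits the non-zeros of column j;
    row_idx[k] and values[k] give the row and value of the k-th non-zero.
    """
    y = [0.0] * n_rows
    for col in range(len(col_ptr) - 1):
        xj = x[col]
        # Each column scatters its contributions into the output rows.
        for k in range(col_ptr[col], col_ptr[col + 1]):
            y[row_idx[k]] += values[k] * xj
    return y
```

For the 3x3 matrix [[1, 0, 2], [0, 3, 0], [4, 0, 5]], the CSC arrays are `col_ptr = [0, 2, 3, 5]`, `row_idx = [0, 2, 1, 0, 2]`, and `values = [1, 4, 3, 2, 5]`. The column-wise scatter pattern is what distinguishes CSC from the row-wise gather of CSR.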
  • The 2020.2 release brings major enhancements and updates to the General Query Engine (GQE) kernel design, and brand-new Level 3 APIs for JOIN and GROUP-BY AGGREGATE.
    • Columns as Input Buffers: The GQE kernels treat each column as an input buffer, simplifying the data preparation in the host code. Additionally, allocating multiple buffers on host side will reduce out-of-memory issues compared to big contiguous memory allocations, especially when the server is under heavy load.
    • Command Classes for generating Configuration bits : The L2 layer now provides command classes to generate the configuration bits for GQE kernels. Developers no longer have to dive into the bitmap table to understand which bit(s) to toggle to enable or disable a function in GQE pipeline. Thus, the host code can be more sustainable and less error-prone.
    • New Level-3 APIs: New experimental L3 APIs for JOIN and GROUP-BY AGGREGATE are built to scale the problem size that GQE can handle. They can break down the tables into parts based on hash and call the GQE kernels for multiple rounds in a well-scheduled fashion. The execution strategy is separated from the execution itself, so database gurus can fine-tune the execution based on table statistics, without messing with the OpenCL execution part.
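
The strategy of breaking tables into parts by hash and processing them in rounds can be sketched in plain Python as follows. This illustrates the general partitioned hash-join technique, not the GQE L3 API; all function names and the default partition count are assumptions for this example.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Tuple[int, str]


def hash_partition(rows: Sequence[Row], num_parts: int) -> List[List[Row]]:
    """Route each row to a partition chosen by the hash of its key."""
    parts: List[List[Row]] = [[] for _ in range(num_parts)]
    for row in rows:
        parts[hash(row[0]) % num_parts].append(row)
    return parts


def join_partition(left: Sequence[Row], right: Sequence[Row]) -> List[Tuple[int, str, str]]:
    """Hash join of one pair of partitions (one 'round')."""
    table: Dict[int, List[str]] = defaultdict(list)
    for key, payload in left:
        table[key].append(payload)
    return [(key, lp, rp) for key, rp in right for lp in table.get(key, [])]


def partitioned_join(left: Sequence[Row], right: Sequence[Row],
                     num_parts: int = 4) -> List[Tuple[int, str, str]]:
    """Join two tables partition by partition, so each round stays small."""
    out: List[Tuple[int, str, str]] = []
    for lpart, rpart in zip(hash_partition(left, num_parts),
                            hash_partition(right, num_parts)):
        out.extend(join_partition(lpart, rpart))
    return out
```

Because equal keys always land in the same partition on both sides, joining matching partition pairs yields the same result as joining the full tables, while each round only has to hold one partition's worth of data.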
  • LIBZ Library Acceleration using Alveo U50             
    • Seamless acceleration of libz standard APIs : deflate, compress2 and uncompress
    • Ready-to-use libz.so library to accelerate any host code without any code change
    • xzlib standalone executable for both gzip/zlib compress & decompress
  • ZSTD Decompression : New implementation of Facebook ZSTD algorithm available
  • Snappy Dual Core Kernel : New implementation of Google snappy Dual Core decompression algorithm achieves 2x throughput improvement for single file decompress.
  • GZIP Compress Kernel: New GZIP Quad Core Compress Kernel (in-built , LZ77 , TreeGen, Huffman encoder) implementation available. More than 20% reduction in overall resources and 50% reduction in DDR bandwidth requirement.
  • GZIP Compress Streaming Kernel: Fully standard-compliant GZIP implementation (including header & footer) available as streaming free-running kernels.
  • GZIP/ZLIB L3 Application on Alveo U50: GZIP/ZLIB Application available as an L3 API , optimized for Alveo U50 (HBM) and Alveo U250 cards. Single FPGA binary (xclbin) supports both zlib & gzip format for compress and uncompress
  • Support for Alveo U50: Library functions (LZ4, Snappy, GZIP, ZLIB) ported to support the Alveo U50 platform.
  • Low Latency GZIP/ZLIB Decompress : Initial decompression latency reduced from 5K to 2.5K for 4KB/8KB/16KB block sizes
  • APIs revised to fully support Vitis HLS compiler
  • New Signature Generation and Verification Algorithms: DSA, ECC, ECDSA(secp256k1) and EdDSA(ed25519)
  • New Checksum Algorithms: Adler32 and CRC32.
  • Verifiable delay function (VDF) evaluation and verification: Pietrzak's VDF and Wesolowski's VDF.
  • Commercial Cryptography constituted by CAS: SM2, SM3 and SM4.
  • Stream Cipher: XChacha20.
  • Optimization on RSA, GMAC, AES-GCM and SHA3 to improve their performance and resource utilization.
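
As a software reference for the two checksum algorithms named above, the sketch below implements Adler32 and CRC32 in plain Python following the definitions used by zlib. Whether the library's kernels use exactly these parameters (for example, the same CRC polynomial and initial value) is an assumption; the code is meant only to show what the checksums compute.

```python
MOD_ADLER = 65521  # largest prime below 2**16


def adler32(data: bytes) -> int:
    """Adler-32: two running sums modulo 65521, packed as (b << 16) | a."""
    a, b = 1, 0
    for byte in data:
        a = (a + byte) % MOD_ADLER
        b = (b + a) % MOD_ADLER
    return (b << 16) | a


def crc32(data: bytes) -> int:
    """Bitwise CRC-32 with the reflected polynomial 0xEDB88320."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0xEDB88320 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF
```

Both functions can be cross-checked against Python's standard `zlib.adler32` and `zlib.crc32`, which is a convenient way to produce golden values when validating a hardware implementation.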
  • Argument parser (Beta): Parses the options and flags passed from command line and offers automatic help information generation enabling developers to create unified experience on test cases and user applications.
  • FIFO multiplexer: This module wraps around a FIFO (implemented through hls::stream in kernel code ) to enable passing data of different type through the same hardware resource. When the data is too wide, it will automatically be transferred using multiple cycles. This module is expected to make the dataflow code more compact and readable.
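
The behaviour described for the FIFO multiplexer, where values of different widths share one FIFO and wide values take several cycles, can be modelled in plain Python as below. This is a behavioural sketch only, not the library's HLS module; the `FifoMux` class, its method names, and the 32-bit default width are hypothetical.

```python
from collections import deque
from typing import Deque


class FifoMux:
    """Model of a fixed-width FIFO carrying values of differing bit widths."""

    def __init__(self, width_bits: int = 32) -> None:
        self.width = width_bits
        self.mask = (1 << width_bits) - 1
        self.fifo: Deque[int] = deque()

    def _beats(self, value_bits: int) -> int:
        # Number of FIFO words ("cycles") needed for a value of this width.
        return (value_bits + self.width - 1) // self.width

    def put(self, value: int, value_bits: int) -> None:
        """Split the value into FIFO-width words, least significant first."""
        for i in range(self._beats(value_bits)):
            self.fifo.append((value >> (i * self.width)) & self.mask)

    def get(self, value_bits: int) -> int:
        """Reassemble a value of the given width from consecutive words."""
        value = 0
        for i in range(self._beats(value_bits)):
            value |= self.fifo.popleft() << (i * self.width)
        return value & ((1 << value_bits) - 1)
```

The reader must request values in the same order and with the same widths as the writer pushed them, which mirrors the requirement that both ends of a shared stream agree on the sequence of types.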

ADF: Adaptive Data Flow

  • Compiler:
    • Event tracing on PLIO or GMIO
    • Event tracing also on Hardware
    • Heat Map generation: %utilization of all AI Engines
    • Supports different PL frequencies for PL kernels and PLIOs
  • Vitis IDE for AI Engines
    • Pipeline view
    • Vector register view
    • Internal memory views: East, West, North, South
    • External memory
  • Vitis HLS replaces Vivado HLS in Vivado (it was already default for Vitis and C based kernel compilation in 2020.1)
    • Adds array reshape and partitioning pragmas for top function ports
  • The tool is now installed in its own directory ./Vitis_HLS/2020.2 alongside Vitis and Vivado
  • HLS design migration information has been updated in UG1391
  • Vitis HLS user guide is UG1399, the full content is also available in HTML
  • Updated design examples on GitHub, they can also be loaded automatically from the Vitis HLS GUI (from the "Git Repositories" sub-window) for direct access
  • Support for SIMD programming
  • Support for on-chip block RAM ECC flags via the bind_storage pragma (Vivado flow only) to monitor error correction logic generated by the RAM blocks
  • GUI has a simplified toolbar icon layout, new reporting sections for interfaces and AXI4 including bursts
  • Non-default options can be filtered for quick review in "Solution Settings"→"General" then "Show only non-defaults" tick mark
  • User can create and open a project in the GUI directly starting from Tcl using the -p option and passing the Tcl file as an argument: vitis_hls -p  <file>.tcl
  • Interactive FIFO depth sizing in GUI
  • Constrained random testing for AXI interfaces now visible in the GUI

Versal Only Features

  • Vitis HLS now infers the dedicated single clock cycle accumulation for floating point (adder or multiplier) of the DSP58 block to implement efficient high throughput accumulation
  • Timing libraries updated for Versal production target devices
  • Improved RTL-Kernel Integration:  Enhancements for packaging & integrating RTL IPs as kernels within Vitis applications, including support for user-managed RTL kernels (not controlled by XRT APIs) and improvements to IP Packager within Vivado to support this flow.
  • Multiple Implementation Strategies for Timing Closure: Vitis compiler & linker (v++) now supports launching & running multiple Vivado implementation strategies at the same time during hardware builds. This enables users to explore & assess all results and select the best strategy for final FPGA binary (xclbin) creation.

Versal Only Features

  • In 2020.2, as long as the hardware design stays the same, aiecompiler recompiles and updates only the software when the AIE program is modified. The v++ linking stage is not re-run; the flow goes directly to the package step. This allows users to iterate quickly and easily on the AIE program once the hardware has been fixed.
  • System Level template will be provided which includes AIE, PL and PS design files.
  • AIE tools features integrated into Vitis IDE, such as displaying pipeline information, storage view, parallel compilation etc.
  • Version Control for Vitis Projects: Integration with Git version control for Vitis Projects enables collaboration across multiple developers and teams.
  • Improvements to Project Hierarchy: Acceleration kernel and host applications are now separate projects under top-level System Project enabling a user to compile the host application and hardware kernels separately.
  • Improvements to Board Support Package (BSP) Build times: For platform projects with standalone domains, the Board Support Package (BSP) drivers compile in parallel to speed up application build time.
  • Ease-of-Use for Host Application Debug: Processing System registers can now be exported as a file from the Vitis GUI for debug.
  • Profiling System Projects: Top-level System Projects now offer more control over specifying profiling features via the Vitis GUI for the Vitis application acceleration flows.
  • Improved Support for Platform Creation with Hardware Emulation: In addition to the Block Diagram as the top level, hardware emulation mode now also supports RTL sources in the platform as the top module, or referenced RTL inside the block diagram without packaging. You can add an RTL testbench as in Vivado. This offers more flexibility for validating designs before deployment.
  • Save Signals during Emulation for Debug: Save signals to Xilinx Simulator (XSIM) waveform file during emulation. User can pass -wcfg-file-path to launch_hw_emu.sh when rerunning hardware emulation.
  • Emulation Support for Slave Bridge Feature (Alveo Platforms): Please refer to the Alveo Platform Documentation for more details on Slave Bridge features.
  • Python/C++ APIs for emulating AXI Stream IOs: Mimic data streaming through IO ports on the platform using simple Python or C++ APIs while emulating AXI Stream kernels, so that you can emulate and debug the complete system with programmed traffic patterns much earlier in the design cycle.
  • Questa Simulator support for U250 Alveo Platform: In addition to the Xilinx Simulator (XSIM), hardware emulation in Vitis for U250 Alveo platforms now also supports Questa. Setup is done via V++ configuration files or Vitis IDE.
  • HLS kernel deadlock detection: Deadlock or livelock code in an HLS kernel can be detected during hardware emulation by compiling the HLS kernel with v++ config param=compiler.deadlockDetection=true

Versal Only Features

  • 3rd party simulator support (Questa, Xcelium, VCS): In addition to the Xilinx Simulator (XSIM), hardware emulation in Vitis for Versal embedded platforms now also supports 3rd party simulators like Questa and Xcelium on Linux. VCS is supported in Early Access stage. Setup is done via V++ configuration files or Vitis IDE.
  • Vitis AI Profiler Data Integration: For applications that use the Deep Learning Processing Unit (DPU) for AI inference, you can access Vitis AI profiler information including DPU throughput, DDR read/write rates and timeline trace information within Vitis Analyzer to assess end-to-end application acceleration. 
  • View Package Summary Report: View the Package Summary Report within Vitis Analyzer for an overall view of application’s status from a performance and optimization perspective.  The package summary is created by v++ command after linking to build a package that can be run for software or hardware emulation or can be booted and run on the hardware device.
  • Integrated Host & Kernel Profiling: Vitis 2020.2 adds the capability to provide user event API profiling. Beyond the profiling capabilities inherently available for accelerated kernels, you can call Xilinx Runtime Library (XRT) APIs in your host code to profile arbitrary sections of the design and make decisions on overall application performance optimization.
  • Other Enhancements: Global search across all reports accessible within Vitis Analyzer, flexibility to save/restore custom user layouts for viewing performance reports, intuitive grouping of guidance messages to view related information in one place, and improvements to utilization reports that give visibility into statistics on a per Super Logic Region (SLR) basis for deeper insight.

Versal Only Features

  • The profile summary report will have a specific AIE design entry. More AIE related data will be shown in the compile/run summary reports, such as the AIE heatmap, which displays the kernel active/stall cycles running on HW.
  • Improved Visibility for Debug:  AXI-S Transaction-level view available in the Xilinx Simulator (XSIM) Transaction Viewer for System-C portions of hardware emulation designs, providing better visibility into the design at a transaction level for debug.
  • View FIFO Status in Live Waveform Viewer: Status of user-level FIFOs (denoted as hls::stream in kernel code) can be viewed in Live Waveform Viewer during Hardware Emulation, providing visibility into static FIFO depths, FIFO elements and FIFO usage to identify performance bottlenecks for acceleration kernels.

Versal Only Features

  • Event trace enhancements: Vitis 2020.2 incorporates a couple of enhancements to the AIE event trace features, such as support for offloading by XRT, multiple trace stream flow enhancements, and the ability to monitor the PL/AIE boundary even when a PL kernel is defined in the graph. In addition, the PL/PS/AIE event traces are combined into a common timeline to provide better visualization of the whole design.

Note: Xilinx Runtime Library (XRT) is available as a separate download. Please refer to the Getting Started information for download and install instructions.

  • Improved Support for HBM-enabled Platforms: Leverage the benefits of high-bandwidth memory (HBM) enabled platforms by specifying kernel port connections to HBM banks through v++ --sp HBM[#:#]. Xilinx Runtime Library (XRT) APIs can also automatically assign the HBM banks and enable the host application to allocate arbitrarily sized buffers spanning one or more HBM segments (256MB+) (on HBM segment bounds).
  • Next Generation Xilinx Board Management Utilities (Preview): Next generation Xilinx Board Management utilities (xbutil, xbmgmt) are available for preview. They can enable the Slave Bridge and DDR retention features for Xilinx platforms that support them. Note: Current generation of board management utilities will be moved to maintenance mode in 2021.1 & new features will only be added to next generation utilities.
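
To make the HBM segment arithmetic above concrete, the following C++ sketch computes how many segments a buffer spans when it starts on a segment boundary. It assumes a segment size of 256 MB, read from the "(256MB+)" note above; the constant kHbmSegmentBytes and the helper hbm_segments_needed are hypothetical names for illustration and are not part of XRT.

```cpp
#include <cstdint>

// Assumed size of one HBM segment in bytes (256 MB); check the platform
// documentation for the actual value on your card.
constexpr std::uint64_t kHbmSegmentBytes = 256ull * 1024ull * 1024ull;

// Hypothetical helper: number of whole HBM segments covered by a buffer of
// buffer_bytes that begins on a segment boundary (rounds up).
constexpr std::uint64_t hbm_segments_needed(std::uint64_t buffer_bytes) {
    return (buffer_bytes + kHbmSegmentBytes - 1) / kHbmSegmentBytes;
}
```

Under this assumption, a 1 GB buffer spans four segments, and a buffer one byte larger than 256 MB already needs two.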

Versal Only Features

  • AIE support is added for RTP, error handling, full array reconfiguration and the graph API.

Access the latest Vitis Target Platforms for Alveo Accelerator cards from the Alveo Packages Download Tab

Please refer to UG1120 - Alveo Data Center Accelerator Card Platforms User Guide for more details and to keep up-to-date on the latest Vitis Target Platform releases, as they become available

U200/U250 XDMA Platforms

  • Alveo Platform U200 XDMA 2RP - Production
    • Features: ERT, CMC, PLRAM, DRM capable floorplan, XDMA, 2RP, P2P, M2M, GT Kernel, PCIe Slave Bridge, DDR Self-Refresh
  • Alveo Platform U250 XDMA 2RP - Production
    • Features: ERT, CMC, PLRAM, DRM capable floorplan, XDMA, 2RP, P2P, M2M, GT Kernel, PCIe Slave Bridge, DDR Self-Refresh

Shell Upgrade DFX - 2RP (2 Reconfiguration Partitions)

  • Small size of static region: Base 
    • PCIe functionality
    • In-band FPGA partial reconfiguration 
  • New reconfiguration partition: Shell
    • Update DMA and utility functions
    • Dynamic swapping between platforms without rebooting the server
  • 2nd reconfiguration partition: User Logic
    • Accelerator kernel functions

AXI Slave Bridge

  • Direct host memory access by the kernel 
  • DMA bypass capability, with a 512-bit AXI-Slave interface; users can provide their own data mover

Data Retention - DDR4 self-refresh

  • Data context retained in FPGA memory using DDR4 self-refresh during reconfiguration
  • Eliminates copying to host RAM as temporary storage for different XCLBINs
  • Minimizes movement of large data sets

Note: Vitis Target Platforms for Embedded Platforms (including pre-built Linux kernels, root file system and sysroot) are available as a separate download on the Vitis Embedded Platforms Tab.

  • ZYNQ-7000 and ZYNQ UltraScale+ MPSoC base platform functions are kept the same, but the platform source code has been restructured. Directories are renamed for easier understanding, and source files common to multiple platforms are grouped together. This makes it easier to reuse the platform source code and port it to a new platform.
  • When building a platform from source code, in addition to compiling PetaLinux from scratch, a new end-to-end build method is available for users who have downloaded the common software components. Users can point to those components and skip PetaLinux compilation when building a platform.

The VCK190 platform has a flexible DDR + LPDDR memory subsystem and supports 63 interrupts for acceleration kernels. It is available for use with the Vitis core development kit, for both application acceleration and embedded processor software development, as described in Versal AI Engine Programmers Guide (UG1076). The platform enables development of designs that include:

  • AI Engine graphs and kernels
  • Programmable Logic kernels
  • Host application targeting Linux or bare metal running on the Arm processor in the Versal device.
  • Please refer to Getting Started with Vitis and Versal ACAP platforms to learn more.
  • Support for Kubernetes (K8s) clusters: Xilinx FPGA Resource Manager (XRM) can now be used together with Kubernetes to run and manage compute units (CUs) across a pool of multiple Alveo accelerator cards attached to a server and scale applications to multiple servers with Alveo cards.
2020.1