Vitis AI 3.0

Vitis™ AI Platform 3.0 Release Highlights

  • AI Engine-ML enablement - early access on Alveo™ V70 data center accelerator card and Versal™ AI Edge series VEK280 evaluation kit
  • Improved custom model deployment with ONNX Runtime
  • Improved WeGO ease of use through quantizer integration and broader model coverage

Vitis AI Platform - What’s New by Category

The following lists describe the new features and enhancements in Vitis AI platform 3.0. For more information on the supported models, quantizer, compiler, or DPU IPs, check the GitHub repository or email:

  • Added 14 new models and deprecated 28 models for a total of 130 models
  • Optimized models for applications:
    • For AI medical and image enhancement: Super Resolution 4X, 2D/3D Semantic Segmentation
  • Optimized models for benchmarks:
    • MLPerf: 3D-Unet
    • FAMBench: MaskRCNN
  • Optimized backbones:
    • YOLO variants (YOLOX, v4, v5, v6) and EfficientNet-Lite
  • Ease-of-use enhancement: data published on GitHub.io to improve the user experience
  • Improved support for and compatibility with UIF
  • 72 PyTorch/TensorFlow models for CPUs with ZenDNN
  • Added GPU models for AMD GPUs based on ROCm+MIGraphX
  • Support for TensorFlow 2.10
  • Updated the Vitis Inspector to show more accurate partition results from XCompiler for various DPU architectures
  • Added support for data type conversions for float models, including FP16, BFloat16, FP32, and double
  • Added support for exporting the ONNX format of a quantized model
  • Support for more layers: SeparableConv2D and PReLU
  • Added support for unsigned integer quantization
  • Added support for automatic modification of input shapes for models with variable input shapes
  • Added support for aligning input and output quantize positions for the Concat and Pooling layers
  • Added error codes and improved the readability of error and warning messages
  • Bug fixes
  • Separated the quantizer code from the TensorFlow code, making the quantizer a plug-in module for the official TensorFlow library
  • Added support for exporting the ONNX format of a quantized model
  • Added support for data type conversions for float models, including FP16, BFloat16, FP32 and double
  • Support for more operations, including Max, Transpose, and DepthToSpace
  • Added support for aligning input and output quantize positions of Concat and Pooling operations
  • Added support for automatic replacement of Softmax with Hard-Softmax operations
  • Added error codes and improved the readability of error and warning messages
  • Bug fixes
  • Support for PyTorch 1.11 and 1.12
  • Support for exporting quantized models in TorchScript format
  • QAT support for exporting trained models to ONNX and TorchScript
  • Support for FP16 model quantization
  • Optimized Inspector to support more pattern types and backwards compatibility of device assignments
  • More PyTorch operators: more than 560 PyTorch operator types are supported
  • Enhanced parsing to support control flow parsing
  • Enhanced message system with more useful message text
  • Support for fusing and quantization of BatchNorm without affine calculation
  • Support for Keras layers of ConvTranspose2D, Conv3D, ConvTranspose3D
  • Support for TFOpLambda operations
  • Added pruning config that allows users to specify pruning hyper-parameters
  • Specific exception types are defined for each type of error
  • Support for TensorFlow 2.10
  • Added fine-grained model pruning: Sparsity
  • OFA support for convolution layers with kernel=(1,3) and dilation
  • OFA support for ConvTranspose2D
  • Added pruning config that allows users to specify pruning hyper-parameters
  • Specific exception types are defined for each type of error
  • More robust parallel model analysis
  • Support for PyTorch 1.11 and 1.12
  • Support for new operators: strided_slice, cost volume, 1D and 2D correlation, argmax, group conv2d, reduction_max, and reduction_mean
  • Support for new hardware platform: DPUCV2DX8G
  • Improved error messages
  • Improved partition messages
  • Support for Versal AI Edge series VEK280 evaluation kit and Alveo V70 accelerator card
  • Support for ONNX Runtime with 11 new examples provided
  • Support for 15 new models
  • Added 4 new model libraries
  • Improved error messages
  • Support for new DPUCV2DX8G DPU IP
  • Memory bandwidth profiling solution for Versal platforms
  • Upgraded to 2022.2
  • New features:
    • Support for 1D and 2D Correlation
    • Support for Argmax and Max
  • Optimized resources and timing
  • Upgraded to 2022.2
  • New features:
    • Support for 1D and 2D Correlation
    • Support for Cost-Volume
    • Support for Argmax and Max along channel dimensions
  • Optimized resources and timing
  • Early Access release
  • Support for the most common 2D operators
  • Support for batch sizes 1 to 13
  • Support for 90+ CNN models
  • Updated Vitis tool from 2021.2 to 2022.2
  • Added scripts to improve timing in released XO flow
  • Early Access release
  • Support for the most common 2D operators
  • Support for batch size 13
  • Support for 70+ CNN models
  • Integrated WeGO with the quantizer for on-the-fly quantization and improved ease of use
  • Introduced serialization and deserialization over the WeGO flow to offer the capability of building once and running anytime
  • Incorporated AMD ZenDNN into WeGO to bring additional optimization opportunities on AMD EPYC CPUs
  • Improved WeGO robustness to offer solid deployment experience for more models
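The alignment of input and output quantize positions for Concat (and Pooling), listed above for both TensorFlow quantizers, can be pictured with a small conceptual sketch of the Concat case. It assumes power-of-two scaling, where a quantize position `pos` maps an integer `q` back to `q * 2**-pos`, and a hypothetical alignment rule: use the smallest position among the Concat inputs so that no input overflows. All function names are made up for this illustration; it is not the Vitis AI quantizer's implementation.

```python
import math

# Conceptual sketch only: hypothetical helpers, not the Vitis AI quantizer API.
# A quantize position `pos` means: real_value ~= integer * 2**-pos.

def quantize_pow2(values, pos, bits=8):
    """Quantize floats to signed `bits`-bit integers at quantize position `pos`."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    # Python's round() rounds halves to even; hardware rounding may differ.
    return [max(qmin, min(qmax, round(v * 2.0 ** pos))) for v in values]

def dequantize_pow2(ints, pos):
    """Map integers back to floats at quantize position `pos`."""
    return [q * 2.0 ** -pos for q in ints]

def best_pos(values, bits=8):
    """Largest position whose integer range still covers max |v| (assumed calibration rule)."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return bits - 1
    return math.floor(math.log2(qmax / max_abs))

def concat_aligned(inputs, bits=8):
    """Quantize all Concat inputs at one shared position so the output needs no rescaling.

    Hypothetical rule: take the smallest per-input position (the widest range),
    so that no input overflows.
    """
    shared = min(best_pos(x, bits) for x in inputs)
    out = []
    for x in inputs:
        out.extend(quantize_pow2(x, shared, bits))
    return out, shared

a = [0.5, -1.5]   # on its own, best position 6
b = [10.0, -3.0]  # on its own, best position 3
q, pos = concat_aligned([a, b])
print(pos, q)                     # prints: 3 [4, -12, 80, -24]
print(dequantize_pow2(q, pos))    # prints: [0.5, -1.5, 10.0, -3.0]
```

Because both inputs share position 3, their integers can be placed side by side in the Concat output without any per-input rescaling; the cost is that input `a` is stored with less precision than its own best position would allow.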
Vitis AI 2.5

Vitis™ AI 2.5 Release Highlights

  • AI Model Zoo added
    • 14 new models, including Bidirectional Encoder Representations from Transformers (BERT)-based Natural Language Processing (NLP), Vision Transformer (ViT), Optical Character Recognition (OCR), Simultaneous Localization and Mapping (SLAM), and more Once-for-All (OFA) models
    • 38 base and optimized models for AMD EPYC™ server processors
  • AI Quantizer added a model inspector and now supports TensorFlow 2.8 and PyTorch 1.10
  • Whole Graph Optimizer (WeGO) supports PyTorch 1.x and TensorFlow 2.x
  • Deep-learning Processor Unit (DPU) for Versal™ ACAP supports multiple Compute Units (CUs), new Arithmetic Logic Unit (ALU) engine, Depthwise convolution and more operators (OPs) supported by the DPUs on VCK5000 Versal development card and Alveo™ data center accelerator cards
  • Inference server supports AMD ZenDNN as backend on AMD EPYC™ server processors
  • New examples added to Whole Application Acceleration (WAA) for VCK5000 card and Zynq™ UltraScale+™ ZCU102/ZCU104 evaluation kits

Vitis AI 2.5 What’s New by Category

The following lists describe the new features and enhancements.

  • 14 new models added, for a total of 134 models
  • Expanded model categories for diverse AI workloads:
    • Added CNN models for text detection and E2E OCR
    • Added BERT-base NLP model and ViT
    • Added more OFA-optimized models, including super-resolution OFA-RCAN and object detection OFA-YOLO
    • Added models for industrial vision and SLAM, including interest point detection and description model and hierarchical localization model
  • Added 38 base and optimized models for AMD EPYC server processors
  • Ease of use enhancement:
    • Improved model index by application categories
  • Added model inspector that inspects a float model and shows partition results
  • TensorFlow 2.8 and PyTorch 1.10 support
  • Float-scale and per-channel quantization support
  • Configuration support for different quantize strategies
  • OFA enhancement
    • Support for even convolution kernel sizes
    • ConvTranspose2d support
    • Updated examples
  • One-step and iterative pruning enhancement
    • Model analysis or search can resume after an exception
  • ALU for DPUCZDX8G support
  • New models added in this release
  • Added 6 new model libraries
  • Supports 17 new models
  • Custom OP enhancement
  • Added new CPU operators
  • xdputil tool enhancements
  • Two new demos on the VCK190 kit
  • Full support for custom OPs and graph runner
  • Stability optimization
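To illustrate the float-scale and per-channel quantization support listed above, the following sketch compares one scale for a whole weight tensor with one scale per output channel. It assumes symmetric signed 8-bit quantization with scale = max|w| / 127; the helper names are hypothetical and this is not the Vitis AI quantizer's own code.

```python
# Conceptual sketch (hypothetical helpers, not the Vitis AI quantizer API):
# per-tensor vs per-channel symmetric quantization with float scales.

def symmetric_scale(values, bits=8):
    """Float scale that maps max |v| onto the largest signed integer."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    return max_abs / qmax if max_abs > 0 else 1.0

def quantize(values, scale, bits=8):
    """Round v / scale to the nearest integer and clip to the signed range."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return [max(qmin, min(qmax, round(v / scale))) for v in values]

def per_tensor(weights, bits=8):
    """One scale shared by every output channel."""
    scale = symmetric_scale([w for channel in weights for w in channel], bits)
    return [quantize(ch, scale, bits) for ch in weights], [scale] * len(weights)

def per_channel(weights, bits=8):
    """One scale for each output channel."""
    scales = [symmetric_scale(ch, bits) for ch in weights]
    return [quantize(ch, s, bits) for ch, s in zip(weights, scales)], scales

def channel_errors(weights, q, scales):
    """Largest absolute reconstruction error in each channel."""
    return [
        max(abs(w - qi * s) for w, qi in zip(ch, q_ch))
        for ch, q_ch, s in zip(weights, q, scales)
    ]

# Channel 0 has much smaller weights than channel 1.
weights = [[0.011, -0.02, 0.012], [1.0, -0.6, 0.3]]
q_pt, s_pt = per_tensor(weights)
q_pc, s_pc = per_channel(weights)
err_pt = channel_errors(weights, q_pt, s_pt)
err_pc = channel_errors(weights, q_pc, s_pc)
print("per-tensor errors: ", err_pt)
print("per-channel errors:", err_pc)
```

Because channel 0's weights are about 50 times smaller than channel 1's, a shared scale leaves channel 0 with only a few integer levels, while its own scale lets it use the full [-127, 127] range. Channel 1 is unaffected, since it already determines the shared scale.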


  • New ALU engine that replaces the pool engine and the Depthwise convolution engine in MISC. The ALU engine supports:
    • New features such as large-kernel-size MaxPool, AveragePool, rectangle-kernel-size AveragePool, and 16-bit const weights
    • HardSigmoid and HardSwish
    • DepthWiseConv + LeakyReLU
    • Parallelism configuration
  • New DPU IP and targeted reference design (TRD) on the ZCU102 kit with encrypted RTL IP on Vitis 2022.1 platform


  • Optimized ALU that better supports features like channel-attention
  • Multiple CUs support
  • DepthWiseConv + LeakyReLU support
  • New DPU IP for Versal ACAP and TRD on the VCK190 kit with encrypted RTL and AI Engine code, which still supports C32B1-6/C64B1-5, based on the Vitis 2022.1 platform


  • Enlarged Depthwise convolution kernel size range from 1x1 to 8x8
  • AI Engine-based pooling, elementwise add and multiply, and large-kernel-size pooling
  • More Depthwise convolution kernel sizes


  • ReLU6/LeakyReLU and MobileNet series of models support
  • Fixed issue of missing directories in some cases in the .XO flow
  • PyTorch 1.x and TensorFlow 2.x in-framework inference support
  • Added 19 PyTorch 1.x/TensorFlow 2.x/TensorFlow 1.x examples, including classification, object detection, and segmentation
  • Added gRPC API to inference server flow
  • TensorFlow/PyTorch with AMD ZenDNN as backend support
  • New examples for the VCK5000 card and ZCU104 kit - ResNet & adas_detection applications
  • New ResNet example containing AI Engine-based pre-processing kernel 
  • Xclbin generation using pre-built DPU flow for the Alveo U50 card and ZCU102 kit - ResNet and adas_detection applications
  • Xclbin generation using build flow for the ZCU104 and VCK190 kit - ResNet and adas_detection applications
  • Ported all VCK190 examples to the production board using the base platform
Vitis AI 2.0

Vitis AI 2.0 Release Highlights

  • General Availability (GA) for VCK190 (Production Silicon), VCK5000 (Production Silicon) and U55C
  • Added support for newer PyTorch and TensorFlow versions: PyTorch 1.8-1.9, TensorFlow 2.4-2.6
  • 22 new models, including Solo, Yolo-X, UltraFast, CLOCs, PSMNet, FairMOT, SESR, DRUNet, SSR as well as 3 NLP models and 2 OFA (Once-for-all) models
  • Added a new custom OP flow to run models with DPU-unsupported OPs, with enhancements across the quantizer, compiler, and runtime
  • Additional layers and configurations of DPU for VCK190 and DPU for VCK5000
  • Added OFA pruning and TF2 Keras support for AI optimizer
  • Run inference directly from TensorFlow (Demo) for cloud DPU

Vitis AI 2.0 What’s New by Category

The following lists describe the new features and enhancements.

  • 22 new models added, 130 total
    • 19 new PyTorch models, including 3 NLP and 2 OFA models
    • 3 new TensorFlow models
  • Added new application models
    • AD/ADAS: Solo for instance segmentation, Yolo-X for traffic sign detection, UltraFast for lane detection, CLOCs for sensor fusion
    • Medical: SESR for super resolution, DRUNet for image denoise, SSR for spectral remove
    • Smart city and industrial vision: PSMNet for binocular depth estimation, FairMOT for joint detection and Re-ID
  • EoU Enhancements
    • Updated automatic script to search and download required models
  • TF2 quantizer
    • Add support for TF 2.4-2.6
    • Add support for custom OP flow, including shape inference, quantization and dumping
    • Add support for CUDA 11
    • Add support for input_shape assignment when deploying QAT models
    • Improve support for TFOpLambda layers
    • Update support for hardware simulation, including sigmoid layer, leaky_relu layer, global and non-global average pooling layer
    • Bug fixes for sequential models and quantize position adjustment
  • TF1 quantizer
    • Add quantization support for new ops, including hard-sigmoid, hard-swish, element-wise multiply ops
    • Add support for replacing normal sigmoid with hard sigmoid
    • Update support for float weights dumping when dumping golden results
    • Bug fixes for inconsistency between the Python APIs and the CLI APIs
  • Pytorch quantizer
    • Add support for PyTorch 1.8 and 1.9
    • Support CUDA 11
    • Support custom OP flow
    • Improve fast finetune performance on memory consumption and accuracy
    • Reduce feature-map memory consumption during quantization
    • Improve QAT functions including better initialization of quantization scale and new API for getting quantizer’s parameters
    • Support quantization of more operations: some 1D and 3D ops, DepthwiseConvTranspose2D, pixel-shuffle, pixel-unshuffle, const
    • Support CONV/BN merging in pattern of CONV+CONCAT+BN
    • Message enhancements to help users locate problems
    • Bug fixes for consistency with hardware
  • TensorFlow 1.15
    • Support tf.keras.Optimizer for model training
  • TensorFlow 2.x
    • Support TensorFlow 2.3-2.6
    • Add iterative pruning
  • PyTorch
    • Support PyTorch 1.4-1.9.1
    • Support shared parameters in pruning
    • Add one-step pruning
    • Add once-for-all (OFA)
    • Unified APIs for iterative and one-step pruning
    • Enable pruned model to be used by quantizer
    • Support nn.Conv3d and nn.ConvTranspose3d
  • DPU on embedded platforms
    • Support and optimize conv3d, transposedconv3d, upsample3d and upsample2d for DPUCVDX8G (xvDPU)
    • Improve the efficiency of high resolution input for DPUCVDX8G (xvDPU)
    • Support ALUv2 new features
  • DPU on Alveo/Cloud
    • Support depthwise-conv2d, h-sigmoid and h-swish for DPUCVDX8H (DPUv4E)
    • Support depthwise-conv2d for DPUCAHX8H (DPUv3E)
    • Support high resolution model inference
  • Support custom OP flow
  • Support all the new models in Model Zoo: end-to-end deployment in Vitis AI Library
  • Improved GraphRunner to better support custom OP flow
  • Add examples on how to integrate custom OPs
  • Add more pre-implemented CPU OPs
  • DPU driver/runtime update to support AMD Device Tree Generator (DTG) for Vivado flow
  • Support CPU tasks tracking in graph runner
  • Better memory bandwidth analysis in text summary
  • Better performance to enable the analysis of large models
  • CNN DPU for Zynq SoC / MPSoC, DPUCZDX8G (DPUv2)
    • Upgraded to 2021.2
    • Update interrupt connection in Vivado flow
  • CNN DPU for Alveo-HBM, DPUCAHX8H (DPUv3E)
    • Support depth-wise convolution
    • Support U55C
  • CNN DPU for Alveo-DDR, DPUCADF8H (DPUv3Int8)
    • Updated U200/U250 xclbins with XRT 2021.2
    • Released XO Flow
    • Released IP Product Guide (PG400)
  • CNN DPU for Versal, DPUCVDX8G (xvDPU)
    • Configurable C32 (32 AIE cores for a single batch) and C64 (64 AIE cores for a single batch)
    • Support configurable batch size 1~5 for C64
    • Support and optimize new OPs: conv3d, transposedconv3d, upsample3d and upsample2d
    • Reduce Conv bubbles and compute redundancy
    • Support 16-bit const weights in ALUv2
  • CNN DPU for Versal, DPUCVDX8H (DPUv4E)
    • Support depth-wise convolution with 6 PE configuration
    • Support h-sigmoid and h-swish
  • Upgrade to Vitis and Vivado 2021.2
  • Custom plugin example: PSMNet using Cost Volume (RTL Based) accelerator on VCK190
  • New accelerator for Optical Flow (TV-L1) on U50
  • High resolution segmentation application on VCK190
  • Options to compare throughput and accuracy between FPGA and CPU versions
    • Throughput improvements ranging from 25% to 368%
  • Reorganized for better usability and visibility
  • Provides the new capability of deploying models with DPU-unsupported OPs
    • Define custom OPs in quantization
    • Register and implement custom OPs before the deployment by graph runner
  • Add two examples
    • PointPillars PyTorch model
    • MNIST TensorFlow 2 model
  • Add support of DPUs for U50 and U55C
  • Run inference directly from the TensorFlow framework for cloud DPU
    • Automatically perform subgraph partitioning and apply optimization/acceleration for DPU subgraphs
    • Dispatch non-DPU subgraphs to TensorFlow running on CPU
  • Resnet50 and Yolov3 demos on VCK5000
  • Support xmodel serving in cloud / on-premise (EA)
  • vai_q_caffe hangs when TRAIN and TEST phases point to the same LMDB file
  • TVM compiled Inception_v3 model gives low accuracy with DPUCADF8H (DPUv3Int8)
  • TensorFlow 1.15 quantizer error in QAT caused by an incorrect pattern match
Vitis AI 1.4

Vitis AI 1.4 Release Highlights

  • Support new platforms, including Kria KV260 SoM kit and Versal ACAP platforms VCK190, VCK5000; 
  • Extended PyTorch framework support from version 1.5 to version 1.7.1;
  • Added new state-of-the-art models, including 4D Radar detection, Image-Lidar sensor fusion, 3D detection & segmentation, multi-task, depth estimation, super resolution, and more models applicable to automotive, smart medical, and industrial vision applications;
  • Easier subgraph partition user experience with the new Graph Runner API;
  • Improved performance;

Vitis AI 1.4 What’s New by Category

The following lists describe the new features and enhancements in Vitis AI 1.4.

  1. Added 16 new models; a total of 108 models from different deep learning frameworks (Caffe, TensorFlow, TensorFlow 2, and PyTorch) are provided.
  2. Increased the diversity of models compared to Vitis AI 1.3:
    1. For autonomous driving and ADAS, added 4D Radar detection, Image-Lidar sensor fusion, surround-view 3D detection, upgraded 3D segmentation and multi-task models
    2. For medical and industrial vision, added depth estimation, RGB-D segmentation, super-resolution and other reference models
  3. EoU enhancement: provided automated download scripts for free selection of the versions according to model name and hardware platform
  1. Support fast finetune in post-training quantization (PTQ);
  2. Improved quantize-aware training (QAT) functions
  3. Support more layers:
    1. swish/sigmoid, hard-swish, hard-sigmoid, LeakyRelu
    2. Nested tf.keras functional and sequential models
  4. Support new models: EfficientNet, EfficientNetLite, Mobilenetv3, Yolov3 and Tiny Yolov3
  5. Support custom layers via subclassing tf.keras.layers and support custom quantization strategies
  6. Improved ease of use and bug fixes


  1. Support PyTorch 1.5-1.7.1
  2. Support activations
    1. hard-swish, hard-sigmoid
  3. Support more operators:  
    1. Const, Upsample, etc.
  4. Support shared parameters in quantization
  5. Enhanced quantization profiling and error check functions
  6. Improved QAT functions:
    1. support training from PTQ results
    2. support reused modules
    3. support resuming training
  1. Support tf.keras APIs in TF1
  2. Supports single GPU mode for model analysis 
  1. Improved ease of use with simplified APIs;
  2. Support torch.nn.ConvTranspose2d;
  3. Support reused modules;
  1. Support ALU for DPUCVDX8G (xvDPU)
  2. Support cross-layer prefetch optimization option
  3. Support xmodel output nodes assignment
  4. Enabled features to implement zero-copy for:
    1. DPUCZDX8G (DPUv2)
    2. DPUCAHX8H (DPUv3E)
    3. DPUCAHX8L (DPUv3ME)
  5. The open-source network visualization tool Netron officially supports AMD XIR
  1. Support the 16 new models in AI Model Zoo:
    1. 11 new Pytorch models
    2. 5 new TensorFlow models, 1 from TensorFlow 2.x
    3. 1 new Caffe model
  2. Introduced the new deployment API graph_runner, especially for models with multiple subgraphs
  3. Introduced new tool xdputil for DPU and xmodel debug
  4. Support new KV260 SoM kit
  5. Support DPUCVDX8G (xvDPU) on VCK190
  6. Support DPUCVDX8H (DPUv4E) on VCK5000
  1. Support Versal platforms VCK190 and VCK5000
  2. Support Petalinux 2021.1, OpenCV v4 
    1. EoU improved by updating the samples to use INT8 as input, which reduces the conversion from FP32 to INT8;
  1. Support new DPU IPs:
    1. DPUCVDX8G (xvDPU)
    2. DPUCAHX8L (DPUv3ME)
    3. DPUCVDX8H (DPUv4E)
  2. Support DPUv2 & xvDPU in Vivado flow
  3. Memory IO statistics
  4. EoU improvements
  1. DPUv2 IP upgraded to 2021.1
  1. VCK190 xvDPU TRD
  2. Support configurable batch size 1~6 based on C32 mode
  3. PL support new OPs:
    1. Global Average Pooling up to 256x256, Element Multiply, Hardsigmoid and Hardswish
  4. More models deployed 
  1. Released XO in Vitis AI 1.4
  1. Support latest U250 platform (2020.2) 
  2. Support latest U200 platform (2021.1)
  3. Bug fixes
  1. Improved the DPU performance of small networks processing with weight pre-fetch function
  1. Multi Object Tracking (SORT) example on ZCU102 provided
  2. Classification App example for Versal (VCK190) provided
  3. Updated existing examples to XRT APIs and zero copy
  4. U200 (DPUv3INT8) TRD provided
  5. Ported U200/250 examples to use DPUv3INT8 instead of DPUv1
  6. Example for xRNN pre-processing acceleration (embedding layer)
  7. SSD MobileNet U280 example now accelerates both pre and post-processing on hardware
  1. Support of all DPUs - ZCU102/4, U50, U200, U250, U280
  2. Using Petalinux for edge devices
  3. Increased throughput using AKS at the application level
  4. Yolov3 tutorial as a Python notebook
  1. Unified DPU kernels into one and added samples for Alveo U200/250 (DPUv3INT8), U280, U50, U50lv