General Availability (GA) for VCK190 (Production Silicon), VCK5000 (Production Silicon) and U55C
Added support for newer PyTorch and TensorFlow versions: PyTorch 1.8-1.9, TensorFlow 2.4-2.6
22 additional new models, including Solo, Yolo-X, UltraFast, CLOCs, PSMNet, FairMOT, SESR, DRUNet, SSR as well as 3 NLP models and 2 OFA (Once-for-all) models
Added a new custom OP flow to run models containing OPs that the DPU does not support, with enhancements across the quantizer, compiler and runtime
Additional layers and configurations of DPU for VCK190 and DPU for VCK5000
Added OFA pruning and TF2 keras support for AI optimizer
Run inference directly from TensorFlow (Demo) for cloud DPU
Vitis AI 2.0 What’s New by Category
Expand the sections below to learn more about the new features and enhancements.
22 new models added, 130 total
19 new PyTorch models including 3 NLP and 2 OFA models
3 new TensorFlow models
Added new application models
AD/ADAS: Solo for instance segmentation, Yolo-X for traffic sign detection, UltraFast for lane detection, CLOCs for sensor fusion
Medical: SESR for super resolution, DRUNet for image denoising, SSR for spectral remove
Smart city and industrial vision: PSMNet for binocular depth estimation, FairMOT for joint detection and Re-ID
EoU Enhancements
Updated automatic script to search and download required models
TF2 quantizer
Add support for TF 2.4-2.6
Add support for custom OP flow, including shape inference, quantization and dumping
Add support for CUDA 11
Add support for input_shape assignment when deploying QAT models
Improve support for TFOpLambda layers
Update support for hardware simulation, including sigmoid layer, leaky_relu layer, global and non-global average pooling layer
Bug fixes for sequential models and quantize position adjustment
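The "quantize position" mentioned in the list above can be pictured as the exponent of a power-of-two scale. The following is a minimal sketch, assuming signed 8-bit values and round-half-up rounding; the function names and the rounding mode are illustrative assumptions, not the quantizer's actual implementation.

```python
import math


def quantize(value: float, position: int, bits: int = 8) -> int:
    """Map a float to a signed integer using a power-of-two scale of 2**position."""
    low, high = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    # Assumed rounding mode: round half up. Real tools may round differently.
    scaled = math.floor(value * 2 ** position + 0.5)
    # Saturate to the representable integer range.
    return max(low, min(high, scaled))


def dequantize(code: int, position: int) -> float:
    """Recover the approximate float value from its integer code."""
    return code / 2 ** position


# A larger position gives finer resolution but a smaller representable range.
for position in (4, 6):
    code = quantize(0.7, position)
    print(position, code, dequantize(code, position))
```

Adjusting the position trades range for resolution, which is why a wrong position shows up as either clipping or coarse rounding.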
TF1 quantizer
Add quantization support for new ops, including hard-sigmoid, hard-swish, element-wise multiply ops
Add support for replacing normal sigmoid with hard sigmoid
Update support for float weights dumping when dumping golden results
Bug fixes for inconsistencies between the Python APIs and CLI APIs
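To illustrate why a hard sigmoid can stand in for a normal sigmoid, the sketch below compares the two. It assumes the common piecewise-linear definition relu6(x + 3) / 6; the exact formula used by the quantizer and the DPU is not stated in these notes and may differ.

```python
import math


def relu6(x: float) -> float:
    """Clamp x to the range [0, 6]."""
    return min(max(x, 0.0), 6.0)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def hard_sigmoid(x: float) -> float:
    """Piecewise-linear approximation of sigmoid (assumed definition)."""
    return relu6(x + 3.0) / 6.0


def hard_swish(x: float) -> float:
    """x * hard_sigmoid(x), the piecewise-linear counterpart of swish."""
    return x * hard_sigmoid(x)


# Measure how far the approximation drifts from the exact sigmoid on [-6, 6].
samples = [i / 10.0 for i in range(-60, 61)]
max_gap = max(abs(sigmoid(x) - hard_sigmoid(x)) for x in samples)
print(f"largest sigmoid vs hard-sigmoid gap on [-6, 6]: {max_gap:.3f}")
```

The hard variants need only an add, a clamp and a multiply, which is generally easier to map to fixed-point hardware than an exponential.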
PyTorch quantizer
Add support for PyTorch 1.8 and 1.9
Support CUDA 11
Support custom OP flow
Improve fast finetune performance on memory consumption and accuracy
Reduce the memory consumed by feature maps during quantization
Improve QAT functions including better initialization of quantization scale and new API for getting quantizer’s parameters
Support more quantization of operations: some 1D and 3D ops, DepthwiseConvTranspose2D, pixel-shuffle, pixel-unshuffle, const
Support CONV/BN merging in pattern of CONV+CONCAT+BN
Enhanced messages to help users locate problems
Bug fixes for consistency with hardware
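The CONV+CONCAT+BN pattern listed above can be folded because batch normalization acts per channel, so each convolution feeding the concat can absorb its own slice of the BN parameters. The following is a minimal sketch using 1x1 convolutions on a single pixel; all names and values are illustrative assumptions and do not reflect the quantizer's internal code.

```python
import math


def conv1x1(x, weight, bias):
    """1x1 convolution on one pixel: x is [C_in], weight is [C_out][C_in]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode batch normalization applied per channel."""
    return [g * (v - m) / math.sqrt(s + eps) + bt
            for v, g, bt, m, s in zip(y, gamma, beta, mean, var)]


def fold_bn(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Return conv weights and biases with the BN scale and shift folded in."""
    folded_weight, folded_bias = [], []
    for row, b, g, bt, m, s in zip(weight, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(s + eps)
        folded_weight.append([w * scale for w in row])
        folded_bias.append((b - m) * scale + bt)
    return folded_weight, folded_bias


x = [0.5, -1.0]
conv_a = ([[1.0, 2.0], [0.5, -0.5]], [0.1, -0.2])  # 2 output channels
conv_b = ([[-1.0, 3.0]], [0.3])                     # 1 output channel
gamma, beta = [1.5, 0.8, 2.0], [0.1, 0.0, -0.3]
mean, var = [0.2, -0.1, 0.4], [1.0, 0.25, 4.0]

# Reference path: conv -> concat -> BN.
reference = batch_norm(conv1x1(x, *conv_a) + conv1x1(x, *conv_b), gamma, beta, mean, var)

# Folded path: each conv takes the BN slice matching its channels in the concat.
split = len(conv_a[1])
fold_a = fold_bn(*conv_a, gamma[:split], beta[:split], mean[:split], var[:split])
fold_b = fold_bn(*conv_b, gamma[split:], beta[split:], mean[split:], var[split:])
merged = conv1x1(x, *fold_a) + conv1x1(x, *fold_b)

max_error = max(abs(r - m) for r, m in zip(reference, merged))
print(f"max difference between reference and folded outputs: {max_error:.2e}")
```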
TensorFlow 1.15
Support tf.keras.Optimizer for model training
TensorFlow 2.x
Support TensorFlow 2.3-2.6
Add iterative pruning
PyTorch
Support PyTorch 1.4-1.9.1
Support shared parameters in pruning
Add one-step pruning
Add once-for-all (OFA)
Unified APIs for iterative and one-step pruning
Enable pruned model to be used by quantizer
Support nn.Conv3d and nn.ConvTranspose3d
DPU on embedded platforms
Support and optimize conv3d, transposedconv3d, upsample3d and upsample2d for DPUCVDX8G(xvDPU)
Improve the efficiency of high resolution input for DPUCVDX8G(xvDPU)
Support ALUv2 new features
DPU on Alveo/Cloud
Support depthwise-conv2d, h-sigmoid and h-swish for DPUCVDX8H(DPUv4E)
Support depthwise-conv2d for DPUCAHX8H(DPUv3E)
Support high resolution model inference
Support custom OP flow
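For readers unfamiliar with the depthwise-conv2d operation named above, here is a minimal reference sketch in plain Python. It assumes stride 1, no padding and a channel multiplier of 1, and it only illustrates the arithmetic; it says nothing about how the DPU implements the operation.

```python
def depthwise_conv2d(image, kernels):
    """Depthwise convolution: each channel is filtered by its own kernel.

    image:   [C][H][W] nested lists
    kernels: [C][kh][kw] nested lists, one kernel per input channel
    Assumes stride 1 and no padding.
    """
    output = []
    for channel, kernel in zip(image, kernels):
        kh, kw = len(kernel), len(kernel[0])
        out_h = len(channel) - kh + 1
        out_w = len(channel[0]) - kw + 1
        output.append([
            [
                sum(channel[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw))
                for j in range(out_w)
            ]
            for i in range(out_h)
        ])
    return output


image = [
    [[1, 2], [3, 4]],  # channel 0
    [[1, 1], [1, 1]],  # channel 1
]
kernels = [
    [[1, 0], [0, 1]],  # kernel for channel 0
    [[2, 2], [2, 2]],  # kernel for channel 1
]
print(depthwise_conv2d(image, kernels))
```

Unlike a standard convolution, no sum is taken across channels, so the output has the same number of channels as the input.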
Support all the new models in Model Zoo: end-to-end deployment in Vitis AI Library
Improved GraphRunner to better support custom OP flow
Add examples on how to integrate custom OPs
Add more pre-implemented CPU OPs
DPU driver/runtime update to support Xilinx Device Tree Generator (DTG) for Vivado flow
Support CPU tasks tracking in graph runner
Better memory bandwidth analysis in text summary
Better performance to enable the analysis of large models
CNN DPU for Zynq SoC / MPSoC, DPUCZDX8G (DPUv2)
Upgraded to 2021.2
Update interrupt connection in Vivado flow
CNN DPU for Alveo-HBM, DPUCAHX8H (DPUv3E)
Support depth-wise convolution
Support U55C
CNN DPU for Alveo-DDR, DPUCADF8H (DPUv3Int8)
Updated U200/U250 xclbins with XRT 2021.2
Released XO Flow
Released IP Product Guide (PG400)
CNN DPU for Versal, DPUCVDX8G (xvDPU)
Configurable as C32 (32 AIE cores for a single batch) or C64 (64 AIE cores for a single batch)
Support configurable batch size 1~5 for C64
Support and optimize new OPs: conv3d, transposedconv3d, upsample3d and upsample2d
Reduce Conv bubbles and compute redundancy
Support 16-bit const weights in ALUv2
CNN DPU for Versal, DPUCVDX8H (DPUv4E)
Support depth-wise convolution with 6 PE configuration
Support h-sigmoid and h-swish
Upgrade to Vitis and Vivado 2021.2
Custom plugin example: PSMNet using Cost Volume (RTL Based) accelerator on VCK190
New accelerator for Optical Flow (TV-L1) on U50
High resolution segmentation application on VCK190
Options to compare throughput & accuracy between FPGA and CPU Versions
Throughput improvements ranging from 25% to 368%
Reorganized for better usability and visibility
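The throughput figures above are percentages of improvement, which are easy to misread as multipliers. A small sketch of the conversion, assuming "improvement" means (new - old) / old; that interpretation is an assumption, since these notes do not define it.

```python
def speedup_factor(improvement_percent: float) -> float:
    """Convert a relative improvement in percent to a throughput multiplier."""
    return 1.0 + improvement_percent / 100.0


for percent in (25, 368):
    print(f"{percent}% improvement -> {speedup_factor(percent):.2f}x baseline throughput")
```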
Provides a new capability for deploying models that contain OPs the DPU does not support
Define custom OPs in quantization
Register and implement custom OPs before the deployment by graph runner
Add two examples
Pointpillars PyTorch model
MNIST TensorFlow 2 model
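The register-then-deploy idea behind the custom OP flow can be sketched conceptually as a name-to-function registry that a runner consults when it meets an OP the accelerator cannot execute. Everything below (the registry, the decorator, `run_graph`, the OP names) is hypothetical and is not the Vitis AI graph runner API.

```python
from typing import Callable, Dict, List

Op = Callable[[List[float]], List[float]]

# Hypothetical registry of CPU implementations for OPs the accelerator lacks.
CUSTOM_OPS: Dict[str, Op] = {}


def register_custom_op(name: str) -> Callable[[Op], Op]:
    """Decorator that registers a CPU implementation under an OP name."""
    def decorator(fn: Op) -> Op:
        CUSTOM_OPS[name] = fn
        return fn
    return decorator


@register_custom_op("clip_negative")
def clip_negative(values: List[float]) -> List[float]:
    return [max(0.0, v) for v in values]


def run_graph(op_names: List[str], values: List[float], dpu_ops: Dict[str, Op]) -> List[float]:
    """Run OPs in order, preferring accelerator OPs and falling back to custom ones."""
    for name in op_names:
        if name in dpu_ops:
            values = dpu_ops[name](values)
        elif name in CUSTOM_OPS:
            values = CUSTOM_OPS[name](values)
        else:
            raise KeyError(f"OP '{name}' is neither accelerator-supported nor registered")
    return values


# Stand-in for an accelerator-supported OP.
dpu_ops = {"scale_by_two": lambda values: [2.0 * v for v in values]}
result = run_graph(["scale_by_two", "clip_negative"], [-1.0, 2.0], dpu_ops)
print(result)
```

Registering before deployment matters in this sketch because an unregistered OP fails at run time rather than at load time.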
Add support of DPUs for U50 and U55C
Run inference directly from the TensorFlow framework for cloud DPU
Automatically perform subgraph partitioning and apply optimization/acceleration for DPU subgraphs
Dispatch non-DPU subgraphs to TensorFlow running on CPU
Resnet50 and Yolov3 demos on VCK5000
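The subgraph partitioning described above can be pictured, for a purely linear chain of OPs, as grouping consecutive OPs by whether the DPU supports them. The sketch below is a simplification under that assumption; the supported-OP set and function name are hypothetical, and real graphs with branches need a more general algorithm.

```python
from itertools import groupby
from typing import List, Tuple

# Hypothetical set of OP types the accelerator supports.
DPU_SUPPORTED = {"conv2d", "relu", "maxpool"}


def partition(ops: List[str]) -> List[Tuple[str, List[str]]]:
    """Split a linear OP sequence into maximal runs assigned to DPU or CPU."""
    return [
        ("DPU" if on_dpu else "CPU", list(group))
        for on_dpu, group in groupby(ops, key=lambda op: op in DPU_SUPPORTED)
    ]


model = ["conv2d", "relu", "nms", "conv2d"]
for device, subgraph in partition(model):
    print(device, subgraph)
```

Each "DPU" run would be compiled and accelerated, while each "CPU" run would be left to the framework.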
Support xmodel serving in cloud / on-premise (EA)
vai_q_caffe hangs when TRAIN and TEST phases point to the same LMDB file
TVM compiled Inception_v3 model gives low accuracy with DPUCADF8H (DPUv3Int8)
TensorFlow 1.15 quantizer error in QAT caused by an incorrect pattern match
Support new platforms, including Kria KV260 SoM kit and Versal ACAP platforms VCK190, VCK5000;
Extended PyTorch framework support from version 1.5 to version 1.7.1;
Added new state-of-the-art models, including 4D Radar detection, Image-Lidar sensor fusion, 3D detection & segmentation, multi-task, depth estimation, super resolution and more models that are applicable to automotive, smart medical and industrial vision applications;
Easier subgraph partition user experience with the new Graph Runner API;
Improved performance;
Vitis AI 1.4 What’s New by Category
Expand the sections below to learn more about the new features and enhancements in Vitis AI 1.4.
Added 16 new models, for a total of 108 models from different deep learning frameworks (Caffe, TensorFlow, TensorFlow 2 and PyTorch).
Increased the diversity of models compared to Vitis AI 1.3:
For autonomous driving and ADAS, added 4D Radar detection, Image-Lidar sensor fusion, surround-view 3D detection, upgraded 3D segmentation and multi-task models
For medical and industrial vision, added depth estimation, RGB-D segmentation, super-resolution and other reference models
EoU enhancement: provided automated download scripts that let users select model versions by model name and hardware platform
Support fast finetune in post-training quantization (PTQ);
Improved quantization-aware training (QAT) functions:
Support more layers: swish/sigmoid, hard-swish, hard-sigmoid, LeakyRelu, nested tf.keras functional and sequential models