Achieves throughput by using a large batch size. It must wait for all inputs in the batch to be ready before processing, resulting in high latency.
Achieves throughput even at a small batch size. It processes each input as soon as it is ready, resulting in low latency.
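The latency cost of batching can be sketched with simple arithmetic. The numbers and helper functions below are purely illustrative (not AMD benchmark figures): the first input into a batch must wait for the rest of the batch to arrive before any processing starts, while a streaming design processes each input immediately.

```python
# Illustrative sketch with hypothetical timings: why large batches raise latency.
# Assumes a fixed per-invocation overhead and a fixed compute time per sample.

def batched_latency(batch_size, arrival_interval_ms, compute_per_sample_ms, overhead_ms):
    """Worst-case latency for the first sample in a batch: it waits for the
    whole batch to fill, then for the entire batch to be processed."""
    wait_for_batch = (batch_size - 1) * arrival_interval_ms
    processing = overhead_ms + batch_size * compute_per_sample_ms
    return wait_for_batch + processing

def streaming_latency(compute_per_sample_ms, overhead_ms):
    """Each input is processed as soon as it arrives (batch size of 1)."""
    return overhead_ms + compute_per_sample_ms

# Example: inputs arrive every 2 ms, 0.5 ms of compute per sample, 1 ms overhead.
print(batched_latency(batch_size=64, arrival_interval_ms=2,
                      compute_per_sample_ms=0.5, overhead_ms=1))    # 159.0 ms
print(streaming_latency(compute_per_sample_ms=0.5, overhead_ms=1))  # 1.5 ms
```

With these assumed numbers, the batched path amortizes overhead across 64 samples (good for throughput) but makes the first sample wait over 100x longer than the streaming path.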
This approach optimizes hardware acceleration of both AI inference and other performance-critical functions by tightly coupling custom accelerators within a dynamic-architecture silicon device.
It delivers end-to-end application performance significantly greater than that of a fixed-architecture AI accelerator such as a GPU. With a GPU, the application's other performance-critical functions must still run in software, without the performance or efficiency of custom hardware acceleration.
AMD delivers the highest throughput at the lowest latency. In standard benchmark tests on GoogLeNet V1, the AMD Alveo U250 platform delivers more than 4x the throughput of the fastest existing GPU for real-time inference. Learn more in the white paper: Accelerating DNNs with AMD Alveo Accelerator Cards.
AI inference performance leadership with Vitis AI Optimizer technology.
Optimization/Acceleration Compiler Tools