Achieves high throughput using large batch sizes. Must wait for all inputs in a batch to be ready before processing, resulting in high latency.
Achieves high throughput using small batch sizes. Processes each input as soon as it's ready, resulting in low latency.
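The latency gap between the two approaches can be sketched with a simple arithmetic model. The function names and the timing numbers below are illustrative assumptions, not benchmark results: the key point is that a batched accelerator makes the first input wait for the rest of its batch to arrive, while a streaming accelerator processes each input immediately.

```python
# Hypothetical latency model contrasting batched vs. streaming inference.
# All timings are illustrative assumptions, not measured results.

def batched_latency(batch_size, arrival_interval_ms, batch_compute_ms):
    """Worst-case latency for the first input in a batch: it must wait
    for the remaining (batch_size - 1) inputs to arrive, then for the
    whole batch to be processed."""
    wait = (batch_size - 1) * arrival_interval_ms
    return wait + batch_compute_ms

def streaming_latency(single_compute_ms):
    """Each input is processed as soon as it arrives."""
    return single_compute_ms

# Example: inputs arrive every 5 ms; a batch of 32 takes 20 ms to
# compute; a single streamed input takes 4 ms.
print(batched_latency(32, 5.0, 20.0))   # 175.0 ms for the first input
print(streaming_latency(4.0))           # 4.0 ms
```

Even though the batched device may finish more inferences per second overall, its first input in this sketch sees over 40x the latency of the streamed one.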
Adaptable silicon optimizes hardware acceleration of both AI inference and other performance-critical functions by tightly coupling custom accelerators within a single dynamic-architecture device.
This delivers end-to-end application performance significantly greater than that of a fixed-architecture AI accelerator such as a GPU. With a GPU, the application's other performance-critical functions must still run in software, without the performance or efficiency of custom hardware acceleration.
Adaptable silicon allows Domain-Specific Architectures (DSAs) to be updated and optimized for the latest AI models without requiring new silicon.
Fixed-silicon devices cannot be optimized for the latest models because of their long development cycles.
AMD delivers the highest throughput at the lowest latency. In standard benchmark tests on GoogLeNet v1, the AMD Alveo U250 platform delivers more than 4x the throughput of the fastest existing GPU for real-time inference. Learn more in the whitepaper: Accelerating DNNs with AMD Alveo Accelerator Cards
AI inference performance leadership with Vitis AI Optimizer technology.
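One technique the Vitis AI Optimizer applies is model pruning. The sketch below is a generic magnitude-pruning illustration of that idea, not the Vitis AI API: it zeroes out the smallest-magnitude fraction of a layer's weights so the remaining computation is smaller. The function name and threshold scheme are assumptions for illustration.

```python
def magnitude_prune(weights, sparsity):
    """Generic magnitude pruning sketch (not the Vitis AI API):
    zero out roughly the smallest-magnitude `sparsity` fraction
    of the weights, keeping the rest unchanged."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    # Threshold at the k-th smallest absolute value.
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

w = [0.9, -0.05, 0.02, -0.7]
print(magnitude_prune(w, 0.5))  # [0.9, 0.0, 0.0, -0.7]
```

After pruning, the zeroed weights contribute nothing to the output, so a hardware accelerator can skip them; in practice the pruned model is then fine-tuned to recover accuracy.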
Optimization/Acceleration Compiler Tools