Cloud data centers are changing. Today’s CPUs have not been able to keep up with compute-intensive applications like machine learning, data analytics, and video processing. With bottlenecks in networking and storage also increasing, cloud service providers have turned to accelerators to raise the overall throughput and efficiency of their data centers.
Major cloud service providers like Amazon, Baidu, and Microsoft have announced deployment of FPGA technology in their hyperscale data centers to drive their services business in an extremely competitive market. FPGAs complement highly agile cloud computing environments because they are programmable and can be hardware-optimized for any new application or algorithm.
The ability of an FPGA to be reconfigured and reprogrammed over time is perhaps its greatest advantage in a fast-moving field. Using dynamic reconfiguration, an FPGA can change in less than a second to a different design that is hardware-optimized for its next workload. As a result, Xilinx FPGAs can deliver the flexibility, application breadth, and feature velocity that complex and constantly changing hyperscale applications need, something that CPUs and custom ASICs cannot achieve.
Customers - Three of the top seven hyperscale cloud companies have deployed Xilinx FPGAs, including Baidu, which in October announced it had designed Xilinx UltraScale™ FPGAs in pools to accelerate machine learning inference.
Partnerships - Both Qualcomm and IBM announced strategic collaborations with Xilinx for data center acceleration. The IBM engagement has already resulted in a storage and networking acceleration framework, CAPI SNAP, which makes it easier for developers to accelerate applications such as NoSQL using Xilinx FPGAs.
Standards Leadership - Xilinx has been leading an industry initiative toward the development of an intelligent, cache-coherent interconnect called CCIX. Formed in May 2016 by Xilinx along with AMD, ARM, Huawei, IBM, Mellanox, and Qualcomm, the initiative’s membership tripled within five months.
Software-Defined Tools and Products for the Data Center - The SDAccel™ Development Environment for FPGA acceleration was released in 2014. In November 2016 Xilinx unveiled details for new 16nm Virtex UltraScale+™ FPGAs with High Bandwidth Memory (HBM) and CCIX technology.
|Domain||Offering||Acceleration vs. CPU|
|Compression||FPGA based GZIP compression||2.3x|
|Data Analytics||Search and Analytics Toolkit
|Data Analytics||Postgres Database Acceleration, execute existing Postgres SQL queries||2x-3x|
|Data Analytics||World's Fastest Memcached||9x|
|Financial Computing||High Performance Monte Carlo Option Pricing Simulation||42x-540x|
|Genomics||DRAGEN Complete Suite - Ultra-rapid analysis of Next Generation Sequencing - Exome||90x|
|Genomics||DRAGEN Complete Suite - Ultra-rapid analysis of Next Generation Sequencing - Genome
|Genomics||NGS Reference Genome Assembly||100x|
|Image Processing||WebP Image Encoding, Optimized to enable faster and smaller images on the Web||6.5x-14.2x|
|Machine Learning||Descartes Efficient Speech Recognition Engine (using LSTM)||Latency: F1 15.07 ms, cuDNN on P4 38.58 ms, CPU 118.31 ms|
|Machine Learning||Accelerated Apache Spark MLlib||25x|
|Machine Learning||Neural Network Inference for image classification||100x-500x|
|Machine Learning||Deep-Learning Inference with Binarized Neural Networks||800x|
|Machine Learning||Machine Learning Suite for Inference (TensorFlow, Caffe and MXNet)||2x-10x|
|Math||High Performance GEMM, SPMV||N/A|
|Security||Hyperion 10G RegEx File Scan||N/A|
|Security||AMI for Hardware-accelerated RSA Operations||N/A|
|Tools||3D Rendering, Digit Recognition, Spam Filter and Face Detection||N/A|
|Tools||FireSim Demo v.1.0, FPGA-accelerated hardware simulation||N/A|
|Tools||Merlin C/C++ Compiler AMI||N/A|
|Tools||InTime Automated Optimization Software for FPGA Design
|Tools||Visual Systems Integrator for FPGA and Embedded Development||N/A|
|Video||NGCodec HEVC/H.265 Encoder D01
|Video||FPGA Accelerated HEIF-to-JPEG Transcoder, HEVC Decoder||N/A|
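The Descartes speech recognition row above reports raw latencies rather than an acceleration factor like the other rows. As a rough illustration only, the sketch below converts those three latencies into speedup ratios relative to the F1 figure, which works out to about 7.9x versus the CPU and about 2.6x versus cuDNN on P4. The function name `speedup` and the dictionary layout are hypothetical; only the three latency values come from the table.

```python
# Latencies in milliseconds, copied from the Descartes row of the table.
# The keys reuse the labels from that row.
latency_ms = {"F1": 15.07, "cuDNN on P4": 38.58, "CPU": 118.31}


def speedup(baseline_ms, accelerated_ms):
    """Return how many times faster the accelerated run is than the baseline."""
    return baseline_ms / accelerated_ms


# Hypothetical helper output: speedup of the F1 result over each platform.
f1_ms = latency_ms["F1"]
for platform, ms in latency_ms.items():
    print(f"{platform}: {ms:.2f} ms, F1 is {speedup(ms, f1_ms):.2f}x faster")
```

These ratios compare single latency figures only; they say nothing about throughput, batch size, or power, which the table does not report for this row.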
|Board||Availability||Vendor|
|Virtex UltraScale+ FPGA VCU1525 Acceleration Development Kit||Developer Evaluation||Xilinx|
|Kintex UltraScale FPGA Acceleration Development Kit||Developer Evaluation||Xilinx|
|Bittware PCIe Boards
|Alpha Data ADM-PCIE-KU3||Production||Alpha Data|
|Alpha Data ADM-PCIE-7V3||Production||Alpha Data|
|Semptian NSA-120 Accelerator Card||Production||Semptian|
|Storage Acceleration Cards (NVMeoF)||Production||Fidus|
Faced with exponential growth in computing requirements and the inability of CPU technology to keep pace, cloud and data center architectures are moving toward accelerated computing. Accelerators complement CPU-based architectures and deliver both performance and power efficiency.
FPGAs can deliver 10x acceleration across a broad set of applications and are reconfigurable to provide an ideal fit for the changing workloads of the modern data center.
With acceleration capabilities a full generation ahead of any other FPGA, Xilinx UltraScale™ and UltraScale+ FPGAs are empowering hardware and application developers in many of the world’s largest and most innovative cloud computing services.
The SDAccel™ development environment for OpenCL™, C, and C++ enables up to 25X better performance/watt for data center application acceleration using FPGAs. SDAccel, a member of the SDx™ family, combines the industry's first architecturally optimizing compiler supporting any combination of OpenCL, C, and C++ kernels with libraries, development boards, and the first complete CPU/GPU-like development and run-time experience for FPGAs. To learn more, visit the SDAccel Zone.
|FPGA Startup Gathers Funding Force for Merged Hyperscale Inference||This article discusses an FPGA-based architecture from startup DeePhi Tech that targets efficient, scalable machine learning inference.|
|ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA||FPGA2017 Best Paper winner for breakthrough results with a highly efficient FPGA-accelerated speech recognition engine achieving 43x the performance and 40x the performance per watt compared to a CPU; 3x the performance and 11x the performance per watt compared to a GPU.|
|Power-Efficient Machine Learning on POWER Systems using FPGA Acceleration||This session provides an overview of how FPGA acceleration can enhance POWER systems for machine learning workloads such as image recognition.|
|Energy-Efficient CNN Implementation on a Deeply Pipelined FPGA Cluster||This paper presents a deeply pipelined multi-FPGA architecture that expands the design space for optimal performance and energy efficiency.|
|From Model to FPGA: Software-Hardware Co-Design for Efficient Neural Network Acceleration||This presentation discusses the use of FPGAs and trends in neural network acceleration.|
|Baidu Takes FPGA Approach to Accelerating SQL at Scale||This article discusses Baidu’s approach to big data challenges using FPGAs.|
|SDA: Software-Defined Accelerator for general-purpose big data analysis system||This presentation discusses Baidu’s Software-Defined Accelerator for a general-purpose big data analysis system.|
|SDA: Software-Defined Accelerator for Large-Scale DNN Systems||This article consists of a collection of slides from the author's conference presentation on the special features, system design and architectures, processing capabilities, and targeted markets for Baidu's family of software defined accelerator products (SDA) for large scale deep neural network (DNN) systems.|
||A community for discussing topics related to the SDAccel™ Development Environment for OpenCL™, C, and C++|