Cloud data centers are changing. Today’s CPUs have not been able to keep up with compute-intensive applications like machine learning, data analytics, and video processing. Faced with this gap and with growing bottlenecks in networking and storage, cloud service providers have turned to accelerators to increase the overall throughput and efficiency of their data centers.
Major cloud service providers like Microsoft and Baidu have announced deployment of FPGA technology in their hyperscale data centers to drive their services business in an extremely competitive market. FPGAs complement highly agile cloud computing environments well because they are programmable and can be hardware-optimized for any new application or algorithm.
The inherent ability of an FPGA to be reconfigured and reprogrammed over time is perhaps its greatest advantage in a fast-moving field. Using dynamic reconfiguration, FPGAs can quickly change, in less than a second, to a different design that is hardware-optimized for the next workload. As a result, Xilinx FPGAs can deliver the flexibility, application breadth, and feature velocity that complex and constantly changing hyperscale applications need, a combination that CPUs and custom ASICs cannot match.
Customers - Three of the top seven hyperscale cloud companies have deployed Xilinx FPGAs, including Baidu, which in October announced it had designed Xilinx UltraScale™ FPGAs into pools to accelerate machine learning inference.
Partnerships - Both Qualcomm and IBM announced strategic collaborations with Xilinx for data center acceleration. The IBM engagement has already resulted in a storage and networking acceleration framework, CAPI SNAP, which makes it easier for developers to accelerate applications such as NoSQL using Xilinx FPGAs.
Standards Leadership - Xilinx has been leading an industry initiative toward the development of an intelligent, cache-coherent interconnect called CCIX. Formed in May 2016 by Xilinx along with AMD, ARM, Huawei, IBM, Mellanox, and Qualcomm, the initiative tripled its membership within five months.
Software-Defined Tools and Products for the Data Center - The SDAccel™ Development Environment for FPGA acceleration was released in 2014. In November 2016 Xilinx unveiled details for new 16nm Virtex UltraScale+™ FPGAs with High Bandwidth Memory (HBM) and CCIX technology.
The new Xilinx Reconfigurable Acceleration Stack enables the world’s largest cloud service providers to develop and deploy acceleration platforms at cloud scale, and it delivers the flexibility needed for complex cloud computing applications like machine learning, data analytics, and video transcoding. Designed for cloud-native applications, this FPGA-powered acceleration stack includes libraries, framework integration, developer boards and resources, and OpenStack support, while providing up to 40x better compute efficiency than a CPU and up to six times the compute efficiency of any other FPGA on the market today.
Hear the latest on FPGA acceleration in hyperscale data centers from the Xilinx R&D team.
Review the Xilinx technical paper on “Deep Learning with INT8 Optimization on Xilinx Devices”.
Learn about FPGA acceleration in the Amazon Cloud.
Get started now with a cloud-based test drive.
DNN -- The Deep Neural Network (DNN) library from Xilinx is a highly optimized library for building deep learning inference applications. It is designed for maximum compute efficiency at 16-bit and 8-bit integer data types.
GEMM -- The General Matrix Multiply (GEMM) library from Xilinx, based on the level-3 Basic Linear Algebra Subprograms (BLAS), delivers optimized performance at 16-bit and 8-bit integer data types and supports matrices of any size.
HEVC Decoder & Encoder -- HEVC/H.265 is the latest video compression standard coming out of the MPEG and ITU standards bodies. It is the successor to H.264 and offers up to 50% bandwidth reduction. Xilinx provides two encoders: a high-quality, flexible, real-time encoder that addresses the majority of video data center workloads, and an alternate encoder for non-camera-generated content. The decoder supports all the applications for both encoders.
Data Mover (SQL) -- The SQL data mover library makes it easy to accelerate data analytics workloads with a Xilinx FPGA. The library orchestrates standard connections to SQL databases by sending blocks of data from the database tables to the on-chip memory of the FPGA accelerator card over PCIe. It has been optimized to make maximal use of the PCIe bandwidth between the host CPU and the accelerator functions on the FPGA device.
Compute Kernel (SQL) -- A library that accelerates numerous core SQL functions on the FPGA hardware, such as decimal type, date type, scan, compare, filter, and many others. The compute functions are optimized to exploit the massive hardware parallelism of FPGAs.
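The division of labor between the two SQL libraries above can be pictured with a small host-side sketch. This is only an illustration in plain Python, not the Xilinx library API: the names `move_blocks`, `filter_kernel`, and `run_query`, the row schema, and the block size are all hypothetical. The first function stands in for the data mover (streaming table rows in fixed-size blocks), and the second stands in for a compute kernel applying a filter to each block.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical row schema for illustration: (order_id, ship_date, amount).
Row = Tuple[int, str, float]


def move_blocks(rows: Iterable[Row], block_rows: int) -> Iterator[List[Row]]:
    """Yield fixed-size blocks of rows, standing in for table blocks sent over PCIe."""
    if block_rows <= 0:
        raise ValueError("block_rows must be positive")
    block: List[Row] = []
    for row in rows:
        block.append(row)
        if len(block) == block_rows:
            yield block
            block = []
    if block:  # final, possibly shorter, block
        yield block


def filter_kernel(block: List[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    """Apply a scan/compare/filter step to one block (done in hardware on a real card)."""
    return [row for row in block if predicate(row)]


def run_query(rows: Iterable[Row], predicate: Callable[[Row], bool],
              block_rows: int = 4) -> List[Row]:
    """Stream blocks through the filter and collect the surviving rows in order."""
    result: List[Row] = []
    for block in move_blocks(rows, block_rows):
        result.extend(filter_kernel(block, predicate))
    return result


if __name__ == "__main__":
    table = [(i, "2016-11-%02d" % (i + 1), i * 10.0) for i in range(10)]
    # ISO-formatted date strings compare correctly as plain strings.
    hits = run_query(table, lambda r: r[1] >= "2016-11-05" and r[2] < 80.0)
    print([r[0] for r in hits])
```

On real hardware the block size would be chosen to suit on-chip memory and PCIe transfer efficiency, and many rows would be filtered in parallel; the sketch only shows the data flow.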
|Xilinx® Kintex® UltraScale™ FPGA Acceleration Development Kit||Developer Evaluation||Xilinx|
|Bittware PCIe Boards|| || |
|Alpha Data ADM-PCIE-KU3||Production||Alpha Data|
|Alpha Data ADM-PCIE-7V3||Production||Alpha Data|
|Semptian NSA-120 Accelerator Card||Production||Semptian|
|Storage Acceleration Cards (NVMeoF)||Production||Fidus|
Xilinx UltraScale+™ FPGAs are now available in Amazon Elastic Compute Cloud (Amazon EC2) F1 instances. F1 instances are designed to provide FPGA-based hardware acceleration for key data center workloads including genomics, financial analytics, video processing, big data, security, and machine learning inference.
F1 instances are easy to program and come with everything you need to develop, simulate, debug, and compile your hardware acceleration code, including an FPGA Developer AMI and Hardware Developer Kit (HDK). Once your FPGA design is complete, you can register it as an Amazon FPGA Image (AFI) and deploy it to your F1 instance in just a few clicks. You can reuse your AFIs as many times, and across as many F1 instances, as you like. Best of all, FPGAs in F1 instances are reprogrammable, so you get the flexibility to update and optimize your hardware acceleration without having to redesign any hardware.
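For readers who prefer scripting to the console, the AFI registration step described above might look roughly like the following. This is a hedged sketch, not the documented Xilinx or AWS workflow: it assumes the `boto3` SDK is installed, AWS credentials are configured, and the design archive produced by the HDK build has already been uploaded to S3. The helper names, bucket, keys, and default region are hypothetical placeholders.

```python
from typing import Any, Dict


def build_afi_request(name: str, bucket: str, design_key: str,
                      logs_prefix: str, description: str = "") -> Dict[str, Any]:
    """Assemble the parameters for an AFI registration request.

    bucket, design_key, and logs_prefix are placeholders for an S3 location
    that already holds the design archive (assumption).
    """
    if not name or not bucket or not design_key:
        raise ValueError("name, bucket, and design_key are required")
    return {
        "Name": name,
        "Description": description,
        "InputStorageLocation": {"Bucket": bucket, "Key": design_key},
        "LogsStorageLocation": {"Bucket": bucket, "Key": logs_prefix},
    }


def register_afi(request: Dict[str, Any], region: str = "us-east-1") -> str:
    """Submit the request with the EC2 API and return the new AFI identifier.

    Requires boto3 and valid credentials; the region default is an assumption.
    """
    import boto3  # imported lazily so build_afi_request works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.create_fpga_image(**request)
    return response["FpgaImageId"]


if __name__ == "__main__":
    # Hypothetical names; nothing is submitted unless register_afi is called.
    req = build_afi_request("demo-afi", "my-fpga-bucket",
                            "dcp/design.tar", "logs/", "illustrative example")
    print(sorted(req))
```

Keeping the request builder separate from the network call makes the parameters easy to inspect or log before anything is submitted.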
Amazon EC2 F1 instances are available today in two different sizes that include up to eight Virtex® UltraScale+ VU9P FPGAs with a combined peak compute capability over 170 TOP/sec (INT8).
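To put the combined peak figure in perspective, a quick back-of-the-envelope calculation helps. The sketch below assumes the 170 TOP/sec figure is split evenly across eight FPGAs and counts a matrix multiply as 2·M·K·N integer operations; the function names and the 4096-sized example are illustrative choices, and real workloads will not sustain peak throughput.

```python
TOTAL_PEAK_TOPS = 170.0  # combined INT8 peak quoted in the text ("over 170 TOP/sec")
FPGA_COUNT = 8           # largest configuration quoted in the text


def per_fpga_peak_tops(total_tops: float, fpga_count: int) -> float:
    """Evenly split the combined peak across the FPGAs (assumption: equal shares)."""
    if fpga_count <= 0:
        raise ValueError("fpga_count must be positive")
    return total_tops / fpga_count


def gemm_ops(m: int, k: int, n: int) -> int:
    """Operation count of an (m x k) by (k x n) multiply: one multiply and one add per term."""
    return 2 * m * k * n


def min_seconds(ops: int, peak_tops: float) -> float:
    """Lower bound on run time if the device ran at its peak rate throughout."""
    return ops / (peak_tops * 1e12)


if __name__ == "__main__":
    per_fpga = per_fpga_peak_tops(TOTAL_PEAK_TOPS, FPGA_COUNT)
    ops = gemm_ops(4096, 4096, 4096)
    print(per_fpga, ops, min_seconds(ops, per_fpga))
```

Under these assumptions each FPGA contributes a little over 21 TOP/sec, and a single 4096 x 4096 x 4096 INT8 multiply could not finish in less than roughly 6.5 ms on one device.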
In addition to Amazon EC2 F1 instances, AWS also offers an FPGA Developer Amazon Machine Image (AMI), a pre-built, cloud-based resource that includes scripts along with Xilinx's Vivado® Design Suite and SDAccel™ development environment. Access the FPGA Developer AMI. If you prefer an on-premises development system, Vivado Design Suite together with the SDAccel development environment can be purchased or upgraded to work with the F1 instances in the Amazon cloud. Purchase or upgrade now.
If you are interested in F1 instances as part of our academic partner program, go to the Xilinx University Program (XUP) cloud page.
View the following videos to see how you can access, configure, and run an F1 instance in less than 10 minutes and/or learn how the SDAccel development environment accelerates integration of RTL accelerators with software frameworks.
Hear from partners using F1 instances for acceleration of services and applications such as video transcoding, data analytics, machine learning and developer productivity.
Xilinx has partnered with Nimbix Inc., a leading provider of heterogeneous accelerator clouds for big data and machine learning, to create the next generation of applications leveraging the computational density of an FPGA from C/C++ and OpenCL.
The offering from Nimbix will dramatically lower the barrier to leveraging the high-performance, energy-efficient power of FPGAs to accelerate high-end computational workflows across all industries. Developers can now run Xilinx tools in the cloud and then test and deploy on the latest Xilinx-accelerated hardware with no upfront investment or equipment purchases.
To get started with application acceleration on the cloud, visit http://www.nimbix.net/xilinx
The SDAccel™ development environment for OpenCL™, C, and C++ enables up to 25X better performance/watt for data center application acceleration leveraging FPGAs. SDAccel, a member of the SDx™ family, combines the industry's first architecturally optimizing compiler supporting any combination of OpenCL, C, and C++ kernels with libraries, development boards, and the first complete CPU/GPU-like development and run-time experience for FPGAs. To learn more, visit the SDAccel Zone.
|FPGA Startup Gathers Funding Force for Merged Hyperscale Inference||This article discusses FPGA-based architecture that targets efficient, scalable machine learning inference from startup DeePhi Tech.|
|ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA||FPGA2017 Best Paper winner for breakthrough results with a highly efficient FPGA-accelerated speech recognition engine achieving 43x the performance and 40x the performance per watt compared to a CPU; 3x the performance and 11x the performance per watt compared to a GPU.|
|Power-Efficient Machine Learning on POWER Systems using FPGA Acceleration||This session provides an overview of how FPGA acceleration can enhance POWER systems for machine learning workloads such as image recognition.|
|Energy-Efficient CNN Implementation on a Deeply Pipelined FPGA Cluster||This paper presents a deeply pipelined multi-FPGA architecture that expands the design space for optimal performance and energy efficiency.|
|From Model to FPGA: Software-Hardware Co-Design for Efficient Neural Network Acceleration||This presentation discusses the use of FPGAs and trends in neural network acceleration.|
|Baidu Takes FPGA Approach to Accelerating SQL at Scale||This article discusses Baidu’s approach to big data challenges using FPGAs.|
|SDA: Software-Defined Accelerator for general-purpose big data analysis system||This presentation discusses Baidu’s Software-Defined Accelerator for a general-purpose big data analysis system.|
|SDA: Software-Defined Accelerator for Large-Scale DNN Systems||This article consists of a collection of slides from the author's conference presentation on the special features, system design and architectures, processing capabilities, and targeted markets for Baidu's family of software defined accelerator products (SDA) for large scale deep neural network (DNN) systems.|
||A community for discussing topics related to the SDAccel™ Development Environment for OpenCL™, C, and C++|