Reconfigurable Acceleration

Develop and Deploy Platforms at Cloud Scale

Cloud data centers are changing. Today’s CPUs have not been able to keep up with compute-intensive applications like machine learning, data analytics, and video processing. With bottlenecks in networking and storage also growing, cloud service providers have turned to accelerators to increase the overall throughput and efficiency of their cloud data centers.

Major cloud service providers like Microsoft and Baidu have announced deployment of FPGA technology in their hyperscale data centers to drive their services business in an extremely competitive market. FPGAs complement highly agile cloud computing environments because they are programmable and can be hardware-optimized for any new application or algorithm.

The inherent ability of an FPGA to be reconfigured and reprogrammed over time is perhaps its greatest advantage in a fast-moving field. Using dynamic reconfiguration, an FPGA can switch in less than a second to a different design that is hardware-optimized for its next workload. As a result, Xilinx FPGAs can deliver the flexibility, application breadth, and feature velocity that complex and constantly changing hyperscale applications need, something that CPUs and custom ASICs cannot achieve.

Xilinx Momentum in the Data Center

Customers - Three of the top seven hyperscale cloud companies have deployed Xilinx FPGAs, including Baidu, which in October announced it had designed Xilinx UltraScale™ FPGAs into pools to accelerate machine learning inference.

Partnerships - Both Qualcomm and IBM announced strategic collaborations with Xilinx for data center acceleration. The IBM engagement has already resulted in a storage and networking acceleration framework, CAPI SNAP, making it easier for developers to accelerate applications such as NoSQL using Xilinx FPGAs.

Standards Leadership - Xilinx has been leading an industry initiative toward the development of an intelligent, cache-coherent interconnect called CCIX. Formed in May 2016 by Xilinx along with AMD, ARM, Huawei, IBM, Mellanox, and Qualcomm, the initiative tripled its membership within five months.

Software-Defined Tools and Products for the Data Center -  The SDAccel™ Development Environment for FPGA acceleration was released in 2014.  In November 2016 Xilinx unveiled details for new 16nm Virtex UltraScale+™ FPGAs with High Bandwidth Memory (HBM) and CCIX technology.

The new Xilinx Reconfigurable Acceleration Stack enables the world’s largest cloud service providers to develop and deploy acceleration platforms at cloud scale, and it delivers flexibility for complex cloud computing applications like machine learning, data analytics, and video transcoding. Designed for cloud-native applications, this FPGA-powered acceleration stack includes libraries, framework integration, developer boards and resources, and OpenStack support. It provides up to 40 times the compute efficiency of a CPU and up to six times the compute efficiency of any other FPGA on the market today.

Hear the latest on FPGA acceleration in hyperscale data centers from the Xilinx R&D team.

Review the Xilinx technical paper on “Deep Learning with INT8 Optimization on Xilinx Devices”. 

Learn about FPGA acceleration in the Amazon Cloud.

Get started now with a cloud-based test drive.

Sign up to be notified of acceleration news and updates from Xilinx.


Libraries in the Stack

DNN -- The Deep Neural Network (DNN) library from Xilinx is a highly optimized library for building deep learning inference applications. It is designed for maximum compute efficiency at 16-bit and 8-bit integer data types.
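
To make the 8-bit integer data type concrete, here is a minimal Python sketch of symmetric INT8 quantization followed by an integer dot product. It is not the Xilinx DNN library API; the function names (`quantize_int8`, `int8_dot`) and the per-tensor scaling scheme are assumptions chosen for illustration.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to signed 8-bit integers.

    Hypothetical helper for illustration; not part of any Xilinx library.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest quantization step and clamp to [-127, 127].
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def int8_dot(qa, qb):
    """Integer multiply-accumulate; the Python int stands in for a wide accumulator."""
    return sum(a * b for a, b in zip(qa, qb))


weights = [0.5, -1.0, 0.25, 0.75]
activations = [1.0, 2.0, -0.5, 0.0]

qw, w_scale = quantize_int8(weights)
qa, a_scale = quantize_int8(activations)

# Rescale the integer result back to the floating-point domain.
approx = int8_dot(qw, qa) * w_scale * a_scale
exact = sum(w * a for w, a in zip(weights, activations))
print(f"exact={exact:.4f} approx={approx:.4f}")
```

The integer result tracks the floating-point one closely while every multiply operates on 8-bit operands, which is the property reduced-precision inference relies on.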

GEMM -- The General Matrix Multiply (GEMM) library from Xilinx, based on the level-3 Basic Linear Algebra Subprograms (BLAS), delivers optimized performance at 16-bit and 8-bit integer data types and supports matrices of any size.
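
For readers unfamiliar with the operation, the level-3 BLAS GEMM routine computes C ← αAB + βC. The sketch below is a plain-Python reference version on integer inputs, written only to show the arithmetic; `gemm_int` is a hypothetical name and has nothing to do with the interface of the Xilinx library.

```python
def gemm_int(a, b, c=None, alpha=1, beta=0):
    """Reference integer GEMM: returns alpha * (a @ b) + beta * c.

    a is m x k, b is k x n, c is m x n (defaults to zeros).
    Illustrative only; a hardware kernel would tile and parallelize this.
    """
    m, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError("inner dimensions of a and b must match")
    n = len(b[0])
    if c is None:
        c = [[0] * n for _ in range(m)]
    out = []
    for i in range(m):
        row = []
        for j in range(n):
            acc = 0  # wide accumulator for the 8-bit or 16-bit products
            for p in range(k):
                acc += a[i][p] * b[p][j]
            row.append(alpha * acc + beta * c[i][j])
        out.append(row)
    return out


A = [[1, -2, 3], [4, 5, -6]]
B = [[7, 8], [9, -10], [11, 12]]
result = gemm_int(A, B)
print(result)  # [[22, 64], [7, -90]]
```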

HEVC Decoder & Encoder -- HEVC/H.265 is the latest video compression standard from the MPEG and ITU standards bodies. It is the successor to H.264 and offers up to 50% bandwidth reduction. Xilinx provides two encoders: a high-quality, real-time, flexible encoder that addresses the majority of video data center workloads, and an alternate encoder for non-camera-generated content. The decoder supports all the applications for both encoders.

Data Mover (SQL) – The SQL data mover library makes it easy to accelerate data analytics workloads with a Xilinx FPGA. The data mover library orchestrates standard connections to SQL databases by sending blocks of data from the database tables to the on-chip memory of the FPGA accelerator card over PCIe. The library has been optimized to maximally utilize PCIe bandwidth between the host CPU and the accelerator functions on the FPGA device.
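
The block-wise transfer described above can be sketched in a few lines of Python. This is only an illustration of the batching pattern: `iter_blocks`, `move_table`, and the `send_block` callback are hypothetical names, and the callback stands in for whatever PCIe transfer the real library performs.

```python
from itertools import islice


def iter_blocks(rows, block_size):
    """Yield successive lists of at most block_size rows."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    iterator = iter(rows)
    while True:
        block = list(islice(iterator, block_size))
        if not block:
            return
        yield block


def move_table(rows, block_size, send_block):
    """Send a table to the accelerator block by block; returns rows sent.

    send_block is a stand-in (assumption) for the host-to-card transfer.
    """
    sent = 0
    for block in iter_blocks(rows, block_size):
        send_block(block)
        sent += len(block)
    return sent


received = []  # pretend accelerator memory
table = [(i, f"item-{i}") for i in range(10)]
total = move_table(table, 4, received.append)
print(total, [len(block) for block in received])  # 10 [4, 4, 2]
```

Larger blocks amortize per-transfer overhead, which is one reason a data mover would batch rows rather than send them individually.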

Compute Kernel (SQL) – A library that accelerates numerous core SQL functions on the FPGA hardware such as decimal type, date type, scan, compare, filter and many others. The compute functions are optimized for exploiting the massive hardware parallelization of FPGAs.
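
As a software reference for the kind of work such kernels offload, the following Python sketch runs a scan with compare-and-filter predicates over date and decimal columns. The table, column names, and `scan_filter` helper are made up for illustration and do not reflect the library's interface.

```python
from datetime import date
from decimal import Decimal

# Hypothetical table with a date column and a decimal column.
orders = [
    {"id": 1, "shipped": date(2016, 11, 3), "amount": Decimal("19.99")},
    {"id": 2, "shipped": date(2017, 1, 15), "amount": Decimal("250.00")},
    {"id": 3, "shipped": date(2017, 3, 9), "amount": Decimal("75.50")},
    {"id": 4, "shipped": date(2017, 5, 21), "amount": Decimal("120.25")},
]


def scan_filter(rows, predicate):
    """Scan every row and keep those for which the predicate holds."""
    return [row for row in rows if predicate(row)]


def shipped_in_2017_over_100(row):
    # Equivalent to: WHERE shipped >= '2017-01-01' AND amount > 100
    return row["shipped"] >= date(2017, 1, 1) and row["amount"] > Decimal("100")


matches = scan_filter(orders, shipped_in_2017_over_100)
total = sum((row["amount"] for row in matches), Decimal("0"))
print([row["id"] for row in matches], total)  # [2, 4] 370.25
```

Each row is tested independently of the others, which is what makes this kind of scan a natural fit for parallel hardware.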

Boards | Type | Vendor
Xilinx® Kintex® UltraScale™ FPGA Acceleration Development Kit | Developer Evaluation | Xilinx
Bittware PCIe Boards
Alpha Data ADM-PCIE-KU3 | Production | Alpha Data
Alpha Data ADM-PCIE-7V3 | Production | Alpha Data
COTS PEA-C8K0-040 | Production | COTS
Semptian NSA-120 Accelerator Card | Production | Semptian
Storage Acceleration Cards (NVMeoF) | Production | Fidus


Xilinx UltraScale+™ FPGAs are now available in Amazon Elastic Compute Cloud (Amazon EC2) F1 instances. F1 instances are designed to provide FPGA-based hardware acceleration for key data center workloads including genomics, financial analytics, video processing, big data, security, and machine learning inference.

F1 instances are easy to program and come with everything you need to develop, simulate, debug, and compile your hardware acceleration code, including an FPGA Developer AMI and Hardware Developer Kit (HDK). Once your FPGA design is complete, you can register it as an Amazon FPGA Image (AFI) and deploy it to your F1 instance in just a few clicks. You can reuse your AFIs as many times as you like, across as many F1 instances as you like. Best of all, FPGAs in F1 instances are reprogrammable, so you get the flexibility to update and optimize your hardware acceleration without having to redesign any hardware.

Amazon EC2 F1 instances are available today in two different sizes that include up to eight Virtex® UltraScale+ VU9P FPGAs with a combined peak compute capability over 170 TOP/sec (INT8). 
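
As a rough back-of-the-envelope check, the combined figure can be split per device. The sketch assumes that the 170 TOP/sec figure refers to the eight-FPGA size and that peak throughput divides evenly across FPGAs; neither is stated explicitly above.

```python
combined_peak_tops = 170  # quoted lower bound for the combined INT8 peak
fpga_count = 8            # largest instance size mentioned above

# Assumption: peak throughput scales linearly with the number of FPGAs.
per_fpga_tops = combined_peak_tops / fpga_count
print(f"at least {per_fpga_tops:.2f} TOP/sec per FPGA")  # at least 21.25 TOP/sec per FPGA
```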

Run Custom FPGAs in the AWS Cloud

In addition to Amazon EC2 F1 instances, AWS also offers an FPGA Developer Amazon Machine Image (AMI), a pre-built cloud-based resource that includes scripts along with Xilinx's Vivado® Design Suite and SDAccel™ development environment. Access the FPGA Developer AMI. If you prefer an on-premises development system, Vivado Design Suite together with the SDAccel development environment can be purchased or upgraded to work with the F1 instances in the Amazon cloud. Purchase or upgrade now.

If you are interested in F1 instances as part of our academic partner program, go to the Xilinx University Program (XUP) cloud page.

Featured Videos

View the following videos to see how you can access, configure, and run an F1 instance in less than 10 minutes and/or learn how the SDAccel development environment accelerates integration of RTL accelerators with software frameworks.

Applications and Services Running on F1 Instances

Hear from partners using F1 instances for acceleration of services and applications such as video transcoding, data analytics, machine learning and developer productivity.

Additional Resources

  • Read what’s been said in these Forbes and The Next Platform articles.
  • Watch the deep dive on the Amazon EC2 F1 instance in the May 2017 Online Tech Talk.
  • Get started now in the Amazon Cloud using F1 Instances.
  • Sign up to receive updates on Xilinx acceleration products and technology.


Xilinx has partnered with Nimbix Inc., a leading provider of heterogeneous accelerator clouds for big data and machine learning, to create the next generation of applications that leverage the computational density of an FPGA from C/C++ and OpenCL.

The offering from Nimbix will dramatically lower the barrier to leveraging the high performance, energy efficient power of FPGAs to accelerate high end computational workflows across all industries.  Developers can now run Xilinx tools in the cloud and then test and deploy on the latest Xilinx-accelerated hardware with no upfront investment or equipment purchases.

To get started with application acceleration on the cloud, visit http://www.nimbix.net/xilinx

The SDAccel™ development environment for OpenCL™, C, and C++ enables up to 25X better performance/watt for data center application acceleration leveraging FPGAs. SDAccel, a member of the SDx™ family, combines the industry's first architecturally optimizing compiler supporting any combination of OpenCL, C, and C++ kernels with libraries, development boards, and the first complete CPU/GPU-like development and run-time experience for FPGAs. To learn more, visit the SDAccel Zone.




Acceleration Resources

  • FPGA Startup Gathers Funding Force for Merged Hyperscale Inference: This article discusses FPGA-based architecture that targets efficient, scalable machine learning inference from startup DeePhi Tech.
  • ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA: FPGA2017 Best Paper winner for breakthrough results with a highly efficient FPGA-accelerated speech recognition engine achieving 43x the performance and 40x the performance per watt compared to a CPU; 3x the performance and 11x the performance per watt compared to a GPU.
  • Power-Efficient Machine Learning on POWER Systems using FPGA Acceleration: This session provides an overview of how FPGA acceleration can enhance POWER systems for machine learning workloads such as image recognition.
  • Energy-Efficient CNN Implementation on a Deeply Pipelined FPGA Cluster: This paper presents a deeply pipelined multi-FPGA architecture that expands the design space for optimal performance and energy efficiency.
  • From Model to FPGA: Software-Hardware Co-Design for Efficient Neural Network Acceleration: This presentation discusses the use of FPGAs and trends in neural network acceleration.
  • Baidu Takes FPGA Approach to Accelerating SQL at Scale: This article discusses Baidu’s approach to big data challenges using FPGAs.
  • SDA: Software-Defined Accelerator for general-purpose big data analysis system: This presentation discusses Baidu’s Software-Defined Accelerator for a general-purpose big data analysis system.
  • SDA: Software-Defined Accelerator for Large-Scale DNN Systems: This article collects slides from the author's conference presentation on the special features, system design and architectures, processing capabilities, and targeted markets for Baidu's family of software-defined accelerator (SDA) products for large-scale deep neural network (DNN) systems.

User Forums
A community for discussing topics related to the SDAccel™ Development Environment for OpenCL™, C, and C++