The SDAccel™ development environment for OpenCL™, C, and C++ enables up to 25X better performance/watt for data center application acceleration leveraging FPGAs. SDAccel, a member of the SDx™ family, combines the industry's first architecturally optimizing compiler supporting any combination of OpenCL, C, and C++ kernels with libraries, development boards, and the first complete CPU/GPU-like development and run-time experience for FPGAs.
SDAccel™ is a development environment for OpenCL™ applications targeting Xilinx® FPGA-based accelerator cards. This environment enables concurrent programming of the in-system processor and the FPGA fabric without the need for RTL design experience. The application is captured as a host program written in C/C++ and a set of computation kernels expressed in C, C++, or the OpenCL C language.
Xilinx has partnered with Nimbix Inc., a leading provider of heterogeneous accelerator clouds for big data and machine learning, to create the next generation of applications leveraging the computational density of an FPGA from C/C++ and OpenCL.
The offering from Nimbix will dramatically lower the barrier to leveraging the high-performance, energy-efficient power of FPGAs to accelerate high-end computational workflows across all industries. Developers can now run these tools in the cloud and then test and deploy on the latest Xilinx-accelerated hardware with no upfront investment or equipment purchases.
To get started with application acceleration on the cloud, visit http://www.nimbix.net/xilinx
Xilinx Application Acceleration on the Nimbix Cloud
|Getting Started||Hello||The hello world example is a simple design which tests the correct installation of the FPGA acceleration boards. The example uses the printf function call inside of the kernel code to report on the values provided from the host to the kernel.|
|Host_global_bandwidth||Host to global memory bandwidth test|
|Kernel_global_bandwidth||Bandwidth test of global to local memory|
|Sum_scan||Example of parallel prefix sum|
|Vadd||Simple example of vector addition.|
|Vdotprod||Simple example of vector dot-product.|
|Vmul_vadd||This example shows how data stored in global memory can be shared between kernels in different binary containers.|
|Acceleration||bfgminer||Bitcoin mining application implemented on SDAccel platforms||80 megahashes/second|
|nearest_neighbor_linear_search||This is an optimized implementation of a nearest neighbor linear search algorithm||256 measurements/cycle|
|smithwaterman||This is an optimized implementation of the Smith-Waterman algorithm. The main characteristics of this application are (1) computation of the MaxScore and (2) a systolic array implementation.|
|Security||aes_decrypt||Implementation of AES-128 ECB encryption in software, followed by decryption written in OpenCL and targeting execution on an SDAccel supported FPGA acceleration card.|
|rsa||This is an implementation of an RSA decryption algorithm||1,024-bit ciphertext length|
|sha1||This is an optimized implementation of the SHA1 secure hash algorithm targeting execution on an SDAccel supported FPGA acceleration card.|
|tiny_encryption||Implementation example of Tiny Encryption Algorithm (TEA), which is a block cipher.|
|Vision||Affine||Affine transformation is a linear mapping method that preserves points, straight lines, and planes.||
|Convolve||The convolve example is a performant design which showcases convolutional image filtering. The example processes the image 8 pixels at a time.||
|Edge_detection||Implementation of a Sobel Filter for edge detection.|
|Histogram_codec||This is an optimized implementation of a 12-bit histogram equalizer targeting execution on an SDAccel supported FPGA acceleration card.||
|Huffman_codec||This is an implementation of a Huffman encoding/decoding algorithm targeting execution on an SDAccel supported FPGA acceleration card.|
|Median_filter||This is an optimized implementation of a median filter used to remove noise in images.||
|Watermarking||This is an optimized implementation of a watermarking application to add watermarking to images.||
|Contributed Examples||ArrayFire – Fast Corner||Demo of FAST feature detection developed by ArrayFire|
|Polito – K-Nearest Neighbor||K-Nearest Neighbor algorithm derived from the Rodinia Benchmark suite. This project is aimed at using SDAccel to implement the k-Nearest Neighbor algorithm on a Xilinx FPGA.||Real-time throughput at 1.23 ms|
|Polito – Black-Scholes Monte Carlo||This project implements a Monte Carlo simulation of the Black-Scholes financial model, using both European and Asian options. It contains an OpenCL C++ kernel to be mapped to an FPGA via SDAccel. It provides much better energy-per-operation than a GPU implementation at a comparable performance level.||
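The Sum_scan entry in the table above refers to parallel prefix sum. As a rough illustration of the idea only, and not the code shipped in the Xilinx example, the sketch below emulates one common formulation (Hillis-Steele) in plain host C. The function name `inclusive_scan` and the `scratch` buffer are assumptions made for this sketch; on a parallel device each iteration of the inner loop would run as an independent work-item.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Inclusive prefix sum, Hillis-Steele formulation (illustrative sketch).
 * Each pass of the outer loop is one parallel step: every element at
 * index i >= stride adds the element `stride` positions to its left.
 * The inner loop is sequential here so the sketch runs on any host.
 * `scratch` must hold n elements; the result is left in `data`. */
void inclusive_scan(int *data, int *scratch, size_t n)
{
    assert(data != NULL && scratch != NULL);
    for (size_t stride = 1; stride < n; stride *= 2) {
        for (size_t i = 0; i < n; ++i) {
            /* Read only from `data`, write only to `scratch`, so every
             * element sees values from the previous step. */
            scratch[i] = (i >= stride) ? data[i] + data[i - stride]
                                       : data[i];
        }
        memcpy(data, scratch, n * sizeof *data);
    }
}
```

Calling `inclusive_scan` on the array 1, 2, 3, 4 leaves 1, 3, 6, 10 in place. The double buffering stands in for the synchronization a real kernel would need between steps.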
|Board Name & Description||Devices Supported||Vendor|
|Xilinx® Kintex® UltraScale™ FPGA Acceleration Development Kit
The Kintex® UltraScale™ FPGA Acceleration Development Kit is an excellent starting point for hyperscale application developers.
The ADM-PCIE-KU3 is a high performance reconfigurable half-length, low profile x16 PCIe form factor board based on the Xilinx Kintex UltraScale range of platform FPGAs.
|Kintex UltraScale||Alpha Data|
The ADM-PCIE-7V3 is a high performance reconfigurable half-length low profile x8 PCIe form factor board based on the Xilinx Virtex-7 range of Platform FPGAs.
|Board Name & Description||Devices Supported||Vendor|
The SB-850 is a full height, GPU-length, PCI Express board featuring up to eight HMC devices and a single high-performance Xilinx UltraScale FPGA.
|Kintex UltraScale||Micron Pico Computing|
The business card-sized M-505-K325T is a powerful computing element composed of FPGA logic (with loading system), a local memory sub-system, and a fully-switched PCIe x8 communication structure.
|Kintex-7||Micron Pico Computing|
The PEA-C8K0-060 is a high-performance reconfigurable half-length, low-profile, single x8 PCI Express (PCIe) 3.0 form factor board based on Xilinx Kintex UltraScale FPGAs. It is ideal for demanding applications including high-performance computing, data processing, data center, and system modeling.
The PEA-C8K0-040 is a high-performance reconfigurable half-length, low-profile, single x8 PCI Express (PCIe) 3.0 form factor board based on Xilinx Kintex UltraScale FPGAs. It is ideal for demanding applications including high-performance computing, data processing, data center, and system modeling.
|Semptian NSA-120 Accelerator Card
Semptian NSA-120 provides a new Xilinx FPGA based heterogeneous computing platform for big data analysis, cloud computing and network application acceleration. It can be used in big data analysis, image recognition/processing, video encoding/decoding, data compression/decompression, data encryption/decryption, voice recognition, neural network, machine learning, network security, etc.
|Fundamental Concepts of Application Host
The OpenCL standard for heterogeneous computing defines a programming model for transferring data between host processors and acceleration devices. This video provides an introduction to the minimum set of OpenCL APIs required for data transfer and control of accelerators on a device such as the FPGA.
|N-Dimensional Kernel Range
One of the key concepts in OpenCL is the division of the application problem into a multi-dimensional problem space. Each block of the problem space, referred to as the N-Dimensional Kernel Range, executes the same computation in parallel across all accelerators available in a device. This video introduces the N-Dimensional kernel range concept and its application to solving computation problems on parallel computing systems.
|OpenCL Application Structure
The OpenCL standard for heterogeneous computing defines a basic programming model for all compute devices implementing the OpenCL standard. This video introduces the host code and kernel elements of an OpenCL application. The mapping of these elements to systems containing FPGA accelerator co-processing cards is explained.
|OpenCL Memory Architecture
OpenCL defines a memory architecture and abstraction model that is common to all computing devices implementing the standard. This means that a programmer only has to learn one memory model, which simplifies application coding. This video provides an overview of the OpenCL memory model and how it is implemented in an FPGA acceleration device.
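To make the kernel range idea from the video descriptions above more concrete, here is a minimal host-only sketch in C that emulates a one-dimensional range for vector addition. It does not use the OpenCL API. The names `vadd_work_item` and `run_ndrange_1d` are hypothetical, and the loops stand in for what a device would execute in parallel. The one OpenCL fact it relies on is that, with a zero global offset, a work-item's global ID equals its group ID times the local size plus its local ID.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel body: one work-item adds one pair of elements.
 * In OpenCL C the index would come from get_global_id(0). */
static void vadd_work_item(size_t global_id,
                           const int *a, const int *b, int *c)
{
    c[global_id] = a[global_id] + b[global_id];
}

/* Emulates a 1-D kernel range on the host (illustrative sketch).
 * The range of `global_size` work-items is split into work-groups of
 * `local_size` items; every work-item runs the same kernel body.
 * This sketch assumes global_size is a multiple of local_size. */
void run_ndrange_1d(size_t global_size, size_t local_size,
                    const int *a, const int *b, int *c)
{
    assert(local_size > 0 && global_size % local_size == 0);
    size_t num_groups = global_size / local_size;
    for (size_t group = 0; group < num_groups; ++group) {
        for (size_t local = 0; local < local_size; ++local) {
            /* Global ID as OpenCL defines it for a zero offset. */
            size_t global_id = group * local_size + local;
            vadd_work_item(global_id, a, b, c);
        }
    }
}
```

Calling `run_ndrange_1d(8, 4, a, b, c)` processes eight elements as two work-groups of four. In a real application the arrays would live in device global memory and the host would launch the kernel through the OpenCL runtime instead of these loops.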
|Design Services Alliance Members||Markets|
|Cluster Technology Limited
Cluster Tech specializes in the provision of advanced computing technology solutions and utilizes High Performance Computing, Cloud, Business Intelligence and Financial Engineering to improve operational efficiency.
|High Performance Computing, Cloud, Business Intelligence and Financial Engineering|
|Irish Centre for High-End Computing (ICHEC)
ICHEC offers services to help clients enable, optimize and deploy OpenCL-based software solutions on high performance low energy Xilinx FPGAs. With a dynamic team of engineers with domain, systems and software expertise, ICHEC offers design services in finance, energy, life sciences, and analytics.
|Finance, Energy, Life Sciences, Analytics|
Instigate Design specializes in system level design of electronic systems, EDA specific software design and parallel programming. Design services range from software design and quality assurance to comprehensive application engineering with an emphasis on audio/video coding and communication.
|High Performance Computing|
MulticoreWare develops and licenses a wide range of computer vision and video processing libraries, while also providing design services to Xilinx customers.
|Audio, Video & Broadcast, Automotive & Transport|
ArrayFire is an industry leader in high performance computing software development and coding services.
|Defense/Aerospace, Consumer, Industrial Scientific Medical|