
DeePhi Tech

Deep Learning Technology



DeePhi Tech is a recognized leader in deep learning acceleration, providing end-to-end solutions that combine its unique deep compression technology with a configurable deep learning platform.

Through synergistic optimization of neural networks and FPGAs, DeePhi provides efficient, convenient, and economical inference platforms for both embedded and server-side deployments, including but not limited to data centers and surveillance.

The DeePhi team consists of renowned researchers and experienced professionals known for their pioneering work in deep learning, particularly in optimizing neural networks for image and speech recognition.

DeePhi Technology was acquired by Xilinx in July 2018.

DNNDK™ (Deep Neural Network Development Kit)

The DeePhi™ deep learning SDK is an integrated framework designed to simplify and accelerate the development and deployment of deep learning (DL) applications on the DeePhi DPU™ (Deep Learning Processing Unit) platform. (Click DNNDK for more information.)


Key Features

  • Industry-leading technology and the first publicly released deep learning SDK in China
  • Innovative full-stack solution for deep learning development
  • A complete, solid optimization toolchain covering compression, compilation, and runtime
  • Lightweight standard C/C++ programming APIs
  • Easy to use, with a gentle learning curve

DNNDK consists of:

  • DEep ComprEssioN Tool (DECENT)
  • Deep Neural Network Compiler (DNNC)
  • Deep Neural Network Assembler (DNNAS)
  • Neural Network Runtime (N2Cube)
  • DPU Simulator and Profiler

DNNDK Components


DECENT (DEep ComprEssioN Tool)

Deep neural networks (DNNs) contain a great deal of redundancy, both in the number of parameters and in their precision, which leaves ample room for optimization. Building on its research in neural network model compression, DeePhi developed DECENT (DEep ComprEssioN Tool). DECENT applies pruning, quantization, weight sharing, and Huffman encoding to reduce model size by 5x to 50x without loss of accuracy. This gives the DPU platform higher computational efficiency, better energy efficiency, and lower system memory bandwidth requirements.
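As an illustration only (this is not the DECENT tool itself, and all function names here are hypothetical), two of the compression steps named above, magnitude pruning and 8-bit linear quantization, can be sketched on a toy weight vector in plain Python:

```python
# Hedged sketch of magnitude pruning and 8-bit quantization.
# Not DECENT itself; all names are illustrative assumptions.
import random

def prune(weights, sparsity=0.5):
    """Zero out the smallest-magnitude weights until `sparsity` is reached."""
    k = int(len(weights) * sparsity)
    threshold = sorted(abs(w) for w in weights)[k]  # k-th smallest magnitude
    return [0.0 if abs(w) < threshold else w for w in weights]

def quantize(weights, bits=8):
    """Map float weights to signed `bits`-bit integer codes plus one scale."""
    scale = max(abs(w) for w in weights) / (2 ** (bits - 1) - 1)
    return [round(w / scale) for w in weights], scale

random.seed(0)
w = [random.gauss(0.0, 1.0) for _ in range(16)]

w_pruned = prune(w, sparsity=0.5)        # half the entries become zero
q, scale = quantize(w_pruned)            # 8-bit codes + one float scale
w_restored = [qi * scale for qi in q]    # dequantized approximation

print("zeros after pruning:", sum(1 for x in w_pruned if x == 0.0))
print("max quantization error:",
      max(abs(a - b) for a, b in zip(w_restored, w_pruned)))
```

Pruned zeros compress well, and the integer codes need a quarter of the storage of 32-bit floats; weight sharing and Huffman coding, the remaining steps the text mentions, would shrink the codes further.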


DNNDK Hybrid Compilation Model

DeePhi's patented hybrid compilation technique resolves the programming complexity and deployment difficulty of DL applications in heterogeneous AI computing environments. User-developed C/C++ application source code and the DPU instruction code generated by DNNC for the neural network are compiled and linked together, enabling a rapid turnkey deployment solution for the DPU platform.


Deep Neural Network Compiler (DNNC)

DNNC is the key to maximizing the DPU's computational power by efficiently mapping a neural network onto high-performance DPU instructions. After parsing the topology of the trained and compressed input network, DNNC constructs an internal computation-graph IR in DAG form, including the corresponding control-flow and data-flow information. It then applies multiple compiler optimization and transformation techniques, including computation-node fusion, efficient instruction scheduling, and full reuse of on-chip feature maps and weights. DNNC significantly improves DPU computation resource utilization under tight system memory bandwidth and power constraints.
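DNNC's internals are not public, but the computation-node fusion the paragraph describes can be sketched on a toy layer list: adjacent operations that the hardware can execute as one instruction are merged, shrinking the graph. The fusable pairs and names below are illustrative assumptions, not DNNC's actual rules:

```python
# Hypothetical sketch of compiler node fusion on a toy network graph.
# FUSABLE pairs and op names are assumptions, not real DNNC internals.
FUSABLE = {
    ("conv", "batchnorm"): "conv_bn",
    ("conv_bn", "relu"): "conv_bn_relu",
    ("conv", "relu"): "conv_relu",
}

def fuse(layers):
    """Greedily merge adjacent layer pairs listed in FUSABLE."""
    out = []
    for op in layers:
        if out and (out[-1], op) in FUSABLE:
            out[-1] = FUSABLE[(out[-1], op)]  # merge into the previous node
        else:
            out.append(op)
    return out

net = ["conv", "batchnorm", "relu", "pool", "conv", "relu", "fc"]
print(fuse(net))  # 7 nodes collapse to 4
```

Each fused node means one DPU instruction instead of several, and the intermediate feature map between the merged operations never leaves on-chip memory, which is exactly why fusion helps under a tight memory bandwidth budget.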


Hardware Architecture



Aristotle Architecture

In order to compute convolutional neural networks (CNNs), DeePhi designed the Aristotle Architecture from the ground up. While currently used for video and image recognition tasks, the architecture is flexible and scalable for both servers and portable devices.

Video and Image Recognition


Descartes Architecture

DeePhi's Descartes Architecture is designed for compressed Recurrent Neural Networks (RNNs), including LSTMs. By taking advantage of sparsity, the Descartes Architecture can achieve over 2.5 TOPS on a KU060 FPGA at 300 MHz, enabling instantaneous speech recognition, natural language processing, and many other recognition tasks.

Based on the Descartes Architecture, DDESE (DeePhi Descartes Efficient Speech Recognition Engine) targets speech recognition. The solution has been released on the AWS Marketplace, where it can be tested on an AWS F1 instance. (Click DDESE for more information.)

Compressed Recurrent Neural Networks (RNN)

Speech Recognition

DeePhi Descartes Efficient Speech Recognition Engine (DDESE)

DDESE is an efficient end-to-end automatic speech recognition (ASR) engine built on DeePhi's algorithm, software, and hardware co-design acceleration flow (covering pruning, quantization, compilation, and FPGA inference). The Baidu DeepSpeech2 framework and the LibriSpeech 1000-hour dataset are used for model training and compression. Users can run the test scripts both for CPU/FPGA performance comparison and for single-sentence recognition.


Innovative full-stack acceleration solution for deep learning in acoustic speech recognition (ESE: Best Paper at FPGA 2017)

  • Supports both unidirectional and bidirectional LSTM acceleration on the FPGA for model inference
  • Supports CNN layers, fully connected (FC) layers, batch normalization layers, and a variety of activation functions such as Sigmoid, Tanh, and HardTanh
  • Supports testing both CPU/FPGA performance comparison and single-sentence recognition
  • Supports recognition of the user's own test audio (English, 16 kHz sample rate, no longer than 3 seconds)
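For reference, the activation functions named in the feature list above can be written out in a few lines of plain Python (HardTanh is shown with the common default clamp range of [-1, 1]; the actual ranges DDESE supports are not specified here):

```python
# Reference sketch of the activations listed above; the [-1, 1] HardTanh
# range is an assumption based on the common default definition.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    return math.tanh(x)

def hardtanh(x, lo=-1.0, hi=1.0):
    """Piecewise-linear approximation of tanh: clamp x to [lo, hi]."""
    return max(lo, min(hi, x))

for x in (-2.0, 0.0, 2.0):
    print(x, round(sigmoid(x), 4), round(tanh(x), 4), hardtanh(x))
```

HardTanh is popular on FPGAs precisely because the clamp needs only comparisons, whereas Sigmoid and Tanh require exponentials or lookup tables.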