Accelerating Subgraph with ML Frameworks

Partitioning is the process of splitting the inference execution of a model between the FPGA and the host. It is necessary for executing models that contain layers the FPGA does not support. Partitioning is also useful for debugging and for exploring different ways to partition and execute the computation graph to meet a target objective. The following figures show an example of a ResNet-based SSD object detection model. The parts shown in red in the original graph are replaced by the fpga_func_0 node in the partitioned graph. The partitioned code is complete and executes on both the CPU and the FPGA.

Note: This support is currently available for the Alveo™-based deep learning solution.
Figure 1: Original Graph
Figure 2: Partitioned Graph
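To make the idea of replacing a region of the graph with an fpga_func_0 node more concrete, the following is a minimal, framework-independent sketch of graph partitioning. It is not the Vitis AI partitioner or its API. The `partition` function, the `FPGA_SUPPORTED` op set, and the toy graph are all hypothetical. The sketch assumes the model is given as a dict of node names to op types plus a list of edges, and it groups connected FPGA-supported nodes into `fpga_func_<i>` subgraphs while leaving everything else on the CPU.

```python
# Hypothetical set of op types the FPGA can run. The real list depends on
# the toolchain and target and is NOT taken from this document.
FPGA_SUPPORTED = {"Conv2D", "BiasAdd", "Relu", "MaxPool", "Add"}


def partition(nodes, edges, supported=FPGA_SUPPORTED):
    """Group connected FPGA-supported nodes into fpga_func_<i> subgraphs.

    nodes: dict mapping node name -> op type
    edges: list of (src, dst) node-name pairs
    Returns a dict mapping node name -> "cpu" or "fpga_func_<i>".
    """
    parent = {n: n for n in nodes}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    # Merge two nodes only when both endpoints can run on the FPGA.
    for src, dst in edges:
        if nodes[src] in supported and nodes[dst] in supported:
            parent[find(src)] = find(dst)

    labels, roots = {}, {}
    for name in nodes:  # dict insertion order gives stable numbering
        if nodes[name] not in supported:
            labels[name] = "cpu"
            continue
        root = find(name)
        if root not in roots:
            roots[root] = f"fpga_func_{len(roots)}"
        labels[name] = roots[root]
    return labels


# Toy graph: a supported conv block between CPU-only input and post-processing.
nodes = {
    "input": "Placeholder",
    "conv1": "Conv2D",
    "relu1": "Relu",
    "conv2": "Conv2D",
    "nms": "NonMaxSuppression",
}
edges = [("input", "conv1"), ("conv1", "relu1"), ("relu1", "conv2"), ("conv2", "nms")]
print(partition(nodes, edges))
# {'input': 'cpu', 'conv1': 'fpga_func_0', 'relu1': 'fpga_func_0', 'conv2': 'fpga_func_0', 'nms': 'cpu'}
```

This sketch only groups nodes by connectivity. A production partitioner must also check that collapsing a group into a single node does not create a cycle, for example when a path between two supported nodes passes through a CPU-only node; that check is omitted here for brevity.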