
AR# 68049

DMA Subsystem for PCI Express (Vivado 2016.3) - Performance Numbers

Description

This answer record provides performance numbers for the DMA Subsystem for PCI Express. The provided numbers are separated into Hardware Performance and Software Performance.



This article is part of the Xilinx Solution Center for PCI Express (Xilinx Answer 34536).

Solution

Hardware performance:

This represents the pure DMA hardware data rate; user software and kernel driver involvement is not accounted for.

The user application sets up the transfer and, after a fixed time, reads a few registers to calculate the achieved performance. The experiments were performed with a Gen3 x8, 256-bit configuration and four channels each of H2C and C2H enabled on a VCU108 board.
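
As a sketch of the arithmetic behind such a measurement, the fragment below converts a pair of hypothetical performance counters into a data rate. The register offsets, counter semantics, and clock frequency are illustrative assumptions, not the documented XDMA register map:

```c
#include <stdint.h>

#define PERF_CYCLE_COUNT 0   /* hypothetical: clock cycles in the window */
#define PERF_BEAT_COUNT  1   /* hypothetical: 256-bit beats transferred  */

#define USER_CLK_HZ    250000000.0  /* assumed 250 MHz user clock        */
#define BYTES_PER_BEAT 32.0         /* 256-bit datapath = 32 bytes/beat  */

/* Convert raw counters read from the mmap'ed register space into MB/s. */
static double hw_rate_mbps(const volatile uint32_t *regs)
{
    double secs  = regs[PERF_CYCLE_COUNT] / USER_CLK_HZ;
    double bytes = regs[PERF_BEAT_COUNT] * BYTES_PER_BEAT;

    return bytes / secs / 1e6;
}
```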

The numbers in the graph below were taken with one H2C channel and one C2H channel running sequentially. The system configuration is as follows:

Z77 chipset, MPS 256 bytes, MRRS 512 bytes, OS: CentOS 6.2

[Graph: hardware DMA data rates for one H2C and one C2H channel]
Software performance:

This is measured as the ratio of the number of bytes transferred to the total transfer time. Here, the total time includes the processing time in the user application and in the kernel, as well as the latency incurred in the hardware.
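
As an illustration of this definition, the sketch below times a burst of writes through the DMA character device and reports bytes over elapsed wall-clock time. The device node name follows the convention used by the Xilinx reference Linux driver (/dev/xdma0_h2c_0) but may differ on a given system; the buffer size and iteration count are arbitrary:

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    const size_t len = 1 << 20;     /* 1 MiB per transfer, illustrative */
    const int iterations = 256;
    char *buf;
    struct timespec t0, t1;

    if (posix_memalign((void **)&buf, 4096, len))
        return 1;
    memset(buf, 0xA5, len);

    int fd = open("/dev/xdma0_h2c_0", O_WRONLY);  /* assumed device node */
    if (fd < 0) { perror("open"); return 1; }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iterations; i++) {
        if (write(fd, buf, len) != (ssize_t)len) { perror("write"); return 1; }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    /* Software data rate = bytes transferred / total elapsed time. */
    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("H2C: %.1f MB/s over %.3f s\n",
           (double)len * iterations / secs / 1e6, secs);

    close(fd);
    free(buf);
    return 0;
}
```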

Several factors influence this number, including the link data rate, interrupt processing, the host system, and the operating system. Systems with a larger MPS give better performance; a typical system has an MPS of 128 bytes.

One of the main factors affecting data throughput is interrupt processing. Once a data transfer completes, the DMA sends an interrupt to the host and waits for the ISR to process the status.

Because this wait time is not predictable, the overall data transfer time is both longer and less predictable.


There are a couple of options you can try to work around this.


1) MSI-X interrupts: Try using MSI-X interrupts instead of MSI or legacy interrupts. With MSI-X, the data rate is better than with an MSI or legacy interrupt based design (see the kernel-side sketch after this list).

2) Poll mode: Try using Poll mode, which gives the best data rate. With Poll mode there are no interrupts to process; however, the driver has to monitor data completion continuously (see the polling sketch after this list).
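
A minimal kernel-side sketch of the first option, assuming a Linux driver and the standard PCI IRQ API; the handler body, vector count, and name string are illustrative, not taken from the Xilinx reference driver:

```c
#include <linux/pci.h>
#include <linux/interrupt.h>

static irqreturn_t dma_done_isr(int irq, void *data)
{
    /* Record the completion and wake the waiting transfer;
     * real status handling is omitted in this sketch. */
    return IRQ_HANDLED;
}

static int setup_channel_irqs(struct pci_dev *pdev, int nchannels)
{
    int i, nvec, err;

    /* Request one MSI-X vector per DMA channel, falling back to MSI
     * and then legacy INTx if MSI-X is unavailable. (The INTx
     * fallback would additionally need IRQF_SHARED.) */
    nvec = pci_alloc_irq_vectors(pdev, 1, nchannels,
                                 PCI_IRQ_MSIX | PCI_IRQ_MSI | PCI_IRQ_LEGACY);
    if (nvec < 0)
        return nvec;

    for (i = 0; i < nvec; i++) {
        err = request_irq(pci_irq_vector(pdev, i), dma_done_isr,
                          0, "dma-chan", pdev);
        if (err)
            return err;
    }
    return 0;
}
```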
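And a minimal sketch of the second option, assuming the DMA engine writes a completed-descriptor count back to a host memory location; the writeback layout and mask are illustrative. The reference Linux driver shipped with the subsystem can typically be loaded in poll mode via a module parameter (e.g., poll_mode=1), though the exact name depends on the driver version.

```c
#include <stdint.h>

/* Illustrative mask: low bits of the writeback word are assumed to
 * hold the number of descriptors completed so far. */
#define WB_COUNT_MASK 0x00FFFFFFu

/* Busy-wait on the engine's writeback word instead of sleeping on an
 * interrupt: no ISR latency, at the cost of a CPU core spinning. */
static void wait_for_dma_poll(volatile uint32_t *writeback, uint32_t expected)
{
    while ((*writeback & WB_COUNT_MASK) < expected)
        ; /* spin; a real driver would add cpu_relax() and a timeout */
}
```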

The charts below show software performance comparisons between a Poll mode design and an interrupt (MSI) based design. Both Host to Card (H2C) and Card to Host (C2H) directions are plotted.

System configuration: MPS 256 bytes, MRRS 512 bytes, OS: CentOS 6.2

[Charts: software performance, Poll mode vs. MSI interrupt mode, H2C and C2H]

When poll mode and interrupt mode transfers are compared, poll mode data rates are significantly better.

Revision History:

10/13/2016 - Initial Release
10/21/2016 - Added additional graphs
AR# 68049
Date Created: 10/10/2016
Last Updated: 11/04/2016
Status: Active
Type: General Article
IP
  • UltraScale FPGA Gen3 Integrated Block for PCI Express (PCIe)