
AR# 20858

LogiCORE PCI/PCI-X - What is the throughput or bandwidth of the PCI or PCI-X Core?


General Description:

What is the throughput or bandwidth of the PCI or PCI-X Core?


The throughput of the PCI or PCI-X Core depends on the user application. Apart from the transaction overhead, which is roughly 5 or 6 clock cycles (depending on the type of transfer), data is pipelined directly through the core. In other words, whatever is on the bus shows up later on the back (user application) side of the core, and vice versa. Unlike some other standards, such as RapidIO or HyperTransport, PCI has no concept of flow control (acks/naks) that could slow your throughput from the link's perspective. However, you do have wait states, disconnects, and transaction overhead.

If your user application can sustain a long burst, and the device you are talking to can sustain the burst without disconnecting or inserting wait states, then you can begin to reach the theoretical maximum bandwidth for the bus size/speed that you are running.

The maximum theoretical bandwidth for 32-bit/33 MHz PCI is 132 MB/sec, but you will rarely achieve 100% of it. This figure assumes infinitely long bursts, no wait states, and no bus delays. In practice, PCI has initial transaction overhead, possible wait states, and disconnects. To find the exact numbers your design can achieve, divide the amount of data transferred by the time it took to transfer it.

For example, suppose your user application wants to do a memory write of 256 Dwords (1024 bytes) on a 32-bit/33 MHz bus, with 5 cycles of transaction overhead (this is not exact). If there are no wait states, you can transfer 1 Dword every 30 ns; if you can do it all in one burst, you can transfer 256 Dwords in 7680 ns. Adding the 5 cycles of overhead (5 * 30 ns = 150 ns) gives 1024 bytes transferred in 7830 ns, or roughly 130 MB/sec.

Now assume your user application cannot sustain this burst length, for whatever reason, and breaks it into two 128-Dword bursts. In that case, you get two 128-Dword transfers at 30 ns per Dword, or 3840 ns per transfer, plus 10 cycles (300 ns) of overhead, for a total of 7980 ns for 1024 bytes of data, or roughly 128 MB/sec. This does not account for the time between transfers, which would lower the figure further.
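
The two calculations above can be reproduced with a short script. This is only a sketch under the same simplifying assumptions as the text: a 30 ns clock, one Dword per clock, a fixed 5-cycle overhead per burst, and no idle time between bursts. The function name and the parameters (including the optional `wait_cycles`) are hypothetical and are not part of the core or its documentation.

```python
def pci_throughput_mb_per_s(total_dwords, bursts=1, overhead_cycles=5,
                            clock_ns=30.0, bytes_per_dword=4, wait_cycles=0):
    """Estimate effective throughput in MB/sec (1 MB = 10**6 bytes).

    total_dwords    -- Dwords moved in total, across all bursts
    bursts          -- number of separate transactions used to move them
    overhead_cycles -- assumed overhead per transaction (not exact)
    wait_cycles     -- assumed total wait states inserted (0 = ideal case)
    """
    # One data cycle per Dword, plus per-burst overhead and any wait states.
    total_cycles = total_dwords + bursts * overhead_cycles + wait_cycles
    total_ns = total_cycles * clock_ns
    total_bytes = total_dwords * bytes_per_dword
    # bytes per ns equals GB/sec, so multiply by 1000 to get MB/sec.
    return total_bytes / total_ns * 1000.0


if __name__ == "__main__":
    # Single 256-Dword burst: 7830 ns total.
    print(f"{pci_throughput_mb_per_s(256, bursts=1):.1f}")  # 130.8
    # Two 128-Dword bursts: 7980 ns total.
    print(f"{pci_throughput_mb_per_s(256, bursts=2):.1f}")  # 128.3
```

Raising `bursts` or `wait_cycles` in this model shows how quickly shorter bursts and wait states pull the result away from the theoretical maximum.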

The user application is the limiting factor, not the core. If you know how well your user application will perform, you can calculate the throughput. Review the design guide to determine the transaction overhead that the core introduces for the various transaction types; this overhead is consistent on each transfer.

Date Created 09/03/2007
Last Updated 12/15/2012
Status Active
Type General Article