Algorithm with Poor Data Access Patterns

A standard convolution function applied to an image is used here to demonstrate how the structure of the C code can limit the performance achievable on an FPGA. In this example, a horizontal convolution and then a vertical convolution are performed on the data. Because the pixels at the edge of the image lie outside the convolution windows, the final step is to address the data around the border.

The algorithm structure can be summarized as follows:

  • A horizontal convolution.
  • A vertical convolution applied to the result.
  • A manipulation of the border pixels.

The loop structure of the code is outlined here; the loop bodies are omitted.

static void convolution_orig(
  int width,
  int height,
  const T *src,
  T *dst,
  const T *hcoeff,
  const T *vcoeff) {

// Horizontal convolution
  HconvH:for(int col = 0; col < height; col++){
    HconvW:for(int row = border_width; row < width - border_width; row++){
      Hconv:for(int i = - border_width; i <= border_width; i++){

// Vertical convolution
  VconvH:for(int col = border_width; col < height - border_width; col++){
    VconvW:for(int row = 0; row < width; row++){
      Vconv:for(int i = - border_width; i <= border_width; i++){

// Border pixels
  Top_Border:for(int col = 0; col < border_width; col++){
  Side_Border:for(int col = border_width; col < height - border_width; col++){
  Bottom_Border:for(int col = height - border_width; col < height; col++){
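
The outline above omits the loop bodies. As a rough illustration only, the following is a minimal, self-contained sketch of one way the three stages could be filled in. It is not the original implementation. The assumptions are: T is int, the filters have three taps (BORDER_WIDTH = 1), the intermediate result is held in a full-size temporary buffer, and border pixels are filled by copying the nearest fully computed pixel. The names convolution_sketch, clamp_int, BORDER_WIDTH, and local are hypothetical, and the loop indices are written as y (over height) and x (over width).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef int T;               /* assumption: pixel and coefficient type */
enum { BORDER_WIDTH = 1 };   /* assumption: 3-tap filters (2 * 1 + 1) */

/* Clamp v into the range [lo, hi]. */
static int clamp_int(int v, int lo, int hi) {
  return v < lo ? lo : (v > hi ? hi : v);
}

/* Hypothetical sketch of the three-stage structure.
   Returns 0 on success, -1 if the image is too small or allocation fails. */
static int convolution_sketch(int width, int height, const T *src, T *dst,
                              const T *hcoeff, const T *vcoeff) {
  const int bw = BORDER_WIDTH;
  assert(src && dst && hcoeff && vcoeff);
  if (width <= 2 * bw || height <= 2 * bw) return -1;

  size_t n = (size_t)width * (size_t)height;
  T *local = (T *)calloc(n, sizeof *local);   /* zeroed intermediate image */
  if (!local) return -1;
  memset(dst, 0, n * sizeof *dst);

  /* Horizontal convolution: every row, only columns with a full window. */
  for (int y = 0; y < height; y++) {
    for (int x = bw; x < width - bw; x++) {
      int pixel = y * width + x;
      for (int i = -bw; i <= bw; i++)
        local[pixel] += src[pixel + i] * hcoeff[i + bw];
    }
  }

  /* Vertical convolution: only rows with a full window, every column. */
  for (int y = bw; y < height - bw; y++) {
    for (int x = 0; x < width; x++) {
      int pixel = y * width + x;
      for (int i = -bw; i <= bw; i++)
        dst[pixel] += local[pixel + i * width] * vcoeff[i + bw];
    }
  }

  /* Border pixels: copy the nearest fully computed pixel.
     Interior pixels are never overwritten, so the copies read final values. */
  for (int y = 0; y < height; y++) {
    for (int x = 0; x < width; x++) {
      int cy = clamp_int(y, bw, height - bw - 1);
      int cx = clamp_int(x, bw, width - bw - 1);
      if (cy != y || cx != x)
        dst[y * width + x] = dst[cy * width + cx];
    }
  }

  free(local);
  return 0;
}
```

In this sketch each input pixel is read once per filter tap, the intermediate image is stored in full before the vertical pass begins, and the output array is written, read back, and written again during the border pass. With identity coefficients {0, 1, 0} for both filters, the interior of dst equals the interior of src and every border pixel takes the value of its nearest interior neighbor.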