Xcell Journal Online
  Xcell Journal Archives
   
  Writing for Xcell
  Advertising in Xcell
  FREE Subscription
   
  Partner Yellow Pages
  Reference Pages
  Contact Us

    

Home : Documentation : Xcell Journal Online : Article
Developing Image Processing Algorithms with System Generator



by Daniel E. Michek, Staff Field Applications Engineer, Xilinx, Inc.
daniel.michek@xilinx.com (10/15/04)


System Generator allows you to easily implement and analyze any image processing algorithm.
article link to PDF
Article PDF 255 KB


Converting image processing algorithms to FPGA implementations can be tedious. The algorithm may be proven in software but with no direct link to actual implementation. Additionally, it can be difficult to subjectively verify the implementation.

Using a mathematical simulator to verify and create HDL implementation files bridges the gap from the algorithm architect to the FPGA engineer. Xilinx® System Generator for DSP allows for high-level mathematical verification and converts the heart of the algorithm into ready-to-use HDL.

Simulation inside of The MathWorks Simulink® tool enables you to easily verify the image algorithm qualitatively and subjectively when used with the Image Processing Toolbox, also from The MathWorks. Using System Generator to develop and implement image processing algorithms allows for a thoroughly verified and easily executed design; plus, you save time on subjective analysis of the HDL. The high-level block diagram allows for easy communication between team members, resulting in less time spent crossing skill boundaries when determining implementation trade-offs.

The Basics
The Image Processing Toolbox is an excellent starting point with which to develop image processing algorithms for FPGAs. It allows you to easily load and view many image types. Although not directly usable within System Generator, it can also rotate, resize, filter, convert to and from the frequency domain, and operate on the image mathematically and morphologically. You can use these latter functions as a qualitative measure against the actual System Generator implementation.

Exchanging Data
Images are often stored and manipulated as two-dimensional arrays that can be quite large. For example, a 1,024 x 1,024 x 8-bits-per-pixel image is 1 MB. Typically, a portion of the image is moved into the FPGA one pixel at a time and aggregated into a larger unit (often a line), with manipulations performed on these points. An example pre-processing MATLAB® m-file script might contain:

  • Reading in a source test image using the IMREAD function in the Image Processing Toolbox.
  • Analyzing variables such as width, height, and color depth of the image to pass as arguments to Xilinx block set tokens. This enables easy parameterization and scaling of the application.
  • Storage and creation of other variables necessary to the application. Examples include rotation angle, resizing percentages, and bit precision within the algorithm.
  • Converting the matrix data from an m x n array to a 1 x (m*n) with “for loops” and concatenation. This allows The MathWorks DSP block set “Signal From Workspace” token to pass elements as samples to the Xilinx block set “Gateway In” token.
  • Viewing the source test image for later subjective analysis using the IMSHOW function in the Image Processing Toolbox.
An example post-processing m-file might contain:
  • Conversion of “ToWorkspace” variables from a 1 x (p*q) array to a p x q array for easy manipulation by using “for loops.”
  • Displaying the resulting matrices using IMSHOW for subjective analysis.
  • Computing qualitative analysis of the results versus the original image or an algorithm developed with the Image Processing Toolbox.
Using M-Files
You can use m-files to model the algorithm before implementation with System Generator. Although this step is optional, it can be a highly effective aid in the slight paradigm shift from matrix operations in software to raster scan operations in a highly parallel architecture.

Some of the key benefits include:

  • Deconstruction proofs of Image Processing Toolbox functions as an aid to understand the algorithm.
  • Creation of intermediate variables to assist in debugging the System Generator version of the design. Try to set variables similar to how they will appear in the design. Two-dimensional matrices should be used as memory, while raster order works better between blocks.
  • Making algorithm trade-offs. Ask yourself if you should place the bi-linear interpolation stage before or after the sending of data to memory.
  • A fast proof that the algorithm is on the right track.
  • Qualitative analysis from the deconstruction m-file to the corresponding Image Processing Toolbox function.
Design Considerations – An Example
Real-time image rotation has many challenges when going from the algorithm architect at a high level to the FPGA engineer at the HDL and board level. The algorithm level choices that you will make about how to implement the design in the FPGA will affect device utilization. Conversely, board and system requirements may place limitations on your design, such as number, width, and type of frame buffer memories.

Design Choices
A key decision about the architectural style of the rotation algorithm is quality. For example, when moving the pixels of the original image grid to fractional locations on the rotated image grid, is a nearest neighbor selection adequate? Should you perform bi-linear interpolation on the image to reduce shear (the tendency to see discontinuities along object edges)? Is bicubic interpolation (determining the new pixel value based on weighting and summing its closest 16 neighbors) the preferred method? Your choice of quality will impact resources in both the FPGA and the frame buffer.

Our example is based upon the Hotelling transform to determine the original image versus the rotated image pixel addresses. You have two choices as to where the rotation engine occurs with respect to frame buffering:

  1. Place the rotation engine after the frame buffer. This has the effect of sequential raster scan address writes into the frame buffer and allows for raster scan format of the output data by reading from the frame buffer in non-sequential address form. The Hotelling transform with basis vectors cos(t) and sin(t) for rotation angle (t), {Sx,Sy} representing source x and y coordinates, and {Dx,Dy} representing destination x and y coordinates in the 2-D image are:
    Sx = Dx * cos(t) + Dy * sin(t)
    Sy = Dy * cos(t) – Dx * sin(t)
  2. Place the rotation engine before the frame buffer. In effect, this method predetermines and weights the values stored through non-sequential addressing into the frame buffer. The output data is read in raster scan format from the frame buffer. The Hotelling transforms with similar representation as above are:
    Dx = Sx * cos(t) – Sy * sin(t)
    Dy = Sy * cos(t) + Sx * sin(t)
In their paper, “Real Time Image Rotation and Resizing, Algorithms and Implementations” (www.xilinx.com/products/logicore/dsp/rotation_resize.pdf), my Xilinx colleagues Robert Turney and Chris Dick showed that this approach is more economical in terms of memory bandwidth. Because the data is stored to the frame buffer in non-sequential order, you must take care to ensure that all valid pixel locations are written to.

Also, if the rotation angle is changed, you must clear artifacts in the corners from the previous frame from the frame buffer. A nearest neighbor rotation of 45 degrees clockwise demonstrates the void artifacts, as shown in Figure 1, that occur due to multiple input pixels being written to the same frame buffer address.

To avoid a complex control path and concentrate mostly on the data path, our example goes with the first choice previously described (place the rotation engine after the frame buffer). The net effect will be an increase in memory bandwidth of 4 or 16 times for bi-linear and bi-cubic interpolation, respectively. The choice of how to deal with the increase in bandwidth requirements is design- and memory-dependent.

You can increase the number of read accesses from the memory by a factor of four to minimize the total amount of memory for bi-linear interpolation. This will decrease the overall incoming pixel rate by a factor of four unless the memory access speed is sufficient to handle the offset. Alternatively, you can store the four pixels necessary for bi-linear interpolation at the same address location, thus increasing the total amount of memory by a factor of four. This methodology can be mitigated based on pixel bit width – that is, 8-bit-wide pixels can use this approach to fit into a 32-bit-wide memory footprint.

Design Implementation with System Generator
Our example will implement image rotation with the following assumptions:

  • Input image size and pixel widths are known at the time of implementation
  • Use the Hotelling transform of the form: Sx = Dx * cos(t) + Dy * sin(t) Sy = Dy * cos(t) – Dx * sin(t)
  • Bi-linear interpolation
  • Increased memory bandwidth is accounted for by reading the data out of the frame buffer four times faster than the incoming pixel rate
Frame Buffer Read Address Generation
Because the data will be output in raster scan format, you can create the destination address with two counters representing the two dimensions. These values must be center- adjusted around zero before being transformed. At this point, any adjustment changes the center of rotation for the output image.

The following two examples demonstrate manipulating the output image by using minimal extra logic:

  • Two adders are all that are required to add pan control for the output image.
  • Using two multipliers, you can scale the destination addresses to zoom in or out from the center point.
As interpolation occurs at a later stage, zooming and panning at this point will result in an output image with each pixel uniquely interpolated, versus a simple copy or additional interpolation procedure later. Figure 2 demonstrates a 45-degree counterclockwise rotation with 8x zoom. Once the destination address is correct for center, pan, and zoom, the transform is applied. The resulting addresses are reversecorrected for the centering function. The numbers are of the form integer with a decimal portion. The integer portions are the required source address locations to read from the frame buffer. The decimal portions are used to create weighting factors used for the bi-linear interpolation.

Weighting the Data
Four pixels are fetched from the frame buffer to create a 2 x 2 matrix containing the source address pixel and its original neighbors to the right, below, and below-right, which we will identify as XY, XpY, XYp, and XpYp, respectively. The source address transform uses the decimal portions of {Sx,Sy} represented as {Rx,Ry} to create the following weighting equations:

  • Weighting(XY) = Wxy = (1 – Rx) (1–Ry)
  • Weighting(XpY) = Wxpy = (Rx) (1 –Ry)
  • Weighting(XYp) = Wxyp = (1 – Rx) (Ry)
  • Weighting(XpYp) = Wxpyp = (Rx) (Ry)
These equations can be shown to be equal to the following equations to reduce the number of multipliers necessary from four to one:
  • Wxy = (1 – Rx) – Wxyp
  • Wxpy = (Rx) – Wxpyp
  • Wxyp = (Ry) – Wxpyp
  • Wxpyp = (Rx) (Ry)
The interpolated pixel value is the summation of the 2 x 2 matrix pixels multiplied by their respective weighting functions. Compare the original image in Figure 3 with an image that was rotated through 7.5 degrees, followed by a -7.5 degree rotation in Figure 4.

Conclusion
In this article, I’ve shown how to use System Generator to explore and implement image processing algorithms.

The use of a mathematical simulation tool with image handling capabilities allows you to easily investigate various options in an intuitive capacity. Although this example shows an image rotation implementation, these same methodologies can help you develop any image processing algorithm.

If you have any questions or suggestions, call me, Daniel Michek, at (858) 431-5901, or send me an e-mail at daniel.michek@xilinx.com.

Printable PDF version of this article with graphics. PDF logo (10/15/04) 255 KB

 
Jobs Events Webcasts News Investors Feedback Legal Privacy Trademarks Sitemap
© 1994-2008 Xilinx, Inc. All Rights Reserved.