Xcell Journal Online
Increase Image Processing System Performance with FPGAs



by Richard Williams, Senior Engineer, Hunt Engineering (UK) Ltd.
sales@hunteng.co.uk (3/1/04)

Using FPGAs instead of DSPs to perform common image-processing functions can offer a wide range of benefits.



The main goal of image processing is to create systems that can scan objects and make judgments on those objects at rates many times faster than human observers. When creating an image processing system, the first step is to identify the imaging functions that allow the computer to behave like a trained human operator. Once you’ve accomplished that, you can then concentrate on making that system run faster by finding – and removing – the biggest performance bottleneck.

For most complex imaging systems, the biggest bottleneck is the time taken to process each image captured. As a simple solution, you could use more advanced processors to implement the algorithms – the faster the processor, the faster the production line. Alternatively, you could use dedicated hardware built specially for the job, although that can be very expensive. The most innovative solution is to use programmable electronics in the form of field programmable gate arrays (FPGAs).

Real-World Application
One of our customers, Visiglas SA, uses DSP-based boards to inspect glass containers. The systems are successfully installed all over the world, inspecting hundreds of objects per minute. Figure 1 shows some of the image processing used in these systems.

For their next-generation systems, Visiglas would like to:

  • Improve fault detection by using higher resolution images
  • Increase system throughput by processing larger images faster than the current systems allow
Hunt Engineering has been able to meet these requirements through the use of Virtex-II™ FPGAs.

The Mathematics of Image Processing
Image processing typically involves applying the same repetitive function to each pixel in the image to create a new output image. We can categorize the techniques involved into three types:

  1. Where one fixed-coefficient operation is performed identically on each pixel in the image.
  2. Where there are two input images rather than one. The mathematics performed may be the same as in the fixed-coefficient case, but the value applied to each pixel now depends on its position in the image, because it is taken from the corresponding pixel of the second image.
  3. Neighborhood processing, or convolution. There is only one input frame, and the result created for each pixel location is related to a window of pixels centered at that location.
So although the exact mathematical operation may vary, all three techniques require repetitive functions to be performed across the entire image. Thus, this kind of processing is ideally suited to a hardware pipeline that can perform fixed mathematical operations over and over on a stream of data.
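The three categories can be sketched in a few lines of Python. This is only an illustration under stated assumptions: images are plain lists of rows of 8-bit values, and the function names (`point_op`, `two_image_op`, `neighborhood_op`), the use of addition for the two-image case, and the 3x3 mean for the neighborhood case are hypothetical choices, not taken from the Hunt Engineering libraries.

```python
def clamp8(v):
    """Saturate a value to the 8-bit range 0..255."""
    return max(0, min(255, v))

def point_op(img, gain, offset):
    """Type 1: the same fixed-coefficient operation on every pixel."""
    return [[clamp8(p * gain + offset) for p in row] for row in img]

def two_image_op(img_a, img_b):
    """Type 2: combine two frames pixel by pixel (here: saturating addition)."""
    return [[clamp8(a + b) for a, b in zip(ra, rb)]
            for ra, rb in zip(img_a, img_b)]

def neighborhood_op(img):
    """Type 3: each output pixel depends on a 3x3 window (here: the mean).
    Border pixels are copied unchanged to keep the sketch short."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            total = sum(img[y + dy][x + dx]
                        for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = total // 9
    return out
```

In all three functions the inner expression is identical for every pixel, which is the property that makes the work map onto a fixed hardware pipeline.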

DSPs versus FPGAs
A DSP typically must execute several instructions to perform an image processing function. Because a DSP is a sequential device, these instructions will probably take several processor clock cycles to complete. Add to that the cycles needed to fetch the image data, store the results, and handle interrupts, and you have a large number of clock cycles needed to process each pixel.

Because the majority of image processing can be broken down into highly repetitive tasks, FPGAs present a very interesting alternative to DSPs. Additionally, you can use FPGAs to perform lots of steps in parallel, using dedicated logic for each step.

Through the use of Virtex-II FPGAs, we can implement image-processing tasks at very high data rates, with clock rates reaching hundreds of megahertz. These functions can be performed directly on a stream of camera data as it arrives without introducing extra processing delays, significantly reducing and sometimes removing performance bottlenecks.
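As a rough illustration of why the number of clock cycles per pixel matters, the sketch below computes the time to process one frame for a given cost per pixel. Every figure in the usage lines (the image size, both clock rates, and the 10 cycles per pixel for the sequential case) is an assumption chosen to show the arithmetic, not a measurement of any particular DSP or FPGA.

```python
def frame_time_s(width, height, clock_hz, cycles_per_pixel=1):
    """Seconds needed to process one frame at a given cost per pixel."""
    return width * height * cycles_per_pixel / clock_hz

# Hypothetical 1024 x 1024 frame.
# Pipelined logic accepting one pixel per clock at an assumed 100 MHz:
pipeline_s = frame_time_s(1024, 1024, 100e6, cycles_per_pixel=1)

# Sequential processor at an assumed 600 MHz and an assumed 10 cycles per pixel:
sequential_s = frame_time_s(1024, 1024, 600e6, cycles_per_pixel=10)
```

Under these assumed numbers the pipeline finishes a frame sooner despite its lower clock rate, because it spends a single cycle on each pixel.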

In particular, you can map more complex functions such as convolution very successfully to FPGAs. When convolving an image, a window of pixels is treated with a mask, where individual locations in the window are "weighted" according to a set of previously defined coefficients. For each position of the window, all pixels are multiplied by their respective coefficients and the products are summed. The sum is then scaled to produce a single output pixel for the center location of the window.

In essence, the whole convolution process is a matrix-multiplication, and as such requires several multiplications for each pixel. The exact number of multipliers required is dependent on the size of window used for convolution. For example, a 3x3 kernel (window) requires nine multipliers; a 5x5 kernel requires 25 multipliers.
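The window operation described above can be sketched as follows. This is a minimal software model, assuming an image held as a list of rows, integer coefficients, scaling by integer division, and output only where the window fits entirely inside the image. The names `convolve` and `SMOOTH_3X3` are illustrative and do not come from any library mentioned in this article.

```python
def convolve(img, kernel, scale):
    """Slide a k x k window over img; multiply, sum, scale, clamp to 8 bits.

    Only positions where the window fits entirely inside the image are
    computed, so the output is (k - 1) rows and columns smaller.
    The kernel is applied without flipping, which is identical to true
    convolution for symmetric kernels such as the one below.
    """
    k = len(kernel)
    h, w = len(img), len(img[0])
    out = []
    for y in range(h - k + 1):
        row = []
        for x in range(w - k + 1):
            acc = 0
            for j in range(k):
                for i in range(k):
                    acc += img[y + j][x + i] * kernel[j][i]  # one multiply per tap
            row.append(max(0, min(255, acc // scale)))
        out.append(row)
    return out

# 3x3 smoothing mask: nine coefficients, hence nine multiplies per output pixel.
SMOOTH_3X3 = [[1, 2, 1],
              [2, 4, 2],
              [1, 2, 1]]
```

Calling `convolve(image, SMOOTH_3X3, 16)` divides by the sum of the coefficients so that a uniform image passes through unchanged. The two innermost loops are the k x k multiplies that a hardware implementation would perform in parallel.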

Conventional DSPs have a fixed number of multiplication units inside the processor core, fewer than are needed to perform the matrix multiplication in one step. Thus, a DSP would introduce a performance drop by reusing multiplier units to complete the matrix multiplication.

FPGAs, however, can implement as many multipliers as necessary to calculate one pixel at the full input data rate, whether the convolution uses a 3x3 kernel or a larger 5x5. With the one-million-gate Virtex-II, 40 multipliers are available; in the eight-million-gate version, this number increases to 168. By mapping convolution to FPGAs that already provide dedicated multipliers among their sea of gates, it becomes easy to build a processing pipeline that can convolve at very high data rates.
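The multiplier arithmetic can be checked with a short sketch. The kernel sizes and the 40 and 168 multiplier counts are the figures quoted above; the function names are hypothetical, and the model deliberately counts multipliers only, ignoring the logic, routing, and RAM that a real design also consumes.

```python
import math

def multipliers_needed(kernel_size):
    """One multiplier per kernel coefficient for a k x k window."""
    return kernel_size * kernel_size

def passes_per_pixel(kernel_size, multipliers_available):
    """Multiply passes needed per pixel when the multipliers must be reused."""
    return math.ceil(multipliers_needed(kernel_size) / multipliers_available)

def parallel_kernels(kernel_size, multipliers_available):
    """How many full-rate k x k convolutions fit in the multiplier budget."""
    return multipliers_available // multipliers_needed(kernel_size)
```

For example, `parallel_kernels(5, 40)` shows that one full-rate 5x5 convolution fits in the smaller device, while `passes_per_pixel(5, 2)` shows how the pass count grows for a hypothetical processor with only two multiplier units.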

A Role for the DSP?
Although a large proportion of image processing algorithms are simply highly repetitive processes, there is still a role for the DSP. In a system that can benefit from the performance advantages of FPGAs, there is a point in the data flow where a decision has to be made. This decision will often take the form of "if, then, else" logic rather than a pixel-by-pixel iteration.

For control loops and complex branches in operation, DSPs can still prove to be highly effective. Implementing equivalent logic in FPGAs can quickly eat up the available gates and reduce the overall data rate.

A simple solution is to use both types of resources in a single system: a high-data-rate FPGA as the data-reducing engine, feeding results downstream to a DSP as the accept/reject, pass/fail decision maker.
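This division of labor can be modeled in software as two stages. The sketch below is an assumption-laden illustration, not the Visiglas or Hunt Engineering design: the reduction chosen (counting pixels above a threshold), the threshold and limit values, and all function names are hypothetical.

```python
def reduce_frame(pixels, threshold):
    """Data-reducing stage (the FPGA's role): one pass over the pixel
    stream, the same operation on every pixel, a single number out."""
    count = 0
    for p in pixels:
        if p > threshold:
            count += 1
    return count

def decide(suspect_pixels, reject_limit):
    """Decision stage (the DSP's role): branching logic on reduced data."""
    if suspect_pixels > reject_limit:
        return "reject"
    else:
        return "accept"

def inspect(frames, threshold=200, reject_limit=3):
    """Run every frame through both stages, yielding one verdict per frame."""
    for frame in frames:
        yield decide(reduce_frame(frame, threshold), reject_limit)
```

The first stage touches every pixel but makes no decisions about the object; the second stage sees only one number per frame, so its branching cost is independent of image size.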

Image Acquisition and Processing with HERON
The HERON module range from Hunt Engineering provides a flexible, high-performance solution to image processing. HERON-FPGA modules, which include the Virtex-II series of FPGAs, present resource nodes that are suited to a wide range of tasks, particularly the repetitive tasks of image processing.

These FPGA modules can also be directly connected to cameras, accepting data in formats such as Camera Link and RS422. Combine that with HERON processor modules based around Texas Instruments' TMS320C6000 DSP series, and a complete imaging solution becomes possible.

In addition to the hardware resources required at the heart of the system, firmware and software are also necessary to implement the appropriate algorithms in the FPGA and DSP. Hunt Engineering offers imaging libraries for both DSPs (in C) and FPGAs (in VHDL), downloadable from www.hunteng.co.uk. These libraries enable you to quickly and easily assemble the key algorithm components into a working imaging system.

Memory Requirements
If you use the multi-frame operations provided in our VHDL imaging libraries (such as the addition of two images), you must have an area of available memory that can store an entire frame.

Unless the size of one frame is very small, the FPGA’s internal RAM resources will be insufficient for this type of operation. In this situation, you could use a module like the HERON-FPGA5 (Figure 2). The reference image is stored in SDRAM external to the FPGA and read into the FPGA as required.

Because separate dedicated logic is used to receive the incoming image, access the stored image, and perform the processing, image processing can still be performed at pixel rates greater than 100 megapixels/sec. With a processor-based approach, the processor itself has to fetch both images from memory, so on a DSP these two-image operations will be slower than operations on a single image.

Neighborhood processing, on the other hand, requires several lines of image data to be stored before processing can begin. The image size determines the amount of storage required per line, and the kernel size of the operation determines the number of lines. It’s possible to use the FPGA’s internal block RAM for this storage, but the amount available depends on the size of the FPGA and the design requirements.
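The line storage can be modeled in software with a small buffer of recent lines. The sketch below is an illustration under assumptions: it holds the previous k - 1 lines and combines them with the line currently arriving, which is one common arrangement and not necessarily the one used in the Hunt Engineering VHDL libraries. The name `stream_windows` is hypothetical.

```python
from collections import deque

def stream_windows(lines, k):
    """Yield (row, col, window) for every k x k window of a line-by-line stream.

    (row, col) is the top-left corner of the window. Only the previous
    k - 1 lines are held in the buffer, together with the line currently
    arriving; the full frame is never stored.
    """
    history = deque(maxlen=k - 1)          # the stored lines
    for row, line in enumerate(lines):
        if len(history) == k - 1:          # enough lines to fill a window
            rows = list(history) + [line]
            for col in range(len(line) - k + 1):
                window = [r[col:col + k] for r in rows]
                yield row - k + 1, col, window
        history.append(line)               # oldest line drops out automatically
```

Because `lines` can be any iterable, the function works on data as it arrives, and the buffer size depends only on the line length and the kernel size, as the paragraph above describes.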

For example, a one-million-gate Virtex-II FPGA has 90 KB (720 Kb) of block RAM. If nothing else in the design requires block RAM, then the convolution can use all 90 KB. With 8-bit monochrome data, you can store 90 Kpixels. If the image is 2K pixels per line, that is 45 lines of data, which is more than enough for a large convolution function.
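The sizing arithmetic in this example can be reproduced with a short sketch. It assumes the block RAM figure is 90 kilobytes (consistent with storing 90 Kpixels of 8-bit data), treats "K" as 1024, and assumes a k x k window needs k - 1 stored lines plus the line currently arriving. The function and constant names are hypothetical.

```python
def lines_that_fit(ram_bytes, pixels_per_line, bytes_per_pixel=1):
    """Number of complete image lines that fit in a given amount of RAM."""
    return ram_bytes // (pixels_per_line * bytes_per_pixel)

def line_buffer_bytes(kernel_size, pixels_per_line, bytes_per_pixel=1):
    """Storage for the k - 1 previous lines that a k x k window needs
    (the k-th line is the one currently arriving)."""
    return (kernel_size - 1) * pixels_per_line * bytes_per_pixel

BLOCK_RAM_BYTES = 90 * 1024    # figure quoted for the one-million-gate part
LINE_PIXELS = 2 * 1024         # "2K pixels per line"

available_lines = lines_that_fit(BLOCK_RAM_BYTES, LINE_PIXELS)
needed_for_5x5 = line_buffer_bytes(5, LINE_PIXELS)
```

Here `available_lines` reproduces the 45 lines from the text, and `needed_for_5x5` shows that a 5x5 kernel uses only a small part of that budget under these assumptions. A design that stores all k lines rather than k - 1 would need one line more.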

If the FPGA design uses block RAM for other functions, using hardware like the HERON-FPGA5 enables you to store the image in off-chip SDRAM.

Conclusion
Many key imaging functions break down into highly repetitive tasks that are well suited to modern FPGAs, especially those with built-in hardware multipliers and on-chip RAM. The remaining tasks require hardware more suited to control flows and decision making, such as DSPs.

To gain the full benefit of both approaches, systems can effectively combine both FPGAs and DSPs. With the addition of standard imaging functions written in either VHDL or C, all of the key building blocks are available to create an image processing system. Hunt Engineering has developed a demonstration framework of such a system, shown in Figure 3.

For our customer Visiglas, a system such as the one shown in Figure 4 allows them to achieve their performance goals.

The next logical step is to add modules based on Virtex-II Pro™ FPGAs to the HERON module range. With a PowerPC™ processor core, a sea of gates, built-in multipliers, and on-chip RAM, a self-contained high-performance imaging solution becomes possible in a single chip.

© 1994-2008 Xilinx, Inc. All Rights Reserved.