The main goal of image processing is to
create systems that can scan objects and
make judgments on those objects at rates
many times faster than human observers.
When creating an image processing system,
the first step is to identify the imaging
functions that allow the computer to
behave like a trained human operator.
Once you’ve accomplished that, you can
then concentrate on making that system
run faster by finding – and removing – the
biggest performance bottleneck.
For most complex imaging systems, the
biggest bottleneck is the time taken to
process each image captured. As a simple
solution, you could use more advanced
processors to implement the algorithms –
the faster the processor, the faster the production
line. Alternatively, you could use
dedicated hardware built specially for the
job, although that can be very expensive.
The most innovative solution is to use programmable
electronics in the form of field
programmable gate arrays.
Real-World Application
One of our customers, Visiglas SA, uses
DSP-based boards to inspect glass containers.
The systems are successfully installed all
over the world, inspecting hundreds of
objects per minute. Figure 1 shows some of
the image processing used in these systems.
For their next-generation systems,
Visiglas would like to:
- Improve fault detection by using
higher resolution images
- Increase system throughput by
processing larger images faster than
the current systems allow.
Hunt Engineering has been able to
meet these requirements through the use
of Virtex-II™ FPGAs.
The Mathematics of Image Processing
Image processing typically involves applying
the same repetitive function to each
pixel in the image to create a new output
image. We can categorize the techniques
involved into three types:
- Where one fixed-coefficient operation
is performed identically on each pixel
in the image.
- Where there are two input images
rather than one. In this type of operation,
the mathematics performed may
be the same as for the fixed coefficients,
but the operand applied to each pixel
now depends on its position in the image,
being taken from the corresponding
pixel of the second image.
- Neighborhood processing, or convolution.
There is only one input frame,
and the result created for each pixel
location is related to a window of pixels
centered at that location.
So although the exact mathematical operation
may vary, all three techniques require
repetitive functions to be performed across
the entire image. Thus, this kind of processing
is ideally suited to a hardware pipeline
that can perform fixed mathematical operations
over and over on a stream of data.
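
The three categories above can be illustrated with a minimal software sketch. It is not taken from the Hunt Engineering libraries; the function names, the tiny test frame, and the choice of a brightness adjustment, an absolute difference, and a mean filter are all assumptions made for illustration.

```python
# Three categories of per-pixel operation on small grayscale images
# stored as lists of rows. All names here are illustrative.

def point_op(image, gain, offset):
    """Fixed-coefficient operation: the same gain and offset for every pixel."""
    return [[min(255, max(0, gain * p + offset)) for p in row] for row in image]

def two_image_op(image_a, image_b):
    """Two-input operation: the operand for each pixel comes from the same
    position in a second image (here, an absolute difference)."""
    return [[abs(a - b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(image_a, image_b)]

def neighborhood_op(image, size=3):
    """Neighborhood operation: each interior output pixel is the mean of a
    size x size window centered on it. Border pixels are copied unchanged."""
    h, w = len(image), len(image[0])
    r = size // 2
    out = [row[:] for row in image]
    for y in range(r, h - r):
        for x in range(r, w - r):
            total = sum(image[y + dy][x + dx]
                        for dy in range(-r, r + 1)
                        for dx in range(-r, r + 1))
            out[y][x] = total // (size * size)
    return out

frame = [[10, 10, 10, 10],
         [10, 100, 100, 10],
         [10, 100, 100, 10],
         [10, 10, 10, 10]]
reference = [[10] * 4 for _ in range(4)]

brighter = point_op(frame, gain=2, offset=5)
difference = two_image_op(frame, reference)
smoothed = neighborhood_op(frame)
```

In each case the inner loop body is identical for every pixel, which is the property that makes these operations candidates for a hardware pipeline.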
DSPs versus FPGAs
DSPs typically must execute several
instructions to perform an image processing
function. Because a DSP is a sequential
device, these instructions will probably
take several processor clock cycles to complete.
Add to that the cycles needed to
fetch the image data, store the results, and
handle interrupts, and you have a large
number of clock cycles needed to process
each pixel.
Because the majority of image processing
can be broken down into highly repetitive
tasks, FPGAs present a very interesting alternative
to DSPs. Additionally, you can use
FPGAs to perform lots of steps in parallel,
using dedicated logic for each step.
Through the use of Virtex-II FPGAs, we
can implement image-processing tasks at
very high data rates, reaching hundreds of
megahertz. These functions can be directly
performed on a stream of camera data as it
arrives without introducing extra processing delays – significantly reducing and sometimes
removing performance bottlenecks.
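
To make the cycles-per-pixel argument concrete, here is a rough back-of-the-envelope sketch. Every number in it is an assumption chosen for illustration (a 1000 x 1000 frame, a 500 MHz processor spending 20 cycles per pixel, a 100 MHz pipeline accepting one pixel per clock); none are measurements of any real DSP or FPGA design.

```python
def frame_time_ms(pixels, cycles_per_pixel, clock_hz):
    """Time to process one frame, in milliseconds, for a device that
    spends cycles_per_pixel clock cycles on each pixel."""
    return pixels * cycles_per_pixel / clock_hz * 1000.0

PIXELS = 1000 * 1000  # hypothetical frame size

# Hypothetical sequential processor: fast clock, many cycles per pixel.
dsp_ms = frame_time_ms(PIXELS, cycles_per_pixel=20, clock_hz=500_000_000)

# Hypothetical hardware pipeline: slower clock, one pixel per cycle.
fpga_ms = frame_time_ms(PIXELS, cycles_per_pixel=1, clock_hz=100_000_000)

print(f"sequential: {dsp_ms:.1f} ms, pipelined: {fpga_ms:.1f} ms")
```

Under these assumed figures the pipeline finishes a frame in roughly a quarter of the time despite the slower clock, because the cost per pixel is one cycle rather than many.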
In particular, you can map more complex
functions such as convolution very
successfully to FPGAs. When convolving
an image, a window of pixels is treated with
a mask, where individual locations in the
window are "weighted" according to a set
of previously defined coefficients. For each
position of the window, every pixel is multiplied
by its respective coefficient and the
products are summed. The sum is then
scaled to produce a single output pixel for
the center location of the window.
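
The weighted-window operation described above can be sketched in software as follows. This is a plain reference model, not the VHDL implementation; the function name, the 3x3 smoothing mask, and the divide-by-16 scale are assumptions for illustration. The mask is applied without flipping, which gives the same result as a true convolution whenever the mask is symmetric, as it is here.

```python
def convolve(image, mask, scale):
    """Multiply each pixel in the window by its mask coefficient, sum the
    products, then scale the sum to give the output pixel at the window
    center. Border pixels, where the window does not fit, are left at 0."""
    h, w = len(image), len(image[0])
    k = len(mask)
    r = k // 2
    out = [[0] * w for _ in range(h)]
    for y in range(r, h - r):
        for x in range(r, w - r):
            acc = 0
            for j in range(k):
                for i in range(k):
                    acc += image[y + j - r][x + i - r] * mask[j][i]
            out[y][x] = min(255, max(0, acc // scale))
    return out

# Hypothetical 3x3 smoothing mask; coefficients sum to 16.
mask = [[1, 2, 1],
        [2, 4, 2],
        [1, 2, 1]]

# 5x5 test image with a single bright pixel in the middle.
image = [[0] * 5 for _ in range(5)]
image[2][2] = 160

result = convolve(image, mask, scale=16)
print(result[2][2])  # 40: center weight 4, 160 * 4 // 16
```

The two innermost loops are the part that a hardware pipeline unrolls into parallel multipliers.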
In essence, each output pixel is a sum of
products over the window (loosely, a
matrix multiplication), and as
such requires several multiplications for
each pixel. The exact number of multipliers
required is dependent on the size of window
used for convolution. For example, a
3x3 kernel (window) requires nine multipliers;
a 5x5 kernel requires 25 multipliers.
Conventional DSPs have a fixed number
of multiplication units inside the
processor core – fewer multiplier units than
what are needed to perform the matrix
multiplication in one step. Thus, a DSP
would introduce a performance drop by
reusing multiplier units to complete the
matrix multiplication.
FPGAs, however, can implement as many
multipliers as necessary to calculate one pixel
at the full input data rate, whether the convolution
uses a 3x3 kernel or a larger 5x5. With
the one-million-gate Virtex-II, 40 multipliers
are available; in the eight-million-gate version,
this number increases to 168. By mapping
convolution to FPGAs
that already provide dedicated
multipliers among
their sea of gates, it becomes
easy to build a processing
pipeline that can convolve
at very high data rates.
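
The multiplier arithmetic can be checked with a short sketch. The counts of 40 and 168 multipliers come from the text; the two-multiplier processor is a hypothetical stand-in for a conventional DSP, not a specific part.

```python
import math

def multipliers_needed(kernel_size):
    """A k x k kernel needs one multiplication per window position."""
    return kernel_size * kernel_size

def passes_per_pixel(kernel_size, multipliers_available):
    """How many times the available multipliers must be reused to finish
    one output pixel. One pass means the full input data rate is kept."""
    return math.ceil(multipliers_needed(kernel_size) / multipliers_available)

for name, units in [("hypothetical 2-multiplier DSP", 2),
                    ("one-million-gate Virtex-II", 40),
                    ("eight-million-gate Virtex-II", 168)]:
    print(name, [passes_per_pixel(k, units) for k in (3, 5)])
```

With 40 multipliers both the 3x3 and 5x5 kernels complete in a single pass, whereas the assumed two-multiplier device needs 5 and 13 passes respectively.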
A Role for the DSP?
Although a large proportion
of image processing
algorithms are simply highly
repetitive processes, there
is still a role for the DSP. In
a system that can benefit
from the performance
advantages of FPGAs, there
is a point in the data flow
where a decision has to be
made. This decision will
often take the form of "if,
then, else" logic rather than
a pixel-by-pixel iteration.
For control loops and
complex branches in operation,
DSPs can still prove to be highly effective.
Implementing equivalent logic in
FPGAs can quickly eat up the available gates
and reduce the overall data rate.
A simple solution is to use both types of
resources in a single system: a high-data-rate
FPGA as the data-reducing engine,
feeding results downstream to a DSP as the
accept/reject, pass/fail decision maker.
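
A minimal software model of this split might look like the sketch below. The summary fields, thresholds, and function names are assumptions invented for illustration; a real inspection system would reduce the image to whatever measurements its accept/reject rules need.

```python
def reduce_frame(image, threshold):
    """Stand-in for the FPGA stage: a single pass over the pixel stream
    that reduces a whole frame to a few summary numbers."""
    count = 0
    peak = 0
    for row in image:
        for p in row:
            if p > threshold:
                count += 1
            if p > peak:
                peak = p
    return {"defect_pixels": count, "peak": peak}

def decide(summary, max_defect_pixels, max_peak):
    """Stand-in for the DSP stage: branching accept/reject logic applied
    to the reduced data rather than to every pixel."""
    if summary["peak"] > max_peak:
        return "reject"
    elif summary["defect_pixels"] > max_defect_pixels:
        return "reject"
    else:
        return "accept"

good = [[20] * 8 for _ in range(8)]
bad = [row[:] for row in good]
bad[3][3] = bad[3][4] = bad[4][3] = 200   # three bright pixels

for frame in (good, bad):
    summary = reduce_frame(frame, threshold=100)
    print(summary, decide(summary, max_defect_pixels=2, max_peak=250))
```

The first function touches every pixel but never branches on a decision; the second branches freely but sees only a handful of numbers per frame.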
Image Acquisition and Processing with HERON
The HERON module range from Hunt
Engineering provides a flexible, high-performance
solution to image processing.
HERON-FPGA modules, which include
the Virtex-II series of FPGAs, present
resource nodes that are suited to a wide
range of tasks, particularly the repetitive
tasks of image processing.
These FPGA modules
can also be directly connected
to cameras, accepting
data in formats such
as Camera Link and
RS422. Combine that
with HERON processor
modules based around
Texas Instruments’
TMS320C6000 DSP
series, and a complete
imaging solution becomes
possible.
In addition to the hardware
resources required at
the heart of the system,
firmware and software are
also necessary to implement
the appropriate algorithms
in the FPGA and
DSP. Hunt Engineering offers imaging
libraries for both DSPs (in C) and FPGAs (in
VHDL), downloadable from www.hunteng.co.uk. These libraries enable you to quickly
and easily assemble the key algorithm components
into a working imaging system.
Memory Requirements
If you use the multi-frame operations provided
in our VHDL imaging libraries (such
as the addition of two images), you must
have an area of available memory that can
store an entire frame.
Unless the size of one frame is very
small, the FPGA’s internal RAM resources
will be insufficient for this type of operation.
In this situation, you could use a module
like the HERON-FPGA5 (Figure 2).
The reference image is stored in SDRAM
external to the FPGA and read into the
FPGA as required.
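
A quick calculation shows why a full frame rarely fits on chip. The sketch assumes the capacity of 90 K 8-bit pixels that this article quotes for the one-million-gate Virtex-II; the two frame sizes are hypothetical examples.

```python
def frame_bytes(width, height, bits_per_pixel=8):
    """Bytes needed to hold one full frame."""
    return width * height * bits_per_pixel // 8

# On-chip capacity quoted in the article for the one-million-gate
# Virtex-II, taken here as 90 K 8-bit pixels.
BLOCK_RAM_BYTES = 90 * 1024

def fits_on_chip(width, height, bits_per_pixel=8, capacity=BLOCK_RAM_BYTES):
    """True if a whole frame fits in the given on-chip capacity."""
    return frame_bytes(width, height, bits_per_pixel) <= capacity

print(fits_on_chip(256, 256))     # True: 65,536 bytes
print(fits_on_chip(1024, 1024))   # False: 1,048,576 bytes
```

Anything much larger than a small thumbnail therefore calls for external memory.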
Because separate dedicated logic is
used to receive the incoming image,
access the stored image, and perform the
processing, image processing can still be
performed at pixel rates greater than 100
megapixels/sec. With a processor-based
approach, the processor has to fetch both
images from memory, so on a DSP these
two-image operations will be slower than
single-image pixel operations.
Neighborhood processing, on the other
hand, requires several lines of image data to
be stored before processing can begin. The
image size determines the amount of storage
required per line, and the kernel size of
the operation determines
the number of lines. It’s
possible to use the FPGA’s
internal block RAM for
this storage, but the
amount available depends
on the size of the FPGA
and the design requirements.
For example, a one-million-gate Virtex-II
FPGA has 90 KB (720 Kb) of
block RAM. If nothing
else in the design
requires block RAM,
then the convolution can use all 90
KB. With 8-bit monochrome data,
you can store 90 Kpixels. If the image
is 2K pixels per line, that is 45 lines of
data, which is more than enough for a large
convolution function.
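
The line-storage arithmetic, and the way a small line buffer feeds a window operation, can be sketched as follows. The storage and line-length figures are those given in the text; the function names and the use of a simple window sum in place of a weighted convolution are assumptions made to keep the example short.

```python
from collections import deque

def lines_that_fit(buffer_pixels, pixels_per_line):
    """Whole image lines that fit in a buffer of the given pixel capacity."""
    return buffer_pixels // pixels_per_line

def stream_window_sums(rows, kernel_size):
    """Consume image rows one at a time, keeping only the most recent
    kernel_size rows. Yields one row of window sums each time a full
    set of lines is available."""
    k = kernel_size
    line_buffer = deque(maxlen=k)   # the oldest line drops out automatically
    for row in rows:
        line_buffer.append(row)
        if len(line_buffer) < k:
            continue                # not enough lines stored yet
        width = len(row)
        yield [sum(line[x + i] for line in line_buffer for i in range(k))
               for x in range(width - k + 1)]

# Figures from the text: 90 K 8-bit pixels of storage, 2K pixels per line.
print(lines_that_fit(90 * 1024, 2 * 1024))   # 45

rows = [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3], [4, 4, 4, 4]]
print(list(stream_window_sums(iter(rows), 3)))   # [[18, 18], [27, 27]]
```

No output appears until three lines have arrived, and at no point are more than three lines held, which is why the kernel height rather than the frame height sets the storage requirement.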
If the FPGA design uses block RAM
for other functions, using hardware like
the HERON-FPGA5 enables you to
store the image in off-chip SDRAM.
Conclusion
Many key imaging functions break down
into highly repetitive tasks that are well
suited to modern FPGAs, especially those
with built-in hardware multipliers and on-chip
RAM. The remaining tasks require hardware
more suited to control flows and
decision making, such as DSPs.
To gain the full benefit of both
approaches, systems can effectively combine
both FPGAs and DSPs. With the addition
of standard imaging functions written in
either VHDL or C, all of the key building
blocks are available to create an image processing
system. Hunt Engineering has
developed a demonstration framework of
such a system, shown in Figure 3.
For our customer Visiglas, a system such
as the one shown in Figure 4 allows them
to achieve their performance goals.
The next logical step is to add devices based on
Virtex-II Pro™ FPGAs to the HERON module range. With a PowerPC™
processor core, a sea of gates, built-in multipliers,
and on-chip RAM, a self-contained
high-performance imaging solution becomes
possible in a single chip.
Printable PDF version of this article with graphics. (3/1/04) 270 KB