|
Converting image processing algorithms to
FPGA implementations can be tedious. The
algorithm may be proven in software but
with no direct link to actual implementation.
Additionally, it can be difficult to subjectively
verify the implementation.
Using a mathematical simulator to verify
and create HDL implementation files bridges
the gap from the algorithm architect to the
FPGA engineer. Xilinx® System Generator
for DSP allows for high-level mathematical
verification and converts the heart of the
algorithm into ready-to-use HDL.
Simulation inside of The MathWorks
Simulink® tool enables you to easily verify
the image algorithm qualitatively and subjectively
when used with the Image Processing
Toolbox, also from The MathWorks. Using
System Generator to develop and implement
image processing algorithms allows for a thoroughly
verified and easily executed design;
plus, you save time on subjective analysis of
the HDL. The high-level block diagram
allows for easy communication between team
members, resulting in less time spent crossing
skill boundaries when determining implementation
trade-offs.
The Basics
The Image Processing Toolbox is an excellent
starting point with which to develop
image processing algorithms for FPGAs. It
allows you to easily load and view many
image types. Although not directly usable
within System Generator, it can also
rotate, resize, filter, convert to and from
the frequency domain, and operate on the
image mathematically and morphologically.
You can use these latter functions as a
qualitative measure against the actual
System Generator implementation.
Exchanging Data
Images are often stored and manipulated
as two-dimensional arrays that can be
quite large. For example, a 1,024 x 1,024 x
8-bits-per-pixel image is 1 MB. Typically, a
portion of the image is moved into the
FPGA one pixel at a time and aggregated
into a larger unit (often a line), with
manipulations performed on these points.
An example pre-processing MATLAB®
m-file script might contain:
- Reading in a source test image using
the IMREAD function in the Image
Processing Toolbox.
- Analyzing variables such as width,
height, and color depth of the image to
pass as arguments to Xilinx block set
tokens. This enables easy parameterization
and scaling of the application.
- Storage and creation of other variables
necessary to the application. Examples
include rotation angle, resizing percentages,
and bit precision within the
algorithm.
- Converting the matrix data from an m
x n array to a 1 x (m*n) with “for
loops” and concatenation. This allows
The MathWorks DSP block set “Signal
From Workspace” token to pass elements
as samples to the Xilinx block set
“Gateway In” token.
- Viewing the source test image for later
subjective analysis using the IMSHOW
function in the Image Processing
Toolbox.
An example post-processing m-file
might contain:
- Conversion of “ToWorkspace” variables
from a 1 x (p*q) array to a p x q array for
easy manipulation by using “for loops.”
- Displaying the resulting matrices using
IMSHOW for subjective analysis.
- Computing qualitative analysis of the
results versus the original image or an
algorithm developed with the Image
Processing Toolbox.
Using M-Files
You can use m-files to model the algorithm
before implementation with System
Generator. Although this step is optional, it
can be a highly effective aid in the slight
paradigm shift from matrix operations in
software to raster scan operations in a highly
parallel architecture.
Some of the key benefits include:
- Deconstruction proofs of Image
Processing Toolbox functions as an aid
to understand the algorithm.
- Creation of intermediate variables to
assist in debugging the System
Generator version of the design. Try to
set variables similar to how they will
appear in the design. Two-dimensional
matrices should be used as memory,
while raster order works better between
blocks.
- Making algorithm trade-offs. Ask yourself
if you should place the bi-linear
interpolation stage before or after the
sending of data to memory.
- A fast proof that the algorithm is on
the right track.
- Qualitative analysis from the deconstruction
m-file to the corresponding
Image Processing Toolbox function.
Design Considerations – An Example
Real-time image rotation has many challenges
when going from the algorithm
architect at a high level to the FPGA engineer
at the HDL and board level. The algorithm
level choices that you will make
about how to implement the design in the
FPGA will affect device utilization.
Conversely, board and system requirements
may place limitations on your design, such
as number, width, and type of frame buffer
memories.
Design Choices
A key decision about the architectural style
of the rotation algorithm is quality. For
example, when moving the pixels of the
original image grid to fractional locations
on the rotated image grid, is a nearest
neighbor selection adequate? Should you
perform bi-linear interpolation on the
image to reduce shear (the tendency to see
discontinuities along object edges)? Is
bicubic interpolation (determining the
new pixel value based on weighting and
summing its closest 16 neighbors) the preferred
method? Your choice of quality will
impact resources in both the FPGA and
the frame buffer.
Our example is based upon the Hotelling
transform to determine the original image
versus the rotated image pixel
addresses. You have two choices as to
where the rotation engine occurs
with respect to frame buffering:
- Place the rotation engine after
the frame buffer. This has the
effect of sequential raster scan
address writes into the frame
buffer and allows for raster
scan format of the output data
by reading from the frame
buffer in non-sequential
address form. The Hotelling
transform with basis vectors
cos(t) and sin(t) for rotation
angle (t), {Sx,Sy} representing
source x and y coordinates, and
{Dx,Dy} representing destination
x and y coordinates in the
2-D image are:
Sx = Dx * cos(t) + Dy * sin(t)
Sy = Dy * cos(t) – Dx * sin(t)
- Place the rotation engine before the
frame buffer. In effect, this method
predetermines and weights the values
stored through non-sequential
addressing into the frame buffer.
The output data is read in raster scan
format from the frame buffer. The
Hotelling transforms with similar
representation as above are:
Dx = Sx * cos(t) – Sy * sin(t)
Dy = Sy * cos(t) + Sx * sin(t)
In their paper, “Real Time Image
Rotation and Resizing, Algorithms and
Implementations” (www.xilinx.com/products/logicore/dsp/rotation_resize.pdf), my Xilinx
colleagues Robert Turney and Chris Dick
showed that this approach is more economical
in terms of memory bandwidth.
Because the data is stored to the frame
buffer in non-sequential order, you must
take care to ensure that all valid pixel locations
are written to.
Also, if the rotation angle is changed,
you must clear artifacts in the corners from
the previous frame from the frame buffer. A
nearest neighbor rotation of 45 degrees
clockwise demonstrates the void artifacts,
as shown in Figure 1, that occur due to
multiple input pixels being written to the
same frame buffer address.
To avoid a complex control path and
concentrate mostly on the data path, our
example goes with the first choice previously
described (place the rotation engine
after the frame buffer). The net effect will
be an increase in memory bandwidth of 4
or 16 times for bi-linear and bi-cubic interpolation,
respectively. The choice of how to
deal with the increase in bandwidth
requirements is design- and memory-dependent.
You can increase the number of read
accesses from the memory by a factor of
four to minimize the total amount of memory
for bi-linear interpolation. This will
decrease the overall incoming pixel rate by
a factor of four unless the memory access
speed is sufficient to handle the offset.
Alternatively, you can store the four pixels
necessary for bi-linear interpolation at the
same address location, thus increasing the
total amount of memory by a factor of
four. This methodology can be mitigated
based on pixel bit width – that is, 8-bit-wide
pixels can use this approach to fit into
a 32-bit-wide memory footprint.
Design Implementation with System Generator
Our example will implement image rotation
with the following assumptions:
- Input image size and pixel widths are
known at the time of implementation
- Use the Hotelling transform of the
form:
Sx = Dx * cos(t) + Dy * sin(t)
Sy = Dy * cos(t) – Dx * sin(t)
- Bi-linear interpolation
- Increased memory bandwidth is
accounted for by reading the data out
of the frame buffer four times faster
than the incoming pixel rate
Frame Buffer Read Address Generation
Because the data will be output in raster
scan format, you can create the destination
address with two counters representing the
two dimensions. These values must be center-
adjusted around zero before being
transformed. At this point, any adjustment
changes the center of rotation for the output
image.
The following two examples demonstrate
manipulating the output image by
using minimal extra logic:
- Two adders are all that are required to
add pan control for the output image.
- Using two multipliers, you can scale
the destination addresses to zoom in or
out from the center point.
As interpolation occurs at a later stage,
zooming and panning at this point will
result in an output image with each pixel
uniquely interpolated, versus a simple copy
or additional interpolation procedure later.
Figure 2 demonstrates a 45-degree counterclockwise
rotation with 8x zoom.
Once the destination address is correct
for center, pan, and zoom, the transform is
applied. The resulting addresses are reversecorrected
for the centering function. The
numbers are of the form integer with a decimal
portion. The integer portions are the
required source address locations to read
from the frame buffer. The decimal portions
are used to create weighting factors
used for the bi-linear interpolation.
Weighting the Data
Four pixels are fetched from the frame
buffer to create a 2 x 2 matrix containing
the source address pixel and its original
neighbors to the right, below, and below-right,
which we will identify as XY, XpY,
XYp, and XpYp, respectively. The source
address transform uses the decimal portions
of {Sx,Sy} represented as {Rx,Ry} to create
the following weighting equations:
- Weighting(XY) = Wxy = (1 – Rx) (1–Ry)
- Weighting(XpY) = Wxpy = (Rx) (1 –Ry)
- Weighting(XYp) = Wxyp = (1 – Rx) (Ry)
- Weighting(XpYp) = Wxpyp = (Rx) (Ry)
These equations can be shown to be
equal to the following equations to reduce
the number of multipliers necessary from
four to one:
- Wxy = (1 – Rx) – Wxyp
- Wxpy = (Rx) – Wxpyp
- Wxyp = (Ry) – Wxpyp
- Wxpyp = (Rx) (Ry)
The interpolated pixel value is the summation
of the 2 x 2 matrix pixels multiplied
by their respective weighting functions.
Compare the original image in Figure 3
with an image that was rotated through 7.5
degrees, followed by a -7.5 degree rotation
in Figure 4.
Conclusion
In this article, I’ve shown how to use System
Generator to explore and implement image
processing algorithms.
The use of a mathematical simulation
tool with image handling capabilities allows
you to easily investigate various options in
an intuitive capacity. Although this example
shows an image rotation implementation,
these same methodologies can help you
develop any image processing algorithm.
If you have any questions or suggestions,
call me, Daniel Michek, at (858)
431-5901, or send me an e-mail at
daniel.michek@xilinx.com.
Printable PDF version of this article with graphics. (10/15/04) 255 KB |