Visually guided tele-operation is becoming
ubiquitous in a variety of fields, including
medicine, defense, and industry. A key
requirement is low latency – there should
be minimal delay between capturing
video at the sensor and displaying it at the
remote viewer. With training, people can
get used to as much as a half-second of
delay, but the result is often vehicle oscillation,
as the operator over-corrects the controls
without intuitive, immediate
feedback.
Titan Corporation’s Advanced
Products and Design Division works with
aerospace defense prime contractors
who provide unmanned aerial vehicles
(UAVs) to the DoD. Figure 1 shows a
General Atomics™ Predator UAV, and
Figure 2 shows a ground station used for
UAV remote control.
In surveillance missions, MPEG-2
encoded video from a pan-tilt-zoom
(PTZ) camera mounted on the UAV is
transmitted to a ground station. There the
imagery is presented on a console to the
operators. For the most effective control
of the camera and vehicle, we had to
reduce delay through the MPEG-2
decoder to less than 75 ms.
To accomplish this task, we used our
commercial off-the-shelf multimedia video
processing board, the VigraWATCH™.
VigraWATCH (VW) is equipped with a
Xilinx® Virtex-II™ FPGA and an IBM™
PowerPC™ 440GP (PPC) processor. Together
they provide more than enough processing
power to implement a customized
MPEG-2 I-frame decoder whose latency is
far below the 75 ms requirement.
With the circuit board and a basic
software framework already
in place, and by taking advantage of
IP included in the Xilinx development
toolset, we were able to get the job done in
four months.
MPEG-2
MPEG-2 is a widely used video compression
standard rich with diverse encoding
methods. Its diversity includes three distinct
techniques for coding individual
video frames as either intra (I-frames), predicted
(P-frames), or bi-directionally interpolated
(B-frames). P-frames and B-frames
introduce additional latency in both encoding
and decoding. To cut latency to the
absolute minimum, we used only I-frame
encoding and decoding. Intra-frame
encoding consists of a pipelined set of
functions.
Basics
The three MPEG-2 coding methods are:
- I-frame – Intra-frame encoding is
based solely on information within a
single frame. Furthermore, the I-frame
encoding and decoding process may
begin as soon as the first 16 lines of a
frame are received.
- P-frame – Predictive encoding uses a
previous frame and encodes only the
differences between that frame and the
current frame to be encoded.
- B-frame – Bidirectionally encoded
frames use both a previous (I or P)
frame and a future (I or P) frame,
forming a “best match” interpolation
between those two frames and the current
frame and encoding the resulting
differences.
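To get a feel for why the 16-line threshold matters, the sketch below estimates how long a raster-scanned source takes to deliver one 16-line macroblock row. It assumes standard NTSC timing (525 lines at 30000/1001 frames per second) and PAL timing (625 lines at 25 frames per second); the function name is illustrative and not part of any VigraWATCH API.

```python
from fractions import Fraction

def macroblock_row_time_ms(total_lines, frame_rate, rows=16):
    """Time (ms) for a raster-scanned source to deliver `rows` lines.

    total_lines counts every line in the frame, including blanking,
    because blanking lines still take time to scan out.
    """
    line_rate_hz = total_lines * Fraction(frame_rate)
    return float(Fraction(rows) / line_rate_hz * 1000)

# Assumed raster timings (standard NTSC and PAL values).
ntsc_ms = macroblock_row_time_ms(525, Fraction(30000, 1001))
pal_ms = macroblock_row_time_ms(625, 25)
print(f"NTSC: {ntsc_ms:.3f} ms, PAL: {pal_ms:.3f} ms")
```

Under these assumptions the wait is about 1 ms in either system, compared with roughly 33 ms (NTSC) or 40 ms (PAL) for a full frame.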
B-Frames Impede Low Latency and P-Frames Don’t Help
Because B-frames use a future frame to
encode the current frame, B-frame encoding
and decoding impose a delay; the
encoder (or decoder) must wait for the
future frame to arrive before coding the
current frame. Thus, B-frames must be
tossed in the quest for minimum latency.
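The reordering cost of B-frames can be sketched in a few lines. The code below is a simplified model, not the encoder used in this project: it takes a string of frame types in display order, moves each anchor (I or P) frame ahead of the B-frames that depend on it, and reports the worst-case number of frame periods a frame must wait for a future frame. It models only reordering delay; the function names are hypothetical.

```python
def coded_order(display_types):
    """Return display indices in coded (transmission) order.

    Each anchor frame (I or P) is sent before the B-frames that
    precede it in display order, because those B-frames reference it.
    """
    order, pending_b = [], []
    for idx, frame_type in enumerate(display_types):
        if frame_type == "B":
            pending_b.append(idx)
        else:
            order.append(idx)
            order.extend(pending_b)
            pending_b = []
    order.extend(pending_b)  # trailing B-frames with no later anchor
    return order

def max_reorder_delay_frames(display_types):
    """Worst-case wait, in frame periods, caused purely by reordering."""
    worst, latest_needed = 0, -1
    for idx in coded_order(display_types):
        latest_needed = max(latest_needed, idx)
        worst = max(worst, latest_needed - idx)
    return worst

print(coded_order("IBBPBBP"))               # [0, 3, 1, 2, 6, 4, 5]
print(max_reorder_delay_frames("IBBPBBP"))  # 2
print(max_reorder_delay_frames("IIII"))     # 0
```

With two B-frames between anchors, the first B-frame waits two frame periods (about 67 ms at 30 frames per second) before it can be coded, which by itself nearly consumes a 75 ms budget.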
The P-frame’s principal contribution to
MPEG-2 is improved compression
ratios, because P-frames are smaller than
I-frames. A greater compression ratio means reduced
transmission bandwidth. However, because
low latency is the primary concern, the
link bandwidth must be sufficient to carry
I-frames without buffering delays.
We also had another latency issue – development
time. Thus, we developed an I-frame-only
decoder.
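The bandwidth requirement can be checked with simple arithmetic: an I-frame must finish transmitting within one frame period, or frames queue up and add delay. The sketch below illustrates the check. The frame size and link rates are made-up example values, not measurements from this system, and the function name is hypothetical.

```python
from fractions import Fraction

def iframe_link_check(frame_bytes, link_bps, fps):
    """Return (transmit_ms, frame_period_ms, fits) for one I-frame."""
    transmit_ms = frame_bytes * 8 / link_bps * 1000
    frame_period_ms = 1000 / float(fps)
    return transmit_ms, frame_period_ms, transmit_ms <= frame_period_ms

NTSC_FPS = Fraction(30000, 1001)

# Hypothetical 50,000-byte I-frame on two hypothetical links.
fast = iframe_link_check(50_000, 15_000_000, NTSC_FPS)
slow = iframe_link_check(50_000, 10_000_000, NTSC_FPS)
print(f"15 Mbps: {fast[0]:.1f} ms per frame, fits={fast[2]}")
print(f"10 Mbps: {slow[0]:.1f} ms per frame, fits={slow[2]}")
```

In this example the faster link delivers each frame in under one frame period, while the slower link falls behind on every frame and would need buffering.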
(Without P- and B-frames, MPEG-2
video becomes essentially the same as motion
JPEG. In this case, we were constrained to
MPEG because that was the source format.)
The VigraWATCH Video Processor
The VW platform allows you to rapidly
develop high-performance audio, video,
and image processing functions using the
XC2V3000, the microprocessor, or both.
Figure 3 illustrates the VW’s primary components,
peripherals, and available I/O.
Primary Components
The VW contains five large ICs: an IBM
PPC440GP, a Xilinx XC2V3000 FPGA,
two Cirrus Logic™ MPEG-2 codecs, and
a PCI-PCI bus bridge.
The PPC provides general-purpose processing.
It has a dual-issue superscalar
RISC core with 64-way associative I- and
D-caches. It also manages PCI, RS-232,
IIC, and Ethernet I/O. Chain-controlled
DMA units are available in the PPC for
moving data between the PCI bus, the
external peripheral bus (EPB), the PPC
DRAM, and I/O registers.
The FPGA handles raw (uncompressed)
audio, raw video, and raw and compressed
I/O for the MPEG codecs. Part of the
FPGA fabric is dedicated to video generators
and mixers, I/O multiplexers,
standard video processing such as scaling,
and RAM interfaces. About 10%
of the XC2V3000 is dedicated to a
basic 2-D graphics engine. You can
use the remainder of the FPGA fabric
for custom processing functions. The
default FPGA internal clock is 100
MHz, which matches the clock used
for the DRAMs.
Each of the two MPEG-2 codecs is
capable of encoding or decoding elementary
streams. They are independent
of each other. For example, in a
video application where the raw video
is enhanced by the FPGA, you can
compress both the original and the
enhanced video. In a communications
scenario, one codec may be compressing
local video for transmission, while
the other is decompressing remote
video. Or you can use the two codecs
to decompress video from two distinct
remote sources.
The PCI-PCI bridge allows you to
install VW in either 3.3V PCI or 5V
PCI systems (the PPC is not 5V I/O
tolerant).
Peripherals
Inputs to the VW FPGA include:
- A stereo audio digitizer
- A video digitizer/decoder
The video decoder accepts standard-definition
NTSC and PAL format analog
video from one of four composite sources
or one of two S-video sources.
Outputs from the VW FPGA include:
- Two SVGA DACs, capable of driving
independent displays
- An audio DAC producing standard
line-level stereo audio output
The two DRAM banks attached to the
FPGA are independent; each is capable of
1.6 Gbps peak bandwidth. One is associated
with the graphics engine in the FPGA;
the other is typically used by video processing
functions.
Digital I/O connectors 1 and 2 each
support 22 bi-directional LVTTL signals,
as well as a few auxiliary connections.
You can use the digital I/O to connect
another board directly to the FPGA, or
to connect two VW boards together.
Digital I/O connector 3 has 16 LVTTL
pins and can be used for a video interface
port or as a convenient place to bring out
debugging test points.
Software
The PPC runs MontaVista™ Linux™, an
embedded Linux supporting real-time
functionality, multiprocessing, and multithreading.
You can operate VW standalone,
independent of any host computer,
or as an add-in board driven by a host system.
On Sun™ Solaris™-, Microsoft™
Windows™-, Wind River Systems™
VxWorks™-, or Linux-based host systems,
graphics drivers allow VW to function as
a primary or secondary display. An API provides
control of basic VW functions.
Building the I-Frame Decoder
We had a “clean room” software
decoder developed from the MPEG
specification available in-house at the
start of the project. We partitioned the
I-frame decoding functions into modules
and did software profiling and
hardware simulation to determine how
to distribute the modules across the
FPGA hardware and PPC software.
Integration with VW FPGA Internals
We connected the I-frame decoder
inside the FPGA as a standard video
input. Figure 4 shows a portion of the
VW FPGA internals and how the
decoder’s two ports connect to the
pre-existing circuitry. The EPB port
carries encoded data, tables, and control
register setup data from the PPC.
The CCIR-656 video out port connects
to a video multiplexer that
selects between all of the video inputs.
This allows us to re-use the existing
design’s video storage circuitry to
move frame data into video memory,
and ultimately to the display. Because
the I-frame is processed sequentially,
we can use internal block RAM to
assemble macroblocks; a port to connect
to external RAM is not required.
Decoding Modules
The pipeline layout of the decoder is
shown in Figure 5. Input on the left is fed
by the PPC. Output on the right is CCIR-656 format 4:2:2 YCbCr 8-bit video. This
format matches the output from the VW
peripheral analog video decoders. The layout
was designed to allow progressive
incremental design, integration, and testing
of the modules.
The input buffer uses a 512-deep x 32-bit-wide FIFO to receive all data from the
PPC. This FIFO allows the relatively slow
66 MHz EPB to operate at full speed,
without having to implement low-level
hardware handshakes. A high-level handshake
is implemented by making the FIFO’s
fill level available for read-back by the PPC.
The PPC core can keep track internally
of the FIFO fill level and make decisions as
to whether to work on filling the FIFO or
perform other useful functions. The input
buffer also contains an auto-incrementing
register used to generate indirect addresses
for rapidly filling tables in other modules,
which keeps the decoder’s I/O address range on
the EPB small.
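The high-level handshake can be modeled in software. The sketch below is a toy simulation, not the actual driver code: the class and function names are hypothetical, and only the 512-word depth and 32-bit width come from the description above. The PPC side reads the fill level once, then writes a burst sized to the free space, so no per-word handshake is needed.

```python
from collections import deque

class InputFifo:
    """Toy model of the decoder's 512-deep x 32-bit input FIFO."""
    DEPTH = 512

    def __init__(self):
        self._words = deque()

    def fill_level(self):
        """What the PPC would read back from the fill-level register."""
        return len(self._words)

    def write(self, word):
        if len(self._words) >= self.DEPTH:
            raise OverflowError("FIFO overrun")
        self._words.append(word & 0xFFFFFFFF)

    def drain(self, count):
        """Decoder side: consume up to `count` words."""
        for _ in range(min(count, len(self._words))):
            self._words.popleft()

def feed(fifo, words):
    """PPC side: one fill-level read, then a burst with no per-word checks.

    Returns the number of words accepted; 0 tells the caller to do
    other useful work and retry later.
    """
    free = fifo.DEPTH - fifo.fill_level()
    burst = words[:free]
    for word in burst:
        fifo.write(word)
    return len(burst)

fifo = InputFifo()
stream = list(range(600))
first = feed(fifo, stream)            # fills the FIFO completely
second = feed(fifo, stream[first:])   # FIFO full, nothing accepted
fifo.drain(100)                       # decoder consumes 100 words
third = feed(fifo, stream[first:])    # remaining 88 words fit
print(first, second, third, fifo.fill_level())
```

A real driver could go further and track the fill level internally, as described above, re-reading the register only when its own estimate says the FIFO might be full.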
The variable length (VL) decoder
decodes the Huffman-encoded block coefficients
according to MPEG-2 tables B-14
and B-15. State machines to traverse the
Huffman code trees and a look-up table to
extract run/level value pairs from the leaves
both fit into a single Virtex-II block RAM
configured as 1K deep x 16 bits wide.
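To illustrate how a code tree and its leaf values can share one small memory, here is a minimal software sketch of a bit-serial tree walk over a flat array of 16-bit words. Everything in it is an assumption made for illustration: the word layout, the marker values, and the five-entry code table, which is a toy prefix code and not a substitute for tables B-14 and B-15. It does not describe the actual state machine in the FPGA.

```python
LEAF = 0x8000     # assumed flag bit: word holds a run/level pair
EOB = 0xFFFF      # assumed marker word for end-of-block
INVALID = 0x7FFF  # assumed marker for an unassigned branch

# Toy prefix code: bit string -> (run, level), or None for end-of-block.
TOY_CODES = {"10": None, "11": (0, 1), "011": (1, 1),
             "0100": (0, 2), "0101": (2, 1)}

def build_ram(codes):
    """Pack the code tree into a flat list of 16-bit words.

    Nodes are stored as pairs of words; word [base + bit] is either the
    base address of the next pair or a leaf.
    """
    ram = [INVALID, INVALID]
    for bits, symbol in codes.items():
        base = 0
        for bit in bits[:-1]:
            slot = base + int(bit)
            if ram[slot] == INVALID:
                ram[slot] = len(ram)
                ram.extend([INVALID, INVALID])
            base = ram[slot]
        slot = base + int(bits[-1])
        ram[slot] = EOB if symbol is None else LEAF | (symbol[0] << 6) | symbol[1]
    return ram

def decode_block(ram, bitstring):
    """Walk the tree one bit at a time; return the signed run/level pairs."""
    pairs, pos = [], 0
    while True:
        base = 0
        while True:
            word = ram[base + int(bitstring[pos])]
            pos += 1
            if word == INVALID:
                raise ValueError("code not in toy table")
            if word & LEAF:
                break
            base = word
        if word == EOB:
            return pairs
        run, level = (word >> 6) & 0x3F, word & 0x3F
        if bitstring[pos] == "1":  # sign bit follows each code
            level = -level
        pos += 1
        pairs.append((run, level))

RAM = build_ram(TOY_CODES)
decoded = decode_block(RAM, "110" "0111" "01000" "10")
print(len(RAM), decoded)
```

The toy table needs only 10 words; the point of the layout is that both the branching structure and the run/level results live in the same array, which is what lets a complete table fit in 1K words.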
We used some extra FPGA fabric for
shift registers to handle escape codes for
run/level values not included in the
Huffman code tables. The ISDSM block
handles inverse zigzag
scanning, dequantization, and scaling.
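The first two of these functions can be sketched in software as follows. This is a simplified illustration, not the ISDSM implementation: it generates the classic 8 x 8 zigzag order, places run/level pairs into a block, and scales each intra AC coefficient as (2 x level x weight x quantiser_scale) / 32 with truncation toward zero, which is our reading of the MPEG-2 intra reconstruction formula. The flat weight matrix, the DC multiplier of 8, and all function names are assumptions for the example; saturation and mismatch control are omitted.

```python
def zigzag_order(n=8):
    """Return (row, col) positions in zigzag scan order."""
    order = []
    for s in range(2 * n - 1):
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        order.extend(diag if s % 2 else diag[::-1])
    return order

ZIGZAG = zigzag_order()

def div_toward_zero(a, b):
    """Integer division that truncates toward zero (b must be positive)."""
    q = abs(a) // b
    return -q if a < 0 else q

def rebuild_block(dc, pairs, weights, quantiser_scale, dc_mult=8):
    """Inverse scan and dequantize one intra block.

    dc      -- quantized DC value (scaled by the assumed dc_mult)
    pairs   -- (run, level) pairs for the AC coefficients
    weights -- 8 x 8 quantization weight matrix
    """
    block = [[0] * 8 for _ in range(8)]
    block[0][0] = dc * dc_mult
    idx = 0
    for run, level in pairs:
        idx += run + 1  # skip `run` zeros, then place the level
        if idx > 63:
            raise ValueError("run/level pairs overrun the block")
        r, c = ZIGZAG[idx]
        block[r][c] = div_toward_zero(
            2 * level * weights[r][c] * quantiser_scale, 32)
    return block

FLAT_WEIGHTS = [[16] * 8 for _ in range(8)]  # placeholder matrix
block = rebuild_block(10, [(0, 1), (1, -1), (0, 2)], FLAT_WEIGHTS, 2)
for row in block[:3]:
    print(row[:3])
```

Because the scan order is fixed, hardware can store it as a 64-entry address table and perform the inverse scan as coefficients stream through, with no full-frame buffering.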
The inverse DCT (iDCT) was the easiest block to
design because it is included as a standard core in the
Xilinx ISE CORE Generator™ package.
The format converter assembles the Y,
Cb, and Cr sample blocks into slices in a
slice-assembly RAM buffer comprising 16
block RAMs. The slices are then scanned
out line by line and the lines are wrapped
in CCIR-656 start and end active video
(SAV/EAV) marker codes. We used an
address rotation technique so new blocks
can be assembled in the buffer as soon as a
single line is removed, allowing the pipeline
to run continuously without having to
double-buffer the slice assembly RAM.
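The SAV/EAV wrapping step can be sketched as below. The code builds the four-byte timing reference codes (FF 00 00 XY) with the protection bits derived from the F, V, and H flags, as we understand the CCIR-656 (ITU-R BT.656) format, and wraps one line of active samples. It is a simplified illustration with hypothetical function names: horizontal blanking between EAV and the next SAV is omitted, and it is not taken from the FPGA design.

```python
def xy_byte(f, v, h):
    """Fourth byte of a timing reference code.

    f -- field flag, v -- vertical blanking flag, h -- 0 for SAV, 1 for EAV.
    The low four bits are protection bits computed from f, v, and h.
    """
    p3 = v ^ h
    p2 = f ^ h
    p1 = f ^ v
    p0 = f ^ v ^ h
    return (0x80 | (f << 6) | (v << 5) | (h << 4)
            | (p3 << 3) | (p2 << 2) | (p1 << 1) | p0)

def wrap_line(samples, f=0, v=0):
    """Wrap one line of Cb, Y, Cr, Y samples in SAV and EAV codes.

    Sample values 0x00 and 0xFF are reserved for timing codes, so
    active samples are clamped to the range 1..254.
    """
    sav = bytes([0xFF, 0x00, 0x00, xy_byte(f, v, 0)])
    eav = bytes([0xFF, 0x00, 0x00, xy_byte(f, v, 1)])
    active = bytes(min(max(s, 1), 254) for s in samples)
    return sav + active + eav

line = wrap_line([0x80, 0x10, 0x80, 0x10])  # two mid-gray-chroma, black pixels
print(line.hex(" "))
```

For an active line in the first field this yields an SAV code ending in 0x80 and an EAV code ending in 0x9D.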
Results
The original unoptimized MPEG-2 codec
chip external to the FPGA had a latency of
~1800 ms. Working with the codec chip
manufacturer, we reduced its latency to
45 ms. The I-frame decoder we developed
using the Xilinx FPGA and PPC has a
latency of less than 2 ms.
Conclusion
We saved a lot of time and effort by using prebuilt
boards and IP in the development
process. Had we needed to develop the board, all
of the associated software, and all of the IP
that went into the low-latency decoder and
display system, the project would have taken years
instead of months.
You can rapidly develop other video
processing functions, including:
- Other codecs – H.264, MPEG-4,
Motion JPEG2000
- Enhancement – linear and non-linear
filters, super-resolution, histogram
equalization/specification, de-convolution,
warping
- Stabilization and mosaicing
For more information on MPEG-2, read
the book, “MPEG Video Compression
Standard,” edited by Joan L. Mitchell et al.
For more information on the
VigraWATCH system, visit www.titan.com.