Xcell Journal Online
Build Custom Real-Time Video Applications Quickly and Easily



by John L. Smith, Principal Engineer, Titan Corp., AP&D Division
john.l.smith@titan.com (10/15/04)


You can use a board equipped with all required video I/O and a Virtex-II FPGA to rapidly develop custom video processing functions.


Visually guided tele-operation is becoming ubiquitous in a variety of fields, including medicine, defense, and industry. A key requirement is low latency: the delay between capturing video at the sensor and displaying it at the remote viewer should be minimal. With training, people can get used to as much as a half-second of delay, but often the result is vehicle oscillation, as the operator over-corrects the controls without intuitive, immediate feedback.

Titan Corporation’s Advanced Products and Design Division works with aerospace defense prime contractors that provide unmanned aerial vehicles (UAVs) to the DoD. Figure 1 shows a General Atomics™ Predator UAV, and Figure 2 shows a ground station used for UAV remote control.

In surveillance missions, MPEG-2 encoded video from a pan-tilt-zoom (PTZ) camera mounted on the UAV is transmitted to a ground station. There the imagery is presented on a console to the operators. For the most effective control of the camera and vehicle, we had to reduce delay through the MPEG-2 decoder to less than 75 ms.

To accomplish this task, we used our commercial off-the-shelf multimedia video processing board, the VigraWATCH™. VigraWATCH (VW) is equipped with a Xilinx® Virtex-II™ FPGA and an IBM™ PowerPC™ 440GP (PPC) processor. Together they provide more than enough processing power to implement a customized MPEG-2 I-frame decoder whose latency comes in far below the 75 ms requirement.

With circuit board development already behind us and a basic software framework in place, and by taking advantage of IP included in the Xilinx development toolset, we were able to get the job done in four months.

MPEG-2
MPEG-2 is a widely used video compression standard rich with diverse encoding methods. Its diversity includes three distinct techniques for coding individual video frames: intra (I-frames), predicted (P-frames), and bi-directionally interpolated (B-frames). P-frames and B-frames introduce additional latency in both encoding and decoding. To cut latency to the absolute minimum, we used only I-frame encoding and decoding. Intra-frame encoding consists of a pipelined set of functions.

Basics
The three MPEG-2 coding methods are:

  • I-frame – Intra-frame encoding is based solely on information within a single frame. Furthermore, the I-frame encoding and decoding process may begin as soon as the first 16 lines of a frame are received.
  • P-frame – Predictive encoding uses a previous frame and encodes only the differences between that frame and the current frame to be encoded.
  • B-frame – Bidirectionally encoded frames use both a previous (I or P) frame and a future (I or P) frame, forming a “best match” interpolation between those two frames and the current frame and encoding the resulting differences.
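
To put the "first 16 lines" in the I-frame item into time units, the arithmetic can be sketched as follows. This is only a back-of-the-envelope estimate, assuming standard NTSC and PAL line rates; it counts the time for the first macroblock row to arrive and ignores processing and transmission time. The names are illustrative.

```python
# Line rates for standard-definition video (well-known broadcast constants).
NTSC_LINE_RATE_HZ = 4_500_000 / 286   # about 15,734 lines per second
PAL_LINE_RATE_HZ = 15_625.0           # 625 lines x 25 frames per second

MACROBLOCK_ROWS = 16  # lines needed before the first macroblock row is complete


def lines_to_ms(lines: int, line_rate_hz: float) -> float:
    """Time in milliseconds for `lines` scan lines to arrive."""
    return 1000.0 * lines / line_rate_hz


ntsc_ms = lines_to_ms(MACROBLOCK_ROWS, NTSC_LINE_RATE_HZ)
pal_ms = lines_to_ms(MACROBLOCK_ROWS, PAL_LINE_RATE_HZ)
print(f"NTSC: {ntsc_ms:.3f} ms, PAL: {pal_ms:.3f} ms")
```

In both formats, 16 lines take roughly 1 ms to arrive, which is a small fraction of a full frame time.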

B-Frames Impede Low Latency and P-Frames Don’t Help
Because B-frames use a future frame to encode the current frame, B-frame encoding and decoding impose a delay; the encoder (or decoder) must wait for the future frame to arrive before coding the current frame. Thus, B-frames must be tossed in the quest for minimum latency.
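
The reordering behind this delay can be illustrated with a short sketch. It assumes a simple frame pattern in which every B-frame depends on the next I- or P-frame in display order; the frame labels and function names are hypothetical and are not taken from our decoder.

```python
from typing import List


def coded_order(display: List[str]) -> List[str]:
    """Reorder frames from display order to transmission (coded) order.

    Each B-frame is held back until the next reference (I or P) frame
    has been sent, because it cannot be coded without that frame.
    """
    out: List[str] = []
    pending_b: List[str] = []
    for frame in display:
        if frame.startswith("B"):
            pending_b.append(frame)      # must wait for a future reference
        else:
            out.append(frame)            # reference frame goes first
            out.extend(pending_b)        # then the B-frames that needed it
            pending_b.clear()
    out.extend(pending_b)                # trailing B-frames (sketch only)
    return out


def max_held_frames(display: List[str]) -> int:
    """Longest run of consecutive B-frames, i.e. frames held back at once."""
    longest = run = 0
    for frame in display:
        run = run + 1 if frame.startswith("B") else 0
        longest = max(longest, run)
    return longest


display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
print(coded_order(display))      # reference frames move ahead of their B-frames
print(max_held_frames(display))  # number of frames of added delay in this pattern
```

At roughly 30 frames per second, each held frame costs about 33 ms, so even a short run of B-frames uses up a large share of a tight latency budget.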

The P-frame’s principal contribution to MPEG-2 is in improving compression ratios, because P-frames are smaller than I-frames. A greater compression ratio means reduced transmission bandwidth. However, because low latency is the primary concern, the bandwidth needs to be enough to accommodate I-frames without buffering delays. We also had another latency issue – development time. Thus, we developed an I-frame-only decoder.

(Without P- and B-frames, MPEG-2 video becomes essentially the same as motion JPEG. In this case, we were constrained to MPEG because that was the source format.)

The VigraWATCH Video Processor
The VW platform allows you to rapidly develop high-performance audio, video, and image processing functions using the XC2V3000, the microprocessor, or both. Figure 3 illustrates the VW’s primary components, peripherals, and available I/O.

Primary Components
The VW contains five large ICs: an IBM PPC440GP, a Xilinx XC2V3000 FPGA, two Cirrus Logic™ MPEG-2 codecs, and a PCI-PCI bus bridge.

The PPC provides general-purpose processing. It has a dual-issue superscalar RISC core with 64-way associative I- and D-caches. It also manages PCI, RS-232, IIC, and Ethernet I/O. Chain-controlled DMA units are available in the PPC for moving data between the PCI bus, the external peripheral bus (EPB), the PPC DRAM, and I/O registers.

The FPGA handles raw (uncompressed) audio, raw video, and raw and compressed I/O for the MPEG codecs. Part of the FPGA fabric is dedicated to video generators and mixers, I/O multiplexers, standard video processing such as scaling, and RAM interfaces. About 10% of the XC2V3000 is dedicated to a basic 2-D graphics engine. You can use the remainder of the FPGA fabric for custom processing functions. The default FPGA internal clock is 100 MHz, which matches the clock used for the DRAMs.

Each of the two MPEG-2 codecs is capable of encoding or decoding elementary streams. They are independent of each other. For example, in a video application where the raw video is enhanced by the FPGA, you can compress both the original and the enhanced video. In a communications scenario, one codec may be compressing local video for transmission, while the other is decompressing remote video. Or you can use the two codecs to decompress video from two distinct remote sources.

The PCI-PCI bridge allows you to install VW in either 3.3V PCI or 5V PCI systems (the PPC is not 5V I/O tolerant).

Peripherals
Inputs to the VW FPGA include:

  • A stereo audio digitizer
  • A video digitizer/decoder

The video decoder accepts standard-definition NTSC and PAL format analog video from one of four composite sources or one of two S-video sources.

Outputs from the VW FPGA include:

  • Two SVGA DACs, capable of driving independent displays
  • An audio DAC producing standard line-level stereo audio output

The two DRAM banks attached to the FPGA are independent; each is capable of 1.6 Gbps peak bandwidth. One is associated with the graphics engine in the FPGA; the other is typically used by video processing functions.

Digital I/O connectors 1 and 2 each support 22 bi-directional LVTTL signals, as well as a few auxiliary connections. You can use the digital I/O to connect another board directly to the FPGA, or to connect two VW boards together. Digital I/O connector 3 has 16 LVTTL pins and can be used as a video interface port or as a convenient place to bring out debugging test points.

Software
The PPC runs MontaVista™ Linux™, an embedded Linux supporting real-time functionality, multiprocessing, and multithreading. You can operate VW standalone, independent of any host computer, or as an add-in board driven by a host system. On Sun™ Solaris™-, Microsoft™ Windows™-, Wind River Systems™ VxWorks™-, or Linux-based host systems, graphic drivers allow VW to function as a primary or secondary display. An API provides control of basic VW functions.

Building the I-Frame Decoder
At the start of the project, we already had in-house a “clean room” software decoder developed from the MPEG specification. We partitioned the I-frame decoding functions into modules and did software profiling and hardware simulation to determine how to distribute the modules across the FPGA hardware and PPC software.

Integration with VW FPGA Internals
We connected the I-frame decoder inside the FPGA as a standard video input. Figure 4 shows a portion of the VW FPGA internals and how the decoder’s two ports connect to the pre-existing circuitry. The EPB port carries encoded data, tables, and control register setup data from the PPC. The CCIR-656 video out port connects to a video multiplexer that selects between all of the video inputs. This allows us to re-use the existing design’s video storage circuitry to move frame data into video memory, and ultimately to the display. Because the I-frame is processed sequentially, we can use internal block RAM to assemble macro blocks; a port to connect to external RAM is not required.

Decoding Modules
The pipeline layout of the decoder is shown in Figure 5. Input on the left is fed by the PPC. Output on the right is CCIR-656 format 4:2:2 YCbCr 8-bit video. This format matches the output from the VW peripheral analog video decoders. The layout was designed to allow progressive incremental design, integration, and testing of the modules.

The input buffer uses a 512-deep x 32-bit-wide FIFO to receive all data from the PPC. This FIFO allows the relatively slow 66 MHz EPB bus to operate at full speed, without having to implement low-level hardware handshakes. A high-level handshake is implemented by making the FIFO’s fill level available for read-back by the PPC.

The PPC core can keep track internally of the FIFO fill level and make decisions as to whether to work on filling the FIFO or perform other useful functions. The input buffer also contains an auto-incrementing register used to generate indirect addresses for rapidly filling tables in other modules, to keep the decoder’s I/O address range on the EPB bus small.
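
The high-level handshake can be modeled in a few lines. The following is a toy simulation, not our driver code: the `InputFifo` class, its methods, and the `drain_per_poll` parameter are hypothetical stand-ins for the FIFO, its fill-level read-back register, and the rate at which the decoder consumes words.

```python
from collections import deque
from typing import List

FIFO_DEPTH = 512  # words; each word is 32 bits wide


class InputFifo:
    """Toy model of the decoder's input FIFO (hypothetical interface)."""

    def __init__(self, depth: int = FIFO_DEPTH) -> None:
        self.depth = depth
        self._words: deque = deque()

    def fill_level(self) -> int:
        """Models the read-back register that reports the fill level."""
        return len(self._words)

    def write(self, word: int) -> None:
        if len(self._words) >= self.depth:
            raise OverflowError("FIFO overrun: wrote without checking level")
        self._words.append(word & 0xFFFFFFFF)

    def drain(self, count: int) -> None:
        """Models the decoder pipeline consuming words."""
        for _ in range(min(count, len(self._words))):
            self._words.popleft()


def feed(fifo: InputFifo, data: List[int], drain_per_poll: int) -> int:
    """Send `data` in bursts sized from one fill-level read per burst.

    Returns the number of fill-level reads that were needed.
    """
    if drain_per_poll <= 0:
        raise ValueError("the model needs a consumer to make progress")
    polls = 0
    pos = 0
    while pos < len(data):
        free = fifo.depth - fifo.fill_level()   # one read-back per burst
        polls += 1
        burst = min(free, len(data) - pos)
        for word in data[pos:pos + burst]:      # full-speed writes, no
            fifo.write(word)                    # per-word handshake
        pos += burst
        # When burst == 0 the FIFO is full; the processor could do other
        # useful work here instead of waiting.
        fifo.drain(drain_per_poll)              # decoder consumes meanwhile
    return polls


fifo = InputFifo()
polls = feed(fifo, list(range(2000)), drain_per_poll=100)
print(polls, fifo.fill_level())
```

Because the writer never sends more words than the free space it last observed, the FIFO cannot overrun even though no per-word handshake takes place.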

The variable length (VL) decoder decodes the Huffman-encoded block coefficients according to MPEG-2 tables B-14 and B-15. State machines to traverse the Huffman code trees and a look-up table to extract run/level value pairs from the leaves both fit into a single Virtex-II block RAM configured as 1K deep x 16 bits wide.
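
The idea of walking a code tree stored in a flat table, one bit per step, can be sketched as follows. This is a software illustration only: the code table is a toy prefix code and is not copied from tables B-14 or B-15, the trailing sign bit convention (0 positive, 1 negative) is an assumption of the sketch, and all names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple, Union

Symbol = Union[str, Tuple[int, int]]

# Toy prefix code: bit string -> "EOB" or (run, level). Illustrative only;
# these are NOT the entries of MPEG-2 tables B-14/B-15.
TOY_CODES: Dict[str, Symbol] = {
    "10": "EOB",
    "11": (0, 1),
    "011": (1, 1),
    "0100": (0, 2),
    "0101": (2, 1),
}


def build_table(codes: Dict[str, Symbol]) -> List[list]:
    """Flatten the code tree into a table, as it might sit in one RAM.

    Each entry is [next_index_if_0, next_index_if_1, leaf_symbol_or_None].
    """
    table: List[list] = [[None, None, None]]
    for bits, symbol in codes.items():
        idx = 0
        for bit in bits:
            b = int(bit)
            if table[idx][b] is None:
                table.append([None, None, None])
                table[idx][b] = len(table) - 1
            idx = table[idx][b]
        table[idx][2] = symbol
    return table


def decode_block(bits: str, table: List[list]) -> List[Tuple[int, int]]:
    """Decode run/level pairs until end-of-block (EOB)."""
    pairs: List[Tuple[int, int]] = []
    idx = 0
    i = 0
    while i < len(bits):
        nxt: Optional[int] = table[idx][int(bits[i])]
        i += 1
        if nxt is None:
            raise ValueError("invalid code word")
        idx = nxt
        symbol = table[idx][2]
        if symbol is None:
            continue                      # still inside the tree
        idx = 0                           # leaf reached: restart at the root
        if symbol == "EOB":
            return pairs
        run, level = symbol
        if i >= len(bits):
            raise ValueError("missing sign bit")
        negative = bits[i] == "1"         # assumed: sign bit follows the code
        i += 1
        pairs.append((run, -level if negative else level))
    raise ValueError("bitstream ended before EOB")


TABLE = build_table(TOY_CODES)
stream = "11" + "0" + "011" + "1" + "0100" + "0" + "10"
print(decode_block(stream, TABLE))
```

Each table entry needs only two child indices and a leaf payload, which suggests why the tree and the run/level look-up can share one small memory.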

We used some extra FPGA fabric for shift registers to handle escape codes for run/level values not included in the Huffman code tables. The ISDSM block handles the functions of inverting zigzag scanning, dequantization, and scaling.
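
A simplified software model of the inverse scan and dequantization steps might look like this. It is a sketch under stated assumptions: it uses only the classic zigzag scan (not the alternate scan), applies a fixed DC multiplier of 8, and omits the saturation and mismatch control that the MPEG-2 specification requires. The function and parameter names are hypothetical.

```python
from typing import List, Tuple


def zigzag_order() -> List[Tuple[int, int]]:
    """(row, col) positions of an 8x8 block in classic zigzag scan order."""
    cells = [(r, c) for r in range(8) for c in range(8)]
    # Walk the anti-diagonals; direction alternates with diagonal parity.
    return sorted(
        cells,
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )


ZIGZAG = zigzag_order()


def isdsm(dc: int,
          ac_pairs: List[Tuple[int, int]],
          weights: List[List[int]],
          scale: int,
          dc_mult: int = 8) -> List[List[int]]:
    """Inverse scan, dequantize, and scale one intra block (simplified).

    dc       -- decoded DC value (handled separately from the AC terms)
    ac_pairs -- (run, level) pairs, starting at scan position 1
    weights  -- 8x8 quantizer weighting matrix
    scale    -- quantizer scale factor
    dc_mult  -- assumed DC multiplier (8 here; depends on DC precision)
    """
    block = [[0] * 8 for _ in range(8)]
    block[0][0] = dc * dc_mult
    pos = 1
    for run, level in ac_pairs:
        pos += run                        # skip `run` zero coefficients
        if pos > 63:
            raise ValueError("run/level data overruns the block")
        r, c = ZIGZAG[pos]
        magnitude = (abs(level) * weights[r][c] * scale) // 16
        block[r][c] = magnitude if level >= 0 else -magnitude
        pos += 1
    return block


flat = [[16] * 8 for _ in range(8)]       # flat weighting matrix for the demo
coeffs = isdsm(dc=5, ac_pairs=[(0, 3), (1, -2)], weights=flat, scale=2)
print(coeffs[0][:3], coeffs[1][:3], coeffs[2][:3])
```

With a flat weighting matrix of 16, each AC coefficient reduces to the level times the scale factor, which makes the placement of the run/level pairs easy to check by hand.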

The iDCT was the easiest block to design: it is included as a standard core in the Xilinx ISE CORE Generator™ package.
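
For checking a hardware iDCT in simulation, a direct (and slow) floating-point reference model is often useful. The sketch below implements the textbook 8x8 inverse DCT formula; it is not the CORE Generator core, makes no claim about that core's internal precision or rounding, and the function name is hypothetical.

```python
import math
from typing import List


def idct_8x8(F: List[List[float]]) -> List[List[float]]:
    """Textbook 2-D inverse DCT of an 8x8 block, F[v][u] -> f[y][x]."""

    def c(k: int) -> float:
        return 1.0 / math.sqrt(2.0) if k == 0 else 1.0

    out = [[0.0] * 8 for _ in range(8)]
    for y in range(8):
        for x in range(8):
            acc = 0.0
            for v in range(8):
                for u in range(8):
                    acc += (c(u) * c(v) * F[v][u]
                            * math.cos((2 * x + 1) * u * math.pi / 16)
                            * math.cos((2 * y + 1) * v * math.pi / 16))
            out[y][x] = acc / 4.0
    return out


# A block with only a DC coefficient decodes to a flat block of DC / 8.
dc_only = [[0.0] * 8 for _ in range(8)]
dc_only[0][0] = 80.0
flat_block = idct_8x8(dc_only)
print(round(flat_block[0][0], 6), round(flat_block[7][7], 6))
```

A DC-only input is a convenient first test vector, because every output sample must equal the DC coefficient divided by 8.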

The format converter assembles the Y, Cb, and Cr sample blocks into slices in a slice-assembly RAM buffer comprising 16 block RAMs. The slices are then scanned out line by line and the lines are wrapped in CCIR-656 start and end active video (SAV/EAV) marker codes. We used an address rotation technique so new blocks can be assembled in the buffer as soon as a single line is removed, allowing the pipeline to run continuously without having to double-buffer the slice assembly RAM.
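
The line-wrapping step can be sketched in software as follows. The timing-reference byte follows the standard CCIR-656 (ITU-R BT.656) layout of F, V, and H flags plus protection bits; everything else (function names, the clipping of samples away from the reserved values 0x00 and 0xFF, and the omission of blanking intervals) is a simplification for illustration rather than a description of our converter.

```python
from typing import List


def timing_xy(f: int, v: int, h: int) -> int:
    """Fourth byte of a timing reference code: 1 F V H P3 P2 P1 P0."""
    p3 = v ^ h
    p2 = f ^ h
    p1 = f ^ v
    p0 = f ^ v ^ h
    return (0x80 | (f << 6) | (v << 5) | (h << 4)
            | (p3 << 3) | (p2 << 2) | (p1 << 1) | p0)


def clip(sample: int) -> int:
    """Keep samples off 0x00 and 0xFF, which are reserved for timing codes."""
    return max(1, min(254, sample))


def wrap_active_line(y: List[int], cb: List[int], cr: List[int],
                     field: int = 0) -> List[int]:
    """Interleave 4:2:2 samples as Cb Y Cr Y and wrap them in SAV/EAV."""
    if len(y) != 2 * len(cb) or len(cb) != len(cr):
        raise ValueError("4:2:2 needs one Cb/Cr pair per two Y samples")
    line = [0xFF, 0x00, 0x00, timing_xy(field, 0, 0)]       # SAV
    for i in range(len(cb)):
        line += [clip(cb[i]), clip(y[2 * i]),
                 clip(cr[i]), clip(y[2 * i + 1])]
    line += [0xFF, 0x00, 0x00, timing_xy(field, 0, 1)]      # EAV
    return line


demo = wrap_active_line(y=[16, 17], cb=[128], cr=[129])
print([hex(b) for b in demo])
```

The protection bits let a receiver detect and correct single-bit errors in the F, V, and H flags, which is why the fourth byte is computed rather than stored as a constant.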

Results
The original unoptimized MPEG-2 codec chip external to the FPGA had a latency of ~1800 ms. Working with the codec chip manufacturer, we reduced its latency to 45 ms. The I-frame decoder we developed using the Xilinx FPGA and PPC has a latency of less than 2 ms.

Conclusion
We saved a lot of time and effort using prebuilt boards and IP in the development process. If we had had to develop the board, all of the associated software, and all of the IP that went into the low-latency decoder and display system, the project would have taken years instead of months.

You can rapidly develop other video processing functions, including:

  • Other codecs – H.264, MPEG-4, Motion JPEG2000
  • Enhancement – linear and non-linear filters, super-resolution, histogram equalization/specification, de-convolution, warping
  • Stabilization and mosaicing

For more information on MPEG-2, read the book “MPEG Video Compression Standard,” edited by Joan L. Mitchell et al. For more information on the VigraWATCH system, visit www.titan.com.

© 1994-2008 Xilinx, Inc. All Rights Reserved.