

Accelerate JPEG2000 Compression

by Olivier Cantineau, Design Manager, Barco Silex
olivier.cantineau@barco.com (12/3/03)

Barco Silex IP cores enable high-speed picture compression on Virtex-II and Virtex-II Pro FPGAs.

JPEG2000 is the latest still picture compression algorithm from the JPEG standardization committee. Based on wavelet technology, JPEG2000 is very different from its predecessor. Its capabilities allow it to be adopted in a wide spectrum of applications, even extending to video encoding.

As a result, this compression scheme requires much more computational power than its classic JPEG predecessor. Furthermore, software implementations make poor candidates for applications requiring very fast encoding times.

Barco Silex has developed two JPEG2000 accelerator IP cores for high-performance applications: the BA112JPEG2000E encoder and the BA111JPEG2000D decoder. These cores handle the computationally intensive tasks of the JPEG2000 algorithm. Together with a host CPU, these cores create a complete JPEG2000 encoding or decoding solution.

Implementing the BA112JPEG2000E and BA111JPEG2000D cores on Xilinx Virtex™-II and Virtex-II Pro™ FPGAs paves the way to high-speed picture encoding applications with a very flexible architecture and shortened design time.

The JPEG2000 Algorithm
Although JPEG2000 is based on a single algorithm, it offers a wide range of tools for compressing and representing images. It is suitable for a large spectrum of applications ranging from Internet streaming to medical imaging to digital cameras.

JPEG2000 encompasses capabilities traditionally encountered in separate algorithms, such as:

  • Lossy and lossless compression with excellent performance
  • Precise compression ratio control with single-pass processing
  • Bitstream progressivity, allowing consistent image previews with partial bitstream decoding
  • Support for user-defined regions of interest in the image, encoded with higher quality than the other areas
  • Error resilience.

Moreover, at high compression ratios JPEG2000 outperforms the classic JPEG scheme by generating fewer and less visible artifacts. JPEG2000 compression requires two consecutive processing stages, explained in the following sections.

Tier 1 – DWT-Based Compression
The JPEG2000 algorithm divides the image into rectangular tiles of a configurable size.

Each tile undergoes a two-dimensional discrete wavelet transform (DWT), which reorders the tile’s frequency information into a series of pictures (subbands). A subband results from the filtering of the original tile for a given frequency range.

A parameter of the transform (the number of decomposition levels) defines the number of frequency intervals.
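To make the filtering step concrete, here is a minimal sketch of one decomposition level in one dimension. It assumes the reversible 5/3 lifting filter that JPEG2000 Part 1 defines for lossless coding; the article does not describe how the cores implement the transform internally, and the function names are illustrative only. A 2-D transform applies the same step to the rows and then to the columns of a tile.

```python
def dwt53_forward(x):
    """One level of the reversible 5/3 lifting transform on an even-length
    list of integers. Returns (low, high) half-length subbands."""
    n = len(x)
    assert n >= 2 and n % 2 == 0, "sketch handles even lengths only"
    half = n // 2

    # Predict step: odd sample minus the floor-average of its even neighbours.
    # Symmetric extension at the right edge: x[n] mirrors to x[n - 2].
    high = []
    for i in range(half):
        left = x[2 * i]
        right = x[2 * i + 2] if 2 * i + 2 < n else x[n - 2]
        high.append(x[2 * i + 1] - ((left + right) >> 1))

    # Update step: even sample plus a rounded quarter of the adjacent details.
    # Symmetric extension at the left edge: high[-1] mirrors to high[0].
    low = []
    for i in range(half):
        prev = high[i - 1] if i > 0 else high[0]
        low.append(x[2 * i] + ((prev + high[i] + 2) >> 2))
    return low, high


def dwt53_inverse(low, high):
    """Undo dwt53_forward exactly by reversing the two lifting steps."""
    half = len(low)
    n = 2 * half
    x = [0] * n
    for i in range(half):
        prev = high[i - 1] if i > 0 else high[0]
        x[2 * i] = low[i] - ((prev + high[i] + 2) >> 2)
    for i in range(half):
        left = x[2 * i]
        right = x[2 * i + 2] if 2 * i + 2 < n else x[n - 2]
        x[2 * i + 1] = high[i] + ((left + right) >> 1)
    return x
```

Because every lifting step uses integer arithmetic and is undone exactly by the inverse, the original samples are recovered bit for bit, which is what makes lossless operation possible.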

Each subband can then undergo selective quantization by a programmable factor for lossy compression. Bypassing the quantization yields lossless operation.

The algorithm further divides the resulting quantized subbands into smaller rectangular blocks (code blocks).

A modeler examines the bit planes of the current code block, beginning with the most significant one. In each plane, it scans the bits in a zigzag order and determines their context by using information such as the predominant value of the surrounding bits.
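The bit-plane view of a code block can be sketched as follows. This is only an illustration of how the planes are peeled off from most to least significant, working on coefficient magnitudes; it does not implement the scan order or the context formation that the modeler performs, and the function name is hypothetical.

```python
def bit_planes(block):
    """Yield (plane_index, bits) for a 2-D list of signed integer
    coefficients, starting with the most significant magnitude plane."""
    magnitudes = [[abs(v) for v in row] for row in block]
    # Number of planes needed to represent the largest magnitude.
    top = max(max(row) for row in magnitudes).bit_length()
    for plane in range(top - 1, -1, -1):
        yield plane, [[(m >> plane) & 1 for m in row] for row in magnitudes]
```

For a block such as [[5, -3], [0, 2]], three planes are produced, and the first one carries a single set bit for the coefficient 5.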

Finally, an arithmetic encoder processes the value of the bit and the context. It generates a code stream representing the compressed code block.

This arithmetic encoder also computes distortion metrics, which reflect the image distortion that would be encountered when reconstructing the code block with its currently encoded portion.

Tier 2 – Packet Selection and Reordering
The code stream generated by the arithmetic encoder, together with the distortion metrics, allows JPEG2000 to selectively build the final bitstream at the post-processing stage. This process is driven by two user-defined parameters:

  • Compression ratio – The Tier-2 stage selects incoming packets to attain a user-specified compression ratio. The algorithm rejects packets that do not reduce the image distortion enough to justify their size. This mechanism allows precise control of the compressed file size while maintaining good image quality.
  • Progression order – JPEG2000 allows an initial preview of a picture with the first portion of the bitstream. Decoding subsequent parts of the compressed file progressively refines the image. JPEG2000 also standardizes various refinement orders by prioritizing an image characteristic, such as quality or resolution. The Tier-2 stage achieves the desired progression order by reordering incoming packets.
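
The packet selection idea can be sketched with a simple greedy rule: keep the packets that buy the most distortion reduction per byte until the size budget is spent. This is a deliberately simplified illustration, not the rate-control procedure used by any particular JPEG2000 implementation. It ignores the ordering dependencies between packets, and the tuple layout and function name are assumptions made for the example.

```python
def select_packets(packets, budget_bytes):
    """packets: list of (name, size_bytes, distortion_reduction) tuples.
    Returns (chosen_names, bytes_used) for a greedy selection."""
    # Rank by distortion reduction per byte, best value first.
    ranked = sorted(packets, key=lambda p: p[2] / p[1], reverse=True)
    chosen, used = [], 0
    for name, size, _gain in ranked:
        if used + size <= budget_bytes:  # skip packets that would overshoot
            chosen.append(name)
            used += size
    return chosen, used
```

The byte budget would follow from the requested compression ratio, for example the uncompressed image size divided by that ratio.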

JPEG2000 Implementation
Because of its powerful features, JPEG2000 requires more computational resources than the classic JPEG to achieve similar encoding and decoding speeds.

The features of Xilinx Virtex-II series FPGAs, including fast and numerous RAM blocks and a large amount of logic resources, make them an excellent choice for implementing JPEG2000 solutions. RAM blocks allow the implementation of a large on-chip tile buffer, increasing the core’s overall performance and integration level.

Moreover, thanks to their on-chip IBM™ PowerPC™ 405 cores, Virtex-II Pro devices allow you to construct a complete JPEG2000 system. The IP cores, implemented in the programmable logic, execute the first stage of the JPEG2000 algorithm (wavelet transform, quantization, and entropy encoding). The Tier-2 stage is better suited to a software routine running on the host processor.

Figure 1 shows a block diagram of the Barco Silex BA112JPEG2000E IP core. This illustrates the main functional modules and a simplified view of the interfaces. You can input pixel data through the pixel interface and get data streams at the compressed interfaces together with distortion metrics. The core features a simple, generic CPU interface, suitable for interfacing as a bus peripheral to various processors, including a PowerPC system.

As an example, Virtex-II devices can be used together with the BA112JPEG2000E and BA111JPEG2000D cores to perform picture encoding or decoding at speeds compatible with NTSC (National Television System Committee) requirements. In other words, an image with a resolution of 720 pixels by 480 lines in 4:2:2 color format is processed within 33 ms. The cores require 8,200 slices and 54 RAM blocks for an irreversible (lossy) solution, or 11,500 slices and 66 RAM blocks for reversible (lossless) compression.
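
The NTSC figures translate into a sample rate with a little arithmetic. The sketch below assumes the usual meaning of 4:2:2, in which the two chroma components are subsampled by two horizontally and kept at full vertical resolution; the function names are illustrative.

```python
def samples_per_frame(width, height):
    """Total samples in one 4:2:2 frame: full-resolution luma plus two
    chroma components at half the horizontal resolution."""
    luma = width * height
    chroma = 2 * (width // 2) * height
    return luma + chroma


def required_sample_rate(width, height, frame_time_s):
    """Samples per second the pipeline must sustain to finish one frame
    within frame_time_s seconds."""
    return samples_per_frame(width, height) / frame_time_s
```

For 720 by 480 in 33 ms, this gives 691,200 samples per frame, or roughly 21 million samples per second.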

2-D DWT
The first module of the BA112JPEG2000E core is the DWT engine. This module can be configured to accept tiles of pixels as large as 128 by 128. It performs two-dimensional discrete wavelet decomposition on incoming data with as many as five programmable decomposition levels. The wavelet transform can be programmed as lossy, lossless, or bypassed.
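
To see what the decomposition levels mean for the tile buffer contents, the following sketch lists the subbands produced by a dyadic 2-D decomposition. It assumes a tile whose origin lies at an even coordinate, so the low-pass half takes the extra sample when a dimension is odd. The subband names follow the usual JPEG2000 convention, with HL meaning horizontal high-pass and vertical low-pass, and the function name is illustrative.

```python
def subband_layout(tile_w, tile_h, levels):
    """Return a list of (name, width, height) for every subband produced by
    `levels` dyadic decomposition levels of a tile_w x tile_h tile."""
    bands = []
    w, h = tile_w, tile_h
    for level in range(1, levels + 1):
        low_w, low_h = (w + 1) // 2, (h + 1) // 2   # low-pass halves
        high_w, high_h = w // 2, h // 2             # high-pass halves
        bands.append((f"HL{level}", high_w, low_h))
        bands.append((f"LH{level}", low_w, high_h))
        bands.append((f"HH{level}", high_w, high_h))
        w, h = low_w, low_h  # only the LL band is decomposed further
    bands.append((f"LL{levels}", w, h))
    return bands
```

With the maximum configuration mentioned above, a 128 by 128 tile and five levels, the result is 16 subbands, ending in a 4 by 4 low-frequency band.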

The DWT module accepts incoming pixels with a bit depth of up to 10 bits for lossy operation and up to 12 bits for lossless operation. It stores its results in the on-chip tile buffer to undergo quantization and code block decomposition.

Quantizer
The quantizer fetches the subbands available from the tile buffer and applies a programmable quantization step. Different quantization steps are programmable for each subband. Thus, you can weight lower frequency subbands differently than higher frequency ones.

You can also bypass the quantizer for lossless mode.
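
Per-subband quantization can be sketched as a dead-zone scalar quantizer, the form JPEG2000 uses: the magnitude is divided by the step and truncated, and the sign is kept. The step values and the convention of passing None to bypass the quantizer are assumptions made for this example, not parameters of the core.

```python
def quantize(coeff, step):
    """Dead-zone scalar quantizer: truncate |coeff| / step, keep the sign."""
    q = int(abs(coeff) // step)
    return -q if coeff < 0 else q


def quantize_subband(coeffs, step):
    """Quantize a flat list of coefficients with one step size.
    step=None models the bypassed (lossless) path."""
    if step is None:
        return list(coeffs)
    return [quantize(c, step) for c in coeffs]
```

A coarser step for a high-frequency subband discards more detail there, which is how the different subbands can be weighted.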

Tile Splitter
This unit further divides quantized subbands into rectangular code blocks of a programmable size (as large as 32 by 32 pixels) in preparation for entropy encoding by an arithmetic encoder.

The BA112JPEG2000E and BA111JPEG2000D cores feature a configurable number of entropy encoders, placed in parallel, to sustain high encoding rates. You can select the number of implemented chains during the IP synthesis process. Each entropy chain processes a code block independently of neighboring chains.

The tile splitter module arbitrates between the available chains, dispatching the various code blocks to be encoded. It stores code blocks in local code block buffers.
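
The splitting and dispatching steps can be sketched as follows. The article does not say how the tile splitter arbitrates between chains, so the round-robin policy here is purely an assumption, and the function names are illustrative.

```python
def split_into_code_blocks(width, height, cb_w=32, cb_h=32):
    """Cut a width x height subband into code blocks of at most cb_w x cb_h.
    Returns (x, y, block_width, block_height) tuples in raster order."""
    blocks = []
    for y in range(0, height, cb_h):
        for x in range(0, width, cb_w):
            # Blocks on the right and bottom edges may be smaller.
            blocks.append((x, y, min(cb_w, width - x), min(cb_h, height - y)))
    return blocks


def dispatch(blocks, num_chains):
    """Assign code blocks to entropy chains in round-robin order
    (assumed policy). Returns one queue of blocks per chain."""
    queues = [[] for _ in range(num_chains)]
    for index, block in enumerate(blocks):
        queues[index % num_chains].append(block)
    return queues
```

Because each code block is encoded independently, the queues can be processed in parallel without any communication between chains.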

Modeler and Arithmetic Encoder
The modeler performs the first part of entropy encoding. It examines the code block bit plane by bit plane and extracts relevant bits in zigzag order for each bit plane. The modeler then computes the context information needed by the arithmetic encoder.

The arithmetic encoder processes the bits and contexts, and makes the stream and the distortion metrics available at the compressed interface. These are used by the Tier-2 part of the JPEG2000 algorithm.

Host Interface Module
This module allows the interfacing of the core to a CPU: It contains configuration registers for the various modules and gives status information about the encoding progress.

The host interface module also features a separate command-and-control interface that allows fast control of the IP core with minimal or no CPU intervention. This interface makes it possible to build a system in which a small amount of logic, rather than a directly connected host CPU, drives the BA112JPEG2000E and BA111JPEG2000D cores. This increases the integration flexibility of the IP cores.

Conclusion
Barco Silex’s BA112JPEG2000E and BA111JPEG2000D IP cores target high-speed JPEG2000 encoding and decoding, providing access to the wide range of capabilities now possible with the JPEG2000 standard.

This new standard defines an algorithm that offers a large spectrum of features, such as a progressive bitstream, precise rate control, regions of interest, and high-quality lossy and lossless compression within a single scheme.

This rich set of advantages makes JPEG2000 an important player in the compression world, especially as it is not limited to still picture compression.

The computational complexity of the JPEG2000 standard requires hardware platforms for high-speed applications compatible with real-time video encoding. Barco Silex BA112JPEG2000E and BA111JPEG2000D IP cores, acting as compression accelerators on Xilinx Virtex-II and Virtex-II Pro Platform FPGAs, can give you the speed you need.

For more information about Barco Silex IP cores, visit www.barco-silex.com.

© 1994-2008 Xilinx, Inc. All Rights Reserved.