Barco Silex IP cores enable high-speed picture compression on Virtex-II and Virtex-II Pro FPGAs.
JPEG2000 is the latest algorithm from
the JPEG standardization committee for still
picture compression. Based on wavelet
technology, JPEG2000 is very different
from its predecessor. It has capabilities
that allow it to be adopted in a wide spectrum
of applications, even extending to
video encoding.
As a result, this compression scheme
requires much more computational power
than its classic JPEG predecessor.
Furthermore, software implementations
make poor candidates for applications
requiring very fast encoding times.
Barco Silex has developed two JPEG2000
accelerator IP cores for high-performance
applications: the BA112JPEG2000E
encoder and the BA111JPEG2000D
decoder. These cores handle the computationally
intensive tasks of the JPEG2000
algorithm. Together with a host CPU, these
cores create a complete JPEG2000 encoding
or decoding solution.
Implementing the BA112JPEG2000E
and BA111JPEG2000D cores on Xilinx
Virtex™-II and Virtex-II Pro™ FPGAs
paves the way to high-speed picture encoding
applications with a very flexible architecture
and shortened design time.
The JPEG2000 Algorithm
Although JPEG2000 is based on a single
algorithm, it offers a wide range of tools
for compressing and representing images.
It is suitable for a large spectrum of applications
ranging from Internet streaming to
medical imaging to digital cameras.
JPEG2000 encompasses capabilities
traditionally encountered in separate algorithms,
such as:
- Lossy and lossless compression with
excellent performance
- Precise compression ratio control with
single-pass processing
- Bitstream progressivity, allowing consistent
image previews with partial bitstream
decoding
- Support for user-defined regions of
interest in the image, encoded with
higher quality than the other areas
- Error resilience.
Moreover, JPEG2000 compression
quality outperforms the classical JPEG
scheme for high compression ratios by generating
fewer and less visible artifacts.
JPEG2000 compression requires two consecutive
processing stages, explained in the following
sections.
Tier 1 – DWT-Based Compression
The JPEG2000 algorithm divides the
image into rectangular tiles of a configurable
size.
Each tile undergoes a two-dimensional
discrete wavelet transform (DWT), which
reorders the tile’s frequency information
into a series of pictures (subbands). A subband
results from the filtering of the original
tile for a given frequency range.
A parameter of the transform (the number
of decomposition levels) defines the
number of frequency intervals.
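To make the role of the decomposition-level parameter concrete, the following minimal sketch lists the subbands produced by a dyadic 2-D wavelet decomposition. It assumes the usual scheme in which each level splits the previous low-pass band into four parts; the function name and the subband labels are illustrative and are not taken from the cores' documentation.

```python
def subband_layout(tile_w, tile_h, levels):
    """Return (name, width, height) for each subband of a dyadic 2-D DWT."""
    bands = []
    w, h = tile_w, tile_h
    for level in range(1, levels + 1):
        # The low-pass half gets the ceiling, the high-pass half the floor.
        low_w, low_h = (w + 1) // 2, (h + 1) // 2
        high_w, high_h = w // 2, h // 2
        bands.append((f"HL{level}", high_w, low_h))   # horizontal detail
        bands.append((f"LH{level}", low_w, high_h))   # vertical detail
        bands.append((f"HH{level}", high_w, high_h))  # diagonal detail
        # Only the low-pass band is decomposed again at the next level.
        w, h = low_w, low_h
    bands.append((f"LL{levels}", w, h))
    return bands


layout = subband_layout(128, 128, 5)
print(len(layout))  # 3 * levels + 1 subbands
```

Under these assumptions, each additional level adds three detail subbands and halves the remaining low-pass band in both directions.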
Each subband can then undergo selective
quantization by a programmable factor for lossy compression. Bypassing the quantization
yields lossless operation.
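The quantization step can be illustrated with a dead-zone scalar quantizer, the form commonly associated with JPEG2000. This is a sketch under that assumption; the article does not state how the cores implement quantization, and the function names and the reconstruction bias of 0.5 are illustrative choices.

```python
def quantize(coeff, step):
    """Dead-zone scalar quantizer: sign(c) * floor(|c| / step)."""
    sign = -1 if coeff < 0 else 1
    return sign * int(abs(coeff) // step)


def dequantize(index, step, bias=0.5):
    """Reconstruct a value inside the interval that produced the index."""
    if index == 0:
        return 0.0
    sign = -1 if index < 0 else 1
    return sign * (abs(index) + bias) * step


q = quantize(37.0, 8.0)
print(q, dequantize(q, 8.0))
```

A larger step discards more precision and yields smaller indices to encode, which is why the step size acts as the lossy compression factor.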
The algorithm further divides the
resulting quantized subbands into smaller
rectangular blocks (code blocks).
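The division into code blocks can be sketched as a simple tiling of each subband. The function below is a simplified illustration: it anchors the grid at the subband's top-left corner and uses a 32 by 32 default, whereas a full JPEG2000 implementation also takes precinct and canvas alignment into account.

```python
def split_into_code_blocks(width, height, block_w=32, block_h=32):
    """Return (x, y, w, h) rectangles that cover a width x height subband."""
    blocks = []
    for y in range(0, height, block_h):
        for x in range(0, width, block_w):
            # Blocks on the right and bottom edges may be smaller.
            w = min(block_w, width - x)
            h = min(block_h, height - y)
            blocks.append((x, y, w, h))
    return blocks


for block in split_into_code_blocks(64, 40):
    print(block)
```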
A modeler examines the bit planes of
the current code block, beginning with the
most significant one. In each plane, it
scans the bits in a zigzag order and determines
their context by using information
such as the predominant value of the surrounding
bits.
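The bit-plane view used by the modeler can be illustrated as follows. This sketch only extracts the planes, most significant first, and counts significant neighbors as a stand-in for context formation. The real context rules distinguish horizontal, vertical, and diagonal neighbors and use several coding passes, so the helper names and the single neighbor count are simplifying assumptions.

```python
def bit_planes(magnitudes, num_planes):
    """Yield (plane_index, bits), starting with the most significant plane."""
    for p in range(num_planes - 1, -1, -1):
        yield p, [[(v >> p) & 1 for v in row] for row in magnitudes]


def significant_neighbors(sig, r, c):
    """Count set entries among the eight neighbors of position (r, c)."""
    count = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr, dc) == (0, 0):
                continue
            rr, cc = r + dr, c + dc
            # Positions outside the code block contribute nothing.
            if 0 <= rr < len(sig) and 0 <= cc < len(sig[0]):
                count += sig[rr][cc]
    return count


block = [[5, 0], [2, 7]]  # coefficient magnitudes of a tiny code block
for plane, bits in bit_planes(block, 3):
    print(plane, bits)
```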
Finally, an arithmetic encoder processes
the value of the bit and the context. It generates
a code stream representing the compressed
code block.
This arithmetic encoder also computes
distortion metrics, which reflect the image
distortion that would be encountered when
reconstructing the code block with its currently
encoded portion.
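One way to picture such distortion metrics is to measure the error that remains when only the most significant bit planes of a code block are kept. The sketch below uses the sum of squared errors; the article does not specify which metric the cores compute, so both the metric and the function names are assumptions made for illustration.

```python
def truncate_to_planes(value, total_planes, kept_planes):
    """Keep only the top kept_planes bits of a magnitude."""
    drop = total_planes - kept_planes
    return (value >> drop) << drop


def distortion_after_each_plane(magnitudes, total_planes):
    """Sum of squared errors after decoding 0, 1, ..., total_planes planes."""
    results = []
    for kept in range(total_planes + 1):
        sse = sum(
            (v - truncate_to_planes(v, total_planes, kept)) ** 2
            for v in magnitudes
        )
        results.append(sse)
    return results


print(distortion_after_each_plane([5, 2, 7], 3))
```

The error falls with every additional plane, and the size of each drop is the information a later stage needs in order to decide which portions of the code stream are worth keeping.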
Tier 2 – Packet Selection and Reordering
The code stream generated by the arithmetic
encoder, together with the distortion
metrics, allows JPEG2000 to selectively
build the final bitstream at the post-processing
stage. This process is driven by two
user-defined parameters:
- Compression ratio – The Tier-2 stage
selects incoming packets to attain a
user-specified compression ratio. The
algorithm rejects packets that do not
sufficiently reduce the image
distortion. This mechanism allows precise
control of the compressed
file size while maintaining
good image quality.
- Progression order – JPEG2000 allows
an initial preview of a picture with the
first portion of the bitstream. Decoding
subsequent parts of the compressed file
progressively refines the image.
JPEG2000 also standardizes various
refinement orders by prioritizing an
image characteristic, such as quality or
resolution. The Tier-2 stage achieves
the desired progression order by
reordering incoming packets.
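The two Tier-2 parameters above can be sketched in a few lines. The selection shown here is a greedy stand-in that ranks packets by distortion reduction per byte and ignores dependencies between packets; it is not the rate-control procedure of the standard or of any particular host software. The Packet fields and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    resolution: int         # 0 = coarsest resolution level
    layer: int              # 0 = first quality layer
    size_bytes: int         # assumed to be positive
    distortion_gain: float  # distortion removed if the packet is kept


def select_packets(packets, byte_budget):
    """Greedily keep the packets with the best gain per byte within budget."""
    ranked = sorted(
        packets, key=lambda p: p.distortion_gain / p.size_bytes, reverse=True
    )
    chosen, used = [], 0
    for p in ranked:
        if used + p.size_bytes <= byte_budget:
            chosen.append(p)
            used += p.size_bytes
    return chosen


def order_packets(packets, progression="quality"):
    """Reorder packets so that the named characteristic refines first."""
    keys = {
        "quality": lambda p: (p.layer, p.resolution),
        "resolution": lambda p: (p.resolution, p.layer),
    }
    if progression not in keys:
        raise ValueError(f"unknown progression: {progression}")
    return sorted(packets, key=keys[progression])


packets = [
    Packet(0, 0, 100, 900.0),
    Packet(1, 0, 200, 800.0),
    Packet(0, 1, 100, 100.0),
    Packet(1, 1, 200, 50.0),
]
kept = select_packets(packets, byte_budget=400)
for p in order_packets(kept, "resolution"):
    print(p)
```

The byte budget follows directly from the requested compression ratio: the uncompressed image size divided by the ratio.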
JPEG2000 Implementation
Because of its powerful features, JPEG2000
requires more computational resources than
the classic JPEG to achieve similar encoding
and decoding speeds.
Xilinx Virtex-II series FPGAs offer
numerous fast RAM blocks and a
large amount of logic resources, which
make them an excellent choice for
implementing JPEG2000 solutions.
RAM blocks allow the implementation
of a large on-chip tile buffer, increasing
the core’s overall performance and integration
level.
Moreover, thanks to their on-chip
IBM™ PowerPC™ 405 cores, Virtex-II Pro devices allow you to construct a complete
JPEG2000 system. Indeed, the IP
cores, implemented in the programmable
logic, will execute the first stage of the
JPEG2000 algorithm (wavelet transform,
quantization, and entropy encoding). The
Tier-2 stage is better suited to a
software routine running on the host
processor.
Figure 1 shows a block diagram of the
Barco Silex BA112JPEG2000E IP core.
This illustrates the main functional modules
and a simplified view of the interfaces.
You can input pixel data through the
pixel interface and get data streams at the
compressed interfaces together with distortion
metrics. The core features a simple,
generic CPU interface, suitable for interfacing
as a bus peripheral to various processors,
including a PowerPC system.
As an example, Virtex-II devices can be
used together with the BA112JPEG2000E
and BA111JPEG2000D cores to perform
picture encoding or decoding with timings
compatible with NTSC (National
Television System Committee) requirements.
In other words, each image of 720 pixels
by 480 lines in 4:2:2 color format is
processed within 33 ms. The cores
require 8,200 slices and 54 RAM blocks for
an irreversible (lossy) solution, or 11,500
slices and 66 RAM blocks for reversible
(lossless) compression.
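The NTSC figures translate into a sample rate that the cores must sustain. The short calculation below works this out. It assumes that 4:2:2 means two chroma components at half the horizontal resolution, and it uses 128 by 128 tiles (the maximum tile size the core accepts) purely to estimate a tile count; the actual tile size for this configuration is not stated.

```python
import math

WIDTH, HEIGHT = 720, 480
FRAME_TIME_S = 0.033  # 33 ms per image
TILE = 128            # assumed tile edge, the core's stated maximum

luma_samples = WIDTH * HEIGHT
# 4:2:2: Cb and Cr each have half the horizontal resolution of luma.
chroma_samples = 2 * (WIDTH // 2) * HEIGHT
samples_per_frame = luma_samples + chroma_samples
samples_per_second = samples_per_frame / FRAME_TIME_S
tiles_per_luma_plane = math.ceil(WIDTH / TILE) * math.ceil(HEIGHT / TILE)

print(samples_per_frame)                    # 691200
print(round(samples_per_second / 1e6, 1))   # about 20.9 million samples/s
print(tiles_per_luma_plane)                 # 24
```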
2-D DWT
The first module of the BA112JPEG2000E
core is the DWT engine. This module can
be configured to accept tiles of pixels as
large as 128 by 128. It performs two-dimensional
discrete wavelet decomposition
on incoming data with as many as five
programmable decomposition levels. The
wavelet transform can be programmed as
lossy, lossless, or bypassed.
The DWT module accepts incoming
pixels of any bit depth up to 10 bits for lossy
operation and up to 12 bits for lossless
operation. It stores its results
in the on-chip tile buffer to undergo quantization
and code block decomposition.
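The lossless mode relies on an integer wavelet transform that can be undone exactly. As an illustration, the sketch below implements one level of the reversible 5/3 lifting transform on a one-dimensional, even-length signal with mirrored borders; a 2-D transform applies it along rows and then along columns. This is a software sketch of the well-known lifting scheme and makes no claim about how the DWT engine is built internally.

```python
def dwt53_forward(x):
    """One level of reversible 5/3 lifting on an even-length list of ints."""
    n = len(x)
    assert n >= 2 and n % 2 == 0
    half = n // 2
    # Predict step: detail (high-pass) coefficients from odd samples.
    d = []
    for i in range(half):
        right = x[2 * i + 2] if 2 * i + 2 < n else x[n - 2]  # mirrored edge
        d.append(x[2 * i + 1] - ((x[2 * i] + right) >> 1))
    # Update step: smooth (low-pass) coefficients from even samples.
    s = []
    for i in range(half):
        prev = d[i - 1] if i > 0 else d[0]  # mirrored edge
        s.append(x[2 * i] + ((prev + d[i] + 2) >> 2))
    return s, d


def dwt53_inverse(s, d):
    """Undo dwt53_forward exactly by running the lifting steps backwards."""
    half = len(s)
    n = 2 * half
    x = [0] * n
    for i in range(half):
        prev = d[i - 1] if i > 0 else d[0]
        x[2 * i] = s[i] - ((prev + d[i] + 2) >> 2)
    for i in range(half):
        right = x[2 * i + 2] if 2 * i + 2 < n else x[n - 2]
        x[2 * i + 1] = d[i] + ((x[2 * i] + right) >> 1)
    return x


row = [10, 12, 14, 200, 3, 0, 7, 7]
low, high = dwt53_forward(row)
print(dwt53_inverse(low, high) == row)  # True
```

Because every step uses only integer additions and shifts, the inverse recovers the input bit for bit, which is what makes a lossless path possible when the quantizer is bypassed.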
Quantizer
The quantizer fetches the subbands available
from the tile buffer and applies a programmable
quantization step. Different
quantization steps are programmable for
each subband. Thus, you can weight lower
frequency subbands differently than higher
frequency ones.
You can also bypass the quantizer for
lossless mode.
Tile Splitter
This unit further divides quantized subbands
into rectangular code blocks of a
programmable size (as large as 32 by 32
pixels) in preparation for entropy encoding
by an arithmetic encoder.
The BA112JPEG2000E and
BA111JPEG2000D cores feature a configurable
number of entropy encoders,
placed in parallel, to sustain high encoding
rates. You can select the number of
implemented chains during the IP synthesis
process. Each entropy chain
processes a code block independently of
neighboring chains.
The tile splitter module arbitrates
between the available chains, dispatching the
various code blocks to be encoded. It stores
code blocks in local code block buffers.
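The benefit of several parallel entropy chains can be illustrated with a small scheduling model. The arbitration policy of the tile splitter is not described in this article, so the sketch below assumes the simplest plausible rule: each code block goes to the chain that becomes free first. The cost values and the function name are hypothetical.

```python
import heapq


def dispatch(code_block_costs, num_chains):
    """Assign each code block to the chain that becomes free first."""
    # Heap entries are (time_when_free, chain_id).
    chains = [(0, c) for c in range(num_chains)]
    heapq.heapify(chains)
    assignment = []
    for cost in code_block_costs:
        free_at, chain = heapq.heappop(chains)
        assignment.append(chain)
        heapq.heappush(chains, (free_at + cost, chain))
    finish_time = max(t for t, _ in chains)
    return assignment, finish_time


costs = [4, 1, 1, 1, 1]  # hypothetical encoding times per code block
print(dispatch(costs, 1))
print(dispatch(costs, 2))
```

In this model, doubling the number of chains halves the total encoding time for the sample workload, because the code blocks are processed independently.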
Modeler and Arithmetic Encoder
The modeler performs the first part of
entropy encoding. It examines the code
block bit plane by bit plane and extracts
relevant bits in zigzag order for each bit
plane. The modeler then computes the
context information needed by the arithmetic
encoder.
The arithmetic encoder processes the
bits and contexts, and makes the stream
and the distortion metrics available at the
compressed interface. These are used by the
Tier-2 part of the JPEG2000 algorithm.
Host Interface Module
This module interfaces the
core to a CPU. It contains configuration registers
for the various modules and provides status
information about the encoding progress.
The host interface module also features
a separate command-and-control
interface that allows fast control of the IP
core with minimal or no CPU intervention.
Hence, the command-and-control
interface makes it possible to build a system
in which a small amount of logic
drives the BA112JPEG2000E and
BA111JPEG2000D cores, which are not
directly connected to a host CPU. This
increases the integration flexibility of the
IP cores.
Conclusion
Barco Silex’s BA112JPEG2000E and
BA111JPEG2000D IP cores are targeted
to high-speed JPEG2000 encoding and
decoding, providing access to the wide
range of capabilities now possible with the
JPEG2000 standard.
This new standard defines an algorithm
that offers a large spectrum of features,
such as progressive bitstreams, precise rate
control, regions of interest, and high-quality
combined lossy and lossless compression.
This rich set of advantages makes
JPEG2000 an important player in the
compression world, especially as it is not
limited to still picture compression.
The computational complexity of the
JPEG2000 standard requires hardware
platforms for high-speed applications
compatible with real-time video encoding.
Barco Silex BA112JPEG2000E and
BA111JPEG2000D IP cores, acting as
compression accelerators on Xilinx Virtex-II and Virtex-II Pro Platform FPGAs, can
give you the speed you need.
For more information about Barco Silex
IP cores, visit www.barco-silex.com.