So Prove It
To demonstrate the capabilities of software-compiled
system design, Celoxica and Xilinx
partnered to undertake the co-design of a
JPEG 2000 codec (compressor/decompressor)
for implementation in a Virtex-II Pro
FPGA. In particular, we wanted to address
the design challenge of system partitioning,
co-verification, and the easy integration of
hardware and software.
JPEG 2000 is a standards-based image
coding system that uses state-of-the-art
compression techniques based on wavelet
technology (Figure 2). Its architecture
lends itself to a range of uses from consumer
electronics, such as digital cameras,
to medical imaging, remote sensing, surveillance
systems, and scanners.
Project Specifications
- Maximize overall system performance
- Innovate and differentiate from the
competition
- Demonstrate an efficient and effective
co-design environment
- Use software specification as a starting
point for system design
- Improve communication between
hardware and software design teams
- Simplify partitioning and migration of
code between software and hardware
for better overall Quality of Design
(QoD)
- Demonstrate a complete
system verification flow
- Deliver competitive Quality
of Results (QoR)
- Exceed time-to-market
expectations
- Support design re-use
strategies
- Maximize current EDA
and IP investments
Project Plan
- Phase 1: Profile and verify
- Phase 2: Partition and verify
- Phase 3: Design and verify
- Phase 4: Implement and verify
Tools
- Celoxica Nexus PDK
- Celoxica DK Design Suite
- Wind River Xilinx Edition
toolset
- Xilinx ISE 5.1i
Phase 1: Profile and Verify
A multitude of applications can benefit
from hardware acceleration and product
flexibility. To demonstrate this, we selected
JPEG 2000; our starting point for the
design was the C specification code.
To drive our system verification flow, we
ran the specification code on an
appropriate target – in this instance the
IBM PowerPC™ 405GP. From this we
simulated and verified the functionality of
our system and established a test bench
that remained constant and consistent
throughout the design.
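To make the idea of a constant test bench concrete, here is a minimal sketch in C. It assumes the reference code and each candidate implementation share one function signature; the type transform_fn, the function first_mismatch, and the toy doubling functions are all hypothetical names for illustration and do not come from the project.

```c
#include <stddef.h>

#define MAX_SAMPLES 256

/* Hypothetical signature shared by the reference code and any candidate. */
typedef void (*transform_fn)(const int *in, int *out, size_t n);

/* Run both implementations on the same stimulus.
   Returns -1 if the outputs agree, the index of the first mismatch
   otherwise, or -2 if the stimulus is too large for this sketch. */
long first_mismatch(transform_fn reference, transform_fn candidate,
                    const int *stimulus, size_t n)
{
    int expected[MAX_SAMPLES];
    int actual[MAX_SAMPLES];
    size_t i;

    if (n > MAX_SAMPLES)
        return -2;

    reference(stimulus, expected, n);
    candidate(stimulus, actual, n);

    for (i = 0; i < n; i++) {
        if (expected[i] != actual[i])
            return (long)i;
    }
    return -1;
}

/* Toy "specification": double every sample. */
void double_by_multiply(const int *in, int *out, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        out[i] = in[i] * 2;
}

/* Toy candidate that is functionally equivalent. */
void double_by_add(const int *in, int *out, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        out[i] = in[i] + in[i];
}

/* Toy candidate with a deliberate difference: results clamp at 100. */
void double_saturating(const int *in, int *out, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        int v = in[i] * 2;
        out[i] = (v > 100) ? 100 : v;
    }
}
```

Because the stimulus and the reference never change, any later rewrite or repartition can be checked with the same call.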
We then began code profiling to establish
where our program spent its time and
determine which functions called other
functions during execution. We found
profiling was useful, as it quickly identified
the functions in a program that were
processor hungry or compute intensive.
That made them possible candidates for
offload into the FPGA fabric. However,
profiling does not analyze the dataflow
between hardware and software, nor burst
length or frequency, so designer intervention
is mandatory to understand the dynamics of
the hardware/software interaction.
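The bookkeeping behind this kind of profiling can be sketched in a few lines of C. This is an illustrative assumption, not the profiler used in the project: the type profile_t and the profile_* functions are hypothetical, and tick counts are passed in by the caller rather than read from a timer.

```c
#include <stddef.h>

#define MAX_FUNCS 8

/* Hypothetical per-function time accumulator. */
typedef struct {
    const char *name[MAX_FUNCS];
    unsigned long ticks[MAX_FUNCS];
    size_t count;
} profile_t;

void profile_init(profile_t *p)
{
    p->count = 0;
}

/* Register a function name; returns its id, or -1 if the table is full. */
int profile_register(profile_t *p, const char *name)
{
    if (p->count >= MAX_FUNCS)
        return -1;
    p->name[p->count] = name;
    p->ticks[p->count] = 0;
    return (int)p->count++;
}

/* Charge elapsed ticks to a registered function. */
void profile_add(profile_t *p, int id, unsigned long ticks)
{
    if (id >= 0 && (size_t)id < p->count)
        p->ticks[id] += ticks;
}

/* Id of the function with the most accumulated ticks, or -1 if empty. */
int profile_hottest(const profile_t *p)
{
    size_t i;
    int best = -1;
    for (i = 0; i < p->count; i++) {
        if (best < 0 || p->ticks[i] > p->ticks[best])
            best = (int)i;
    }
    return best;
}

/* Fraction of all recorded ticks spent in one function (0.0 if none). */
double profile_share(const profile_t *p, int id)
{
    unsigned long total = 0;
    size_t i;
    if (id < 0 || (size_t)id >= p->count)
        return 0.0;
    for (i = 0; i < p->count; i++)
        total += p->ticks[i];
    return total ? (double)p->ticks[id] / (double)total : 0.0;
}
```

A table like this ranks candidates for offload, but as noted above it says nothing about how much data would have to cross the hardware/software boundary.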
Using Wind River’s WindView™ visualization
and diagnostic tool, we determined
that the DWT (discrete wavelet transform)
and Tier 1 encoder were the processor-intensive
functions, consuming 87% of processing
time (Figure 3). We selected them for further
scrutiny and tradeoff analysis.
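A quick way to see what such a profile implies is Amdahl's law, which bounds the overall speedup when only part of a program is accelerated. The sketch below is an illustration added here, not part of the original analysis. It assumes the 87% figure is a fraction of total run time and ignores the cost of moving data between processor and FPGA; the function names are hypothetical.

```c
#include <assert.h>

/* Overall speedup when a fraction of run time is accelerated by a factor
   accel (Amdahl's law). Communication overhead is not modelled. */
double amdahl_speedup(double fraction, double accel)
{
    assert(fraction >= 0.0 && fraction < 1.0);
    assert(accel > 0.0);
    return 1.0 / ((1.0 - fraction) + fraction / accel);
}

/* Upper bound on speedup as the accelerated part approaches zero time. */
double amdahl_limit(double fraction)
{
    assert(fraction >= 0.0 && fraction < 1.0);
    return 1.0 / (1.0 - fraction);
}
```

With fraction = 0.87, amdahl_limit gives roughly 7.7, so under these assumptions no amount of hardware acceleration of these two functions alone could speed the whole codec up by more than about 7.7 times.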
Phase 2: Partition and Verify
Validating the system partition
against the requirements of the
design specification is essential in
programmable system design.
Typically the system designer
maps a system-level architecture
into specific hardware and software
components, making direct
reference to the project specification
and factors such as component
availability, cost, and
technical feasibility.
The consequences of the system
partition cascade through the
design flow to physical implementation,
and final system performance
depends heavily on
partitioning decisions. It makes
little sense to invest time, money,
and effort optimizing and refining
an incorrect partition – it is inherently
sub-optimal.
Uniquely, software-compiled
system design provides the designer
with a flexible partitioning capability
that permits partition and repartition
at any stage in the design
process. Moreover, it is linked to a
verification flow that enables the
designer to confidently explore and innovate
in the design space, analyzing hardware/software
tradeoffs, and identifying the optimal
system partition for the best QoD.
Facilitating this is the data streaming
manager (DSM), a portable co-design API
(Figure 4) supplied with Celoxica’s DK
Design Suite. Developed specifically for
programmable system partitioning and
hardware/software integration, the DSM
allows the designer to iteratively explore,
test, and verify multiple partition alternatives.
The designer can quickly create and
easily move ports that are used to send data
between the software and hardware by
using the API standard (Figures 5 and 6).
As each option is explored, the designer can
verify the partitioning with the software
used as a test bench throughout the project.
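The port concept can be illustrated with a small C sketch. This is not the DSM API, which is not shown in this article; port_t, port_open, port_write, and port_read are hypothetical names, and the port is modelled as a simple in-memory ring buffer. The point is only that producer and consumer see a port, not each other, so either side could move across the hardware/software boundary without changing the other.

```c
#include <stddef.h>

#define PORT_CAPACITY 16

/* Hypothetical streaming port, modelled as a ring buffer of words. */
typedef struct {
    int data[PORT_CAPACITY];
    size_t head;   /* index of the next word to read */
    size_t count;  /* number of words currently queued */
} port_t;

void port_open(port_t *p)
{
    p->head = 0;
    p->count = 0;
}

/* Queue up to n words; returns how many were accepted. */
size_t port_write(port_t *p, const int *src, size_t n)
{
    size_t written = 0;
    while (written < n && p->count < PORT_CAPACITY) {
        size_t tail = (p->head + p->count) % PORT_CAPACITY;
        p->data[tail] = src[written++];
        p->count++;
    }
    return written;
}

/* Dequeue up to n words in FIFO order; returns how many were read. */
size_t port_read(port_t *p, int *dst, size_t n)
{
    size_t got = 0;
    while (got < n && p->count > 0) {
        dst[got++] = p->data[p->head];
        p->head = (p->head + 1) % PORT_CAPACITY;
        p->count--;
    }
    return got;
}
```

Counting the words and calls that pass through such a port is also one simple way to observe burst length and frequency for a given partition.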
In our project, the DSM validated the
profiling information determined in Phase
1 of our design flow. It helped us analyze the
data flow and the burst length and frequency
between our hardware and software,
and fine-tune the partition to meet
the project’s criteria. Moreover, the DSM’s
inherent portability meant the design
could be repartitioned at any stage in the
design flow, redefining the system architecture
and easily accommodating late
specification changes.
Phase 3: Design and Verify
With the optimal partition determined
and verified, we began the
design optimization phase of our
project. Software-compiled system
design makes use of HLLs for both
hardware and software design.
HLLs allow the system specification
to be written in a form that
both the hardware and software
teams can immediately use – without
costly and time-consuming
rewrites. Additionally, HLLs simplify
the migration of code between
hardware and software. Because
there is a common language base
and common level of abstraction
between the hardware and software,
there is improved communication
and shared understanding between
the development teams.
As we did in our partitioning
phase, we used the DSM throughout
the design optimization phase.
The DSM provided a functionally
accurate simulation environment
that allowed our hardware and software
to interact – keeping them
connected throughout design optimization
(Figure 7).
The software was run as a native executable
on the PPC405GP, and the
hardware was run using the simulation
and debugging capability of Celoxica’s
DK Design Suite. We used a utility program
to monitor the data passing
between the applications to assist with
debugging. Because all of the API functions
were provided, complete
system development could begin
before the development platform was
available. Once it was working, the
application was easily transferred to
the target platform for final testing. Co-simulation
between hardware and software
was made possible by connecting
DK with the Tornado™ environment
from Wind River (Figure 8).
As our system specification was
described in ANSI-C, we progressed our
design in ANSI-C and used hardware language
extensions defined in Handel-C to
describe our hardware. These hardware
extensions enable, for example, efficient
control over area, timing, clocks, RAMs,
ROMs, and interfaces.
We optimized the software by combining
multiple DSM calls, and
we applied hardware optimization techniques
such as increasing parallelism,
replacing for() loops with while() loops,
pipelining, and syntax duplication.
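To show why pipelining is worth the effort, the following C sketch compares cycle counts for an idealized case. It assumes every stage takes exactly one clock cycle and the pipeline never stalls; these assumptions and the function names are illustrative and are not measurements from the project.

```c
/* Cycles to process all items when each item passes through every stage
   before the next item starts (no overlap). */
unsigned long cycles_sequential(unsigned long items, unsigned long stages)
{
    return items * stages;
}

/* Cycles for an ideal pipeline: the first result appears after `stages`
   cycles, then one further result completes on every cycle. */
unsigned long cycles_pipelined(unsigned long items, unsigned long stages)
{
    if (items == 0 || stages == 0)
        return 0;
    return items + stages - 1;
}
```

For long data streams the pipelined count approaches one cycle per item, which is why the technique suits streaming image data.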
Specification Change
At this stage in the design, a specification
change was introduced. A novel lifting
algorithm was developed that performs a
two-dimensional DWT and thus provides
faster processing time. The algorithm was
readily available as an HDL IP block, and
to save design time
and maximize our IP investment, we decided to integrate
the IP into the design as a black
box. The integration was simplified by
using the interface declaration available
in Handel-C for connecting third-party
IP into a software-compiled system
design flow (Figure 9).
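For readers unfamiliar with lifting, the sketch below shows one level of the one-dimensional reversible 5/3 wavelet transform used in JPEG 2000, written as predict and update lifting steps in plain C. It is an illustration only: it is not the two-dimensional algorithm in the IP block described above, it assumes an even number of samples, and the function names are hypothetical.

```c
#include <assert.h>

/* Floor division for a positive divisor (C's "/" truncates toward zero). */
int floor_div(int a, int b)
{
    int q = a / b;
    if (a % b < 0)
        q--;
    return q;
}

/* Forward 5/3 lifting on n samples (n even, n >= 2).
   s receives n/2 low-pass values, d receives n/2 high-pass values. */
void dwt53_forward(const int *x, int n, int *s, int *d)
{
    int half = n / 2;
    int i;

    assert(n >= 2 && n % 2 == 0);

    /* Predict: odd sample minus the floored mean of its even neighbours. */
    for (i = 0; i < half; i++) {
        int left = x[2 * i];
        int right = (2 * i + 2 < n) ? x[2 * i + 2] : x[2 * i]; /* mirror */
        d[i] = x[2 * i + 1] - floor_div(left + right, 2);
    }

    /* Update: even sample plus a rounded quarter of neighbouring details. */
    for (i = 0; i < half; i++) {
        int prev = (i > 0) ? d[i - 1] : d[0]; /* mirror */
        s[i] = x[2 * i] + floor_div(prev + d[i] + 2, 4);
    }
}

/* Inverse transform: runs the same steps in reverse order with the signs
   flipped, so the original samples are recovered exactly. */
void dwt53_inverse(const int *s, const int *d, int n, int *x)
{
    int half = n / 2;
    int i;

    assert(n >= 2 && n % 2 == 0);

    /* Undo update: recover the even samples. */
    for (i = 0; i < half; i++) {
        int prev = (i > 0) ? d[i - 1] : d[0];
        x[2 * i] = s[i] - floor_div(prev + d[i] + 2, 4);
    }

    /* Undo predict: recover the odd samples. */
    for (i = 0; i < half; i++) {
        int left = x[2 * i];
        int right = (2 * i + 2 < n) ? x[2 * i + 2] : x[2 * i];
        x[2 * i + 1] = d[i] + floor_div(left + right, 2);
    }
}
```

Each lifting step uses only additions, shifts-by-constant divisions, and neighbouring samples, which is part of why lifting formulations map well onto hardware.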
Phase 4: Implement and Verify
Implementation to the target platform was
simplified by using the platform abstraction
layer (PAL). The PAL shields designers from low-level hardware interfaces, easing the
integration of FPGAs with physical
resources. This is done by developing a
library of low-level interfaces to specific platform
resources, such as I/O or memory. This
library, called the platform support library
(PSL), is then accessed by the hardware
application on the FPGA using a simple and
consistent application programming interface
– the PAL API (Figure 10).
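The layering idea can be sketched in ordinary C with a table of function pointers. This is a software analogy under stated assumptions, not the PAL API or a PSL: platform_t, sim_ram_t, and every function below are hypothetical names, and the "platform" is just an array standing in for a memory resource.

```c
#include <stddef.h>

#define SIM_RAM_WORDS 64

/* Hypothetical platform interface: the application sees only this. */
typedef struct {
    void (*ram_write)(void *ctx, size_t addr, int value);
    int (*ram_read)(void *ctx, size_t addr);
    void *ctx; /* backend-specific state */
} platform_t;

/* One possible backend: a simulated RAM held in an array. */
typedef struct {
    int words[SIM_RAM_WORDS];
} sim_ram_t;

void sim_write(void *ctx, size_t addr, int value)
{
    ((sim_ram_t *)ctx)->words[addr % SIM_RAM_WORDS] = value;
}

int sim_read(void *ctx, size_t addr)
{
    return ((sim_ram_t *)ctx)->words[addr % SIM_RAM_WORDS];
}

/* Bind the simulated backend to the generic interface. */
platform_t sim_platform(sim_ram_t *ram)
{
    platform_t p;
    p.ram_write = sim_write;
    p.ram_read = sim_read;
    p.ctx = ram;
    return p;
}

/* Application code: written once against platform_t, unaware of backend. */
void fill_ramp(const platform_t *p, size_t base, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        p->ram_write(p->ctx, base + i, (int)i);
}

int sum_block(const platform_t *p, size_t base, size_t n)
{
    size_t i;
    int sum = 0;
    for (i = 0; i < n; i++)
        sum += p->ram_read(p->ctx, base + i);
    return sum;
}
```

Retargeting then means supplying a different backend for the same interface while fill_ramp and sum_block stay unchanged.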
The target platform was a Wind River
SBC405GP (single board computer reference
design) with a Proteus FPGA daughter
card, effectively a Virtex-II Pro
prototyping platform (Figure 11). This
development environment supported timing
simulation, emulation, and block optimization,
and it was used prior to final
implementation in a Virtex-II Pro ML300
Evaluation Platform (Figure 12).
Object code was compiled for the
PPC405GP under Wind River’s
VxWorks™ RTOS, with the hardware
implementation using the direct EDIF output
generated by Celoxica’s DK Design
Suite. This EDIF netlist was optimized for
the Virtex-II Pro Platform FPGA (Figure
13), ensuring maximum efficiency for best
QoR. Optionally, the DK Design Suite can
also output RTL-level VHDL or Verilog,
pre-optimized for traditional synthesis tool
flows (Figure 14).
Results
The results (Tables 1 and 2) from the
Celoxica DK Design Suite were compared
with handcrafted VHDL
authored by a JPEG 2000 domain expert.
Handel-C’s systematic and ANSI-C-like
approach to the problem led to substantial
savings in design time. An expert
Handel-C engineer with no prior knowledge
of the JPEG 2000 standard was able
to get the algorithm to a working hardware
implementation in less than half the
time it took to code the VHDL.


The other key success was that
the design easily met the system
timing constraints. The results provide
clear validation that design
abstraction leads to increased designer
productivity without necessarily compromising
performance or area.
Conclusion
Software-compiled system design is a
proven methodology for programmable
system co-design. It provides a solution for
system partitioning, co-verification, and
hardware/software integration across a
spectrum of design styles and applications.
For use by all members of your design
team, from the system architect to the verification
engineer, software-compiled system
design enables code sharing between
hardware, firmware, and software designers
from system specification through to
implementation.
The Celoxica DK Design Suite is a
fully featured development toolset that
enables a complete implementation of a
software-compiled system design. It is
interoperable with popular third-party
tools and languages and provides fast co-simulation
between C/C++, Handel-C,
HDLs, instruction set simulators (ISSs),
and modeling languages such as Open
SystemC™ and MATLAB™.
Software-compiled system design
methodology offers you compelling benefits,
and it is an efficient and effective design
strategy for Xilinx programmable systems.