As larger Xilinx FPGAs become affordable
(thanks to advanced process technologies),
FPGA designers are now asked to create
systems with more complex functionality.
By implementing this functionality in
software, FPGA designers can achieve their
goals quickly and make their designs more
maintainable and reusable.
However, the available resources in
FPGAs are finite. Thus, the demand for
easy-to-use, resource-efficient compact
processors is always strong. Ponderosa
Design microsequencers were developed
with such demands in mind. In this article,
we’ll present our new microsequencer
products and new system debugging tools,
which enable microsequencers to be used
in a wider range of applications.
The scc-32
When we developed our first-generation
microsequencers, the main resources available
in FPGAs were 512-byte block memories with
data interfaces at most 16 bits wide (and no
multipliers). Xilinx® Spartan™-3 and
Virtex™-4 devices offer FPGA designers quite
a different landscape: 2 KB block memories
with data interfaces up to 36 bits wide, along
with 18 x 18-bit multipliers, are now common
resources that you can expect even in
cost-sensitive FPGA projects.
We developed our newest microsequencer,
the scc-32, to fully utilize these
resources and to provide 32-bit data
handling capability. As the name implies,
the scc-32 is a 32-bit controller, but how
does it compare to the MicroBlaze™ soft-core
processor? The scc-32 is not designed
as a generic microcontroller like the
MicroBlaze or PowerPC™ processors.
Instead, our microsequencers are designed to
take on different roles and to work alongside
generic microcontrollers.
Our microsequencers employ a stack
architecture, while today’s generic microcontrollers
employ a register-based architecture.
Stack architecture is suitable for
custom processors for FPGAs because the
core is compact and resource-efficient (an
effective use of block RAM). The program
size is smaller because each instruction is one
byte long. Stack architecture also doesn't use
deep pipelining, which results in predictable
interrupt latency.
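To show why a stack machine can use one-byte instructions, here is a minimal C interpreter. The operands are implicit because they are always the top of the stack, so most instructions need no operand fields. The opcode values and the literal-push encoding are hypothetical and are not the scc-32 encoding. The data width is a single typedef, which also illustrates the point made in the next paragraph.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical one-byte opcodes; this is NOT the scc-32 instruction encoding. */
enum {
    OP_HALT = 0x00,  /* stop and return the top of stack */
    OP_ADD  = 0x01,  /* pop b, pop a, push a + b */
    OP_MUL  = 0x02,  /* pop b, pop a, push a * b */
    OP_DUP  = 0x03   /* push a copy of the top of stack */
    /* 0x80-0xFF: push the 7-bit literal held in the low bits of the opcode */
};

typedef uint32_t word_t;  /* in this sketch, only this line changes for 16 bits */

#define STACK_DEPTH 32

typedef struct {
    word_t stack[STACK_DEPTH];
    int sp;  /* number of entries currently on the data stack */
} vm_t;

static void push(vm_t *vm, word_t v) {
    assert(vm->sp < STACK_DEPTH);  /* data stack overflow */
    vm->stack[vm->sp++] = v;
}

static word_t pop(vm_t *vm) {
    assert(vm->sp > 0);            /* data stack underflow */
    return vm->stack[--vm->sp];
}

/* Executes one-byte instructions until OP_HALT or the end of the program,
   then returns the top of the data stack. Operands are always implicit. */
word_t run(const uint8_t *prog, size_t len) {
    vm_t vm = { .sp = 0 };
    for (size_t pc = 0; pc < len; pc++) {
        uint8_t op = prog[pc];
        if (op & 0x80) {            /* literal push needs no operand byte */
            push(&vm, op & 0x7Fu);
            continue;
        }
        word_t a, b;
        switch (op) {
        case OP_HALT:
            return pop(&vm);
        case OP_ADD:
            b = pop(&vm); a = pop(&vm); push(&vm, a + b);
            break;
        case OP_MUL:
            b = pop(&vm); a = pop(&vm); push(&vm, a * b);
            break;
        case OP_DUP:
            a = pop(&vm); push(&vm, a); push(&vm, a);
            break;
        default:
            assert(0 && "unknown opcode");
        }
    }
    return pop(&vm);
}
```

For example, (3 + 4) * 5 encodes as six bytes: `0x83 0x84 OP_ADD 0x85 OP_MUL OP_HALT`.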
Contrary to common perceptions, supporting
32-bit data types is not difficult,
nor does it consume a lot of resources in
the FPGA. Our microsequencers use a stack
architecture, in which stack operations are
independent of the data width. As for FPGA
resources, all 32-bit data is stored in block
RAM rather than in registers. However, the
arithmetic logic unit (ALU) must be 32 bits
wide.
The scc-32 uses a unified memory architecture
(UMA). It needs three logically independent
memories (data stack, program stack, and
register file) in addition to program memory.
With UMA, these three memories are unified
into a single physical memory: one block RAM
(36 bits wide, 512 words deep) holds a
32-level data stack, a 16-level program
stack, 144 global registers, and 24 auto
registers per function call.
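The C sketch below shows how three logical memories can share one 512-word block RAM by giving each region a base address. The layout is an assumption made for illustration: data stack first, then program stack, then globals, with the rest treated as a pool of 24-word auto-register frames. The article does not describe the real scc-32 address map, which may allocate frames differently.

```c
#include <assert.h>

/* Assumed partition of one 512-word block RAM; the real scc-32 map may differ. */
#define RAM_WORDS      512
#define DSTACK_BASE    0
#define DSTACK_DEPTH   32                            /* data stack levels */
#define PSTACK_BASE    (DSTACK_BASE + DSTACK_DEPTH)
#define PSTACK_DEPTH   16                            /* program stack levels */
#define GLOBAL_BASE    (PSTACK_BASE + PSTACK_DEPTH)
#define GLOBAL_COUNT   144                           /* global registers */
#define AUTO_BASE      (GLOBAL_BASE + GLOBAL_COUNT)  /* start of frame pool */
#define AUTO_PER_FRAME 24                            /* auto registers per call */

typedef enum { DSTACK, PSTACK, GLOBAL, AUTO } region_t;

/* Translates a logical (region, index) pair into a physical block RAM address.
   For AUTO, 'frame' is the call depth owning the frame; otherwise it is ignored. */
unsigned uma_addr(region_t region, unsigned index, unsigned frame) {
    unsigned addr = 0;
    switch (region) {
    case DSTACK: assert(index < DSTACK_DEPTH); addr = DSTACK_BASE + index; break;
    case PSTACK: assert(index < PSTACK_DEPTH); addr = PSTACK_BASE + index; break;
    case GLOBAL: assert(index < GLOBAL_COUNT); addr = GLOBAL_BASE + index; break;
    case AUTO:
        assert(index < AUTO_PER_FRAME);
        addr = AUTO_BASE + frame * AUTO_PER_FRAME + index;
        break;
    }
    assert(addr < RAM_WORDS);  /* every access stays inside the single block RAM */
    return addr;
}
```

Under this assumed layout, the fixed regions use 192 words, leaving 320 words for auto-register frames.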
The scc-32 has a 16-bit (64 KB) program space,
whereas our previous generation has an 11- or
13-bit (2 KB or 8 KB) program space. A larger
program space means that large amounts of
data, such as coefficient tables or message
strings, can be included in a program. Because
our microsequencers have instructions to read
from and write to program memory, program
memory can also serve as extra storage or as a
data space shared with another process.
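The following C model treats program memory as data storage. A 16-bit address selects one of 65,536 bytes, and a 32-bit coefficient is stored and read back with byte accesses. The accessor functions are hypothetical stand-ins for the microsequencer's program-memory read/write instructions. The least-significant-byte-first order is also an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define PMEM_SIZE (1u << 16)  /* a 16-bit program address space covers 65,536 bytes */
static_assert(PMEM_SIZE == 64u * 1024u, "16 address bits select 64 KB");

static uint8_t pmem[PMEM_SIZE];  /* stands in for the program block RAMs */

/* Hypothetical byte accessors. A uint16_t address can never leave the
   64 KB space, mirroring a 16-bit program counter. */
uint8_t pmem_read8(uint16_t addr) { return pmem[addr]; }
void pmem_write8(uint16_t addr, uint8_t v) { pmem[addr] = v; }

/* Stores a 32-bit value, e.g. a filter coefficient, least-significant byte
   first (the byte order is an assumption made for this sketch). */
void pmem_write32(uint16_t addr, uint32_t v) {
    for (int i = 0; i < 4; i++)
        pmem_write8((uint16_t)(addr + i), (uint8_t)(v >> (8 * i)));
}

/* Reassembles a 32-bit value from four consecutive program-memory bytes. */
uint32_t pmem_read32(uint16_t addr) {
    uint32_t v = 0;
    for (int i = 3; i >= 0; i--)
        v = (v << 8) | pmem_read8((uint16_t)(addr + i));
    return v;
}
```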
One important architectural change between
older and newer Xilinx FPGA families is the
absence of internal tri-state buffer (TBUF)
resources in the newer ones. Our older
microsequencers use TBUFs to construct
multiplexers with a large number of inputs.
Because TBUFs are not available in newer
Xilinx FPGA families, the scc-32 is designed
without internal TBUFs and is optimized to
minimize the complexity of such data
multiplexers.
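A common TBUF-free replacement for a shared internal bus is an AND-OR multiplexer. Each source is gated by its own enable, and the gated values are ORed together. The behavioral C model below illustrates that general technique. It is a sketch and not a description of the scc-32's actual multiplexer logic.

```c
#include <assert.h>
#include <stdint.h>

/* Behavioral C model of an AND-OR multiplexer, a common TBUF-free way to
   merge many sources onto one data path. Each source is ANDed with its own
   enable (the job a TBUF enable used to do) and the results are ORed. */
uint32_t and_or_mux(const uint32_t *src, int n, uint32_t onehot_sel) {
    assert(n >= 0 && n <= 32);
    /* Like a tri-state bus, at most one source may be enabled at a time. */
    assert((onehot_sel & (onehot_sel - 1u)) == 0);
    uint32_t out = 0;
    for (int i = 0; i < n; i++) {
        uint32_t enable_mask = ((onehot_sel >> i) & 1u) ? 0xFFFFFFFFu : 0u;
        out |= src[i] & enable_mask;
    }
    return out;  /* 0 when nothing is selected, where a TBUF bus would float */
}
```

Every unselected term contributes zero, so the structure needs no bus keeper. It maps naturally onto ordinary LUT logic.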
The scc-16
Our first-generation microsequencers were
16-bit controllers with a much tighter
programming model. For example, the
scc-IIs, one of our first microsequencers, had a
2 KB program size limit, five levels of function
calls, eight global registers, and eight auto
registers per function call. Not all applications
require 32-bit data types, but even those that
don't may benefit from a newer programming
model like that of the scc-32.
The scc-16 microsequencer uses the same
architecture and instruction set as the
scc-32 (minus some instructions that are not
needed without 32-bit data types). The scc-16
has a smaller footprint, close to that of our
first-generation microsequencers (see Table 1).
For example, even in the smallest Spartan-3
FPGA (the XC3S50), the scc-16 “hello” project
consumes only 53% of the device, leaving ample
room for other logic.
Supporting a 32-Bit System Bus
A common practice is to use a system bus to
create a larger system using bus-compliant
modules. We originally designed our
microsequencers for stand-alone use, but in
some situations, you may want to connect a
microsequencer to a system bus. For example,
a microcontroller can download a program
to a microsequencer on the fly so that
the block can be used as a reconfigurable
functional block. A microsequencer can
also share a resource with other controllers.
The Advanced Microcontroller Bus
Architecture (AMBA) Advanced High-performance
Bus (AHB) from ARM Ltd. is a system bus
similar to the CoreConnect bus used with
MicroBlaze and PowerPC processors. We use a
32-bit AHB, which provides a 32-bit (4 GB)
address space. To interface to the AHB, we
developed two wrapper modules: ahb32wrap
(AHB master) and scc32ahb/scc16ahb (AHB slave).
For the AHB master, we provide macros to access
the 32-bit address space, because the scc-32's
native instructions cannot address it directly.
The following code excerpt shows how AHB
accesses are written in an SC program. SC is a
proprietary high-level language designed
specifically for the SCC-II microsequencer family.
ahbwrite(DMAC_CTRL, 0x00000101);
ahb_status = ahbread(DMAC_STAT);
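SC is proprietary, so the C model below only illustrates one plausible way such macros could work. The core cannot form a 32-bit bus address with its native instructions, so it writes the address as data into a master-wrapper register and then issues a command. The wrapper register layout, the DMAC addresses, and the simulated DMA controller's status behavior are all hypothetical. They are not taken from ahb32wrap.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical AHB addresses; not taken from any real memory map. */
#define DMAC_CTRL 0x40001000u
#define DMAC_STAT 0x40001004u

/* Hypothetical registers of an AHB master wrapper, reached through the core's
   narrow native I/O space. The 32-bit bus address travels as data. */
enum { WR_ADDR, WR_DATA, WR_CMD, WR_NREGS };
enum { CMD_READ = 1, CMD_WRITE = 2 };

static uint32_t wrap_reg[WR_NREGS];
static uint32_t dmac_ctrl;  /* state of a simulated DMA controller */

/* Simulated slave (made-up behavior): STAT bit 0 mirrors CTRL bit 0. */
static uint32_t bus_read(uint32_t addr) {
    if (addr == DMAC_CTRL) return dmac_ctrl;
    if (addr == DMAC_STAT) return dmac_ctrl & 1u;
    return 0;
}

static void bus_write(uint32_t addr, uint32_t data) {
    if (addr == DMAC_CTRL) dmac_ctrl = data;
}

/* Model of the wrapper: writing WR_CMD launches one AHB transfer. */
static void io_write(int reg, uint32_t v) {
    assert(reg >= 0 && reg < WR_NREGS);
    wrap_reg[reg] = v;
    if (reg == WR_CMD) {
        if (v == CMD_WRITE) bus_write(wrap_reg[WR_ADDR], wrap_reg[WR_DATA]);
        else if (v == CMD_READ) wrap_reg[WR_DATA] = bus_read(wrap_reg[WR_ADDR]);
    }
}

/* Hypothetical C equivalents of the SC ahbwrite()/ahbread() macros. */
void ahbwrite(uint32_t addr, uint32_t data) {
    io_write(WR_ADDR, addr);
    io_write(WR_DATA, data);
    io_write(WR_CMD, CMD_WRITE);
}

uint32_t ahbread(uint32_t addr) {
    io_write(WR_ADDR, addr);
    io_write(WR_CMD, CMD_READ);
    return wrap_reg[WR_DATA];
}
```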
With the wrappers, the microsequencer’s
internal signals as well as the entire program memory are exposed to the
AHB bus. To facilitate the development of
a system with the AHB bus, we developed
a new debugging tool in addition to our
original stand-alone JTAG debugger.
The AHBDBG
The AHBDBG is a debugging tool for systems
built around the AHB bus (Figure 1). It provides
a wide range of features for debugging an
AHB bus-based system, whereas our original
JTAG debugger provides only the features
needed to try out a program on an FPGA on
the board.
The AHBDBG communicates with a
small AHB master, the jtag2ahb module,
through a JTAG interface to generate AHB
bus accesses. In addition to obvious features
such as bus read/write/dump, two important
features are worth mentioning. The first
is the AHB-based logic analyzer
(Figure 2). You may think this is yet another
ChipScope™ analyzer, but the logic analyzer
available with the AHBDBG is quite
different. First, you must instantiate it
explicitly in your design, which means you can
supply your own trigger signal when you need a
complex trigger condition. Second, it saves
signals only when they change (event recording),
which is essential for capturing bus activity
that may span many cycles. The tool also lets you
compress the trace much further at the cost of
timing accuracy.
With the optional compression mode turned on,
if the time between event A and event B exceeds
the range of the 16-bit cycle counter, it is
truncated to the counter's maximum value. With
compression turned off, a null event (an event
that carries a time stamp but no value-change
information) is generated instead, so that timing
remains exact.
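The recording scheme described above can be sketched in C under a few assumptions. Each event stores a 32-bit value and a 16-bit delta measured from the previous event. With compression off, long gaps are split with null events. With compression on, long gaps are clamped to the counter maximum. The event layout and function names are illustrative only and do not reflect the analyzer's actual trace format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DELTA_MAX 0xFFFFu  /* largest interval the 16-bit cycle counter holds */

typedef struct {
    uint32_t value;   /* sampled signals; unused for null events */
    uint16_t delta;   /* cycles since the previous recorded event */
    int is_null;      /* 1 = time stamp only, no value change */
} event_t;

static size_t emit(event_t *out, size_t cap, size_t count,
                   uint32_t value, uint16_t delta, int is_null) {
    assert(count < cap);  /* trace memory full */
    out[count].value = value;
    out[count].delta = delta;
    out[count].is_null = is_null;
    return count + 1;
}

/* Converts per-cycle samples into value-change events and returns the number
   of events written. With compress == 0, gaps longer than DELTA_MAX are split
   with null events so absolute timing survives; with compress != 0, such gaps
   are clamped to DELTA_MAX and the exact interval is lost. */
size_t record_events(const uint32_t *samples, size_t n, int compress,
                     event_t *out, size_t cap) {
    size_t count = 0, last_t = 0;
    for (size_t t = 0; t < n; t++) {
        if (t > 0 && samples[t] == samples[t - 1])
            continue;                      /* no change: nothing stored */
        size_t gap = t - last_t;
        if (!compress) {
            while (gap > DELTA_MAX) {      /* counter would overflow */
                count = emit(out, cap, count, 0, DELTA_MAX, 1);
                gap -= DELTA_MAX;
            }
        } else if (gap > DELTA_MAX) {
            gap = DELTA_MAX;               /* truncated: timing is approximate */
        }
        count = emit(out, cap, count, samples[t], (uint16_t)gap, 0);
        last_t = t;
    }
    return count;
}
```

Suppose a signal changes once after 150,000 idle cycles. Exact mode records two null events plus the change, while compressed mode records only the change.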
There are four types of AHB-based logic
analyzers: 16 bits wide, 32 bits wide, 64 bits
wide, and 128 bits wide. You can configure
the depth of the logic analyzer trace memory.
A 32-bit-wide logic analyzer is sufficient
for microsequencer debugging, whereas a
64-bit-wide logic analyzer is probably sufficient
for AHB bus monitoring (Table 2). The captured
events are uploaded to the host, where the
AHBDBG writes them to a VCD (Value Change Dump)
file that you can view with a simulation waveform
viewer. A helper tool, vcdwizard, bundles the
signals into wires and buses with meaningful
signal names.
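A VCD file is plain text: a header declares each signal with a short identifier, and each subsequent `#time` line is followed by the value changes at that time. The minimal C writer below emits one bus in this standard format. It assumes one VCD time unit per captured cycle and uses a fixed scope name, `la`. Neither detail comes from the AHBDBG's actual output.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    uint64_t time;   /* absolute time of the change, in cycles */
    uint32_t value;  /* new value of the bus */
} change_t;

/* Writes a minimal VCD file for one bus of 'width' bits named 'name'.
   Assumption: one VCD time unit (declared as 1 ns) per captured cycle. */
void vcd_write(FILE *f, const char *name, int width,
               const change_t *ch, size_t n) {
    assert(width >= 1 && width <= 32);
    assert(strchr(name, ' ') == NULL);  /* VCD references cannot contain spaces */
    fprintf(f, "$timescale 1ns $end\n");
    fprintf(f, "$scope module la $end\n");
    fprintf(f, "$var wire %d ! %s $end\n", width, name);  /* '!' is the short id */
    fprintf(f, "$upscope $end\n$enddefinitions $end\n");
    for (size_t i = 0; i < n; i++) {
        fprintf(f, "#%llu\nb", (unsigned long long)ch[i].time);
        for (int bit = width - 1; bit >= 0; bit--)  /* MSB first */
            fputc(((ch[i].value >> bit) & 1u) ? '1' : '0', f);
        fprintf(f, " !\n");
    }
}
```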
The second unique feature is remote access:
the AHBDBG can act as a gateway to your
hardware system. When remote access is turned
on, the AHBDBG listens on a specified TCP port
and translates each network message into an
AHB bus access. This feature allows you to
exercise the FPGA hardware from programs written
in languages such as Python, Perl, or C++.
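The article does not specify the AHBDBG's network message format. The C sketch below therefore uses a hypothetical text protocol to show the translation step. It parses a message such as `W 0x100 0x2A` or `R 0x100` and performs the corresponding access on a simulated bus. A real gateway would read these messages from a TCP socket and drive the hardware through the JTAG link.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Simulated bus: 256 words standing in for real AHB slaves. */
static uint32_t sim_mem[256];

static void bus_write(uint32_t addr, uint32_t data) { sim_mem[(addr >> 2) & 0xFFu] = data; }
static uint32_t bus_read(uint32_t addr) { return sim_mem[(addr >> 2) & 0xFFu]; }

/* Translates one message of a hypothetical text protocol into a bus access:
     "W <addr> <data>"  -> write, reply "OK"
     "R <addr>"         -> read,  reply "<data>" as 8 hex digits
   Numbers are hexadecimal, with or without a 0x prefix. Returns 0 on success. */
int handle_message(const char *msg, char *reply, size_t reply_len) {
    uint32_t addr, data;
    assert(reply != NULL && reply_len > 0);
    if (sscanf(msg, "W %" SCNx32 " %" SCNx32, &addr, &data) == 2) {
        bus_write(addr, data);
        snprintf(reply, reply_len, "OK");
        return 0;
    }
    if (sscanf(msg, "R %" SCNx32, &addr) == 1) {
        snprintf(reply, reply_len, "%08" PRIx32, bus_read(addr));
        return 0;
    }
    snprintf(reply, reply_len, "ERR");
    return -1;
}
```

Because the protocol is line-oriented text, a client in Python, Perl, or C++ needs only a socket and string formatting.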
We added this feature because users asked us
to include features specific to their projects,
and remote access provides a generic way to
supply such project-specific features. We later
found that it is useful beyond its original
intent. For example, you can prototype a system
using a remote program before developing the
real embedded program.
The AHBDBG has a layered software
design; the bottom layer talks to the
physical device. Currently, it supports
Parallel Cable III (through a printer
port) and the Ethernet Pod (proprietary to
Ponderosa Design), but it can also
support any device that can mimic a
printer port interface.
In addition to the devices mentioned
previously, a virtual device “sim” is also
supported. As the name implies, the “sim”
device is a virtual device for Verilog simulation,
which the AHBDBG can “control.”
Any commands given to the AHBDBG are
ultimately converted to AHB accesses in a
simulation. We found this scheme very
helpful, as it provides an intuitive way to
run Verilog simulations. The current
scheme uses Unix IPC (inter-process
communication) and the Verilog PLI, so it is
not a universal solution. However, many
simulators provide special hooks that make
it possible to control a Verilog simulation
from the AHBDBG even where Unix IPC is not
available.
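A layered design like the one described above is often built in C as a table of function pointers: every backend implements the same small interface, and the upper layers call only through it. The sketch below is written under that assumption. The interface, the backend names, and the array-based "sim" stand-in are hypothetical, and the real "sim" device would forward each access to a running Verilog simulation instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bottom-layer interface: upper layers see only these calls. */
typedef struct {
    const char *name;
    int (*open)(const char *arg);
    int (*write32)(uint32_t addr, uint32_t data);
    int (*read32)(uint32_t addr, uint32_t *data);
    void (*close)(void);
} device_ops_t;

/* Stand-in for the "sim" device. A real one would forward each access to a
   running Verilog simulation (the article mentions Unix IPC and the PLI);
   here an array plays the simulated system. */
static uint32_t sim_regs[64];
static int sim_open(const char *arg) { (void)arg; return 0; }
static int sim_write32(uint32_t a, uint32_t d) { sim_regs[(a >> 2) % 64] = d; return 0; }
static int sim_read32(uint32_t a, uint32_t *d) { *d = sim_regs[(a >> 2) % 64]; return 0; }
static void sim_close(void) {}

/* Placeholder for a printer-port cable backend; hardware access is omitted. */
static int pport_open(const char *arg) { (void)arg; return -1; }
static int pport_write32(uint32_t a, uint32_t d) { (void)a; (void)d; return -1; }
static int pport_read32(uint32_t a, uint32_t *d) { (void)a; (void)d; return -1; }
static void pport_close(void) {}

static const device_ops_t devices[] = {
    { "parport", pport_open, pport_write32, pport_read32, pport_close },
    { "sim",     sim_open,   sim_write32,   sim_read32,   sim_close   },
};

/* Selects a backend by name, e.g. from a command-line option. */
const device_ops_t *find_device(const char *name) {
    for (size_t i = 0; i < sizeof devices / sizeof devices[0]; i++)
        if (strcmp(devices[i].name, name) == 0)
            return &devices[i];
    return NULL;
}

/* An upper-layer "dump" command, written only against the interface. */
int bus_dump(const device_ops_t *dev, uint32_t addr, uint32_t *out, size_t n) {
    assert(dev != NULL);
    for (size_t i = 0; i < n; i++)
        if (dev->read32(addr + 4u * (uint32_t)i, &out[i]) != 0)
            return -1;
    return 0;
}
```

Adding a new cable means adding one table entry. Commands such as `bus_dump` do not change.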
The AHBWIZARD
When constructing a larger system with
the AHB bus, we realized that creating a
top-level file that holds all AHB submodules
and arranging the bus multiplexers
is tedious, time-consuming, and error-prone.
Moreover, you must constantly
maintain that file as AHB modules
are added to or removed from the system.
We developed the AHBWIZARD to
automate this process so that you can
rearrange your AHB system without
rewriting the top-level file. For example,
you may want a logic analyzer module
while debugging the system, but not in
the production FPGA design.
The AHBWIZARD separates the
AHB library (definition files) from the
tool itself (GUI), so that you can add a
new AHB module definition file or modify
how code is generated (for example, signal
naming conventions). At start-up, the
AHBWIZARD scans the library directory,
enumerates the modules available,
and displays them in the window. You
can just drag-and-drop the modules you
want to add (Figure 3).
A custom property sheet pops up in which you
specify properties of the module, such as
address decoding or AHB master priority; these
properties are in turn used to generate the
module's code. Glue modules, such as the AHB
slave-to-master multiplexer, are generated
automatically.
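To show how a property such as address decoding could drive code generation, the C sketch below emits one Verilog HSEL assignment per slave from a base address and a power-of-two region size. The `slave_t` structure, the signal names (`hsel_<name>`, `haddr`), and the output style are hypothetical. The AHBWIZARD's actual definition-file format and naming conventions are not described here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-module properties, as a property sheet might collect them. */
typedef struct {
    const char *name;   /* instance name used for the HSEL signal */
    uint32_t base;      /* base address, aligned to size */
    uint32_t size;      /* region size in bytes, a power of two >= 2 */
} slave_t;

static int log2u(uint32_t x) { int n = 0; while (x > 1) { x >>= 1; n++; } return n; }

/* Emits one Verilog continuous assignment per slave that compares the upper
   address bits with the slave's base address. */
void emit_hsel(FILE *f, const slave_t *s, size_t n) {
    for (size_t i = 0; i < n; i++) {
        assert(s[i].size >= 2 && (s[i].size & (s[i].size - 1)) == 0);  /* power of two */
        assert((s[i].base & (s[i].size - 1)) == 0);                     /* aligned */
        int lo = log2u(s[i].size);  /* lowest decoded address bit */
        fprintf(f, "assign hsel_%s = (haddr[31:%d] == %d'h%X);\n",
                s[i].name, lo, 32 - lo, (unsigned)(s[i].base >> lo));
    }
}
```

For a 4 KB slave at 0x40001000, this produces `assign hsel_dmac = (haddr[31:12] == 20'h40001);`.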
If the generated top-level file is not
complete, you can use an optional
“touch-up” module to add or modify the
generated top-level file. A helper tool
(touchupwizard) generates the touch-up
module (Figure 4).
We provide several commonly used
AHB modules for use with the AHBWIZARD,
in addition to our AHB-ready
microsequencer modules.
Multiprocessors in FPGAs
As the footprints of our microsequencers are
small, you can use more than one microsequencer
in an FPGA. Although you can assign
a different program to each microsequencer,
you can also assign the same program to
multiple microsequencers. We think that
such a configuration, which we call SIMD
(single instruction stream, multiple data
stream), could be very beneficial
for certain types of applications.
When controlling a robot, for example,
the left and right sides probably require the
same control flow. With two microsequencers
sharing one program memory,
the program only needs to deal with one
side, resulting in a simpler program.
Additionally, you can use a portion of the
program memory for interprocessor communication,
because our microsequencers can write to
program memory.
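The SIMD idea can be modeled in C as two cores that fetch from the same program but keep their own program counters, registers, and I/O. A small region of the shared memory serves as a mailbox. The mini instruction set, the proportional-control program, and the mailbox layout are all hypothetical, chosen only to show the left and right sides running identical code on different data.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mini instruction set, used only to show two cores sharing
   one instruction stream; it is not the microsequencers' instruction set. */
typedef enum { LD_SENSOR, SUB_IMM, MUL_IMM, ST_MOTOR, ST_MAILBOX, HALT } opcode_t;
typedef struct { opcode_t op; int32_t imm; } insn_t;

#define PROG_WORDS 16

/* One dual-port memory: the shared program plus a mailbox region that both
   cores can write, modeling communication through program memory. */
typedef struct {
    insn_t prog[PROG_WORDS];
    int32_t mailbox[2];
} shared_mem_t;

/* Per-core state: each core keeps its own PC, accumulator, and I/O. */
typedef struct {
    int id;          /* 0 = left, 1 = right; selects the mailbox slot */
    unsigned pc;
    int32_t sensor;  /* this side's input */
    int32_t acc;
    int32_t motor;   /* this side's output */
} core_t;

/* Executes one shared instruction on one core; returns 0 once halted. */
int step(core_t *c, shared_mem_t *m) {
    assert(c->pc < PROG_WORDS);
    insn_t in = m->prog[c->pc++];
    switch (in.op) {
    case LD_SENSOR:  c->acc = c->sensor;          break;
    case SUB_IMM:    c->acc -= in.imm;            break;
    case MUL_IMM:    c->acc *= in.imm;            break;
    case ST_MOTOR:   c->motor = c->acc;           break;
    case ST_MAILBOX: m->mailbox[c->id] = c->acc;  break;
    case HALT:       c->pc--; return 0;           /* stay on HALT */
    }
    return 1;
}
```

The program is written once, for "one side". Each core then applies it to its own sensor reading and posts its result where the other core can read it.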
Sharing one program memory also halves the
number of block RAMs needed for program storage
compared with two separate microsequencers, so
a larger program fits into the FPGA. You can
achieve such a scheme with minor core
modifications because block RAM in an FPGA is
dual-port memory, allowing two microsequencers
to access the program memory without
disturbing each other (Figure 5).
Multiprocessing in FPGAs is an interesting,
developing field. Because the AHBDBG
accesses each microsequencer through the
AHB bus, it can support any number of
microsequencers.
Conclusion
The ideas presented here are common in ASIC
design, and we are glad to bring these
advanced design methodologies to FPGAs, which
only the newest Xilinx FPGA families have
made possible.
For more information, visit www.ponderosa-design.com, or e-mail info@ponderosa-design.com.
Printable PDF version of this article with graphics. (7/15/05) 320 KB