Editor’s Note: This content is republished from the MicroZed Chronicles, with permission from the author.

 

07-16-2021

It’s interesting how often I seem to go through similar development projects with different clients. Within a short period of time recently, I had several clients who were looking at DDR3/4 implementations or struggling with their implementation. As it happens, I now have several clients who are doing interesting things with MicroBlaze.

<Spartan 7 Devices often use MicroBlaze Implementations>

 

These projects range from Triple Modular Redundant versions flying in space to TinyML and machine learning applications that analyze sensors on an IoT board.

Of course, that means we want our MicroBlaze solution to be as optimal as possible to increase performance in these applications and others. This is where the approach we take to memory deployment and organization becomes critical to achieving the desired performance. At the basic level, our choices for MicroBlaze program execution and data storage are either internal, executing from Block RAM (BRAM), or external, executing from DDR. However, there is always a little bit more to it than that.

Executing from BRAM will give the highest performance because there is no need to go off chip. However, BRAM in FPGAs is not infinite and is also needed for other elements in the design. It stands to reason then that any application of reasonable size requires external memory to execute from. In the case of many systems, this is DDR3 or DDR4. This allows for much larger applications and even the implementation of operating systems such as PetaLinux or popular real-time systems like FreeRTOS.

The use of external memory comes with a little more complexity in the design solution. If we wish to execute the application only from BRAM, the application ELF can be merged with the FPGA Bit file. Following programming, the MicroBlaze will be configured in the logic and the program will execute from BRAM, thereby making the MicroBlaze boot process straightforward.

If, however, we wish to execute the program from DDR3/4, we need to ensure the application software is stored in non-volatile memory, which is often an unused section of the configuration device. A boot loader application that runs from BRAM is then merged with the Bit file. Following device configuration, the boot loader cross loads the application from the configuration memory to the DDR3/4 memory before execution starts.
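
The cross-load step can be modelled on a host machine to make the sequence concrete. The following is a minimal sketch, not the boot loader itself: the flash offset, the 8-byte header (length plus CRC32) and the function names are all hypothetical, and the flash and DDR are just byte arrays.

```python
import zlib

FLASH_APP_OFFSET = 0x0040_0000  # hypothetical: where the application sits in flash
HEADER_SIZE = 8                 # hypothetical header: 4-byte length + 4-byte CRC32


def make_flash_image(app: bytes, flash_size: int = 0x0080_0000) -> bytearray:
    """Build a fake configuration flash with the application in an unused section."""
    flash = bytearray(b"\xff" * flash_size)  # erased flash reads back as 0xFF
    header = len(app).to_bytes(4, "little") + zlib.crc32(app).to_bytes(4, "little")
    start = FLASH_APP_OFFSET
    flash[start:start + HEADER_SIZE + len(app)] = header + app
    return flash


def cross_load(flash: bytearray, ddr: bytearray, ddr_offset: int = 0) -> int:
    """Copy the application from flash into DDR and return its entry offset."""
    start = FLASH_APP_OFFSET
    length = int.from_bytes(flash[start:start + 4], "little")
    expected_crc = int.from_bytes(flash[start + 4:start + 8], "little")
    payload = flash[start + HEADER_SIZE:start + HEADER_SIZE + length]
    if zlib.crc32(payload) != expected_crc:
        raise ValueError("application image failed its CRC check")
    ddr[ddr_offset:ddr_offset + length] = payload
    return ddr_offset  # a real boot loader would now jump to this address


app = bytes(range(256)) * 4      # stand-in for the application binary
flash = make_flash_image(app)
ddr = bytearray(0x1000)          # stand-in for external DDR3/4
entry = cross_load(flash, ddr)
print(f"application copied to DDR, entry offset 0x{entry:X}")
```

On hardware the same steps would be written in C, run from BRAM, and read the flash over whatever interface the configuration device uses.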

We need to utilize a cache to get the best performance from the MicroBlaze when external DDR3/4 memory is used. Using a cache for both the data and instruction memory spaces enables higher system performance because the critical instructions and data are held closer to the processor. If a cache hit occurs, the processor reads the value directly from the cache. A cache miss means the processor has to read in the value from external memory and update the cache. Caching and its effect on performance is a complex subject, especially in multiprocessor systems where cache coherence is required across the multiple processors.
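
The impact of hits and misses can be estimated with the usual average-access-time formula. This is a rough sketch: the 1-cycle hit cost and 30-cycle miss penalty are illustrative assumptions, not measured MicroBlaze or DDR figures.

```python
def average_access_cycles(hit_rate: float, hit_cycles: float,
                          miss_penalty_cycles: float) -> float:
    """Average cycles per access: every access pays the hit cost, misses pay extra."""
    return hit_cycles + (1.0 - hit_rate) * miss_penalty_cycles


# Assumed costs, for illustration only.
HIT_CYCLES = 1
MISS_PENALTY_CYCLES = 30

for hit_rate in (0.0, 0.90, 0.99):
    cycles = average_access_cycles(hit_rate, HIT_CYCLES, MISS_PENALTY_CYCLES)
    print(f"hit rate {hit_rate:.0%}: {cycles:.1f} cycles per access")
```

Even with these made-up numbers, the shape of the result is the point: the average cost is dominated by the miss penalty until the hit rate gets very high.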

In a MicroBlaze system, the cache is implemented in BRAM and provides much faster access than accessing off-chip memories. The memory range allocated for the cache coverage must be outside the Local Memory Bus (LMB) address range. Because both the cache and the LMB memory are implemented in BRAM, caching the LMB range would bring little performance benefit.
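
A quick sanity check on an address map is to confirm that the cacheable range and the LMB range do not overlap. The sketch below uses hypothetical addresses (a 128 KB LMB at zero and a 256 MB DDR window at 0x80000000); substitute the values from your own design.

```python
from typing import NamedTuple


class AddressRange(NamedTuple):
    base: int
    high: int  # inclusive high address


def overlaps(a: AddressRange, b: AddressRange) -> bool:
    """True if the two inclusive address ranges share at least one address."""
    return a.base <= b.high and b.base <= a.high


# Hypothetical address map, for illustration only.
LMB = AddressRange(0x0000_0000, 0x0001_FFFF)        # 128 KB of local BRAM
CACHEABLE = AddressRange(0x8000_0000, 0x8FFF_FFFF)  # 256 MB of external DDR

if overlaps(LMB, CACHEABLE):
    print("cacheable range overlaps the LMB range: fix the address map")
else:
    print("cacheable range is outside the LMB range")
```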

Configuring the cache in the MicroBlaze is straightforward once we have enabled the cache in the configuration wizard. We can configure the data and instruction caches individually. Configurable elements include:

  • Cache Size – The number of KB allocated for the cache.
  • Cache Line Size – The number of words which form a cache line. This is a balance depending on the application style. If the application has lots of sequential memory access, a long cache line is sensible. However, if the program has lots of random access, then a shorter cache line will present a better solution. In the MicroBlaze, we can select 4-, 8- or 16-word line length.
  • Data Width – This is the width of the data bus used for the BRAM. Do we want a fixed width of 32 bits, or a width set to match the cache line size?
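
The line-size trade-off can be explored with a small host-side model. The sketch below assumes a simple direct-mapped cache of 4 KB and made-up access patterns; it counts hits and the number of words fetched from external memory for 4-, 8- and 16-word lines. It is a toy model, not a description of the MicroBlaze cache internals.

```python
import random

WORD_BYTES = 4  # 32-bit words


def simulate(addresses, cache_bytes, line_words):
    """Return (hits, words_fetched) for a simple direct-mapped cache model."""
    line_bytes = line_words * WORD_BYTES
    num_lines = cache_bytes // line_bytes
    tags = [None] * num_lines           # one stored tag per cache line
    hits = 0
    misses = 0
    for addr in addresses:
        line_addr = addr // line_bytes  # which memory line holds this byte
        index = line_addr % num_lines   # cache slot that line maps to
        tag = line_addr // num_lines    # identifies the line within the slot
        if tags[index] == tag:
            hits += 1
        else:
            tags[index] = tag           # miss: refill the whole line
            misses += 1
    return hits, misses * line_words


CACHE_BYTES = 4 * 1024                  # assumed cache size for this model
rng = random.Random(0)                  # fixed seed so runs are repeatable
patterns = {
    "sequential": [i * WORD_BYTES for i in range(4096)],
    "random": [rng.randrange(0, 1 << 20, WORD_BYTES) for _ in range(4096)],
}

results = {}
for name, addresses in patterns.items():
    for line_words in (4, 8, 16):
        hits, fetched = simulate(addresses, CACHE_BYTES, line_words)
        results[(name, line_words)] = (hits, fetched)
        print(f"{name:10s} line={line_words:2d} words: "
              f"hits={hits:4d} words fetched={fetched}")
```

In this model, longer lines raise the hit count for the sequential pattern without increasing external traffic, while for the random pattern they mostly increase the number of words fetched per miss.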

To achieve the best performance, we can also use distributed RAM to hold the tags for each cache line. This has the advantage of reducing the BRAM required and increasing the maximum frequency.
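
A back-of-the-envelope calculation shows why tag storage is a candidate for distributed RAM: it is small compared with the cached data. The sketch below uses a generic textbook direct-mapped model with one valid bit per line, which is an assumption and may differ from the exact MicroBlaze tag layout; the 16 KB cache and 256 MB cacheable range are example values.

```python
def tag_storage_bits(cache_bytes: int, line_words: int,
                     cacheable_bytes: int, word_bytes: int = 4) -> int:
    """Estimate tag storage for a direct-mapped cache (sizes must be powers of two)."""
    line_bytes = line_words * word_bytes
    num_lines = cache_bytes // line_bytes
    address_bits = cacheable_bytes.bit_length() - 1  # log2 of the cacheable range
    cache_bits = cache_bytes.bit_length() - 1        # index bits + offset bits
    tag_bits = address_bits - cache_bits
    return num_lines * (tag_bits + 1)                # +1 for an assumed valid bit


CACHE_BYTES = 16 * 1024             # example cache size
CACHEABLE_BYTES = 256 * 1024 * 1024  # example cacheable DDR range

for line_words in (4, 8, 16):
    bits = tag_storage_bits(CACHE_BYTES, line_words, CACHEABLE_BYTES)
    print(f"{line_words:2d}-word lines: {bits} tag bits "
          f"vs {CACHE_BYTES * 8} data bits")
```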

Implementing the optimal MicroBlaze memory architecture will help you achieve the performance requirements placed upon your development, reduce the time it takes to optimize the solution for performance, and allow more time to focus on application development.

Keep an eye out for an upcoming blog where we will look at the differences that using a cache can make versus not using a cache. We will also explore different cache configurations.