Profiling and Instrumenting Code to Measure Performance
The first major task in creating a software-defined SoC is to identify portions of application code that are suitable for implementation in hardware and that significantly improve overall performance when run in hardware. Compute-intensive program hot spots are good candidates for hardware acceleration, especially when data can be streamed between the hardware, the CPU, and memory so that computation overlaps with communication. Software profiling is a standard way to identify the most CPU-intensive portions of a program.
The SDSoC environment includes all of the performance and profiling capabilities of the Xilinx SDK, including gprof, the non-intrusive Target Communication Framework (TCF) Profiler, and the Performance Analysis perspective within Eclipse. To profile an application with the TCF Profiler:
- Set the active build configuration to SDDebug by right-clicking on the project in the Project Explorer and selecting .
- In the SDSoC Project Overview window, click Debug application.
  Note: The board must be connected to your computer and powered on. The application automatically breaks at the entry to main().
- Launch the TCF Profiler by selecting .
- Start the TCF Profiler by clicking the green Start button at the top of the TCF Profiler tab. Enable Aggregate per function in the Profiler Configuration dialog box.
- Start profiling by clicking the Resume button. The program runs to completion and breaks at the exit() function.
- View the results in the TCF Profiler tab.
Profiling is a statistical method for finding hot spots: it samples the CPU program counter and correlates the samples with the program in execution. Another way to measure program performance is to instrument the application to measure the actual elapsed time between different points in its execution.
The sds_lib library included in the SDSoC environment provides a simple time-stamping API, based on source code annotation, that can be used to measure application performance. Its counter function is declared in sds_lib.h:
/*
* @return value of free-running 64-bit Zynq(TM) global counter
*/
unsigned long long sds_clock_counter(void);
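A small C++ wrapper class can accumulate elapsed counter ticks over one or more calls. For example, to measure the run time of a function f():
#include <cstdint>
#include <iostream>
#include "sds_lib.h"  // declares sds_clock_counter()

class perf_counter
{
public:
    uint64_t tot, cnt, calls;
    perf_counter() : tot(0), cnt(0), calls(0) {}
    inline void reset() { tot = cnt = calls = 0; }
    // Record the starting counter value and count this invocation.
    inline void start() { cnt = sds_clock_counter(); calls++; }
    // Add the ticks elapsed since the matching start() to the running total.
    inline void stop() { tot += (sds_clock_counter() - cnt); }
    // Average ticks per call; returns 0 instead of dividing by zero.
    inline uint64_t avg_cpu_cycles() { return calls ? (tot / calls) : 0; }
};

extern void f();

void measure_f_runtime()
{
    perf_counter f_ctr;
    f_ctr.start();
    f();
    f_ctr.stop();
    std::cout << "Cpu cycles f(): " << f_ctr.avg_cpu_cycles()
              << std::endl;
}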
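To try the same accumulate-and-average pattern on a development host, where sds_lib is not available, the minimal sketch below swaps in std::chrono::steady_clock as a stand-in for sds_clock_counter(). All names here (host_clock_counter, host_perf_counter, f, measure_f_average) are hypothetical, and the stand-in counts nanoseconds rather than Zynq global-timer ticks, so its numbers are not comparable to on-target measurements.

```cpp
#include <chrono>
#include <cstdint>
#include <iostream>

// Hypothetical host-side stand-in for sds_clock_counter(). It returns
// nanoseconds from std::chrono::steady_clock, NOT Zynq global-timer ticks.
static unsigned long long host_clock_counter()
{
    using namespace std::chrono;
    return static_cast<unsigned long long>(
        duration_cast<nanoseconds>(steady_clock::now().time_since_epoch()).count());
}

// Same accumulate-and-average pattern as perf_counter, driven by the stand-in clock.
class host_perf_counter
{
public:
    uint64_t tot = 0, cnt = 0, calls = 0;
    void reset() { tot = cnt = calls = 0; }
    void start() { cnt = host_clock_counter(); calls++; }
    void stop() { tot += host_clock_counter() - cnt; }
    uint64_t avg_ticks() const { return calls ? tot / calls : 0; }
};

// Hypothetical workload standing in for a hardware-acceleration candidate.
static volatile uint64_t sink;
static void f()
{
    uint64_t acc = 0;
    for (uint64_t i = 0; i < 100000; ++i) acc += i * i;
    sink = acc;  // volatile store keeps the loop from being optimized away
}

// Times n calls of f() and prints the average duration per call.
host_perf_counter measure_f_average(int n)
{
    host_perf_counter f_ctr;
    for (int i = 0; i < n; ++i) {
        f_ctr.start();
        f();
        f_ctr.stop();
    }
    std::cout << "avg ns per f(): " << f_ctr.avg_ticks() << std::endl;
    return f_ctr;
}
```

Calling measure_f_average(10) from a main() times ten calls; averaging over several calls smooths out run-to-run jitter that a single start()/stop() pair would expose.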
The performance estimation feature within the SDSoC environment employs this API by automatically instrumenting functions selected for hardware implementation, measuring actual run-times by running the application on the target, and then comparing actual times with estimated times for the hardware functions.