Estimating Performance
Profiling and Instrumenting Code to Measure Performance
The first major task in profiling and instrumenting code is to identify
portions of application code that are suitable for implementation in hardware, and
that significantly improve overall performance when run in hardware. Compute-intensive
regions of code are good candidates for hardware acceleration, especially
when it is possible to stream data between hardware, the CPU, and memory to overlap
the computation with the communication. Software profiling is a standard way to
identify the most CPU-intensive portions of your program. An example of a function
that would not do well for acceleration is one that takes more time to transfer data
to/from the accelerator than to compute the result.

The SDSoC™ environment includes all performance and profiling capabilities that are included in the Xilinx® SDK tool, including gprof, the non-intrusive Target Communication Framework (TCF) profiler, and the Performance Analysis perspective within Eclipse.
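The trade-off between computation and data transfer described above can be sketched with a small timing model. This is only an illustration, not how the SDSoC tools compute their estimates: the `OffloadTimes` struct and the three helper functions are hypothetical names, and all times are assumed to be in the same unit (for example, CPU cycles).

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical timing model for one candidate function. All fields are
// illustrative inputs expressed in the same unit (for example, CPU cycles).
struct OffloadTimes {
    double sw_compute;  // time to run the function on the CPU
    double hw_compute;  // estimated time to run it in hardware
    double transfer;    // time to move data to and from the accelerator
};

// Speed-up when transfer and computation happen one after the other.
inline double speedup_serial(const OffloadTimes& t) {
    assert(t.hw_compute + t.transfer > 0.0);
    return t.sw_compute / (t.hw_compute + t.transfer);
}

// Speed-up when data is streamed so that transfer fully overlaps the
// computation: the longer of the two phases bounds the total time.
inline double speedup_overlapped(const OffloadTimes& t) {
    const double bound = std::max(t.hw_compute, t.transfer);
    assert(bound > 0.0);
    return t.sw_compute / bound;
}

// A function is a poor candidate when moving its data already costs more
// than computing the result in software.
inline bool transfer_dominates(const OffloadTimes& t) {
    return t.transfer > t.sw_compute;
}
```

For example, with 1000 cycles of software time, 100 cycles of hardware time, and 400 cycles of transfer time, the serial model gives a speed-up of 2.0 and the fully overlapped model gives 2.5. Real overlap is rarely perfect, so the second figure is an upper bound under this model.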
To run the TCF Profiler for a standalone application, use the following steps:
- Set the active build configuration to Debug from the project's right-click menu in the Project Explorer.
- Launch the debugger from the right-click menu of the project name in the Project Explorer. The application automatically breaks at the entry to main().
  Note: The board must be connected to your computer and powered on.
- Launch the TCF Profiler view. In the window that is produced, expand Debug, and select TCF Profiler.
- To start the TCF Profiler, click the green Start button at the top of the TCF Profiler tab.
- Enable Aggregate per function in the Profiler Configuration dialog box.
- To start profiling, click the Resume button or press F8. The program runs to completion and breaks at the exit() function.
- View the results in the TCF Profiler tab.
Using the TCF Profiler provides more in-depth information related to either a standalone or a Linux OS application. As seen in the previous steps, no additional compilation flags are needed to use the Profiler.
Profiling provides a statistical method for finding highly used regions of code by sampling the CPU program counter and correlating the samples with the program in execution. Another way to measure program performance is to instrument the application to determine the actual duration between different parts of a program in execution.
The sds_lib library included in the SDSoC environment provides a simple, source code annotation-based, time-stamping API that can be used to measure application performance, as shown in the following example:
/*
* @return value of free-running 64-bit Zynq(TM) global counter
*/
unsigned long long sds_clock_counter(void);
Using this API to collect timestamps and differences between them, you can determine duration of key parts of your program. For example, you can measure data transfer or overall round trip execution time for hardware functions, as shown in the following code snippet:
#include <cstdint>
#include <iostream>
#include "sds_lib.h"  // declares sds_clock_counter()

// Accumulates elapsed counter ticks across start()/stop() pairs.
class perf_counter
{
public:
    uint64_t tot, cnt, calls;
    perf_counter() : tot(0), cnt(0), calls(0) {}
    inline void reset() { tot = cnt = calls = 0; }
    inline void start() { cnt = sds_clock_counter(); calls++; }
    inline void stop() { tot += (sds_clock_counter() - cnt); }
    // Average ticks per call; call start()/stop() at least once first.
    inline uint64_t avg_cpu_cycles() { return (tot / calls); }
};

extern void f();

void measure_f_runtime()
{
    perf_counter f_ctr;
    f_ctr.start();
    f();
    f_ctr.stop();
    std::cout << "Cpu cycles f(): " << f_ctr.avg_cpu_cycles()
              << std::endl;
}
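As a further illustration, the same counter class can time the data transfer and the computation separately over several calls. The sketch below is meant to run off-target, so it replaces sds_clock_counter() with a stand-in that returns a fake tick count; on a board you would remove the stand-in and include sds_lib.h instead. The names `fake_ticks`, `transfer_data`, `compute`, `phase_report`, and `measure_phases` are hypothetical, and `avg_cpu_cycles()` is adjusted here to return 0 when no calls were recorded.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the real sds_clock_counter() from sds_lib so the sketch can
// run off-target. It returns a fake counter that the dummy workloads below
// advance by a fixed amount. Remove it when building for the board.
static uint64_t fake_ticks = 0;
unsigned long long sds_clock_counter(void) { return fake_ticks; }

class perf_counter {
public:
    uint64_t tot, cnt, calls;
    perf_counter() : tot(0), cnt(0), calls(0) {}
    void reset() { tot = cnt = calls = 0; }
    void start() { cnt = sds_clock_counter(); calls++; }
    void stop() { tot += (sds_clock_counter() - cnt); }
    // Returns 0 instead of dividing by zero when nothing was measured.
    uint64_t avg_cpu_cycles() const { return calls ? tot / calls : 0; }
};

// Hypothetical phases of one hardware function call.
void transfer_data() { fake_ticks += 300; }  // pretend the copy costs 300 ticks
void compute()       { fake_ticks += 100; }  // pretend the compute costs 100 ticks

struct phase_report {
    uint64_t avg_transfer;
    uint64_t avg_compute;
    uint64_t avg_round_trip;
};

// Times each phase and the whole round trip over several iterations.
phase_report measure_phases(int iterations) {
    perf_counter xfer, comp, total;
    for (int i = 0; i < iterations; ++i) {
        total.start();
        xfer.start(); transfer_data(); xfer.stop();
        comp.start(); compute();       comp.stop();
        total.stop();
    }
    return { xfer.avg_cpu_cycles(), comp.avg_cpu_cycles(),
             total.avg_cpu_cycles() };
}
```

Keeping one counter per phase makes it easy to see whether transfer or computation dominates the round trip, which is the distinction that matters when deciding whether a function is worth moving to hardware.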
The performance estimation feature within the SDSoC environment employs this API by automatically instrumenting functions selected for hardware implementation, measuring actual runtimes by running the application on the target, and then comparing actual times with estimated times for the hardware functions.
SDSCC/SDS++ Performance Estimation Flow Options
A full bitstream compile can take much more time than a software compile, so the sds++/sdscc (referred to as sds++) applications provide performance estimation options to compute the estimated runtime improvement for a set of hardware function calls.
In the Application Project Settings pane, to invoke the estimator, select the Estimate Performance check box. This enables performance estimation for the current build configuration and builds the project.
Estimating the speed-up is a two-phase process:
- The SDSoC environment compiles the hardware functions and generates the system. Instead of synthesizing the system to bitstream, sds++ computes an estimate of the performance based on estimated latencies for the hardware functions and data transfer time estimates for the callers of hardware functions.
- In the generated Performance Report, to determine a performance baseline and the performance estimate, select Click Here to run an instrumented version of the software on the target.
See the SDSoC Environment Getting Started Tutorial (UG1028) for a tutorial on how to use the Performance Report.
You can also generate a performance estimate from the command line. As a first pass to gather data about software runtime, use the -perf-funcs option to specify functions to profile and -perf-root to specify the root function encompassing calls to the profiled functions. The sds++ system compiler then automatically instruments these functions to collect runtime data when the application is run on a board.
When you run an instrumented application on the target, the program creates a file on the SD card called swdata.xml, which contains the runtime performance data for the run. Copy swdata.xml to the host, and run a build that estimates the performance gain on a per hardware function caller basis and for the top-level function specified by the -perf-root option in the first pass run. Use the -perf-est option to specify swdata.xml as input data for this build.
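One way to picture how per-function figures roll up into a top-level figure is the simple model below. It is an assumption made for illustration, not the estimator's documented algorithm: each profiled function's measured software time inside the root function is swapped for its estimated hardware time. The `func_estimate` struct and both functions are hypothetical names and do not mirror the contents of swdata.xml.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One profiled function: measured software cycles (from the instrumented
// run) and estimated hardware cycles including data transfer. The field
// names are hypothetical and do not mirror the swdata.xml contents.
struct func_estimate {
    uint64_t sw_cycles;
    uint64_t hw_cycles;
};

// Per-function speed-up: software time divided by estimated hardware time.
inline double function_speedup(const func_estimate& f) {
    assert(f.hw_cycles > 0);
    return static_cast<double>(f.sw_cycles) / static_cast<double>(f.hw_cycles);
}

// Top-level estimate: replace each function's software time inside the
// root function's measured time with its estimated hardware time.
inline double root_speedup(uint64_t root_sw_cycles,
                           const std::vector<func_estimate>& funcs) {
    uint64_t sw_sum = 0, hw_sum = 0;
    for (const func_estimate& f : funcs) {
        sw_sum += f.sw_cycles;
        hw_sum += f.hw_cycles;
    }
    assert(sw_sum <= root_sw_cycles);  // profiled calls run inside the root
    const uint64_t estimated = root_sw_cycles - sw_sum + hw_sum;
    assert(estimated > 0);
    return static_cast<double>(root_sw_cycles) / static_cast<double>(estimated);
}
```

With a root function measured at 1000 cycles and two profiled functions at 600 and 200 software cycles, each estimated at 100 hardware cycles, the per-function speed-ups are 6.0 and 2.0 while the top-level speed-up is 2.5, because the 200 cycles of unaccelerated code remain.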
The following table specifies the sds++ system compiler options used for performance estimation.
Option | Description |
---|---|
-perf-funcs function_name_list | Specifies a comma separated list of all functions to be profiled in the instrumented software application. |
-perf-root function_name | Specifies the root function encompassing all calls to the profiled functions. The default is the function main. |
-perf-est data_file | Specifies the file containing runtime data generated by the instrumented software application when run on the target. Estimate performance gains for hardware accelerated functions. The default name for this file is swdata.xml. |
-perf-est-hw-only | Runs the estimation flow without running the first pass to collect software run data. Using this option provides hardware latency and resource estimates without providing a comparison against baseline. |
Note: After the instrumented application has run on the target, type the following command:
cd /; sync; umount /mnt;
This ensures that the swdata.xml file is written out to the SD card.