Introduction to Debugging in SDSoC
The SDSoC™ environment includes an Eclipse-based integrated development environment (IDE) for implementing heterogeneous embedded systems. SDSoC supports Arm® Cortex™-based applications using the Zynq®-7000 SoC and Zynq® UltraScale+™ MPSoC devices, as well as MicroBlaze™ processor-based applications on all Xilinx® SoCs and FPGAs.
This user guide introduces the debugging capabilities of the SDSoC environment, and provides you with detailed instructions on how to analyze any failure encountered within the SDSoC flow.
SDSoC Environment Overview
The SDSoC environment includes a system compiler that transforms C/C++ programs into complete hardware/software systems with select functions compiled into the programmable logic (PL). The SDSoC system compiler analyzes a program to determine the data flow between software and hardware functions, and generates an application-specific system-on-chip (SoC) to realize the program.
To achieve high performance, each hardware function runs as an independent thread; the system compiler generates hardware and software components that ensure synchronization between hardware and software threads, while enabling pipelined computation and communication. Application code can involve many hardware functions, multiple instances of a specific hardware function, and calls to a hardware function from different parts of the program.
The SDx integrated development environment (IDE) supports software development workflows including profiling, compilation, linking, system performance analysis, and debugging. It also provides a fast performance estimation capability to enable exploration of the hardware/software interface before committing to a full hardware compile.
The SDSoC system compiler targets a base platform and invokes the Vivado® High-Level Synthesis (HLS) tool to compile synthesizable C/C++ functions into programmable logic. The system compiler then generates a complete hardware system, including DMAs, interconnects, hardware buffers, other IP, and the FPGA bitstream by invoking the Vivado Design Suite tools. To ensure that all hardware function calls preserve their original behavior, the SDSoC system compiler generates system-specific software stubs and configuration data. The program includes the function calls to drivers required to use the generated IP blocks. Application and generated software is compiled and linked using a standard GNU toolchain.
By generating complete applications from a single source, the system compiler lets you iterate over design and architecture changes by refactoring at the program level, which reduces the time needed to achieve working programs running on the target platform.
Terminology
The following terms are widely used while designing in the SDSoC environment. The terms and their definitions are provided below.
- Accelerator: A portion of the application code that has been implemented in hardware in the programmable logic (PL). Accelerators are also called hardware functions.
- Data Mover: The data mover transfers data between accelerators, and between the processing system (PS) and accelerators. The SDSoC environment can generate various types of data movers based on the properties and size of the data being transferred.
- Pipelining: Pipelining is a technique to increase instruction-level parallelism in the hardware implementation of an algorithm by overlapping independent stages of operations or functions. The data dependence in the original software implementation is preserved for functional equivalence, but the required circuit is divided into a chain of independent stages. All stages in the chain run in parallel on the same clock cycle. The only difference is the source of data for each stage. Each stage in the computation receives its data values from the result computed by the preceding stage during the previous clock cycle.
- Pragma: A special directive that can be inserted into the source code to guide the system compiler. In the SDSoC environment, you control the system generation process by structuring hardware functions and calls to hardware functions in a way that balances communication and computation, and by inserting pragmas into your source code.
- Processor: In the context of the SDSoC environment, a processor is either a soft processor such as a MicroBlaze processor, or a hard processor such as the Arm processors on Zynq-7000 SoCs and Zynq UltraScale+ MPSoCs.
- System Port: A system port connects a data mover to the PS. It can be an ACP, AFI (corresponding to high-performance ports), MIG (corresponding to a PL-based DDR memory controller), or a stream port on the Zynq device.
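To make the pipelining definition above more concrete, the following is a minimal software model of a three-stage pipeline. It is a sketch for illustration only, not code generated by SDSoC: the `Slot` and `Pipeline3` types, the `run_pipeline` helper, and the stage operations (add 1, multiply by 2, subtract 3) are all hypothetical. Each call to `clock()` models one clock cycle in which every stage reads the value its predecessor produced in the previous cycle.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A pipeline register: a value plus a flag saying whether it holds real data.
struct Slot {
    bool valid;
    int value;
};

// Hypothetical three-stage pipeline computing ((x + 1) * 2) - 3.
struct Pipeline3 {
    Slot r1 = {false, 0};  // output register of stage 1
    Slot r2 = {false, 0};  // output register of stage 2
    Slot r3 = {false, 0};  // output register of stage 3

    // Advance one clock cycle. Stages are evaluated last-to-first so that
    // each stage consumes what the preceding stage produced one cycle ago.
    Slot clock(Slot in) {
        r3 = Slot{r2.valid, r2.value - 3};
        r2 = Slot{r1.valid, r1.value * 2};
        r1 = Slot{in.valid, in.value + 1};
        return r3;
    }
};

// Feed one input per cycle, then keep clocking with empty slots until all
// results have drained. Returns the results in input order.
std::vector<int> run_pipeline(const std::vector<int>& inputs, std::size_t& cycles) {
    Pipeline3 pipe;
    std::vector<int> results;
    std::size_t next = 0;
    cycles = 0;
    while (results.size() < inputs.size()) {
        Slot in = {false, 0};
        if (next < inputs.size()) in = Slot{true, inputs[next++]};
        Slot out = pipe.clock(in);
        if (out.valid) results.push_back(out.value);
        ++cycles;
    }
    return results;
}
```

In this model, N inputs take N + 2 cycles rather than 3N, because all three stages work on different inputs during the same cycle.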
Elements of SDSoC
The SDSoC environment includes the following features:
- The sds++ system compiler, which generates complete hardware/software systems. The sds++ system compiler employs underlying features from the Vivado Design Suite System Edition, including the Vivado High-Level Synthesis (HLS) tool, Vivado IP integrator, IP libraries for data movement and interconnect, and tools for RTL synthesis, placement, routing, and bitstream generation.
- An Eclipse-based integrated development environment (IDE) to create and manage application projects and workflows.
- A system performance estimation capability to explore different scenarios for the hardware/software interface.
The SDSoC environment also inherits many of the tools in the Xilinx Software Development Kit (SDK), including GNU toolchains for Zynq-7000 SoCs and Zynq UltraScale+ MPSoCs, standard libraries (for example, glibc), and the Target Communication Framework (TCF) for communicating with embedded processor targets. It also features a performance analysis perspective within the Eclipse/CDT-based IDE.
The sds++ system compiler generates an application-specific system-on-chip for a targeted platform. The environment includes a number of standard base platforms for application development, and other platforms can be developed by third-party partners, or by SDSoC design teams. The SDSoC Environment Platform Development Guide describes how to create a hardware platform design in the Vivado Design Suite, configure platform interfaces, and define the corresponding software runtime environment to build a platform for use in the SDx™ IDE.
Execution Model of an SDSoC Application
The execution model for an SDSoC environment application can be understood in terms of the normal execution of a C++ program running on the target CPU after the platform has booted. It is useful to understand how a C++ binary executable interfaces to hardware.
The set of declared hardware functions within a program is compiled into hardware accelerators that are accessed with the standard C runtime through calls into these functions. Each hardware function call in effect invokes the accelerator as a task and each of the arguments to the function is transferred between the CPU and the accelerator, accessible by the program after accelerator task completion. Data transfers between memory and accelerators are accomplished through data movers, such as a DMA engine, automatically inserted into the system by the sds++ system compiler, taking into account user data mover pragmas such as zero_copy.
To ensure program correctness, the system compiler intercepts each call to a hardware function, and replaces it with a call to a generated stub function that has an identical signature but with a derived name. The stub function orchestrates all data movement and accelerator operation, synchronizing software and accelerator hardware at the exit of the hardware function call. Within the stub, all accelerator and data mover control is realized through a set of send and receive APIs provided by the sds_lib library.
When program dataflow between hardware function calls involves array arguments that are not accessed after the function calls have been invoked within the program (other than destructors or free() calls), and when the hardware accelerators can be connected using streams, the system compiler transfers data from one hardware accelerator to the next through direct hardware stream connections, rather than implementing a round trip to and from memory. This optimization can result in significant performance gains and reduction in hardware resources.
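The dataflow pattern described above can be sketched as follows. This is an illustrative example under stated assumptions: `scale`, `offset`, `scale_then_offset`, and the length `N` are hypothetical names, and under a standard C++ compiler they are ordinary software functions. Whether the sds++ system compiler actually builds a direct stream connection between two such functions depends on the conditions in the preceding paragraph and on the platform.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t N = 8;  // hypothetical array length

// Hypothetical hardware function candidates. Both read and write their
// arrays sequentially, one element at a time.
void scale(const int in[N], int out[N]) {
    for (std::size_t i = 0; i < N; ++i) out[i] = in[i] * 2;
}

void offset(const int in[N], int out[N]) {
    for (std::size_t i = 0; i < N; ++i) out[i] = in[i] + 1;
}

// tmp is written by scale() and read by offset(); the program does not
// access it anywhere else. That makes tmp a candidate for an
// accelerator-to-accelerator connection instead of a round trip to memory.
void scale_then_offset(const int in[N], int out[N]) {
    int tmp[N];
    scale(in, tmp);
    offset(tmp, out);
    // No further access to tmp after this point.
}
```

If the program read `tmp` on the CPU after the two calls (for example, to print it), the intermediate data would have to be available in memory, and the condition described above would no longer hold.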
- Initialization of the sds_lib library occurs during the program constructor before entering main().
- Within a program, every call to a hardware function is intercepted by a function call into a stub function with the same function signature (other than name) as the original function. Within the stub function, the following steps occur:
  - A synchronous accelerator task control command is sent to the hardware.
  - For each argument to the hardware function, an asynchronous data transfer request is sent to the appropriate data mover, with an associated wait() handle. A non-void return value is treated as an implicit output scalar argument.
  - A barrier wait() is issued for each transfer request. If a data transfer between accelerators is implemented as a direct hardware stream, the barrier wait() for this transfer occurs in the stub function for the last in the chain of accelerator functions for this argument.
- Clean up of the sds_lib library occurs during the program destructor, upon exiting main().
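The three steps performed inside a stub function can be modeled in plain C++ as shown below. This is a single-threaded sketch meant only to show the ordering of the steps. `TransferHandle`, `send_task_command`, `request_transfer`, `event_log`, `vadd`, and `vadd_stub` are hypothetical stand-ins and are not the sds_lib API or the code that the system compiler generates.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a data mover request with a wait() handle.
struct TransferHandle {
    std::function<void()> pending;  // work the data mover still has to do
    bool done;
    void wait() {                   // barrier: finish the transfer
        if (!done) { pending(); done = true; }
    }
};

std::vector<std::string> event_log;  // records the order of stub steps

void send_task_command(const std::string& accel) {
    event_log.push_back("cmd:" + accel);
}

TransferHandle request_transfer(const std::string& arg, std::function<void()> work) {
    event_log.push_back("request:" + arg);
    return TransferHandle{std::move(work), false};
}

constexpr int LEN = 4;  // hypothetical argument length

// Original hardware function: element-wise add, returns the sum of outputs.
int vadd(const int a[LEN], const int b[LEN], int out[LEN]) {
    int sum = 0;
    for (int i = 0; i < LEN; ++i) { out[i] = a[i] + b[i]; sum += out[i]; }
    return sum;
}

// Model of a stub: same signature as vadd(), derived name.
int vadd_stub(const int a[LEN], const int b[LEN], int out[LEN]) {
    int hw_a[LEN], hw_b[LEN], hw_out[LEN];  // accelerator-side buffers (model)
    int hw_ret = 0, ret = 0;

    // Step 1: synchronous accelerator task control command.
    send_task_command("vadd");

    // Step 2: one transfer request per argument; the non-void return value
    // is handled as an implicit output scalar.
    std::vector<TransferHandle> handles;
    handles.push_back(request_transfer("a", [&] { std::copy(a, a + LEN, hw_a); }));
    handles.push_back(request_transfer("b", [&] { std::copy(b, b + LEN, hw_b); }));
    handles.push_back(request_transfer("out", [&] {
        hw_ret = vadd(hw_a, hw_b, hw_out);  // accelerator runs once inputs arrived
        std::copy(hw_out, hw_out + LEN, out);
    }));
    handles.push_back(request_transfer("return", [&] { ret = hw_ret; }));

    // Step 3: barrier wait() for every transfer before returning.
    for (auto& h : handles) { h.wait(); event_log.push_back("wait"); }
    return ret;
}
```

The caller sees the same signature and the same results as the original function; only the body differs, which is why the program keeps its original behavior after the call is redirected to the stub.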
Sometimes, the programmer has insight into the potential concurrent execution of accelerator tasks that cannot be automatically inferred by the system compiler. In this case, the sds++ system compiler supports a #pragma SDS async(ID) that can be inserted immediately preceding a call to a hardware function. This pragma instructs the compiler to generate a stub function without any barrier wait() calls for data transfers. As a result, after issuing all data transfer requests, control returns to the program, enabling concurrent execution of the program while the accelerator is running. In this case, it is your responsibility to insert a #pragma SDS wait(ID) within the program at appropriate synchronization points, which are resolved into sds_wait(ID) API calls to correctly synchronize hardware accelerators, their implicit data movers, and the CPU.
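A minimal sketch of how the async and wait pragmas might be placed around a hardware function call follows. The pragma names come from the text above; the function `vscale`, the wrapper `run_overlapped`, the length `LEN`, and the ID value 1 are hypothetical. A standard C++ compiler ignores the unknown pragmas (possibly with a warning), so the example runs sequentially on a host and only shows where the synchronization points would go.

```cpp
#include <cassert>

constexpr int LEN = 4;  // hypothetical argument length

// Hypothetical hardware function candidate.
void vscale(const int in[LEN], int out[LEN]) {
    for (int i = 0; i < LEN; ++i) out[i] = in[i] * 3;
}

// Returns a value computed on the CPU while the accelerator task would
// still be running when built with sds++.
int run_overlapped(const int in[LEN], int out[LEN]) {
    // async(1) requests a stub without barrier wait() calls, so control
    // returns once the data transfer requests have been issued.
#pragma SDS async(1)
    vscale(in, out);

    // CPU work that does not touch `out` can overlap with the accelerator.
    int cpu_sum = 0;
    for (int i = 0; i < LEN; ++i) cpu_sum += in[i];

    // Matching synchronization point: `out` is only safe to read after it.
#pragma SDS wait(1)
    return cpu_sum;
}
```

The important design point is that nothing between the two pragmas reads or writes `out`, because the accelerator and its data movers may still be using that buffer until the wait completes.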
An async(ID) pragma requires a matching wait(ID) pragma.
SDSoC Build Process
The SDSoC build process uses a standard compilation and linking process. Similar to g++, the sds++ system compiler invokes sub-processes to accomplish compilation and linking.
As shown in the following figure, compilation is extended beyond object code that runs on the CPU: it also includes compilation and linking of hardware functions into IP blocks using the Vivado High-Level Synthesis (HLS) tool, and creation of standard object files (.o) using the target CPU toolchain. System linking consists of program analysis of caller/callee relationships for all hardware functions, and the generation of an application-specific hardware/software network to implement every hardware function call. The sds++ system compiler invokes all necessary tools, including Vivado HLS (function compiler), the Vivado Design Suite to implement the generated hardware system, and the Arm compiler and sds++ linker to create the application binaries that run on the CPU and invoke the accelerator (through stubs) for each hardware function. The output is a complete bootable system for an SD card.
The compilation process includes the following tasks:
- Analyzing the code and running a compilation for the main application on the Arm core, as well as a separate compilation for each of the hardware accelerators.
- Compiling the application code through standard GNU Arm compilation tools with an object (.o) file produced as final output.
- Running the hardware accelerated functions through the HLS tool to start the process of custom hardware creation with an object (.o) file as output.
After compilation, the linking process includes the following tasks:
- Analyzing the data movement through the design and modifying the hardware platform to accept the accelerators.
- Implementing the hardware accelerators into the programmable logic (PL) region using the Vivado Design Suite to run synthesis and implementation, and generate the bitstream for the device.
- Updating the software images with hardware access APIs to call the hardware functions from the embedded processor application.
- Producing an integrated SD card image that can boot the board with the application in an Executable and Linkable Format (ELF) file.
SDSoC Debug Flow Overview
The systems produced by the SDSoC environment are high-performance, complex, and composed of hardware and software components. It can be difficult to understand the execution of applications in such systems with portions of software running in a processor, hardware accelerators executing in the programmable fabric, and many simultaneous data transfers between them. The SDSoC environment lets you create and debug projects using the Xilinx System Debugger (XSDB), and provides sophisticated hardware/software event tracing, offering an integrated timeline view of data transfers and accelerator tasks, including driver software setup and execution in hardware. Outside the SDx IDE, you can use command line or scripting options to debug your projects.
The SDSoC development environment lets you target the compilation and linking commands of the build process at either a system emulation target or the hardware target of the specified platform. As an alternative to building a complete system, you can create a system emulation model that consists of the target platform and application binaries. For the emulation target, the sds++ system compiler creates a simulation model using the source files for the accelerator functions.
System emulation is one of the most capable debug features in the SDSoC environment. It can help debug functional issues and determine why an application is hanging. This feature is only available on Xilinx base platforms, including the ZC702, ZC706, ZCU102, ZCU104, ZCU106, and ZedBoard base platforms.
After you identify the hardware functions, you can use system emulation to quickly compile the logic and verify the entire system. System emulation provides a Quick Emulator (QEMU)-based emulator that runs the cross-compiled Arm code, interacting with the hardware accelerators running in the Vivado simulator. The RTL simulator can display waveforms, or it can be run without waveforms for faster simulation. The emulator can be run within the SDx IDE or on the command line (sdsoc_emulator), providing accurate visibility of the final hardware implementation without the need to compile the system into a bitstream and program the device on the board.
When targeting the hardware platform, you can also enable hardware and software event tracing to analyze the execution of events, and identify any issues (see Hardware/Software Event Tracing). If there are problems with respect to the hardware design itself, you can use hardware debug from the Vivado Lab Edition tools by inserting debug cores in the hardware functions implemented in the SDSoC environment. The following flow chart shows a typical hardware build and debug process.
Xilinx base platforms support both system emulation and hardware target builds. Custom and third-party platforms, without emulation capabilities, support only the hardware build and debug flow.
System Emulation
On Xilinx base platforms, you can use system emulation to debug register transfer level (RTL) transactions in the entire system (PS and PL). Running your application on the SDSoC emulator (sdsoc_emulator) gives you visibility of data transfers with a debugger. You can debug system hangs and inspect associated data transfers in the simulation waveform view, which gives you visibility into signals on the hardware blocks associated with the data transfer.
Hardware Execution Flow
During hardware execution, you can use the actual hardware platform to run the accelerated application. You can create a debug configuration of the hardware that includes special debug logic in the accelerators, such as the System Integrated Logic Analyzer (System ILA), Virtual Input/Output (VIO) debug cores, and AXI performance monitors. The SDSoC environment provides specific hardware debug capabilities using the Vivado hardware manager, with waveform analysis, kernel activity reports, and memory access analysis to provide visibility into these critical hardware issues.
In-system debugging lets you debug your design in real time, on your target hardware. This is an essential step in design completion. Invariably, there are situations that are extremely hard to replicate in a simulator. Therefore, there is a need to debug the problem in the running hardware. In this step, you place debug cores into your design to provide you the ability to observe and control the design. After the debugging process is complete, you can remove the debug cores to increase performance and reduce resource usage of the device.
The SDx IDE and command line options provide ways to instrument your design for debugging. The --dk compiler switch lets you add ILA debug cores to the interfaces of your hardware function. To debug C-callable IP used in your application code, you must instantiate the required debug cores in the RTL code of the IP before packaging it as a C-callable IP.
Connecting to the Hardware
- For standalone and FreeRTOS, you must download the ELF file to the board using the USB/JTAG interface. Trace data is read out over the same USB/JTAG interface as well.
- For Linux, the SDx environment assumes the OS boots from the SD card. It then copies the .elf file and runs it using the TCP/TCF agent running in Linux over the Ethernet connection between the board and host PC. The trace data is read out over the USB/JTAG interface. Both USB/JTAG and TCP/TCF agent interfaces are needed for tracing Linux applications.
Event Tracing
The event tracing feature provides a detailed view of what is happening in the system during the execution of an application. Trace events are produced and gathered into a timeline view, giving you a perspective of the running application. This detailed view can help you understand the performance of your application given the workload, hardware/software partitioning, and system design choices. This view enables event tracing of software running on the processor, as well as hardware accelerators and data transfer links in the system. Such information helps you to identify problems, optimize the design, and improve system implementation.
Tracing an application produces a log that records information about system execution. Compared to event logging, event tracing shows the correlation between events for the duration of the event, rather than an instantaneous event at a particular time. The goal of tracing is to help debug execution by observing what happened when, and how long events took. This is best used to analyze performance and get an indication of whether there is an application hang.
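The difference between a trace event with a duration and an instantaneous log entry can be sketched with a small recorder. This is an illustration of the concept only and does not reflect the SDSoC trace format or tooling: `TraceEvent`, `TraceRecorder`, and the integer tick timestamps are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical trace record: one event with a start and a stop time.
struct TraceEvent {
    std::string name;     // e.g. "accelerator" or "data_transfer"
    std::uint64_t start;  // tick when the event began
    std::uint64_t stop;   // tick when the event ended
    std::uint64_t duration() const { return stop - start; }
};

class TraceRecorder {
public:
    // Mark the start of an event.
    void begin(const std::string& name, std::uint64_t tick) { open_[name] = tick; }

    // Mark the end of an event and store it with its full duration.
    void end(const std::string& name, std::uint64_t tick) {
        auto it = open_.find(name);
        assert(it != open_.end() && "end() without matching begin()");
        events_.push_back(TraceEvent{name, it->second, tick});
        open_.erase(it);
    }

    const std::vector<TraceEvent>& events() const { return events_; }

    // Events that started but never stopped can hint at a hang.
    std::vector<std::string> unfinished() const {
        std::vector<std::string> names;
        for (const auto& kv : open_) names.push_back(kv.first);
        return names;
    }

    // True when two completed events were active at the same time.
    static bool overlaps(const TraceEvent& a, const TraceEvent& b) {
        return a.start < b.stop && b.start < a.stop;
    }

private:
    std::map<std::string, std::uint64_t> open_;
    std::vector<TraceEvent> events_;
};
```

Because each record keeps both endpoints, a timeline can show which events ran concurrently and how long each took, and an event that never receives its end marker stands out as a possible hang.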