Debugging Applications and Kernels

The Vitis™ unified software platform provides application-level debug features and techniques that allow the host code, kernel code, and the interactions between them to be debugged. These features and techniques are split between software debugging and hardware debugging flows.

For software debugging, the host and kernel code can be debugged using the Vitis IDE, or using GDB from the command line as a standard debug tool.

For hardware debugging, kernels running on hardware can be debugged using the Xilinx® virtual cable (XVC) running over the PCIe® bus for Alveo™ Data Center accelerator cards, or using USB-JTAG cables for both Alveo cards and embedded processor platforms.

Debugging Flows

The Vitis unified software platform provides application-level debug features which allow the host code, the kernel code, and the interactions between them to be efficiently debugged in either the Vitis IDE, or from the command line. The recommended debugging flow consists of three levels of debugging:

  • Debugging in software emulation
  • Debugging in hardware emulation
  • Debugging during hardware execution

This three-tiered approach allows debugging the host and kernel code, and the interactions between them at different levels of abstraction. Each provides specific insights into the design and makes debugging easier. All flows are supported through an integrated GUI flow as well as through a batch flow using basic compile time and runtime setup options.

In the case of applications running on embedded processor platforms, some additional setup is required as described in Debugging on Embedded Processor Platforms.

Debugging in Software Emulation

IMPORTANT: The following steps describe debugging from the command line. However, the Vitis IDE offers a standalone debug environment for use with the Vitis application acceleration projects created from the command line. Refer to Using the Standalone Debug Flow for more information.

The Vitis unified software platform supports typical software debugging for the host code at all times, the kernel code when running in software emulation mode, and at points during hardware emulation mode. This is a standard software debug flow using breakpoints, stepping through code, analyzing variables, and forcing the code into specific states.

The following figure shows the debug flow during software emulation for the host and kernel code (written in C/C++ or OpenCL™) using the GNU debugger (GDB). Notice the two instances of GDB used to separately debug the host and kernel processes, and the use of the debug server (xrt_server).

Figure 1: Software Emulation

Xilinx recommends iterating the design as much as possible in Software Emulation, which takes little compile time and executes quickly. For more detailed information on software emulation, see Software Emulation.

GDB-Based Debugging

IMPORTANT: Both the host and kernel code must be compiled for debugging using the -g option.

With the GNU debugger (GDB), you can debug the kernel or host code, add breakpoints, and inspect variables. This familiar software debug flow allows quick design, compile, and debug iterations to validate the functionality of your application. The Vitis debugger also provides extensions to GDB to let you examine the content of the Xilinx Runtime (XRT) library from the host program. These extensions can be used to debug protocol synchronization issues between the host and the kernel.

The Vitis core development kit supports GDB host program debugging in all flows, but kernel debugging is limited to software and hardware emulation modes. Debugging features need to be enabled in your host and kernel code by using the -g option during compilation and linking.

This section shows how host and kernel debugging can be performed with the help of GDB. Because this flow should be familiar to most software developers, this section focuses on the extensions of host code debugging capabilities for the XRT library and the requirements of kernel debug.

Xilinx Runtime Library GDB Extensions

The Vitis debugger (xgdb) enables new GDB commands that give you visibility from the host application into the XRT library.
Note: If you launch GDB outside of the Vitis debugger, the command extensions need to be enabled using the appdebug.py script as described in Launching Host and Kernel Debug.

There are two kinds of commands which can be called from the gdb command line:

  1. xprint commands that give visibility into XRT library data structures (cl_command_queue, cl_event, and cl_mem). These commands are explained below.
  2. xstatus commands that give visibility into IP running on the Vitis target platform when debugging during hardware execution.

You can get more information about the xprint and xstatus commands by using help <command> from the gdb command prompt.

A typical application for these commands is when you see the host application hang. In this case, the host application could be waiting for the command queue to finish, or waiting on an event list. Printing the command queue using the xprint queue command can tell you what events are unfinished, allowing you to analyze dependencies between events.
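For example, from the gdb prompt of the host process you can run the following; with no argument, xprint queue prints all valid command queues, and <cl_event> stands for whichever event handle your application is waiting on:

gdb> xprint queue
gdb> xprint event <cl_event>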

The output of both of these commands is automatically tracked when debugging with the Vitis IDE. In this case, three tabs are provided next to the common tabs for Variables, Breakpoints, and Registers in the upper left corner of the debug perspective. These are labeled Command Queue, Memory Buffers, and Platform Debug, showing the output of xprint queue, xprint mem, and xstatus, respectively.

xprint Commands

The arguments to xprint queue and xprint mem are optional. The application debug environment keeps track of all the XRT library objects and automatically prints all valid queues and cl_mem objects if the argument is not specified. In addition, the commands validate any supplied command queue, event, and cl_mem arguments.

xprint queue [<cl_command_queue>]
xprint event <cl_event>
xprint mem [<cl_mem>]
xprint kernel
xprint all

xstatus Commands

This functionality is only available in the system flow (hardware execution) and not in any of the emulation flows.

xstatus all
xstatus --<ipname>

GDB Kernel-Based Debugging

GDB kernel debugging is supported for the software emulation and hardware emulation flows. When the GDB executable is connected to the kernel in the IDE or command line flows, you can set breakpoints and query the content of variables in the kernel, similar to normal host code debugging. This is fully supported in the software emulation flow because the kernel GDB processes attach to the spawned software processes.

However, during hardware emulation, the kernel source code is transformed into RTL, created by Vitis HLS, and executed. As the RTL model is simulated, all transformations for performance optimization and concurrent hardware execution are applied. For that reason, not all C/C++/OpenCL lines can be uniquely mapped to the RTL code, so only limited breakpoints are supported and only specific variables can be queried. The GDB tool therefore breaks on the next possible line based on the requested breakpoint statements, and clearly states when variables cannot be queried because of the RTL transformations.

Command Line Debug Flow

TIP: Set up the command shell or window as described in Setting Up the Vitis Environment prior to running the tools.

The following describes the steps required to run the debug flow in software emulation from the command line. Refer to Using the Vitis IDE for information on debugging in the IDE. Debugging in the Vitis core development kit uses the following steps:

  1. Compiling and linking the host code for debugging by adding the -g option to the g++ command line as described in Building the Host Program.
  2. Compiling and linking the kernel code for debugging by adding the -g option to the v++ command line as described in Building the Device Binary.
    Note: When debugging OpenCL kernels, there are additional steps that you can take during compiling and linking as described in Debugging OpenCL Kernels.
  3. Launching GDB to debug the application. This process involves three command terminals as described in Launching Host and Kernel Debug.
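As a point of reference, a minimal sketch of steps 1 and 2 for a software emulation build might look like the following; the kernel name vadd, the source file names, and the host link libraries are placeholders that depend on your application:

g++ -g -std=c++11 -I${XILINX_XRT}/include -L${XILINX_XRT}/lib -o host.exe host.cpp -lOpenCL -pthread
v++ -c -g -t sw_emu --platform <platform> -k vadd -o vadd.xo vadd.cpp
v++ -l -g -t sw_emu --platform <platform> -o vadd.xclbin vadd.xo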

Debugging OpenCL Kernels

For OpenCL kernels, additional runtime checks can be performed during software emulation. These additional checks include:

  • Checking whether an OpenCL kernel makes out-of-bounds accesses to the interface buffers (fsanitize=address).
  • Checking whether the kernel makes accesses to uninitialized local memory (fsanitize=memory).

These are Vitis compiler options that are enabled through the --advanced compiler option as described in --advanced Options, using the following command syntax:

--advanced.param compiler.fsanitize=address,memory

When applied, the emulation run produces a debug log with emulation diagnostic messages that are written to <project_dir>/Emulation-SW/<proj_name>-Default/emulation_debug.log.

The fsanitize directive can also be specified in a config file, as follows:

[advanced]
#param=<param_type>.<param_name>=<value>
param=compiler.fsanitize=address,memory

Then the config file is specified on the v++ command line:

v++ -l -t sw_emu --config ./advanced.cfg -o bin_kernel.xclbin

Refer to the Vitis Compiler Configuration File for more information on the --config option.

Launching Host and Kernel Debug

In software emulation, to better model the hardware accelerator, the execution of the FPGA binary is spawned as a separate process. If you are using GDB to debug the host code, breakpoints set in kernel code are not encountered because the kernel code is not run within the host code process. To support the concurrent debugging of the host and kernel code, the Vitis debugger provides a system to attach to spawned kernels through the use of the debug server (xrt_server). To connect the host and kernel code to the debug server, you must open three terminal windows using the following process.
TIP: This flow should also work while using a graphical front-end for GDB, such as the data display debugger (DDD) available from GNU. The following steps are the instructions for launching GDB.
  1. Open three terminal windows, and set up each window as described in Setting Up the Vitis Environment. The three windows are for:
    • Running xrt_server
    • Running GDB (xgdb) on the Host Code
    • Running GDB (xgdb) on the Kernel Code
  2. In the first terminal, after setting up the terminal environment, start the Vitis debug server using the following command:
    xrt_server --sdx-url

    The debug server listens for debug commands from the host and kernel, connecting the two processes to create a single debug environment. The xrt_server returns a listener port <num> on standard out. Keep track of the listener port number returned as this port is used by GDB to debug the kernel process. To control this process, you must start new GDB instances and connect to the xrt_server. You will do this in the next steps.

    IMPORTANT: With the xrt_server running, all spawned GDB processes wait for control from you. If no GDB ever attaches to the xrt_server, or provides commands, the kernel code appears to hang.
  3. In a second terminal, after setting up the terminal environment, launch GDB for the host code as described in the following steps:
    1. Set the ENABLE_KERNEL_DEBUG environment variable. For example, in a C-shell use the following:
      setenv ENABLE_KERNEL_DEBUG true
    2. Set the XCL_EMULATION_MODE environment variable to sw_emu mode as described in Running the Application Hardware Build. For example, in a C-shell use the following:
      setenv XCL_EMULATION_MODE sw_emu
    3. The runtime debug feature must be enabled using an entry in the xrt.ini file, as described in xrt.ini File. Create an xrt.ini file in the same directory as your host executable, and include the following lines:
      [Debug]
      app_debug=true
      

      This informs the runtime library that the kernel has been compiled for debug, and that the XRT library should enable debug features.

    4. Start gdb through the Xilinx wrapper:
      xgdb --args <host> <xclbin>
      Where <host> is the name of your host executable, and <xclbin> is the name of the FPGA binary. For example:
      xgdb --args host.exe vadd.xclbin

      Launching GDB from the xgdb wrapper performs the following setup steps for the Vitis debugger:

      • Loads GDB with the specified host program.
      • Sources the Python script from the GDB command prompt to enable the Vitis debugger extensions:
        gdb> source ${XILINX_XRT}/share/appdebug/appdebug.py
  4. In a third terminal, after setting up the terminal environment, launch the xgdb command, and run the following commands from the (gdb) prompt:
    • For software emulation:
      file <Vitis_path>/data/emulation/unified/cpu_em/generic_pcie/model/genericpciemodel

      Where <Vitis_path> is the installation path of the Vitis core development kit. Using the $XILINX_VITIS environment variable will not work inside GDB.

    • For hardware emulation:
      1. Locate the xrt_server temporary directory: /tmp/sdx/$uid.
      2. Find the xrt_server process id (PID) containing the DWARF file of this debug session.
      3. At the gdb command line, run: file /tmp/sdx/$uid/$pid/NUM.DWARF.
    • In either case, connect to the kernel process:
      target remote :<num>

      Where <num> is the listener port number returned by the xrt_server.

    TIP: When debugging software/hardware emulation kernels in the Vitis IDE, these steps are handled automatically and the kernel process is automatically attached, providing multiple contexts to debug both the host code and kernel code simultaneously.

With the three terminal windows running the xrt_server, GDB for the host, and GDB for the kernels, you can set breakpoints on your host or kernels as needed, run the continue command, and debug your application. When all kernel invocations have finished, the host code continues and the xrt_server connection drops.

IMPORTANT: For both software and hardware emulation flows, there are restrictions with respect to the accelerated kernel code debug interactions. Because this code is preprocessed in the software emulation flow, and translated to RTL in the hardware emulation flow, it is not always possible to set breakpoints at all locations. Only a limited number of breakpoints such as preserved loops and functions are supported, especially for hardware emulation. Nevertheless, this setup is useful for debugging the interface of the host code with the kernels.

Using printf() or cout to Debug Kernels

The basic approach to debugging algorithms is to verify key code steps and key data values throughout the execution of the program. For application developers, printing checkpoint statements, and outputting current values in the code is a simple and effective method of identifying issues within the execution of a program. This can be done using the printf() function, or cout for standard output.

C/C++ Kernel

For C/C++ kernel models, printf() is only supported during software emulation and should be excluded from the Vitis HLS synthesis step. In this case, any printf() statement should be surrounded by the following compiler macros:

#ifndef __SYNTHESIS__
    printf("Checkpoint 1 reached");
#endif

For C++ kernels, you can also use cout in your code to add checkpoints or messages used for debugging the code. For example, you might add the following:

std::cout << "TEST " << (match ? "PASSED" : "FAILED") << std::endl;

OpenCL Kernel

The Xilinx Runtime (XRT) library supports the OpenCL™ printf() built-in function within kernels in all build configurations: software emulation, hardware emulation, and during hardware execution.

TIP: The printf() function is only supported in all build configurations for OpenCL kernels. For C/C++ kernels, printf() is only supported in software emulation.

The following is an example of using printf() in the kernel, and the output when the kernel is executed with global size of 8:

__kernel __attribute__ ((reqd_work_group_size(1, 1, 1)))
void hello_world(__global int *a)
{
    int idx = get_global_id(0);

     printf("Hello world from work item %d\n", idx);
     a[idx] = idx;
}

The output is as follows:

Hello world from work item 0
Hello world from work item 1
Hello world from work item 2
Hello world from work item 3
Hello world from work item 4
Hello world from work item 5
Hello world from work item 6
Hello world from work item 7

IMPORTANT: printf() messages are buffered in the global memory and unloaded when kernel execution is completed. If printf() is used in multiple kernels, the order in which the messages from each kernel are displayed on the host terminal is not deterministic. Note that, especially when running in hardware emulation and hardware, the hardware buffer size might limit printf() output capture.

Debugging in Hardware Emulation

IMPORTANT: The following steps describe debugging from the command line. However, the Vitis IDE offers a standalone debug environment for use with the Vitis application acceleration projects created from the command line. Refer to Using the Standalone Debug Flow for more information.

During hardware emulation, kernel code is compiled into RTL code so that you can evaluate the RTL logic of kernels prior to implementation into the Xilinx device. The host code can be executed concurrently with a behavioral simulation of the RTL model of the kernel, which is either directly imported or created through Vitis HLS from the C/C++/OpenCL kernel code. For more information, see Hardware Emulation.

The following figure shows the hardware emulation flow diagram which can be used in the Vitis debugger to validate the host code, profile host and kernel performance, give estimated FPGA resource usage, and verify the kernel using an accurate model of the hardware (RTL). GDB can also be used for more traditional software-style debugging of the host and kernel code.

Figure 2: Hardware Emulation

Verify that the host code and the kernel hardware implementation are correct by running hardware emulation on a data set. The hardware emulation flow invokes the Vivado logic simulator in the Vitis core development kit to test the kernel logic that is to be executed on the FPGA fabric. The interface between the models is represented by a transaction-level model (TLM) to limit the impact of the interface model on the overall execution time. The execution time for hardware emulation is longer than for software emulation.

TIP: Xilinx recommends that you use small data sets for debug and validation.

During hardware emulation, you can optionally modify the kernel code to improve performance. Iterate your host and kernel code design in hardware emulation until the functionality is correct, and the estimated kernel performance is satisfactory.
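For reference, a minimal hardware emulation build-and-run sequence might look like the following; the kernel name, file names, and platform are placeholders, and the emconfigutil step assumes the generated emconfig.json is placed next to the host executable (the environment variable is shown in C-shell syntax to match the examples elsewhere in this chapter):

v++ -c -g -t hw_emu --platform <platform> -k vadd -o vadd.xo vadd.cpp
v++ -l -g -t hw_emu --platform <platform> -o vadd.xclbin vadd.xo
emconfigutil --platform <platform>
setenv XCL_EMULATION_MODE hw_emu
./host.exe vadd.xclbin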

GDB-Based Debugging in Hardware Emulation

Debugging using a software-based GDB flow is fully supported during hardware emulation. Because the Vitis debugger maps the RTL code back to the original C/C++ source code, there is no difference in using GDB for debugging in hardware emulation. However, mapping the code limits the breakpoints and observability of the variables in some cases, because during the RTL generation by Vitis HLS, some variables and loops from the kernel source might have been dissolved, or optimized away.

Using GDB for debugging the host and kernel code in hardware emulation uses the same three-terminal process described for software emulation. Refer to the instructions in Command Line Debug Flow for details of running this flow.

Waveform-Based Kernel Debugging

Because the C/C++ and OpenCL kernel code is synthesized into RTL code using Vitis HLS in the hardware emulation build configuration, you can also use RTL behavioral simulation to analyze the kernel logic. Hardware designers are likely to be familiar with this approach. This waveform-based HDL debugging is supported by the Vitis core development kit using both the command line flow, or through the IDE flow during hardware emulation.

TIP: Waveform-based debugging is considered an advanced feature. In most cases, the RTL logic does not need to be analyzed.

Enable Waveform Debugging with the Vitis Compiler Command

The waveform debugging process can be enabled through the v++ command using the following steps:
  1. Enable debug features in the kernel code during compilation and linking, as described in Building the Device Binary.
    v++ -g ...
  2. Create an xrt.ini file in the same directory as the host executable, as described in xrt.ini File, with the following contents:
    [Emulation]
    debug_mode=batch
    
    [Debug]
    profile=true
    timeline_trace=true
    data_transfer_trace=fine
  3. Run the application, host and kernel, in hardware emulation mode. The waveform database, reflecting the hardware transaction data, is collected in a file named <hardware_platform>-<device_id>-<xclbin_name>.wdb. This file can directly be opened in the Vitis analyzer as described in Using the Vitis Analyzer.
    TIP: If debug_mode=gui in the xrt.ini, a live waveform viewer is launched when the application is run, as described in Waveform View and Live Waveform Viewer. This is especially useful when debugging a hw_emu hang issue, because you can interrupt the simulation process in the simulator and observe the waveform up to that time.
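    For example, switching to the live waveform viewer mentioned in the TIP only requires changing the debug_mode entry in the same xrt.ini file:

    [Emulation]
    debug_mode=gui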

Run the Waveform-Based Kernel Debugging Flow

The Vitis IDE provides waveform-based HDL debugging in the hardware emulation mode. The waveform is opened in the Vivado waveform viewer, which should be familiar to Vivado logic simulation users. The Vitis IDE lets you display kernel interfaces and internal signals, and includes debug controls such as restart, HDL breakpoints, HDL code lookup, and waveform markers. In addition, it provides top-level DDR data transfers (per bank) along with kernel-specific details including compute unit stalls, loop pipeline activity, and data transfers.

For details, see Waveform View and Live Waveform Viewer.

If the live waveform viewer is activated, the waveform viewer automatically opens when running the executable. By default, the waveform viewer shows all interface signals and the following debug hierarchy:

Figure 3: Waveform Viewer
Memory Data Transfers
Shows the data transfers from all compute units, which funnel through these interfaces.
TIP: These interfaces could be a different bit width from the compute units. If so, then the burst lengths would be different. For example, a burst of sixteen 32-bit words at a compute unit would be a burst of one 512-bit word at the OCL master.
Kernel <kernel name> <workgroup size> Compute Unit <CU name>
Kernel name, workgroup size, and compute unit name.
CU Stalls (%)
This shows a summary of stalls for the entire CU. A bus of all lowest-level stall signals is created, and the bus is represented in the waveform as a percentage (%) of those signals that are active at any point in time.
Data Transfers
This shows the data transfers for all AXI masters on the CU.
User Functions
This lists all of the functions within the hierarchy of the CU.
Function: <function name>
This is the function name.
Dataflow/Pipeline Activity
This shows the function-level loop dataflow/pipeline signals for a CU.
Function Stalls
This lists the three stall signals within this function.
Function I/O
This lists the I/O for the function. These I/O use the m_axi, ap_fifo, ap_memory, or ap_none protocols.
TIP: As with any waveform debugger, additional debug data of internal signals can be added by selecting the instance of interest from the scope menu and the signals of interest from the object menu. Similarly, debug controls such as HDL breakpoints, as well as HDL code lookup and waveform markers are supported. Refer to the Vivado Design Suite User Guide: Logic Simulation (UG900) for more information on working with the waveform viewer.

Debug Techniques for Hardware Emulation

Due to the approximate models used in hardware emulation, the behavior of an emulated system might not match the hardware. The following list provides some common issues to examine if your application does not give expected results during hardware emulation:

  1. Review the host application to ensure that the event dependency between different kernel runs is correctly captured. Such issues can lead to unpredictable behavior. It is also possible that the application passes in hardware, but a logical bug in your application could be triggered under slightly different conditions.
  2. If you have an RTL kernel, run the application in debug mode and ensure that there are no "X" (undriven values) in the kernel during simulation. These indicate incorrect code that can work in hardware but will fail in simulation with unpredictable behavior. If it is an HLS-generated kernel, confirm that all the variables are initialized to appropriate values.
  3. Ensure that the amount of data being processed by kernels in hardware emulation is small so that emulation can finish in a reasonable time. Otherwise, it can appear that the application is running forever or has "hung". In this case, when running the application in hardware emulation, look for INFO: [Vitis-EM 22] messages in the host application console. Check that the amount of data being read/written to or from global memory is increasing:
    1. If the RD/WR data is increasing, this indicates that the application and hardware execution are progressing. The application is not hung, but is taking a long time to complete. This could be due to a large data size, or due to kernels performing memory reads/writes in an inefficient manner. The application and kernel need to be optimized.
    2. If the RD/WR data is not increasing in successive messages, this indicates that simulation is running but there is a deadlock somewhere in the hardware, either in the kernel or in the rest of the platform. Review the AXI transactions at the boundary of the kernel, the interconnect (for example, sdx_memss), and other places to check whether there is an incomplete transaction or whether any transaction is being generated by the kernel at all.
  4. Run hardware emulation in waveform mode and also review the timeline trace. Check whether the kernel is getting "started" and "done" by observing the traffic on its AXI4-Lite interface, or by observing the output interrupt from the kernel.
  5. Review the [Emulation] section of the xrt.ini File to enable applicable settings that can help to narrow down the issue in your application or kernel.

Debugging During Hardware Execution

IMPORTANT: The following steps describe debugging from the command line. However, the Vitis IDE offers a standalone debug environment for use with the Vitis application acceleration projects created from the command line. Refer to Using the Standalone Debug Flow for more information.

During hardware execution, the actual hardware platform is used to execute the kernels, and you can evaluate the performance of the host program and accelerated kernels just by running the application. However, debugging the hardware build requires additional logic to be incorporated into the application. This will impact both the FPGA resources consumed by the kernel and the performance of the kernel running in hardware. The debug configuration of the hardware build includes special ChipScope debug cores, such as Integrated Logic Analyzer (ILA) and Virtual Input/Output (VIO) cores, and AXI performance monitors for debug purposes.

TIP: The additional logic required for debugging the hardware should be removed from the final production build.

The following figure shows the debug process for the hardware build, including debugging the host code using GDB, and using the Vivado hardware manager, with waveform analysis, kernel activity reports, and memory access analysis to identify and localize hardware issues.

Figure 4: Hardware Execution

With the system hardware build configured for debugging, the host program running on the CPU and the Vitis accelerated kernels running on the Xilinx device can be confirmed to be executing correctly on the actual hardware of the target platform. Some of the conditions that can be identified and analyzed include the following:

  • System hangs caused by protocol violations:
    • These violations can take down the entire system.
    • These violations can cause the kernel to get invalid data or to hang.
    • It is hard to determine where or when these violations originated.
    • To debug this condition, you should use an ILA triggered off of the AXI protocol checker, which needs to be configured on the Vitis target platform.
  • Problems with the hardware kernel:
    • Problems sometimes caused by the implementation: timing issues, race conditions, and bad design constraints.
    • Functional bugs that hardware emulation does not reveal.
  • Performance issues:
    • For example, the frames per second processing is not what you expect.
    • You can examine data beats and pipelining.
    • Using an ILA with trigger sequencer, you can examine the burst size, pipelining, and data width to locate the bottleneck.

Enabling Kernels for Debugging with Chipscope

System ILA

The key to hardware debugging lies in instrumenting the kernels with the required debug logic. The following topic discusses the v++ linker options that can be used to list the available kernel ports, enable the System Integrated Logic Analyzer (ILA) core on selected ports, and enable the AXI Protocol Checker debug core for checking for protocol violations.

The ILA core provides transaction-level visibility into an instance of a compute unit (CU) running on hardware. AXI traffic of interest can also be captured and viewed using the ILA core. The ILA provides custom event triggering on one or more signals to allow waveform capture at system speeds. The waveforms can be analyzed in a viewer and used to debug hardware, find protocol violations, or investigate performance issues. It can also be crucial for debugging difficult situations like application hangs.

Captured data can be accessed through the Xilinx virtual cable (XVC) using the Vivado tools. See the Vivado Design Suite User Guide: Programming and Debugging (UG908) for complete details.

The ILA core can be added to an existing RTL kernel to enable debugging features within that design, or it can be inserted automatically by the v++ compiler during the linking stage. The v++ command provides the --debug option as described in --debug Options to attach System ILA cores at the interfaces to the kernels for debugging and performance monitoring purposes.

Note: ILA debug cores require system resources, including logic and local memory to capture and store the signal data. Therefore they provide excellent visibility into your kernel, but they can affect both performance and resource utilization.

The --debug option to enable ILA IP core insertion has the following syntax:

 --debug.chipscope <cu_name>[:<interface_name>]

In general, the <interface_name> is optional. If not specified, all ports are expected to be analyzed.

AXI Protocol Checker

The AXI Protocol Checker core monitors AXI interfaces. When attached to an interface, it actively checks for protocol violations and provides an indication of which violation occurred. You can assign it for all CUs in the design, or for specific CUs and ports.

The --debug option to enable AXI Protocol Checker insertion has the following syntax:

 --debug.protocol all

The protocol checker can be specified with the keyword all, or the <cu_name>:<interface_name>.

Note: The --debug.list_ports option can be specified to return the actual names of ports on the kernel to use with protocol or chipscope.

An example flow you could use for adding ILA or protocol checkers to your design is outlined below:

  1. Compile the kernel source files into an XO file, using the -g option to instrument the kernel for debug features:
    v++ -c -g -k <kernel_name> --platform <platform> -o <kernel_xo_file>.xo <kernel_source_files> 
  2. After the kernel has been compiled into an XO file, use --debug.list_ports to cause the v++ compiler to print the list of valid compute units and port combinations for the kernel:
    v++ -l -g --platform <platform> --connectivity.nk <kernel_name>:<compute_units>:<kernel_nameN> 
    --debug.list_ports <kernel_xo_file>.xo 
  3. Add the ILA or AXI debug cores on the desired ports by replacing list_ports with the appropriate --debug.chipscope or --debug.protocol command syntax:
    v++ -l -g --platform <platform> --connectivity.nk <kernel_name>:<compute_units>:<kernel_nameN> 
    --debug.chipscope <compute_unit_name>:<interface_name> <kernel_xo_file>.xo
TIP: The --debug option can be specified multiple times in a single v++ command line, or configuration file to specify multiple CUs and interfaces.
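For example, a configuration file equivalent of the chipscope and protocol settings might look like the following; the compute unit and interface names shown are placeholders for values reported by --debug.list_ports, and the file is passed to v++ with the --config option:

[debug]
chipscope=vadd_1:m_axi_gmem
protocol=all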

When the design is built, you can debug the design using the Vivado hardware manager as described in Debugging with ChipScope.

Adding Debug IP to RTL Kernels

IMPORTANT: This debug technique requires familiarity with the Vivado Design Suite, and RTL design.

You can also enable debugging in RTL kernels by manually adding ChipScope debug cores like the ILA and VIO in your RTL kernel code before packaging it for use in the Vitis development flow. From within the Vivado Design Suite, edit the RTL kernel code to manually instantiate an ILA debug core, or VIO IP from the Xilinx IP catalog, similar to using any other IP in Vivado IDE. Refer to the HDL Instantiation flow in the Vivado Design Suite User Guide: Programming and Debugging (UG908) to learn more about adding debug cores to your design.

The best time to add debug cores to your RTL kernel is when you create it. However, debug cores consume device resources and can affect performance, so it is good practice to make one kernel for debug and a second kernel for production use. The rtl_vadd_hw_debug example from the RTL Kernels examples on GitHub shows an ILA debug core instantiated into the RTL kernel source file. The ILA monitors the output of the combinatorial adder as specified in the src/hdl/krnl_vadd_rtl_int.sv file.

	// ILA monitoring combinatorial adder
	ila_0 i_ila_0 (
		.clk(ap_clk),              // input wire        clk
		.probe0(areset),           // input wire [0:0]  probe0  
		.probe1(rd_fifo_tvalid_n), // input wire [0:0]  probe1 
		.probe2(rd_fifo_tready),   // input wire [0:0]  probe2 
		.probe3(rd_fifo_tdata),    // input wire [63:0] probe3 
		.probe4(adder_tvalid),     // input wire [0:0]  probe4 
		.probe5(adder_tready_n),   // input wire [0:0]  probe5 
		.probe6(adder_tdata)       // input wire [31:0] probe6
	);

You can also add the ILA debug core using a Tcl script from within an open Vivado project, using the Netlist Insertion flow described in Vivado Design Suite User Guide: Programming and Debugging (UG908), as shown in the following Tcl script example:

create_ip -name ila -vendor xilinx.com -library ip -version 6.2 -module_name ila_0
set_property -dict [list CONFIG.C_PROBE6_WIDTH {32} CONFIG.C_PROBE3_WIDTH {64} \
CONFIG.C_NUM_OF_PROBES {7} CONFIG.C_EN_STRG_QUAL {1} CONFIG.C_INPUT_PIPE_STAGES {2} \
CONFIG.C_ADV_TRIGGER {true} CONFIG.ALL_PROBE_SAME_MU_CNT {4} CONFIG.C_PROBE6_MU_CNT {4} \
CONFIG.C_PROBE5_MU_CNT {4} CONFIG.C_PROBE4_MU_CNT {4} CONFIG.C_PROBE3_MU_CNT {4} \
CONFIG.C_PROBE2_MU_CNT {4} CONFIG.C_PROBE1_MU_CNT {4} CONFIG.C_PROBE0_MU_CNT {4}] [get_ips ila_0]

After the RTL kernel has been instrumented for debug with the appropriate debug cores, you can analyze the hardware in the Vivado hardware manager as described in Debugging with ChipScope.

Enabling ILA Triggers for Hardware Debug

To perform hardware debug of both the host program and the kernel code running on the target platform, the application host code must be modified to let you set up the ILA trigger conditions after the kernel has been programmed into the device, but before starting the kernel.

Adding ILA Triggers Before Starting Kernels

Pausing the host program can be accomplished through the use of a pause, or wait step in the code, such as the wait_for_enter function used in the RTL Kernel example on GitHub. The function is defined in the src/host.cpp code as follows:

void wait_for_enter(const std::string &msg) {
    std::cout << msg << std::endl;
    std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
}

The wait_for_enter function is used in the main function as follows:

....
    std::string binaryFile = xcl::find_binary_file(device_name,"vadd");
	
    cl::Program::Binaries bins = xcl::import_binary_file(binaryFile);
    devices.resize(1);
    cl::Program program(context, devices, bins);
    cl::Kernel krnl_vadd(program,"krnl_vadd_rtl");


     wait_for_enter("\nPress ENTER to continue after setting up ILA trigger...");

    //Allocate Buffer in Global Memory
    std::vector<cl::Memory> inBufVec, outBufVec;
    cl::Buffer buffer_r1(context,CL_MEM_USE_HOST_PTR | CL_MEM_READ_ONLY, 
            vector_size_bytes, source_input1.data());
    ...

    //Copy input data to device global memory
    q.enqueueMigrateMemObjects(inBufVec,0/* 0 means from host*/);

    //Set the Kernel Arguments
    ...

    //Launch the Kernel
    q.enqueueTask(krnl_vadd);

The use of the wait_for_enter function pauses the host program to give you time to set up the required ILA triggers and prepare to capture data from the kernel. After the Vivado hardware manager is set up and configured, press Enter to continue running the application.

  • For C++ host code, add a pause after the creation of the cl::Kernel object, as shown in the example above.
  • For C-language host code, add a pause after the clCreateKernel() function call, as sketched in the example below.
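    The following is a minimal sketch of such a pause for C host code; the kernel name and error handling are placeholders that mirror the C++ example above:

    // Create the kernel object from the program
    cl_kernel krnl_vadd = clCreateKernel(program, "krnl_vadd_rtl", &err);

    // Pause here to set up and arm the ILA trigger in the Vivado hardware manager
    printf("\nPress ENTER to continue after setting up ILA trigger...\n");
    getchar();

    // Continue with buffer allocation, kernel arguments, and kernel launch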

Pausing the Host Application Using GDB

If you are running GDB to debug the host program at the same time as performing hardware debug on the kernels, you can also pause the host program as needed by inserting a breakpoint at the appropriate line of code. Instead of making changes to the host program to pause the application as needed, you can set a breakpoint prior to the kernel execution in the host code. When the breakpoint is reached, you can set up the debug ILA triggers in Vivado hardware manager, arm the trigger, and then resume the host program in GDB.

Debugging with ChipScope

You can use the ChipScope debugging environment and the Vivado hardware manager to help you debug your host application and kernels quickly and more effectively. These tools enable a wide range of capabilities from logic to system-level debug while your kernel is running in hardware. To achieve this, at least one of the following must be true:

  • The kernel was instrumented with ChipScope debug cores using the v++ --debug option, as described in Enabling Kernels for Debugging with Chipscope.
  • Debug IP was manually added to the RTL kernel, as described in Adding Debug IP to RTL Kernels.

Checking the FPGA Board for Hardware Debug Support

Supporting hardware debugging requires the platform to support several IP components, most notably the Debug Bridge. Talk to your platform designer to determine if these components are included in the target platform. If a Xilinx platform is used, debug availability can be verified using the platforminfo utility to query the platform. Debug capabilities are listed under the chipscope_debug objects.

For example, to query a platform for hardware debug support, the following platforminfo command can be used:

$ platforminfo --json="hardwarePlatform.extensions.chipscope_debug" xilinx_u200_xdma_201830_2
{
    "debug_networks": {
        "user": {
            "name": "User Debug Network",
            "pcie_pf": "1",
            "bar_number": "0",
            "axi_baseaddr": "0x000C0000",
            "supports_jtag_fallback": "false",
            "supports_microblaze_debug": "true",
            "is_user_visible": "true"
        },
        "mgmt": {
            "name": "Management Debug Network",
            "pcie_pf": "0",
            "bar_number": "0",
            "axi_baseaddr": "0x001C0000",
            "supports_jtag_fallback": "true",
            "supports_microblaze_debug": "true",
            "is_user_visible": "false"
        }
    }
}

The response shows that the target platform contains user and mgmt debug networks, supports debugging a MicroBlaze™ processor, and also supports JTAG fallback for the Management Debug Network.

Running XVC and HW Servers

The following steps are required to run the Xilinx virtual cable (XVC) and HW servers, host applications, and also trigger and arm the debug cores in the Vivado hardware manager.
  1. Add debug IP to the kernel as discussed in Enabling Kernels for Debugging with Chipscope.
  2. Modify the host program to pause at the appropriate point as described in Enabling ILA Triggers for Hardware Debug.
  3. Set up the environment for hardware debug, using an automated script described in Automated Setup for Hardware Debug, or manually as described in Manual Setup for Hardware Debug.
  4. Run the hardware debug flow using the following process:
    1. Launch the required XVC and the hw_server of the Vivado hardware manager.
    2. Run the host program and pause at the appropriate point to enable setup of the ILA triggers.
    3. Open the Vivado hardware manager and connect to the XVC server.
    4. Set up ILA trigger conditions for the design.
    5. Continue execution of the host program.
    6. Inspect kernel activity in the Vivado hardware manager.
    7. Rerun iteratively from step b (above) as required.
Automated Setup for Hardware Debug
  1. Set up your Vitis core development kit as described in Setting Up the Vitis Environment.
  2. Use the debug_hw script to launch the xvc_pcie and hw_server apps as follows:
    debug_hw --xvc_pcie /dev/xvc_pub.<driver_id> --hw_server
    The debug_hw script returns the following:
    launching xvc_pcie...
    xvc_pcie -d /dev/xvc_pub.<driver_id> -s TCP::10200
    launching hw_server...
    hw_server -sTCP::3121
    TIP: The /dev/xvc_pub.<driver_id> driver character path is defined on your machine, and can be found by examining the /dev folder.
  3. Modify the host code to include a pause statement after the kernel has been created/downloaded and before the kernel execution is started, as described in Enabling ILA Triggers for Hardware Debug.
  4. Run your modified host program.
  5. Launch Vivado Design Suite using the debug_hw script:
    debug_hw --vivado --host <host_name> --ltx_file ./_x/link/vivado/vpl/prj/prj.runs/impl_1/debug_nets.ltx
    TIP: The <host_name> is the name of your system.

    As an example, the command window displays the following results:

    launching vivado... ['vivado', '-source', 'vitis_hw_debug.tcl', '-tclargs', 
    '/tmp/project_1/project_1.xpr', 'workspace/vadd_test/System/pfm_top_wrapper.ltx', 
    'host_name', '10200', '3121']
     
    ****** Vivado v2019.2 (64-bit)
      **** SW Build 2245749 on Date Time
      **** IP Build 2245576 on Date Time
        ** Copyright 1986-2019 Xilinx, Inc. All Rights Reserved.
     
    start_gui
  6. In Vivado Design Suite, run the ILA trigger.

  7. Press Enter to continue running the host program.
  8. In the Vivado hardware manager, see the interface transactions on the kernel compute unit slave control interface in the Waveform view.

Manual Setup for Hardware Debug
TIP: The following steps can be used when setting up Nimbix and other cloud platforms.

There are a few steps required to start the debug servers prior to debugging the design in the Vivado hardware manager.

  1. Set up your Vitis core development kit as described in Setting Up the Vitis Environment.
  2. Launch the xvc_pcie server. The file name passed to xvc_pcie must match the character driver file installed with the kernel device driver, where <driver_id> can be found by examining the /dev folder.
    >xvc_pcie -d /dev/xvc_pub.<driver_id>
    TIP: The xvc_pcie server has many useful command line options. You can issue xvc_pcie -help to obtain the full list of available options.
  3. Start the hw_server on port 3121, and connect to the XVC server on port 10201 using the following command:
    >hw_server -e "set auto-open-servers xilinx-xvc:localhost:10201" -e "set always-open-jtag 1"
  4. Launch Vivado Design Suite and open the hardware manager:
    vivado

Debugging Designs Using Vivado Hardware Manager

Traditionally, a physical JTAG connection is used to perform hardware debug for Xilinx devices with the Vivado hardware manager. The Vitis unified software platform also makes use of the Xilinx virtual cable (XVC) for hardware debugging on remote accelerator cards. To take advantage of this capability, the Vitis debugger uses the XVC server, an implementation of the XVC protocol that allows the Vivado hardware manager to connect to a local or remote target device for debug, using the standard Xilinx debug cores like the ILA or the VIO IP.

The Vivado hardware manager, from the Vivado Design Suite or Vivado debug feature, can be running on the target instance or it can be running remotely on a different host. The TCP port on which the XVC server is listening must be accessible to the host running Vivado hardware manager. To connect the Vivado hardware manager to XVC server on the target, the following steps should be followed on the machine hosting the Vivado tools:

  1. Launch the Vivado debug feature, or the full Vivado Design Suite.
  2. Select Open Hardware Manager from the Tasks menu, as shown in the following figure.

  3. Connect to the Vivado tools hw_server, specifying a local or remote connection, and the Host name and Port, as shown below.

  4. Connect to the target instance Virtual JTAG XVC server.

  5. Select the debug_bridge instance from the Hardware window in the Vivado hardware manager.
    Specify the probes file (.ltx) for your design by adding it to the Probes > File entry in the Hardware Device Properties window. Adding the probes file refreshes the hardware device, and the Hardware window should now show the debug cores in your design.
    TIP: If the kernel has debug cores as specified in Enabling Kernels for Debugging with Chipscope, the probes file (.ltx) is written out during the implementation of the kernel by the Vivado tool.
  6. The Vivado hardware manager can now be used to debug the kernels running on the Vitis software platform. Arm the ILA cores in your kernels and run your host application.

TIP: Refer to the Vivado Design Suite User Guide: Programming and Debugging (UG908) for more information on working with the Vivado hardware manager to debug the design.
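If you prefer to script the connection rather than use the GUI steps above, the following Tcl sketch approximates steps 2 through 5 from the Vivado Tcl console; the host name, ports, and probes file path are placeholders that must match the XVC and hw_server settings described in Running XVC and HW Servers:

open_hw_manager
connect_hw_server -url <host_name>:3121
open_hw_target -xvc_url <host_name>:10200
set_property PROBES.FILE {./debug_nets.ltx} [current_hw_device]
refresh_hw_device [current_hw_device]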

JTAG Fallback for Private Debug Network

Hardware debug for the Alveo Data Center accelerator cards typically uses the XVC-over-PCIe connection due to the inaccessibility of the physical card, and the JTAG connector on the card. While XVC-over-PCIe allows you to remotely debug your application running on the target platform, certain conditions such as AXI interconnect system hangs can prevent you from accessing the hardware debug functionality that depends on these PCIe/AXI features. Being able to debug these kinds of conditions is especially important for platform designers.

The JTAG Fallback feature is designed to provide access to debug networks that were previously only accessible through XVC-over-PCIe. The JTAG Fallback feature can be enabled without having to change the XVC-over-PCIe-based debug network in the platform design.

On the host side, when the Vivado hardware manager user connects through the hw_server to a JTAG cable that is connected to the physical JTAG pins of the accelerator card, or device under test (DUT), the hw_server disables the XVC-over-PCIe pathway to the hardware. This lets you use the XVC-over-PCIe cable as your primary debug path, but enable debug over the JTAG cable directly when it is required in certain situations. When you disconnect from the JTAG cable, the hw_server re-enables the XVC-over-PCIe pathway to the hardware.

JTAG Fallback Steps
Here are the steps required to enable JTAG Fallback:
  1. Enable the JTAG Fallback feature of the Debug Bridge (AXI-to-BSCAN mode) master of the debug network to which you want to provide JTAG access. This step enables a BSCAN slave interface on this Debug Bridge instance.
  2. Instantiate another Debug Bridge (BSCAN Primitive mode) in the static logic partition of the platform design.
  3. Connect the BSCAN master port of the Debug Bridge (BSCAN Primitive mode) from step 2 to the BSCAN slave interface of the Debug Bridge (AXI-to-BSCAN mode) from step 1.

Utilities for Hardware Debugging

In some cases, the normal Vitis IDE and command line debug features are limited in their ability to isolate an issue. This is especially true when the software or hardware appears not to make any progress (hangs). These kinds of system issues are best analyzed with the help of the utilities mentioned in this section.

Using the Linux dmesg Utility

Well-designed kernels and modules report issues through the kernel ring buffer. This is also true for Vitis technology modules that allow you to debug the interaction with the accelerator board on the lowest Linux level.

The dmesg utility is a Linux tool that lets you read the kernel ring buffer. The kernel ring buffer holds kernel information messages in a circular buffer. A circular buffer of fixed size is used to limit the resource requirements by overwriting the oldest entry with the next incoming message.

TIP: In most cases, it is sufficient to work with the less verbose xbutil feature to localize an issue. Refer to Using the Xilinx xbutil Utility for more information on using this tool for debug.

In the Vitis technology, the xocl and xclmgmt driver modules write informational messages to the ring buffer. Thus, for an application hang, crash, or any unexpected behavior (like being unable to program the bitstream), the dmesg tool should be used to check the ring buffer.

The following image shows the layers of the software platform associated with the target platform.

Figure 5: Software Platform Layers

To review messages from the Linux tool, you should first clear the ring buffer:

sudo dmesg -c

This flushes all messages from the ring buffer and makes it easier to spot messages from xocl and xclmgmt. After that, start your application and run dmesg in another terminal.

sudo dmesg

The dmesg utility prints a record shown in the following example:

Figure 6: dmesg Utility Example

In the example shown above, the AXI Firewall 2 has tripped, which is better examined using the xbutil utility.

Using the Xilinx xbutil Utility

The Xilinx board utility (xbutil) is a powerful standalone command line utility that can be used to debug lower level hardware/software interaction issues. A full description of this utility can be found in xbutil Utility.

With respect to debugging, the following xbutil options are of special interest:
query
Provides an overall status of a card including information on the kernels in card memory.
program
Downloads a binary (xclbin) to the programmable region of the Xilinx device.
status
Extracts the status of the Performance Monitors (spm) and the Lightweight AXI Protocol Checkers (lapc).
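For example, the following invocations correspond to the options above; the xclbin name is a placeholder, and these same commands are used in the debug techniques that follow:

xbutil program -p ./vadd.xclbin
xbutil query
xbutil status --lapc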

Techniques for Debugging Application Hangs

This section discusses debugging issues related to the interaction of the host code and the accelerated kernels. Problems with these interactions manifest as issues such as machine hangs or application hangs. Although the GDB debug environment might help with isolating the errors in some cases (xprint), such as hangs associated with specific kernels, these issues are best debugged using the dmesg and xbutil commands as shown here.

If this process does not resolve the problem, it is necessary to perform hardware debugging using the ChipScope feature.

AXI Firewall Trips

The AXI firewall should prevent host hangs. This is why the AXI Protocol Firewall IP is included in all production Vitis platforms. When the firewall trips, one of the first checks to perform is confirming if the host code and kernels are set up to use the same memory banks. The following steps detail how to perform this check.
  1. Use xbutil to program the FPGA:
    xbutil program -p <xclbin>
    TIP: Refer to xbutil Utility for more information on xbutil.
  2. Run the xbutil query option to check memory topology:
    xbutil query
    In the following example, there are no kernels associated with memory banks:

  3. If the host code expects any DDR banks/PLRAMs to be used, this report should indicate an issue. In this case, it is necessary to check kernel and host code expectations. If the host code is using the Xilinx OpenCL extensions, it is necessary to check which DDR banks should be used by the kernel. These should match the connectivity.sp options specified as discussed in Mapping Kernel Ports to Memory.
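    For reference, a hypothetical mapping in a v++ configuration file might look like the following, assuming a compute unit vadd_1 whose pointer arguments in1, in2, and out must match the banks in which the host code allocates its buffers:

    [connectivity]
    sp=vadd_1.in1:DDR[0]
    sp=vadd_1.in2:DDR[1]
    sp=vadd_1.out:DDR[1]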

Kernel Hangs Due to AXI Violations

It is possible for the kernels to hang due to bad AXI transactions between the kernels and the memory controller. To debug these issues, it is required to instrument the kernels.
  1. The Vitis core development kit provides two options for instrumentation to be applied during v++ linking (--link). Both of these options add hardware to your implementation, and based on resource utilization it might be necessary to limit instrumentation.
    1. Add Lightweight AXI Protocol Checkers (lapc). These protocol checkers are added using the --debug.protocol option, as explained in --debug Options. The following syntax is used:
      --debug.protocol <compute_unit_name>:<interface_name>
      In general, the <interface_name> is optional. If not specified, all ports on the CU are expected to be analyzed. The --debug.protocol option is used to define the protocol checkers to be inserted. This option can accept a special keyword, all, for <compute_unit_name> and/or <interface_name>.
      Note: Multiple --debug.xxx options can be specified in a single command line, or configuration file.
    2. Adding Performance Monitors (am, aim, asm) enables the listing of detailed communication statistics (counters). Although this is most useful for performance analysis, it provides insight during debugging on pending port activities. The Performance Monitors are added using the --profile option as described in --profile Options. The basic syntax for the --profile option is:
      --profile.data <krnl_name>|all:<cu_name>|all:<intrfc_name>|all:<counters>|all
      Three fields are required to determine the specific interface to attach the performance monitor to. However, if resource consumption is not an issue, the keyword all lets you apply the monitoring to all existing kernels, compute units, and interfaces with a single option. Otherwise, you can specify the kernel_name, cu_name, and interface_name explicitly to limit instrumentation.
      The last option, <counters>|all, allows you to restrict the information gathering to just counters for large designs, while all (default) includes the collection of actual trace information.
      Note: Multiple --profile options can be specified in a single command line, or configuration file.
      [profile]
      data=kernel1:cu1:m_axi_gmem0
      data=kernel1:cu1:m_axi_gmem1
      data=kernel2:cu2:m_axi_gmem
      
  2. When the application is rebuilt, rerun the host application using the xclbin with the added AIM IP and LAPC IP.
  3. When the application hangs, you can use xbutil status to check for any errors or anomalies.
  4. Check the AIM output:
    • Run xbutil status --aim a couple of times to check if any counters are moving. If they are moving then the kernels are active.
      TIP: Testing AIM output is also supported through GDB debugging using the command extension xstatus spm.
    • If the counters are stagnant, the outstanding counts greater than zero might mean some AXI transactions are hung.
  5. Check the LAPC output:
    • Run xbutil status --lapc to check if there are any AXI violations.
      TIP: Testing LAPC output is also supported through GDB debugging using the command extension xstatus lapc.
    • If there are any AXI violations, it implies that there are issues in the kernel implementation.

Host Application Hangs When Accessing Memory

Application hangs can also be caused by incomplete DMA transfers initiated from the host code. This does not necessarily mean that the host code is wrong; it might also be that the kernels have issued illegal transactions and locked up the AXI.
  1. If the platform has an AXI firewall, such as in the Vitis target platforms, it is likely to trip. The driver issues a SIGBUS error, kills the application, and resets the device. You can check this by running xbutil query. The following figure shows such an error in the firewall status:
    Firewall Last Error Status:
    		0:		0x0	 (GOOD)
    		1:		0x0	 (GOOD)
    		2:		0x80000 (RECS_WRITE_TO_BVALID_MAX_WAIT). 
    				  Error occurred on Tue 2017-12-19 11:39:13 PST
    
    Xclbin ID:	0x5a39da87
    TIP: If the firewall has not tripped, the Linux tool, dmesg, can provide additional insight.
  2. When you know that the firewall has tripped, it is important to determine the cause of the DMA timeout. The issue could be an illegal DMA transfer, or kernel misbehavior. However, a side effect of the AXI firewall tripping is that the health check functionality in the driver resets the board after killing the application; any information on the device that might help with debugging the root cause is lost. To debug this issue, disable the health check thread in the xclmgmt kernel module to capture the error. This uses common Unix kernel tools in the following sequence:
    1. sudo modinfo xclmgmt: This command lists the current configuration of the module and indicates if the health_check parameter is ON or OFF. It also returns the path to the xclmgmt module.
    2. sudo rmmod xclmgmt: This removes and disables the xclmgmt kernel module.
    3. sudo insmod <path to module>/xclmgmt.ko health_check=0: This re-installs the xclmgmt kernel module with the health check disabled.
      TIP: The path to this module is reported in the output of the call to modinfo.
  3. With the health check disabled, rerun the application. You can use the kernel instrumentation to isolate this issue as previously described.
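  For reference, the complete sequence for disabling the health check from a shell is shown below (replace the module path placeholder with the path reported by modinfo):
    sudo modinfo xclmgmt
    sudo rmmod xclmgmt
    sudo insmod <path to module>/xclmgmt.ko health_check=0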

Typical Errors Leading to Application Hangs

The user errors that typically create application hangs are listed below:

  • Read-before-write in 5.0+ target platforms causes a Memory Interface Generator error correction code (MIG ECC) error. This is typically a user error. For example, this error might occur when a kernel is expected to write 4 KB of data to DDR, but it produces only 1 KB of data, and you then try to transfer the full 4 KB of data to the host. It can also happen if you supply a 1 KB buffer to a kernel, but the kernel tries to read 4 KB of data.
  • An ECC read-before-write error also occurs if no data has been written to a memory location since the last bitstream download (which results in MIG initialization), but a read request is made for that same memory location. ECC errors stall the affected MIG because kernels are usually not able to handle this error. This can manifest in two different ways:
    1. The CU might hang or stall because it cannot handle this error while reading or writing to or from the affected MIG. The xbutil query shows that the CU is stuck in a BUSY state and is not making progress.
    2. The AXI Firewall might trip if a PCIe® DMA request is made to the affected MIG, because the DMA engine is unable to complete the request. AXI Firewall trips result in the Linux kernel driver killing all processes which have opened the device node with the SIGBUS signal. The xbutil query shows if an AXI Firewall has indeed tripped and includes a timestamp.
    If the above hang does not occur, the host code might not read back the correct data. This incorrect data is typically 0s and is located in the last part of the data. It is important to review the host code carefully. One common example is compression, where the size of the compressed data is not known up front, and an application might try to migrate more data to the host than was produced by the kernel.

Defensive Programming

The Vitis compiler is capable of creating very efficient implementations. In some cases, however, implementation issues can occur. One such case is when a write request is emitted before there is enough data available in the process to complete the write transaction. This can cause deadlock conditions when multiple concurrent kernels are affected by this issue and the write request of a kernel depends on its input read being completed.

To avoid these situations, a conservative mode is available on the adapter. In principle, it delays the write request until it has all of the data necessary to complete the write. This mode is enabled during compilation by applying the following --advanced.param option to the v++ compiler:

--advanced.param:compiler.axiDeadLockFree=yes

Because enabling this mode can impact performance, you might prefer to use this as a defensive programming technique where this option is inserted during development and testing and then removed during optimization. You might also want to add this option when the accelerator hangs repeatedly.
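For example, the option can be added to the v++ link command line, or expressed as a configuration-file entry. The [advanced] section form below is a sketch that assumes the same section conventions used in the config.cfg example later in this document:

v++ -g -l -t hw_emu --platform <DEVICE> --advanced.param:compiler.axiDeadLockFree=yes \
-o krnl_idct.xclbin krnl_idct.xo

[advanced]
param=compiler.axiDeadLockFree=yes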

Debugging on Embedded Processor Platforms

Debugging on embedded processor platforms, such as the xilinx_zcu104_base_202010_1 platform, requires the use of the QEMU emulation environment to model the Arm processor and operating system for the device. As described in the next sections, running or debugging the application requires the additional step of launching the emulator, or connecting to the hardware platform through a TCF agent.

Emulation Debug for Embedded Processors

From within the Vitis IDE, launching debug for the software and hardware emulation builds involves the following steps:

  1. In the Assistant view, right-click the Emulation-SW or Emulation-HW build and select Set Active to make the build active.
  2. From the Assistant view menu, select the Debug command, and then select the Launch on Emulator command to launch the debug environment.
    This opens the Launch on Emulator dialog box, which prompts you to confirm launching the emulation environment and connecting to it using a Linux TCF agent. Select Start Emulator and Debug to continue.

    This launches the emulation environment (QEMU), and loads the application in preparation for debugging. The application is paused as it enters the main() function. The Debug perspective is opened in the Vitis IDE, and you are ready to begin debugging your application.

Hardware Debug for Embedded Processors

For hardware builds the setup involves the following steps:
  1. Copy the contents of the <project>/Hardware/sd_card/sd_card folder to a physical SD card. This creates a bootable medium for your target platform.
  2. Insert the SD card into the card reader of your embedded processor platform.
  3. Change the boot-mode settings of the platform to SD boot mode, and power up the board.
  4. After the device boots, enter the mount command at the command prompt to list the mount points on the system.
    TIP: Be sure to capture the proper path for the cd command in the next step, and subsequent commands, based on the results of the mount command.

  5. Execute the following commands, for example:
    cd /run/media/mmcblk0p1
    source init.sh
    cat /etc/xocl.txt

    The cat command displays the platform name, for example xilinx_vck190_base_202010_1, letting you confirm that it matches your specified platform and that your setup is correct.

  6. Run ifconfig to get the IP address of the target card. You will use this IP address to set up a TCF agent connection from the Vitis IDE to the embedded processor platform.
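    At this point the board-side setup is complete. For reference, the full sequence at the target's command prompt is shown below (the mount point is whatever the mount command reported; this example uses mmcblk0p1):
    cd /run/media/mmcblk0p1
    source init.sh
    cat /etc/xocl.txt
    ifconfig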
  7. Create a target connection to the remote accelerator card. Use the Window > Show view > Xilinx > Target connections command to open the Target Connections view.
  8. In the Target Connections view, right-click on the Linux TCF Agent and select the New Target command to open the New Target Connection dialog box.
  9. Specify the Target Name, enable the Set as default target check box, and specify the Host IP address of the accelerator card that you obtained in an earlier step.

  10. Click OK to close the dialog box and continue.
  11. In the Assistant view, right-click on the Hardware build and select Set Active to make it the active build.
  12. From the Assistant view menu, select the Debug command, and then select the Debug Configurations command. This opens the Debug Configurations dialog box to let you configure debug for the Hardware build on your specific platform.

    Set the following fields on the Main tab of the dialog box:

    Name
    Specifies a name for your Hardware debug configuration.
    Linux TCF Agent
    Selects the new agent you created with the IP address of the accelerator card.
    Configuration
    Ensure that the Hardware configuration is selected.
    Enable Profiling
    Select this option if you want to capture trace data from events.
    Select the Application tab in the Debug Configurations dialog box, and set the following fields:

    Local File Path
    Specifies where files created on the target platform will be written back to your local disk.
    Remote File Path
    Specifies the remote mount location on the accelerator card, as determined in an earlier step.
    Working directory
    Specifies the location to write files created on the target platform.
  13. Select Apply to save your changes, and Debug to start the process.

    This opens the Debug perspective in the Vitis IDE, and connects to the PS application on your hardware platform. The application automatically breaks at the main() function to let you set up and configure the debug environment.

Example of Command Line Debugging

To help you get familiar with debugging using the command line flow, this example walks you through building and debugging the IDCT example available from the Xilinx GitHub.
  1. In a terminal, set up your environment as described in Setting Up the Vitis Environment.
  2. If you have not already done so, clone the Vitis Accel Examples GitHub repository to acquire all of the Vitis examples:
    git clone https://github.com/Xilinx/Vitis_Accel_Examples.git

    This creates a Vitis_Accel_Examples directory which includes the IDCT example.

  3. Change to the IDCT example directory:
    cd Vitis_Accel_Examples/vision/idct/

    The host code is fully contained in src/idct.cpp and the kernel code is part of src/krnl_idct.cpp.

  4. Build the kernel software for software emulation as discussed in Building the Device Binary.
    1. Compile the kernel object file for debugging using the v++ compiler, where -g indicates that the code is compiled for debugging:
      v++ -t sw_emu --platform <DEVICE> -g -c -k krnl_idct \
      -o krnl_idct.xo src/krnl_idct.cpp
    2. Link the kernel object file, also specifying -g:
      v++ -g -l -t sw_emu --platform <DEVICE> -config config.cfg \
      -o krnl_idct.xclbin krnl_idct.xo
      The --config option specifies the configuration file, config.cfg, that contains the directives for the build process as described in the Vitis Compiler Configuration File. The contents of the configuration file are as follows:
      kernel_frequency=250
      
      [connectivity]
      nk=krnl_idct:1:krnl_idct_1
      
      sp=krnl_idct_1.m_axi_gmem0:DDR[0]
      sp=krnl_idct_1.m_axi_gmem1:DDR[0]
      sp=krnl_idct_1.m_axi_gmem2:DDR[1]
      
      [advanced]
      prop=solution.hls_pre_tcl='src/hls_config.tcl'
  5. Compile and link the host code for debugging using the GNU compiler chain, g++, as described in Building the Host Program:
    Note: For embedded processor target platforms you will use the GNU Arm cross-compiler as described in Compiling and Linking for Arm.
    1. Compile host code C++ files for debugging using the -g option:
      g++ -c -I${XILINX_XRT}/include -g -o idct.o src/idct.cpp 
    2. Link the object files for debugging using -g:
      g++ -g -lOpenCL -lpthread -lrt -lstdc++ -L${XILINX_XRT}/lib/ -o idct idct.o
  6. As described in emconfigutil Utility, prepare the emulation environment using the following command:
    emconfigutil --platform <device>
    The actual emulation mode (sw_emu or hw_emu) then needs to be set through the XCL_EMULATION_MODE environment variable. In C-shell this would be as follows:
    setenv XCL_EMULATION_MODE sw_emu
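    In bash or other Bourne-compatible shells, the equivalent is:
    export XCL_EMULATION_MODE=sw_emu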
  7. As described in xrt.ini File, you must set up the runtime for debug. In the same directory as the compiled host application, create an xrt.ini file with the following content:
    [Debug]
    app_debug=true
    
  8. Run GDB on the host and kernel code. The following steps guide you through the command line debug process, which requires three separate command terminals, set up as described in Setting Up the Vitis Environment.
    1. In the first terminal, start the XRT debug server, which handles the transactions between the host and kernel code:
      ${XILINX_VITIS}/bin/xrt_server --sdx-url
    2. In a second terminal, set the emulation mode:
      setenv XCL_EMULATION_MODE sw_emu
      Run GDB by executing the following:
      xgdb --args idct krnl_idct.xclbin
      Enter the following on the gdb prompt:
      run
    3. In the third terminal, attach the software emulation or hardware emulation model to GDB to allow stepping through the design. Here, there is a difference between running software emulation and hardware emulation. In either flow, start up another xgdb:
      xgdb
      • For debugging in software emulation:
        • Type the following on the gdb prompt:
          file <XILINX_VITIS>/data/emulation/unified/cpu_em/generic_pcie/model/genericpciemodel
          Note: Because GDB does not expand environment variables, you must specify the full path to the Vitis software platform installation in place of <XILINX_VITIS>.
      • For debugging in hardware emulation:
        1. Locate the xrt_server temporary directory: /tmp/sdx/$uid.
        2. Find the xrt_server process ID (PID) containing the DWARF file of this debug session.
        3. At the gdb prompt, run:
          file /tmp/sdx/$uid/$pid/NUM.DWARF
      • In either case, connect to the kernel process:
        target remote :NUM

        Where NUM is the number returned by the xrt_server as the GDB listener port.

      At this point, debugging the host and kernel code can be done as usual with GDB, with the host code and the kernel code running in two different GDB sessions. This is common when dealing with different processes.

      IMPORTANT: Be aware that the application might hit a breakpoint in one process before the next breakpoint in the other process is hit. In these cases, the debugging session in one terminal appears to hang, while the second terminal is waiting for input.
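      Once both sessions are attached, a minimal sketch of the kernel-side GDB session might look like the following, where NUM is the listener port reported by xrt_server; these are standard GDB commands, and the breakpoint symbol shown is the kernel function from this IDCT example (the exact symbol name depends on your build):
        (gdb) target remote :NUM
        (gdb) break krnl_idct
        (gdb) continue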