Debug Techniques

This section closely examines different styles of debugging techniques, classifying the approaches into software-based debugging techniques and hardware-oriented techniques. With the software-based approaches, you are not required to fully understand the ultimate mapping of the kernel code onto the FPGA. However, this concept can only be extended to a certain level of detail, beyond which the more detailed hardware-based analysis is required.

The section is structured along the different debug stages in the SDAccel™ environment. It starts with functional verification during software emulation (a purely software-based approach). Next is hardware emulation, where the kernel code is converted into actual hardware representation providing more details of the final implementation. Hardware debugging as well as software debugging concepts can be applied during debugging in the hardware emulation stage. The last stage is system verification, where the actual hardware is executed. In this stage, software debugging concepts can only be applied to the host while the kernel must deploy hardware debugging concepts.

Functional Verification (Software Emulation)

Functional verification is the process during which the software representing the system is verified against the ultimate implementation goal by ensuring that the software behaves as intended on the given data. This is a very common task during software development, and many different concepts and tools are available.

If your software does not perform as intended, you can use the debugger to identify the root cause of the issue, or if necessary, dump datapoints during software execution. This section introduces these concepts applied to an SDx™ environment project.

Using printf() to Debug Kernels

The simplest approach to debugging algorithms is to verify key data values throughout the execution of the program. For application developers, printing checkpoint values in the code is a tried and trusted way of identifying problems within the execution of a program. Because part of the algorithm is now running on an FPGA, even this debugging technique requires additional support.

The SDAccel development environment supports the OpenCL™ printf() built-in function within the kernels in all development flows: software emulation, hardware emulation, and running the kernel in actual hardware. The following is an example of using printf() in the kernel, and the output when the kernel is executed with a global size of 8:

__kernel __attribute__ ((reqd_work_group_size(1, 1, 1)))
void hello_world(__global int *a)
{
    int idx = get_global_id(0);

    printf("Hello world from work item %d\n", idx);
    a[idx] = idx;
}

The output is as follows:

Hello world from work item 0
Hello world from work item 1
Hello world from work item 2
Hello world from work item 3
Hello world from work item 4
Hello world from work item 5
Hello world from work item 6
Hello world from work item 7
IMPORTANT: printf() messages are buffered in global memory and unloaded when kernel execution completes. If printf() is used in multiple kernels, the order in which the messages from each kernel are displayed on the host terminal is not deterministic. Note that, especially when running in hardware emulation and hardware, the hardware buffer size might limit printf() output capturing.
Note: This feature is only supported for OpenCL kernels in all development flows.
For C/C++ kernel models, printf() is only supported during software emulation and should be excluded from the Vivado® HLS synthesis step. In this case, any printf() statement should be guarded by the __SYNTHESIS__ compiler macro as follows:
#ifndef __SYNTHESIS__
    printf("text");
#endif
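To illustrate, the guard might be applied in a C++ kernel as follows (the kernel name and signature here are hypothetical, not from the original example):

```cpp
#include <cstdio>

// Hypothetical C++ kernel: the printf() call is visible during software
// emulation but excluded when Vivado HLS defines __SYNTHESIS__.
extern "C" void vadd(const int *a, const int *b, int *out, int n) {
    for (int i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
#ifndef __SYNTHESIS__
        // Debug output: never reaches the synthesized hardware.
        printf("vadd: out[%d] = %d\n", i, out[i]);
#endif
    }
}
```

Because the macro is defined by the HLS tool itself, no project configuration change is needed; the same source file works in both software emulation and synthesis.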

GDB-Based Debugging

This section shows how host and kernel debugging can be performed with the help of GDB. Because this flow should be familiar to software developers, this section focuses on the extensions of host code debugging capabilities specifically for FPGAs, and the current status of kernel-based hardware emulation support.

Host Code Debugging

Except for the method of launching the debugging environment described in the previous chapter, there is no difference between the SDAccel host code debugging and the commonly used GDB application debugging flow and features.

After gdb is launched, you can step through the host code in GDB and examine the C/C++/OpenCL objects to verify that their contents are as expected at any point in the code.

However, as stated in the introduction, it is common, especially in the case of hardware emulation, to look for issues in the protocol synchronization between the host and the kernel. The SDAccel environment provides special GDB extensions to examine the content of the OpenCL runtime environment from the application host. These commands are described in more detail in the next section.

Xilinx OpenCL Runtime GDB Extensions
The Xilinx OpenCL runtime Debug Environment introduces new GDB commands that give visibility from the host application into the OpenCL runtime library.
Note: If you run GDB outside of the SDAccel environment, these commands need to be enabled as described in Launching GDB Host Code Debug.

There are two kinds of commands which can be called from the gdb command line:

  • Commands that give visibility into the OpenCL runtime data structures (cl_command_queue, cl_event, and cl_mem). The arguments to xprint queue and xprint mem are optional. The application debug environment keeps track of all the OpenCL objects and automatically prints all valid queues and cl_mem objects if the argument is not specified. In addition, the commands validate the supplied command queue, event, and cl_mem arguments.

    xprint queue [<cl_command_queue>]
    xprint event <cl_event>
    xprint mem [<cl_mem>]
    xprint kernel
    xprint all
    
  • Commands that give visibility into the IP on the SDAccel platform. This functionality is only available in the system flow (hardware execution) and not in any of the emulation flows.
    xstatus all
    xstatus --<ipname>
    

You can get help information about the commands by using help <command>.

A typical use for these commands is when the host application hangs. In this case, the host application is likely waiting for a command queue to finish or waiting on an event list. Printing the command queue using the xprint command tells you which events are unfinished, letting you analyze the dependencies between the events.

The output of these commands is automatically tracked when debugging with the SDAccel IDE. In this case, three tabs are provided next to the common tabs for Variables, Breakpoints, and Registers in the upper left corner of the debug perspective. These are labeled Command Queue, Memory Buffers, and Platform Debug, showing the output of xprint queue, xprint mem, and xstatus respectively.



Note: The information presented in these views is only visible to the application developer while actually debugging the host code. This is the reason why this debug technique is also applicable when actual system execution (hardware) is performed.

GDB Kernel-Based Debugging

GDB kernel debugging is supported for the software emulation and hardware emulation flows. When the GDB executable is connected to the kernel in the IDE or command line flows, you can set breakpoints and query the content of variables in the kernel, similar to normal host code debugging. This is fully supported in the software emulation flow because the kernel GDB processes attach to the spawned software processes.

However, during hardware emulation, the kernel source code is transformed into RTL by Vivado HLS and then executed. As the RTL model is simulated, all transformations for performance optimization and concurrent hardware execution are applied. For that reason, not all C/C++/OpenCL lines can be uniquely mapped to the RTL code; only limited breakpoints are supported, and only specific variables can be queried. The GDB tool therefore breaks on the next possible line based on the requested breakpoint statements, and clearly states when variables cannot be queried due to the RTL transformations.

Debugging in Hardware Emulation

During hardware emulation, it is possible to deep dive into the implementation of the kernels. The SDAccel environment allows you to perform typical hardware-like debugging in this mode as well as some software-like GDB-based analysis on the hardware implementation.

GDB-Based Debugging

Debugging using a software-based GDB flow is fully supported during hardware emulation. Because the GDB flow maps the RTL back into the source code description, there is no difference to the user apart from the execution of the actual RTL code representing the kernel. This limits breakpoint placement and the observability of variables in some cases, because variables and loops might have been dissolved during RTL generation (HLS).

For a detailed description of the debug features themselves, see the SDAccel Debug Features chapter and the extensions to GDB presented in the GDB-Based Debugging section.

Waveform-Based Kernel Debugging

The C/C++ and OpenCL kernel code is synthesized using Vivado High-Level Synthesis (HLS), which transforms it into a Hardware Description Language (HDL) representation that is later implemented onto the FPGA (xclbin).

Another debugging approach is based on simulation waveforms. Hardware-centric algorithm programmers are likely to be familiar with this approach. This waveform-based HDL debugging is best supported by the SDAccel environment through the IDE flow during hardware emulation.

TIP: For most debugging, the HDL model does not need to be analyzed. Waveform debugging is considered an advanced debugging capability.

Run the Waveform-Based Kernel Debugging Flow

  1. Start the SDx environment, and perform the regular setup.
  2. Select Run > Debug Configurations to open the Debug Configurations.
  3. On the Debug Configurations window, select the current launch configuration from the OpenCL list, as shown in the following figure.

  4. On the Main tab, two kernel debug options are displayed. Select both Use RTL waveform for kernel debugging and Launch live waveform, and close the configuration window. A debug session starts automatically. Selecting the Use RTL waveform for kernel debugging option ensures that a simulation waveform database is generated, while the Launch live waveform option spawns the Waveform viewer during the actual simulation, allowing you full control of the simulation engines and waveform display.
    If the live waveform viewer is activated, the waveform viewer automatically opens when running the executable. By default, the waveform viewer shows all interface signals and the following debug hierarchy:

    • Memory Data Transfers: Shows the data transfers from all compute units, which funnel through these interfaces.
      TIP: These interfaces could be a different bit width from the compute units. If so, then the burst lengths would be different. For example, a burst of sixteen 32-bit words at a compute unit would be a burst of one 512-bit word at the OCL master.
    • Kernel <kernel name><workgroup size> Compute Unit<CU name>
      • CU Stalls (%): This section shows a summary of stalls for the entire compute unit (CU). A bus of all lowest-level stall signals is created, and the bus is represented in the waveform as a percentage (%) of those signals that are active at any point in time.
      • Data Transfers: This section shows the data transfers for all AXI masters on the CU.
      • User Functions: This section lists all of the functions within the hierarchy of the CU.
        • Function: <function name>
          • Dataflow/Pipeline Activity: This section shows the function-level loop dataflow/pipeline signals for a CU.
          • Function Stalls: This section lists the three stall signals within this function.
          • Function I/O: This section lists the I/O for the function. These I/O use one of the following protocols: m_axi, ap_fifo, ap_memory, or ap_none.
    TIP: As with any waveform debugger, additional debug data of internal signals can be added by selecting the instance of interest from the scope menu and the signals of interest from the object menu. Similarly, debug controls such as HDL breakpoints, as well as HDL code lookup and waveform markers are supported. Refer to the Vivado Design Suite User Guide: Logic Simulation (UG900) for more information on working with the waveform viewer.
Enable Waveform Debugging through the XOCC Command Line
The waveform debugging process can also be enabled through the XOCC command line. Use the following instructions to enable it:
  1. Turn on debug code generation during kernel compilation.
    xocc -g ...
  2. Create an sdaccel.ini file in the same directory as the host executable with the contents below:
    [Emulation]
    launch_waveform=batch
    
    [Debug]
    profile=true
    timeline_trace=true
    data_transfer_trace=fine
  3. Execute hardware emulation. The hardware transaction data is collected in the file named <hardware_platform>-<device_id>-<xclbin_name>.wdb. This file can directly be opened through the SDAccel IDE.
    TIP: If the launch_waveform option is set to gui in the emulation section: [Emulation] launch_waveform=gui, a live waveform viewer is spawned during the execution of the hardware emulation.

System Verification and Hardware Debug

Application Hangs

This section discusses debugging issues related to the interaction of the host code and the accelerated kernels. Problems with these interactions manifest as issues such as machine hangs or application hangs. Although the GDB debug environment might help with isolating the errors in some cases (xprint), such as hangs associated with specific kernels, these issues are best debugged using the dmesg and xbutil commands as shown here.

If these software-based checks do not resolve the problem, it is necessary to perform hardware debugging using the ChipScope™ feature.

AXI Firewall Trips

The AXI firewall should prevent host hangs, which is why Xilinx recommends including the AXI Protocol Firewall IP in SDAccel environment platforms. When the firewall trips, one of the first checks to perform is whether the host code and kernels are set up to use the same memory banks. The following steps detail one of the simplest methods to perform this check.
  1. Use xbutil to program the FPGA:
    xbutil program -p <xclbin>
  2. Run the xbutil query option to check memory topology:
    xbutil query
    In the following example, there is no memory bank associated with the kernels:
  3. If the host code expects any DDR banks/PLRAMs to be used, this report should indicate an issue. In this case, it is necessary to check kernel and host code expectations. If the host code is using the Xilinx OpenCL extensions, it is necessary to check which DDR banks should be used by the kernel. These should match the xocc -sp arguments provided.

Kernel Hangs due to AXI Violations

It is possible for the kernels to hang due to bad AXI transactions between the kernels and the memory controller. To debug these issues, the kernels must be instrumented.
  1. The SDAccel environment provides two options for instrumentation to be applied during XOCC linking (-l). Both of these add hardware to your implementation, and based on utilization it might be necessary to limit instrumentation.
    1. Add Lightweight AXI Protocol Checkers (lapc). These protocol checkers are added using the --dk option. The following syntax is used:
       --dk <[protocol|list_ports]<:compute_unit_name><:interface_name>>
      In general, the <interface_name> is optional. If not specified, all ports are expected to be analyzed. The protocol option is used to define the protocol checkers to be inserted. This option can accept a special keyword, all, for <compute_unit_name> and/or <interface_name>. The list_ports option generates a list of valid compute units and port combinations in the current design.
      Note: Multiple --dk option switches can be specified in a single command line to additively add interface monitoring capability.
    2. Adding SDx environment Performance Monitors (spm) enables the listing of detailed communication statistics (counters). Although this is most useful for performance analysis, it provides insight during debugging on pending port activities. The Performance Monitors are added using the profile_kernel option. The basic syntax for profile_kernel option is:
      --profile_kernel data:<krnl_name|all>:<cu_name|all>:<intrfc_name|all>:<counters|all>
      Three fields are required to determine the precise interface to which the performance monitor is applied. However, if resource use is not an issue, the keyword all enables you to apply the monitoring to all existing kernels, compute units, and interfaces with a single option. Otherwise, you can specify the kernel_name, cu_name, and interface_name explicitly to limit instrumentation.
      The last option, <counters|all>, allows you to restrict the information gathering to just counters for large designs, while all (default) includes the collection of actual trace information.
      Note: Multiple --profile_kernel option switches can be specified in a single command line to additively add performance monitoring capability.
      --profile_kernel data:kernel1:cu1:m_axi_gmem0 
      --profile_kernel data:kernel1:cu1:m_axi_gmem1 
      --profile_kernel data:kernel2:cu2:m_axi_gmem
  2. After the application is rebuilt, rerun the host application using the xclbin with the added SPM IP and LAPC IP.
  3. When the application hangs, you can use xbutil status to check for any errors or anomalies.
  4. Check the SPM output:
    • Run xbutil status --spm a couple of times to check whether any counters are moving. If they are moving, the kernels are active.
      TIP: Testing SPM output is also supported through GDB debugging using the command extension xstatus spm.
    • If the counters are stagnant, outstanding counts greater than zero might mean that some AXI transactions are hung.
  5. Check the LAPC output:
    • Run xbutil status --lapc to check if there are any AXI violations.
      TIP: Testing LAPC output is also supported through GDB debugging using the command extension xstatus lapc.
    • If there are any AXI violations, it implies that there are problems in the kernel implementation.

Host Application Hangs when Accessing Memory

Application hangs can also be caused by incomplete DMA transfers initiated from the host code. This does not necessarily mean that the host code is wrong; it might also be that the kernels have issued illegal transactions and locked up the AXI.
  1. If the platform has an AXI firewall, such as in the SDAccel platforms, it is likely to trip. The driver issues a SIGBUS error, kills the application, and resets the device. You can check this by running xbutil query. The following figure shows such an error in the firewall status:

    TIP: If the firewall has not tripped, the Linux tool, dmesg, can provide additional insight.
  2. When you know that the firewall has tripped, it is important to determine the cause of the DMA timeout. The issue could be an illegal DMA transfer or kernel misbehavior. However, a side effect of the AXI firewall tripping is that the health check functionality in the driver resets the board after killing the application; any information on the device that might help with debugging the root cause is lost. To debug this problem, you can disable the health check thread in the xclmgmt kernel module to capture the error. This uses common Linux kernel module tools in the following sequence:
    1. sudo modinfo xclmgmt: This command lists the current configuration of the module and indicates if the health_check parameter is on or off. It also returns the path to the xclmgmt module.
    2. sudo rmmod xclmgmt: This removes and therefore disables the xclmgmt kernel module.
    3. sudo insmod <path to module>/xclmgmt.ko health_check=0: This reinstalls the xclmgmt kernel module with the health check disabled.
      TIP: The path to this module is reported in the output of the call to modinfo.
  3. With the health check disabled, rerun the application. You can use the kernel instrumentation to isolate this issue as previously described.

Typical Errors Leading to Application Hangs

The user errors that typically create application hangs are listed below:

  • Read-before-write in 5.0+ shells causes a MIG (Memory Interface Generator) ECC (error correction code) error. This is typically a user error. For example, this error might occur when a kernel is expected to write 4KB of data to DDR but produces only 1KB, and you then try to transfer the full 4KB of data to the host. It can also happen if you supply a 1KB buffer to a kernel, but the kernel tries to read 4KB of data.
  • An ECC read-before-write error also occurs if no data has been written to a memory location since the last bitstream download (which results in MIG initialization), but a read request is made for that same memory location. ECC errors stall the affected MIG, because kernels are usually not able to handle this error. This can manifest in two different ways:
    1. The CU might hang or stall because it cannot handle this error while reading or writing to or from the affected MIG. The xbutil query shows that the CU is stuck in a BUSY state and is not making progress.
    2. The AXI Firewall might trip if a PCIe® DMA request is made to the affected MIG, because the DMA engine is unable to complete the request. AXI Firewall trips result in the Linux kernel driver killing all processes which have opened the device node with the SIGBUS signal. The xbutil query shows if an AXI Firewall has indeed tripped, and includes a timestamp.
    If the above hang does not occur, the host code might not read back the correct data. This incorrect data is typically 0s, and is located in the last part of the data. It is important to review the host code carefully. One common example is compression, where the size of the compressed data is not known up front, and an application might try to migrate more data to the host than was produced by the kernel.

Debugging with ChipScope

You can use the ChipScope debugging environment and the Vivado hardware manager to help you debug your host application and kernels quickly and more effectively. These tools enable a wide range of capabilities, from logic-level to system-level debug, while your kernel is running in hardware.
Note: Debugging on the kernel platform requires additional logic to be incorporated into the overall hardware model, which might have an impact on resource use and kernel performance.

Running XVC and HW Servers

The following steps are required to run the XVC (Xilinx Virtual Cable) and HW servers, host applications, and finally trigger and arm the debug cores in Vivado hardware manager.
  1. Add debug IP to the kernel.
  2. Instrument the host application to pause at an appropriate point in the host execution where you want to debug. See Debugging through the Host Application.
  3. Set up the environment for hardware debug. You can do this manually, or by using a script that automates this for you. The following steps are described in Manual Setup for Hardware Debug and Automated Setup for Hardware Debug:
    1. Run the required XVC and HW servers.
    2. Execute the host application and pause at the appropriate point in the host execution to enable setup of ILA triggers.
    3. Open Vivado hardware manager and connect to the XVC server.
    4. Set up ILA trigger conditions for the design.
    5. Continue with host application.
    6. Inspect results in the Vivado hardware manager.
    7. Rerun iteratively from step b (above) as required.

Adding Debug IP to RTL Kernels

IMPORTANT: This debug technique requires familiarity with the Vivado Design Suite, and RTL design.
You need to instantiate debug cores like the Integrated Logic Analyzer (ILA) and Virtual Input/Output (VIO) in your RTL kernel code to debug the kernel logic. From within the Vivado Design Suite, edit the RTL kernel to instantiate an ILA IP customization, or a VIO IP, into the RTL code, similar to using any other IP in Vivado IDE. Refer to the Vivado Design Suite User Guide: Programming and Debugging (UG908) to learn more about using the ILA or other debug cores in the RTL Insertion flow and to learn about using the HDL generate statement technique to enable/disable debug core generation.
TIP: The best time to add debug cores to your RTL kernel is when you create it. Refer to the Debugging section in the UltraFast Design Methodology Guide for the Vivado Design Suite (UG949) for more information.

You can also add the ILA debug core using a Tcl script from within an open Vivado project as shown in the following code example:

create_ip -name ila -vendor xilinx.com -library ip -version 6.2 -module_name ila_0
set_property -dict [list CONFIG.C_PROBE6_WIDTH {32} CONFIG.C_PROBE3_WIDTH {64} \
CONFIG.C_NUM_OF_PROBES {7} CONFIG.C_EN_STRG_QUAL {1} CONFIG.C_INPUT_PIPE_STAGES {2} \
CONFIG.C_ADV_TRIGGER {true} CONFIG.ALL_PROBE_SAME_MU_CNT {4} CONFIG.C_PROBE6_MU_CNT {4} \
CONFIG.C_PROBE5_MU_CNT {4} CONFIG.C_PROBE4_MU_CNT {4} CONFIG.C_PROBE3_MU_CNT {4} \
CONFIG.C_PROBE2_MU_CNT {4} CONFIG.C_PROBE1_MU_CNT {4} CONFIG.C_PROBE0_MU_CNT {4}] [get_ips ila_0]

The following is an example of an ILA debug core instantiated into the RTL kernel source file of the RTL Kernel Debug example design on GitHub. The ILA monitors the output of the combinatorial adder as specified in the src/hdl/krnl_vadd_rtl_int.sv file.

	// ILA monitoring combinatorial adder
	ila_0 i_ila_0 (
		.clk(ap_clk),              // input wire        clk
		.probe0(areset),           // input wire [0:0]  probe0  
		.probe1(rd_fifo_tvalid_n), // input wire [0:0]  probe1 
		.probe2(rd_fifo_tready),   // input wire [0:0]  probe2 
		.probe3(rd_fifo_tdata),    // input wire [63:0] probe3 
		.probe4(adder_tvalid),     // input wire [0:0]  probe4 
		.probe5(adder_tready_n),   // input wire [0:0]  probe5 
		.probe6(adder_tdata)       // input wire [31:0] probe6
	);

After the RTL kernel has been instrumented for debug with the appropriate debug cores, you can analyze the hardware in the Vivado hardware manager features as described in the previous topic.

Debugging through the Host Application

To debug the host application working with the kernel code running on the SDAccel platform, the application host code must be modified to ensure that you can set up the ILA trigger conditions after the kernel has been programmed into the device, but before starting the kernel.

Pausing a C++ Host Application

The following code example is from the src/host.cpp code from the RTL Kernel example on GitHub:

....
    std::string binaryFile = xcl::find_binary_file(device_name, "vadd");

    cl::Program::Binaries bins = xcl::import_binary_file(binaryFile);
    devices.resize(1);
    cl::Program program(context, devices, bins);
    cl::Kernel krnl_vadd(program, "krnl_vadd_rtl");

    wait_for_enter("\nPress ENTER to continue after setting up ILA trigger...");

    //Allocate Buffer in Global Memory
    std::vector<cl::Memory> inBufVec, outBufVec;
    cl::Buffer buffer_r1(context, CL_MEM_USE_HOST_PTR | CL_MEM_READ_ONLY,
            vector_size_bytes, source_input1.data());
    ...

    //Copy input data to device global memory
    q.enqueueMigrateMemObjects(inBufVec, 0/* 0 means from host*/);

    //Set the Kernel Arguments
    ...

    //Launch the Kernel
    q.enqueueTask(krnl_vadd);
The call to the wait_for_enter function pauses the host application to give you time to set up the required ILA triggers and prepare to capture data from the kernel. After the Vivado hardware manager is set up and configured properly, you can press Enter to continue running the host application.

Pausing the Host Application Using GDB

Instead of making changes to the host application to pause before a kernel execution, you can run a GDB session from the SDx IDE. You can then set a breakpoint prior to the kernel execution in the host application. When the breakpoint is reached, you can set up the debug ILA triggers in Vivado hardware manager, arm the trigger, and then resume the kernel execution in GDB.

Automated Setup for Hardware Debug

Note: A full SDx environment install is required to complete the following task. See the SDx Environments Release Notes, Installation, and Licensing Guide for more information about installation.
  1. Set up your SDx environment by sourcing the appropriate settings64.sh/.csh file found in your SDx install area.
  2. Start xvc_pcie and hw_server apps using the sdx_debug_hw script.
    sdx_debug_hw --xvc_pcie /dev/xvc_pub.m1025 --hw_server
    launching xvc_pcie...
    xvc_pcie -d /dev/xvc_pub.m1025 -s TCP::10200
    launching hw_server...
    hw_server -sTCP::3121
    Note: The /dev/xvc_* character device will differ depending on the platform. In this example, the character device is /dev/xvc_pub.m1025, though on your system it is likely to differ.
  3. In the SDx IDE, modify the host code to include a pause statement after the kernel has been created/downloaded and before the kernel execution is started, then recompile the host program.
    • For C++ host code, add a pause after the creation of the cl::Kernel object. The following snippet is from the Vector Add template design C++ host code:

    • For C-language host code, add a pause after the clCreateKernel() function call:

  4. Run your modified host program.
    vadd_test.exe ./binary_container_1.xclbin
    Loading: './binary_container_1.xclbin'
    Pausing to allow you to arm ILA trigger. Hit enter here to resume host program...
  5. Launch Vivado Design Suite using the sdx_debug_hw script located in your SDAccel installation directory.
    > sdx_debug_hw --vivado --host xcoltlab40 --ltx_file ../workspace/vadd_test/System/pfm_top_wrapper.ltx
    

    The command window displays the following:

    launching vivado... ['vivado', '-source', 'sdx_hw_debug.tcl', '-tclargs', '/tmp/sdx_tmp/project_1/project_1.xpr', 'workspace/vadd_test/System/pfm_top_wrapper.ltx', 'xcoltlab40', '10200', '3121']
     
    ****** Vivado v2018.2 (64-bit)
      **** SW Build 2245749 on Wed May 30 12:36:19 MDT 2018
      **** IP Build 2245576 on Wed May 30 15:12:50 MDT 2018
        ** Copyright 1986-2018 Xilinx, Inc. All Rights Reserved.
     
    start_gui
  6. In Vivado Design Suite, run the ILA trigger.

  7. Press Enter to un-pause the host program.
    vadd_test.exe ./binary_container_1.xclbin
    Loading: './binary_container_1.xclbin'
    Pausing to allow you to arm ILA trigger.  Hit enter here to resume host program...
     
    TEST PASSED
  8. In the Vivado Design Suite, see the interface transactions on the kernel compute unit slave control interface in the Waveform view.

Manual Setup for Hardware Debug

Manually Starting Debug Servers
Note: The following steps are also applicable when using Nimbix and other cloud platforms.
There are two steps required to start the debug servers prior to debugging the design in Vivado hardware manager.
  1. Source the SDx environment setup script, settings64.csh or settings64.sh, and launch the xvc_pcie server. The filename passed to xvc_pcie must match the character driver file installed with the kernel device driver.
    >xvc_pcie -d /dev/xvc_pub.m1025
    Note: The xvc_pcie server has many useful command line options. You can issue xvc_pcie -help to obtain the full list of available options.
  2. Start the XVC server on port 10201 and the hw_server on port 3121.
    >hw_server -e "set auto-open-servers xilinx-xvc:localhost:10201" -e "set always-open-jtag 1"
Starting Debug Servers on an Amazon F1 Instance

Instructions to start the debug servers on an Amazon F1 instance can be found here: https://github.com/aws/aws-fpga/blob/master/hdk/docs/Virtual_JTAG_XVC.md

Debugging Designs using Vivado Hardware Manager

Traditionally, a physical JTAG connection is used to debug FPGAs. SDAccel platforms instead leverage the Xilinx Virtual Cable (XVC) protocol for a debug flow that works in the cloud. To take advantage of this capability, SDAccel supports running an XVC server, which allows the Vivado Design Suite to connect to a local or remote target FPGA for debug using standard Xilinx debug cores, such as the Integrated Logic Analyzer (ILA) and Virtual Input/Output (VIO) IP.

The Vivado hardware manager (Vivado Design Suite or Vivado Lab Edition) can run on the target instance or remotely on a different host. The TCP port on which the XVC server is listening must be accessible to the host running the Vivado hardware manager. To connect the Vivado hardware manager to the XVC server on the target, follow these steps on the machine hosting the Vivado tools:

  1. Launch the Vivado Lab Edition, or the full Vivado Design Suite.
  2. Select Open Hardware Manager from the Tasks menu, as shown in the following figure.

  3. Connect to the Vivado tools hw_server, specifying a local or remote connection, and the Host name and Port, as shown below.

  4. Connect to the target instance Virtual JTAG XVC server.

  5. Select the debug bridge instance from the Hardware window of the Vivado hardware manager.
  6. In the Hardware Device Properties window, select the appropriate probes file for your design by clicking the icon next to the Probes file entry, selecting the file, and clicking OK. This refreshes the hardware device, which should now show the debug cores present in your design.
    TIP: The probes file (.ltx) is written out during the implementation of the kernel by the Vivado tool, if the kernel has debug cores as specified in Hardware Debugging Using ChipScope.
  7. The Vivado hardware manager can now be used to debug the kernels running on the SDAccel platform. Refer to the Vivado Design Suite User Guide: Programming and Debugging (UG908) for more information on working with the Vivado hardware manager.
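The GUI steps above can also be scripted. The following sketch writes a Tcl file that performs the same connection in batch mode and runs it when vivado is on the PATH. The host names, ports, and probes file path are placeholders that must be adapted to your setup, and the first hardware device is assumed to be the debug bridge (confirm with get_hw_devices).

```shell
#!/bin/sh
# Hypothetical batch-mode equivalent of the hardware manager GUI steps.
# Host names, ports, and the probes file path are placeholders.
cat > connect_xvc.tcl <<'EOF'
open_hw                                  ;# open_hw_manager in newer releases
connect_hw_server -url localhost:3121    ;# step 3: hw_server host and port
open_hw_target -xvc_url localhost:10201  ;# step 4: XVC server host and port
# Steps 5 and 6: select the debug bridge device and load the probes file.
set dev [lindex [get_hw_devices] 0]
set_property PROBES.FILE {./pfm_top_wrapper.ltx} $dev
refresh_hw_device $dev                   ;# debug cores should now appear
EOF

if command -v vivado >/dev/null 2>&1; then
    vivado -mode batch -source connect_xvc.tcl
else
    echo "vivado not on PATH; wrote connect_xvc.tcl for later use" >&2
fi
```

This is a sketch under the assumptions stated above, not a drop-in script; in particular, the .ltx file name comes from your own kernel implementation run.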

Debugging a MicroBlaze Processor (RTL Kernels Only)

Note: This technique requires familiarity with the Vivado Design Suite, RTL design, the MicroBlaze™ processor, and standard MicroBlaze debugging techniques.

In RTL kernel block designs, a MicroBlaze processor is included under the control hierarchy. To debug the software applications running on the MicroBlaze processor, a MicroBlaze Debug Module (MDM) can optionally be included in the RTL kernel block design, allowing standard MicroBlaze debugging techniques to take place over XVC. To enable MicroBlaze debugging, both of the following must be true:

  • The SDAccel environment platform must support MicroBlaze debugging over XVC.
  • The RTL kernel must contain a MicroBlaze processor and MicroBlaze Debug Module (MDM).

The following platforms support hardware debug of a MicroBlaze processor:

  • xilinx_u200_xdma_201830_1
  • xilinx_u250_xdma_201830_1
  • xilinx_vcu1525_xdma_201830_1

MicroBlaze debugging can be enabled in the RTL Kernel Wizard user interface. When generating the RTL kernel, if the platform supports MicroBlaze debug, a checkbox appears in the wizard allowing the feature to be enabled. When this box is checked, the optional MicroBlaze Debug Module (MDM) is included in the control block of the RTL kernel. The following steps detail how to enable MicroBlaze debug during kernel generation.

  1. Launch the RTL Kernel Wizard by clicking Xilinx > RTL Kernel Wizard. When the RTL Kernel Wizard launches, click Next.
  2. On the General Settings page, select Block Design as the kernel type, and check the box to Enable MicroBlaze Debug, as seen in the following figure:

Connecting to a MicroBlaze Processor in an RTL Kernel over XVC

When the RTL kernel has been generated with MicroBlaze debug and an .xclbin binary has been created, you can connect to the MicroBlaze processor embedded in the kernel while it is running in hardware to view registers and apply standard MicroBlaze debugging techniques.
  1. Set up your environment by sourcing the appropriate settings64.sh/.csh file found in your install area.
  2. Start the xvc_pcie and hw_server apps using the sdx_debug_hw script, as shown in the following example:
    sdx_debug_hw --xvc_pcie /dev/xvc_pub.m1025 --hw_server
    launching xvc_pcie...
    xvc_pcie -d /dev/xvc_pub.m1025 -s TCP::10200
    launching hw_server...
    hw_server -sTCP::3121
    Note: The /dev/xvc_* character device differs depending on the platform. In this example, the character device is /dev/xvc_pub.m1025, but it is likely to differ on your system.
  3. Launch the Xilinx Software Command Line Tool (XSCT):
    $ xsct
    
    ****** Xilinx Software Commandline Tool (XSCT) v2018.3
      **** SW Build 2373407 on Thu Oct 25 21:12:35 MDT 2018
        ** Copyright 1986-2018 Xilinx, Inc. All Rights Reserved.
    
    
    xsct% 
    
  4. Connect to the hardware server and XVC server to list the available targets:
    xsct% connect -url tcp:localhost:3121 -xvc-url tcp:localhost:10200
    tcfchan#0
    xsct% targets
      1  debug_bridge
         2  00000000
         3  Legacy Debug Hub
         4  MicroBlaze Debug Module at USER1.1.2.2
            5  MicroBlaze #0 (Running)
    xsct%
    
    Note: While this example uses both a local hardware server and a local XVC server, this is not a requirement. To use XSCT on a remote machine, replace localhost in the above example with the IP address or host name of the host on which sdx_debug_hw is running.
  5. The MicroBlaze processor is listed as target number 5. Connect to it by issuing the targets -set command. Listing the targets again shows that the MicroBlaze processor has been selected as the active target:
    xsct% targets -set 5
    xsct% targets
      1  debug_bridge
         2  00000000
         3  Legacy Debug Hub
         4  MicroBlaze Debug Module at USER1.1.2.2
            5* MicroBlaze #0 (Running)
  6. At this point, standard MicroBlaze debugging techniques can be applied as described in the MicroBlaze Processor Reference Guide (UG984). For example, issue the rrd command to list the contents of the MicroBlaze registers:
    xsct% rrd
     r0: 0000000000000000   r1: 00000000000115e8   r2: 0000000000010960
     r3: 0000000000000006   r4: 0000000000000006   r5: 0000000000000000
     r6: 0000000000000000   r7: 0000000000000000   r8: 0000000000000000
     r9: 0000000000000000  r10: 0000000000000000  r11: 0000000000000000
    r12: 0000000000000000  r13: 0000000000010a60  r14: 0000000000000000
    r15: 0000000000010348  r16: 0000000000000000  r17: 0000000000000000
    r18: 00000000ffffffff  r19: 00000000000115e8  r20: 0000000000000000
    r21: 0000000000000000  r22: 0000000000000000  r23: 0000000000000000
    r24: 0000000000000000  r25: 0000000000000000  r26: 0000000000000000
    r27: 0000000000000000  r28: 0000000000000000  r29: 0000000000000000
    r30: 0000000000000000  r31: 0000000000000000   pc: 00000000000106bc
    msr:         00000010  ear: 0000000000000010  esr:         00000010
    btr: 0000000000000010  edr:         00000010  dcr:         00000009
    dsr:         21010000
    
    xsct%
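The interactive XSCT session above can also be driven non-interactively. The sketch below writes the same command sequence to a Tcl file and runs it when xsct is available; the target index (5 here) and the server locations are taken from the example above and must match what the targets command reports on your system.

```shell
#!/bin/sh
# Hypothetical non-interactive version of the XSCT session above.
cat > mb_regs.tcl <<'EOF'
# Step 4: connect to the hardware server and XVC server.
connect -url tcp:localhost:3121 -xvc-url tcp:localhost:10200
# Step 5: select the MicroBlaze target (index taken from "targets" output).
targets -set 5
# Step 6: dump the MicroBlaze register contents.
rrd
EOF

if command -v xsct >/dev/null 2>&1; then
    xsct mb_regs.tcl
else
    echo "xsct not on PATH; wrote mb_regs.tcl for later use" >&2
fi
```

Running a prepared Tcl file this way is convenient when the same register dump is needed repeatedly, for example after each kernel run.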