Running Emulation
Development of a user application and hardware kernels targeting an FPGA requires a phased development approach. Because FPGAs, Versal™ ACAP, and Zynq® UltraScale+™ MPSoC are programmable devices, building the device binary for hardware takes some time. To enable quicker iterations without having to go through the full hardware compilation flow, the Vitis™ tool provides emulation targets on which the application and kernels can be run. Compiling for emulation targets is significantly faster than compiling for the actual hardware. Additionally, emulation targets provide full visibility into the application or accelerator, making it easier to perform debugging. Once your design passes in emulation, you can compile and run the application on the hardware platform in the later stages of development.
The Vitis tool provides two emulation targets:
- Software emulation (sw_emu)
- The software emulation build compiles and links quickly, and the host program runs either natively on an x86 processor or in the QEMU emulation environment. The PL kernels are also compiled natively and run on the host machine. This build target lets you quickly iterate on both the host code and kernel logic.
- Hardware emulation (hw_emu)
- The host program runs as in sw_emu, natively on x86 or in QEMU, but the kernel code is compiled into an RTL behavioral model that runs in the Vivado® simulator or other supported third-party simulators. This build and run loop takes longer but provides a cycle-accurate view of kernel logic.
Compiling for either emulation target is seamlessly integrated into the Vitis command-line and IDE flows. You can compile your host and kernel source code for either emulation target without making any changes to the source code. You also do not need to compile your host code differently for emulation; the same host executable or PS application ELF binary can be used. Emulation targets support most features, including XRT APIs, buffer transfers, platform memory SP tags, and kernel-to-kernel connections.
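For example, only the -t (target) option on the v++ command line changes between build targets. The following sketch shows compile and link steps for both emulation targets; the kernel name vadd and the platform placeholder are assumptions for illustration:
v++ -c -t sw_emu --platform <platform> -k vadd -o vadd.sw_emu.xo vadd.cpp
v++ -l -t sw_emu --platform <platform> -o vadd.sw_emu.xclbin vadd.sw_emu.xo
v++ -c -t hw_emu --platform <platform> -k vadd -o vadd.hw_emu.xo vadd.cpp
v++ -l -t hw_emu --platform <platform> -o vadd.hw_emu.xclbin vadd.hw_emu.xo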
Running Emulation Targets
The emulation targets have their own target-specific drivers, which are loaded by XRT. Thus, the same CPU binary can be run as-is, without recompiling, by simply changing the target mode at runtime. Based on the value of the XCL_EMULATION_MODE environment variable, XRT loads the target-specific driver and makes the application interface with an emulation model of the hardware. The allowed values of XCL_EMULATION_MODE are sw_emu and hw_emu. If XCL_EMULATION_MODE is not set, XRT loads the hardware driver.
IMPORTANT: You must set XCL_EMULATION_MODE when running emulation.
You can also use the xrt.ini file to configure various options applicable to emulation. There is an [Emulation] specific section in xrt.ini, as described in xrt.ini File.
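For example, a minimal xrt.ini sketch that enables profiling and trace and runs the hardware emulation simulator in batch mode might look like the following; the exact options you need depend on your debug and profiling goals, as described in xrt.ini File:
[Debug]
profile=true
timeline_trace=true
[Emulation]
debug_mode=batch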
Data Center vs. Embedded Platforms
Emulation is supported for both data center and embedded platforms. For data center platforms, the host application is compiled for the x86 server, while the device is modeled as a separate x86 process emulating the hardware. The user host code and the device model process communicate using RPC calls. For embedded platforms, where the CPU code runs on the embedded Arm processor, the emulation flows use QEMU (Quick Emulator) to mimic the Arm-based PS subsystem. In QEMU, you can boot embedded Linux and run Arm binaries on the emulation targets.
For running software emulation (sw_emu) and hardware emulation (hw_emu) of a data center application, you must compile an emulation model of the accelerator card using the emconfigutil command and set the XCL_EMULATION_MODE environment variable prior to launching your application. The steps are detailed in Running Emulation on Data Center Accelerator Cards.
For running sw_emu or hw_emu of an embedded application, you must launch the QEMU emulation environment on the x86 processor to model the execution environment of the Arm processor. This requires the use of the launch_emulator.py command, or shell scripts generated during the build process. The details of this flow are explained in Running Emulation on an Embedded Processor Platform.
QEMU
QEMU stands for Quick Emulator. It is a generic and open source machine emulator. Xilinx provides a customized QEMU model that mimics the Arm based processing system present on Versal ACAP, Zynq® UltraScale+™ MPSoC, and Zynq-7000 SoC devices. The QEMU model provides the ability to execute CPU instructions at almost real time without the need for real hardware. For more information, refer to the Xilinx Quick Emulator User Guide: QEMU.
For hardware emulation, the Vitis emulation targets use QEMU and co-simulate it with an RTL and SystemC-based model for the rest of the design to provide a complete execution model of the entire platform. You can boot an embedded Linux kernel on it and run the XRT-based accelerator application. Because QEMU can execute the Arm instructions, you can take the Arm binaries and run them in emulation flows as-is without the need to recompile. QEMU also allows you to debug your application using GDB and TCF-based target connections from Xilinx System Debugger (XSDB).
The Vitis emulation flow also uses QEMU to emulate the MicroBlaze™ processor to model the platform management modules (PLM and PMU) of the devices. On Versal devices, the PLM firmware is used to load the PDI to program sections of the PS and AI Engine model.
To ensure that the QEMU configuration matches the platform, additional files must be provided as part of the sw directory of Vitis platforms. Two common files, qemu_args.txt and pmc_args.txt, contain the command line arguments to be used when launching QEMU. When you create a custom platform, these two files are automatically added to your platform with default contents. You can review and edit the files as needed to model your custom platform. Refer to a Xilinx embedded platform for an example.
Because QEMU is a generic model, it uses a Linux device tree style DTB formatted file to enable and configure various hardware modules. A default QEMU hardware DTB file is shipped with the Vitis tools in the <vitis_installation>/data/emulation/dtbs folder. However, if your platform requires a different QEMU DTB, you can package it as part of your platform.
Running Emulation on Data Center Accelerator Cards
- Set the desired runtime settings in the xrt.ini file. This step is optional.
  As described in xrt.ini File, the file specifies various parameters to control debugging, profiling, and message logging in XRT when running the host application and kernel execution. This enables the runtime to capture debugging and profile data as the application is running. The [Emulation] group in the xrt.ini file provides features that affect your emulation run.
  TIP: Be sure to use the v++ -g option when compiling your kernel code for emulation mode.
- Create an emconfig.json file from the target platform as described in emconfigutil Utility. This is required for running hardware or software emulation.
  The emulation configuration file, emconfig.json, is generated from the specified platform using the emconfigutil command, and provides information used by the XRT library during emulation. The following example creates the emconfig.json file for the specified target platform:
  emconfigutil --platform xilinx_u200_xdma_201830_2
  In emulation mode, the runtime looks for the emconfig.json file in the same directory as the host executable, and reads in the target configuration for the emulation runs.
  TIP: It is mandatory to have an up-to-date JSON file for running emulation on your target platform.
- Set the XCL_EMULATION_MODE environment variable to sw_emu (software emulation) or hw_emu (hardware emulation) as appropriate. This changes the application execution to emulation mode.
  Use the following syntax to set the environment variable for C shell (csh):
  setenv XCL_EMULATION_MODE sw_emu
  Bash shell:
  export XCL_EMULATION_MODE=sw_emu
  IMPORTANT: The emulation targets will not run if the XCL_EMULATION_MODE environment variable is not properly set.
- Run the application.
  With the runtime initialization file (xrt.ini), the emulation configuration file (emconfig.json), and the XCL_EMULATION_MODE environment variable set, run the host executable with the desired command line arguments. For example:
  ./host.exe kernel.xclbin
  IMPORTANT: The INI and JSON files must be in the same directory as the executable.
  TIP: This command line assumes that the host program is written to take the name of the xclbin file as an argument, as most Vitis examples and tutorials do. However, your application may have the name of the xclbin file hard-coded into the host program, or may require a different approach to running the application.
  A consolidated sketch of these steps is shown after this list.
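The following sketch puts the steps above together for a software emulation run on a data center platform; the platform name matches the emconfigutil example, while the executable and xclbin names are assumptions for illustration:
emconfigutil --platform xilinx_u200_xdma_201830_2
export XCL_EMULATION_MODE=sw_emu
./host.exe vadd.sw_emu.xclbin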
Running Emulation on an Embedded Processor Platform
- Set the desired runtime settings in the xrt.ini file.
  As described in xrt.ini File, the file specifies various parameters to control debugging, profiling, and message logging in XRT when running the host application and kernel execution. As described in Enabling Profiling in Your Application, this enables the runtime to capture debugging and profile data as your application is running.
  The xrt.ini file, as well as any additional files required for running the application, must be included in the output files as explained in Packaging for Embedded Platforms.
  TIP: Be sure to use the v++ -g option when compiling your kernel code for emulation mode.
- Launch the QEMU emulation environment by running the launch_sw_emu.sh or launch_hw_emu.sh script. For example:
  launch_sw_emu.sh -forward-port 1440 22
  The script is created in the emulation directory during the packaging process, and uses the launch_emulator.py command to set up and launch QEMU. When launching the emulation script you can also specify options for the launch_emulator.py command, such as the -forward-port option to forward the QEMU port to an open port on the local system. This is needed when copying files from QEMU, as discussed in Step 5 below. Refer to launch_emulator Utility for details of the command.
  Another example is launch_hw_emu.sh -enable-debug, which opens additional XTERMs for the QEMU and PL processes so you can observe live transcripts of command execution to aid in debugging the application. This is not enabled by default, but can be useful when needed for debug.
- Mount and configure the QEMU shell with the required settings.
  The Xilinx embedded base platforms have rootfs on a separate EXT4 partition on the SD card. After booting Linux, this partition needs to be mounted. If you are running emulation manually, run the following commands from the QEMU shell:
  mount /dev/mmcblk0p1 /mnt
  cd /mnt
  export LD_LIBRARY_PATH=/mnt:/tmp:$LD_LIBRARY_PATH
  export XCL_EMULATION_MODE=hw_emu
  export XILINX_XRT=/usr
  export XILINX_VITIS=/mnt
  TIP: You can set the XCL_EMULATION_MODE environment variable to sw_emu for software emulation, or hw_emu for hardware emulation. This configures the host application to run in emulation mode.
- Run the application from within the QEMU shell.
  With the runtime initialization file (xrt.ini) and the XCL_EMULATION_MODE environment variable set, run the host executable with the command line required by the host application. For example:
  ./host.elf kernel.xclbin
  TIP: This command line assumes that the host program is written to take the name of the xclbin file as an argument, as most Vitis examples and tutorials do. However, your application can have the name of the xclbin file hard-coded into the host program, or can require a different approach to running the application.
- After the application run has completed, you might have files that were produced by the runtime, such as opencl_summary.csv, opencl_trace.csv, and xclbin.run_summary. These files can be found in the /mnt folder inside the QEMU environment. However, to view these files you must copy them from the QEMU Linux system back to your local system. The files can be copied using the scp command as follows:
  scp -P 1440 root@<host-ip-address>:/mnt/<file> <dest_path>
  Where:
  - 1440 is the QEMU port to connect to.
  - root@<host-ip-address> is the root login for the PetaLinux running under QEMU at the specified IP address. The default root password is "root".
  - /mnt/<file> is the path and file name of the file you want to copy from the QEMU environment.
  - <dest_path> specifies the path and file name to copy the file to on the local system.
  For example:
  scp -P 1440 root@172.55.12.26:/mnt/xclbin.run_summary .
- When your application has completed emulation and you have copied any needed files, press Ctrl + a, then x to terminate the QEMU shell and return to the Linux shell.
  Note: If you have trouble terminating the QEMU environment, you can kill the processes it launches to run the environment. The tool reports the process IDs (pids) at the start of the transcript, or you can specify the -pid-file option to capture the pids when launching emulation (see the sketch below).
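For example, a minimal sketch of capturing the process IDs at launch and cleaning up later, assuming -pid-file takes the name of the file to write; the file name emu_pids.txt is an assumption for illustration:
./launch_hw_emu.sh -pid-file emu_pids.txt -forward-port 1440 22
# ... later, if Ctrl + a, x does not terminate QEMU cleanly:
kill $(cat emu_pids.txt)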
Speed and Accuracy of Hardware Emulation
Hardware emulation uses a mix of SystemC and RTL co-simulation to provide a balance between accuracy and speed of simulation. The SystemC models are a mix of purely functional models and performance-approximate models. Hardware emulation does not mimic the hardware with 100% accuracy, so you should expect some differences in behavior between running emulation and executing your application on hardware. This can lead to significant differences in application performance, and sometimes differences in functionality can also be observed.
Functional differences with hardware typically point to a race condition or some unpredictable behavior in your design. So, an issue seen in hardware might not always be reproducible in hardware emulation, though most behavior related to interactions between the host and the accelerator, or the accelerator and the memory, is reproducible in hardware emulation. This makes hardware emulation an excellent tool to debug issues with your accelerator prior to running on hardware.
The following table lists models that are used to mimic the hardware platform and their accuracy levels.
| Hardware Functionality | Description |
|---|---|
| Host to Card PCIe® Connection and DMA (XDMA, SlaveBridge) | For data center platforms, the connection to the x86 host server over PCIe is a purely functional model and does not include any performance modeling. Thus, any issues related to PCIe bandwidth are not reflected in hardware emulation runs. |
| UltraScale™ DDR Memory, SmartConnect | The SystemC models for the DDR memory controller, AXI SmartConnect, and other data path IP are usually throughput approximate. They typically do not model the exact latency of the hardware IP. The models can be used to gauge a relative performance trend as you modify your application or the accelerator kernel. |
| AI Engine | The AI Engine SystemC model is cycle approximate, though it is not intended to be 100% cycle accurate. A common model is used between the AI Engine simulator and hardware emulation, enabling a reasonable comparison between the two stages. |
| Versal NoC and DDR Models | The Versal NoC and DDR SystemC models are cycle approximate. |
| Arm Processing Subsystem (PS, CIPS) | The Arm PS is modeled using QEMU, which is a purely functional execution model. For more information, see QEMU. |
| User Kernel (accelerator) | Hardware emulation uses the RTL for the user accelerator, so the accelerator behavior by itself is 100% accurate. However, the accelerator is surrounded by other approximate models. |
| Other I/O Models | For hardware emulation, generic Python or C-based traffic generators can be interfaced with the emulation process. You can generate abstract traffic at the AXI protocol level to mimic the I/O in your design. Because these models are abstract, issues observed on the specific hardware board might not be reproduced in hardware emulation. |
Because hardware emulation uses RTL co-simulation as its execution model, the speed of execution is orders of magnitude slower compared to real hardware. Xilinx recommends using small data buffers. For example, if you have a configurable vector addition and in hardware you perform a 1024-element vadd, in emulation you might restrict it to 16 elements. This enables you to test your application with the accelerator while still completing execution in a reasonable time.
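For example, a common pattern is to have the host code select a smaller problem size when an emulation target is detected. The following minimal sketch keys off the same XCL_EMULATION_MODE environment variable that XRT reads; the function name and the sizes are illustrative, not part of any Xilinx API:
#include <cstddef>
#include <cstdlib>

// Pick a small vector size for emulation runs, the full size for hardware.
static std::size_t choose_data_size() {
    const char* mode = std::getenv("XCL_EMULATION_MODE");
    if (mode != nullptr) {
        return 16;   // sw_emu or hw_emu: keep the run short
    }
    return 1024;     // real hardware
}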
Working with Simulators in Hardware Emulation
Simulator Support
The Vitis tool uses the Vivado logic simulator (xsim) as the default simulator for all platforms, including Alveo Data Center accelerator cards, and Versal and Zynq UltraScale+ MPSoC embedded platforms. However, for Versal embedded platforms, such as xilinx_vck190_base or similar custom platforms, the Vitis tool also supports the use of third-party simulators for hardware emulation: Mentor Graphics Questa Advanced Simulator, Xcelium, and VCS. The specific versions of the supported simulators are the same as the versions supported by the Vivado Design Suite.
Enabling a third-party simulator requires some additional configuration options to be implemented during generation of the device binary (.xclbin) and supporting Tcl scripts. The specific requirements for each simulator are discussed below. Also note that you should run the Vivado setup for third-party simulators before using those simulators in Vitis. Specifically, you must pre-compile the simulation models using the compile_simlib Tcl command. For details on third-party simulator setup, see the Vivado Design Suite User Guide: Logic Simulation (UG900).
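For example, a sketch of pre-compiling the Questa libraries from the Vivado Tcl console; the installation and output paths shown are assumptions and should match the directories referenced in the Questa configuration file below:
compile_simlib -simulator questa -simulator_exec_path {/tools/gensys/questa/2020.4/bin} \
  -directory {<install_dir>/clibs/questa/2020.4/lin64/lib} -family all -language all -library all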
- Questa
  Add the following advanced parameters and Vivado properties to a configuration file for use during linking:
  ## Final set of additional options required for running simulation using Questa Simulator
  [advanced]
  param=hw_emu.simulator=QUESTA
  [vivado]
  prop=project.__CURRENT__.simulator.questa_install_dir=/tools/gensys/questa/2020.4/bin/
  prop=project.__CURRENT__.compxlib.questa_compiled_library_dir=<install_dir>/clibs/questa/2020.4/lin64/lib/
  prop=fileset.sim_1.questa.compile.sccom.cores={4}
  After generating the configuration file, you can use it in the v++ command line as follows:
  v++ -link --config questa_sim.cfg
- Xcelium
  Add the following advanced parameters and Vivado properties to a configuration file for use during linking:
  ## Final set of additional options required for running simulation using Xcelium Simulator
  [advanced]
  param=hw_emu.simulator=XCELIUM
  [vivado]
  prop=project.__CURRENT__.simulator.xcelium_install_dir=/tools/dist/xlm/20.09.006/tools.lnx86/xcelium/bin/
  prop=project.__CURRENT__.compxlib.xcelium_compiled_library_dir=/clibs/xcelium/20.09.006/lin64/lib/
  prop=fileset.sim_1.xcelium.elaborate.xmelab.more_options={-timescale 1ns/1ps}
  After generating the configuration file, you can use it in the v++ command line as follows:
  v++ -link --config xcelium.cfg
- VCS
  Add the following advanced parameters and Vivado properties to a configuration file for use during linking:
  ## Final set of additional options required for running simulation using VCS Simulator
  [advanced]
  param=hw_emu.simulator=VCS
  [vivado]
  prop=project.__CURRENT__.simulator.vcs_install_dir=/tools/gensys/vcs/R-2020.12/bin/
  prop=project.__CURRENT__.compxlib.vcs_compiled_library_dir=/clibs/vcs/R-2020.12/lin64/lib/
  prop=project.__CURRENT__.simulator.vcs_gcc_install_dir=/tools/installs/synopsys/vg_gnu/2019.06/amd64/gcc-6.2.0_64/bin
  After generating the configuration file, you can use it in the v++ command line as follows:
  v++ -link --config vcs_sim.cfg
You can use the -user-pre-sim-script and -user-post-sim-script options of the launch_emulator.py command to specify Tcl scripts to run before the start of simulation, or after simulation completes. As an example, in these scripts you can use the $cwd command to get the run directory of the simulator and copy any files needed prior to simulation, or copy any output files generated at the end of simulation.
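For example, a minimal post-simulation script sketch that copies a result file out of the simulator run directory; here the run directory is queried with the Tcl pwd command, and the file and destination names are hypothetical:
# post_sim.tcl: copy an output file from the simulator run directory
set run_dir [pwd]
file copy -force [file join $run_dir results.csv] /path/to/results/
It could then be passed to the emulation script as launch_hw_emu.sh -user-post-sim-script post_sim.tcl.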
To enable hardware emulation, you must set up the environment for simulation in the Vivado Design Suite. A key step in this setup is pre-compiling the RTL and SystemC models for use with the simulator. To do this, you must run the compile_simlib command in the Vivado tool. For more information on pre-compilation of simulation models, refer to the Vivado Design Suite User Guide: Logic Simulation (UG900).
When preparing your Versal platform for simulation, the Vivado tool generates a simulation wrapper which must be instantiated in your simulation test bench. If the topmost design module is <top>, then when launch_simulation is called in the Vivado tool, it generates a <top>_sim_wrapper module, and also generates xlnoc.bd. These files are generated as simulation-only sources and are overwritten any time launch_simulation is called in the Vivado tool. Platform developers need to instantiate this <top>_sim_wrapper module in the test bench, not their own <top> module.
Using the Simulator Waveform Viewer
Hardware emulation uses RTL and SystemC models for execution. Application and HLS-based kernel developers generally do not need to be aware of the hardware-level details, and the Vitis analyzer provides sufficient detail about the hardware execution model. However, advanced users who are familiar with hardware signals and protocols can launch hardware emulation with the simulator waveform running, as described in Waveform View and Live Waveform Viewer.
By default, when running v++ --link -t hw_emu, the tool compiles the simulation models in optimized mode. However, when you also specify the -g switch, you enable the hardware emulation models to be compiled in debug mode. During the application runtime, use the -g switch with the launch_hw_emu.sh command to run the simulator interactively in GUI mode with waveforms displayed. By default, the hardware emulation flow adds common signals of interest to the waveform window. However, you can pause the simulator to add signals of interest and resume simulation.
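For example, the -g switch applies in both places, at link time and at launch time; the file names are illustrative:
v++ --link -t hw_emu -g --platform <platform> -o vadd.hw_emu.xclbin vadd.xo
./launch_hw_emu.sh -g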
AXI Transactions Display in XSIM Waveform
To display AXI transactions in the waveform, add the AXI interfaces of interest using the following command in xsim:
add_wave <HDL_objects>
Using the add_wave command, you can specify full or relative paths to HDL objects. For additional details on how to interpret the TLM waveform and how to add interfaces in the GUI, see the Vivado Design Suite User Guide: Logic Simulation (UG900).
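For example, assuming a hypothetical emulation hierarchy, you could add a signal of a kernel's AXI4 memory mapped interface to the waveform window; the path is illustrative only, so use the actual hierarchy shown in the simulator's scope browser:
add_wave /emu_wrapper/emu_i/vadd_1/inst/m_axi_gmem_AWADDR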
Working with SystemC Models
SystemC models in the Vitis application acceleration development flow allow you to quickly model an RTL algorithm for rapid analysis in software and hardware emulation. Using this approach, you can model portions of your system while the RTL kernel is still in development, so you can move forward with system-level analysis.
The SystemC model feature supports all the XRT-managed kernel execution models using ap_ctrl_hs and ap_ctrl_chain. It also supports modeling AXI4 memory mapped interfaces (m_axi) and AXI4-Stream interfaces (axis), as well as register reads and writes of the s_axilite interface.
You can model your kernel code as a SystemC TLM model, provide interfaces to other kernels and the host application, and use it during emulation. You can create a Xilinx object file (XO) to link the SystemC model with other kernels in your xclbin. The sections that follow discuss the creation of SystemC models, the use of the create_sc_xo command to create the XO, and generating the xclbin using the v++ command.
Coding a SystemC Model
The process for defining a SystemC model uses the following steps:
- Include the header files "xtlm_ap_ctrl.h" and "xtlm.h".
- Derive your kernel from a predefined class based on the supported kernel types: ap_ctrl_chain, ap_ctrl_hs, etc.
- Declare and define the AXI interfaces used on your kernel.
- Add the required kernel arguments with the correct address offset and size.
- Write the kernel body in the main() thread.
This process is demonstrated in the code example below.
When creating the SystemC model, you derive the kernel from a class based on a supported control protocol: xtlm_ap_ctrl_chain, xtlm_ap_ctrl_hs, or xtlm_ap_ctrl_none. Use the following structure to create your SystemC model.
#include "xtlm.h"
#include "xtlm_ap_ctrl.h"
class vadd : public xsc::xtlm_ap_ctrl_hs
{
public:
SC_HAS_PROCESS(vadd);
vadd(sc_module_name name, xsc::common_cpp::properties& _properties):
xsc::xtlm_ap_ctrl_hs(name)
{
DEFINE_XTLM_AXIMM_MASTER_IF(in1, 32);
DEFINE_XTLM_AXIMM_MASTER_IF(in2, 32);
DEFINE_XTLM_AXIMM_MASTER_IF(out_r, 32);
ADD_MEMORY_IF_ARG(in1, 0x10, 0x8);
ADD_MEMORY_IF_ARG(in2, 0x18, 0x8);
ADD_MEMORY_IF_ARG(out_r, 0x20, 0x8);
ADD_SCALAR_ARG(size, 0x28, 0x4);
SC_THREAD(main_thread);
}
//! Declare aximm interfaces..
DECLARE_XTLM_AXIMM_MASTER_IF(in1);
DECLARE_XTLM_AXIMM_MASTER_IF(in2);
DECLARE_XTLM_AXIMM_MASTER_IF(out_r);
//! Declare scalar args...
unsigned int size;
void main_thread()
{
wait(ev_ap_start); //! Wait on ap_start event...
//! Copy kernel args configured by host...
uint64_t in1_base_addr = kernel_args[0];
uint64_t in2_base_addr = kernel_args[1];
uint64_t out_r_base_addr = kernel_args[2];
size = kernel_args[3];
unsigned data1, data2, data_r;
for(int i = 0; i < size; i++) {
in1->read(in1_base_addr + (i*4), (unsigned char*)&data1); //! Read from in1 interface
in2->read(in2_base_addr + (i*4), (unsigned char*)&data2); //! Read from in2 interface
//! Add data1 & data2 and write back result
data_r = data1 + data2; //! Add
out_r->write(out_r_base_addr + (i*4), (unsigned char*)&data_r); //! Write the result
}
ap_done(); //! completed Kernel computation...
}
};
The include files are available in the tool installation hierarchy under the $XILINX_VIVADO/data/systemc/simlibs/ folder.
The kernel name is specified when defining the class for the SystemC model, as shown above, inheriting from the xtlm_ap_ctrl_hs class. Each AXI interface on the kernel is declared, defined, and mapped to a kernel argument, as shown here for the in1 interface:
DECLARE_XTLM_AXIMM_MASTER_IF(in1);
DEFINE_XTLM_AXIMM_MASTER_IF(in1, 32);
ADD_MEMORY_IF_ARG(in1, 0x10, 0x8);
When specifying the kernel arguments, offsets, and sizes, these values should match the values reflected in the AXI4-Lite interface of the XRT-managed kernel, as described in SW-Controllable Kernels and Control Requirements for XRT-Managed Kernels. The addresses below 0x10 are reserved for use by XRT for managing kernel execution; kernel arguments can be specified from 0x10 onwards. Most importantly, the arguments, offsets, and sizes specified in the SystemC model should match the values used in the Vitis HLS or RTL kernel.
The kernel is executed in the SystemC main_thread. This thread waits until the ap_start bit is set by the host application, or XRT, at which time the kernel can process the argument values as shown:
- The kernel waits for the start event from XRT (ev_ap_start).
- The kernel arguments are mapped to variables in the SystemC model.
- The inputs are read.
- The vectors are added and the result is captured.
- The output is written back to the host.
- The finished signal is sent to XRT (ap_done).
Creating the XO
To generate the XO file from the SystemC model, use the create_sc_xo command. This takes the SystemC kernel source file as input and creates the IP that generates the XO, which can be used for linking with the target platform and other kernels with the Vitis compiler. For example:
create_sc_xo vadd.cpp
Generating an XO file from the source file involves a number of intermediate steps, such as generating a package IP script and running the package_xo command. These intermediate commands can be used for debugging if necessary.
The output of the above create_sc_xo command is vadd.xo.
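If you need to inspect or rerun the intermediate packaging step, a sketch of a package_xo call in the Vivado Tcl console might look like the following; the IP directory and kernel XML paths are assumptions for illustration:
package_xo -xo_path ./vadd.xo -kernel_name vadd -ip_directory ./ip_repo/vadd_ip -kernel_xml ./kernel.xml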
Linking with the v++ Command
The following example shows the SystemC model XO being used in the v++ --link command line:
v++ --link --platform <platform> --target hw_emu \
--config ./vadd.cfg --input_files ../vadd.xo --output ../vadd.link.xclbin \
--optimize 0 --save-temps --temp_dir ./hw_emu
The SystemC model can be used for both software emulation and hardware emulation, but is not supported for hardware build targets.
When a SystemC model is included in the xclbin, the design is no longer clock-cycle accurate due to the limitations of the TLM.
Using I/O Traffic Generators
Some user applications such as video streaming and Ethernet-based applications make use of I/O ports on the platform to stream data into and out of the platform. For these applications, performing hardware emulation of the design requires a mechanism to mimic the hardware behavior of the I/O port, and to simulate data traffic running through the ports. I/O traffic generators let you model traffic through the I/O ports during hardware emulation in the Vitis application acceleration development flow, or during logic simulation in the Vivado Design Suite. This supports both AXI4-Stream and AXI4 memory map interface I/O emulation.
Adding Traffic Generators to Your Design
Xilinx devices have rich I/O interfaces. The Alveo accelerator cards primarily have PCIe and DDR memory interfaces which have their own specific model. However, your platforms could also have other I/Os, for example GT-kernel based generic I/O, Video Streams, and Sensor data. I/O Traffic Generator kernels provide a method for platforms and applications to inject traffic onto the I/O during simulation.
This solution requires both the inclusion of streaming I/O kernels (XO) or IP in your design, and the use of a Python/C++/C library provided by Xilinx to inject traffic or to capture output data from the emulation process. The Xilinx-provided library can be used to integrate traffic generator code into your application, run it as a separate process, and have it interface with the emulation process. Currently, Xilinx provides a library that enables interfacing at the AXI4-Stream level to mimic any streaming I/O, and at the AXI3/AXI4 memory mapped interface level to mimic any memory mapped I/O.
AXI4-Stream I/O Model for Streaming Traffic
The following section is specific to AXI4-Stream. The streaming I/O model can be used to emulate streaming traffic on the platform, and also supports delay modeling. You can add streaming I/O kernels to your application, or add the corresponding IP to your custom platform design, as described below:
- Streaming I/O kernels can be added to the device binary (xclbin) file like any other compiled kernel object (XO) file, using the v++ --link command. The Vitis installation provides kernels for AXI4-Stream interfaces of various data widths. These can be found in the software installation at $XILINX_VITIS/data/emulation/XO. Add these to your designs using the following example command:
  v++ -t hw_emu --link $XILINX_VITIS/data/emulation/XO/sim_ipc_axis_master_32.xo $XILINX_VITIS/data/emulation/XO/sim_ipc_axis_slave_32.xo ...
  In the example above, sim_ipc_axis_master_32.xo and sim_ipc_axis_slave_32.xo provide 32-bit master and slave kernels that can be linked with the target platform and other kernels in your design to create the .xclbin file for the hardware emulation build.
- IPC modules can also be added to a platform block design using the Vivado IP integrator feature for Versal and Zynq UltraScale+ MPSoC custom platforms. The tool provides sim_ipc_axis_master_v1_0 and sim_ipc_axis_slave_v1_0 IP to add to your platform design. These can be found in the software installation at $XILINX_VIVADO/data/emulation/hw_em/ip_repo. The following is an example Tcl script used to add the IPC IP to your platform design, which will enable you to inject data traffic into your simulation from an external process written in Python or C++:
  #Update IP Repository path if required
  set_property ip_repo_paths $XILINX_VIVADO/data/emulation/hw_em/ip_repo [current_project]
  ## Add AXIS Master
  create_bd_cell -type ip -vlnv xilinx.com:ip:sim_ipc_axis_master:1.0 sim_ipc_axis_master_0
  #Change Model Property if required
  set_property -dict [list CONFIG.C_M00_AXIS_TDATA_WIDTH {64}] [get_bd_cells sim_ipc_axis_master_0]
  ##Add AXIS Slave
  create_bd_cell -type ip -vlnv xilinx.com:ip:sim_ipc_axis_slave:1.0 sim_ipc_axis_slave_0
  #Change Model Property if required
  set_property -dict [list CONFIG.C_S00_AXIS_TDATA_WIDTH {64}] [get_bd_cells sim_ipc_axis_slave_0]
Writing Traffic Generators in Python
You must also include a traffic generator process while simulating your application to generate data traffic on the I/O traffic generators, or to capture output data from the emulation process. The Xilinx-provided Python or C++ library can be used to create the traffic generator code as described below. An application can also communicate with multiple I/O interfaces; it is not necessary for each instance of the I/O utilities to run in a separate process or thread. If your application demands it, you might consider the non-blocking API versions (details are provided in the following sections).
- For Python, set $PYTHONPATH on the command terminal:
  setenv PYTHONPATH $XILINX_VIVADO/data/emulation/hw_em/lib/python:\
  $XILINX_VIVADO/data/emulation/ip_utils/xtlm_ipc/xtlm_ipc_v1_0/python/
- Sample Python code to connect with the gt_master instance would look like the following:
  Blocking Send
from xilinx_xtlm import ipc_axis_master_util
from xilinx_xtlm import xtlm_ipc
import struct
import binascii

#Instantiating AXI Master Utilities
master_util = ipc_axis_master_util("gt_master")

#Create payload
payload = xtlm_ipc.axi_stream_packet()
payload.data = "BINARY_DATA"
# One way of getting "BINARY_DATA" from an integer:
# payload.data = bytes(bytearray(struct.pack("i", int_number)))
# More info @ https://docs.python.org/3/library/struct.html
payload.tlast = True #AXI Stream Fields

#Optional AXI Stream Parameters
payload.tuser = "OPTIONAL_BINARY_DATA"
payload.tkeep = "OPTIONAL_BINARY_DATA"

#Send Transaction
master_util.b_transport(payload)
master_util.disconnect() #Disconnect connection between Python & Emulation
- Sample Python code to connect with the gt_slave instance would look like the following:
  Blocking Receive
from xilinx_xtlm import ipc_axis_slave_util
from xilinx_xtlm import xtlm_ipc

#Instantiating AXI Slave Utilities
slave_util = ipc_axis_slave_util("gt_slave")

#Sample payload (Blocking Call)
payload = slave_util.sample_transaction()
slave_util.disconnect() #Disconnect connection between Python & Emulation
- The non-blocking API versions for Python can be found at:
  $XILINX_VIVADO/data/emulation/ip_utils/xtlm_ipc/xtlm_ipc_v1_0/python/xilinx_xtlm.py
Writing Traffic Generators in C++
- For C++, the APIs are available at:
  $XILINX_VIVADO/data/emulation/ip_utils/xtlm_ipc/xtlm_ipc_v1_0/cpp/inc/axis
  The C++ API provides both blocking and non-blocking function support. The following snippets show the usage.
  TIP: A sample Makefile is also available to generate the executable.
- Blocking send: A simple API is available if you prefer not to have fine-grained control (recommended):
#include "xtlm_ipc.h" //Include file void send_data() { //! Instantiate IPC socket with name matching in IPI diagram... xtlm_ipc::axis_initiator_socket_util<xtlm_ipc::BLOCKING> socket_util("gt_master"); const unsigned int NUM_TRANSACTIONS = 8; std::vector<char> data; std::cout << "Sending " << NUM_TRANSACTIONS << " data transactions..." <<std::endl; for(int i = 0; i < NUM_TRANSACTIONS; i++) { data = generate_data(); print(data); socket_util.transport(data.data(), data.size()); } }
Advanced users who need fine-grained control over the AXI4-Stream can use the following:
#include "xtlm_ipc.h" //Include file void send_packets() { //! Instantiate IPC socket with name matching in IPI diagram... xtlm_ipc::axis_initiator_socket_util<xtlm_ipc::BLOCKING> socket_util("gt_master"); const unsigned int NUM_TRANSACTIONS = 8; xtlm_ipc::axi_stream_packet packet; std::cout << "Sending " << NUM_TRANSACTIONS << " Packets..." <<std::endl; for(int i = 0; i < NUM_TRANSACTIONS; i++) { xtlm_ipc::axi_stream_packet packet; // generate_data() is your custom code to generate traffic std::vector<char> data = generate_data(); //! Set packet attributes... packet.set_data(data.data(), data.size()); packet.set_data_length(data.size()); packet.set_tlast(1); //Additional AXIS attributes can be set if required socket_util.transport(packet); //Blocking transport API to send the transaction } }
- Blocking receive: A simple API is available if you prefer not to have fine-grained control (recommended):
#include "xtlm_ipc.h" //Include file void receive_data() { //! Instantiate IPC socket with name matching in IPI diagram... xtlm_ipc::axis_target_socket_util<xtlm_ipc::BLOCKING> socket_util("gt_slave"); const unsigned int NUM_TRANSACTIONS = 100; unsigned int num_received = 0; std::vector<char> data; std::cout << "Receiving " << NUM_TRANSACTIONS << " data transactions..." <<std::endl; while(num_received < NUM_TRANSACTIONS) { socket_util.sample_transaction(data); print(data); num_received += 1; } }
Advanced users who need fine-grained control over the AXI4-Stream can use the following:
#include "xtlm_ipc.h" void receive_packets() { //! Instantiate IPC socket with name matching in IPI diagram... xtlm_ipc::axis_target_socket_util<xtlm_ipc::BLOCKING> socket_util("gt_slave"); const unsigned int NUM_TRANSACTIONS = 8; unsigned int num_received = 0; xtlm_ipc::axi_stream_packet packet; std::cout << "Receiving " << NUM_TRANSACTIONS << " packets..." <<std::endl; while(num_received < NUM_TRANSACTIONS) { socket_util.sample_transaction(packet); //API to sample the transaction //Process the packet as per requirement. num_received += 1; } }
- Non-Blocking send:
#include <algorithm> // std::generate
#include "xtlm_ipc.h"

//A sample implementation of generating random data.
xtlm_ipc::axi_stream_packet generate_packet()
{
    xtlm_ipc::axi_stream_packet packet;
    // generate_data() is your custom code to generate traffic
    std::vector<char> data = generate_data();
    //! Set packet attributes...
    packet.set_data(data.data(), data.size());
    packet.set_data_length(data.size());
    packet.set_tlast(1); //packet.set_tlast(std::rand()%2);
    //! Option to set tuser tkeep optional attributes...
    return packet;
}

//Simple Usage
void send_data()
{
    //! Instantiate IPC socket with name matching in IPI diagram...
    xtlm_ipc::axis_initiator_socket_util<xtlm_ipc::NON_BLOCKING> socket_util("gt_master");
    const unsigned int NUM_TRANSACTIONS = 8;
    std::vector<char> data;

    std::cout << "Sending " << NUM_TRANSACTIONS << " data transactions..." << std::endl;
    for(int i = 0; i < NUM_TRANSACTIONS/2; i++) {
        data = generate_data();
        print(data);
        socket_util.transport(data.data(), data.size());
    }

    std::cout << "Adding Barrier to complete all outstanding transactions..." << std::endl;
    socket_util.barrier_wait();

    for(int i = NUM_TRANSACTIONS/2; i < NUM_TRANSACTIONS; i++) {
        data = generate_data();
        print(data);
        socket_util.transport(data.data(), data.size());
    }
}

void send_packets()
{
    //! Instantiate IPC socket with name matching in IPI diagram...
    xtlm_ipc::axis_initiator_socket_util<xtlm_ipc::NON_BLOCKING> socket_util("gt_master"); // Instantiate Non Blocking specialization
    const unsigned int NUM_TRANSACTIONS = 8;
    xtlm_ipc::axi_stream_packet packet;

    std::cout << "Sending " << NUM_TRANSACTIONS << " Packets..." << std::endl;
    for(int i = 0; i < NUM_TRANSACTIONS; i++) {
        packet = generate_packet(); // Or user's test pattern / live data etc.
        socket_util.transport(packet);
    }
}
- Non-Blocking receive:
#include <unistd.h>
#include "xtlm_ipc.h"

//Simple Usage
void receive_data()
{
    //! Instantiate IPC socket with name matching in IPI diagram...
    xtlm_ipc::axis_target_socket_util<xtlm_ipc::NON_BLOCKING> socket_util("gt_slave");
    const unsigned int NUM_TRANSACTIONS = 8;
    unsigned int num_received = 0, num_outstanding = 0;
    std::vector<char> data;

    std::cout << "Receiving " << NUM_TRANSACTIONS << " data transactions..." << std::endl;
    while(num_received < NUM_TRANSACTIONS) {
        num_outstanding = socket_util.get_num_transactions();
        num_received += num_outstanding;
        if(num_outstanding != 0) {
            std::cout << "Outstanding data transactions = " << num_outstanding << std::endl;
            for(int i = 0; i < num_outstanding; i++) {
                socket_util.sample_transaction(data);
                print(data);
            }
        }
        usleep(100000);
    }
}

void receive_packets()
{
    //! Instantiate IPC socket with name matching in IPI diagram...
    xtlm_ipc::axis_target_socket_util<xtlm_ipc::NON_BLOCKING> socket_util("gt_slave");
    const unsigned int NUM_TRANSACTIONS = 8;
    unsigned int num_received = 0, num_outstanding = 0;
    xtlm_ipc::axi_stream_packet packet;

    std::cout << "Receiving " << NUM_TRANSACTIONS << " packets..." << std::endl;
    while(num_received < NUM_TRANSACTIONS) {
        num_outstanding = socket_util.get_num_transactions();
        num_received += num_outstanding;
        if(num_outstanding != 0) {
            std::cout << "Outstanding packets = " << num_outstanding << std::endl;
            for(int i = 0; i < num_outstanding; i++) {
                socket_util.sample_transaction(packet);
                print(packet);
            }
        }
        usleep(100000); //As sampling is non-blocking, give some delay between consecutive samplings
    }
}
- The following is an example Makefile for the blocking receive above:
GCC=/usr/bin/g++
IPC_XTLM=$(XILINX_VIVADO)/data/emulation/ip_utils/xtlm_ipc/xtlm_ipc_v1_0/cpp/
PROTO_PATH=$(XILINX_VIVADO)/data/simmodels/xsim/2021.1/lnx64/6.2.0/ext/protobuf/
BOOST=$(XILINX_VIVADO)/tps/boost_1_64_0/
SRC_FILE=b_receive.cpp

.PHONY: run all

default: all
all: b_receive

b_receive: $(SRC_FILE)
	$(GCC) $(SRC_FILE) $(IPC_XTLM)/src/common/xtlm_ipc.pb.cc $(IPC_XTLM)/src/axis/*.cpp \
	$(IPC_XTLM)/src/common/*.cpp -I$(IPC_XTLM)/inc/ -I$(PROTO_PATH)/include/ \
	-L$(PROTO_PATH) -lprotobuf -o $@ -lpthread -I$(BOOST)/
- For C, the APIs can be found at:
  $XILINX_VIVADO/data/emulation/ip_utils/xtlm_ipc/xtlm_ipc_v1_0/C/inc/axis/c_axis_socket.h
  They can be linked against the pre-compiled library at:
  $XILINX_VIVADO/data/emulation/ip_utils/xtlm_ipc/xtlm_ipc_v1_0/C/lib/
A full system-level example is available at https://github.com/Xilinx/Vitis_Accel_Examples/tree/master/emulation.
AXI4 Memory Map External Traffic through Python/C++
The AXI4 memory map external traffic has the following specifications:
- Only transaction-level granularity is supported.
- Re-ordering of transactions is not supported.
- Parallel Read, Write transactions are not supported (transactions will be serialized).
- Unaligned transactions are not supported.
Use Cases
The use cases include the following:
- Emulate an AXI4 memory map master or slave through an external process such as Python or C++. This helps you emulate the design quickly without investing resources in developing an AXI4 master or slave.
- Chip-to-chip connection between two FPGAs can be emulated with AXI4 memory map Interprocess communication.
API/Pseudo Code
A single instance of the AXI4 memory map packet is used for the complete transaction. This is in line with how the payload is used in the Xilinx SystemC modules. For the AXI4 master, there is a b_transport(aximm_packet) API; after the call, the aximm_packet is updated with the response given by the AXI4 slave. For the AXI4 slave, there are sample_transaction() and send_response(aximm_packet) APIs.
The following code snippets show the API usage in the context of C++.
- Code snippet for C++ Master:
auto payload = generate_random_transaction(); //Custom random transaction generator. Users can configure AXI properties on the payload.
/* Or the user can set the AXI transaction properties as follows:
payload->set_addr(std::rand() * 4);
payload->set_len(1 + (std::rand() % 255));
payload->set_size(1 << (std::rand() % 3));
*/
master_util.b_transport(*payload.get(), std::rand() % 0x10); //A blocking call. The response is updated in the same payload. Each AXI MM transaction uses the same payload for the whole transaction.

std::cout << "-----------Transaction Response------------" << std::endl;
std::cout << *payload << std::endl; //Prints AXI transaction info
- Code snippet for C++ Slave:
auto& payload = slave_util.sample_transaction(); //Sample the transaction

//If it is a read transaction, provide read data
if(payload.cmd() == xtlm_ipc::aximm_packet_command_READ) {
    rd_resp.resize(payload.len()*payload.size());
    std::generate(rd_resp.begin(), rd_resp.end(), []() { return std::rand()%0x100; });
}

//Set the AXI response (for read & write)
payload.set_resp(std::rand()%4);
slave_util.send_response(payload); //Send the response to the master
The following code snippets show the API usage in the context of Python.
You need to set PYTHONPATH as follows:
- For example, on C shell:
  setenv PYTHONPATH $XILINX_VIVADO/data/emulation/hw_em/lib/python:$XILINX_VIVADO/data/emulation/ip_utils/xtlm_ipc/xtlm_ipc_v1_0/python
- Code snippet of Python Master:
aximm_payload = xtlm_ipc.aximm_packet()
random_packet(aximm_payload) # Custom function to set AXI properties randomly
# Or the user can set AXI properties as required
#aximm_payload.addr = int(random.randint(0, 1000000)*4)
#aximm_payload.len = random.randint(1, 64)
#aximm_payload.size = 4
master_util.b_transport(aximm_payload) # After this call, aximm_payload has the updated response as set by the AXI slave.
- Code snippet of Python Slave:
aximm_payload = slave_util.sample_transaction()
aximm_payload.resp = random.randint(0,3)
if not aximm_payload.cmd: # If it is a read transaction, set random data
    tot_bytes = aximm_payload.len * aximm_payload.size
    for i in range(0, int(tot_bytes/SIZE_OF_EACH_DATA_IN_BYTES)):
        aximm_payload.data += bytes(bytearray(struct.pack(">I", random.randint(0,60000)))) # Binary data should be aligned with the C struct
slave_util.send_resp(aximm_payload)
AXI4 Memory Map I/O Limitations in the Platform
The following shows the AXI4 memory map I/O limitations in the platform:
- During platform development, AXI4 memory map I/O can be connected to any memory/slave.
- Master AXI4 memory map I/O cannot connect to a kernel, because the kernel cannot provide an additional slave interface.
- AXI4 memory map Slave I/O can be used without any restrictions.
- AXI4 memory map Master I/O can be used where data needs to be driven from external process to memory/slave.
XO Usage
The use cases of the AXI4 memory map I/O XO differ from those of the AXI4-Stream I/O XO. AXI4 memory map XOs have a few limitations on usage during the Vitis link stage, listed below:
- Only AXI4 memory map Master I/O can be used.
- AXI4 memory map Master I/O can connect only with available slaves in the platform.
- AXI4 memory map Master I/O cannot communicate with kernel in the design.
For XO usage during the link stage:
- To generate the XO, use the script available at $XILINX_VITIS/data/emulation/XO/scripts/aximm_xo_creation.sh
- The required configuration of the XO can be generated using the above script:
  $XILINX_VITIS/data/emulation/XO/scripts/aximm_xo_creation.sh --address_width <adr_width> --data_width <data_width> --id_width <id_width> --output_path <output_path>.xo
  $XILINX_VITIS/data/emulation/XO/scripts/aximm_xo_creation.sh --address_width 64 --data_width 64 --id_width 4 --output_path sim_ipc_aximm_master.xo
- After generating the XO, it can be used in the design with a configuration such as the following (sample usage; the actual connection should be made based on your requirements):
  [connectivity]
  nk=sim_ipc_aximm_master:1:aximm_master
  sp=aximm_master.M_AXIMM:HBM[0]
Running Traffic Generators
After generating an external process binary as shown above using the headers and sources available at $XILINX_VIVADO/data/emulation/ip_utils/xtlm_ipc/xtlm_ipc_v1_0/<supported_language>, you can run the emulation using the following steps:
- Launch the Vitis hardware emulation or Vivado simulation using the standard process and wait for the simulation to start.
- From another terminal, launch the external traffic generator process (Python, C++, or C), as shown in the sketch below.
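For example, a sketch of a typical two-terminal session; the traffic generator names are illustrative, and the C++ binary corresponds to the blocking-receive Makefile example above:
# Terminal 1: start hardware emulation and wait for the simulation to come up
./launch_hw_emu.sh
# Terminal 2: start the external traffic generator process
python3 my_axis_traffic.py
# or
./b_receive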