Integrating the Application Using the Vitis Tools Flow

While developing an AI Engine design graph, many design iterations are typically performed using the AI Engine compiler or AI Engine simulator tools. This method provides quick design iterations when focused on developing the AI Engine application. When ready, the AI Engine design can be integrated into a larger system design using the flow described in this chapter.

The Vitis™ tools flow simplifies hardware design and integration with a software-like compilation and linking flow, integrating the three domains of the Versal™ device: the AI Engine array, the programmable logic (PL) region, and the processing system (PS). The Vitis compiler flow lets you integrate your compiled AI Engine design graph (libadf.a) with additional kernels implemented in the PL region of the device, including HLS and RTL kernels, and link them for use on a target platform. You can call these compiled hardware functions from a host program running in the Arm® processor in the Versal device.

The following figure shows the high-level steps required to use the Vitis tools flow to integrate your application. The command-line process to run this flow is described here.
Note: You can also use this flow from within the Vitis IDE as explained in Using the Vitis IDE.
Figure 1: Vitis Tools Flow


IMPORTANT: Using the Vitis tools and AI Engine tools requires the setup described in Setting Up the Vitis Tool Environment.

The following steps can be adapted to any AI Engine design in a Versal device.

  1. As described in Compiling an AI Engine Graph Application, the first step is to create and compile the AI Engine graph into a libadf.a file using the AI Engine compiler. You can iterate between the AI Engine compiler and the AI Engine simulator to develop the graph until you are ready to proceed.
  2. Compiling PL Kernels: PL kernels are compiled for implementation in the PL region of the target platform using the v++ --compile command. These kernels can be C/C++ kernels or RTL kernels, in compiled Xilinx object (XO) form.
  3. Linking the System: Link the compiled AI Engine graph with the C/C++ kernels and RTL kernels onto a target platform. The process creates an XCLBIN file to load and run the AI Engine graph and PL kernel code on the target platform.
  4. Compile the Embedded Application for the Cortex-A72 Processor: Optionally compile a host application to run on the Cortex®-A72 processor using the GNU Arm cross-compiler to create an ELF file. The host program interacts with the AI Engine kernels and the kernels in the PL region. This compilation step is optional because there are several ways to deploy and interact with the AI Engine kernels, and a host program running in the PS is one of them.
  5. Packaging the System: Use the v++ --package process to gather the files required to configure and boot the system and to load and run the application, including the AI Engine graph and PL kernels. This builds the necessary package to run emulation and debug, or to run your application on hardware. A sketch of the full command sequence follows this list.
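
The following command sequence is a minimal sketch of the overall flow. The kernel and file names (mm2s.cpp, system.cfg, package.cfg) are placeholders for illustration and the exact options depend on your design; each command is described in detail in the remainder of this chapter.

aiecompiler <options> graph.cpp                        # Step 1: compile the graph into libadf.a
v++ --compile -t hw_emu --platform xilinx_vck190_base_202110_1 -k mm2s mm2s.cpp -o mm2s.xo
v++ --link -t hw_emu --platform xilinx_vck190_base_202110_1 mm2s.xo libadf.a \
--config system.cfg -o project.xclbin
aarch64-xilinx-linux-g++ <options> -o host.exe ...     # Step 4: optional PS host application
v++ --package --config package.cfg libadf.a project.xclbin -o aie_graph.xclbin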

Platforms

A platform is a fully contained image that defines both the hardware (XSA) and the software (bare metal, Linux, or both). The XSA contains the hardware description of the platform, which is defined in the Vivado Design Suite, and the software is defined either as a bare-metal setup or as a Linux image built through PetaLinux.

Types of Platforms

There are two types of platforms: base platforms and custom platforms. A base platform is one provided by Xilinx (for example, xilinx_vck190_base_202110_1), typically targeting Xilinx boards. A custom platform is one that you create, either by extending or re-customizing a base platform or by creating a new platform. When starting platform development, it can be useful to use a base platform as a reference for creating your custom platform.

Custom Platforms

You can create platforms that re-customize an existing base platform (for example, changing the AI Engine clock frequency, the clocks available in the programmable logic (PL), or the memory controller settings), or you can create a new platform targeting Xilinx or non-Xilinx boards. Creating a platform allows you to provide your own IP or subsystems to meet your needs. The process to create a platform is described in Creating Embedded Platforms in Vitis in the Application Acceleration Development flow of the Vitis Unified Software Platform Documentation (UG1416).

Platform Clocking

Platforms have a variety of clocks: processor, PL, NoC, and AI Engine clocks. The following table describes the clocking for each.

Table 1. Platform Clocks
Clock Description
AI Engine Can be configured in the platform in the AI Engine IP.
Processor Can be configured in the platform in the CIPS IP.
Programmable Logic (PL) Can have multiple clocks and can be configured in the platform.
NoC Device dependent and can be configured in the platform in the CIPS and NoC IP.
  1. These clocks are derived from the platform and are affected by the device, speed grade and operating voltage.

For more information related to platform clocking, see Vitis Unified Software Platform Documentation: Application Acceleration Development (UG1393). For information on Versal device clocks, see Versal AI Core Series Data Sheet: DC and AC Switching Characteristics (DS957).

PL Kernels

PL kernels can take the form of HLS kernels, written in C/C++, or RTL kernels packaged in the Vivado Design Suite. These kernels must be separately compiled to produce the Xilinx object files (XO) used in integrating the system design on the target platform.

HLS kernels, written in C/C++, can be written and compiled from within the Vitis HLS tool directly, or as part of the Vitis application acceleration development flow.
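
As an illustration, the following is a minimal sketch of an HLS kernel in the style of the mm2s data mover used in the examples later in this chapter. It reads 32-bit words from global memory and streams them toward the AI Engine graph. The kernel name, port names, and data width are assumptions for this sketch and are not prescribed by the flow.

#include <ap_axi_sdata.h>
#include <ap_int.h>
#include <hls_stream.h>

// Minimal memory-to-stream (mm2s) style kernel: reads words from global memory
// and drives them onto an AXI4-Stream interface that can connect to the AI Engine graph.
extern "C" void mm2s(ap_int<32>* mem, hls::stream<ap_axis<32, 0, 0, 0>>& s, int size) {
#pragma HLS INTERFACE m_axi port=mem offset=slave bundle=gmem
#pragma HLS INTERFACE axis port=s
#pragma HLS INTERFACE s_axilite port=mem bundle=control
#pragma HLS INTERFACE s_axilite port=size bundle=control
#pragma HLS INTERFACE s_axilite port=return bundle=control
    for (int i = 0; i < size; i++) {
#pragma HLS PIPELINE II=1
        ap_axis<32, 0, 0, 0> x;
        x.data = mem[i];
        x.keep = -1;                  // mark all bytes valid
        x.last = (i == size - 1);     // assert TLAST on the final word
        s.write(x);
    }
}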

For information on creating and building RTL kernels, see RTL Kernels in the Application Acceleration Development flow of the Vitis Unified Software Platform Documentation (UG1416).

Compiling PL Kernels

To compile kernels using the Vitis compiler command as described in the Compiling Kernels with Vitis Compiler in the Application Acceleration Development flow of the Vitis Unified Software Platform Documentation (UG1416), use the following command syntax:

v++ --compile -t hw_emu --platform xilinx_vck190_base_202110_1 -g \
-k <kernel_name> <kernel>.cpp -o <kernel_name>.xo --save-temps

The v++ command uses the options described in the following table.

Table 2. Vitis Compiler Options
Option Description
--compile Specifies compilation mode.
-t hw_emu Specifies the build target for the compilation process. For more information, see the Build Targets section in Vitis Unified Software Platform Documentation: Application Acceleration Development (UG1393).
--platform Specifies the path and name of the target platform. This example assumes that the PLATFORM_REPO_PATHS environment variable is set to the directory containing the platform.
-g Enables the debug features. This is required for emulation modes.
-k Specifies the kernel name. This must match the function name in the specified kernel source file.
-o Specifies the output file name of the compiled Xilinx object file (.xo).
--save-temps Saves the temporary files generated during the compilation process. This is optional.

Clocking the PL Kernels

PLIO represents an ADF graph interface to the PL. The PL side can be a PL kernel, a platform IP representing a signal source or sink, or a data mover that interfaces the ADF graph to memory. You should provide clock frequency values for these interfaces to ensure that simulation results match the results from running the design in hardware. In addition, when you link the ADF graph into the platform at the Vitis linker (v++ --link) step, you can provide more accurate clock values depending on the specific clock frequencies supported by the platform. To set the exact frequency of a PLIO interface in the graph and the clock frequency of the corresponding PL kernel, you must specify the clock frequency in three locations:

  • ADF graph (Optional)
  • Vitis compilation of a PL kernel (v++ -c)
  • Vitis linking (v++ -l)

You must specify the clocking depending on where the kernels are located. The following table describes the default clocks based on the kernel location.

Table 3. Default Kernel Clocks
Kernel Location Description
AI Engine kernels Clocked per the AI Engine clock frequency. All cores run with the same clock frequency.
PL kernels connected to AI Engine graph HLS: The default frequency for all HLS kernels is 150 MHz.

RTL: The frequency is set to the frequency with which the XO file was compiled.

AI Engine: Set in the PLIO constructor in the AI Engine graph. Setting the frequency here is optional. See Adaptive Data Flow Graph Specification Reference for more information.1

PL kernels added to platform using the Vitis linker Platforms have a default clock. If no clocking option is set on the command line or in a configuration file, the default clock is used. This default can be overridden depending on the design and the required clock value, as shown in the following table.
  1. If the PLIO frequency is not provided in the PLIO constructor, the AI Engine compiler defaults the frequency to one quarter of the AI Engine clock frequency. Once you have determined the target platform, Xilinx recommends setting the PLIO clock frequencies explicitly to make your AI Engine simulations more representative of your application.
Note: The maximum supported PLIO interface clock frequency is half the AI Engine clock frequency, depending on the device speed grade. If you specify a higher frequency, the Vitis linker caps the frequency at the maximum supported value and issues a critical warning during the link stage.

Setting the clocks at the Vitis linker step allows you to choose a frequency based on the platform. The following table describes the Vitis compiler clocking options during the link step.

Table 4. Vitis Linking Clock Options
[clock] Options Description
--clock.defaultFreqHz arg Specify a default clock frequency to use in Hz.
--clock.defaultId arg Specify a default clock reference ID to use.
--clock.defaultTolerance arg Specify a default clock tolerance to use.
--clock.freqHz arg <frequency_in_Hz>:<cu_0>[.<clk_pin_0>][,<cu_n>[.<clk_pin_n>]]

Specify a clock frequency in Hz and a list of associated compute unit names and optionally their clock pins.

--clock.id arg <reference_ID>:<cu_0>[.<clk_pin_0>][,<cu_n>[.<clk_pin_n>]]

Specify a clock reference ID and a list of associated compute unit names and optionally their clock pins.

--clock.tolerance arg <tolerance>:<cu_0>[.<clk_pin_0>][,<cu_n>[.<clk_pin_n>]]

Specify a clock tolerance and a list of associated compute unit names and optionally their clock pins.

The following table describes the steps to set clock frequencies for PLIOs that interface to the platform, including to PL kernels specified outside of the ADF graph.

Table 5. Compiling PL Kernels with Non-default Clocking
PL Kernel Location Clock Specification
PLIO interface specified in ADF graph Specify the clock frequency per PLIO interface in the graph.

For the PLIO interface, you can optionally specify FreqMHz:

adf::PLIO *<input> = new adf::PLIO(<logical_name>, <plio_width>, <file>, <FreqMHz>);
HLS kernels Compile the HLS code using the Vitis compiler.
v++ -c -k kernelName kernel.cpp --hls.clock freqHz:kernelName

To change the frequency at which HLS kernels are compiled, use --hls.clock arg:kernelName.

arg must be in Hz (for example, 250000000 is 250 MHz).

Per kernel, specify the clock in the Vitis linker.
v++ -l ... --clock.freqHz <freqHz>:kernelName.ap_clk
RTL kernels Per kernel, specify the clock in the Vitis linker.
v++ -l ... --clock.freqHz <freqHz>:kernelName.ap_clk
Note: Clock frequencies for PL kernels specified at the Vitis linker stage take precedence over clock frequency values specified at Vitis compile time. However, the clock frequency specified at the Vitis linker stage should not significantly exceed the Vitis compiler clock frequency, because the Vitis compiler generates RTL based on the specified target frequency.

See Vitis Unified Software Platform Documentation: Application Acceleration Development (UG1393) for more detailed information on how to compile kernels for specific platform clocks and clocking information.
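
For example, to run a hypothetical HLS kernel named mm2s and its PLIO interface at 300 MHz (an assumed value for illustration), pass the frequency to the PLIO constructor in the graph as shown in Table 5, and then repeat the same value at the compile and link steps:

v++ -c -k mm2s mm2s.cpp --hls.clock 300000000:mm2s -o mm2s.xo
v++ -l ... --clock.freqHz 300000000:mm2s.ap_clk

Keeping the value consistent across the graph, the compile step, and the link step helps ensure that the AI Engine simulation, the generated HLS RTL, and the implemented hardware all assume the same interface frequency.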

Linking the System

After the AI Engine graph and the C/C++ kernels are compiled, and any RTL kernels are packaged, the Vitis v++ --link command links them with the target platform to build the device binary (XCLBIN), used to program the hardware. For more information, see Linking the Kernels in the Application Acceleration Development flow of the Vitis Unified Software Platform Documentation (UG1416).

The following is an example of the linking command for the Vitis compiler in the AI Engine design flow.

v++ --link -t hw_emu --platform xilinx_vck190_base_202110_1 -g \
<pl_kernel1>.xo <pl_kernel2>.xo ../libadf.a -o vck190_aie_graph.xclbin \
--config ../system.cfg --save-temps

The v++ command uses the options in the following table.

Table 6. Vitis Compiler Link Options
Option Description
--link Specifies the linking process.
-t hw_emu Specifies the build target of the link process. For the AI Engine kernel flow, the target can be either hw_emu for emulation and test, or hw to build the system hardware.
IMPORTANT: The v++ compilation and linking commands must use both the same build target (-t) and the same target platform (--platform).
--platform Specifies the path to the target platform.
-g Specifies the addition of debugging logic required to enable debug (for hardware emulation) and to capture waveform data.
<pl_kernel1>.xo <pl_kernel2>.xo Specifies the input compiled PL kernel object files (.xo) to link with the AI Engine graph and the target platform.
../libadf.a Specifies the input compiled AI Engine graph application to link with the PL kernels and the target platform.
-o Specifies the device binary (XCLBIN) file that is the output of the linking process.
--config Specifies a configuration file to define some of the compilation or linking options.1
--save-temps Indicates that the temporary files created during the build process should be preserved for later examination or use. This includes output files created by Vitis HLS and the Vivado Design Suite.
  1. The --config option is used to simplify the v++ command line by moving many commands with extended syntax into a file that can be specified from the command line. For more information, see the Vitis Compiler Configuration file in the Application Acceleration Development flow of the Vitis Unified Software Platform Documentation (UG1416).
TIP: The config file requirements for the command line are different from the requirements of the Vitis IDE, as discussed in Configuring the HW-Link Project.

For the AI Engine kernel flow, the Vitis compiler requires two specific sections in the configuration file: [connectivity] and [advanced]. The following is an example configuration file.

[connectivity]
nk=mm2s:1:mm2s
nk=s2mm:1:s2mm
stream_connect=mm2s.s:ai_engine_0.DataIn1
stream_connect=ai_engine_0.DataOut1:s2mm.s
[advanced]
param=compiler.addOutputTypes=hw_export

The [connectivity] section of the configuration file has options described in the following table.

Table 7. Connectivity Section Options
Option Description
nk Specifies the number of kernel instances, or compute units (CUs), that the v++ command adds to the device binary (XCLBIN).

The nk option specifies the kernel name, the number of instances (CUs) of that kernel, and the CU name for each instance. In the example, nk=mm2s:1:mm2s specifies that the kernel mm2s should have only one instance and that the instance should be called mm2s.

Multiple instances of a kernel are specified as nk=mm2s:2:mm2s_1.mm2s_2. This indicates that mm2s should have two CUs called mm2s_1 and mm2s_2. For more information, see Creating Multiple Instances of a Kernel in the Application Acceleration Development flow of the Vitis Unified Software Platform Documentation (UG1416).

stream_connect (sc) Defines AXI4-Stream connections between the ports of the AI Engine graph and the streaming ports of the PL kernels. Connections can be defined from the streaming output of one kernel to the streaming input of a second kernel, or to a streaming input port on an IP implemented in the target platform. For more information, see --connectivity Options in the Application Acceleration Development flow of the Vitis Unified Software Platform Documentation (UG1416).

The example stream_connect=mm2s.s:ai_engine_0.DataIn1 from the config file defines a connection between the streaming output of the mm2s PL kernel and the DataIn1 input port of the AI Engine graph.

The example stream_connect=ai_engine_0.DataOut1:s2mm.s defines a connection between the DataOut1 output port of the AI Engine graph and the input port s of the PL kernel s2mm. For more information, see Specify Streaming Connections Between Compute Units in the Application Acceleration Development flow of the Vitis Unified Software Platform Documentation (UG1416).

To instruct the Vitis linker to generate an XSA archive of the generated hardware design, add the following parameter to a configuration file.

[advanced]
param=compiler.addOutputTypes=hw_export
TIP: The exported XSA is required for building the fixed platform in the bare-metal flow as described in Building a Bare-metal System.

Alternatively, for a custom platform, add the following Tcl command prior to invoking write_hw_platform:

set_property compiler.default_output_type=hw_export

This directs the Vitis linker to always generate an XSA for any design.

During the linking process, the Vitis compiler invokes the Vivado Design Suite to generate the device binary (XCLBIN) for the target platform. The XCLBIN file is used to program the device and includes the following information.

  • PDI: Programming information for the AI Engine array
  • Debug data: Debug information, when included in the build
  • Memory topology: Defines the memory resources and structure for the target platform
  • IP layout: Defines layout information for the implemented hardware design
  • Metadata: Various elements of platform metadata that let the tool load and run the XCLBIN file on the target platform

For more information on the XRT use of the XCLBIN file, see XRT.

Compile the Embedded Application for the Cortex-A72 Processor

After linking the AI Engine graph and PL kernels, the focus moves to the embedded application running in the PS that interacts with the AI Engine graph and kernels. The PS application is written in C/C++, using API calls to control the initialization, running, and closing of the AI Engine graph as described in Run-Time Graph Control API.

You compile the embedded application by following the typical cross-compilation flow for the Arm Cortex-A72 processor. The following are example commands for compiling and linking the PS application:

aarch64-xilinx-linux-g++ -std=c++14 -O0 -g -Wall -c \
-I<platform_path>/sysroots/aarch64-xilinx-linux/usr/include/xrt \
--sysroot=<platform_path>/sysroots/aarch64-xilinx-linux/ \
-I./ -I./src -I${XILINX_HLS}/include/ -I${XILINX_VITIS}/aietools/include -o sw/host.o sw/host.cpp

aarch64-xilinx-linux-g++ -std=c++14 -O0 -g -Wall -c \
-I<platform_path>/sysroots/aarch64-xilinx-linux/usr/include/xrt \
--sysroot=<platform_path>/sysroots/aarch64-xilinx-linux/ \
-I./ -I./src -I${XILINX_HLS}/include/ -I${XILINX_VITIS}/aietools/include -o sw/aie_control_xrt.o Work/ps/c_rts/aie_control_xrt.cpp

Many of the options in the preceding commands are standard and are described in the g++ documentation. The more important options are listed as follows.

  • -std=c++14: Compiles against the C++14 language standard.
  • -I<platform_path>/sysroots/aarch64-xilinx-linux/usr/include/xrt: Adds the XRT headers from the platform sysroot to the include path.
  • --sysroot=<platform_path>/sysroots/aarch64-xilinx-linux/: Points the cross-compiler at the target root file system so that the target headers and libraries are used.
  • -I./ -I./src: Adds the project source directories to the include path.
  • -I${XILINX_HLS}/include/: Adds the Vitis HLS include files to the include path.
  • -I${XILINX_VITIS}/aietools/include: Adds the AI Engine (ADF API) include files to the include path.
  • -o sw/host.o sw/host.cpp: Specifies the source file to compile and the object file to produce.

The cross compiler aarch64-xilinx-linux-g++ is used to compile the Linux host code. The aie_control_xrt.cpp file is copied from the Work/ps/c_rts directory.

aarch64-xilinx-linux-g++ -ladf_api_xrt -lgcc -lc -lpthread -lrt -ldl \
-lcrypt -lstdc++ -lxrt_coreutil \
-L<platform_path>/sysroots/aarch64-xilinx-linux/usr/lib \
--sysroot=<platform_path>/sysroots/aarch64-xilinx-linux \
-L${XILINX_VITIS}/aietools/lib/aarch64.o -o sw/host.exe sw/host.o sw/aie_control_xrt.o

Note that the preceding link command links the adf_api_xrt library, which is necessary for the ADF API to work with the XRT API.

The xrt_coreutil library is required for XRT and the XRT API.

While many of the options can be found in a description of the g++ command, some of the more important options are listed in the following table.

Table 8. Command Options
Option Description
-ladf_api_xrt Required for the ADF API. For more information, see Host Programming on Linux.

This is used to control the AI Engine through XRT. If you are not controlling the AI Engine with XRT, use -ladf_api with the library path -L${XILINX_VITIS}/aietools/lib/aarch64none.o. For more information, see Host Programming for Bare-metal Systems.

-lxrt_coreutil Required for the XRT API.
-L<platform_path>/sysroots/aarch64-xilinx-linux/usr/lib Adds the platform sysroot library directory to the library search path.
--sysroot=<platform_path>/sysroots/aarch64-xilinx-linux Points the cross-linker at the target root file system.
-L${XILINX_VITIS}/aietools/lib/aarch64.o Adds the AI Engine (ADF API) library directory for Linux targets to the library search path.
-o sw/host.exe Specifies the name of the output executable.

Packaging

The AI Engine compiler generates output in the form of a library file, libadf.a, which contains ELF and CDO files, as well as tool-specific data and metadata, for the hardware and hardware emulation flows. To create a loadable image binary, this data must be combined with PL-based configuration data, boot loaders, and other binaries. The Vitis™ packager performs this function, combining information from libadf.a and the XSA file generated by the Vitis linker.

This requires the use of the Vitis packaging command (v++ --package) as described in Vitis Compiler Command in the Application Acceleration Development flow of the Vitis Unified Software Platform Documentation (UG1416).

For Versal ACAPs, the programmable device image (PDI) file is used to boot and program the hardware device. For hardware emulation the --package command adds the PDI and EMULATION_DATA sections to the XCLBIN file, and outputs a new XCLBIN file. For hardware builds, the package process creates an XCLBIN file containing ELF files and graph configuration data objects (CDOs) for the AI Engine application.

In the Vitis IDE, the package process is automated and the tool creates the required files based on the build target, platform, and OS. However, in the command line flow, you must specify the Vitis packaging command (v++ --package) with the correct options for the job.

Packaging the System

For both hardware and hardware emulation, the v++ --package command takes the XCLBIN file and libadf.a as input and writes the required support files, including the launch_hw_emu.sh script used to launch hardware emulation when targeting the hw_emu build. An example command line follows:

v++ --package --config package.cfg ./aie_graph/libadf.a \
./project.xclbin -o aie_graph.xclbin

where the --config package.cfg option specifies a configuration file with the following options:

platform=xilinx_vck190_base_202110_1
target=hw_emu
save-temps=1

[package]
boot_mode=sd
out_dir=./emulation
enable_aie_debug=1
rootfs=<path_to_platform>/sw/versal/xilinx-versal-common-v2021.1/rootfs.ext4
image_format=ext4
kernel_image=<path_to_platform>/sw/versal/xilinx-versal-common-v2021.1/Image
sd_file=host.exe

The following table explains the options for both hardware and hardware emulation.

Table 9. Hardware and Hardware Emulation Options
Command-line Flag Hardware Hardware Emulation Details
platform Target platform Target platform Either a base platform, or a custom platform that meets AI Engine flow requirements.
target hw hw_emu Specifies the build target. Specifying hw_emu as the target causes a number of files to be generated, including the PDI to boot the device, as well as the files required for emulation. Specifying hw generates only the PDI file required to configure and boot the hardware.
save-temps Causes the Vitis compiler to save intermediate files created during the build and package process.
Package Options
boot_mode1 sd sd Indicates the device boots from an SD card or from a QSPI image in flash memory. Values can be: sd or qspi.
out_dir <path> <path> Specifies a directory where output files should be created. If out_dir is not specified, the files are written to the current working directory.
kernel_image <path>/Image <path>/Image Specifies the path to the Linux kernel image file. The file should be the same for both targets.
rootfs <path>/rootfs.cpio <path>/rootfs.cpio Specifies the path to the root file system (rootfs) file. The file should be the same for both targets.
enable_aie_debug Generates debug features for the AI Engine kernels. This can be used in both hardware and emulation builds.
defer_aie_run Defers starting the AI Engine graph so that it is enabled by the PS application. When unset, the CDO commands to enable the AI Engine cores during PDI load are generated instead. Only valid if libadf.a is an input file and the platform is a Versal platform.
ps_elf <file>,core <file>,core Used only for bare-metal designs. Automatically programs the PS core to run. Example: host.elf, a72-0
domain aiengine aiengine Specifies the domain to be run. For AI Engine designs, this should always be aiengine.
sd_file <file> <file> Copies the ELF for the main application that will run on the Cortex-A72 processor for bare metal, and any files needed to run on Linux. The XCLBIN file is automatically copied to the out-dir or sd_card folder. To have more files copied to the sd_card folder, you must specify this option multiple times.
  1. The xilinx_vck190_base_202110_1 platform does not support the qspi option. Custom platforms that are configured to support it will work.
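
The [package] options in the configuration file can equivalently be passed on the v++ command line using the --package. prefix, as shown in the bare-metal example later in this chapter. The following sketch corresponds to the hardware emulation configuration shown above; the paths are placeholders.

v++ --package -t hw_emu --platform xilinx_vck190_base_202110_1 --save-temps \
    --package.boot_mode sd --package.out_dir ./emulation \
    --package.rootfs <path_to_platform>/sw/versal/xilinx-versal-common-v2021.1/rootfs.ext4 \
    --package.image_format ext4 \
    --package.kernel_image <path_to_platform>/sw/versal/xilinx-versal-common-v2021.1/Image \
    --package.sd_file host.exe \
    ./aie_graph/libadf.a ./project.xclbin -o aie_graph.xclbin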

The following table shows the output, written to the directory specified by out_dir, produced when building for both hardware and hardware emulation.

Table 10. Table of Outputs
Build Output
Hardware
|-- BOOT.BIN
|-- boot_image.bif
|-- sd_card
|   |-- BOOT.BIN
|   |-- boot.scr
|   |-- aie_graph.xclbin
|   |-- host.exe
|   |-- Image
|   |-- init.sh
|   `-- platform_desc.txt
|-- sd_card.img
Hardware Emulation
|-- BOOT_bh.bin	//Boot header
|-- BOOT.BIN			 //Boot File
|-- boot_image.bif
|-- launch_hw_emu.sh	   //Hardware emulation launch script
|-- libadf                  //AIE emulation data folder
|   `-- cfg
|       |-- aie.control.config.json
|       |-- aie.partial.aiecompile_summary
|       |-- aie.shim.solution.aiesol
|       |-- aie.sim.config.txt
|       `-- aie.xpe
|-- plm.bin                 //PLM boot file
|-- pmc_args.txt            //PMC command argument specification file
|-- pmc_cdo.bin             //PMC boot file
|-- qemu_args.txt           //QEMU command argument specification file
|-- sd_card
|   |-- BOOT.BIN
|   |-- boot.scr
|   |-- aie_graph.xclbin
|   |-- host.exe
|   |-- Image
|   |-- init.sh
|   `-- platform_desc.txt
|-- sd_card.img
`-- sim                      //Vivado simulation folder

For hardware emulation, the key output file is the launch_hw_emu.sh script used to launch emulation. The sd_card.img image includes BOOT.BIN (U-Boot to boot Linux, PDI boot data, and so on), the kernel image (Image), the XCLBIN file, the user application (host.exe), and other files. In this example, all generated files are placed in a folder called emulation, as specified by the out_dir option.

To use the sd_card.img file on a Linux host, use the dd command to write the image to the SD card. If you are targeting Linux but with package.image_format=fat32, copy the sd_card folder to an SD card formatted for FAT32. This is not needed for hardware emulation.
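
For example, on a Linux host where the SD card appears as /dev/sdX (a placeholder; confirm the device name carefully because dd overwrites its target):

sudo dd if=sd_card.img of=/dev/sdX bs=4M conv=fsync status=progress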

TIP: The PS host application is included in the sd_card output; however, it is not incorporated into the rootfs. If you want to include the executable images in the rootfs, you must rebuild the rootfs before running the v++ --package command.

If the design needs to be programmed to a local flash memory, make sure --package.boot_mode qspi is used. This allows the use of the program_flash command, or of the Vitis IDE, to program the device or the flash memory, as described in Using the Vitis IDE.

Building a Bare-metal System

Building a bare-metal system requires a few steps in addition to the standard application flow described previously. The specific steps are described here.
  1. Build the bare-metal platform.

    Building bare-metal applications requires a bare-metal domain in the platform. The base platform xilinx_vck190_base_202110_1 does not have a bare-metal domain, which means you must create a platform with one. Starting from the v++ linking process described in Linking the System, you must create a custom platform because the PS application needs drivers for the PL kernels in the design.

    Use the XSA generated during the link process to create a new platform using the following command:

    generate-platform.sh -name vck190_baremetal -hw <filename>.xsa \
        -domain psv_cortexa72_0:standalone

    where:

    • -name vck190_baremetal: Specifies a name for the platform to be created. In this example, the platform is written to: ./vck190_baremetal/export/vck190_baremetal
    • -hw <filename>.xsa: Specifies the name of the input XSA file generated during the v++ --link command. The <filename> will be the same as the file name specified for the .xclbin output.
    • -domain psv_cortexa72_0:standalone: Specifies the processor domain and operating system to apply to the new platform.

    You can add the new platform to your platform repository by adding the file location to your $PLATFORM_REPO_PATHS environment variable. This makes it accessible to the Vitis IDE for instance, or allows you to specify the platform in command-lines by simply referring to the name rather than the whole path.

    IMPORTANT: The generated platform is used only for building the bare-metal PS application and is not used anywhere else in the flow.
  2. Compile and link the PS application.

    To build the PS application for the bare-metal flow, use the platform generated in the prior step. You need the PS application (main.cpp), and the bare-metal AI Engine control file (aie_control.cpp), which is created by the aiecompiler command and can be found in the ./Work/ps/c_rts folder.

    Compile the main.cpp file using the following command:

    aarch64-none-elf-gcc -I.. -I. -I../src \
    -I./vck190_baremetal/export/vck190_baremetal/sw/vck190_baremetal/standalone_domain/bspinclude/include \
    -g -c -std=c++11 -o main.o main.cpp
    Note: You must include the BSP include files for the generated platform, located at: ./vck190_baremetal/export/vck190_baremetal/sw/vck190_baremetal/standalone_domain/bspinclude/include

    Compile the aie_control.cpp file using the following command:

    aarch64-none-elf-gcc -I.. -I. -I../src \
    -I./vck190_baremetal/export/vck190_baremetal/sw/vck190_baremetal/standalone_domain/bspinclude/include \
    -g -c -std=c++11 -o aie_control.o ../Work/ps/c_rts/aie_control.cpp

    Link the PS application using the two compiled object files:

    aarch64-none-elf-gcc main.o aie_control.o -g -mcpu=cortex-a72 -Wl,-T -Wl,./lscript.ld \
    -L./vck190_baremetal/export/vck190_baremetal/sw/vck190_baremetal/standalone_domain/bsplib/lib \
    -ladf_api -Wl,--start-group,-lxil,-lgcc,-lc,-lstdc++,--end-group -o main.elf
    Note: You also need the BSP library libxil.a, located at ./vck190_baremetal/export/vck190_baremetal/sw/vck190_baremetal/standalone_domain/bsplib/lib, during linking. The assumption here is that the AI Engine cores are enabled during the platform management controller (PMC) boot.
  3. Package the System

    Finally, you must run the package process to generate the final bootable image (PDI) for running the design on the bare-metal platform. This command produces the SD card content for booting the device and running the application. Refer to Packaging for more information. This requires the use of the v++ --package command as shown below:

    v++ -p -t hw \
        -f xilinx_vck190_base_202110_1 \
        libadf.a project.xclbin \
        --package.out_dir ./sd_card \
        --package.domain aiengine \
        --package.defer_aie_run \
        --package.boot_mode sd \
        --package.ps_elf main.elf,a72-0 \
        -o aie_graph.xclbin
    TIP: For bare-metal ELF files running on PS cores, you should also add the package.ps_elf option to the --package command.

    The use of --package.defer_aie_run is related to the way the AI Engine graph is run. If the graph is loaded and launched at boot time, this option is not required. If your host application launches and controls the graph, you need to use this option when compiling and packaging your system, as described in Deploying the System.

    The ./sd_card folder, specified by the --package.out_dir option, contains the following files produced for the hardware build:

    |-- BOOT.BIN	//BOOT.BIN file containing PDI and the application ELF
    |-- boot_image.bif	  //bootgen input file used to create BOOT.BIN
    `-- sd_card              //SD card folder
        |-- aie_graph.xclbin     //xclbin output file (not used)
        `-- BOOT.BIN         //BOOT.BIN file containing PDI and the application ELF

    Copy the contents of the sd_card folder to an SD card to create a boot device for your system.

Now that you have built the bare-metal system, you can run or debug it.

Running the System

Running the system depends on the build target. The process of running the hardware emulation build is different from running the hardware build.

For the hardware build, copy the contents of the sd_card folder produced by the package process to an actual SD card. That device becomes the boot device for your system. Boot your system and launch your application as designed. To capture event trace data when running the hardware, see Performance Analysis of AI Engine Graph Application. To debug the running hardware, see Debugging the AI Engine Application.

Running Hardware Emulation

To build the project for hardware emulation, confirm that the build target of the v++ link command is hw_emu (-t hw_emu on the command line, or target=hw_emu in the configuration file). The v++ --package command then generates the launch_hw_emu.sh script as part of packaging the system. This script launches the emulation environment for the AI Engine application for test and debug purposes. Hardware emulation runs the AI Engine simulator for the graph application, the Vivado logic simulator for the PL kernels, and QEMU for the PS host application.

Use the following command to launch hardware emulation from the command line.

./launch_hw_emu.sh --graphic-xsim
Note: The --graphic-xsim switch is optional and launches the Vivado logic simulator window where you can specify which signals from the design you want to view. It does not include internal AI Engine signals. When this switch is used, you must click the Run All button in the window to continue execution.

The launch_hw_emu.sh script launches QEMU in system mode, and loads and runs the AI Engine application, running the PL kernels in the Vivado simulator. If the emulation flow completes successfully, at the end of the emulation you should see something like the following:

[LAUNCH_EMULATOR] INFO: 09:44:09 : PS-QEMU exited
[LAUNCH_EMULATOR] INFO: 09:44:09 : PMU/PMC-QEMU exited
[LAUNCH_EMULATOR] INFO: 09:44:09 : Simulation exited
pmu_path /scratch/aie_test1/hw_emu_pmu.log
pl-sim_dir /scratch/aie_test1/sim/behav_waveform/xsim
Please refer PS /simulate logs at /scratch/aie_test1 for more details.
DONE!
INFO: Emulation ran successfully

When launching hardware emulation, you can specify options for the AI Engine simulator that runs the graph application. The options can be passed to the launch_hw_emu.sh script using the -aie-sim-options switch, as described in Simulator Options for Hardware Emulation.
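
For example, assuming the simulator options are captured in a file named aiesim_options.txt (a hypothetical file name for this illustration), the script can be invoked as follows:

./launch_hw_emu.sh -aie-sim-options aiesim_options.txt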

When the emulation is fully booted and the Linux prompt is up, make sure to set the following environment variable.

export XILINX_XRT=/usr

This ensures that the host application works. Note that this also must be done when running on hardware.

Generating Traffic for HW Emulation

This section describes how to provide input to and capture output from the AI Engine array in hardware emulation using AXI traffic generators. In the AI Engine simulator, the input data stimulus is provided using the simulation platform construct:
PLIO("DataIn", adf::plio_32_bits, "data/input.txt")
For hardware emulation, an equivalent feature exists that emulates the behavior of this PLIO and AXI4-Stream interface. Both Python and C++ APIs are provided to make this easier to use.

The primary external data interfaces for the AI Engine array are AXI4-Stream interfaces. These are known as PLIOs and allow the AI Engine to receive data, operate on the data, and send data back on a separate AXI4-Stream interface. The input interface to the AI Engine is an AXI4-Stream slave and the output is an AXI4-Stream master. To interact with these top-level interfaces during hardware emulation, complementary AXI4-Stream modules are provided. These complementary modules are referred to as AXI traffic generators.

Note: The width of a PLIO interface is an important system-level design decision. The wider the interface, the more data can be sent per PL clock cycle.
AXI Traffic Generators

The AXI traffic generators are provided as XO files that must be linked into your simulation platform using the Vitis compiler (v++). These XO files are called sim_ipc_axis_master_XY.xo and sim_ipc_axis_slave_ZW.xo, where XY and ZW correspond to the number of bits in the PLIO interface. For example, sim_ipc_axis_master_128.xo provides an AXI4-Stream master data bus that is 128 bits wide. A wider interface allows the PL to achieve the same throughput at a lower clock frequency and allows the AI Engine array to maximize its memory bandwidth. However, the PLIO interface tiles are each 64 bits wide and they are a limited resource. Using one 64-bit PLIO interface at twice the clock speed provides bandwidth equivalent to a 128-bit PLIO while using only one PLIO tile, but requires the PL to run at twice the clock speed; the optimal choice varies from application to application.

Two steps are required to use the traffic generators with the Vitis compiler. First, make the connections between the sim_ipc modules and their corresponding AXI4-Stream ports on the AI Engine array. This is typically done in the system.cfg file. Here is an example:

[connectivity]
nk=sim_ipc_axis_master:1:inst_sim_ipc_axis_master
nk=sim_ipc_axis_slave:1:inst_sim_ipc_axis_slave
stream_connect=sim_ipc_axis_master.M00_AXIS:ai_engine_0.DataIn
stream_connect=ai_engine_0.DataOut:sim_ipc_axis_slave.S00_AXIS
The syntax for connecting the sim_ipc_axis XO files is as follows:

nk=sim_ipc_axis_master:<number of masters>:<your_instance_name_1>
nk=sim_ipc_axis_slave:<number of slaves>:<your_instance_name_2>

The sim_ipc_axis_master/slave field specifies the type of XO file, and the instance name should be meaningful to your application.

Next, add the XO files to the Vitis link command. Note that the sim_ipc XO files can only be used with the hw_emu build target.

v++ -l --platform <platform.xpfm> sim_ipc_axis_master_128.xo sim_ipc_axis_slave_128.xo libadf.a -t hw_emu --config system.cfg

For additional information on how to use XO files with the Vitis compiler see https://github.com/Xilinx/Vitis-Tutorials/tree/master/AI_Engine_Development/Feature_Tutorials/05-AI-engine-versal-integration.

Note: To use multiple AXI4-Stream masters at the same time, change the number of masters in the nk statement in the system.cfg file from 1 to as many as needed (up to 8).
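
For example, a hypothetical design with two graph input ports could instantiate two masters and connect them as follows. The port names DataIn1 and DataIn2 are assumptions for this sketch, and the CU names in the nk statement are period-separated as described in Table 7.

[connectivity]
nk=sim_ipc_axis_master:2:inst_tg_a.inst_tg_b
nk=sim_ipc_axis_slave:1:inst_sim_ipc_axis_slave
stream_connect=inst_tg_a.M00_AXIS:ai_engine_0.DataIn1
stream_connect=inst_tg_b.M00_AXIS:ai_engine_0.DataIn2
stream_connect=ai_engine_0.DataOut:inst_sim_ipc_axis_slave.S00_AXIS
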
Formatting Data with Traffic Generators in Python

To emulate AXI4-Stream transactions, the AXI traffic generators require the payload data to be broken into appropriately sized bursts. For example, sending 128 bytes with a PLIO width of 32 bits (4 bytes) requires 128 bytes / 4 bytes = 32 AXI4-Stream transactions. Converting between byte arrays and AXI transactions can be handled in Python.

The Python struct library provides a mechanism to convert between Python and C data types. Specifically, the struct.pack and struct.unpack functions pack and unpack byte arrays according to a format string argument. The following table shows format strings for common C data types and PLIO widths.

For more information see: https://docs.python.org/3/library/struct.html

Data Type PLIO Width Python Code Snippet
cfloat PLIO32 N/A
cfloat PLIO64, PLIO128 rVec = np.real(data)

iVec = np.imag(data)

out2column = np.zeros((L,2)).astype(np.single)

out2column.tobytes()

formatString = "<"+str(len(byte_arry)//4)+"f"

cint16 PLIO32, PLIO64, PLIO128 rVec = np.real(data).astype(np.int16)

iVec = np.imag(data).astype(np.int16)

formatString = "<"+str(len(byte_arry)//2)+"h"

int8 PLIO32, PLIO64, PLIO128 intvec = np.real(data).astype(np.int8)

formatString = "<"+str(len(byte_arry)//1)+"b"

int32 PLIO32, PLIO64, PLIO128 intvec = np.real(data).astype(np.int32)

formatString = "<"+str(len(byte_arry)//4)+"i"
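
As a minimal sketch of this conversion, the following assumes a NumPy array of complex samples to be sent as cint16 over a 32-bit PLIO; the variable names (data, byte_arry) are placeholders.

import struct
import numpy as np

data = np.array([1+2j, 3-4j, 5+6j, 7-8j])          # example complex samples
rVec = np.real(data).astype(np.int16)
iVec = np.imag(data).astype(np.int16)
samples = np.empty(2*len(data), dtype=np.int16)    # interleave real and imaginary parts
samples[0::2] = rVec
samples[1::2] = iVec
byte_arry = samples.tobytes()
formatString = "<" + str(len(byte_arry)//2) + "h"  # "<" little-endian, "h" 16-bit signed integer
words = struct.unpack(formatString, byte_arry)     # one value per 16-bit field of the stream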

The remaining aspects of Python libraries, interacting with the sim_ipc Python object and providing and receiving data are beyond the scope of this document.

A significant benefit of this feature is that it enables you to integrate your AI Engine design with a larger system while also minimizing the amount of PS code required. This is useful during development where not all domains of the system are ready to integrate.

Because the data source and sink are kept completely within the simulated PL domain, the host only needs to provide setup and control functionality. For example, the main() function in a minimal host.cpp might look like the following.


int main(int argc, char ** argv)
{
    //////////////////////////////////////////
    // Open xclbin
    //////////////////////////////////////////
    auto dhdl = xrtDeviceOpen(0); // Open the local device
    if(dhdl == nullptr)
        throw std::runtime_error("No valid device handle found. Make sure to use the right device index.");
    auto xclbin = load_xclbin(dhdl, "a.xclbin"); // load_xclbin is a user helper that reads the xclbin into memory
    auto top = reinterpret_cast<const axlf*>(xclbin.data());
    adf::registerXRT(dhdl, top->m_header.uuid);

    //////////////////////////////////////////
    // graph execution for AIE
    ////////////////////////////////////////// 

    printf("graph init.\n");
    mygraph_top.init();

    printf("graph run\n");
    mygraph_top.run(1);

    mygraph_top.end();
    printf("graph end\n");
    xrtDeviceClose(dhdl);

    return 0;
}

Deploying the System

The Vitis design execution model has multiple considerations that impact how the AI Engine graph is loaded onto the board, run, reset, and reloaded. Depending on the needs of the application, you can choose to load the AI Engine graph at board boot time or from the PS host application. In addition, you can run the graph as soon as it is loaded or defer running it to a later time. You also have the option of running the graph infinitely or for a fixed number of iterations or cycles.

AI Engine Graph Load and Run

The AI Engine graph can be loaded and run immediately at boot, or it can be loaded by the host PS application. Additionally, you can defer running the graph until after it has been loaded, and then start it with the graph.run() host API call. By default, the Xilinx® platform management controller (PMC) loads and runs the graph. However, the v++ --package.defer_aie_run option lets you defer the graph run so that it is started later by the graph.run() API call. The following table lists the deployment options.

Table 11. Deploying the AI Engine Graph
Deployment Option Description
Host Control Specify v++ --package.defer_aie_run to stop the AI Engine graph from starting at boot, then enable the graph from the PS program using graph.run().
Run Forever Enable the graph in the PDI at boot and let it run forever.

AI Engine Run Iterations

The AI Engine graph can run for a limited number of iterations or run infinitely. By default, the graph runs infinitely. You can use graph.run(run_iterations) or graph.end(cycles) to limit the graph run to a specific number of iterations or a specific number of cycles. See Run-Time Graph Control API.
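
For example, assuming a graph instance named mygraph_top, as in the earlier host application sketch, the graph can be limited to a fixed number of iterations:

mygraph_top.init();   // configure and initialize the graph
mygraph_top.run(16);  // run the graph for 16 iterations instead of running forever
mygraph_top.end();    // wait for the iterations to complete and disable the AI Engine cores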