Integrating the Application Using the Vitis Tools Flow
While developing an AI Engine design graph, many design iterations are typically performed using the AI Engine compiler or AI Engine simulator tools. This method provides quick design iterations when focused on developing the AI Engine application. When ready, the AI Engine design can be integrated into a larger system design using the flow described in this chapter.
The Vitis™ tools flow simplifies hardware design and integration with a software-like compilation and linking flow, integrating the three domains of the Versal™ device: the AI Engine array, the programmable logic (PL) region, and the processing system (PS). The Vitis compiler flow lets you integrate your compiled AI Engine design graph (libadf.a) with additional kernels implemented in the PL region of the device, including HLS and RTL kernels, and link them for use on a target platform. You can call these compiled hardware functions from a host program running in the Arm® processor in the Versal device.
The following steps can be adapted to any AI Engine design in a Versal device.
- As described in Compiling an AI Engine Graph Application, the first step is to create and compile the AI Engine graph into a libadf.a file using the AI Engine compiler. You can iterate between the AI Engine compiler and the AI Engine simulator to develop the graph until you are ready to proceed.
- Compiling PL Kernels: PL kernels are compiled for implementation in the PL region of the target platform using the v++ --compile command. These kernels can be C/C++ kernels or RTL kernels, in compiled Xilinx object (XO) form.
- Linking the System: Link the compiled AI Engine graph with the C/C++ kernels and RTL kernels onto a target platform. The process creates an XCLBIN file to load and run the AI Engine graph and PL kernel code on the target platform.
- Compile the Embedded Application for the Cortex-A72 Processor: Optionally compile a host application to run on the Cortex®-A72 core processor using the GNU Arm cross-compiler to create an ELF file. The host program interacts with the AI Engine kernels and kernels in the PL region. This compilation step is optional because there are several ways to deploy and interact with the AI Engine kernels, and the host program running in the PS is one way.
- Packaging the System: Use the v++ --package process to gather the required files to configure and boot the system, and to load and run the application, including the AI Engine graph and PL kernels. This builds the necessary package to run emulation and debug, or to run your application on hardware. A condensed command sequence illustrating these steps is sketched after this list.
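As a hedged summary of the flow above (the file names, kernel names, and platform below are examples carried through this chapter, not requirements), the command sequence looks roughly like this:

# 1. Compile the AI Engine graph (produces libadf.a).
aiecompiler --target=hw graph.cpp
# 2. Compile each PL kernel to a Xilinx object (.xo) file.
v++ --compile -t hw_emu --platform xilinx_vck190_base_202110_1 -k mm2s mm2s.cpp -o mm2s.xo
# 3. Link the graph and PL kernels against the target platform (produces the XCLBIN).
v++ --link -t hw_emu --platform xilinx_vck190_base_202110_1 mm2s.xo libadf.a --config system.cfg -o project.xclbin
# 4. Optionally cross-compile the PS host application.
aarch64-xilinx-linux-g++ ... -o host.exe
# 5. Package the system for emulation or hardware.
v++ --package --config package.cfg libadf.a project.xclbin -o aie_graph.xclbin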
Platforms
A platform is a fully contained image that defines both the hardware (XSA) and the software (bare metal, Linux, or both). The XSA contains the hardware description of the platform, which is defined in the Vivado Design Suite, and the software is defined with a bare-metal setup, a Linux image created through PetaLinux, or both.
Types of Platforms
There are two types of platforms: base platforms and custom platforms. A base platform is one provided by Xilinx (for example, xilinx_vck190_base_202110_1), typically targeting Xilinx boards. A custom platform is one that you create, either by extending or re-customizing a base platform or by creating a new platform. When starting platform development, it can be useful to use a base platform as a reference development platform for creating your custom platform.
Custom Platforms
You can create platforms that re-customize an existing base platform (for example, changing the AI Engine clock frequency, the clocks available in the programmable logic (PL), or the memory controller settings), or create a new platform targeting Xilinx or non-Xilinx boards. Creating a platform allows you to provide your own IP or subsystems to meet your needs. The process to create a platform is described in Creating Embedded Platforms in Vitis in the Application Acceleration Development flow of the Vitis Unified Software Platform Documentation (UG1416).
Platform Clocking
Platforms include several clock domains: processor, PL, NoC, and AI Engine. The following table describes the clocking for each.
Clock | Description |
---|---|
AI Engine | Can be configured in the platform in the AI Engine IP. |
Processor | Can be configured in the platform in the CIPS IP. |
Programmable Logic (PL) | Can have multiple clocks and can be configured in the platform. |
NoC | Device dependent and can be configured in the platform in the CIPS and NoC IP. |
For more information related to platform clocking, see Vitis Unified Software Platform Documentation: Application Acceleration Development (UG1393). For information on Versal device clocks, see Versal AI Core Series Data Sheet: DC and AC Switching Characteristics (DS957).
PL Kernels
PL kernels can take the form of HLS kernels, written in C/C++, or RTL kernels packaged in the Vivado Design Suite. These kernels must be separately compiled to produce the Xilinx object files (XO) used in integrating the system design on the target platform.
HLS kernels, written in C/C++, can be written and compiled from within the Vitis HLS tool directly, or as part of the Vitis application acceleration development flow.
For information on creating and building RTL kernels, see RTL Kernels in the Application Acceleration Development flow of the Vitis Unified Software Platform Documentation (UG1416).
Compiling PL Kernels
To compile kernels using the Vitis compiler command as described in the Compiling Kernels with Vitis Compiler in the Application Acceleration Development flow of the Vitis Unified Software Platform Documentation (UG1416), use the following command syntax:
v++ --compile -t hw_emu --platform xilinx_vck190_base_202110_1 -g \
-k <kernel_name> <kernel>.cpp -o <kernel_name>.xo --save-temps
The v++ command uses the options described in the following table.
Option | Description |
---|---|
--compile | Specifies compilation mode. |
-t hw_emu | Specifies the build target for the compilation process. For more information, see the Build Targets section in Vitis Unified Software Platform Documentation: Application Acceleration Development (UG1393). |
--platform | Specifies the path and name of the target platform. This example command line assumes that PLATFORM_REPO_PATHS is set to the correct platform path. |
-g | Enables the debug features. This is required for emulation modes. |
-k | Specifies the kernel name. This must match the function name in the specified kernel source file. |
-o | Specifies the output file name of the compiled Xilinx object file (.xo). |
--save-temps | Saves the temporary files generated during the compilation process. This is optional. |
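Because the -k option must name the top-level function, the PL kernel source passed to v++ --compile exposes a function of that name. The following is a minimal sketch only; the passthrough name, 32-bit stream width, size argument, and pragmas are illustrative assumptions, not taken from this document.

// Hypothetical HLS PL kernel sketch: forwards size 32-bit words from one
// AXI4-Stream to another. The function name must match -k <kernel_name>.
#include <ap_axi_sdata.h>
#include <hls_stream.h>

extern "C" void passthrough(hls::stream<ap_axiu<32, 0, 0, 0>>& in,
                            hls::stream<ap_axiu<32, 0, 0, 0>>& out,
                            int size) {
#pragma HLS INTERFACE axis port=in
#pragma HLS INTERFACE axis port=out
#pragma HLS INTERFACE s_axilite port=size
#pragma HLS INTERFACE s_axilite port=return
    for (int i = 0; i < size; i++) {
#pragma HLS PIPELINE II=1
        out.write(in.read());  // forward each stream word unchanged
    }
}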
Clocking the PL Kernels
PLIO represents an ADF graph interface to the PL. The PL endpoint could be a PL kernel, a platform IP representing a signal source or sink, or a data mover that interfaces the ADF graph to memory. You should provide clock frequency values for these interfaces to ensure simulation results match the results from running the design in hardware. In addition, when you link the ADF graph into the platform at the Vitis linker (v++ --link) step, you can provide more accurate clock values depending on the specific clock frequencies supported by the platform. To set the exact frequency of a PLIO interface in the graph and the clock frequency of the corresponding PL kernel, you must specify the clock frequency in three locations:
- ADF graph (optional)
- Vitis compilation of a PL kernel (v++ -c)
- Vitis linking (v++ -l)
You must specify the clocking depending on where the kernels are located. The following table describes the default clocks based on the kernel location.
Kernel Location | Description |
---|---|
AI Engine kernels | Clocked per the AI Engine clock frequency. All cores run at the same clock frequency. |
PL kernels connected to the AI Engine graph | HLS: The default frequency for all HLS kernels is 150 MHz. RTL: The frequency is set to the frequency with which the XO file was compiled. AI Engine: Set in the PLIO constructor in the AI Engine graph; setting the frequency here is optional. See Adaptive Data Flow Graph Specification Reference for more information. |
PL kernels added to the platform using the Vitis linker | Platforms have a default clock. If no clocking option is set on the command line or in the configuration file, the default clock is used. This default can be overridden depending on the design and required clock value, as shown in the following table. |
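As a hedged illustration of the optional graph-side setting, the PLIO constructor accepts a PL interface frequency in MHz as its final argument. The port names, data files, and 250 MHz value below are placeholder examples only.

// In the ADF graph description: declare PLIOs with an explicit PL clock frequency (MHz).
// "DataIn1", "DataOut1", the data files, and 250 MHz are example values only.
adf::PLIO *in0  = new adf::PLIO("DataIn1",  adf::plio_32_bits, "data/input.txt",  250);
adf::PLIO *out0 = new adf::PLIO("DataOut1", adf::plio_32_bits, "data/output.txt", 250);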
Setting the clocks at the Vitis linker step allows you to choose a frequency based on the platform. The following table describes the Vitis compiler clocking options during the link step.
[clock] Options | Description |
---|---|
--clock.defaultFreqHz arg | Specifies a default clock frequency to use, in Hz. |
--clock.defaultId arg | Specifies a default clock reference ID to use. |
--clock.defaultTolerance arg | Specifies a default clock tolerance to use. |
--clock.freqHz arg | <frequency_in_Hz>:<cu_0>[.<clk_pin_0>][,<cu_n>[.<clk_pin_n>]] Specifies a clock frequency in Hz and a list of associated compute unit names and, optionally, their clock pins. |
--clock.id arg | <reference_ID>:<cu_0>[.<clk_pin_0>][,<cu_n>[.<clk_pin_n>]] Specifies a clock reference ID and a list of associated compute unit names and, optionally, their clock pins. |
--clock.tolerance arg | <tolerance>:<cu_0>[.<clk_pin_0>][,<cu_n>[.<clk_pin_n>]] Specifies a clock tolerance and a list of associated compute unit names and, optionally, their clock pins. |
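For example, a hedged sketch of using the linker clock options (the kernel instance names s2mm_1 and mm2s_1 and the 300 MHz value are placeholders, and the configuration-file form assumes the usual mapping of --clock.<key> to a [clock] section key, as documented in UG1393):

# Command-line form (placeholder instance names and frequency):
v++ --link ... --clock.freqHz 300000000:s2mm_1,mm2s_1

# Equivalent configuration-file form:
[clock]
freqHz=300000000:s2mm_1,mm2s_1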
The following table describes the steps to set clock frequencies for PLIOs that interface to the platform, including to PL kernels specified outside of the ADF graph.
PL Kernel Location | Clock Specification |
---|---|
PLIO interface specified in ADF graph | Specify the clock frequency per PLIO interface in the graph. This graph-side specification is optional. |
HLS kernels | Compile the HLS code using the Vitis compiler, and use the Vitis compiler clock option at the v++ -c step to change the frequency at which the HLS kernels are compiled. Then, per kernel, specify the clock in the Vitis linker. |
RTL kernels | Per kernel, specify the clock in the Vitis linker. |
See Vitis Unified Software Platform Documentation: Application Acceleration Development (UG1393) for more detailed information on how to compile kernels for specific platform clocks and clocking information.
Linking the System
After the AI Engine graph and the C/C++
kernels are compiled, and any RTL kernels are packaged, the Vitis
v++ --link
command links them with the target
platform to build the device binary (XCLBIN), used to program the hardware. For more
information, see Linking the Kernels in the
Application Acceleration Development flow of the Vitis Unified Software Platform Documentation
(UG1416).
The following is an example of the linking command for the Vitis compiler in the AI Engine design flow.
v++ --link -t hw_emu --platform xilinx_vck190_base_202110_1 -g \
<pl_kernel1>.xo <pl_kernel2>.xo ../libadf.a -o vck190_aie_graph.xclbin \
--config ../system.cfg --save-temps
The v++ command uses the options in the following table.
Option | Description |
---|---|
--link | Specifies the linking process. |
-t hw_emu | Specifies the build target of the link process. For the AI Engine kernel flow, the target can be either hw_emu for emulation and test, or hw to build the system hardware. IMPORTANT: The v++ compilation and linking commands must use both the same build target (-t) and the same target platform (--platform). |
--platform | Specifies the path to the target platform. |
-g | Specifies the addition of debugging logic required to enable debug (for hardware emulation) and to capture waveform data. |
<pl_kernel1>.xo <pl_kernel2>.xo | Specifies the input compiled PL kernel object files (.xo) to link with the AI Engine graph and the target platform. |
../libadf.a | Specifies the input compiled AI Engine graph application to link with the PL kernels and the target platform. |
-o | Specifies the device binary (XCLBIN) file that is the output of the linking process. |
--config | Specifies a configuration file that defines some of the compilation or linking options. |
--save-temps | Indicates that the temporary files created during the build process should be preserved for later examination or use. This includes output files created by Vitis HLS and the Vivado Design Suite. |
For the AI Engine kernel flow, the Vitis compiler requires two specific sections in the configuration file: [connectivity] and [advanced]. The following is an example configuration file.
[connectivity]
nk=mm2s:1:mm2s
nk=s2mm:1:s2mm
stream_connect=mm2s.s:ai_engine_0.DataIn1
stream_connect=ai_engine_0.DataOut1:s2mm.s
[advanced]
param=compiler.addOutputTypes=hw_export
The [connectivity] section of the configuration file supports the options described in the following table.
Option | Description |
---|---|
nk | Specifies the number of kernel instances, or compute units (CUs), that the v++ command adds to the device binary (XCLBIN). Multiple instances of a kernel are specified with this option; see the sketch after this table. In the example configuration above, one instance each of the mm2s and s2mm kernels is created. |
sc (stream_connect) | Defines AXI4-Stream connections between the ports of the AI Engine graph and the streaming ports of the PL kernels. Connections can be defined as the streaming output of one kernel connecting to the streaming input of a second kernel, or to a streaming input port on an IP implemented in the target platform. For more information, see --connectivity Options in the Application Acceleration Development flow of the Vitis Unified Software Platform Documentation (UG1416). In the example configuration above, the mm2s output stream connects to the AI Engine graph input DataIn1, and the graph output DataOut1 connects to the s2mm input stream. |
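As a hedged illustration of instantiating multiple compute units (the instance names and the DataIn2 port below are placeholders, not taken from this document's example), the [connectivity] section might look like:

[connectivity]
# Create two compute units of the mm2s kernel, named mm2s_1 and mm2s_2 (example names).
nk=mm2s:2:mm2s_1.mm2s_2
# Connect each instance to a different AI Engine graph input (example port names).
stream_connect=mm2s_1.s:ai_engine_0.DataIn1
stream_connect=mm2s_2.s:ai_engine_0.DataIn2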
To instruct the Vitis linker to generate an XSA archive of the generated hardware design, add the following parameter to a configuration file.
[advanced]
param=compiler.addOutputTypes=hw_export
Alternatively, for a custom platform, add the following Tcl command prior to invoking write_hw_platform:
set_property compiler.default_output_type=hw_export
This directs the Vitis linker to always generate an XSA for any design.
During the linking process, the Vitis compiler invokes the Vivado Design Suite to generate the device binary (XCLBIN) for the target platform. The XCLBIN file is used to program the device and includes the following information.
- PDI: Programming information for the AI Engine array
- Debug data: Debug information, when included in the build
- Memory topology: Defines the memory resources and structure for the target platform
- IP layout: Defines layout information for the implemented hardware design
- Metadata: Various elements of platform metadata that let the tool load and run the XCLBIN file on the target platform
For more information on the XRT use of the XCLBIN file, see XRT.
Compile the Embedded Application for the Cortex-A72 Processor
After linking the AI Engine graph and PL kernels, the focus moves to the embedded application running in the PS that interacts with the AI Engine graph and kernels. The PS application is written in C/C++, using API calls to control the initialization, running, and closing of the AI Engine graph as described in Run-Time Graph Control API.
You compile the embedded application by following the typical cross-compilation flow for the Arm Cortex-A72 processor. The following are example commands for compiling and linking the PS application:
aarch64-xilinx-linux-g++ -std=c++14 -O0 -g -Wall -c \
-I<platform_path>/sysroots/aarch64-xilinx-linux/usr/include/xrt \
--sysroot=<platform_path>/sysroots/aarch64-xilinx-linux/ \
-I./ -I./src -I${XILINX_HLS}/include/ -I${XILINX_VITIS}/aietools/include -o sw/host.o sw/host.cpp
aarch64-xilinx-linux-g++ -std=c++14 -O0 -g -Wall -c \
-I<platform_path>/sysroots/aarch64-xilinx-linux/usr/include/xrt \
--sysroot=<platform_path>/sysroots/aarch64-xilinx-linux/ \
-I./ -I./src -I${XILINX_HLS}/include/ -I${XILINX_VITIS}/aietools/include -o sw/aie_control_xrt.o Work/ps/c_rts/aie_control_xrt.cpp
Many of the options in the preceding commands are standard and can be found in a description of the g++ command. The more important options are listed as follows.
- -std=c++14
- -I<platform_path>/sysroots/aarch64-xilinx-linux/usr/include/xrt
- --sysroot=<platform_path>/sysroots/aarch64-xilinx-linux/
- -I./ -I./src
- -I${XILINX_HLS}/include/
- -I${XILINX_VITIS}/aietools/include
- -o sw/host.o sw/host.cpp
The cross compiler aarch64-xilinx-linux-g++ is used to compile the Linux host code. The aie_control_xrt.cpp file is copied from the Work/ps/c_rts directory.
aarch64-xilinx-linux-g++ -ladf_api_xrt -lgcc -lc -lpthread -lrt -ldl \
-lcrypt -lstdc++ -lxrt_coreutil \
-L<platform_path>/sysroots/aarch64-xilinx-linux/usr/lib \
--sysroot=<platform_path>/sysroots/aarch64-xilinx-linux \
-L${XILINX_VITIS}/aietools/lib/aarch64.o -o sw/host.exe sw/host.o sw/aie_control_xrt.o
Note that the preceding link command includes the adf_api_xrt library, which is necessary for the ADF API to work with the XRT API, and xrt_coreutil, which is required for XRT and the XRT API. While many of the options can be found in a description of the g++ command, some of the more important options are listed in the following table.
Option | Description |
---|---|
-ladf_api_xrt | Required for the ADF API. For more information, see Host Programming on Linux. This is used to control the AI Engine through XRT. If not controlling the graph with XRT, use -ladf_api instead. |
-lxrt_coreutil | Required for the XRT API. |
-L<platform_path>/sysroots/aarch64-xilinx-linux/usr/lib | Adds the XRT and system libraries from the platform sysroot to the linker search path. |
--sysroot=<platform_path>/sysroots/aarch64-xilinx-linux | Specifies the sysroot of the target platform used for cross-compilation. |
-L${XILINX_VITIS}/aietools/lib/aarch64.o | Adds the AI Engine tools library directory, which provides the ADF API library, to the linker search path. |
-o sw/host.exe | Specifies the name of the output executable for the PS host application. |
Packaging
The AI Engine compiler generates output in the form of a library file, libadf.a, which contains ELF and CDO files, as well as tool-specific data and metadata, for hardware and hardware emulation flows. To create a loadable image binary, this data must be combined with PL-based configuration data, boot loaders, and other binaries. The Vitis™ packager performs this function, combining information from libadf.a and the Vitis linker generated XSA file.
This requires the use of the Vitis packaging command (v++ --package) as described in Vitis Compiler Command in the Application Acceleration Development flow of the Vitis Unified Software Platform Documentation (UG1416).
For Versal ACAPs, the programmable device image (PDI) file is used to boot and program the hardware device. For hardware emulation, the --package command adds the PDI and EMULATION_DATA sections to the XCLBIN file and outputs a new XCLBIN file. For hardware builds, the package process creates an XCLBIN file containing ELF files and graph configuration data objects (CDOs) for the AI Engine application.
In the Vitis IDE, the package process is automated and the tool creates the required files based on the build target, platform, and OS. However, in the command line flow, you must specify the Vitis packaging command (v++ --package) with the correct options for the job.
Packaging the System
For both hardware and hardware emulation, the v++ --package command takes the XCLBIN file and libadf.a as input, produces a script to launch hardware emulation (launch_hw_emu.sh), and writes the required support files. An example command line follows:
v++ --package --config package.cfg ./aie_graph/libadf.a \
./project.xclbin -o aie_graph.xclbin
where the --config package.cfg option specifies a configuration file with the following options:
platform=xilinx_vck190_base_202110_1
target=hw_emu
save-temps=1
[package]
boot_mode=sd
out_dir=./emulation
enable_aie_debug=1
rootfs=<path_to_platform>/sw/versal/xilinx-versal-common-v2021.1/rootfs.ext4
image_format=ext4
kernel_image=<path_to_platform>/sw/versal/xilinx-versal-common-v2021.1/Image
sd_file=host.exe
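The same options can also be passed directly on the command line instead of through a configuration file. The following is a hedged sketch only, assuming each [package] key maps to a --package.<key> switch (as used in the bare-metal example later in this chapter); check UG1393 for the exact spellings:

v++ --package -t hw_emu --platform xilinx_vck190_base_202110_1 --save-temps \
    ./aie_graph/libadf.a ./project.xclbin -o aie_graph.xclbin \
    --package.boot_mode sd \
    --package.out_dir ./emulation \
    --package.rootfs <path_to_platform>/sw/versal/xilinx-versal-common-v2021.1/rootfs.ext4 \
    --package.image_format ext4 \
    --package.kernel_image <path_to_platform>/sw/versal/xilinx-versal-common-v2021.1/Image \
    --package.sd_file host.exe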
The following table explains the options for both hardware and hardware emulation.
Command-line Flag | Hardware | Hardware Emulation | Details |
---|---|---|---|
platform | Target platform | Target platform | Either a base platform, or a custom platform that meets AI Engine flow requirements. |
target | hw | hw_emu | Specifies the build target. Specifying hw_emu causes a number of files to be generated, including the PDI to boot the device, as well as files required for emulation. Specifying hw only generates the PDI file required to configure and boot the hardware. |
save-temps | | | Causes the Vitis compiler to save intermediate files created during the build and package process. |
Package Options | | | |
boot_mode | sd | sd | Indicates whether the device boots from an SD card or from a QSPI image in flash memory. Values can be sd or qspi. |
out_dir | <path> | <path> | Specifies a directory where output files should be created. If out_dir is not specified, the files are written to the current working directory. |
kernel_image | <path>/Image | <path>/Image | Specifies the image file that is specified as part of the linking command. The file should be the same for both targets. |
rootfs | <path>/rootfs.cpio | <path>/rootfs.cpio | Specifies the path to the root FS file that is required as part of the linking command. The file should be the same for both targets. |
enable_aie_debug | | | Generates debug features for the AI Engine kernels. This can be used in both hardware and emulation builds. |
defer_aie_run | | | When set, the AI Engine graph is enabled by the PS application. When unset, the CDO commands to enable the AI Engine graph during PDI load are generated instead. Only valid if libadf.a is an input file and the platform is a Versal platform. |
ps_elf | <file>,core | <file>,core | Used only for bare-metal designs. Automatically programs the PS core to run. Example: host.elf,a72-0 |
domain | aiengine | aiengine | Specifies the domain to be run. For AI Engine designs, this should always be aiengine. |
sd_file | <file> | <file> | Copies the ELF for the main application that runs on the Cortex-A72 processor for bare metal, and any files needed to run on Linux. The XCLBIN file is automatically copied to the out_dir or sd_card folder. To copy more files to the sd_card folder, specify this option multiple times. |
The output produced in the directory specified by out_dir differs between the hardware and hardware emulation builds.
For hardware emulation, the key output file is the launch_hw_emu.sh script used to launch emulation.
The sd_card.img image includes the BOOT.BIN (U-Boot to boot Linux, PDI boot data, etc.), Image (kernel image), the XCLBIN file, the user application (host.exe), and other files. In this example, all generated files are placed in a folder called emulation.
To use the sd_card.img file on a Linux host, use the dd command to write the image to the SD card. If you are targeting Linux but with package.image_format=fat32, copy the sd_card folder to an SD card formatted for FAT32. This is not needed for hardware emulation.
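As a hedged example of writing the image (the /dev/sdX device node is a placeholder; verify the correct device with lsblk before writing):

# Write the packaged image to the SD card (assumes the card appears as /dev/sdX).
sudo dd if=sd_card.img of=/dev/sdX bs=4M conv=fsync status=progress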
The host application executable (host.exe) is copied to the sd_card output; however, it is not incorporated into the rootfs. If you want to include the executable images in the rootfs, you must rebuild the rootfs before running the v++ --package command.
If the design needs to be programmed to a local flash memory, make sure --package.boot_mode qspi is used. This allows the use of the program_flash command, or the use of the Vitis IDE, to program the device or the flash memory, as described in Using the Vitis IDE.
Building a Bare-metal System
- Build the bare-metal platform.
Building bare-metal applications requires a bare-metal domain in the platform. The base platform xilinx_vck190_base_202110_1 does not have a bare-metal domain, which means you must create a platform with one. Starting from the v++ linking process as described in Linking the System, you must create a custom platform because the PS application needs drivers for the PL kernels in the design. Use the XSA generated during the link process to create a new platform using the following command:
generate-platform.sh -name vck190_baremetal -hw <filename>.xsa \
    -domain psv_cortexa72_0:standalone
where:
- -name vck190_baremetal: Specifies a name for the platform to be created. The platform is created according to the specified name; in this example it is written to ./vck190_baremetal/export/vck190_baremetal.
- -hw <filename>.xsa: Specifies the name of the input XSA file generated by the v++ --link command. The <filename> is the same as the file name specified for the .xclbin output.
- -domain psv_cortexa72_0:standalone: Specifies the processor domain and operating system to apply to the new platform.
You can add the new platform to your platform repository by adding its location to your $PLATFORM_REPO_PATHS environment variable. This makes it accessible to the Vitis IDE, for instance, and allows you to specify the platform on command lines by referring to its name rather than the whole path. IMPORTANT: The generated platform is used only for building the bare-metal PS application and is not used anywhere else in the flow.
- Compile and link the PS application.
To build the PS application for the bare-metal flow, use the platform generated in the prior step. You need the PS application (main.cpp) and the bare-metal AI Engine control file (aie_control.cpp), which is created by the aiecompiler command and can be found in the ./Work/ps/c_rts folder. Compile the main.cpp file using the following command:
aarch64-none-elf-gcc -I.. -I. -I../src \
    -I./vck190_baremetal/export/vck190_baremetal/sw/vck190_baremetal/standalone_domain/bspinclude/include \
    -g -c -std=c++11 -o main.o main.cpp
Note: You must include the BSP include files for the generated platform, located at ./vck190_baremetal/export/vck190_baremetal/sw/vck190_baremetal/standalone_domain/bspinclude/include.
Compile the aie_control.cpp file using the following command:
aarch64-none-elf-gcc -I.. -I. -I../src \
    -I./vck190_baremetal/export/vck190_baremetal/sw/vck190_baremetal/standalone_domain/bspinclude/include \
    -g -c -std=c++11 -o aie_control.o ../Work/ps/c_rts/aie_control.cpp
Link the PS application using the two compiled object files:
aarch64-none-elf-gcc main.o aie_control.o -g -mcpu=cortex-a72 -Wl,-T -Wl,./lscript.ld \
    -L./vck190_baremetal/export/vck190_baremetal/sw/vck190_baremetal/standalone_domain/bsplib/lib \
    -ladf_api -Wl,--start-group,-lxil,-lgcc,-lc,-lstdc++,--end-group -o main.elf
Note: You also need the BSP libxil.a, located at ./vck190_baremetal/export/vck190_baremetal/sw/vck190_baremetal/standalone_domain/bsplib/lib, during linking. Here the assumption is that the AI Engine array is enabled during the platform management controller (PMC) boot.
- Package the system.
Finally, you must run the package process to generate the final bootable image (PDI) for running the design on the bare-metal platform. This command produces the SD card content for booting the device and running the application. Refer to Packaging for more information. This requires the use of the v++ --package command, as shown below:
v++ -p -t hw \
    -f xilinx_vck190_base_202110_1 \
    libadf.a project.xclbin \
    --package.out_dir ./sd_card \
    --package.domain aiengine \
    --package.defer_aie_run \
    --package.boot_mode sd \
    --package.ps_elf main.elf,a72-0 \
    -o aie_graph.xclbin
TIP: For bare-metal ELF files running on PS cores, you should also add the package.ps_elf option to the --package command.
The use of --package.defer_aie_run is related to the way the AI Engine graph is run. If the application is loaded and launched at boot time, this option is not required. If your host application launches and controls the graph, you need to use this option when compiling and packaging your system, as described in Deploying the System.
The ./sd_card folder, specified by the --package.out_dir option, contains the following files produced for the hardware build:
|-- BOOT.BIN              // BOOT.BIN file containing the PDI and the application ELF
|-- boot_image.bif        // bootgen input file used to create BOOT.BIN
`-- sd_card               // SD card folder
    |-- aie_graph.xclbin  // XCLBIN output file (not used)
    `-- BOOT.BIN          // BOOT.BIN file containing the PDI and the application ELF
Copy the contents of the sd_card folder to an SD card to create a boot device for your system.
Running the System
Running the system depends on the build target. The process of running the hardware emulation build is different from running the hardware build.
For the hardware build, copy the contents of the sd_card
folder produced by the package process to an
actual SD card. That device becomes the boot device for your system. Boot your
system and launch your application as designed. To capture event trace data when
running the hardware, see Performance Analysis of AI Engine Graph Application. To
debug the running hardware, see Debugging the AI Engine Application.
Running Hardware Emulation
To build the project for hardware emulation, confirm that the target option of the v++ link command is target=hw_emu. Next,
the v++ --package
command generates the launch_hw_emu.sh script as part of the process for
packaging the system. This script launches the emulation environment for the
AI Engine application for test and debug
purposes. Hardware emulation runs the AI Engine
simulator for the graph application, runs the Vivado logic simulator for the PL kernels, and runs QEMU for the PS
host application.
Use the following command to launch hardware emulation from the command line.
./launch_hw_emu.sh --graphic-xsim
The --graphic-xsim switch is optional and launches the Vivado logic simulator window, where you can specify which signals from the design you want to view. It does not include internal AI Engine signals. You must click the Run All button in the window to continue execution. The launch_hw_emu.sh script launches QEMU in system mode, loads and runs the AI Engine application, and runs the PL kernels in the Vivado simulator. If the emulation flow completes successfully, at the end of the emulation you should see something like the following:
[LAUNCH_EMULATOR] INFO: 09:44:09 : PS-QEMU exited
[LAUNCH_EMULATOR] INFO: 09:44:09 : PMU/PMC-QEMU exited
[LAUNCH_EMULATOR] INFO: 09:44:09 : Simulation exited
pmu_path /scratch/aie_test1/hw_emu_pmu.log
pl-sim_dir /scratch/aie_test1/sim/behav_waveform/xsim
Please refer PS /simulate logs at /scratch/aie_test1 for more details.
DONE!
INFO: Emulation ran successfully
When launching hardware emulation, you can specify options for the AI Engine simulator that runs the graph application. The options can be passed through the launch_hw_emu.sh script using the -aie-sim-options switch, as described in Simulator Options for Hardware Emulation.
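For example, a hedged sketch (the options file name and location are placeholders; see Simulator Options for Hardware Emulation for the supported contents):

./launch_hw_emu.sh -aie-sim-options ./aiesim_options.txt --graphic-xsim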
When the emulation is fully booted and the Linux prompt is up, make sure to set the following environment variable.
export XILINX_XRT=/usr
This ensures that the host application works. Note that this also must be done when running on hardware.
Generating Traffic for HW Emulation
PLIO("DataIn", adf::plio_32_bits, "data/input.txt")
For
hardware emulation an equivalent feature exists that emulates the behavior of this
PLIO and AXI4-Stream interface. Both Python and
C++ APIs are provided to make this easier to use.The primary external data interfaces for the AI Engine array are AXI4-Stream interfaces. These are known as PLIOs and allow the AI Engine to receive data, operate on the data, and send data back on a separate AXI4-Stream interface. The input interface to the AI Engine is an AXI4-Stream slave and the output is an AXI4-Stream master. To interact with these top level interfaces during hardware emulation complementary AXI4-Stream modules are provided. These complementary modules are referred to as the AXI traffic generators.
AXI Traffic Generators
The AXI traffic generators are provided as XO files that need to be linked into your simulation platform using the Vitis compiler (v++). These XO files are called sim_ipc_axis_master_XY.xo and sim_ipc_axis_slave_ZW.xo, where XY and ZW correspond to the number of bits in the PLIO interface. For example, sim_ipc_axis_master_128.xo provides an AXI4-Stream master data bus that is 128 bits wide. A wider interface allows the PL to achieve the same throughput at a lower clock frequency and allows the AI Engine array to maximize its memory bandwidth. However, the PLIO interface tiles are each 64 bits wide and they are a limited resource. Using one 64-bit PLIO interface at twice the clock speed provides bandwidth equivalent to a 128-bit PLIO while using only one PLIO tile. This requires the PL to run at twice the clock speed, and the optimal choice varies from application to application.
Two steps are required to use the traffic generators with the Vitis compiler. First, make the connections between the sim_ipc modules and their corresponding AXI4-Stream ports on the AI Engine array. This is typically done in the system.cfg file, for example:
[connectivity]
nk=sim_ipc_axis_master:1:inst_sim_ipc_axis_master
nk=sim_ipc_axis_slave:1:inst_sim_ipc_axis_slave
stream_connect=sim_ipc_axis_master.M00_AXIS:ai_engine_0.DataIn
stream_connect=ai_engine_0.DataOut:sim_ipc_axis_slave.S00_AXIS
The general form of the nk option for the sim_ipc_axis XO files is as follows:
nk=sim_ipc_axis_master:<Number Of Masters>:<your_instance_name_1>
nk=sim_ipc_axis_slave:<Number Of Slaves>:<your_instance_name_2>
Here, sim_ipc_axis_master/slave specifies the type of XO file, and the instance name should be meaningful to your application. Note that the sim_ipc XO files can only be used with the hw_emu target. Second, include the XO files when linking the design with the Vitis compiler:
v++ -l --platform <platform.xpfm> sim_ipc_axis_master_128.xo sim_ipc_axis_slave_128.xo libadf.a -target hw_emu --config system.cfg
For additional information on how to use XO files with the Vitis compiler see https://github.com/Xilinx/Vitis-Tutorials/tree/master/AI_Engine_Development/Feature_Tutorials/05-AI-engine-versal-integration.
To instantiate more traffic generators, increase the Number of Masters field in the system.cfg file from 1 to as many as needed (up to 8).
Formatting Data with Traffic Generators in Python
To emulate AXI4-Stream transactions, AXI traffic generators require the payload data to be broken into appropriately sized bursts. For example, sending 128 bytes with a PLIO width of 32 bits (4 bytes) requires 128 bytes / 4 bytes = 32 AXI4-Stream transactions. Converting between byte arrays and AXI transactions can be handled in Python.
The Python struct library provides a mechanism to convert between Python and C data types. Specifically, the struct.pack and struct.unpack functions pack and unpack byte arrays according to a format string argument. The following table shows format strings for common C data types and PLIO widths. For more information, see https://docs.python.org/3/library/struct.html.
Data Type | PLIO Width | Python Code Snippet |
---|---|---|
cfloat | PLIO32 | N/A |
 | PLIO64 | rVec = np.real(data) |
 | PLIO128 | |
cint16 | PLIO32 | rVec = np.real(data).astype(np.int16) |
 | PLIO64 | |
 | PLIO128 | |
int8 | PLIO32 | intvec = np.real(data).astype(np.int8) |
 | PLIO64 | |
 | PLIO128 | |
int32 | PLIO32 | intvec = np.real(data).astype(np.int32) |
 | PLIO64 | |
 | PLIO128 | |
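As a hedged sketch of the kind of conversion these snippets perform (this exact helper is not from the table; it only illustrates struct.pack and struct.unpack for an int32 payload on a 32-bit PLIO):

# Sketch: convert an int32 numpy vector into 4-byte AXI4-Stream transactions (PLIO32).
import struct
import numpy as np

data = np.arange(8, dtype=np.int32)        # example payload
intvec = np.real(data).astype(np.int32)    # mirrors the conversion step in the table

# One struct.pack call per 32-bit transaction; '<i' is a little-endian int32.
transactions = [struct.pack('<i', int(v)) for v in intvec]

# Unpacking received bytes back into integers uses struct.unpack with the same format.
received = [struct.unpack('<i', t)[0] for t in transactions]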
The remaining aspects of the Python libraries, interacting with the sim_ipc Python object, and providing and receiving data are beyond the scope of this document.
A significant benefit of this feature is that it enables you to integrate your AI Engine design with a larger system while also minimizing the amount of PS code required. This is useful during development where not all domains of the system are ready to integrate.
Because the data source and sink are kept completely within the simulated PL domain, the host only needs to provide setup and control functionality. For example, the main function in a minimal host.cpp might look like the following.
int main(int argc, char ** argv)
{
//////////////////////////////////////////
// Open xclbin
//////////////////////////////////////////
auto dhdl = xrtDeviceOpen(0); // Open the local device
if(dhdl == nullptr)
throw std::runtime_error("No valid device handle found. Make sure using right xclOpen index.");
auto xclbin = load_xclbin(dhdl, "a.xclbin");
auto top = reinterpret_cast<const axlf*>(xclbin.data());
adf::registerXRT(dhdl, top->m_header.uuid);
//////////////////////////////////////////
// graph execution for AIE
//////////////////////////////////////////
printf("graph init.\n");
mygraph_top.init();
printf("graph run\n");
mygraph_top.run(1);
mygraph_top.end();
printf("graph end\n");
xrtDeviceClose(dhdl);
return 0;
}
Deploying the System
The Vitis design execution model has multiple considerations that impact how the AI Engine graph is loaded onto the board, run, reset, and reloaded. Depending on the needs of the application you have a choice of loading the AI Engine graph at board boot up time, or using the PS host application. In addition, you can also control running the graph as soon as the graph is loaded or defer it to a later time. You also have the option of running the graph infinitely or for a fixed number of iterations or cycles.
AI Engine Graph Load and Run
The AI Engine graph can be loaded and run immediately at boot, or it can be loaded by the host PS application. Additionally, you have the option of deferring the running of the graph until after the graph has been loaded, using the graph.run() host API XRT call. By default, the Xilinx® platform management controller (PMC) loads and runs the graph. However, the v++ --package.defer_aie_run option lets you defer the graph run until the graph.run() API call is made after the graph has been loaded. The following table lists the deployment options.
Host Control | Run Forever |
---|---|
Specify v++ --package.defer_aie_run to stop the AI Engine graph from starting at boot-up. | Enable the graph in the PDI and let it run forever. |
Enable the graph from the PS program using graph.run(). | |
AI Engine Run Iterations
The AI Engine graph can run for a
limited number of iterations or infinitely. By default, the graph runs infinitely.
You can use the graph.run(run_iterations)
or
graph.end(cycles)
to limit the number of
graph runs to a specific number of iterations or for a specific number of cycles.
See Run-Time Graph Control API.
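For illustration, a bounded run in the host application might look like the following sketch (mygraph_top matches the graph object used in the earlier host.cpp example, and the iteration count of 16 is a placeholder):

// Sketch: run the graph for a fixed number of iterations, then end it.
mygraph_top.init();      // configure and load the AI Engine graph
mygraph_top.run(16);     // run for 16 iterations (example value)
mygraph_top.end();       // wait for completion and disable the graph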