Simulating an AI Engine Graph Application

This chapter describes the execution targets available for simulating your AI Engine applications at different levels of abstraction, accuracy, and speed. SystemC simulation (see AI Engine SystemC Simulator) models the timing and resources of the AI Engine array accurately, and uses transaction-level, approximately timed SystemC models for the NoC, DDR memory, PL, and PS. This allows reasonably accurate performance analysis of your AI Engine applications within an acceptable simulation time.

Functional simulation, as described in x86 Functional Simulator (x86simulator), is the fastest simulation but does not provide timing, resource, or performance information. It can be used to functionally test your application, and is useful for early iterations in the design development.

As shown in Integrating the Application Using the Vitis Tools Flow and Using the Vitis IDE, the Vitis™ compiler builds the system-level project and runs the simulator from the GUI. Alternatively, the simulator options can be specified on the command line or in a script.

AI Engine SystemC Simulator

The Versal™ ACAP AI Engine SystemC simulator (aiesimulator) models the global memory (DDR memory) and the network on chip (NoC) in addition to the AI Engine array. When the application is compiled using the SystemC simulation target, the AI Engine SystemC simulator can be invoked as follows:

aiesimulator --pkg-dir=./Work
IMPORTANT: Using the AI Engine simulator requires the setup described in Setting Up the Vitis Tool Environment.

The various configuration and binary files are generated by the AI Engine compiler under the Work directory (see Compiling an AI Engine Graph Application) and specified using the --pkg-dir option to the simulator. The graph is initialized, run, and terminated by a control thread expressed in the main application. The AI Engine compiler compiles that control thread with a PS IP wrapper to be directly loaded into the simulator.

By default, graph.run() runs the graph forever: the AI Engine compiler generates code that executes the data-flow graph in a perpetual while loop, so the simulation also runs perpetually. To create terminating programs for debugging, call graph.run(<number_of_iterations>) in your main application to limit execution to the specified number of iterations. The number of iterations can be any positive integer value.

The AI Engine simulator command first configures the simulator as specified in the compiler generated Work/config/scsim_config.json file. This includes loading PL IP blocks and their connections, configuring I/O data file drivers, and configuring the NoC and global memory (DDR memory) connections. It then executes the specified PS application and finally exits the simulator.

The AI Engine simulator has an optional --profile option, which makes printf output from kernel code appear on the console and also generates profiling information. The --dump-vcd <filename> option generates a value change dump (VCD) for the duration of the simulation, and the --simulation-cycle-timeout <number-of-cycles> option exits the simulation after the given number of clock cycles.
IMPORTANT: If you specify neither a cycle timeout (--simulation-cycle-timeout) nor a number of iterations to graph.run(), the simulation runs forever. Press Ctrl+C twice to exit the simulator.
TIP: You might observe cycle count differences between simulation runs of the same design. This is because the simulator waits a few seconds for all pending transactions (such as DMA) to finish. During this wait, the simulator process is still ticking but can be context-switched by the OS, so the total cycle count can differ from run to run. To get the same total cycle count on every run, use the AI Engine simulator --simulation-cycle-timeout option to stop the simulator on an exact cycle. The total cycles reported in the profiling report are then the same for each run.

Simulator Options

The complete set of AI Engine simulator (aiesimulator) options is described in this section. In most cases, specifying only the --pkg-dir option is sufficient.

Table 1. AI Engine Simulator Options
Options Description
-h, --help Show this help message and exit.
--dump-vcd FILE Dump VCD waveform information into FILE. Because the tool appends .vcd to the specified file name, it is not necessary to include the file suffix.
--gm-init-file <file> Read global memory image from file. This loads the memory initialization file as described in Simulating Global Memory.
--pkg-dir <PKG_DIR> Specify the package directory, for example, ./Work.
--profile Allow generation of printf trace messages on the stdout and collect profiling statistics during simulation. This could slow down the simulator slightly.
--simulation-cycle-timeout CYCLES Run the application for a given number of cycles after it is loaded.
TIP: Specify the --simulation-cycle-timeout option to end the simulation session after the specified number of cycles. However, when specifying a simulation timeout during debugging, be sure to specify a large number of cycles, because the debug session terminates when the timeout cycle is reached.
--online [-ctf] [-wdb]

Call vcdanalyze to parse VCD data on-the-fly, to optionally produce common trace format (CTF), or waveform database (WDB) output.

TIP: The --online option and --dump-vcd option cannot be used together. If both options are specified, only the --online option takes effect.
--enable-memory-check Enables run-time boundary-access checks for program and data memory. Any access violation is reported as an [ERROR] message. This option is disabled by default.

Simulation Input and Output Data Streams

The default bit width for input/output streams is 32 bits. The PLIO bit width determines the number of samples on each line of the simulation input file, and how the samples on a line are interpreted depends on the expected data type and the PLIO data width. The following table shows how the samples in the input data file are interpreted for each data type and its corresponding PLIO interface specification.

Table 2. Simulation Input Data Dependency on Data Type and PLIO Width

PLIO declaration for each width:
  PLIO 32 bit: PLIO *in0 = new PLIO("DataIn1", adf::plio_32_bits)
  PLIO 64 bit: PLIO *in0 = new PLIO("DataIn1", adf::plio_64_bits)
  PLIO 128 bit: PLIO *in0 = new PLIO("DataIn1", adf::plio_128_bits)

int8
  PLIO 32 bit (4 values per line): 6 8 3 2
  PLIO 64 bit (8 values per line): 6 8 3 2 6 8 3 2
  PLIO 128 bit (16 values per line): 6 8 3 2 6 8 3 2 6 8 3 2 6 8 3 2

int16
  PLIO 32 bit (2 values per line): 24 18
  PLIO 64 bit (4 values per line): 24 18 24 18
  PLIO 128 bit (8 values per line): 24 18 24 18 24 18 24 18

int32
  PLIO 32 bit (1 value per line): 2386
  PLIO 64 bit (2 values per line): 2386 2386
  PLIO 128 bit (4 values per line): 2386 2386 2386 2386

int64
  PLIO 32 bit: N/A
  PLIO 64 bit (1 value per line): 45678
  PLIO 128 bit (2 values per line): 45678 95578

cint16 (each cint value is written as real, imaginary)
  PLIO 32 bit (1 cint value per line): 1980 485
  PLIO 64 bit (2 cint values per line): 1980 45 180 85
  PLIO 128 bit (4 cint values per line): 1980 485 180 85 980 48 190 45

cint32 (each cint value is written as real, imaginary)
  PLIO 32 bit: N/A
  PLIO 64 bit (1 cint value per line): 1980 485
  PLIO 128 bit (2 cint values per line): 1980 45 180 85

float
  PLIO 32 bit (1 floating point value per line): 893.5689
  PLIO 64 bit (2 floating point values per line): 893.5689 3459.3452
  PLIO 128 bit (4 floating point values per line): 893.5689 39.32 459.352 349.345

cfloat (each cfloat value is written as real, imaginary)
  PLIO 32 bit: N/A
  PLIO 64 bit (1 cfloat value per line): 893.5689 24156.456
  PLIO 128 bit (2 cfloat values per line): 893.5689 24156.456 93.689 256.46
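The rule behind Table 2 is that each line of the input file holds as many samples as fit in the PLIO width, and each complex sample contributes two numbers (real, then imaginary). The following C++ sketch encodes that rule so you can generate or sanity-check input files. The SampleType enum and the helper function names are hypothetical and are not part of the ADF API; formatInputFile handles integer data only.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical enum listing the data types in Table 2.
enum class SampleType { Int8, Int16, Int32, Int64, CInt16, CInt32, Float, CFloat };

// Total bit width of one sample (both parts for complex types).
int sampleBits(SampleType t) {
    switch (t) {
        case SampleType::Int8:   return 8;
        case SampleType::Int16:  return 16;
        case SampleType::Int32:  return 32;
        case SampleType::Int64:  return 64;
        case SampleType::CInt16: return 32;  // 16-bit real + 16-bit imaginary
        case SampleType::CInt32: return 64;  // 32-bit real + 32-bit imaginary
        case SampleType::Float:  return 32;
        case SampleType::CFloat: return 64;  // 32-bit real + 32-bit imaginary
    }
    return 0;  // not reached for valid enum values
}

bool isComplex(SampleType t) {
    return t == SampleType::CInt16 || t == SampleType::CInt32 ||
           t == SampleType::CFloat;
}

// Samples per input-file line for a PLIO width of 32, 64, or 128 bits.
// Returns 0 where Table 2 says N/A (the sample is wider than the PLIO).
int samplesPerLine(SampleType t, int plioBits) {
    const int bits = sampleBits(t);
    return bits > plioBits ? 0 : plioBits / bits;
}

// Numbers written per line: a complex sample is written as "real imaginary".
int numbersPerLine(SampleType t, int plioBits) {
    return samplesPerLine(t, plioBits) * (isComplex(t) ? 2 : 1);
}

// Lay out integer values (real/imaginary already interleaved for complex
// types) as space-separated lines of the correct width.
std::string formatInputFile(const std::vector<long long>& values,
                            SampleType t, int plioBits) {
    const int n = numbersPerLine(t, plioBits);
    if (n == 0) throw std::invalid_argument("data type is wider than the PLIO");
    const std::size_t perLine = static_cast<std::size_t>(n);
    assert(values.size() % perLine == 0 && "this sketch expects complete lines");
    std::ostringstream out;
    for (std::size_t i = 0; i < values.size(); ++i) {
        out << values[i] << ((i + 1) % perLine == 0 ? '\n' : ' ');
    }
    return out.str();
}
```

For example, formatInputFile({24, 18, 24, 18}, SampleType::Int16, 32) yields two lines, each reading 24 18, which matches the int16 row for a 32-bit PLIO.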

Simulating Global Memory

When an application accesses global memory using the GMIO specification (see GMIO Attributes), the simulation needs to model the DDR memory and the routing network connecting the DDR memory to the PL and AI Engines. AI Engine to DDR memory connections are mediated by the DMA data mover that is embedded in the AI Engine array interface and controlled through the GMIO APIs in the PS program. Connections to DDR memory from an AXI4-Stream port on a PL block are mediated by a soft GMIO data mover block, which is generated automatically by the AI Engine compiler for simulation purposes. The data mover converts the streaming interface from the PL blocks to memory-mapped AXI4 interface transactions over the NoC with a specific start address, block size, and burst size as shown in GMIO Attributes.

While simulating with global memory, a memory data file can be supplied using an additional option, --gm-init-file, which initializes the DDR memory with predefined data. This file is a textual byte-dump of the DDR memory starting at a given address. The format of this file is as follows:

<startaddr>:
<byte>
<byte>
…

For example, the AI Engine simulator can be invoked with global memory initialization in the following way:

aiesimulator --pkg-dir=./Work --gm-init-file=dump.txt
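To produce a dump.txt file in the format above from test data, a small helper such as the following C++ sketch can be used. The function names are hypothetical, and two details are assumptions that this section does not state: the start address and byte values are written in hexadecimal with a 0x prefix, and 32-bit words are stored little-endian. Check both against a file that your tool version accepts before relying on them.

```cpp
#include <cassert>
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Build the text of a --gm-init-file image: a "<startaddr>:" line followed by
// one byte per line.
// ASSUMPTION: numbers are written in hexadecimal with a 0x prefix.
std::string makeGmInitFile(std::uint64_t startAddr,
                           const std::vector<std::uint8_t>& bytes) {
    assert(!bytes.empty() && "nothing to initialize");
    std::ostringstream out;
    out << std::hex << "0x" << startAddr << ":\n";
    for (std::uint8_t b : bytes) {
        // Cast so the byte prints as a number rather than as a character;
        // setw(2) applies only to the next value, setfill('0') persists.
        out << "0x" << std::setw(2) << std::setfill('0')
            << static_cast<unsigned>(b) << '\n';
    }
    return out.str();
}

// Split 32-bit words into bytes, least significant byte first.
// ASSUMPTION: the data is laid out little-endian in DDR memory.
std::vector<std::uint8_t> wordsToBytesLE(const std::vector<std::uint32_t>& words) {
    std::vector<std::uint8_t> bytes;
    bytes.reserve(words.size() * 4);
    for (std::uint32_t w : words) {
        for (int shift = 0; shift < 32; shift += 8) {
            bytes.push_back(static_cast<std::uint8_t>(w >> shift));
        }
    }
    return bytes;
}
```

Writing the string returned by makeGmInitFile(startAddr, wordsToBytesLE(words)) to dump.txt with std::ofstream gives a file that can be passed to --gm-init-file as in the command above.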

The simulator also produces an output byte dump for the DDR memory used, in the simulation output directory (default: aiesimulator_output). The name of the output file is based on the internal location of the DDR memory bank (for example, DDRMC_SITE_X1Y0.mem) starting at the base address 0x0. You can use this dump to verify the global memory transactions.

Simulator Options for Hardware Emulation

The AI Engine simulator generates an options file, aiesim_options.txt, that lists the options used for simulating the AI Engine graph application. The file is written to the aiesimulator_output directory, but only when aiesimulator is run with the --dump-vcd option. This allows reuse of the AI Engine simulator options from the initial graph-level simulation in later system-level emulation. You can also manually edit the options file to specify other options as required; the following table lists the options that can be specified in it.

The options file can be passed on the command line to the launch_hw_emu.sh script, which launches the emulator as described in Running the System. An example command line follows:

./launch_hw_emu.sh \
-add-env VITIS_LAUNCH_WAVEFORM_BATCH=1 \
-add-env AIE_COMPILER_WORKDIR=${FULL_PATH}/Work \
-aie-sim-options ${FULL_PATH}/aiesimulator_output/aiesim_options.txt

where ${FULL_PATH} must be the full path to the file or directory.

Additionally, you can add more advanced options to log waveform data without having to launch emulation with the Vivado logic simulator (XSIM) GUI. An example command line follows:

./launch_hw_emu.sh \
-user-pre-sim-script pre-sim.tcl

The pre-sim.tcl file contains Tcl commands to add waveforms or log design waveforms. For an example, see Vitis Unified Software Platform Documentation: Application Acceleration Development (UG1393); for Tcl commands, see Vivado Design Suite User Guide: Logic Simulation (UG900).

Table 3. Hardware Emulation Options
Command Arguments Description
AIE_DUMP_VCD <filename> When AIE_DUMP_VCD is specified, the simulation will generate VCD data and write it to the specified <filename>.vcd.

Enabling Third-Party Simulators

Third-party simulators such as Questa Advanced Simulator and Xcelium are supported when executing hardware emulation of your design. You can enable these simulators by updating the Vitis configuration file (config.ini or system.cfg).
Table 4. Vitis Link Settings
Simulator v++ --link Configuration

Questa:
EXPORT simulator=questa
[advanced]
param=hw_emu.simulator=QUESTA
[vivado]
prop=project.__CURRENT__.compxlib.questa_compiled_library_dir=/path/to/questa/2020.2/lin64/lib/

Xcelium:
EXPORT simulator=xcelium
[advanced]
param=hw_emu.simulator=XCELIUM
[vivado]
prop=project.__CURRENT__.simulator.xcelium_install_dir=/path/to/xcelium/bin/
prop=project.__CURRENT__.compxlib.xcelium_compiled_library_dir=/path/to/xcelium/20.03.005/lin64/lib/
prop=fileset.sim_1.xcelium.simulate.runtime=1000us
prop=fileset.sim_1.xcelium.elaborate.xmelab.more_options={-timescale 1ns/1ps}

After making these modifications, build the design as usual and run the launch_hw_emu.sh script; hardware emulation then uses the selected simulator. More information on emulation is provided in Running the System.

x86 Functional Simulator (x86simulator)

Note: The x86 simulator is a preliminary feature and enhancements are planned in future releases.
  1. When the application is compiled using the x86 simulation target, the x86 simulator can be invoked as follows:
    x86simulator --pkg-dir=./Work --input-dir=<dir> --output-dir=<dir>

    The compiled binary for x86 native simulation is produced by the AI Engine compiler under the Work directory (see Compiling an AI Engine Graph Application) and is started automatically by this wrapper script. Input files are read from, and output files written to, the directories specified on the command line; by default, these paths are relative to the current directory. The complete x86simulator command help is shown here:

    $ x86simulator --help
     x86simulator [-h] [--help] [--h] [--pkg-dir=PKGDIR] [--gm-init-file=GM_INIT_FILENAME]
     optional arguments:
     -h,--help  --h show this help message and exit
    --pkg-dir=PKG_DIR     Set the package directory. ex: Work
    --gm-init-file=GM_INIT_FILENAME  set the gm-init-file image for GMIO
    --i, -i ,--input-dir=PATH  Set the input-dir to . by Default
    --o, -o ,--output-dir=PATH  Set the input-dir to . by Default
  2. The output files produced by the simulator can be compared with a golden output ignoring white-space differences.
    diff -w <data>/golden.txt <data>/output.txt
  3. In applications where run-time parameters need to be updated dynamically, you can control the application from the main program using the update APIs described in Run-Time Parameter Update/Read Mechanisms.
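  4. The simulator runs continuously unless the main application specifies the number of graph iterations to run, as shown in the following example:
    clipgraph.init();   // initialize the graph
    clipgraph.run(3);   // run three iterations instead of running forever
    clipgraph.end();    // wait for the graph to finish, then terminate it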
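Step 2 compares the simulator output with a golden file using diff -w. If you prefer to run the same check inside a C++ test harness, the following sketch compares two files line by line after removing all white space from each line, which is what diff -w ignores. The function names are hypothetical, and unlike diff, this sketch reports only whether the files match, not where they differ.

```cpp
#include <cassert>
#include <cctype>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Drop every white-space character, mirroring what diff -w ignores.
std::string stripWhitespace(const std::string& line) {
    std::string out;
    for (unsigned char c : line) {
        if (!std::isspace(c)) out.push_back(static_cast<char>(c));
    }
    return out;
}

// True if both streams have the same number of lines and each pair of lines
// is identical once white space is removed.
bool sameIgnoringWhitespace(std::istream& golden, std::istream& output) {
    std::string g, o;
    while (true) {
        const bool hasG = static_cast<bool>(std::getline(golden, g));
        const bool hasO = static_cast<bool>(std::getline(output, o));
        if (hasG != hasO) return false;  // one file has extra lines
        if (!hasG) return true;          // both files fully consumed
        if (stripWhitespace(g) != stripWhitespace(o)) return false;
    }
}

// File-based wrapper, for example for <data>/golden.txt and <data>/output.txt.
bool filesMatch(const std::string& goldenPath, const std::string& outputPath) {
    std::ifstream golden(goldenPath);
    std::ifstream output(outputPath);
    if (!golden || !output) return false;  // treat a missing file as a mismatch
    return sameIgnoringWhitespace(golden, output);
}
```

Because sameIgnoringWhitespace takes std::istream references, it can also be exercised on in-memory data with std::istringstream, which is convenient for unit-testing the harness itself.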