Simulating an AI Engine Graph Application
This chapter describes the various execution targets available to simulate your AI Engine applications at different levels of abstraction, accuracy, and speed. System C simulation (see AI Engine System C Simulator) models the timing and resources of the AI Engine array accurately, while using transaction-level, approximately timed System C models for NoC, DDR, PL, and PS. This allows reasonably accurate performance analysis of your AI Engine applications in a reasonable time.
Functional simulation, as described in x86 Functional Simulator (x86simulator), is the fastest simulation but does not provide timing, resource, or performance information. It can be used to functionally test your application, and is useful for early iterations in the design development.
As shown in Integrating the Application Using the Vitis Tools Flow and Using the Vitis IDE, the Vitis™ compiler builds the system-level project, and the simulator can be run from the GUI. Alternatively, the options can be specified on a command line or in a script.
AI Engine System C Simulator
The Versal™ ACAP AI Engine System C simulator (aiesimulator) includes the modeling of the global memory (DDR memory) and the network on chip (NoC) in addition to the AI Engine array. When the application is compiled using the System C simulation target, the AI Engine System C simulator can be invoked as follows:
aiesimulator --pkg-dir=./Work
The various configuration and binary files are generated by the AI Engine compiler under the Work directory (see Compiling an AI Engine Graph Application) and specified using the --pkg-dir option to the simulator. The graph is initialized, run, and terminated by a control thread expressed in the main application. The AI Engine compiler compiles that control thread with a PS IP wrapper to be directly loaded into the simulator.
By default, graph.run() called without an argument specifies a graph that runs forever. The AI Engine compiler generates code to execute the data-flow graph in a perpetual while loop, so the simulation also runs perpetually. To create terminating programs for debugging, specify graph.run(<number_of_iterations>) in your graph code to limit execution to the specified number of iterations, which can be any positive integer value.
The AI Engine simulator command first configures the simulator as specified in the compiler generated Work/config/scsim_config.json file. This includes loading PL IP blocks and their connections, configuring I/O data file drivers, and configuring the NoC and global memory (DDR memory) connections. It then executes the specified PS application and finally exits the simulator.
The --profile option enables printfs in kernel code to appear on the console, and also generates profile information. Also, the --dump-vcd <filename> option generates a value change dump (VCD) for the duration of the simulation. The --simulation-cycle-timeout <number-of-cycles> option can be used to exit the simulation after a given number of clock cycles.
If graph.run() is called without an iteration count, the simulation runs forever, and you need to press Ctrl+c twice to exit the simulator. Use the --simulation-cycle-timeout option to stop the simulator on an exact cycle, so that the total cycles that appear on the profiling report are the same on each run.
Simulator Options
The complete set of AI Engine simulator (aiesimulator) options is described in this section. In most cases, just pointing to the package directory with --pkg-dir is sufficient.
Options | Description |
---|---|
-h, --help | Show this help message and exit. |
--dump-vcd FILE | Dump VCD waveform information into FILE. Because the tool appends .vcd to the specified file name, it is not necessary to include the file suffix. |
--gm-init-file <file> | Read global memory image from file. This loads the memory initialization file as described in Simulating Global Memory. |
--pkg-dir <PKG_DIR> | Specify the package directory, for example, ./Work. |
--profile | Allow generation of printf trace messages on stdout and collect profiling statistics during simulation. This could slow down the simulator slightly. |
--simulation-cycle-timeout CYCLES | Run the application for a given number of cycles after it is loaded. TIP: Specify the --simulation-cycle-timeout option to end the simulation session after the specified number of cycles. However, when specifying a simulation timeout during the debug process, be sure to specify a large number of cycles because the debug session terminates when the timeout cycle is reached. |
--online [-ctf] [-wdb] | TIP: The --online option and the --dump-vcd option cannot be used together. If both options are specified, only the --online option takes effect. |
--enable-memory-check | Enables run-time program and data memory boundary access checks. Any access violation is reported as an [ERROR] message. By default, this option is disabled. |
Simulation Input and Output Data Streams
The default bit width for input/output streams is 32 bits. The bit width specifies the number of samples per line on the simulation input file. The interpretation of the samples on each line of the input file is dependent on the data type expected and the PLIO data width. The following table shows how the samples in the input data file are interpreted, depending on the data type and its corresponding PLIO interface specification.
Data Type | PLIO 32 bit | PLIO 64 bit | PLIO 128 bit |
---|---|---|---|
PLIO declaration | PLIO *in0 = new PLIO("DataIn1", adf::plio_32_bits) | PLIO *in0 = new PLIO("DataIn1", adf::plio_64_bits) | PLIO *in0 = new PLIO("DataIn1", adf::plio_128_bits) |
int8 | 4 values per line: 6 8 3 2 | 8 values per line: 6 8 3 2 6 8 3 2 | 16 values per line: 6 8 3 2 6 8 3 2 6 8 3 2 6 8 3 2 |
int16 | 2 values per line: 24 18 | 4 values per line: 24 18 24 18 | 8 values per line: 24 18 24 18 24 18 24 18 |
int32 | Single value per line: 2386 | 2 values per line: 2386 2386 | 4 values per line: 2386 2386 2386 2386 |
int64 | N/A | Single value per line: 45678 | 2 values per line: 45678 95578 |
cint16 | 1 cint value per line (real, imaginary): 1980 485 | 2 cint values per line: 1980 45 180 85 | 4 cint values per line: 1980 485 180 85 980 48 190 45 |
cint32 | N/A | 1 cint value per line (real, imaginary): 1980 485 | 2 cint values per line: 1980 45 180 85 |
float | 1 floating point value per line: 893.5689 | 2 floating point values per line: 893.5689 3459.3452 | 4 floating point values per line: 893.5689 39.32 459.352 349.345 |
cfloat | N/A | 1 cfloat value per line (real, imaginary): 893.5689 24156.456 | 2 cfloat values per line (real, imaginary): 893.5689 24156.456 93.689 256.46 |
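The pattern in the preceding table can be summarized as: the number of scalar values on each line equals the PLIO width divided by the width of one scalar component (for complex types, the real and imaginary parts count as separate components). The following C++ sketch illustrates that rule for integer data types only. The helper names values_per_line and format_plio_input are hypothetical and are not part of the AI Engine tools; the division rule is inferred from the table rather than taken from a tool specification.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Number of scalar values expected on one line of the input file.
// plio_bits: PLIO interface width (32, 64, or 128).
// component_bits: width of one scalar component, for example 16 for int16
// and for each half (real or imaginary) of a cint16.
// ASSUMPTION: rule inferred from the table above.
std::size_t values_per_line(unsigned plio_bits, unsigned component_bits) {
    if (component_bits == 0 || plio_bits < component_bits ||
        plio_bits % component_bits != 0) {
        throw std::invalid_argument(
            "PLIO width must be a non-zero multiple of the component width");
    }
    return plio_bits / component_bits;
}

// Hypothetical helper: lay out integer values as input-file text, one PLIO
// word per line, with values separated by single spaces.
std::string format_plio_input(const std::vector<long long>& values,
                              unsigned plio_bits, unsigned component_bits) {
    const std::size_t per_line = values_per_line(plio_bits, component_bits);
    std::ostringstream out;
    for (std::size_t i = 0; i < values.size(); ++i) {
        out << values[i];
        // Break the line after a full PLIO word or after the last value.
        const bool end_of_line =
            (i + 1) % per_line == 0 || i + 1 == values.size();
        out << (end_of_line ? '\n' : ' ');
    }
    return out.str();
}
```

For example, format_plio_input({24, 18, 24, 18}, 32, 16) produces two lines of two int16 values each, matching the int16 entry in the PLIO 32 bit column.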
Simulating Global Memory
When an application accesses global memory using the GMIO specification (see GMIO Attributes), the simulation needs to model the DDR memory and the routing network connecting the DDR memory to the PL and AI Engines. AI Engine to DDR memory connections are mediated by the DMA data mover that is embedded in the AI Engine array interface and controlled through the GMIO APIs in the PS program. Connections to DDR memory from an AXI4-Stream port on a PL block are mediated by a soft GMIO data mover block, which is generated automatically by the AI Engine compiler for simulation purposes. The data mover converts the streaming interface from the PL blocks to memory-mapped AXI4 interface transactions over the NoC with a specific start address, block size, and burst size as shown in GMIO Attributes.
While simulating with global memory, a memory data file can be supplied using an additional option, --gm-init-file, which initializes the DDR memory with predefined data. This file is a textual byte-dump of the DDR memory starting at a given address. The format of this file is as follows:
<startaddr>:
<byte>
<byte>
…
For example, the AI Engine simulator can be invoked with global memory initialization in the following way:
aiesimulator --pkg-dir=./Work --gm-init-file=dump.txt
The simulator also produces an output byte dump for the DDR memory used, in the simulation output directory (default: aiesimulator_output). The name of the output file is based on the internal location of the DDR memory bank (for example, DDRMC_SITE_X1Y0.mem) starting at the base address 0x0. You can use this dump to verify the global memory transactions.
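As an illustration of the memory initialization file layout described earlier in this section (a start address line followed by one byte per line), the following C++ sketch builds the text of such a file. The function name make_gm_init_text is hypothetical. This section does not state the number format, so the sketch assumes hexadecimal with a 0x prefix for both the address and the bytes; verify the expected format against the tool documentation before relying on it.

```cpp
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Build the text of a memory initialization file: a "<startaddr>:" line
// followed by one "<byte>" line per byte.
// ASSUMPTION: hexadecimal with a 0x prefix for the address and the bytes;
// the number format is not specified in this section.
std::string make_gm_init_text(std::uint64_t start_addr,
                              const std::vector<std::uint8_t>& bytes) {
    std::ostringstream out;
    out << std::hex << std::setfill('0');
    out << "0x" << start_addr << ":\n";
    for (std::uint8_t b : bytes) {
        // Cast so the byte is printed as a number, not as a character.
        out << "0x" << std::setw(2) << static_cast<unsigned>(b) << '\n';
    }
    return out.str();
}
```

The returned string can be written to a file such as dump.txt and passed to the simulator with the --gm-init-file option.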
Simulator Options for Hardware Emulation
The AI Engine simulator generates an options file that lists the options used for simulating the AI Engine graph application. The options file is automatically generated when the AI Engine simulator is run. This allows reuse of the AI Engine simulator options from the initial graph-level simulation later in the system-level emulation. You can also manually edit the options file to specify other options as required. The table later in this section lists the options that can be specified in the aiesim_options.txt file. This file is located in the aiesimulator_output directory and is only created if the --dump-vcd option is used with the aiesimulator command. It can be specified as part of the command line to launch the emulator using the launch_hw_emu.sh script, as described in Running the System. An example command line follows:
./launch_hw_emu.sh \
-add-env VITIS_LAUNCH_WAVEFORM_BATCH=1 \
-add-env AIE_COMPILER_WORKDIR=${FULL_PATH}/Work \
-aie-sim-options ${FULL_PATH}/aiesimulator_output/aiesim_options.txt
where ${FULL_PATH} must be the full path to the file or directory.
Additionally, you can add more advanced options to log waveform data without having to launch emulation with the Vivado logic simulator (XSIM) GUI. An example command line follows:
./launch_hw_emu.sh \
-user-pre-sim-script pre-sim.tcl
The pre-sim.tcl file contains Tcl commands to add waveforms or log design waveforms. For an example, see Vitis Unified Software Platform Documentation: Application Acceleration Development (UG1393). For Tcl commands, see Vivado Design Suite User Guide: Logic Simulation (UG900).
Command | Arguments | Description |
---|---|---|
AIE_DUMP_VCD | <filename> | When AIE_DUMP_VCD is specified, the simulation generates VCD data and writes it to the specified <filename>.vcd. |
Enabling Third-Party Simulators
Simulator | v++ --link Configuration |
---|---|
Questa | EXPORT simulator=questa |
Xcelium | EXPORT simulator=xcelium |
When the modifications have been made, build the design as normal, run the script launch_hw_emu.sh and the new simulator will be used. More information on emulation is provided in Running the System.
x86 Functional Simulator (x86simulator)
- When the application is compiled using the x86 simulation target, the x86 simulator can be invoked as follows:
x86simulator --pkg-dir=./Work --input-dir=<dir> --output-dir=<dir>
The compiled binary for x86 native simulation is produced by the AI Engine compiler under the Work directory (see Compiling an AI Engine Graph Application) and is started automatically by this wrapper script. The input and the output files are picked up as specified on the command line, with the path default relative to the current directory. The complete x86simulator command help is shown here:
$ x86simulator --help
x86simulator [-h] [--help] [--h] [--pkg-dir=PKGDIR] [--gm-init-file=GM_INIT_FILENAME]
optional arguments:
-h,--help --h show this help message and exit
--pkg-dir=PKG_DIR Set the package directory. ex: Work
--gm-init-file=GM_INIT_FILENAME set the gm-init-file image for GMIO
--i, -i ,--input-dir=PATH Set the input-dir to . by Default
--o, -o ,--output-dir=PATH Set the input-dir to . by Default
- The output files produced by the simulator can be compared with a golden output, ignoring white-space differences.
diff -w <data>/golden.txt <data>/output.txt
- In applications where run-time parameters need to be updated dynamically, you can control the application from the main program using the update API as described in Run-Time Parameter Update/Read Mechanisms.
- The simulator will run continuously, unless you specify a number of iterations to run through the main application, as shown in the following example:
clipgraph.init();
clipgraph.run(3);
clipgraph.end();
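The golden-output comparison mentioned in the list above (diff -w) can also be approximated inside a C++ test harness. The following is a minimal sketch, not part of the AI Engine tools: the names strip_whitespace and first_mismatch are hypothetical, and the functions take the file contents as strings, so reading the files is left to the caller. Like diff -w, the sketch ignores all whitespace within a line, but it only reports the first differing line rather than a full diff.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>

// Remove every whitespace character from one line.
std::string strip_whitespace(const std::string& line) {
    std::string out;
    for (unsigned char c : line) {
        if (!std::isspace(c)) out.push_back(static_cast<char>(c));
    }
    return out;
}

// Compare two texts line by line, ignoring whitespace inside each line.
// Returns the 1-based number of the first differing line, or 0 if the
// texts match.
std::size_t first_mismatch(const std::string& golden,
                           const std::string& actual) {
    std::istringstream g(golden), a(actual);
    std::string gl, al;
    std::size_t line_no = 0;
    while (true) {
        const bool has_g = static_cast<bool>(std::getline(g, gl));
        const bool has_a = static_cast<bool>(std::getline(a, al));
        if (!has_g && !has_a) return 0;      // both ended: no difference
        ++line_no;
        if (has_g != has_a) return line_no;  // one text has extra lines
        if (strip_whitespace(gl) != strip_whitespace(al)) return line_no;
    }
}
```

A return value of 0 corresponds to diff -w printing nothing; any other value points at the first line to inspect in the simulator output.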