AI Engine Programming
An AI Engine program consists of a data flow graph specification written in C++. As described in C++ Template Support, you can use template classes or functions for writing the AI Engine graph or kernels. The application can be compiled and executed using the AI Engine toolchain. This chapter provides an introduction to writing an AI Engine program.
A complete class reference guide is shown in Adaptive Data Flow Graph Specification Reference. The example that is used in this chapter can be found as a template example in the Vitis™ environment when creating a new AI Engine project.
Prepare the Kernels
Kernels are computation functions that form the fundamental building blocks of the data flow graph specifications. Kernels are declared as ordinary C/C++ functions that return void and can use special data types as arguments (discussed in Window and Streaming Data API). Each kernel should be defined in its own source file. This organization is recommended for reusability and faster compilation. Furthermore, the kernel source files should include all relevant header files to allow for independent compilation. It is recommended that a header file (kernels.h in this documentation) declare the function prototypes for all kernels used in a graph. An example is shown below.
#ifndef FUNCTION_KERNELS_H
#define FUNCTION_KERNELS_H
void simple(input_window_cint16 * in, output_window_cint16 * out);
#endif
In the example, the #ifndef and #endif directives ensure that the include file is only included once, which is good C/C++ practice.
Creating a Data Flow Graph (Including Kernels)
- Define your application graph class in a separate header file (for example, project.h). First, add the Adaptive Data Flow (ADF) header (adf.h) and include the kernel function prototypes. The ADF library includes all the required constructs for defining and executing the graphs on AI Engines.

#include <adf.h>
#include "kernels.h"
- Define your graph class by using the objects which are defined in the adf namespace. All user graphs are derived from the class graph.

#include <adf.h>
#include "kernels.h"

using namespace adf;

class simpleGraph : public graph {
private:
  kernel first;
  kernel second;
};

This is the beginning of a graph class definition that declares two kernels (first and second).
- Add some top-level ports to the graph.

#include <adf.h>
#include "kernels.h"

using namespace adf;

class simpleGraph : public graph {
private:
  kernel first;
  kernel second;
public:
  input_port in;
  output_port out;
};
- Use the kernel::create function to instantiate the first and second C++ kernel objects using the functionality of the C function simple.

#include <adf.h>
#include "kernels.h"

using namespace adf;

class simpleGraph : public graph {
private:
  kernel first;
  kernel second;
public:
  input_port in;
  output_port out;
  simpleGraph() {
    first = kernel::create(simple);
    second = kernel::create(simple);
  }
};
- Add the connectivity information, which is equivalent to the nets in a data flow graph. In this description, ports are referenced by indices. The first input window or stream argument in the simple function is assigned index 0 in an array of input ports (in). Subsequent input arguments take ascending consecutive indices. The first output window or stream argument in the simple function is assigned index 0 in an array of output ports (out). Subsequent output arguments take ascending consecutive indices.

#include <adf.h>
#include "kernels.h"

using namespace adf;

class simpleGraph : public graph {
private:
  kernel first;
  kernel second;
public:
  input_port in;
  output_port out;
  simpleGraph() {
    first = kernel::create(simple);
    second = kernel::create(simple);
    connect< window<128> > net0 (in, first.in[0]);
    connect< window<128> > net1 (first.out[0], second.in[0]);
    connect< window<128> > net2 (second.out[0], out);
  }
};

As shown, the input port from the top level is connected to the input port of the first kernel, the output port of the first kernel is connected to the input port of the second kernel, and the output port of the second kernel is connected to the output port exposed at the top level. The first kernel executes when 128 bytes of data (32 complex samples) have been collected in a buffer from an external source. This is specified as a window parameter on connection net0. Likewise, the second kernel executes when its input window has valid data, produced as the output of the first kernel and expressed via connection net1. Finally, the output of the second kernel is connected to the top-level output port as connection net2, specifying that upon termination the second kernel produces 128 bytes of data.
- Set the source file and tile usage for each of the kernels. The source file kernels.cc contains the source code for the kernels first and second. Then set the run-time ratio, which is the ratio of the function run time to the cycle budget, and must be between 0 and 1. The cycle budget is the number of instruction cycles a function can take to either consume data from its input (when dealing with a rate-limited input data stream), or to produce a block of data on its output (when dealing with a rate-limited output data stream). This cycle budget can be affected by changing the block sizes.

#include <adf.h>
#include "kernels.h"

using namespace adf;

class simpleGraph : public graph {
private:
  kernel first;
  kernel second;
public:
  input_port in;
  output_port out;
  simpleGraph() {
    first = kernel::create(simple);
    second = kernel::create(simple);
    connect< window<128> > net0 (in, first.in[0]);
    connect< window<128> > net1 (first.out[0], second.in[0]);
    connect< window<128> > net2 (second.out[0], out);
    source(first) = "kernels.cc";
    source(second) = "kernels.cc";
    runtime<ratio>(first) = 0.1;
    runtime<ratio>(second) = 0.1;
  }
};
Note: See Run-Time Ratio for more information.
- Define a top-level application file (for example, project.cpp) that contains an instance of your graph class and connect the graph to a simulation platform to provide file input and output. In this example, these files are called input.txt and output.txt.

#include "project.h"

simpleGraph mygraph;
simulation::platform<1,1> platform("input.txt", "output.txt");

connect<> net0(platform.src[0], mygraph.in);
connect<> net1(mygraph.out, platform.sink[0]);

int main(void) {
  adf::return_code ret;
  mygraph.init();
  ret = mygraph.run(<number_of_iterations>);
  if (ret != adf::ok) {
    printf("Run failed\n");
    return ret;
  }
  ret = mygraph.end();
  if (ret != adf::ok) {
    printf("End failed\n");
    return ret;
  }
  return 0;
}
Calling mygraph.run() without an argument specifies a graph that runs forever. The AI Engine compiler generates code to execute the data flow graph in a perpetual while loop. To limit the execution of the graph for debugging and test, specify mygraph.run(<number_of_iterations>) in the graph code. The specified number of iterations can be one or more. The ADF APIs return the enumerated type return_code to report the API running status.
The main program is the driver for the graph. It is used to load, execute, and terminate the graph. See Run-Time Graph Control API for more details.
Run-Time Ratio
run-time ratio = (cycles for one run of the kernel)/(cycle budget)
The cycle budget is the number of cycles allowed to run one invocation of the kernel, which depends on the system throughput requirement. The cycles for one run of the kernel are composed of:

synchronization of synchronous buffers + function initialization + loop count * cycles of each iteration of the loop + preamble and postamble of the loop
Cycles for one run of the kernel can also be profiled in the AI Engine simulator when vectorized code is available.
If multiple AI Engine kernels are placed in a single AI Engine, they run sequentially, one after the other, and they all run once with each iteration of graph::run. This means the following.
- If the AI Engine run-time percentage (specified by the run-time constraint) is allocated for the kernel in each iteration of graph::run (or on an average basis, depending on the system requirement), the kernel performance requirement can be met.
- For a single iteration of graph::run, the kernel takes no more percentage than that specified by the run-time constraint. Otherwise, it might affect the performance of other kernels located in the same AI Engine.
- Even if multiple kernels have a combined run-time ratio of less than one, they are not necessarily placed in a single AI Engine. The mapping of an AI Engine kernel to an AI Engine is also affected by hardware resources. For example, there must be enough program memory to allow the kernels to be in the same AI Engine, and stream interfaces must also be available to allow all the kernels to be in the same AI Engine.
- When multiple kernels are put into the same AI Engine, resources might be saved. For example, the buffers between the kernels in the same AI Engine are single buffers instead of ping-pong buffers.
- Increasing the run-time ratio of a kernel does not necessarily mean that the performance of the kernel or the graph is increased, because the performance is also affected by the data availability to the kernel and the data throughput in and out of the graph. An unreasonably high run-time ratio setting might result in inefficient resource utilization.
- Low run-time ratio does not necessarily limit the performance of the kernel to the specified percentage of the AI Engine. For example, the kernel can run immediately when all the data is available if there is only one kernel in the AI Engine, no matter what run-time ratio is set.
- Kernels in different top-level graphs cannot be placed in the same AI Engine, because the graph API needs to control different graphs independently.
- Set the run-time ratio as accurately as possible, because it affects not only the AI Engine to be used, but also the data communication routes between kernels. It might also affect other design flows, for example, the power estimation.
Recommended Project Directory Structure
The following directory structure and coding practices are recommended for organizing your AI Engine projects to provide clarity and reuse.
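A hypothetical layout following these recommendations is sketched below, using the file names from this chapter's example; the exact names and nesting are illustrative, and the Work directory is generated by the AI Engine compiler rather than created by hand.

```
project/
├── src/
│   ├── kernels/
│   │   ├── kernels.h      # kernel function prototypes
│   │   └── kernels.cc     # kernel definitions
│   ├── project.h          # ADF graph class definition(s)
│   └── project.cpp        # top-level application: graph instance and main
├── data/
│   ├── input.txt          # simulation input
│   └── golden.txt         # reference output
└── Work/                  # AI Engine compiler output (generated)
```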
- All adaptive data flow (ADF) graph class definitions (that is, all ADF graphs derived from the graph class adf::graph) must be located in a header file. Multiple ADF graph definitions can be included in the same header file. This class header file should be included in the main application file where the actual top-level graph is declared at file scope (see Creating a Data Flow Graph (Including Kernels)).
- There should be no dependencies on the order in which header files are included. All header files must be self-contained, each including all the other header files that it needs.
- There should be no file-scoped variable or data-structure definitions in the graph header files. Any definitions (including static ones) must be declared in a separate source file that can be identified in the header property of the kernel where they are referenced (see Look-up Tables).
- There is no need to declare the kernels under extern "C" {...}. However, this declaration can be used in an application meant to run full-program simulation, provided it adheres to the following conditions:
  - If the kernel-function declaration is wrapped with extern "C", then the definition must know about it. This can be done by either including the header file inside the definition file, or wrapping the definition with extern "C".
  - The extern "C" must be wrapped with #ifdef __cplusplus. This is synonymous with how extern "C" is used in stdio.h.
Compiling and Running the Graph from the Command Line
- To compile your graph, execute the following command (see Compiling an AI Engine Graph Application for more details).

aiecompiler project.cpp

The program is called project.cpp. The AI Engine compiler reads the input graph specified, compiles it to the AI Engine array, produces various reports, and generates output files in the Work directory.
- After parsing the C++ input into a graphical intermediate form expressed in JavaScript Object Notation (JSON), the AI Engine compiler performs resource mapping and scheduling analysis, mapping kernel nodes in the graph to processing cores in the AI Engine array and data windows to memory banks. The JSON representation is augmented with this mapping information. Each AI Engine also requires a schedule of all the kernels mapped to it; the input graph is first partitioned into groups of kernels to be mapped to the same core. The output of the mapper can also be viewed as a tabular report in the file project_mapping_analysis_report.txt. This reports the mapping of nodes to processing cores and data windows to memory banks. Inter-processor communication is appropriately double-banked as ping-pong buffers.
- The AI Engine compiler allocates the necessary locks, memory buffers, and DMA channels and descriptors, and generates routing information for mapping the graph onto the AI Engine array. It synthesizes a main program for each core that schedules all the kernels on the cores, and implements the necessary locking mechanism and data copy among buffers. The C program for each core is compiled using the Synopsys Single Core Compiler to produce loadable ELF files. The AI Engine compiler also generates control APIs to control graph initialization, execution, and termination from the main application, and a simulator configuration script, scsim_config.json. These are all stored within the Work directory under various sub-folders (see Compiling an AI Engine Graph Application for more details).
- After the compilation of the AI Engine graph, the AI Engine compiler writes a summary of compilation results called <graph-file-name>.aiecompile_summary that can be viewed in the Vitis analyzer. The summary contains a collection of reports and diagrams reflecting the state of the AI Engine application implemented in the compiled build. The summary is written to the working directory of the AI Engine compiler as specified by the --workdir option, which defaults to ./Work. To open the AI Engine compiler summary, use the following command:

vitis_analyzer ./Work/graph.aiecompile_summary
- To run the graph, execute the following command (see Simulating an AI Engine Graph Application for more details).

aiesimulator --pkg-dir=./Work

This starts the SystemC-based simulator with the control program being the main application. The graph APIs used in the control program configure the AI Engine array, including setting up static routing, programming the DMAs, and loading the ELF files onto the individual cores, and then initiate AI Engine array execution. At the end of the simulation, the output data is produced in the directory aiesimulator_output, and it should match the reference data.
The graph can be loaded at device boot time in hardware or through the host application. Details on deploying the graph in hardware and the flow associated with it are described in Integrating the Application Using the Vitis Tools Flow.