AI Engine Programming

An AI Engine program consists of a Data Flow Graph Specification written in C++. As described in C++ Template Support, you can use template classes or functions for writing the AI Engine graph or kernels. The application can be compiled and executed using the AI Engine tool chain. This chapter provides an introduction to writing an AI Engine program.

A complete class reference guide is shown in Adaptive Data Flow Graph Specification Reference. The example that is used in this chapter can be found as a template example in the Vitis™ environment when creating a new AI Engine project.

Prepare the Kernels

Kernels are computation functions that form the fundamental building blocks of the data flow graph specifications. Kernels are declared as ordinary C/C++ functions that return void and can use special data types as arguments (discussed in Window and Streaming Data API). Each kernel should be defined in its own source file. This organization is recommended for reusability and faster compilation. Furthermore, the kernel source files should include all relevant header files to allow for independent compilation. It is recommended that a header file (kernels.h in this documentation) should declare the function prototypes for all kernels used in a graph. An example is shown below.

#ifndef FUNCTION_KERNELS_H
#define FUNCTION_KERNELS_H

void simple(input_window_cint16 * in, output_window_cint16 * out);

#endif

In the example, the #ifndef and #endif are present to ensure that the include file is only included once, which is good C/C++ practice.
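
The matching definition lives in its own source file (kernels.cc in this documentation). The body below is only an assumed pass-through implementation, shown here to illustrate the window read/write APIs; the real kernel can perform any computation on the window contents.

#include <adf.h>
#include "kernels.h"

// Pass-through kernel (illustration only): copies 32 cint16 samples
// (128 bytes, matching the window size used later in the graph) from
// the input window to the output window.
void simple(input_window_cint16 * in, output_window_cint16 * out) {
  for (unsigned i = 0; i < 32; i++) {
    cint16 sample = window_readincr(in);   // read one sample and advance
    window_writeincr(out, sample);         // write one sample and advance
  }
}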

Creating a Data Flow Graph (Including Kernels)

The following process describes how to construct data flow graphs in C++.
  1. Define your application graph class in a separate header file (for example project.h). First, add the Adaptive Data Flow (ADF) header (adf.h) and include the kernel function prototypes. The ADF library includes all the required constructs for defining and executing the graphs on AI Engines.
    #include <adf.h>
    #include "kernels.h"
  2. Define your graph class by using the objects which are defined in the adf name space. All user graphs are derived from the class graph.
    #include <adf.h>
    #include "kernels.h"
    
    using namespace adf;
    
    class simpleGraph : public graph {
    private:
      kernel first;
      kernel second;
    };

    This is the beginning of a graph class definition that declares two kernels (first and second).

  3. Add some top-level ports to the graph.
    #include <adf.h>
    #include "kernels.h"
    
    using namespace adf;
    
    class simpleGraph : public graph {
    private:
      kernel first;
      kernel second;
    public:
      input_port in;
      output_port out;
    
    };
  4. Use the kernel::create function to instantiate the first and second C++ kernel objects using the functionality of the C function simple.
    #include <adf.h>
    #include "kernels.h"
    
    using namespace adf;
    
    class simpleGraph : public graph {
    private:
      kernel first;
      kernel second;
    public:
      input_port in;
      output_port out;
      simpleGraph() {
          first = kernel::create(simple);
          second = kernel::create(simple);
      }
    };
  5. Add the connectivity information, which is equivalent to the nets in a data flow graph. In this description, ports are referenced by indices. The first input window or stream argument in the simple function is assigned index 0 in an array of input ports (in). Subsequent input arguments take ascending consecutive indices. The first output window or stream argument in the simple function is assigned index 0 in an array of output ports (out). Subsequent output arguments take ascending consecutive indices.
    #include <adf.h>
    #include "kernels.h"
    
    using namespace adf;
    
    class simpleGraph : public graph {
    private:
      kernel first;
      kernel second;
    public:
      input_port in;
      output_port out;
    
      simpleGraph() {
        first = kernel::create(simple);
        second = kernel::create(simple);
        connect< window<128> > net0 (in, first.in[0]);
        connect< window<128> > net1 (first.out[0], second.in[0]);
        connect< window<128> > net2 (second.out[0], out);
      }
    };

    As shown, the input port from the top level is connected to the input port of the first kernel, the output port of the first kernel is connected to the input port of the second kernel, and the output port of the second kernel is connected to the output exposed at the top level. The first kernel fires when 128 bytes of data (32 complex cint16 samples) have been collected in a buffer from an external source; this window size is specified on connection net0. Likewise, the second kernel fires when its input window holds 128 bytes of valid data produced as the output of the first kernel, as expressed by connection net1. Finally, the output of the second kernel is connected to the top-level output port by connection net2, specifying that each invocation of the second kernel produces 128 bytes of data.
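
    To make the indexing rule concrete, consider a hypothetical kernel with two input windows and one output window (this kernel is not part of the example; it only shows how argument order maps onto the in[] and out[] port arrays):

    // Hypothetical kernel prototype (illustration only).
    void mix(input_window_cint16 * a, input_window_cint16 * b, output_window_cint16 * out);

    // Inside a graph constructor, the ports of kernel m = kernel::create(mix)
    // would be addressed as:
    //   a   -> m.in[0]    (first input window argument)
    //   b   -> m.in[1]    (second input window argument)
    //   out -> m.out[0]   (first output window argument)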

  6. Set the source file and tile usage for each of the kernels. The source file kernels.cc contains the source code for kernels first and second. Then set the run-time ratio, which is the ratio of the function run time to the cycle budget; it must be between 0 and 1. The cycle budget is the number of instruction cycles a function can take to either consume data from its input (when dealing with a rate-limited input data stream), or to produce a block of data on its output (when dealing with a rate-limited output data stream). This cycle budget can be affected by changing the block sizes. An illustrative calculation follows the code below.
    #include <adf.h>
    #include "kernels.h"
    
    using namespace adf;
    
    class simpleGraph : public graph {
    private:
      kernel first;
      kernel second;
    public:
      input_port in;
      output_port out;
      simpleGraph(){
        
        first = kernel::create(simple);
        second = kernel::create(simple);
        connect< window<128> > net0 (in, first.in[0]);
        connect< window<128> > net1 (first.out[0], second.in[0]);
        connect< window<128> > net2 (second.out[0], out);
    
        source(first) = "kernels.cc";
        source(second) = "kernels.cc";
    
        runtime<ratio>(first) = 0.1;
        runtime<ratio>(second) = 0.1;
    
      }
    };
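
    As an illustration only (these cycle counts are assumed, not measured): if one invocation of simple takes roughly 1,000 cycles to process a 128-byte window, and the input data rate allows a budget of about 10,000 cycles per window, the run-time ratio would be 1,000 / 10,000 = 0.1, which is the value used above. A ratio well below 1 leaves room for the AI Engine compiler to place more than one kernel on the same core.
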
  7. Define a top-level application file (for example, project.cpp) that contains an instance of your graph class and connects the graph to a simulation platform to provide file input and output. In this example, these files are called input.txt and output.txt.
    #include "project.h"
    
    simpleGraph mygraph;
    simulation::platform<1,1> platform("input.txt","output.txt");
    connect<> net0(platform.src[0], mygraph.in);
    connect<> net1(mygraph.out, platform.sink[0]);
    
    int main(void) {
      adf::return_code ret;
      mygraph.init();
      ret=mygraph.run(<number_of_iterations>);
      if(ret!=adf::ok){
        printf("Run failed\n");
        return ret;
      }
      ret=mygraph.end();
      if(ret!=adf::ok){
        printf("End failed\n");
        return ret;
      }
      return 0;
    }
IMPORTANT: By default, mygraph.run() with no argument specifies a graph that runs forever. The AI Engine compiler generates code to execute the data flow graph in a perpetual while loop. To limit the execution of the graph for debugging and test, specify mygraph.run(<number_of_iterations>) in the main application. The specified number of iterations can be one or more.
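
For example, replacing the placeholder with a concrete count (four here, chosen arbitrarily for illustration) limits the graph to four iterations, where one iteration corresponds to one invocation of every kernel in the graph:

ret = mygraph.run(4);   // execute four iterations of the data flow graph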

ADF APIs return the enumerated type return_code to indicate the status of the API call.

The main program is the driver for the graph. It is used to load, execute, and terminate the graph. See Run-Time Graph Control API for more details.

Recommended Project Directory Structure

The following directory structure and coding practices are recommended for organizing your AI Engine projects to provide clarity and reuse.
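
For the simple example used in this chapter, one possible layout is shown below. The file names follow the earlier sections; the layout itself is only a suggestion.

project/
  project.cpp    // top-level application: graph instance, simulation platform, main()
  project.h      // ADF graph class definition(s)
  kernels.h      // kernel function prototypes
  kernels.cc     // kernel function definitions
  Work/          // output directory generated by the AI Engine compiler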

  • All adaptive data flow (ADF) graph class definitions, that is, all the ADF graphs that are a subclass of graph, must be located in a header file. Multiple ADF graph definitions can be included in the same header file. This class header file should be included in the main application file where the actual top-level graph is declared in the file scope (see Creating a Data Flow Graph (Including Kernels)).
  • There should be no dependencies on the order in which the header files are included. All header files must be self-contained and include all the other header files that they need.
  • There should be no file scoped variable or data-structure definitions in the graph header files. Any definitions (including static) must be declared in a separate source file that can be identified in the header property of the kernel where they are referenced (see Look-up Tables).
  • There is no need to declare the kernels under extern "C" {...}. However, this declaration can be used in an application meant to run full-program simulation, provided it adheres to the following conditions (a sketch follows this list):
    • If the kernel-function declaration is wrapped with extern "C", then the definition must know about it. This can be done by either including the header file inside the definition file, or wrapping the definition with extern "C".
    • The extern "C" must be wrapped with #ifdef __cplusplus. This is synonymous to how extern "C" is used in stdio.h.

Compiling and Running the Graph from the Command Line

  1. To compile your graph, execute the following command (see Compiling an AI Engine Graph Application for more details).
    aiecompiler project.cpp

    The program is called project.cpp. The AI Engine compiler reads the input graph specified, compiles it to the AI Engine array, produces various reports, and generates output files in the Work directory.
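
    If the output location needs to differ from the default, the --workdir option (described further in step 4) can be passed explicitly; for example, the following is equivalent to the default behavior:
    aiecompiler --workdir=./Work project.cpp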

  2. After parsing the C++ input into a graphical intermediate form expressed in JavaScript Object Notation (JSON), the AI Engine compiler performs resource mapping and scheduling analysis, mapping kernel nodes in the graph to processing cores in the AI Engine array and data windows to memory banks. The JSON representation is augmented with this mapping information. Each AI Engine also requires a schedule of all the kernels mapped to it.

    The input graph is first partitioned into groups of kernels to be mapped to the same core.

    The output of the mapper can also be viewed as a tabular report in the file project_mapping_analysis_report.txt. This reports the mapping of nodes to processing cores and data windows to memory banks. Inter-processor communication is appropriately double-banked as ping-pong buffers.

  3. The AI Engine compiler allocates the necessary locks, memory buffers, and DMA channels and descriptors, and generates routing information for mapping the graph onto the AI Engine array. It synthesizes a main program for each core that schedules all the kernels on the cores, and implements the necessary locking mechanism and data copy among buffers. The C program for each core is compiled using the Synopsys Single Core Compiler to produce loadable ELF files. The AI Engine compiler also generates control APIs to control the graph initialization, execution and termination from the main application and a simulator configuration script scsim_config.json. These are all stored within the Work directory under various sub-folders (see Compiling an AI Engine Graph Application for more details).
  4. After the compilation of the AI Engine graph, the AI Engine compiler writes a summary of compilation results called <graph-file-name>.aiecompile_summary to view in the Vitis analyzer. The summary contains a collection of reports, and diagrams reflecting the state of the AI Engine application implemented in the compiled build. The summary is written to the working directory of the AI Engine compiler as specified by the --workdir option, which defaults to ./Work.

    To open the AI Engine compiler summary, use the following command:

    vitis_analyzer ./Work/graph.aiecompile_summary
  5. To run the graph, execute the following command (see Simulating an AI Engine Graph Application for more details).
    aiesimulator --pkg-dir=./Work

    This starts the SystemC-based simulator with the control program being the main application. The graph APIs used in the control program configure the AI Engine array (setting up static routing, programming the DMAs, and loading the ELF files onto the individual cores) and then initiate execution of the AI Engine array.

    At the end of the simulation, the output data is produced in the aiesimulator_output directory and should match the reference data.
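
    One simple way to verify the result is a textual comparison with the reference file. The commands below are only a sketch: they assume the output keeps the name given to the simulation platform (output.txt), that the reference data lives in a hypothetical file golden.txt, and that the simulator has interleaved timestamp lines (beginning with T) with the data values, which are stripped before comparing.

    grep -v '^T' ./aiesimulator_output/output.txt > output_data.txt
    diff -w output_data.txt golden.txt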