Run-Time Graph Control API
This chapter describes the control APIs that can be used to initialize, run, update, and control the graph execution from an external controller. This chapter also describes how run-time parameters can be specified in the input graph specification that affect the data processing within the kernels and change the control flow of the overall graph synchronously or asynchronously.
Graph Execution Control
In Versal™ ACAPs with AI Engines, the processing system (PS) can be used to dynamically load, monitor, and control the graphs that are executing on the AI Engine array. Even if the AI Engine graph is loaded once as a single bitstream image, the PS program can be used to monitor the state of the execution and modify the run-time parameters of the graph.
The graph base class provides a number of API methods to control the initialization and execution of the graph that can be used in the main program. See Adaptive Data Flow Graph Specification Reference for more details.
Basic Iterative Graph Execution
The following example illustrates how to use the graph control APIs to initialize, run, wait, and terminate a graph for a specific number of iterations. A graph object mygraph is declared using a pre-defined graph class called simpleGraph. Then, in the main application, this graph object is initialized and run. The init() method loads the graph onto the AI Engine array at the prespecified AI Engine tiles. This includes loading the ELF binaries for each AI Engine, configuring the stream switches for routing, and configuring the DMAs for I/O. It leaves the processors in a disabled state. The run() method starts the graph execution by enabling the processors. The run API is where a specific number of iterations of the graph can be run by supplying a positive integer argument at run time. This form is useful for debugging your graph execution.
#include "project.h"
simpleGraph mygraph;
int main(void) {
mygraph.init();
mygraph.run(3); // run 3 iterations
mygraph.wait(); // wait for 3 iterations to finish
mygraph.run(10); // run 10 iterations
mygraph.end(); // wait for 10 iterations to finish
return 0;
}
wait() is used to wait for the first run to finish before starting the second run. wait has the same blocking effect as end, except that it allows re-running the graph without having to re-initialize it. Calling run back-to-back without an intervening wait for the previous run to finish can have an unpredictable effect, because the run API modifies the loop bounds of the active processors of the graph.
Finite Execution of Graph
For finite graph execution, the graph state is maintained across graph.run(n) calls. The AI Engine is not reinitialized and memory contents are not cleared after graph.run(n). In the following code example, after the first run of three invocations, the core-main wrapper code is left in a state where the kernel will start with the pong buffer in the next run (of ten iterations). The ping-pong buffer selector state is left as-is. graph.end() does not clean up the graph state (specifically, it does not re-initialize global variables), nor does it clean up stream switch configurations; it merely exits the core-main. To re-run the graph, you have to reload the PDI/XCLBIN.
#include "project.h"
simpleGraph mygraph;
int main(void) {
mygraph.init();
mygraph.run(3); // run 3 iterations
mygraph.wait(); // wait for 3 iterations to finish
mygraph.run(10); // run 10 iterations
mygraph.end(); // wait for 10 iterations to finish
return 0;
}
Infinite Graph Execution
The following example illustrates how to run the graph infinitely.
#include "project.h"
simpleGraph mygraph;
int main(void) {
mygraph.init(); // load the graph
mygraph.run(); // start the graph
return 0;
}
A graph object mygraph is declared using a pre-defined graph class called simpleGraph. Then, in the main application, this graph object is initialized and run. The init() method loads the graph onto the AI Engine array at the prespecified AI Engine tiles. This includes loading the ELF binaries for each AI Engine, configuring the stream switches for routing, and configuring the DMAs for I/O. It leaves the processors in a disabled state. The run() method starts the graph execution by enabling the processors. This graph runs forever because the number of iterations to be run is not provided to the run() method.
graph::run() without an argument runs the AI Engine kernels for the number of iterations specified in the previous run call; if no iteration count has ever been specified, the graph runs infinitely. If the graph is run with a finite number of iterations, for example,
mygraph.run(3);
mygraph.run();
the second run call also runs for three iterations.
Parallel Graph Execution
Among the above API methods, only the wait() and end() methods are blocking operations that can block the main application indefinitely. Therefore, if you declare multiple graphs at the top level, you need to interleave the APIs suitably to execute the graphs in parallel, as shown in the following example.
#include "project.h"
simpleGraph g1, g2, g3;
int main(void) {
g1.init(); g2.init(); g3.init();
g1.run(<num-iter>); g2.run(<num-iter>); g3.run(<num-iter>);
g1.end(); g2.end(); g3.end();
return 0;
}
Each graph can be started (run) only after it has been initialized (init). Also, to get parallel execution, all the graphs must be started (run) before any graph is waited upon for termination (end).
Timed Execution
In multi-rate graphs, not all kernels need to execute the same number of iterations. In such situations, a timed execution model is more suitable for testing. There are variants of the wait and end APIs that take a positive integer specifying a cycle timeout. This is the number of AI Engine cycles that the API call blocks before disabling the processors and returning. The blocking condition does not depend on any graph termination event; the graph can be in an arbitrary state when the timeout expires.
#include "project.h"
simpleGraph mygraph;
int main(void) {
mygraph.init();
mygraph.run();
mygraph.wait(10000); // wait for 10000 AI Engine cycles
mygraph.resume(); // continue executing
mygraph.end(15000); // wait for another 15000 cycles and terminate
}
resume() is used to resume execution from the point at which it was stopped after the first timeout. resume only resets the timer and enables the AI Engines. Calling resume after the AI Engine execution has already terminated has no effect.
Run-Time Parameter Specification
The data flow graphs shown until now are defined completely statically. However, in real situations you might need to modify the behavior of the graph based on some dynamic condition or event. The required modification could be in the data being processed, for example a modified mode of operation or a new coefficient table, or it could be in the control flow of the graph, such as conditional execution or dynamically reconfiguring a graph with another graph. Run-time parameters are useful in such situations. Either the kernels or the graphs can be defined to execute with parameters. Additional graph APIs are also provided to update or read these parameter values while the graph is running.
Two types of run-time parameters are supported. The first is the asynchronous or sticky parameters which can be changed at any time by either a controlling processor such as the Processing System (PS), or by another AI Engine kernel. They are read each time a kernel is invoked without any specific synchronization. These types of parameters can be used as filter coefficients that change infrequently, for example.
Synchronous or triggering parameters are the other type of supported run-time parameters. A kernel that requires a triggering parameter does not execute until these parameters have been written by a controlling processor. Upon a write, the kernel executes once, reading the new updated value. After completion, the kernel is blocked from executing until the parameter is updated again. This allows a different type of execution model from the normal streaming model, which can be useful for certain updating operations where blocking synchronization is important.
Run-time parameters can either be scalar values or array
values. In the case where a controlling processor (such as the PS) is responsible for
the update, the graph.update()
API should be used.
Specifying Run-Time Data Parameters
Parameter Inference
If an integer scalar value appears in the formal arguments of a kernel
function, then that parameter becomes a run-time parameter. In the following
example, the argument select
is a run-time
parameter.
#ifndef FUNCTION_KERNELS_H
#define FUNCTION_KERNELS_H
void simple_param(input_window_cint16 * in, output_window_cint16 * out, int select);
#endif
Scalar run-time parameters are supported for the following data types: int8, int16, int32, int64, uint8, uint16, uint32, uint64, cint16, cint32, float, and cfloat. Run-time parameters can also be arrays, as shown in the following filter_with_array_param function.
#ifndef FUNCTION_KERNELS_H
#define FUNCTION_KERNELS_H
void filter_with_array_param(input_window_cint16 * in, output_window_cint16 * out, const int32 (&coefficients)[32]);
#endif
Implicit ports are inferred for each parameter in the function argument, including the array parameters. The following table describes the type of port inferred for each function argument.
Formal Parameter | Port Class
---|---
T | Input
const T | Input
T & | Inout
const T & | Input
const T (&)[…] | Input
T (&)[…] | Inout
From the table, you can see that when the AI Engine cannot make externally visible changes to the function parameter, an input port is inferred. When the formal parameter is passed by value, a copy is made, so changes to that copy are not externally visible. When a parameter is passed with a const qualifier, the parameter cannot be written, so these are also treated as input ports.
When the AI Engine kernel is passed a parameter reference and it is able to modify it, an inout port is inferred and can be used to pass parameters between AI Engine kernels or to allow reading back of results from the control processor.
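As a hedged illustration only (the kernel and argument names below are hypothetical, and the window arguments are shown just for context), the following declaration sketches how each argument form from the table maps to an inferred port class:
void port_class_demo(input_window_cint16 * in,    // data input window (not a run-time parameter)
                     output_window_cint16 * out,  // data output window (not a run-time parameter)
                     int32 mode,                  // T passed by value  -> input port
                     const int32 &gain,           // const T &          -> input port
                     int32 &status,               // T &                -> inout port
                     const int32 (&coeffs)[16],   // const T (&)[...]   -> input port
                     int32 (&history)[16]);       // T (&)[...]         -> inout port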
The value of an inout port can be read back by the controlling processor using graph::read(). An inout port cannot be updated by graph::update(). If a parameter needs to be both written by the controlling processor and read back, it must appear twice in the kernel arg list, once as an input and once as an inout, for example, kernel_function(int32 foo_in, int32 &foo_out).
Parameter Hookup
Both input and inout run-time parameter ports can be connected to corresponding hierarchical ports in their enclosing graph. This is the mechanism by which parameters are exposed for run-time modification. In the following graph, an instance is created of the previously defined simple_param kernel. This kernel has two input ports and one output port. The first argument to appear in the argument list, in[0], is an input window. The second argument is an output window. The third argument is a run-time parameter (it is not a window or stream type) and is inferred as an input parameter, in[1], because it is passed by value.
In the following graph definition, a simple_param kernel is instantiated and windows are connected to in[0] and out[0] (the input and output windows of the kernel). The run-time parameter is connected to the graph input port, select_value.
class parameterGraph : public graph {
private:
  kernel first;
public:
  input_port select_value;
  input_port in;
  output_port out;
  parameterGraph() {
    first = kernel::create(simple_param);
    connect< window<32> >(in, first.in[0]);
    connect< window<32> >(first.out[0], out);
    connect<parameter>(select_value, first.in[1]);
  }
};
An array parameter can be hooked up in the same way. The compiler automatically allocates space for the array data so that it is accessible from the processor where this kernel gets mapped.
class arrayParameterGraph : public graph {
private:
  kernel first;
public:
  input_port coeffs;
  input_port in;
  output_port out;
  arrayParameterGraph() {
    first = kernel::create(filter_with_array_param);
    connect< window<32> >(in, first.in[0]);
    connect< window<32> >(first.out[0], out);
    connect<parameter>(coeffs, first.in[1]);
  }
};
Input Parameter Synchronization
The default behavior for input run-time parameter ports is triggering behavior. This means that the parameter plays a part in the rules that determine when a kernel can fire. In this graph example, the kernel only fires when three conditions are met:
- A valid window of 32 bytes of input data is available
- An empty window of 32 bytes is available for the output data
- A write to the input parameter takes place
In triggering mode, a single write to the input parameter allows the kernel to fire once, setting the input parameter value on every individual kernel call.
There is an alternative mode that allows input kernel parameters to be set asynchronously. To specify that parameters update asynchronously, use the async modifier when connecting a port.
connect<parameter>(param_port, async(first.in[1]));
When a kernel port is designated as asynchronous, it no longer plays a role in the firing rules for the kernel. When the parameter is written once, the value is observed in subsequent firings. At any time, the PS can write a new value for the run-time parameter. That value is observed on the next and any subsequent kernel firing.
Inout Parameter Synchronization
The default behavior for inout run-time parameter ports is asynchronous behavior. This means that the parameter can be read back by the controlling processor or another kernel, but the producer kernel execution is not affected. For synchronous behavior from the inout parameter, where the kernel blocks until the parameter value is read out on each invocation of the kernel, use the sync modifier when connecting the inout port to the enclosing graph, as follows.
connect<parameter>(sync(first.out[1]), param_port);
Run-Time Parameter Update/Read Mechanisms
This section describes the mechanisms to update or read back the run-time parameters. For these types of applications, it is usually better not to specify an iteration limit at compile time to allow the cores to run freely and monitor the effect of the parameter change.
Parameter Update/Read Using Graph APIs
In default compilation mode, the main
application is compiled as a separate control thread which needs to be executed on
the PS in parallel with the graph executing on the AI Engine array. The main
application
can use update and read APIs to access run-time parameters declared within the
graphs at any level. This section describes these APIs using examples.
Synchronous Update/Read
The following code shows the main application for the parameterGraph example (which contains the simple_param kernel) described in Specifying Run-Time Data Parameters.
#include "param.h"
parameterGraph mygraph;
int main(void) {
mygraph.init();
mygraph.run(2);
mygraph.update(mygraph.select_value, 23);
mygraph.update(mygraph.select_value, 45);
mygraph.end();
return 0;
}
In this example, the graph mygraph is initialized first and then run for two iterations. It has a triggered input parameter port select_value that must be updated with a new value for each invocation of the receiving kernel. The first argument of the update API identifies the port to be updated and the second argument provides the value. Several other forms of the update API are supported, based on the direction of the port, its data type, and whether it is a scalar or array parameter; see Adaptive Data Flow Graph Specification Reference.
If the program is compiled with a fixed number of test iterations, then for triggered parameters the number of update API calls in the main program must match the number of test iterations; otherwise, the simulation could be left waiting for additional updates. For asynchronous parameters, updates are done asynchronously with the graph execution, and the kernel uses the old value if an update has not been made.
Additionally, if the previous graph was compiled with a synchronous inout parameter, the update and read calls must be interleaved as shown in the following example.
#include "param.h"
parameterGraph mygraph;
int main(void) {
int result0, result1;
mygraph.init();
mygraph.run(2);
mygraph.update(mygraph.select_value, 23);
mygraph.read(mygraph.result_out, result0);
mygraph.update(mygraph.select_value, 45);
mygraph.read(mygraph.result_out, result1);
mygraph.end();
return 0;
}
In this example, it is assumed that the graph produces a scalar result every iteration through the inout port result_out. The read API is used to read out the value of this port synchronously after each iteration. The first argument of the read API is the graph inout port to be read back and the second argument is the location where the value will be stored (passed by reference).
The synchronous protocol ensures that the read operation waits for the value to be produced by the graph before sampling it, and that the graph waits for the value to be read before proceeding to the next iteration. This is why it is important to interleave the update and read operations.
Asynchronous Update/Read
When an input parameter is specified with the asynchronous protocol, the kernel execution waits for the first update to happen for parameter initialization. However, an arbitrary number of kernel invocations can take place before the next update. This is usually the intent of an asynchronous update during application deployment. For debugging, though, the wait API can be used to finish a predetermined set of iterations before the next update, as shown in the following example.
#include "param.h"
asyncGraph mygraph;
int main(void) {
int result0, result1;
mygraph.init();
mygraph.update(mygraph.select_value, 23);
mygraph.run(5);
mygraph.wait();
mygraph.update(mygraph.select_value, 45);
mygraph.run(15);
mygraph.end();
return 0;
}
In the previous example, after the initial update, five iterations are run to completion followed by another update, then followed by another set of 15 iterations. If the graph has asynchronous inout ports, that data can also be read back immediately after the wait (or end).
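For example, if asyncGraph also declared an asynchronous inout port (a hypothetical result_out port is assumed here, mirroring the earlier synchronous example), the produced value could be sampled right after the wait:
mygraph.run(5);
mygraph.wait();                            // all five iterations have finished
mygraph.read(mygraph.result_out, result0); // sample the asynchronous inout value
mygraph.update(mygraph.select_value, 45);  // then supply the next parameter value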
Another template for asynchronous updates is to use timeouts in the wait API, as shown in the following example.
#include "param.h"
asyncGraph mygraph;
int main(void) {
int result0, result1;
mygraph.init();
mygraph.run();
mygraph.update(mygraph.select_value, 23);
mygraph.wait(10000);
mygraph.update(mygraph.select_value, 45);
mygraph.resume();
mygraph.end(15000);
return 0;
}
In this example, the graph is set up to run forever. However, after the run API is called, the kernel still waits for the first update to happen for parameter initialization. The graph then runs for approximately 10,000 cycles before allowing the control thread to make another update. The new update takes effect at the next kernel invocation boundary. The graph is then allowed to run for another 15,000 cycles before terminating.
Chained Updates Between AI Engine Kernels
The previous run-time parameter examples highlight the ability to do run-time parameter updates from the control processor. It is also possible to propagate parameter updates between AI Engines. If an inout port on a kernel is connected to an input port on another kernel, then a chain of updates can be triggered through multiple AI Engines. Consider the two kernels defined in the following code. The producer has an input port that reads a scalar integer and an inout port that can read and write to an array of 32 integers. The consumer has an input port that can read an array of coefficients, and an output port that can write a window of data.
#ifndef FUNCTION_KERNELS_H
#define FUNCTION_KERNELS_H
void producer(const int32 &, int32 (&)[32] );
void consumer(const int32 (&)[32], output_window_cint16 *);
#endif
As shown in the following graph, the PS updates the scalar input of the producer kernel. When the producer kernel is run, it automatically triggers execution of the consumer kernel (when a buffer is available for the output data).
#include <adf.h>
#include "kernels.h"
using namespace adf;
class chainedGraph : public graph {
private:
  kernel first;
  kernel second;
public:
  input_port select_value;
  output_port out;
  chainedGraph() {
    first = kernel::create(producer);
    second = kernel::create(consumer);
    connect< window<32> >(second.out[0], out);
    connect<parameter>(select_value, first.in[0]);
    connect<parameter>(first.inout[0], second.in[0]);
  }
};
If the intention is to make a one-time update of values that are used in continuous processing of streams, the consumer parameter input port can use the async modifier to ensure that it runs continuously (once a parameter has been provided).
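Applied to the chainedGraph above, this amounts to adding the async modifier on the consumer input of the chained connection (a minimal variant of the connection shown earlier):
connect<parameter>(first.inout[0], async(second.in[0]));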
Run-Time Graph Reconfiguration Using Control Parameters
The run-time parameters are also used to switch the flow of data within the graph and provide alternative routes for processing dynamically. The most basic version of this type of processing is a kernel bypass that allows the data to be processed or pass-through based on a run-time parameter (see Kernel Bypass). This can be useful, for example, in multi-modal applications where switching from one mode to another requires bypassing a kernel.
Bypass Control Using Run-Time Parameters
The following figure shows an application supporting two channels of signal data, where one is split into two channels of lower bandwidth while the other must continue to run undisturbed. This type of dynamic reconfiguration is common in wireless applications.
In the figure, the first channel processes LTE20 data unchanged, while the middle channel is dynamically split into two LTE10 channels. The control parameters marked as carrier configuration RTP are used to split the data processing on a block boundary. When the middle channel is operating as an LTE20 channel, the 11-tap half-band kernel is bypassed. However, when the bandwidth of the middle channel is split between itself and the third channel forming two LTE10 channels, both of them need a 3-stage filter chain before the data can be mixed together. This is achieved by switching the 11-tap half-band filter back into the flow and reconfiguring the mixer to handle three streams of data instead of two.
The top-level input graph specification for the above application is shown in the following code.
class lte_reconfig : public graph {
private:
  kernel demux;
  kernel cf[3];
  kernel interp0[3];
  kernel interp1[2];
  bypass bphb11;
  kernel delay;
  kernel delay_byp;
  bypass bpdelay;
  kernel mixer;
public:
  input_port in;
  input_port fromPS;
  output_port out;
  lte_reconfig() {
    // demux also handles the control
    demux = kernel::create(demultiplexor);
    connect< window<1536> >(in, demux.in[0]);
    connect< parameter >(fromPS, demux.in[1]);
    runtime<ratio>(demux) = 0.1;
    source(demux) = "kernels/demux.cc";
    // instantiate all channel kernels
    for (int i=0; i<3; i++) {
      cf[i] = kernel::create(fir_89t_sym);
      source(cf[i]) = "kernels/fir_89t_sym.cc";
      runtime<ratio>(cf[i]) = 0.12;
    }
    for (int i=0; i<3; i++) {
      interp0[i] = kernel::create(fir_23t_sym_hb_2i);
      source(interp0[i]) = "kernels/hb23_2i.cc";
      runtime<ratio>(interp0[i]) = 0.1;
    }
    for (int i=0; i<2; i++) {
      interp1[i] = kernel::create(fir_11t_sym_hb_2i);
      source(interp1[i]) = "kernels/hb11_2i.cc";
      runtime<ratio>(interp1[i]) = 0.1;
    }
    bphb11 = bypass::create(interp1[0]);
    mixer = kernel::create(mixer_dynamic);
    source(mixer) = "kernels/mixer_dynamic.cc";
    runtime<ratio>(mixer) = 0.4;
    delay = kernel::create(sample_delay);
    source(delay) = "kernels/delay.cc";
    runtime<ratio>(delay) = 0.1;
    delay_byp = kernel::create(sample_delay);
    source(delay_byp) = "kernels/delay.cc";
    runtime<ratio>(delay_byp) = 0.1;
    bpdelay = bypass::create(delay_byp);
    // Graph connections
    for (int i=0; i<3; i++) {
      connect< window<512, 352> >(demux.out[i], cf[i].in[0]);
      connect< parameter >(demux.inout[i], cf[i].in[1]);
    }
    connect< parameter >(demux.inout[3], bphb11.bp);
    connect< parameter >(demux.inout[3], negate(bpdelay.bp));
    for (int i=0; i<3; i++) {
      connect< window<512, 64> >(cf[i].out[0], interp0[i].in[0]);
      connect< parameter >(cf[i].inout[0], interp0[i].in[1]);
    }
    // chan0 is LTE20 and is output right away
    connect< window<1024, 416> >(interp0[0].out[0], delay.in[0]);
    connect< window<1024> >(delay.out[0], mixer.in[0]);
    // chan1 is LTE20/10 and uses bypass
    connect< window<1024, 32> >(interp0[1].out[0], bphb11.in[0]);
    connect< parameter >(interp0[1].inout[0], bphb11.in[1]);
    connect< window<1024, 416> >(bphb11.out[0], bpdelay.in[0]);
    connect< window<1024> >(bpdelay.out[0], mixer.in[1]);
    // chan2 is LTE10 always
    connect< window<512, 32> >(interp0[2].out[0], interp1[1].in[0]);
    connect< parameter >(interp0[2].inout[0], interp1[1].in[1]);
    connect< window<1024> >(interp1[1].out[0], mixer.in[2]);
    // Mixer
    connect< parameter >(demux.inout[3], mixer.in[3]);
    connect< window<1024> >(mixer.out[0], out);
  }
};
The bypass specification is coded as a special encapsulator over the kernel to be bypassed. The port signature of the bypass matches the port signature of the kernel that it encapsulates. It also receives a run-time parameter to control the bypass: 0 for no bypass and 1 for bypass. The control can also be inverted by using the negate function, as shown.
The bypass parameter port of this graph is an ordinary scalar run-time parameter and can be driven by another kernel or by the Arm® processor using the interactive or scripted mechanisms described in Run-Time Parameter Update/Read Mechanisms. This can also be connected hierarchically by embedding it into an enclosing graph.
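As a hedged sketch only, the following main application drives the reconfiguration from the PS. The header name and the control values are assumptions: the demultiplexor kernel defines how the control word is interpreted, and the timing pattern assumes the fromPS parameter behaves asynchronously (for example, connected with the async modifier) so that the graph keeps running between updates.
#include "project.h" // assumed top-level header declaring lte_reconfig
lte_reconfig mygraph;
int main(void) {
  mygraph.init();
  mygraph.run();
  mygraph.update(mygraph.fromPS, 0); // illustrative value: middle channel runs as LTE20 (11-tap filter bypassed)
  mygraph.wait(10000);               // let the graph run for a while
  mygraph.update(mygraph.fromPS, 1); // illustrative value: split into two LTE10 channels (filter switched in)
  mygraph.resume();
  mygraph.end(15000);                // run for another 15000 cycles, then terminate
  return 0;
}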
Sharing Run-Time Parameters Across Multiple Kernels
The run-time parameter to switch channels in the previous graph is shared by the bypass encapsulator and the mixer kernel. Both entities need to see the same switched value at the same data boundary. When the nodes sharing the run-time parameter are mapped to the same AI Engine, switching of the parameter value is synchronized because each node, mapped to the same processor, processes the current set of data before any node processes the next set of data. However, when the sharing kernels are mapped to different processors, they can execute in a pipelined fashion on different sets of data, as shown in the following figure. Then, the run-time parameter should be pipelined along with the data.
In the current release, you need to pipeline the control parameter through the kernel by making it an inout parameter on the producing kernel connected to an input parameter on the consuming kernel. Pipelining across processors can be intermixed with a one-to-many broadcast connection within a single processor to create arbitrary control topologies.
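For example (the kernel instance names here are illustrative), the control value enters the producing kernel as an input parameter and is re-exposed as an inout parameter that feeds the next kernel's input parameter, so the control value travels down the pipeline along with the data:
connect<parameter>(ctrl_from_ps, producer.in[1]);      // control enters the first kernel in the pipeline
connect<parameter>(producer.inout[0], consumer.in[1]); // forwarded downstream along with the data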
Run-Time Parameter Support Summary
This section summarizes the AI Engine run-time parameter (RTP) support status. For RTP support for the PL kernel inside the graph, see Run-Time Parameter Support Summary for PL Kernel.
AI Engine RTP (from/to PS) | Input: Synchronous | Input: Asynchronous | Output: Synchronous | Output: Asynchronous
---|---|---|---|---
Scalar | Default | Supported | Supported | Default
Array | Default | Supported | Supported | Default
Code snippets for RTP connections from or to the PS:
connect<parameter>(fromPS, first.in[0]); //Synchronous RTP, default for input
connect<parameter>(fromPS, sync(first.in[0])); //Synchronous RTP
connect<parameter>(fromPS, async(first.in[0])); //Asynchronous RTP
connect<parameter>(second.inout[0], toPS); //Asynchronous RTP, default for output
connect<parameter>(async(second.inout[0]), toPS); //Asynchronous RTP
connect<parameter>(sync(second.inout[0]), toPS); //Synchronous RTP
AI Engine RTP to AI Engine RTP (From \ To) | To: Synchronous | To: Asynchronous | To: Not Specified
---|---|---|---
From: Synchronous | Synchronous | Not Supported | Synchronous
From: Asynchronous | Not Supported | Asynchronous | Asynchronous
From: Not Specified | Synchronous | Asynchronous | Synchronous
Code snippets for RTP connections between AI Engines:
connect<parameter>(first.inout[0], second.in[0]); //Not specified for output and input. Synchronous RTP from first.inout to second.in
connect<parameter>(sync(first.inout[0]), second.in[0]); //Specify "sync" for output. Synchronous RTP from first.inout to second.in
connect<parameter>(first.inout[0], sync(second.in[0])); //Specify "sync" for input. Synchronous RTP from first.inout to second.in
connect<parameter>(sync(first.inout[0]), sync(second.in[0])); //Specify "sync" for both. Synchronous RTP from first.inout to second.in
connect<parameter>(async(first.inout[0]), async(second.in[0])); //Specify "async" for both. Asynchronous RTP from first.inout to second.in
connect<parameter>(first.inout[0], async(second.in[0])); //Specify "async" for input. Asynchronous RTP from first.inout to second.in
connect<parameter>(async(first.inout[0]), second.in[0]); //Specify "async" for output. Asynchronous RTP from first.inout to second.in
connect<parameter>(async(first.inout[0]), sync(second.in[0])); //Not supported
connect<parameter>(sync(first.inout[0]), async(second.in[0])); //Not supported