Adaptive Data Flow Graph Specification Reference
Unless otherwise stated, all classes and their member functions belong to the adf namespace.
Return Code
ADF APIs have defined return codes in the adf namespace to indicate success or different kinds of failures.
enum return_code
{
ok = 0,
user_error,
aie_driver_error,
xrt_error,
internal_error,
unsupported
};
The following defines the different return codes:
- ok
  - Success.
- user_error
  - User error, such as an invalid argument or using the API in an unsupported way.
- aie_driver_error
  - The AI Engine driver returned an error; graph APIs return this error code.
- xrt_error
  - XRT returned an error; graph APIs return this error code.
- internal_error
  - Something is wrong with the tool; users should contact Xilinx support.
- unsupported
  - Unsupported feature or unsupported scenario.
Graph Objects
graph
This is the main graph abstraction exported by the ADF tools. All user-defined graphs should inherit from class graph.
Scope
All instances of user-defined graph types that form part of a user design must be declared in global scope, but can be declared under any namespace.
Member Functions
virtual return_code init() ;
This method loads and initializes a precompiled graph object onto the AI Engine array using a predetermined set of processor tiles. Currently, no relocation is supported. All existing information in the program memory, data memory, and stream switches belonging to the tiles being loaded is replaced. The loaded processors are left in a disabled state.
virtual return_code run();
virtual return_code run(unsigned int num_iterations);
This method enables the processors associated with a graph to start execution from the beginning of their respective main programs. Without any arguments, the graph runs forever. The API with an argument can set the number of iterations for each run differently. This is a non-blocking operation on the PS application.
virtual return_code end();
virtual return_code end(unsigned int cycle_timeout);
The end method is used to wait for the termination of the graph. A graph is considered to be terminated when all its active processors exit their main thread and disable themselves. This is a blocking operation for the PS application. This method also cleans up the state of the graph, such as forcing the release of all locks and cleaning up the stream switch configurations used in the graph. The end method with a cycle timeout terminates and cleans up the graph when the timeout expires rather than waiting for any graph-related event. Attempting to run the graph after end without re-initializing it can give unpredictable results.
virtual return_code wait();
virtual return_code wait(unsigned int cycle_timeout);
virtual return_code resume();
The wait method is used to pause the graph execution temporarily without cleaning up its state so that it can be restarted with a run or resume method. The wait method without arguments is useful when waiting for a previous run with a fixed number of iterations to finish. This can be followed by another run with a new set of iterations. The wait method with a cycle timeout pauses the graph execution when the timeout expires, counted from a previous run or resume call. This should only be followed by a resume to let the graph continue to execute. Attempting to run after a wait with a cycle timeout can lead to unpredictable results, because the graph can be paused in an unpredictable state and the run restarts the processors from the beginning of their main programs.
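Taken together, these control methods form a typical host-side sequence. The following is a minimal sketch, assuming a hypothetical user-defined graph type mygraph, the ADF toolchain headers, and illustrative iteration counts:

```cpp
#include <adf.h>        // ADF graph APIs (AI Engine toolchain)
using namespace adf;

mygraph gr;             // hypothetical graph type inheriting from adf::graph,
                        // declared in global scope as required

int main() {
    gr.init();          // load the precompiled graph onto the AI Engine array
    gr.run(100);        // non-blocking: execute 100 iterations
    gr.wait();          // block until the 100 iterations finish; graph state is kept
    gr.run(50);         // restart with a different iteration count
    gr.end();           // block until termination, then clean up graph state
    return 0;
}
```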
virtual return_code update(input_port& pName, <type> value);
virtual return_code update(input_port& pName, const <type>* value, size_t size);
These methods are various forms of run-time parameter update APIs that can be used to update scalar or array run-time parameter ports. The port name is a fully qualified path name such as graph1.graph2.port or graph1.graph2.kernel.port. The <type> can be one of int8, int16, int32, int64, uint8, uint16, uint32, uint64, cint16, cint32, float, or cfloat.
For array run-time parameter updates, a size argument specifies the number of elements in the array to be updated. This size must match the RTP array size defined in the graph, meaning that the full RTP array must be updated at one time.
virtual return_code read(input_port& pName, <type>& value);
virtual return_code read(input_port& pName, <type>* value, size_t size);
These methods are various forms of run-time parameter read APIs that can be used to read scalar or array run-time parameter ports. The port name is a fully qualified path name such as graph1.graph2.port or graph1.graph2.kernel.port. The <type> can be one of int8, int16, int32, int64, uint8, uint16, uint32, uint64, cint16, cint32, float, or cfloat.
For array run-time parameter reads, a size argument specifies the number of elements in the array to be read.
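As an illustration, the update and read variants might be used as follows. The graph type, port member names, and sizes here are hypothetical; note that an RTP array must always be updated in full:

```cpp
// gr is a hypothetical graph with a scalar int32 RTP port "gain",
// an 8-element float RTP array port "coeffs", and an inout port "status".
int32 gain = 4;
gr.update(gr.gain, gain);            // scalar RTP update

float coeffs[8] = {0.5f, 0.25f, 0.125f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f};
gr.update(gr.coeffs, coeffs, 8);     // array RTP update: size must match the full RTP array size

int32 status;
gr.read(gr.status, status);          // read back a scalar RTP value
```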
kernel
This class represents a single node of the graph. User-defined graph types contain kernel objects as member variables that wrap over some C function computation mapped to the AI Engine array.
Scope
kernel objects can be declared in class scope as member variables in a user-defined graph type (i.e., inside a class that inherits from graph). kernel objects must be initialized by assignment in the graph constructor.
Member Functions
static kernel & create( function );
The static create method creates a kernel object from a C kernel function. It automatically determines how many input ports and output ports each kernel has and their appropriate element types. Any other arguments in a kernel function are treated as run-time parameters or lookup tables, which are passed to the kernel on each invocation: run-time parameters are passed by value, while lookup tables are passed by reference, each time the kernel is invoked by the compiler-generated static schedule.
kernel & operator()(…)
Takes one or more parameter objects as arguments. The number of parameter arguments must match the number of non-window formal arguments in the kernel function used to construct the kernel. When used in the body of a graph constructor to assign to a kernel member variable, the operator ensures that updated parameter arguments are passed to the kernel function on every invocation.
Member Variables
std::vector<port<input>> in;
This variable provides access to the logical inputs of a kernel, allowing user graphs to specify connections between kernels in a graph. The i'th index selects the i'th input port (window, stream, or RTP) declared in the kernel function arguments.
std::vector<port<output>> out;
This variable provides access to the logical outputs, allowing user graphs to specify connections between kernels in a graph. The i'th index selects the i'th output port (window or stream) declared in the kernel function arguments.
std::vector<port<inout>> inout;
This variable provides access to the logical inout ports, allowing user graphs to specify connections between kernels in a graph. The i'th index selects the i'th inout port (RTP) declared in the kernel function arguments.
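A minimal graph declaration using kernel::create and the port vectors might look as follows. The kernel function filter, the window sizes, and the file name are assumptions for illustration:

```cpp
// Hypothetical kernel function signature:
// void filter(input_window_int32* in, output_window_int32* out);
class mygraph : public adf::graph {
public:
    adf::kernel k;
    adf::port<adf::input>  gin;
    adf::port<adf::output> gout;
    mygraph() {
        k = adf::kernel::create(filter);        // ports inferred from the C function signature
        adf::source(k) = "src/filter.cc";       // source constraint, required per kernel
        adf::runtime<adf::ratio>(k) = 0.5;      // illustrative core usage budget
        adf::connect< adf::window<128> >(gin, k.in[0]);   // graph input -> first kernel input
        adf::connect< adf::window<128> >(k.out[0], gout); // first kernel output -> graph output
    }
};
```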
port<T>
Scope
Objects of type port<T> are port objects that can be declared in class scope as member variables of a user-defined graph type (i.e., member variables of a class that inherits from graph), or they are defined implicitly for a kernel according to its function signature. The template parameter T can be one of input, output, or inout.
Aliases
input_port is an alias for the type port<input>.
output_port is an alias for the type port<output>.
inout_port is an alias for the type port<inout>.
Purpose
Used to connect kernels within a graph and across levels of hierarchy in a user specification containing platforms, graphs, and subgraphs.
Operators
port<T>& negate(port<T>&)
When applied to a destination port within a connection, this operator inverts the Boolean semantics of the source port to which it is connected. Therefore, it has the effect of converting a 0 to 1 and 1 to 0.
port<T>& async(port<T>&)
When applied to a destination RTP port within a connection, this operator specifies an asynchronous update of the destination port's RTP buffer from the source port that it is connected to, or from the external control application if the source is a graph port left unconnected. Therefore, the receiving kernel does not wait for a value on each invocation; rather, it uses the previous value stored in the corresponding buffer.
When applied to a source or destination window port, this operator specifies that the window object will not be synchronized upon kernel entry. Instead, the window_acquire and window_release APIs must be used to manage the window object synchronization explicitly within the kernel code.
port<T>& sync(port<T>&)
When applied to a source RTP port within a connection, this operator specifies a synchronous read of the source port's RTP buffer from the destination port that it is connected to or from the external control application if the destination is a graph port left unconnected. Therefore, the receiving kernel waits for a new value to be produced for each invocation of the producing kernel.
parameter
The parameter class contains two static member functions to allow users to associate globally declared variables with kernels.
Member Functions
static parameter & array(X)
Wrap around any extern declaration of an array to capture the size and type of that array variable.
static parameter & scalar(Y)
Wrap around any extern declaration of a scalar value (including user-defined structs).
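For example, a lookup table declared globally might be associated with a kernel as sketched below. The table, kernel function, and sizes are hypothetical:

```cpp
// Globally declared lookup table, visible to the kernel source via an extern declaration.
extern int32 lut[256];

class lutgraph : public adf::graph {
public:
    adf::kernel    k;
    adf::parameter p;
    lutgraph() {
        k = adf::kernel::create(lookup_kernel);  // hypothetical kernel taking a LUT argument
        p = adf::parameter::array(lut);          // captures the size and type of lut
        adf::connect<>(p, k);                    // allocate the LUT in memory near the kernel
    }
};
```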
bypass
This class is a control flow encapsulator with data bypass. It wraps around an individual node or subgraph to create a bypass data path based on a dynamic control condition. The dynamic control is coded as a run-time parameter port bp (with integer value 0 or 1) that controls whether the input window (or stream) data flows into the graph encapsulated by the bypass (bp=0) or is bypassed directly to the output window (or stream) (bp=1).
Scope
bypass objects can be declared in class scope as member variables in a user-defined graph type (i.e., inside a class that inherits from graph). bypass objects must be initialized by assignment in the graph constructor.
Member Functions
static bypass & create( kernel );
The static create method creates a bypass object around a given kernel object. The number of inputs and outputs of the bypass are inferred automatically from the corresponding ports of the kernel.
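A sketch of wrapping a kernel in a bypass follows; the kernel function and window sizes are hypothetical, and the bp control RTP port is then driven at run time:

```cpp
class bpgraph : public adf::graph {
public:
    adf::kernel k;
    adf::bypass bp;
    adf::port<adf::input>  in;
    adf::port<adf::output> out;
    bpgraph() {
        k  = adf::kernel::create(filter);   // hypothetical kernel function
        bp = adf::bypass::create(k);        // I/O inferred from the kernel's ports
        adf::connect< adf::window<128> >(in, bp.in[0]);
        adf::connect< adf::window<128> >(bp.out[0], out);
    }
};
// At run time, updating the bp control port with 1 bypasses the encapsulated
// kernel, and 0 routes data through it.
```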
Graph Objects for Packet Processing
The following predefined object classes in the adf namespace are used to define the connectivity of packet streams.
template <int nway> class pktsplit { ... }
template <int nway> class pktmerge { ... }
Scope
Objects of type pktsplit<n> and pktmerge<n> can be declared as member variables in a user-defined graph type (i.e., inside a class that inherits from graph). The template parameter n must be a compile-time constant positive integer denoting the n-way degree of split or merge. These objects behave like ordinary nodes of the graph with input and output connections, but are only used for explicit packet routing.
Member Functions
static pktsplit<nway> & create();
static pktmerge<nway> & create();
The static create method for these classes works in the same way as the kernel create method. The degree of split or merge is already specified in the template variable declaration.
Member Variables
std::vector<port<input>> in;
This variable provides access to the logical inputs of the node. There is only one input for pktsplit nodes. For pktmerge nodes, the i'th index selects the i'th input port.
std::vector<port<output>> out;
This variable provides access to the logical outputs of the node. There is only one output for pktmerge nodes. For pktsplit nodes, the i'th index selects the i'th output port.
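For example, a 2-way packet split feeding two kernels whose outputs are merged back might be declared as follows. The kernel function, window sizes, and the pktstream connection type used here are assumptions for illustration:

```cpp
class pktgraph : public adf::graph {
public:
    adf::kernel k0, k1;
    adf::pktsplit<2> sp;
    adf::pktmerge<2> mg;
    adf::port<adf::input>  in;
    adf::port<adf::output> out;
    pktgraph() {
        k0 = adf::kernel::create(worker);    // hypothetical kernel function
        k1 = adf::kernel::create(worker);
        sp = adf::pktsplit<2>::create();     // one input, two outputs
        mg = adf::pktmerge<2>::create();     // two inputs, one output
        adf::connect<adf::pktstream>(in, sp.in[0]);
        adf::connect<adf::pktstream, adf::window<32>>(sp.out[0], k0.in[0]);
        adf::connect<adf::pktstream, adf::window<32>>(sp.out[1], k1.in[0]);
        adf::connect<adf::window<32>, adf::pktstream>(k0.out[0], mg.in[0]);
        adf::connect<adf::window<32>, adf::pktstream>(k1.out[0], mg.in[1]);
        adf::connect<adf::pktstream>(mg.out[0], out);
    }
};
```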
Platform Objects
platform<#in,#out>
This templated class abstractly represents the external environment under which a top-level graph object executes. It provides a mechanism to source/sink input/output data that is consumed/produced during graph execution.
Constructor
simulation::platform<#in,#out> (IOAttr* in_0,..., IOAttr* out_0,...);
This platform constructor is provided for software simulation purposes. The template parameters #in and #out are non-negative integers specifying the number of input and output ports supported by this abstract platform object. The constructor takes as many I/O attribute specification arguments as the sum of the input and output ports: first all input attributes, then all output attributes. Input platform attributes feed external data to graph inputs, and output platform attributes receive graph output data for external consumption.
An I/O attribute specification is a FileIO, GMIO, or PLIO object, and is declared separately. A direct std::string argument can also be used to represent a FileIO attribute object.
Member Variables
std::vector<port<output>> src;
This variable provides access to the input attributes of a platform in the form of output ports, allowing connections between platform sources and graph inputs to be specified. The i'th index selects the i'th input attribute (window, stream, or RTP) declared in the platform constructor arguments.
std::vector<port<input>> sink;
This variable provides access to the output attributes of a platform in the form of input ports, allowing connections between graph outputs and platform sinks to be specified. The i'th index selects the i'th output attribute (window, stream, or RTP) declared in the platform constructor arguments.
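A simulation platform with one input and one output file attribute might be wired to a graph as follows; the graph type and file paths are illustrative:

```cpp
mygraph gr;                                          // hypothetical user-defined graph

simulation::platform<1, 1> plat("data/input.txt",    // input attribute: feeds the graph
                                "data/output.txt");  // output attribute: receives graph output

adf::connect<> netin (plat.src[0], gr.in);           // platform source -> graph input
adf::connect<> netout(gr.out, plat.sink[0]);         // graph output -> platform sink
```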
FileIO
This class represents the I/O port attribute specification used to connect an external file to a graph input or output port for simulation purposes.
Constructor
FileIO(std::string data_file);
FileIO(std::string logical_name, std::string data_file);
The data_file argument is the path name of an external file, relative to the application project directory, that is opened for input or output during simulation. The logical_name must be the same as the annotation field of the corresponding port as presented in the logical architecture interface specification.
GMIO
This class represents the I/O port attribute specification used to connect graph kernels to the external virtual platform ports representing global memory (DDR) connections.
Constructors
GMIO(const std::string& logical_name, int burst_length, int bandwidth);
This GMIO port attribute specification is used to connect AI Engine kernels or PL blocks with the DDR memory. The logical_name is the name of the port as presented in the interface data sheet. The burst_length is the length of the DDR burst transaction (can be 64, 128, or 256 bytes), and the bandwidth is the average expected throughput in MB/s.
Member Functions
static void* malloc(size_t size);
The malloc method allocates contiguous physical memory space and returns the corresponding virtual address. It accepts a parameter, size, specifying how many bytes to allocate. If successful, a pointer to the allocated memory space is returned; nullptr is returned in the event of a failure.
static void free(void* address);
The free method frees memory space allocated by GMIO::malloc.
return_code gm2aie_nb(const void* address, size_t transaction_size);
The gm2aie_nb method initiates a DDR to AI Engine transfer. The memory space for the transaction is specified by the address pointer and the transaction_size parameter (in bytes). The transaction memory space must be within the total memory space allocated by the GMIO::malloc method. This method can only be used by platform source GMIO objects. It is a non-blocking function in that it does not wait for the read transaction to complete.
return_code aie2gm_nb(void* address, size_t transaction_size);
The aie2gm_nb method initiates an AI Engine to DDR transfer. The memory space for the transaction is specified by the address pointer and the transaction_size parameter (in bytes). The transaction memory space must be within the total memory space allocated by the GMIO::malloc method. This method can only be used by platform sink GMIO objects. It is a non-blocking function in that it does not wait for the write transaction to complete.
return_code wait();
The wait method blocks until all previously issued transactions are complete. This method is only applicable to GMIO objects connected to the AI Engine.
return_code gm2aie(const void* address, size_t transaction_size);
The gm2aie method is a blocking version of gm2aie_nb. It blocks until the AI Engine–DDR read transaction completes.
return_code aie2gm(void* address, size_t transaction_size);
The aie2gm method is a blocking version of aie2gm_nb. It blocks until the AI Engine–DDR write transaction completes.
return_code pl_gm(void* address, size_t total_size);
The pl_gm method sets the PL m_axi port start address in the AXI4-Lite interface. The start address of the virtual memory space for the PL m_axi port is specified by the address parameter. The total size of the memory to be accessed is specified by the total_size parameter.
In Linux, these GMIO member functions must use PS virtual memory addresses through void* pointers returned by GMIO::malloc in the PS program. For bare metal, the virtual address and physical address are the same; there is no need to call GMIO::malloc and GMIO::free, but you can still call them for consistency.
PLIO
This class represents the I/O port attribute specification used to connect AI Engine kernels to the external platform ports representing programmable logic.
Constructor
PLIO(std::string logical_name, std::string datafile);
The above PLIO port attribute specification is used to represent a single 32-bit input or output AXI4-Stream port at the AI Engine array interface as part of a virtual platform specification. The logical_name must be the same as the annotation field of the corresponding port as presented in the logical architecture interface specification. The datafile is an input or output file path that sources input data or receives output data for simulation purposes. This data could be captured separately during platform design and then replayed here for simulation.
PLIO(std::string logical_name, plio_type pliowidth, std::string datafile);
The above PLIO port attribute specification is used to represent a single 32-bit, 64-bit, or 128-bit input or output AXI4-Stream port at the AI Engine array interface as part of a virtual platform specification. Here the pliowidth can be one of plio_32_bits (default), plio_64_bits, or plio_128_bits.
PLIO(std::string logical_name, plio_type pliowidth, std::string datafile, double frequency);
The above PLIO port attribute specification is used to represent a single 32-bit, 64-bit, or 128-bit input or output AXI4-Stream port at the AI Engine array interface as part of a virtual platform specification. Here the pliowidth can be one of plio_32_bits (default), plio_64_bits, or plio_128_bits. The frequency of the PLIO port can also be specified as part of the constructor.
PLIO(std::string logical_name, plio_type pliowidth, std::string datafile, double frequency, bool binary, bool hex);
The above PLIO port attribute specification adds boolean flags indicating whether the contents of the data file are in binary or hexadecimal format.
The data in the data files must be organized per line according to the bus width of the PLIO attribute (32, 64, or 128 bits) as well as the data type of the graph port it is connected to. For example, a 64-bit PLIO feeding a kernel port with data type int32 requires file data organized as two columns. However, the same 64-bit PLIO feeding a kernel port with data type cint16 requires the data to be organized into four columns, each representing a 16-bit real or imaginary part of the complex data type.
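For example, 64-bit PLIO attributes might be declared and handed to a simulation platform as follows; the logical names, file paths, and the 250 MHz frequency are illustrative:

```cpp
adf::PLIO plin ("ifc_in",  adf::plio_64_bits, "data/plio_in.txt");
adf::PLIO plout("ifc_out", adf::plio_64_bits, "data/plio_out.txt", 250.0); // 250 MHz PL clock

simulation::platform<1, 1> plat(&plin, &plout);
```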
Event API
The event API provides functions to configure AI Engine hardware resources for performance profiling and event tracing. In this release, a subset of performance profiling use cases are supported.
Enumeration
enum io_profiling_option
{
io_total_stream_running_to_idle_cycles,
io_stream_start_to_bytes_transferred_cycles,
io_stream_start_difference_cycles,
io_stream_running_event_count
};
The io_profiling_option enumeration contains the options for performance profiling using PLIO and GMIO objects. The io_total_stream_running_to_idle_cycles option represents the total accumulated clock cycles between the stream running event and the stream idle event of the corresponding stream port in the shim tile. This option can be used to profile platform I/O bandwidth.
The io_stream_start_to_bytes_transferred_cycles option represents the clock cycles between the first stream running event and the event that the specified number of bytes have been transferred through the stream port in the shim tile. This option can be used to profile graph throughput.
The io_stream_start_difference_cycles option represents the clock cycles elapsed between the first stream running events of the two platform I/O objects. This option can be used to profile graph latency.
The io_stream_running_event_count option represents the number of stream running events. This option can be used to profile graph throughput during a period of time for streaming applications.
Member Functions
static handle start_profiling(IoAttr& io, io_profiling_option option, uint32 value = 0);
This function configures the performance counters in the AI Engine and starts profiling. io is the platform GMIO or PLIO object. option is one of the io_profiling_option enumerations described above. If the io_stream_start_to_bytes_transferred_cycles option is used, the number of bytes can be specified in the value parameter. This function should be called after graph::init(). It returns a handle to be used by read_profiling and stop_profiling. If the specification is incorrect or there are insufficient hardware resources to perform the profiling, an invalid_handle is returned.
static handle start_profiling(IoAttr& io1, IoAttr& io2, io_profiling_option option, uint32 value = 0);
This function configures the performance counters in the AI Engine and starts profiling for the io_stream_start_difference_cycles option. Parameters io1 and io2 specify the two platform I/O objects. This function should be called after graph::init(). It returns a handle to be used by read_profiling and stop_profiling. If the specification is incorrect or there are insufficient hardware resources to perform the profiling, an invalid_handle is returned.
static long long read_profiling(handle h);
This function returns the current performance counter value associated with the handle.
static void stop_profiling(handle h);
This function stops the performance profiling associated with the handle and releases the corresponding hardware resources.
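A sketch of profiling graph throughput with the byte-count option follows; the event:: scoping, the PLIO object, and the byte count are assumptions for illustration:

```cpp
event::handle h = event::start_profiling(
    plout,                                               // platform PLIO or GMIO object
    event::io_stream_start_to_bytes_transferred_cycles,
    1024 * sizeof(int32));                               // bytes expected to be transferred

if (h != event::invalid_handle) {
    gr.run(16);
    gr.wait();
    long long cycles = event::read_profiling(h);         // AI Engine clock cycles elapsed
    event::stop_profiling(h);                            // release the hardware counters
}
```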
Enumeration
enum kernel_profiling_option
{
kernel_between_pc_cycles /// Number of accumulated cycles between two specified program counters for a kernel object
};
The kernel_profiling_option enumeration contains the options for profiling the number of cycles between two program counters of kernel objects. The kernel_between_pc_cycles option represents the number of accumulated cycles between two specified program counters for a kernel object.
Member Functions
static handle start_profiling(kernel& k, kernel_profiling_option option, uint16 pc1, uint16 pc2);
This function configures performance counters in an AI Engine to record the number of accumulated cycles between two specified program counters (pc1 and pc2) for a kernel object (k).
Connections
The following template object constructors specify different types of connections between ports. Each of them supports the appropriate overloading for input/output/inout ports. Specifying the connection object name while creating a connection is optional, but it is recommended for better debugging.
Connection Constructor Templates
template<int blocksize, int overlap> connect<stream , window<blocksize, overlap> > [name](portA, portB)
Connects a stream port to a windowed buffer port of specified block size and overlap.
template<int blocksize> connect<stream , window<blocksize> > [name](portA, portB)
Connects a stream port to a windowed buffer port of specified block size and zero overlap.
template<int blocksize> connect<window<blocksize>, stream> [name](portA, portB)
Connects a windowed buffer port of specified block size to a stream port.
template<> connect<stream> [name](portA, portB)
Connects between two stream ports.
template<> connect<cascade> [name](portA, portB)
Connects between two AI Engine cascade ports.
template<> connect<> [name](portA, portB)
Connects between hierarchical ports between different levels of hierarchy.
template<> connect<parameter> [name](portA, portB)
Connects a parameter port to a kernel port.
template<> connect<> [name](parameter, kernel)
Connects a LUT parameter array object to a kernel.
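Put together, several of these connection forms might appear in a graph constructor as follows; the kernels, port indices, and window sizes are hypothetical:

```cpp
// Inside a graph constructor; k0 and k1 are kernel member variables,
// in/out are graph ports, and rtp is an input_port member.
adf::connect< adf::window<128> >              n0(in,        k0.in[0]);  // graph input -> kernel window
adf::connect< adf::stream, adf::window<128> > n1(k0.out[0], k1.in[0]);  // stream source -> window sink
adf::connect< adf::stream >                   n2(k1.out[0], out);       // stream to stream
adf::connect< adf::cascade >                  n3(k0.out[1], k1.in[1]);  // accumulator cascade ports
adf::connect< adf::parameter >                n4(rtp,       k1.in[2]);  // run-time parameter connection
```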
Port Combinations
The port combinations used in the constructor templates are specified in the following table.
PortA | PortB | Comment
---|---|---
port<output> | port<output> | Connect a kernel output to a parent graph output
port<output> | port<input> | Connect a kernel output to a kernel input
port<output> | port<inout> | Connect an output of a kernel or a subgraph to an inout port of another kernel or a subgraph
port<input> | port<input> | Connect a graph input to a kernel input
port<input> | port<inout> | Connect an input of a parent graph to an inout port of a child subgraph or a kernel
port<inout> | port<input> | Connect an inout port of a parent graph or a kernel to an input of another kernel or a subgraph
port<inout> | port<output> | Connect an inout port of a subgraph or a kernel to an output of a parent graph
parameter& | kernel& | Connect an initialized parameter variable to a kernel, ensuring that the compiler allocates space for the variable in the memory around the kernel
Constraints
Constraints are user-defined properties for graph nodes that provide additional information to the compiler.
constraint<T>
This template class is used to build scalar data constraints on kernels, connections, and ports.
Scope
A constraint must appear inside a user graph constructor.
Member Functions
constraint<T> operator=(T)
This overloaded equality operator allows you to assign a value to a scalar constraint.
Constructors
The default constructor is not used. Instead the following special constructors are used with specific meaning.
void fabric<pl>(kernel&)
This constraint allows you to mark a kernel to be implemented on the internal programmable logic.
void fabric<aiengine>(kernel&)
This constraint allows you to mark a kernel to be implemented on the AI Engine (default).
constraint<std::string>& initialization_function(kernel&)
This constraint allows you to set a specific initialization function for each kernel. The constraint expects a string denoting the name of the initialization function. Where multiple kernels are packed on a core, each initialization function packed on the core is called exactly once. No kernel functions are scheduled until all the initialization functions packed on a core are completed.
constraint<float>& runtime<ratio>(kernel&)
This constraint allows you to set a specific core usage fraction for a kernel. This is computed as a ratio of the number of cycles taken by one invocation of a kernel (processing one block of data) to the cycle budget. The cycle budget for an application is typically fixed according to the expected data throughput and the block size being processed.
constraint<std::string>& source(kernel&)
This constraint allows you to specify the source file containing the definition of each kernel function. A source constraint must be specified for each kernel.
constraint<int>& fifo_depth(connect&)=[<depth> | (source_depth, dest_depth)]
This constraint allows you to specify the amount of slack to be inserted on a streaming connection to allow deadlock free execution.
void single_buffer(port<T>&)
This constraint allows you to specify single buffer constraint on a window port. By default, a window port is double buffered.
void initial_value(async_AIE_RTP_port)
This constraint allows you to set the initial value for an asynchronous AI Engine input run-time parameter port. It allows the destination kernel to start asynchronously with the specified initial value. You can set both scalar and array run-time parameters using this constraint.
Example scalar: initial_value(6).
Example array: initial_value({1,2,3})
void pl_axi_lite(pl_kernel) = true | false
This constraint allows you to specify whether a PL kernel uses the AXI4-Lite interface. If this constraint is not specified, the AI Engine compiler uses the --pl-axi-lite compiler option by default.
void pl_frequency(pl_kernel) = freq_in_MHz
This constraint allows you to set the frequency of a PL kernel in MHz.
constraint<int> stack_size(adf::kernel& k);
This constraint allows you to set the stack size for an individual kernel.
constraint<int> heap_size(adf::kernel& k);
This constraint allows you to set the heap size for an individual kernel.
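Several scalar constraints applied to one kernel might read as follows; the kernel function, file name, and numeric values are illustrative:

```cpp
class cgraph : public adf::graph {
public:
    adf::kernel k;
    cgraph() {
        k = adf::kernel::create(filter);              // hypothetical kernel function
        adf::source(k)                  = "src/filter.cc";
        adf::runtime<adf::ratio>(k)     = 0.6;        // up to 60% of one core's cycle budget
        adf::initialization_function(k) = "filter_init";
        adf::stack_size(k)              = 2048;       // bytes, illustrative
        adf::heap_size(k)               = 1024;       // bytes, illustrative
        adf::single_buffer(k.in[0]);                  // disable double buffering on this port
    }
};
```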
constraint< std::vector<T>>
This template class is used to build vector data constraints on kernels, connections, and ports.
Scope
A constraint must appear inside a user graph constructor.
Member Function
constraint<std::vector<T> > operator=(std::vector<T>)
This overloaded equality operator allows you to assign a list of values to a vector constraint.
Constructors
The default constructor is not used. Instead the following special constructors are used with specific meaning.
constraint <std::vector<std::string > >& headers (kernel&)
This constraint allows you to specify a set of
header files for a kernel that define objects to be shared with other kernels and
hence have to be included once in the corresponding main
program. The kernel source file would instead include an extern
declaration for that object.
Mapping Constraints
The following functions help to build various types of constraints on the physical mapping of the kernels and buffers onto the AI Engine array.
Scope
A constraint must appear inside a user graph constructor.
Kernel Location Constructors
location_constraint tile(int col, int row)
This location constructor points to a specific AI Engine tile located at specified column
and row within the AI Engine array. The
column and row values are zero based, where the zero'th row is counted from the
bottom-most row with a compute processor and the zero'th column is counted from the
left-most column. The previously used constructor proc(col,row)
is now deprecated.
location_constraint location<kernel> (kernel&)
This constraint provides a handle to the location of a kernel so that it can be constrained to be located on a specific tile or co-located with another kernel using the following assignment operator.
Buffer Location Constructors
location_constraint address(int col, int row, int offset)
This location constructor points to a specific data memory address offset on a specific AI Engine tile. The offset address is relative to that tile starting at zero with a maximum value of 32768 (32K).
location_constraint bank(int col, int row, int bankid)
This location constructor points to a specific data memory bank on a specific AI Engine tile. The bank ID is relative to that tile and can take values 0, 1, 2, 3.
location_constraint offset(int offset_value)
This location constructor specifies data memory address offset. The offset address is between 0 and 32768 (32K) and is relative to a tile allocated by the compiler.
location_constraint location<buffer> (port<T>&)
This location constructor provides a handle to the location of a buffer attached to an input, output, or inout port of a kernel. It can be used to constrain the location of the buffer to a specific address or bank, or to be on the same tile as another kernel, or to be on the same bank as another buffer using the following assignment operator. It is an error to constrain two buffers to the same address. This constructor only applies to window kernel ports.
location_constraint location<stack> (kernel&)
This location constructor provides a handle to the location of the system memory (stack and heap) of the AI Engine where the specified kernel is mapped. This provides a mechanism to constrain the location of the system memory with respect to other buffers used by that kernel.
location_constraint location<parameter> (parameter&)
This location constructor provides a handle to the location of the parameter array (for example, a lookup table) declared within a graph.
Bounding Box Constructor
location_constraint bounding_box(int column_min, int row_min, int column_max, int row_max)
This bounding box constructor specifies a rectangular bounding box in which a graph is to be placed on the AI Engine array, spanning columns column_min through column_max and rows row_min through row_max. Multiple bounding box location constraints can be used in an initializer list to specify an irregularly shaped bounding region.
Operator Functions
location_constraint& operator=(location_constraint)
This operator expresses the equality constraint between two location constructors. It allows various types of absolute or relative location constraints to be expressed.
The following example shows how to constrain a kernel to be placed on a specified AI Engine tile.
location<kernel>(k1) = tile(3,2);
The following template shows how to constrain the location of the double buffers attached to a port to a specific address or bank ID. At most two elements should be specified in the initializer list to constrain the locations of the double buffers. Furthermore, if these buffers are read or written by a DMA engine, they must be on the same tile.
location<buffer>(port1) = { [address(c,r,o) | bank(c,r,id)] , [address(c,r,o) | bank(c,r,id)] };
The following template shows how to constrain the location of a parameter lookup table or the system memory of a kernel to be placed on a specific address or a bankid.
location<parameter>(param1) = [address(c,r,o) | bank(c,r,id)];
location<stack>(k1) = [address(c,r,o) | bank(c,r,id)];
The following example shows how to constrain two kernels to be placed on the same AI Engine tile. This forces them to be sequenced in topological order and allows them to share memory buffers without synchronization.
location<kernel>(k1) = location<kernel>(k2);
The following example shows how to constrain a buffer, stack, or parameter location to be on the same tile as that of a kernel. This ensures that the buffer, stack, or parameter array can be accessed by the kernel k1 without requiring a DMA.
location<buffer>(port1) = location<kernel>(k1);
location<stack>(k2) = location<kernel>(k1);
location<parameter>(param1) = location<kernel>(k1);
The following example shows how to constrain a buffer, stack, or parameter location to be on the same bank as that of another buffer, stack, or parameter. When two double buffers are co-located, this constrains both the ping buffers to be on one bank and both the pong buffers to be on another bank.
location<buffer>(port2) = location<buffer>(port1);
location<stack>(k1) = location<buffer>(port1);
location<parameter>(param1) = location<buffer>(port1);
The following example shows how to constrain a graph to be placed within a bounding box, or within a combined region formed by multiple bounding boxes.
location<graph>(g1) = bounding_box(1,1,2,2);
location<graph>(g2) = { bounding_box(3,3,4,4), bounding_box(5,5,6,6) };
Non-Equality Function
void not_equal(location_constraint lhs, location_constraint rhs)
This function expresses lhs ≠ rhs for the two location_constraint parameters lhs and rhs. It allows a relative non-colocation constraint to be specified. The not_equal buffer constraint only works for single buffers and should not be used with double buffers.
The following example shows how to specify that two kernels, k1 and k2, must not be mapped to the same AI Engine.
not_equal(location<kernel>(k1), location<kernel>(k2));
The following example shows how to specify that two buffers, port1 and port2, must not be mapped to the same memory bank.
not_equal(location<buffer>(port1), location<buffer>(port2));
Stamp and Repeat Constraint
The following example shows how identical graphs can be placed using the stamp constraint. In this case, the tx_chain0 graph is the reference: its objects are placed first, and that placement is stamped onto the graphs tx_chain1 and tx_chain2. The number of rows for all identical graphs (the reference plus the stamped ones) must be the same, and each bounding box must begin and end at the same parity of row; that is, if the reference graph's bounding box begins at an even row and ends at an odd row, all of the stamped graphs must follow the same convention. This limitation arises from the mirrored tiles in the AI Engine array: in one row the AI Engine is followed by a memory group, and in the next row the memory group is followed by an AI Engine within a tile.
location<graph>(tx_chain0) = bounding_box(0,0,3,3);
location<graph>(tx_chain1) = bounding_box(4,0,7,3);
location<graph>(tx_chain2) = bounding_box(0,4,3,7);
location<graph>(tx_chain1) = stamp(location<graph>(tx_chain0));
location<graph>(tx_chain2) = stamp(location<graph>(tx_chain0));
JSON Constraints
The constraints JSON file can contain one or more of the following sections:
- NodeConstraints
- Constrain graph nodes, such as kernels
- PortConstraints
- Constrain kernel ports and params
- GlobalConstraints
- Specify global constraints, i.e., constraints that are not associated with a specific object
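A minimal constraints file combining these sections might look as follows (the graph, kernel, and port names are hypothetical; each section is optional and described in detail below):

```
{
  "NodeConstraints": {
    "mygraph.k1": {
      "tile": { "column": 2, "row": 1 }
    }
  },
  "PortConstraints": {
    "mygraph.k1.in[0]": {
      "colocated_nodes": ["mygraph.k1"]
    }
  },
  "GlobalConstraints": {
    "areaGroup": {
      "name": "my_area_group",
      "nodeGroup": ["mygraph.k1"],
      "tileGroup": ["(0,0):(3,3)"]
    }
  }
}
```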
Node Constraints
The NodeConstraints section is used to constrain graph nodes. Constraints are grouped by node, such that one or more constraints can be specified per node.
Syntax
{
"NodeConstraints": {
"<node name>": {
<constraint>,
<constraint>,
...
}
}
}
<node name> ::= string
<constraint> ::= tile
| shim
| reserved_memory
| colocated_nodes
| not_colocated_nodes
| colocated_reserved_memories
| not_colocated_reserved_memories
Example
{
"NodeConstraints": {
"mygraph.k1": {
"tile": {
"column": 2,
"row": 1
},
"reserved_memory": {
"column": 2,
"row": 1,
"bankId": 3,
"offset": 4128
}
},
"mygraph.k2": {
"tile": {
"column": 2,
"row": 2
}
}
}
}
Node Names
Nodes must be specified by their fully qualified name, for example: <graph name>.<kernel name>.
In the following example, the graph name is myGraph and the kernel name is k1. The fully specified node name is myGraph.k1.
class my_graph : public adf::graph {
private:
adf::kernel k1;
public:
my_graph() {
k1 = kernel::create(kernel1);
source(k1) = "src/kernels/kernel1.cc";
}
};
my_graph myGraph;
Anytime this kernel is referenced in the constraints JSON file it must be named myGraph.k1, as shown in the various examples throughout this document.
Tile Constraint
This constrains a kernel to a specific tile located at a specified column and row within the array. The column and row values are zero based, where the zeroth row is counted from the bottom-most row with a compute processor and the zeroth column is counted from the left-most column.
Syntax
"tile": {
"column": integer,
"row": integer
}
Example
{
"NodeConstraints": {
"mygraph.k1": {
"tile": {
"column": 2,
"row": 1
}
}
}
}
Shim Constraint
This constrains an I/O node (for example, a PLIO port) to a specific shim column. Optionally, a specific channel within that column can also be specified.
Syntax
"shim": {
"column": integer,
"channel": integer (optional)
}
Example
{
"NodeConstraints": {
"plioOut1": {
"shim": {
"column": 0,
"channel": 1
}
},
"plioOut2": {
"shim": {
"column": 1
}
}
}
}
Reserved Memory Constraint
This constrains the location of system memory (stack and heap) for a kernel to a specific address on a specific tile. The address can be specified in one of two different ways:
- Column, row, bankId, and offset, where the tile and bank are specified by column, row, and bankId, and the offset address is relative to the start of that bank, starting at zero with a maximum of 8192 (the size of a bank in bytes).
- Column, row, and bankId, where the bank ID is relative to the tile and can take values 0, 1, 2, or 3.
Syntax
"reserved_memory": <bank_address>
<bank_address> ::= {
"column": integer,
"row": integer,
"bankId": integer,
"offset": integer
}
<bank_address> ::= {
"column": integer,
"row": integer,
"bankId": integer
}
Example
{
"NodeConstraints": {
"mygraph.k1": {
"reserved_memory": {
"column": 2,
"row": 1,
"bankId": 3,
"offset": 4128
}
},
"mygraph.k2": {
"reserved_memory": {
"column": 1,
"row": 1,
"bankId": 3
}
}
}
}
Colocated Nodes Constraint
The colocated nodes constraint requires two or more kernels to be on the same tile and forces sequencing of the kernels in a topological order. It also allows them to share memory buffers without synchronization.
Syntax
"colocated_nodes": [<node list>]
<node list> ::= <node name>[,<node name>...]
<node name> ::= string
Example
{
"NodeConstraints": {
"mygraph.k2": {
"colocated_nodes": ["mygraph.k1"]
}
}
}
Not Colocated Nodes Constraint
This constrains two or more kernels to not be on the same tile.
Syntax
"not_colocated_nodes": [<node list>]
<node list> ::= <node name>[,<node name>...]
<node name> ::= string
Example
{
"NodeConstraints": {
"mygraph.k2": {
"not_colocated_nodes": ["mygraph.k1"]
}
}
}
Colocated Reserved Memories Constraint
This constrains a kernel location to be on the same tile as that of one or more stacks. This ensures that the stacks can be accessed by the kernel without requiring a DMA.
Syntax
"colocated_reserved_memories": [<node list>]
<node list> ::= <node name>[,<node name>...]
<node name> ::= string
Example
{
"NodeConstraints": {
"mygraph.k2": {
"colocated_reserved_memories": ["mygraph.k1"]
}
}
}
Not Colocated Reserved Memories Constraint
This constrains a kernel location so that it will not be on the same tile as one or more stacks.
Syntax
"not_colocated_reserved_memories": [<node list>]
<node list> ::= <node name>[,<node name>...]
<node name> ::= string
Example
{
"NodeConstraints": {
"mygraph.k2": {
"not_colocated_reserved_memories": ["mygraph.k1"]
}
}
}
Port Constraints
Port constraints are specified in the PortConstraints section. Constraints are grouped by port, such that one or more constraints can be specified per port.
Syntax
{
"PortConstraints": {
"<port name>": {
<constraint>[,
<constraint>...]
}
}
}
<port name> ::= string
<constraint> ::= buffers
| colocated_nodes
| not_colocated_nodes
| colocated_ports
| not_colocated_ports
| exclusive_colocated_ports
| colocated_reserved_memories
| not_colocated_reserved_memories
Example
{
"PortConstraints": {
"mygraph.k1.in[0]": {
"colocated_nodes": ["mygraph.k1"]
},
"mygraph.k2.in[0]": {
"colocated_nodes": ["mygraph.k2"]
},
"mygraph.p1": {
"buffers": [{
"column": 2,
"row": 1,
"bankId": 2
}]
}
}
}
Port Names
Ports must be specified by their fully qualified name: <graph name>.<kernel name>.<port name>
. In the
following example, the graph name is myGraph, the kernel name is k1, and the kernel has two
ports named in[0] and out[0] (as specified in kernel1.cc). The fully
specified port names are then myGraph.k1.in[0] and
myGraph.k1.out[0].
class my_graph : public adf::graph {
private:
adf::kernel k1;
public:
my_graph() {
k1 = kernel::create(kernel1);
source(k1) = "src/kernels/kernel1.cc";
}
};
my_graph myGraph;
Anytime either of these ports are referenced in the constraints JSON file, they must be named myGraph.k1.in[0] and myGraph.k1.out[0], as shown in the various examples throughout this document.
Buffers Constraint
This constrains a data buffer to a specific address on a specific tile. The data buffer can be attached to an input, output, or inout port of a kernel, or to a param (for example, a lookup table). The address can be specified in one of three different ways:
- Column, row, and offset, where the tile is specified by column and row and the offset address is relative to the tile, starting at zero with a maximum value of 32768 (32K).
- Column, row, and bankId, where the bank ID is relative to the tile and can take values 0, 1, 2, or 3.
- Offset only, where the offset is between zero and 32768 (32K) and is relative to the tile allocated by the compiler.
Syntax
"buffers": [<address>, <(optional) address>]
<address> ::= <offset_address> | <bank_address> | <offset_address>
<tile_address> ::= {
"column": integer,
"row": integer,
"offset": integer
}
<bank_address> ::= {
"column": integer,
"row": integer,
"bankId": integer
}
<offset_address> ::= {
"offset": integer
}
Example
{
"PortConstraints": {
"mygraph.k2.out[0]": {
"buffers": [{
"column": 2,
"row": 2,
"offset": 5632
}, {
"column": 2,
"row": 2,
"offset": 4608
}]
},
"mygraph.k1.out[0]": {
"buffers": [{
"column": 2,
"row": 3,
"bankId": 2
}, {
"column": 2,
"row": 3,
"bankId": 3
}]
},
"mygraph.p1": {
"buffers": [{
"offset": 512
}]
}
}
}
Colocated Nodes Constraint
This constrains a port (i.e., the port buffer) location to be on the same tile as that of one or more kernels. This ensures that the data buffer can be accessed by the other kernels without requiring a DMA.
Syntax
"colocated_nodes": [<node list>]
<node list> ::= <node name>[,<node name>...]
<node name> ::= string
Example
{
"PortConstraints": {
"mygraph.k1.in[0]": {
"colocated_nodes": ["mygraph.k1"]
},
"mygraph.k2.in[0]": {
"colocated_nodes": ["mygraph.k2"]
}
}
}
Not Colocated Nodes Constraint
This constrains a port (i.e., the port buffer) location to not be on the same tile as that of one or more kernels.
Syntax
"not_colocated_nodes": [<node list>]
<node list> ::= <node name>[,<node name>...]
<node name> ::= string
Example
{
"PortConstraints": {
"mygraph.k2.in[0]": {
"not_colocated_nodes": ["mygraph.k1"]
}
}
}
Colocated Ports Constraint
This constrains a port buffer location to be on the same bank as that of one or more other port buffers. When two double buffers are co-located, this constrains both of the ping buffers to be on one bank and both of the pong buffers to be on another bank.
Syntax
"colocated_ports": [<port list>]
<port list> ::= <port name>[, <port name>...]
<port name> ::= string
Example
{
"PortConstraints": {
"mygraph.k2.in[0]": {
"colocated_ports": ["mygraph.k2.out[0]"]
}
}
}
Not Colocated Ports Constraint
This constrains a port buffer location to not be on the same bank as that of one or more other port buffers.
Syntax
"not_colocated_ports": [<port list>]
<port list> ::= <port name>[, <port name>...]
<port name> ::= string
Example
{
"PortConstraints": {
"mygraph.k2.in[0]": {
"not_colocated_ports": ["mygraph.k2.out[0]"]
}
}
}
Exclusive Colocated Ports Constraint
This constrains a port buffer location to be exclusively on the same bank as that of one or more other port buffers, meaning that no other port buffers can be on the same bank.
Syntax
"exclusive_colocated_ports": [<port list>]
<port list> ::= <port name>[, <port name>...]
<port name> ::= string
Example
{
"PortConstraints": {
"mygraph.k2.in[0]": {
"exclusive_colocated_ports": ["mygraph.k2.out[0]"]
}
}
}
Colocated Reserved Memories Constraint
This constrains a port buffer location to be on the same bank as that of one or more stacks.
Syntax
"colocated_reserved_memories": [<port list>]
<port list> ::= <port name>[, <port name>...]
<port name> ::= string
Example
{
"PortConstraints": {
"mygraph.k2.in[0]": {
"colocated_reserved_memories": ["mygraph.k1"]
}
}
}
Not Colocated Reserved Memories Constraint
This constrains a port buffer location to not be on the same bank as that of one or more stacks.
Syntax
"not_colocated_reserved_memories": [<port list>]
<port list> ::= <port name>[, <port name>...]
<port name> ::= string
Example
{
"PortConstraints": {
"mygraph.k2.in[0]": {
"not_colocated_reserved_memories": ["mygraph.k1"]
}
}
}
Global Constraints
Global constraints are specified in the GlobalConstraints section.
Syntax
{
"GlobalConstraints": {
<constraint>[,
<constraint>...]
}
}
<constraint> ::= areaGroup
| isomorphicGraphGroup
Example
{
"GlobalConstraints": {
"areaGroup": {
"name": "root_area_group",
"nodeGroup": ["mygraph.k1", "mygraph.k2"],
"tileGroup": ["(2,0):(2,3)"],
"shimGroup": ["0:3"]
},
"isomorphicGraphGroup": {
"name": "isoGroup1",
"referenceGraph": "clipGraph0",
"stampedGraphs": ["clipGraph1", "clipGraph2"]
}
}
}
AreaGroup Constraint
The areaGroup constraint specifies a range of tile and/or shim locations to which a group of one or more nodes can be mapped. The areaGroup constraint can be specified with up to five properties:
- nodeGroup
- An array of one or more node names (e.g., kernel names)
- tileGroup
- An array of one or more tile ranges
- shimGroup
- An array of one or more shim ranges
- exclude
- A boolean value that when true causes the compiler to not map any resources to the tile and/or shim ranges
- issoft
- A boolean value that when true allows the router to use routing resources within the tile and/or shim ranges
A tile range is in the form of (column,row):(column,row), where the first tile is the bottom left corner and the second tile is the upper right corner. The column and row values are zero based, where the zeroth row is counted from the bottom-most row with a compute processor and the zeroth column is counted from the left-most column.
A shim range is in the form of (column):(column), where the first value is the left-most column and the second value is the right-most column of the range. The column values are zero based, counted from the left-most column of the array. The shim range also allows an optional channel to be specified, for example, (column,channel):(column,channel).
The AreaGroup is used to exclude a range on the device from being used by the compiler for mapping and optionally routing:
- To exclude a range from both the mapper and the router, omit nodeGroup and set exclude to true.
- To exclude a range from the mapper but not the router, omit nodeGroup and set both exclude and issoft to true.
Syntax
"areaGroup": {
"name": string,
"exclude": bool, (*optional)
"issoft": bool, (*optional)
"nodeGroup": [<node list>], (*optional)
"tileGroup": [<tile list>], (*optional)
"shimGroup": [<shim list>] (*optional)
}
<node list> ::= <node name>[,<node name>...]
<tile list> ::= <tile value>[,<tile value>...]
<tile value> ::= <tile range> | <tile address>
<tile range> ::= "<tile address>[:<tile address>]"
<tile address> ::= (<column>, <row>)
<shim list> ::= <shim value>[,<shim value>...]
<shim value> ::= <shim range> | <shim address>
<shim range> ::= "<shim address>[:<shim address>]"
<shim address> ::= (<column>[,<channel>])
<node name> ::= string
<column> ::= integer
<row> ::= integer
<channel> ::= integer
Example
{
"GlobalConstraints": {
"areaGroup": {
"name": "mygraph_area_group",
"nodeGroup": ["mygraph.k1", "mygraph.k2"],
"tileGroup": ["(2,0):(2,3)"],
"shimGroup": ["0:3"]
}
}
}
Example of Exclude
{
"GlobalConstraints": {
"areaGroup": {
"name": "mygraph_excluded_area_group",
"exclude": true,
"tileGroup": ["(3,0):(4,3)"],
"shimGroup": ["3:4"]
}
}
}
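When the exclusion should apply only to the mapper, issoft can be added so that the router may still use routing resources within the range (a sketch; the group name and tile range are illustrative):

```
{
  "GlobalConstraints": {
    "areaGroup": {
      "name": "soft_excluded_area_group",
      "exclude": true,
      "issoft": true,
      "tileGroup": ["(3,0):(4,3)"]
    }
  }
}
```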
IsomorphicGraphGroup Constraint
The isomorphicGraphGroup constraint is used to specify isomorphic graphs that are used in the stamp and repeat flow.
Syntax
"isomorphicGraphGroup": {
"name": string,
"referenceGraph": <reference graph name>,
"stampedGraphs": [<stamped graph name list>]
}
Example
"isomorphicGraphGroup": {
"name": "isoGroup",
"referenceGraph": "tx_chain0",
"stampedGraphs": ["tx_chain1", "tx_chain2", "tx_chain3"]
}
General Description
The stamp and repeat feature of the AI Engine compiler can be used when the same graph has multiple instances that can be constrained to the same geometry in AI Engines. There are two main advantages to using this feature when the same graph is instantiated multiple times.
- Small variation in performance
- All graphs will have very similar throughput because buffers and kernels are mapped identically with respect to each other. Throughput might not be exactly identical due to differences in routing. However, it will be much closer than when stamping is not used.
- Smaller run time of AI Engine compiler
- Because the AI Engine compiler only solves the reference graph instead of the entire design, the required run time is significantly less than in the default flow.
Capabilities and Limitations
If required, you are allowed to stamp multiple different graphs. For example, if a design contains four instances of a graph called tx_chain and four instances of rx_chain, then both sets of graphs can be independently stamped. This feature is only supported for designs which have one or more sets of isomorphic graphs, with no interaction between the different isomorphic graph sets. All reference and stamped graphs must have area group constraints. You must declare identical size area groups for each instance of the graph that needs to be stamped. All area groups must be non-overlapping. For example:
"areaGroup": {
"name": "ant0_cores",
"nodeGroup": ["tx_chain0*"],
"tileGroup": ["(0,0):(3,3)"]
},
"areaGroup": {
"name": "ant1_cores",
"nodeGroup": ["tx_chain1*"],
"tileGroup": ["(0,4):(3,7)"]
},
You must declare an isomorphic graph group
in the constraints file that specifies the reference graph and the stamped graphs.
For example:
"isomorphicGraphGroup": {
"name": "isoGroup",
"referenceGraph": "tx_chain0",
"stampedGraphs": ["tx_chain1", "tx_chain2"]
}
In this case, the tx_chain0 graph is the reference and its objects are placed first and stamped onto the graphs tx_chain1 and tx_chain2. Area groups must follow these rules for the number of rows: the number of rows for all identical graphs (the reference plus the stamped ones) must be the same, and each tileGroup must begin and end at the same parity of row; that is, if the reference graph's tileGroup begins at an even row and ends at an odd row, all of the stamped graphs must follow the same convention. This limitation arises from the mirrored tiles in the AI Engine array: in one row the AI Engine is followed by a memory group, and in the next row the memory group is followed by an AI Engine within a tile.