Adaptive Data Flow Graph Specification Reference
Unless otherwise stated, all classes and their member functions belong to the adf namespace.
Return Code
ADF APIs have defined return codes in the adf namespace to indicate success or different kinds of failures.
enum return_code
{
ok = 0,
user_error,
aie_driver_error,
xrt_error,
internal_error,
unsupported
};
The following defines the different return codes:
- ok
  - Success.
- user_error
  - User error, such as an invalid argument or using the API in an unsupported way.
- aie_driver_error
  - The AI Engine driver returned an error; graph APIs return this error code.
- xrt_error
  - XRT returned an error; graph APIs return this error code.
- internal_error
  - Something is wrong with the tool; users should contact Xilinx support.
- unsupported
  - Unsupported feature or unsupported scenario.
Graph Objects
graph
This is the main graph abstraction exported by the ADF tools. All user-defined graphs should inherit from class graph.
Scope
All instances of user-defined graph types that form part of a user design must be declared in global scope, but can be declared under any namespace.
Member Functions
virtual return_code init() ;
This method loads and initializes a precompiled graph object onto the AI Engine array using a predetermined set of processor tiles. Currently, no relocation is supported. All existing information in the program memory, data memory, and stream switches belonging to the tiles being loaded is replaced. The loaded processors are left in a disabled state.
virtual return_code run();
virtual return_code run(unsigned int num_iterations);
This method enables the processors associated with a graph to start execution from the beginning of their respective main programs. Without any arguments, the graph runs forever. The API with an argument can set the number of iterations for each run differently. This is a non-blocking operation on the PS application.
virtual return_code end();
virtual return_code end(unsigned int cycle_timeout);
The end method is used to wait for the termination of the graph. A graph is considered to be terminated when all its active processors exit their main thread and disable themselves. This is a blocking operation for the PS application. This method also cleans up the state of the graph, such as forcing the release of all locks and cleaning up the stream switch configurations used in the graph. The end method with a cycle timeout terminates and cleans up the graph when the timeout expires rather than waiting for any graph-related event. Attempting to run the graph after end without re-initializing it can give unpredictable results.
virtual return_code wait();
virtual return_code wait(unsigned int cycle_timeout);
virtual return_code resume();
The wait method is used to pause the graph execution temporarily without cleaning up its state so that it can be restarted with a run or resume method. The wait method without arguments is useful when waiting for a previous run with a fixed number of iterations to finish. This can be followed by another run with a new set of iterations. The wait method with a cycle timeout pauses the graph execution when the timeout expires, counted from a previous run or resume call. This should only be followed by a resume to let the graph continue to execute. Attempting to run after a wait with a cycle timeout can lead to unpredictable results, because the graph can be paused in an unpredictable state and the run restarts the processors from the beginning of their main programs.
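Taken together, these control methods form a typical host-side sequence. The following is a minimal sketch, assuming a hypothetical user-defined graph type mygraph, the ADF toolchain headers, and illustrative iteration counts:

```cpp
#include <adf.h>        // ADF graph APIs (AI Engine toolchain)
using namespace adf;

mygraph gr;             // hypothetical graph type inheriting from adf::graph,
                        // declared in global scope as required

int main() {
    gr.init();          // load the precompiled graph onto the AI Engine array
    gr.run(100);        // non-blocking: execute 100 iterations
    gr.wait();          // block until the 100 iterations finish; graph state is kept
    gr.run(50);         // restart with a different iteration count
    gr.end();           // block until termination, then clean up graph state
    return 0;
}
```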
virtual return_code update(input_port& pName, <type> value);
virtual return_code update(input_port& pName, const <type>* value, size_t size);
These methods are various forms of run-time parameter update APIs that can be used to update scalar or array run-time parameter ports. The port name is a fully qualified path name such as graph1.graph2.port or graph1.graph2.kernel.port. The <type> can be one of int8, int16, int32, int64, uint8, uint16, uint32, uint64, cint16, cint32, float, or cfloat.
For array run-time parameter updates, a size argument specifies the number of elements in the array to be updated. This size must match the RTP array size defined in the graph, meaning that the full RTP array must be updated at one time.
virtual return_code read(input_port& pName, <type>& value);
virtual return_code read(input_port& pName, <type>* value, size_t size);
These methods are various forms of run-time parameter read APIs that can be used to read scalar or array run-time parameter ports. The port name is a fully qualified path name such as graph1.graph2.port or graph1.graph2.kernel.port. The <type> can be one of int8, int16, int32, int64, uint8, uint16, uint32, uint64, cint16, cint32, float, or cfloat.
For array run-time parameter reads, a size argument specifies the number of elements in the array to be read.
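As an illustration, the update and read variants might be used as follows. The graph type, port member names, and sizes here are hypothetical; note that an RTP array must always be updated in full:

```cpp
// gr is a hypothetical graph with a scalar int32 RTP port "gain",
// an 8-element float RTP array port "coeffs", and an inout port "status".
int32 gain = 4;
gr.update(gr.gain, gain);            // scalar RTP update

float coeffs[8] = {0.5f, 0.25f, 0.125f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f};
gr.update(gr.coeffs, coeffs, 8);     // array RTP update: size must match the full RTP array size

int32 status;
gr.read(gr.status, status);          // read back a scalar RTP value
```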
kernel
This class represents a single node of the graph. User-defined graph types contain kernel objects as member variables that wrap over some C function computation mapped to the AI Engine array.
Scope
kernel objects can be declared in class scope as member variables in a user-defined graph type (i.e., inside a class that inherits from graph). kernel objects must be initialized by assignment in the graph constructor.
Member Functions
static kernel & create( function );
The static create method creates a kernel object from a C kernel function. It automatically determines how many input ports and output ports each kernel has and their appropriate element types. Any other arguments in a kernel function are treated as run-time parameters or lookup tables, which are passed to the kernel on each invocation: run-time parameters are passed by value, while lookup tables are passed by reference, each time the kernel is invoked by the compiler-generated static schedule.
kernel & operator()(…)
Takes one or more parameter objects as arguments. The number of parameter arguments must match the number of non-window formal arguments in the kernel function used to construct the kernel. When used in the body of a graph constructor to assign to a kernel member variable, the operator ensures that updated parameter arguments are passed to the kernel function on every invocation.
Member Variables
std::vector<port<input>> in;
This variable provides access to the logical inputs of a kernel, allowing user graphs to specify connections between kernels in a graph. The i'th index selects the i'th input port (window, stream, or RTP) declared in the kernel function arguments.
std::vector<port<output>> out;
This variable provides access to the logical outputs, allowing user graphs to specify connections between kernels in a graph. The i'th index selects the i'th output port (window or stream) declared in the kernel function arguments.
std::vector<port<inout>> inout;
This variable provides access to the logical inout ports, allowing user graphs to specify connections between kernels in a graph. The i'th index selects the i'th inout port (RTP) declared in the kernel function arguments.
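A minimal graph declaration using kernel::create and the port vectors might look as follows. The kernel function filter, the window sizes, and the file name are assumptions for illustration:

```cpp
// Hypothetical kernel function signature:
// void filter(input_window_int32* in, output_window_int32* out);
class mygraph : public adf::graph {
public:
    adf::kernel k;
    adf::port<adf::input>  gin;
    adf::port<adf::output> gout;
    mygraph() {
        k = adf::kernel::create(filter);        // ports inferred from the C function signature
        adf::source(k) = "src/filter.cc";       // source constraint, required per kernel
        adf::runtime<adf::ratio>(k) = 0.5;      // illustrative core usage budget
        adf::connect< adf::window<128> >(gin, k.in[0]);   // graph input -> first kernel input
        adf::connect< adf::window<128> >(k.out[0], gout); // first kernel output -> graph output
    }
};
```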
port<T>
Scope
Objects of type port<T> are port objects that can be declared in class scope as member variables of a user-defined graph type (i.e., member variables of a class that inherits from graph), or they are defined implicitly for a kernel according to its function signature. The template parameter T can be one of input, output, or inout.
Aliases
input_port is an alias for the type port<input>.
output_port is an alias for the type port<output>.
inout_port is an alias for the type port<inout>.
Purpose
Used to connect kernels within a graph and across levels of hierarchy in a user specification containing platforms, graphs, and subgraphs.
Operators
port<T>& negate(port<T>&)
When applied to a destination port within a connection, this operator inverts the Boolean semantics of the source port to which it is connected. Therefore, it has the effect of converting a 0 to 1 and 1 to 0.
port<T>& async(port<T>&)
When applied to a destination RTP port within a connection, this operator specifies an asynchronous update of the destination port's RTP buffer from the source port that it is connected to, or from the external control application if the source is a graph port left unconnected. Therefore, the receiving kernel does not wait for a value on each invocation; rather, it uses the previous value stored in the corresponding buffer.
When applied to a source or destination window port, this operator specifies that the window object will not be synchronized upon kernel entry. Instead, the window_acquire and window_release APIs must be used to manage the window object synchronization explicitly within the kernel code.
port<T>& sync(port<T>&)
When applied to a source RTP port within a connection, this operator specifies a synchronous read of the source port's RTP buffer from the destination port that it is connected to or from the external control application if the destination is a graph port left unconnected. Therefore, the receiving kernel waits for a new value to be produced for each invocation of the producing kernel.
parameter
The parameter class contains two static member functions to allow users to associate globally declared variables with kernels.
Member Functions
static parameter & array(X)
Wrap around any extern declaration of an array to capture the size and type of that array variable.
static parameter & scalar(Y)
Wrap around any extern declaration of a scalar value (including user-defined structs).
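For example, a lookup table declared globally might be associated with a kernel as sketched below. The table, kernel function, and sizes are hypothetical:

```cpp
// Globally declared lookup table, visible to the kernel source via an extern declaration.
extern int32 lut[256];

class lutgraph : public adf::graph {
public:
    adf::kernel    k;
    adf::parameter p;
    lutgraph() {
        k = adf::kernel::create(lookup_kernel);  // hypothetical kernel taking a LUT argument
        p = adf::parameter::array(lut);          // captures the size and type of lut
        adf::connect<>(p, k);                    // allocate the LUT in memory near the kernel
    }
};
```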
bypass
This class is a control flow encapsulator with data bypass. It wraps around an individual node or subgraph to create a bypass data path based on a dynamic control condition. The dynamic control is coded as a run-time parameter port bp (with integer value 0 or 1) that controls whether the input window (or stream) data flows into the graph encapsulated by the bypass (bp=0) or is bypassed directly to the output window (or stream) (bp=1).
Scope
bypass objects can be declared in class scope as member variables in a user-defined graph type (i.e., inside a class that inherits from graph). bypass objects must be initialized by assignment in the graph constructor.
Member Functions
static bypass & create( kernel );
The static create method creates a bypass object around a given kernel object. The number of inputs and outputs of the bypass are inferred automatically from the corresponding ports of the kernel.
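A sketch of wrapping a kernel in a bypass follows; the kernel function and window sizes are hypothetical, and the bp control RTP port is then driven at run time:

```cpp
class bpgraph : public adf::graph {
public:
    adf::kernel k;
    adf::bypass bp;
    adf::port<adf::input>  in;
    adf::port<adf::output> out;
    bpgraph() {
        k  = adf::kernel::create(filter);   // hypothetical kernel function
        bp = adf::bypass::create(k);        // I/O inferred from the kernel's ports
        adf::connect< adf::window<128> >(in, bp.in[0]);
        adf::connect< adf::window<128> >(bp.out[0], out);
    }
};
// At run time, updating the bp control port with 1 bypasses the encapsulated
// kernel, and 0 routes data through it.
```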
Graph Objects for Packet Processing
The following predefined object classes in the adf namespace are used to define the connectivity of packet streams.
template <int nway> class pktsplit { ... }
template <int nway> class pktmerge { ... }
Scope
Objects of type pktsplit<n> and pktmerge<n> can be declared as member variables in a user-defined graph type (i.e., inside a class that inherits from graph). The template parameter n must be a compile-time constant positive integer denoting the n-way degree of split or merge. These objects behave like ordinary nodes of the graph with input and output connections, but are only used for explicit packet routing.
Member Functions
static pktsplit<nway> & create();
static pktmerge<nway> & create();
The static create method for these classes works in the same way as the kernel create method. The degree of split or merge is already specified in the template variable declaration.
Member Variables
std::vector<port<input>> in;
This variable provides access to the logical inputs of the node. There is only one input for pktsplit nodes. For pktmerge nodes, the i'th index selects the i'th input port.
std::vector<port<output>> out;
This variable provides access to the logical outputs of the node. There is only one output for pktmerge nodes. For pktsplit nodes, the i'th index selects the i'th output port.
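For example, a 2-way packet split feeding two kernels whose outputs are merged back might be declared as follows. The kernel function, window sizes, and the pktstream connection type used here are assumptions for illustration:

```cpp
class pktgraph : public adf::graph {
public:
    adf::kernel k0, k1;
    adf::pktsplit<2> sp;
    adf::pktmerge<2> mg;
    adf::port<adf::input>  in;
    adf::port<adf::output> out;
    pktgraph() {
        k0 = adf::kernel::create(worker);    // hypothetical kernel function
        k1 = adf::kernel::create(worker);
        sp = adf::pktsplit<2>::create();     // one input, two outputs
        mg = adf::pktmerge<2>::create();     // two inputs, one output
        adf::connect<adf::pktstream>(in, sp.in[0]);
        adf::connect<adf::pktstream, adf::window<32>>(sp.out[0], k0.in[0]);
        adf::connect<adf::pktstream, adf::window<32>>(sp.out[1], k1.in[0]);
        adf::connect<adf::window<32>, adf::pktstream>(k0.out[0], mg.in[0]);
        adf::connect<adf::window<32>, adf::pktstream>(k1.out[0], mg.in[1]);
        adf::connect<adf::pktstream>(mg.out[0], out);
    }
};
```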
Platform Objects
platform<#in,#out>
This templated class abstractly represents the external environment under which a top-level graph object executes. It provides a mechanism to source/sink input/output data that is consumed/produced during graph execution.
Constructor
simulation::platform<#in,#out> (IOAttr* in_0,..., IOAttr* out_0,...);
This platform constructor is provided for software simulation purposes. The template parameters #in and #out are non-negative integers specifying the number of input and output ports supported by this abstract platform object. The constructor takes as many I/O attribute specification arguments as the sum of the input and output ports: first all input attributes, then all output attributes. Input platform attributes feed external data to graph inputs, and output platform attributes receive graph output data for external consumption.
An I/O attribute specification is a FileIO, GMIO, or PLIO object, and is declared separately. A direct std::string argument can also be used to represent a FileIO attribute object.
Member Variables
std::vector<port<output>> src;
This variable provides access to the input attributes of a platform in the form of output ports, allowing connections between platform sources and graph inputs to be specified. The i'th index selects the i'th input attribute (window, stream, or RTP) declared in the platform constructor arguments.
std::vector<port<input>> sink;
This variable provides access to the output attributes of a platform in the form of input ports, allowing connections between graph outputs and platform sinks to be specified. The i'th index selects the i'th output attribute (window, stream, or RTP) declared in the platform constructor arguments.
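A simulation platform with one input and one output file attribute might be wired to a graph as follows; the graph type and file paths are illustrative:

```cpp
mygraph gr;                                          // hypothetical user-defined graph

simulation::platform<1, 1> plat("data/input.txt",    // input attribute: feeds the graph
                                "data/output.txt");  // output attribute: receives graph output

adf::connect<> netin (plat.src[0], gr.in);           // platform source -> graph input
adf::connect<> netout(gr.out, plat.sink[0]);         // graph output -> platform sink
```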
FileIO
This class represents the I/O port attribute specification used to connect an external file to a graph input or output port for simulation purposes.
Constructor
FileIO(std::string data_file);
FileIO(std::string logical_name, std::string data_file);
The data_file argument is the path name of an external file, relative to the application project directory, that is opened for input or output during simulation. The logical_name must be the same as the annotation field of the corresponding port as presented in the logical architecture interface specification.
GMIO
This class represents the I/O port attribute specification used to connect graph kernels to the external virtual platform ports representing global memory (DDR) connections.
Constructors
GMIO(const std::string& logical_name, int burst_length, int bandwidth);
This GMIO port attribute specification is used to connect AI Engine kernels or PL blocks with the DDR memory. The logical_name is the name of the port as presented in the interface data sheet. The burst_length is the length of the DDR burst transaction (can be 64, 128, or 256 bytes), and the bandwidth is the average expected throughput in MB/s.
Member Functions
static void* malloc(size_t size);
The malloc method allocates contiguous physical memory space and returns the corresponding virtual address. It accepts a parameter, size, specifying how many bytes to allocate. If successful, a pointer to the allocated memory space is returned; nullptr is returned in the event of a failure.
static void free(void* address);
The free method frees memory space allocated by GMIO::malloc.
return_code gm2aie_nb(const void* address, size_t transaction_size);
The gm2aie_nb method initiates a DDR to AI Engine transfer. The memory space for the transaction is specified by the address pointer and the transaction_size parameter (in bytes). The transaction memory space must be within the total memory space allocated by the GMIO::malloc method. This method can only be used by platform source GMIO objects. It is a non-blocking function in that it does not wait for the read transaction to complete.
return_code aie2gm_nb(void* address, size_t transaction_size);
The aie2gm_nb method initiates an AI Engine to DDR transfer. The memory space for the transaction is specified by the address pointer and the transaction_size parameter (in bytes). The transaction memory space must be within the total memory space allocated by the GMIO::malloc method. This method can only be used by platform sink GMIO objects. It is a non-blocking function in that it does not wait for the write transaction to complete.
return_code wait();
The wait method blocks until all previously issued transactions are complete. This method is only applicable to GMIO objects connected to the AI Engine.
return_code gm2aie(const void* address, size_t transaction_size);
The gm2aie method is a blocking version of gm2aie_nb. It blocks until the AI Engine–DDR read transaction completes.
return_code aie2gm(void* address, size_t transaction_size);
The aie2gm method is a blocking version of aie2gm_nb. It blocks until the AI Engine–DDR write transaction completes.
return_code pl_gm(void* address, size_t total_size);
The pl_gm method sets the PL m_axi port start address in the AXI4-Lite interface. The start address of the virtual memory space for the PL m_axi port is specified by the address parameter. The total size of the memory to be accessed is specified by the total_size parameter.
In Linux, these GMIO member functions must use PS virtual memory addresses through void* pointers returned by GMIO::malloc in the PS program. For bare metal, the virtual address and physical address are the same; there is no need to call GMIO::malloc and GMIO::free, but you can still call them for consistency.
PLIO
This class represents the I/O port attribute specification used to connect AI Engine kernels to the external platform ports representing programmable logic.
Constructor
PLIO(std::string logical_name, std::string datafile);
The above PLIO port attribute specification is used to represent a single 32-bit input or output AXI4-Stream port at the AI Engine array interface as part of a virtual platform specification. The logical_name must be the same as the annotation field of the corresponding port as presented in the logical architecture interface specification. The datafile is an input or output file path that sources input data or receives output data for simulation purposes. This data could be captured separately during platform design and then replayed here for simulation.
PLIO(std::string logical_name, plio_type pliowidth, std::string datafile);
The above PLIO port attribute specification is used to represent a single 32-bit, 64-bit, or 128-bit input or output AXI4-Stream port at the AI Engine array interface as part of a virtual platform specification. Here the pliowidth can be one of plio_32_bits (default), plio_64_bits, or plio_128_bits.
PLIO(std::string logical_name, plio_type pliowidth, std::string datafile, double frequency);
The above PLIO port attribute specification is used to represent a single 32-bit, 64-bit, or 128-bit input or output AXI4-Stream port at the AI Engine array interface as part of a virtual platform specification. Here the pliowidth can be one of plio_32_bits (default), plio_64_bits, or plio_128_bits. The frequency of the PLIO port can also be specified as part of the constructor.
PLIO(std::string logical_name, plio_type pliowidth, std::string datafile, double frequency, bool binary, bool hex);
The above PLIO port attribute specification adds boolean flags indicating whether the contents of the data file are in binary or hexadecimal format.
The data in the data files must be organized per line according to the bus width of the PLIO attribute (32, 64, or 128 bits) as well as the data type of the graph port it is connected to. For example, a 64-bit PLIO feeding a kernel port with data type int32 requires file data organized as two columns. However, the same 64-bit PLIO feeding a kernel port with data type cint16 requires the data to be organized into four columns, each representing a 16-bit real or imaginary part of the complex data type.
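For example, 64-bit PLIO attributes might be declared and handed to a simulation platform as follows; the logical names, file paths, and the 250 MHz frequency are illustrative:

```cpp
adf::PLIO plin ("ifc_in",  adf::plio_64_bits, "data/plio_in.txt");
adf::PLIO plout("ifc_out", adf::plio_64_bits, "data/plio_out.txt", 250.0); // 250 MHz PL clock

simulation::platform<1, 1> plat(&plin, &plout);
```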
Event API
The event API provides functions to configure AI Engine hardware resources for performance profiling and event tracing. In this release, a subset of performance profiling use cases are supported.
Enumeration
enum io_profiling_option
{
io_total_stream_running_to_idle_cycles,
io_stream_start_to_bytes_transferred_cycles,
io_stream_start_difference_cycles,
io_stream_running_event_count
};
The io_profiling_option enumeration contains the options for performance profiling using PLIO and GMIO objects. The io_total_stream_running_to_idle_cycles option represents the total accumulated clock cycles between the stream running event and the stream idle event of the corresponding stream port in the shim tile. This option can be used to profile platform I/O bandwidth.
The io_stream_start_to_bytes_transferred_cycles option represents the clock cycles between the first stream running event and the event that the specified number of bytes have been transferred through the stream port in the shim tile. This option can be used to profile graph throughput.
The io_stream_start_difference_cycles option represents the clock cycles elapsed between the first stream running events of the two platform I/O objects. This option can be used to profile graph latency.
The io_stream_running_event_count option represents the number of stream running events. This option can be used to profile graph throughput during a period of time for streaming applications.
Member Functions
static handle start_profiling(IoAttr& io, io_profiling_option option, uint32 value = 0);
This function configures the performance counters in the AI Engine and starts profiling. io is the platform GMIO or PLIO object. option is one of the io_profiling_option enumerations described above. If the io_stream_start_to_bytes_transferred_cycles option is used, the number of bytes can be specified in the value parameter. This function should be called after graph::init(). It returns a handle to be used by read_profiling and stop_profiling. If the specification is incorrect or there are insufficient hardware resources to perform the profiling, an invalid_handle is returned.
static handle start_profiling(IoAttr& io1, IoAttr& io2, io_profiling_option option, uint32 value = 0);
This function configures the performance counters in the AI Engine and starts profiling for the io_stream_start_difference_cycles option. Parameters io1 and io2 specify the two platform I/O objects. This function should be called after graph::init(). It returns a handle to be used by read_profiling and stop_profiling. If the specification is incorrect or there are insufficient hardware resources to perform the profiling, an invalid_handle is returned.
static long long read_profiling(handle h);
This function returns the current performance counter value associated with the handle.
static void stop_profiling(handle h);
This function stops the performance profiling associated with the handle and releases the corresponding hardware resources.
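A sketch of profiling graph throughput with the byte-count option follows; the event:: scoping, the PLIO object, and the byte count are assumptions for illustration:

```cpp
event::handle h = event::start_profiling(
    plout,                                               // platform PLIO or GMIO object
    event::io_stream_start_to_bytes_transferred_cycles,
    1024 * sizeof(int32));                               // bytes expected to be transferred

if (h != event::invalid_handle) {
    gr.run(16);
    gr.wait();
    long long cycles = event::read_profiling(h);         // AI Engine clock cycles elapsed
    event::stop_profiling(h);                            // release the hardware counters
}
```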
Enumeration
enum kernel_profiling_option
{
kernel_between_pc_cycles /// Number of accumulated cycles between two specified program counters for a kernel object
};
The kernel_profiling_option enumeration contains the options for profiling the number of cycles between two program counters of kernel objects. The kernel_between_pc_cycles option represents the number of accumulated cycles between two specified program counters for a kernel object.
Member Functions
static handle start_profiling(kernel& k, kernel_profiling_option option, uint16 pc1, uint16 pc2);
This function configures performance counters in an AI Engine to record the number of accumulated cycles between two specified program counters (pc1 and pc2) for a kernel object (k).
Connections
The following template object constructors specify different types of connections between ports. Each of them supports the appropriate overloading for input/output/inout ports. Specifying the connection object name while creating a connection is optional, but it is recommended for better debugging.
Connection Constructor Templates
template<int blocksize, int overlap> connect<stream , window<blocksize, overlap> > [name](portA, portB)
Connects a stream port to a windowed buffer port of specified block size and overlap.
template<int blocksize> connect<stream , window<blocksize> > [name](portA, portB)
Connects a stream port to a windowed buffer port of specified block size and zero overlap.
template<int blocksize> connect<window<blocksize>, stream> [name](portA, portB)
Connects a windowed buffer port of specified block size to a stream port.
template<> connect<stream> [name](portA, portB)
Connects between two stream ports.
template<> connect<cascade> [name](portA, portB)
Connects between two AI Engine cascade ports.
template<> connect<> [name](portA, portB)
Connects between hierarchical ports between different levels of hierarchy.
template<> connect<parameter> [name](portA, portB)
Connects a parameter port to a kernel port.
template<> connect<> [name](parameter, kernel)
Connects a LUT parameter array object to a kernel.
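Put together, several of these connection forms might appear in a graph constructor as follows; the kernels, port indices, and window sizes are hypothetical:

```cpp
// Inside a graph constructor; k0 and k1 are kernel member variables,
// in/out are graph ports, and rtp is an input_port member.
adf::connect< adf::window<128> >              n0(in,        k0.in[0]);  // graph input -> kernel window
adf::connect< adf::stream, adf::window<128> > n1(k0.out[0], k1.in[0]);  // stream source -> window sink
adf::connect< adf::stream >                   n2(k1.out[0], out);       // stream to stream
adf::connect< adf::cascade >                  n3(k0.out[1], k1.in[1]);  // accumulator cascade ports
adf::connect< adf::parameter >                n4(rtp,       k1.in[2]);  // run-time parameter connection
```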
Port Combinations
The port combinations used in the constructor templates are specified in the following table.
PortA | PortB | Comment
---|---|---
port<output> | port<output> | Connect a kernel output to a parent graph output
port<output> | port<input> | Connect a kernel output to a kernel input
port<output> | port<inout> | Connect an output of a kernel or a subgraph to an inout port of another kernel or a subgraph
port<input> | port<input> | Connect a graph input to a kernel input
port<input> | port<inout> | Connect an input of a parent graph to an inout port of a child subgraph or a kernel
port<inout> | port<input> | Connect an inout port of a parent graph or a kernel to an input of another kernel or a subgraph
port<inout> | port<output> | Connect an inout port of a subgraph or a kernel to an output of a parent graph
parameter& | kernel& | Connect an initialized parameter variable to a kernel, ensuring that the compiler allocates space for the variable in the memory around the kernel
Constraints
Constraints are user-defined properties for graph nodes that provide additional information to the compiler.
constraint<T>
This template class is used to build scalar data constraints on kernels, connections, and ports.
Scope
A constraint must appear inside a user graph constructor.
Member Functions
constraint<T> operator=(T)
This overloaded equality operator allows you to assign a value to a scalar constraint.
Constructors
The default constructor is not used. Instead the following special constructors are used with specific meaning.
void fabric<pl>(kernel&)
This constraint allows you to mark a kernel to be implemented on the internal programmable logic.
void fabric<aiengine>(kernel&)
This constraint allows you to mark a kernel to be implemented on the AI Engine (default).
constraint<std::string>& initialization_function(kernel&)
This constraint allows you to set a specific initialization function for each kernel. The constraint expects a string denoting the name of the initialization function. Where multiple kernels are packed on a core, each initialization function packed on the core is called exactly once. No kernel functions are scheduled until all the initialization functions packed on a core are completed.
constraint<float>& runtime<ratio>(kernel&)
This constraint allows you to set a specific core usage fraction for a kernel. This is computed as a ratio of the number of cycles taken by one invocation of a kernel (processing one block of data) to the cycle budget. The cycle budget for an application is typically fixed according to the expected data throughput and the block size being processed.
constraint<std::string>& source(kernel&)
This constraint allows you to specify the source file containing the definition of each kernel function. A source constraint must be specified for each kernel.
constraint<int>& fifo_depth(connect&)=[<depth> | (source_depth, dest_depth)]
This constraint allows you to specify the amount of slack to be inserted on a streaming connection to allow deadlock free execution.
void single_buffer(port<T>&)
This constraint allows you to specify single buffer constraint on a window port. By default, a window port is double buffered.
void initial_value(async_AIE_RTP_port)
This constraint allows you to set the initial value for an asynchronous AI Engine input run-time parameter port. It allows the destination kernel to start asynchronously with the specified initial value. You can set both scalar and array run-time parameters using this constraint.
Example scalar: initial_value(6).
Example array: initial_value({1,2,3})
void pl_axi_lite(pl_kernel) = true | false
This constraint allows you to specify whether a PL kernel uses the AXI4-Lite interface. If this constraint is not specified, the AI Engine compiler uses the --pl-axi-lite compiler option by default.
void pl_frequency(pl_kernel) = freq_in_MHz
This constraint allows you to set the frequency of a PL kernel in MHz.
constraint<int> stack_size(adf::kernel& k);
This constraint allows you to set the stack size for an individual kernel.
constraint<int> heap_size(adf::kernel& k);
This constraint allows you to set the heap size for an individual kernel.
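Several scalar constraints applied to one kernel might read as follows; the kernel function, file name, and numeric values are illustrative:

```cpp
class cgraph : public adf::graph {
public:
    adf::kernel k;
    cgraph() {
        k = adf::kernel::create(filter);              // hypothetical kernel function
        adf::source(k)                  = "src/filter.cc";
        adf::runtime<adf::ratio>(k)     = 0.6;        // up to 60% of one core's cycle budget
        adf::initialization_function(k) = "filter_init";
        adf::stack_size(k)              = 2048;       // bytes, illustrative
        adf::heap_size(k)               = 1024;       // bytes, illustrative
        adf::single_buffer(k.in[0]);                  // disable double buffering on this port
    }
};
```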
constraint< std::vector<T>>
This template class is used to build vector data constraints on kernels, connections, and ports.
Scope
A constraint must appear inside a user graph constructor.
Member Function
constraint<std::vector<T> > operator=(std::vector<T>)
This overloaded equality operator allows you to assign a list of values to a vector constraint.
Constructors
The default constructor is not used. Instead the following special constructors are used with specific meaning.
constraint <std::vector<std::string > >& headers (kernel&)
This constraint allows you to specify a set of
header files for a kernel that define objects to be shared with other kernels and
hence have to be included once in the corresponding main
program. The kernel source file would instead include an extern
declaration for that object.
Mapping Constraints
The following functions help to build various types of constraints on the physical mapping of the kernels and buffers onto the AI Engine array.
Scope
A constraint must appear inside a user graph constructor.
Kernel Location Constructors
location_constraint tile(int col, int row)
This location constructor points to a specific AI Engine tile located at specified column
and row within the AI Engine array. The
column and row values are zero based, where the zero'th row is counted from the
bottom-most row with a compute processor and the zero'th column is counted from the
left-most column. The previously used constructor proc(col,row)
is now deprecated.
location_constraint location<kernel> (kernel&)
This constraint provides a handle to the location of a kernel so that it can be constrained to be located on a specific tile or co-located with another kernel using the following assignment operator.
Buffer Location Constructors
location_constraint address(int col, int row, int offset)
This location constructor points to a specific data memory address offset on a specific AI Engine tile. The offset address is relative to that tile starting at zero with a maximum value of 32768 (32K).
location_constraint bank(int col, int row, int bankid)
This location constructor points to a specific data memory bank on a specific AI Engine tile. The bank ID is relative to that tile and can take values 0, 1, 2, 3.
location_constraint offset(int offset_value)
This location constructor specifies data memory address offset. The offset address is between 0 and 32768 (32K) and is relative to a tile allocated by the compiler.
location_constraint location<buffer> (port<T>&)
This location constructor provides a handle to the location of a buffer attached to an input, output, or inout port of a kernel. It can be used to constrain the location of the buffer to a specific address or bank, or to be on the same tile as another kernel, or to be on the same bank as another buffer using the following assignment operator. It is an error to constrain two buffers to the same address. This constructor only applies to window kernel ports.
location_constraint location<stack> (kernel&)
This location constructor provides a handle to the location of the system memory (stack and heap) of the AI Engine where the specified kernel is mapped. This provides a mechanism to constrain the location of the system memory with respect to other buffers used by that kernel.
location_constraint location<parameter> (parameter&)
This location constructor provides a handle to the location of the parameter array (for example, a lookup table) declared within a graph.
Bounding Box Constructor
location_constraint bounding_box(int column_min, int row_min, int column_max, int row_max)
This bounding box constructor specifies a rectangular bounding box in which a graph is to be placed on the AI Engine array, spanning columns column_min through column_max and rows row_min through row_max. Multiple bounding box location constraints can be used in an initializer list to specify an irregularly shaped bounding region.
Operator Functions
location_constraint& operator=(location_constraint)
This operator expresses the equality constraint between two location constructors. It allows various types of absolute or relative location constraints to be expressed.
The following example shows how to constrain a kernel to be placed on a specified AI Engine tile.
location<kernel>(k1) = tile(3,2);
The following template shows how to constrain the location of the double buffers attached to a port to a specific address or bank ID. At most two elements should be specified in the initializer list to constrain the locations of the double buffers. Furthermore, if these buffers are read or written by a DMA engine, they must be on the same tile.
location<buffer>(port1) = { [address(c,r,o) | bank(c,r,id)] , [address(c,r,o) | bank(c,r,id)] };
The following template shows how to constrain the location of a parameter lookup table or the system memory of a kernel to be placed on a specific address or a bankid.
location<parameter>(param1) = [address(c,r,o) | bank(c,r,id)];
location<stack>(k1) = [address(c,r,o) | bank(c,r,id)];
The following example shows how to constrain two kernels to be placed on the same AI Engine tile. This forces them to be sequenced in topological order and allows them to share memory buffers without synchronization.
location<kernel>(k1) = location<kernel>(k2);
The following example shows how to constrain a buffer, stack, or parameter location to be on the same tile as that of a kernel. This ensures that the buffer, stack, or parameter array can be accessed by the kernel k1 without requiring a DMA.
location<buffer>(port1) = location<kernel>(k1);
location<stack>(k2) = location<kernel>(k1);
location<parameter>(param1) = location<kernel>(k1);
The following example shows how to constrain a buffer, stack, or parameter location to be on the same bank as that of another buffer, stack, or parameter. When two double buffers are co-located, this constrains both the ping buffers to be on one bank and both the pong buffers to be on another bank.
location<buffer>(port2) = location<buffer>(port1);
location<stack>(k1) = location<buffer>(port1);
location<parameter>(param1) = location<buffer>(port1);
The following example shows how to constrain a graph to be placed within a bounding box, or within a combined region formed by multiple bounding boxes.
location<graph>(g1) = bounding_box(1,1,2,2);
location<graph>(g2) = { bounding_box(3,3,4,4), bounding_box(5,5,6,6) };
Non-Equality Function
void not_equal(location_constraint lhs, location_constraint rhs)
This function expresses lhs ≠ rhs for the two location_constraint parameters lhs and rhs. It allows a relative non-colocation constraint to be specified. The not_equal buffer constraint only works for single buffers and should not be used with double buffers.
The following example shows how to specify that two kernels, k1 and k2, must not be mapped to the same AI Engine.
not_equal(location<kernel>(k1), location<kernel>(k2));
The following example shows how to specify that two buffers, port1 and port2, must not be mapped to the same memory bank.
not_equal(location<buffer>(port1), location<buffer>(port2));
Stamp and Repeat Constraint
The following example shows how identical graphs can be placed using the stamp constraint. In this case, the tx_chain0 graph is the reference: its objects are placed first, and that placement is stamped onto the graphs tx_chain1 and tx_chain2. The number of rows for all identical graphs (the reference plus the stamped ones) must be the same, and each bounding box must begin and end at the same parity of row; that is, if the reference graph's bounding box begins at an even row and ends at an odd row, all of the stamped graphs must follow the same convention. This limitation arises from the mirrored tiles in the AI Engine array: in one row the AI Engine is followed by a memory group, and in the next row the memory group is followed by an AI Engine within a tile.
location<graph>(tx_chain0) = bounding_box(0,0,3,3);
location<graph>(tx_chain1) = bounding_box(4,0,7,3);
location<graph>(tx_chain2) = bounding_box(0,4,3,7);
location<graph>(tx_chain1) = stamp(location<graph>(tx_chain0));
location<graph>(tx_chain2) = stamp(location<graph>(tx_chain0));
JSON Constraints
The constraints JSON file can contain one or more of the following sections:
- NodeConstraints
- Constrain graph nodes, such as kernels
- PortConstraints
- Constrain kernel ports and params
- GlobalConstraints
- Specify global constraints, i.e., constraints that are not associated with a specific object
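A minimal constraints file combining these sections might look as follows (the graph, kernel, and port names are hypothetical; each section is optional and described in detail below):

```
{
  "NodeConstraints": {
    "mygraph.k1": {
      "tile": { "column": 2, "row": 1 }
    }
  },
  "PortConstraints": {
    "mygraph.k1.in[0]": {
      "colocated_nodes": ["mygraph.k1"]
    }
  },
  "GlobalConstraints": {
    "areaGroup": {
      "name": "my_area_group",
      "nodeGroup": ["mygraph.k1"],
      "tileGroup": ["(0,0):(3,3)"]
    }
  }
}
```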
Node Constraints
The NodeConstraints section is used to constrain graph nodes. Constraints are grouped by node, such that one or more constraints can be specified per node.
Syntax
{
"NodeConstraints": {
"<node name>": {
<constraint>,
<constraint>,
...
}
}
}
<node name> ::= string
<constraint> ::= tile
| shim
| reserved_memory
| colocated_nodes
| not_colocated_nodes
| colocated_reserved_memories
| not_colocated_reserved_memories
Example
{
"NodeConstraints": {
"mygraph.k1": {
"tile": {
"column": 2,
"row": 1
},
"reserved_memory": {
"column": 2,
"row": 1,
"bankId": 3,
"offset": 4128
}
},
"mygraph.k2": {
"tile": {
"column": 2,
"row": 2
}
}
}
}
Node Names
Nodes must be specified by their fully qualified name, for example: <graph name>.<kernel name>.
In the following example, the graph name is myGraph and the kernel name is k1. The fully specified node name is myGraph.k1.
class my_graph : public adf::graph {
private:
adf::kernel k1;
public:
my_graph() {
k1 = kernel::create(kernel1);
source(k1) = "src/kernels/kernel1.cc";
}
};
my_graph myGraph;
Anytime this kernel is referenced in the constraints JSON file it must be named myGraph.k1, as shown in the various examples throughout this document.
Tile Constraint
This constrains a kernel to a specific tile located at a specified column and row within the array. The column and row values are zero based, where the zeroth row is counted from the bottom-most row with a compute processor and the zeroth column is counted from the left-most column.
Syntax
"tile": {
"column": integer,
"row": integer
}
Example
{
"NodeConstraints": {
"mygraph.k1": {
"tile": {
"column": 2,
"row": 1
}
}
}
}
Shim Constraint
This constrains an I/O node (for example, a PLIO port) to a specific shim column. Optionally, a specific channel within that column can also be specified.
Syntax
"shim": {
"column": integer,
"channel": integer (optional)
}
Example
{
"NodeConstraints": {
"plioOut1": {
"shim": {
"column": 0,
"channel": 1
}
},
"plioOut2": {
"shim": {
"column": 1
}
}
}
}
Reserved Memory Constraint
This constrains the location of system memory (stack and heap) for a kernel to a specific address on a specific tile. The address can be specified in one of two different ways:
- Column, row, bankId, and offset, where the tile and bank are specified by column, row, and bankId, and the offset address is relative to the start of that bank, starting at zero with a maximum of 8192 (the size of a bank in bytes).
- Column, row, and bankId, where the bank ID is relative to the tile and can take values 0, 1, 2, or 3.
Syntax
"reserved_memory": <bank_address>
<bank_address> ::= {
"column": integer,
"row": integer,
"bankId": integer,
"offset": integer
}
<bank_address> ::= {
"column": integer,
"row": integer,
"bankId": integer
}
Example
{
"NodeConstraints": {
"mygraph.k1": {
"reserved_memory": {
"column": 2,
"row": 1,
"bankId": 3,
"offset": 4128
}
},
"mygraph.k2": {
"reserved_memory": {
"column": 1,
"row": 1,
"bankId": 3
}
}
}
}
Colocated Nodes Constraint
The colocated nodes constraint requires two or more kernels to be on the same tile and forces sequencing of the kernels in a topological order. It also allows them to share memory buffers without synchronization.
Syntax
"colocated_nodes": [<node list>]
<node list> ::= <node name>[,<node name>...]
<node name> ::= string
Example
{
"NodeConstraints": {
"mygraph.k2": {
"colocated_nodes": ["mygraph.k1"]
}
}
}
Not Colocated Nodes Constraint
This constrains two or more kernels to not be on the same tile.
Syntax
"not_colocated_nodes": [<node list>]
<node list> ::= <node name>[,<node name>...]
<node name> ::= string
Example
{
"NodeConstraints": {
"mygraph.k2": {
"not_colocated_nodes": ["mygraph.k1"]
}
}
}
Colocated Reserved Memories Constraint
This constrains a kernel location to be on the same tile as that of one or more stacks. This ensures that the stacks can be accessed by the kernel without requiring a DMA.
Syntax
"colocated_reserved_memories": [<node list>]
<node list> ::= <node name>[,<node name>...]
<node name> ::= string
Example
{
"NodeConstraints": {
"mygraph.k2": {
"colocated_reserved_memories": ["mygraph.k1"]
}
}
}
Not Colocated Reserved Memories Constraint
This constrains a kernel location so that it will not be on the same tile as one or more stacks.
Syntax
"not_colocated_reserved_memories": [<node list>]
<node list> ::= <node name>[,<node name>...]
<node name> ::= string
Example
{
"NodeConstraints": {
"mygraph.k2": {
"not_colocated_reserved_memories": ["mygraph.k1"]
}
}
}
Port Constraints
Port constraints are specified in the PortConstraints section. Constraints are grouped by port, such that one or more constraints can be specified per port.
Syntax
{
"PortConstraints": {
"<port name>": {
<constraint>[,
<constraint>...]
}
}
}
<port name> ::= string
<constraint> ::= buffers
| colocated_nodes
| not_colocated_nodes
| colocated_ports
| not_colocated_ports
| exclusive_colocated_ports
| colocated_reserved_memories
| not_colocated_reserved_memories
Example
{
"PortConstraints": {
"mygraph.k1.in[0]": {
"colocated_nodes": ["mygraph.k1"]
},
"mygraph.k2.in[0]": {
"colocated_nodes": ["mygraph.k2"]
},
"mygraph.p1": {
"buffers": [{
"column": 2,
"row": 1,
"bankId": 2
}]
}
}
}
Port Names
Ports must be specified by their fully qualified name: <graph name>.<kernel name>.<port name>
. In the
following example, the graph name is myGraph, the kernel name is k1, and the kernel has two
ports named in[0] and out[0] (as specified in kernel1.cc). The fully
specified port names are then myGraph.k1.in[0] and
myGraph.k1.out[0].
class my_graph : public adf::graph {
private:
adf::kernel k1;
public:
my_graph() {
k1 = kernel::create(kernel1);
source(k1) = "src/kernels/kernel1.cc";
}
};
my_graph myGraph;
Anytime either of these ports are referenced in the constraints JSON file, they must be named myGraph.k1.in[0] and myGraph.k1.out[0], as shown in the various examples throughout this document.
Buffers Constraint
This constrains a data buffer to a specific address on a specific tile. The data buffer can be attached to an input, output, or inout port of a kernel, or to a param (for example, a lookup table). The address can be specified in one of three different ways:
- Column, row, and offset, where the tile is specified by column and row and the offset address is relative to the tile, starting at zero with a maximum value of 32768 (32K).
- Column, row, and bankId, where the bank ID is relative to the tile and can take values 0, 1, 2, or 3.
- Offset only, where the offset is between zero and 32768 (32K) and is relative to the tile allocated by the compiler.
Syntax
"buffers": [<address>, <(optional) address>]
<address> ::= <offset_address> | <bank_address> | <offset_address>
<tile_address> ::= {
"column": integer,
"row": integer,
"offset": integer
}
<bank_address> ::= {
"column": integer,
"row": integer,
"bankId": integer
}
<offset_address> ::= {
"offset": integer
}
Example
{
"PortConstraints": {
"mygraph.k2.out[0]": {
"buffers": [{
"column": 2,
"row": 2,
"offset": 5632
}, {
"column": 2,
"row": 2,
"offset": 4608
}]
},
"mygraph.k1.out[0]": {
"buffers": [{
"column": 2,
"row": 3,
"bankId": 2
}, {
"column": 2,
"row": 3,
"bankId": 3
}]
},
"mygraph.p1": {
"buffers": [{
"offset": 512
}]
}
}
}
Colocated Nodes Constraint
This constrains a port (i.e., the port buffer) location to be on the same tile as that of one or more kernels. This ensures that the data buffer can be accessed by the other kernels without requiring a DMA.
Syntax
"colocated_nodes": [<node list>]
<node list> ::= <node name>[,<node name>...]
<node name> ::= string
Example
{
"PortConstraints": {
"mygraph.k1.in[0]": {
"colocated_nodes": ["mygraph.k1"]
},
"mygraph.k2.in[0]": {
"colocated_nodes": ["mygraph.k2"]
}
}
}
Not Colocated Nodes Constraint
This constrains a port (i.e., the port buffer) location to not be on the same tile as that of one or more kernels.
Syntax
"not_colocated_nodes": [<node list>]
<node list> ::= <node name>[,<node name>...]
<node name> ::= string
Example
{
"PortConstraints": {
"mygraph.k2.in[0]": {
"not_colocated_nodes": ["mygraph.k1"]
}
}
}
Colocated Ports Constraint
This constrains a port buffer location to be on the same bank as that of one or more other port buffers. When two double buffers are co-located, this constrains both of the ping buffers to be on one bank and both of the pong buffers to be on another bank.
Syntax
"colocated_ports": [<port list>]
<port list> ::= <port name>[, <port name>...]
<port name> ::= string
Example
{
"PortConstraints": {
"mygraph.k2.in[0]": {
"colocated_ports": ["mygraph.k2.out[0]"]
}
}
}
Not Colocated Ports Constraint
This constrains a port buffer location to not be on the same bank as that of one or more other port buffers.
Syntax
"not_colocated_ports": [<port list>]
<port list> ::= <port name>[, <port name>...]
<port name> ::= string
Example
{
"PortConstraints": {
"mygraph.k2.in[0]": {
"not_colocated_ports": ["mygraph.k2.out[0]"]
}
}
}
Exclusive Colocated Ports Constraint
This constrains a port buffer location to be exclusively on the same bank as that of one or more other port buffers, meaning that no other port buffers can be on the same bank.
Syntax
"exclusive_colocated_ports": [<port list>]
<port list> ::= <port name>[, <port name>...]
<port name> ::= string
Example
{
"PortConstraints": {
"mygraph.k2.in[0]": {
"exclusive_colocated_ports": ["mygraph.k2.out[0]"]
}
}
}
Colocated Reserved Memories Constraint
This constrains a port buffer location to be on the same bank as that of one or more stacks.
Syntax
"colocated_reserved_memories": [<port list>]
<port list> ::= <port name>[, <port name>...]
<port name> ::= string
Example
{
"PortConstraints": {
"mygraph.k2.in[0]": {
"colocated_reserved_memories": ["mygraph.k1"]
}
}
}
Not Colocated Reserved Memories Constraint
This constrains a port buffer location to not be on the same bank as that of one or more stacks.
Syntax
"not_colocated_reserved_memories": [<port list>]
<port list> ::= <port name>[, <port name>...]
<port name> ::= string
Example
{
"PortConstraints": {
"mygraph.k2.in[0]": {
"not_colocated_reserved_memories": ["mygraph.k1"]
}
}
}
Global Constraints
Global constraints are specified in the GlobalConstraints section.
Syntax
{
"GlobalConstraints": {
<constraint>[,
<constraint>...]
}
}
<constraint> ::= areaGroup
| isomorphicGraphGroup
Example
{
"GlobalConstraints": {
"areaGroup": {
"name": "root_area_group",
"nodeGroup": ["mygraph.k1", "mygraph.k2"],
"tileGroup": ["(2,0):(2,3)"],
"shimGroup": ["0:3"]
},
"isomorphicGraphGroup": {
"name": "isoGroup1",
"referenceGraph": "clipGraph0",
"stampedGraphs": ["clipGraph1", "clipGraph2"]
}
}
}
AreaGroup Constraint
The areaGroup constraint specifies a range of tile and/or shim locations to which a group of one or more nodes can be mapped. The areaGroup constraint can be specified with up to five properties:
- nodeGroup
- An array of one or more node names (e.g., kernel names)
- tileGroup
- An array of one or more tile ranges
- shimGroup
- An array of one or more shim ranges
- exclude
- A boolean value that when true causes the compiler to not map any resources to the tile and/or shim ranges
- issoft
- A boolean value that when true allows the router to use routing resources within the tile and/or shim ranges
A tile range is in the form of (column,row):(column,row), where the first tile is the bottom left corner and the second tile is the upper right corner. The column and row values are zero based, where the zeroth row is counted from the bottom-most row with a compute processor and the zeroth column is counted from the left-most column.
A shim range is in the form of (column):(column), where the first value is the left-most column and the second value is the right-most column of the range. The column values are zero based, counted from the left-most column of the array. The shim range also allows an optional channel to be specified, for example, (column,channel):(column,channel).
The AreaGroup is used to exclude a range on the device from being used by the compiler for mapping and optionally routing:
- To exclude a range from both the mapper and the router, omit nodeGroup and set exclude to true.
- To exclude a range from the mapper but not the router, omit nodeGroup and set both exclude and issoft to true.
Syntax
"areaGroup": {
"name": string,
"exclude": bool, (*optional)
"issoft": bool, (*optional)
"nodeGroup": [<node list>], (*optional)
"tileGroup": [<tile list>], (*optional)
"shimGroup": [<shim list>] (*optional)
}
<node list> ::= <node name>[,<node name>...]
<tile list> ::= <tile value>[,<tile value>...]
<tile value> ::= <tile range> | <tile address>
<tile range> ::= "<tile address>[:<tile address>]"
<tile address> ::= (<column>, <row>)
<shim list> ::= <shim value>[,<shim value>...]
<shim value> ::= <shim range> | <shim address>
<shim range> ::= "<shim address>[:<shim address>]"
<shim address> ::= (<column>[,<channel>])
<node name> ::= string
<column> ::= integer
<row> ::= integer
<channel> ::= integer
Example
{
"GlobalConstraints": {
"areaGroup": {
"name": "mygraph_area_group",
"nodeGroup": ["mygraph.k1", "mygraph.k2"],
"tileGroup": ["(2,0):(2,3)"],
"shimGroup": ["0:3"]
}
}
}
Example of Exclude
{
"GlobalConstraints": {
"areaGroup": {
"name": "mygraph_excluded_area_group",
"exclude": true,
"tileGroup": ["(3,0):(4,3)"],
"shimGroup": ["3:4"]
}
}
}
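When the exclusion should apply only to the mapper, issoft can be added so that the router may still use routing resources within the range (a sketch; the group name and tile range are illustrative):

```
{
  "GlobalConstraints": {
    "areaGroup": {
      "name": "soft_excluded_area_group",
      "exclude": true,
      "issoft": true,
      "tileGroup": ["(3,0):(4,3)"]
    }
  }
}
```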
IsomorphicGraphGroup Constraint
The isomorphicGraphGroup constraint is used to specify isomorphic graphs that are used in the stamp and repeat flow.
Syntax
"isomorphicGraphGroup": {
"name": string,
"referenceGraph": <reference graph name>,
"stampedGraphs": [<stamped graph name list>]
}
Example
"isomorphicGraphGroup": {
"name": "isoGroup",
"referenceGraph": "tx_chain0",
"stampedGraphs": ["tx_chain1", "tx_chain2", "tx_chain3"]
}
General Description
The stamp and repeat feature of the AI Engine compiler can be used when the same graph has multiple instances that can be constrained to the same geometry in AI Engines. There are two main advantages to using this feature when the same graph is instantiated multiple times.
- Small variation in performance
- All graphs will have very similar throughput because buffers and kernels are mapped identically with respect to each other. Throughput might not be exactly identical due to differences in routing. However, it will be much closer than when stamping is not used.
- Smaller run time of AI Engine compiler
- Because the AI Engine compiler only solves the reference graph instead of the entire design, the required run time is significantly less than in the default flow.
Capabilities and Limitations
If required, you are allowed to stamp multiple different graphs. For example, if a design contains four instances of a graph called tx_chain and four instances of rx_chain, then both sets of graphs can be independently stamped. This feature is only supported for designs which have one or more sets of isomorphic graphs, with no interaction between the different isomorphic graph sets. All reference and stamped graphs must have area group constraints. You must declare identical size area groups for each instance of the graph that needs to be stamped. All area groups must be non-overlapping. For example:
"areaGroup": {
"name": "ant0_cores",
"nodeGroup": ["tx_chain0*"],
"tileGroup": ["(0,0):(3,3)"]
},
"areaGroup": {
"name": "ant1_cores",
"nodeGroup": ["tx_chain1*"],
"tileGroup": ["(0,4):(3,7)"]
},
You must declare an isomorphic graph group
in the constraints file that specifies the reference graph and the stamped graphs.
For example:
"isomorphicGraphGroup": {
"name": "isoGroup",
"referenceGraph": "tx_chain0",
"stampedGraphs": ["tx_chain1", "tx_chain2"]
}
In this case, the tx_chain0 graph is the reference and its objects are placed first and stamped onto the graphs tx_chain1 and tx_chain2. Area groups must follow these rules for the number of rows: the number of rows for all identical graphs (the reference plus the stamped ones) must be the same, and each tileGroup must begin and end at the same parity of row; that is, if the reference graph's tileGroup begins at an even row and ends at an odd row, all of the stamped graphs must follow the same convention. This limitation arises from the mirrored tiles in the AI Engine array: in one row the AI Engine is followed by a memory group, and in the next row the memory group is followed by an AI Engine within a tile.