Using the Restrict Keyword in AI Engine Kernels
The restrict keyword (__restrict) is permitted in AI Engine kernel C++ code. This appendix highlights Xilinx recommendations for using the restrict keyword in the context of AI Engine kernel code.
Pointer Aliasing
Pointer aliasing refers to the situation where the same memory location can be
accessed using different pointer names. The strict aliasing rule in C/C++ means that
pointers are assumed not to alias if they point to fundamentally different types.
Aliasing introduces strong constraints on program execution order. The following shows the aliasing of p and q.
The following is an example of pointer aliasing, in which both pointers p and q point to the same address. The assembly language code produced by the compiler is shown in the middle column, and the operations and clock cycles are shown on the right.
By adding the restrict keyword into this code example, the compiler can optimize the resulting assembly language to increase parallelization of the operations in hardware. The following example shows that using the restrict keyword to prevent aliasing uses fewer clock cycles to complete the same operation.
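As a minimal sketch of this difference, assuming a compiler that accepts the __restrict spelling, the two functions below perform the same operation. The function names are hypothetical and this is not the code from the original figures. The only difference is the promise made to the compiler about aliasing.

```cpp
#include <cassert>

// p and q may alias: after the first store to *p the compiler must reload *q,
// because that store may have modified it.
void add_twice_may_alias(int* p, int* q)
{
    *p += *q;
    *p += *q;
}

// __restrict promises that *p and *q never overlap, so *q may be loaded once
// and the operations can be scheduled more freely.
void add_twice_restrict(int* __restrict p, int* __restrict q)
{
    assert(p != q); // the caller must honor the __restrict contract
    *p += *q;
    *p += *q;
}
```

Calling add_twice_may_alias with the same address for both arguments is well defined; doing the same with add_twice_restrict would break the contract.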
Memory Dependencies
Memory dependencies in the code can limit the kinds of optimizations attempted by the compiler. For example, in the following code, xyz and pointers p and q might be unrelated. However, it is equally possible that, within the function code, both pointer p and pointer q point to the same global variable xyz. The compiler must guarantee correct execution under both these conditions. Because of these kinds of memory dependencies, the compiler needs to be conservative and limit optimizations.
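The situation can be sketched as follows. The global name xyz and the pointer names p and q come from the text; the function name update and its body are assumptions made for illustration.

```cpp
#include <cassert>

int xyz = 0; // global variable, as in the text

// p and q may be unrelated to xyz, or both may point to it. The compiler must
// generate code that is correct in either case, so it cannot keep xyz or *q
// in a register across the store through p.
int update(int* p, int* q)
{
    assert(p != nullptr && q != nullptr);
    *p = xyz + 1;     // may modify xyz and *q
    return xyz + *q;  // xyz and *q must be reloaded here
}
```

Both update(&a, &b) with unrelated locals and update(&xyz, &xyz) are legal calls, which is exactly why the compiler has to be conservative.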
Strict Aliasing Rule
The strict aliasing rule dictates that pointers are assumed not to alias if they point to fundamentally different types, except for char* and void*, which can alias any other data type. This is shown in the following graphic, which shows the object universes and the associated pointers.
- Pointers are associated with a type universe: U(T)
  - T is the template. The preceding graphic shows the various templates, including an int universe and a float universe; there is also a MyClass universe per design. Additionally, there is a char universe that includes all universes by default.
- Universes do not alias
  - Pointer p can only point to an address within the int universe, whereas pointer q can only point to an address within the float universe. Because of this, pointer p and pointer q cannot be aliased.
- Derived pointers point to the original universe
  - Pointers derived from a restrict pointer are considered restrict pointers and point to the same restricted memory region. See Derived Pointers.
- char* universe contains all universes
  - A char pointer can point to any variable in all universes.
For two pointers of the same type, as in the following, where both p and q are int pointers, the compiler is conservative and assumes that they might alias, resulting in a loss of performance.
For two pointers of different types, as in the following example, where p is an int pointer and q is a float pointer, the compiler applies the strict aliasing rule, and undefined behavior occurs if the pointers do alias.
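The two cases can be sketched as below. The function names are hypothetical; the pointer names p and q follow the text.

```cpp
#include <cassert>

// Same type: p and q may alias, so *p must be reloaded after the store to *q.
int same_type(int* p, int* q)
{
    *p = 1;
    *q = 2;
    return *p; // 2 if p == q, otherwise 1
}

// Different types: under the strict aliasing rule the compiler may assume the
// store to *q cannot change *p, and may return 1 without reloading *p.
// Passing a q that refers to the same storage as p is undefined behavior.
int different_types(int* p, float* q)
{
    assert(static_cast<void*>(p) != static_cast<void*>(q));
    *p = 1;
    *q = 2.0f;
    return *p;
}
```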
Restrict Keyword
The restrict keyword is mainly used in pointer declarations as a type qualifier for pointers. It does not add any new functionality. It allows you to tell the compiler about a potential optimization. Using __restrict with a pointer informs the compiler that the pointer is the only way to access the object pointed at, and the compiler does not need to perform any additional checks.
The following is another example with pointers that, by default, have no aliasing.
Apply the restrict keyword for performance improvement. The following example shows no memory dependencies with other pointers.
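A minimal sketch of such a case, with a hypothetical helper named scale whose three pointers are assumed never to overlap:

```cpp
#include <cassert>

// in, gain, and out never overlap, so the compiler may load in[i + 1]
// before out[i] has been stored, and may keep *gain in a register.
void scale(const int* __restrict in, const int* __restrict gain,
           int* __restrict out, int n)
{
    assert(in != out && gain != out); // caller must honor the __restrict contract
    for (int i = 0; i < n; ++i)
        out[i] = in[i] * *gain;
}
```

Without the qualifiers, the compiler would have to assume that each store to out[i] might change *gain or a later element of in.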
Restrict Qualification
The C standard provides a specific pointer qualifier, restrict (written as __restrict in AI Engine kernel C++ code), intended to allow more aggressive compiler optimization by explicitly stating data independence between whatever the pointer references and all other variables. For example:
int a; // global variable
void foo(int* __restrict p, int* q)
{
    for (...) { ... *p += a + *q; ... }
}
Now the analysis of foo can proceed with the knowledge that *p does not denote the same object as *q and a. So, a and *q can now be loaded once, before the loop.
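For experimentation, a concrete variant of foo might look like the following. The trip count parameter n and the loop body are assumptions; the original leaves the loop bounds and the surrounding statements unspecified.

```cpp
#include <cassert>

int a = 0; // global variable

// Because p is __restrict, *p cannot be the same object as *q or a,
// so the compiler may load a and *q once, before the loop.
void foo(int* __restrict p, int* q, int n)
{
    assert(p != q && p != &a); // caller must honor the __restrict contract
    for (int i = 0; i < n; ++i)
        *p += a + *q;
}
```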
Currently, the compiler front end does not disambiguate between different accesses to the same array. So, when updating one element of an array, it assumes that the complete array has changed value. The __restrict qualifier can be used to override this conservative assumption. This is useful when you want to obtain multiple independent pointers to the same array.
void foo(int A[])
{
    int* __restrict rA = A; // force independent access
    for (int i = ...)
        rA[i] = ... A[i];
}
In this example, the __restrict qualifier allows software pipelining of the loop; the next array element can already be loaded, while the previous one must still be stored. To maximize the impact of the __restrict qualifier, the compiler front end, by default, inserts a chess_copy operation in the initializer, as if it was written:
int* __restrict rA = chess_copy(A);
This is needed to keep both pointers distinct within the optimizer (for example, no common subexpression elimination). This behavior can be disabled for the AI Engine compiler front end by means of the option -mllvm -chess-implicit-chess_copy=false. So, the chess_copy creates two pointers, while __restrict informs the compiler not to consider any mutual dependencies between the stores/loads through these pointers. For __restrict pointers having a local scope, the mutual independence assumption only holds during the lifetime of the __restrict pointer.
Pointers derived from a __restrict pointer (such as rA+1 or through pointer intrinsics) keep the restrictness, that is, they are considered to point to the same restricted memory region.
chess_copy is described in the Chess Compiler User Manual, which can be found in the AI Engine lounge.
Undefined Behavior
Using the restrict keyword improves performance as shown in the previous topic. However, there are issues if the keyword is used inappropriately. The __restrict child pointers must be used in a different block-level scope than the parent pointers, such as pointer p and q as shown in the following example.
Working Example 1
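A minimal sketch of this pattern is shown below. It is not the code from the original figure: the function name copy_plus_one and the child pointer names rp and rq are assumptions. The parent pointers p and q are only used to initialize the __restrict child pointers, which live in their own block-level scope.

```cpp
#include <cassert>

void copy_plus_one(int* p, int* q, int n)
{
    assert(p != q); // the two buffers are assumed not to overlap
    {
        // Child pointers in a separate block-level scope.
        int* __restrict rp = p;
        int* __restrict rq = q;
        for (int i = 0; i < n; ++i)
            rq[i] = rp[i] + 1; // no access through p or q inside this block
    }
    // p and q may be used again here, after rp and rq have gone out of scope.
}
```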
Use of parent pointers in the same scope might break the __restrict contract, which produces undefined behavior, such as with pointers p and q in the following example.
Working Example 2
This can also happen during the load operation, as shown in the green text (return *p;) in the following figure.
The undefined behavior occurs when the restrict pointers are used within the same scope, such as pointers p and q in the following example.
Working Example with Inline Function
The following code shows the working inline function call, in which pointer p and pointer q are used in different scopes.
The undefined behavior occurs when the restrict pointers are used within the same scope, such as pointers p and q in the following example.
Scope of Restrict Keyword in Inline Function
When there are no other accesses within the scope, declaring the restrict pointer has no performance benefits.
In a special case, you can have non-aliasing accesses, as in the following example. Here the parent pointer, p, is used but points to a different location and therefore this is acceptable.
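One way this special case might look is sketched below. The function name fill_tail and the child pointer rp are hypothetical. The parent pointer p is used in the same scope as rp, but only to access p[0], which is never touched through rp.

```cpp
#include <cassert>

void fill_tail(int* p, int n)
{
    assert(n >= 1);
    int* __restrict rp = p + 1;  // rp covers p[1] .. p[n-1]
    for (int i = 0; i < n - 1; ++i)
        rp[i] = i;               // writes p[1] .. p[n-1]
    p[0] = n - 1;                // a location never accessed through rp
}
```

If the last statement wrote to p[1] instead, the same object would be accessed through both rp and p within one scope, which would break the contract.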
Benefits of Using the Restrict Keyword for Read/Modify/Write Loops
The following example works without the restrict keyword, but has poor performance.
Adding the restrict keyword tells the compiler that every iteration accesses a different location: there is no aliasing between iterations (guaranteed by __restrict), while aliasing within an iteration is preserved by data dependency. The increased parallelization results in improved performance.
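A rough sketch of a read/modify/write loop with and without the qualifier follows. The function names and the two-buffer layout are assumptions and may differ from the original example.

```cpp
#include <cassert>

// Without __restrict: the store to acc[i] might alias a later load from in,
// so the compiler cannot start the next iteration's loads early.
void rmw_plain(int* acc, const int* in, int n)
{
    for (int i = 0; i < n; ++i)
        acc[i] += in[i];
}

// With __restrict: stores to acc never alias loads from in, so iterations can
// overlap. Within one iteration, the load, add, and store of acc[i] stay
// ordered by their data dependency.
void rmw_restrict(int* __restrict acc, const int* __restrict in, int n)
{
    assert(acc != in); // caller must honor the __restrict contract
    for (int i = 0; i < n; ++i)
        acc[i] += in[i];
}
```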
Derived Pointers
Pointers derived from a restrict pointer are considered restrict pointers and point to the same restricted memory region, as shown in the following example, where rq2, derived from rq1 (defined as a restrict pointer), is also a restrict pointer and points to the same universe.
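The names rq1 and rq2 come from the text; everything else in the sketch below, including the function name prefix_sum, is an assumption. Accesses through rq1 and rq2 belong to the same restricted region and can depend on each other, but neither is assumed to alias p.

```cpp
#include <cassert>

// Writes the running sum of p[0..n-1] into q[0..n-1].
void prefix_sum(const int* __restrict p, int* q, int n)
{
    assert(p != q && n >= 1);
    int* __restrict rq1 = q;
    int* rq2 = rq1 + 1;              // derived pointer: same restricted region as rq1
    rq1[0] = p[0];
    for (int i = 0; i < n - 1; ++i)
        rq2[i] = rq1[i] + p[i + 1];  // rq2[i] is the same element as rq1[i + 1]
}
```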
Summary
Proper use of the restrict keyword (__restrict) in AI Engine kernel programming can result in performance gains while avoiding undefined behavior in your code. However, be aware that restrict pointers used in the same scope as their parent pointers might result in undefined behavior in your design.