Using the Restrict Keyword in AI Engine Kernels

The restrict keyword (__restrict) can be used in AI Engine kernel C++ code. This appendix describes Xilinx recommendations for using the restrict keyword in AI Engine kernel code.

Pointer Aliasing

Pointer aliasing occurs when the same memory location can be accessed through different pointers. The strict aliasing rule in C/C++ means that pointers are assumed not to alias if they point to fundamentally different types. Aliasing imposes strong constraints on program execution order: if a store through one pointer can change the value read through another, the compiler cannot reorder those accesses. The following figure shows the aliasing of p and q.

Figure 1: Pointer Aliasing

The following is an example of pointer aliasing in which pointers p and q both point to the same address. The middle column shows the assembly language code produced by the compiler, and the right column shows the operations and clock cycles.

Figure 2: Aliasing Code Example
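The figure's code is not reproduced in this text. As a minimal C++ sketch (the function name store_then_load is hypothetical, not taken from the figure), the following shows why aliasing constrains execution order: when p and q point to the same int, the store through q overwrites the value stored through p, so the compiler must reload *p instead of reusing the constant it just stored.

```cpp
#include <cassert>

// Hypothetical example: two stores followed by a load.
// If p and q may alias, the compiler cannot assume *p is still 1
// after the store through q, so it must reload *p from memory.
int store_then_load(int* p, int* q)
{
    *p = 1;    // store through p
    *q = 2;    // store through q; overwrites *p when p == q
    return *p; // 1 if p and q are distinct, 2 if they alias
}
```

Calling store_then_load(&x, &y) with distinct variables returns 1, while store_then_load(&z, &z) returns 2; the compiler must generate code that is correct for both calls.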

Adding the restrict keyword to this code example allows the compiler to optimize the resulting assembly language code and increase the parallelization of operations in hardware. The following example shows that using the restrict keyword to prevent aliasing takes fewer clock cycles to complete the same operation.

Figure 3: Use of Restrict Keyword to Avoid Aliasing
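As a minimal sketch (the function name is hypothetical and the figure's exact code is not reproduced), qualifying both parameters of a store-store-load sequence with __restrict lets the compiler assume that the store through q cannot change *p. It can then return the stored constant without reloading memory. Calling such a function with pointers to the same object would break the __restrict contract and is undefined behavior.

```cpp
#include <cassert>

// Hypothetical example: the caller promises that p and q never point to
// the same object, so the compiler may forward the value 1 to the return
// statement without reloading *p after the store through q.
int store_then_load_restrict(int* __restrict p, int* __restrict q)
{
    *p = 1;
    *q = 2;    // cannot modify *p under the __restrict contract
    return *p; // may be compiled as "return 1"
}
```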

Memory Dependencies

Memory dependencies in the code can limit the kinds of optimizations the compiler attempts. For example, in the following code, pointers p and q might be unrelated to the global variable xyz, or both might point to xyz. The compiler must guarantee correct execution under both conditions. Because of such memory dependencies, the compiler must be conservative and limit its optimizations.

Figure 4: Unrelated Pointers
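The following is a minimal sketch of this situation (the function add_global is hypothetical; only the global name xyz comes from the text). Because p might point to xyz, the store through p can change xyz, so the compiler must reload xyz before using it again instead of keeping it in a register.

```cpp
#include <cassert>

int xyz = 10; // global variable; p or q may or may not point to it

// Hypothetical example: adds xyz to *p and *q. If p == &xyz, the first
// statement changes xyz, so the compiler must reload xyz for the second.
void add_global(int* p, int* q)
{
    *p += xyz;
    *q += xyz;
}
```

With unrelated pointers, both targets receive the original value of xyz; when p is &xyz, the second addition sees the updated value.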

Strict Aliasing Rule

The strict aliasing rule dictates that pointers are assumed not to alias if they point to fundamentally different types, except for char* and void*, which can alias any other data type. This is shown in the following graphic, which depicts the object universes and the associated pointers.

Figure 5: Object Universes
Pointers are associated with a type universe: U(T)
T is the type. The preceding graphic shows several universes, including an int universe and a float universe; a design can also have a universe for each user-defined class, such as MyClass. Additionally, there is a char universe that includes all other universes.
Universes do not alias
Pointer p can point only to addresses within the int universe, whereas pointer q can point only to addresses within the float universe. Therefore, pointers p and q cannot alias.
Derived pointers point to the original universe
Pointers derived from a restrict pointer are considered restrict pointers and point to the same restricted memory region. See Derived Pointers.
char* universe contains all universes
A char pointer can point to any variable in any universe.
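The following minimal sketch illustrates the char universe in standard C++ (both function names are hypothetical, and float_bits assumes a 32-bit IEEE 754 float). Reading the bytes of a float through an unsigned char pointer is well defined because character pointers may alias any object. Reading the same float through an int pointer would cross universes and break the strict aliasing rule, so float_bits copies the bytes with std::memcpy instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// unsigned char* belongs to the "char universe", so it may alias an
// object of any type. Counting the nonzero bytes of a float is well defined.
std::size_t count_nonzero_bytes(const float& f)
{
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&f);
    std::size_t count = 0;
    for (std::size_t i = 0; i < sizeof f; ++i) {
        if (bytes[i] != 0) {
            ++count;
        }
    }
    return count;
}

// Dereferencing an int* that points to a float would break the strict
// aliasing rule (different universes). std::memcpy copies the bytes
// instead, which is well defined.
std::uint32_t float_bits(float f)
{
    static_assert(sizeof(std::uint32_t) == sizeof(float), "expects 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}
```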

When two pointers have the same type, as in the following example where both p and q are int pointers, the compiler conservatively assumes that they can alias, resulting in a loss of performance.

Figure 6: Loss of Performance
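As a minimal sketch of this conservative assumption (the function add_to_each is hypothetical), p and q below are both int pointers, so the compiler must assume that a store to p[i] can change *q and must reload *q on every iteration. The first usage in the test shows that this is not merely theoretical: when q points into the array, the result depends on the reloads.

```cpp
#include <cassert>

// Hypothetical example: p and q have the same type, so the compiler must
// assume a store to p[i] may change *q and reload *q on every iteration.
void add_to_each(int* p, const int* q, int n)
{
    for (int i = 0; i < n; ++i) {
        p[i] += *q;
    }
}
```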

When two pointers have different types, as in the following example where p is an int pointer and q is a float pointer, the compiler applies the strict aliasing rule and assumes that they do not alias; if they do alias, the behavior is undefined.

Figure 7: Two Pointers of Different Types
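A minimal sketch of the different-type case (the function name is hypothetical): under the strict aliasing rule, a store through a float pointer is assumed not to modify an int object, so the compiler may return the stored constant without reloading *p. The test calls the function only with distinct objects, because passing pointers to the same storage would be undefined behavior.

```cpp
#include <cassert>

// Hypothetical example: a store through float* is assumed not to modify
// an int object, so the compiler may compile the return as "return 1".
// Passing pointers that refer to the same storage is undefined behavior.
int int_then_float(int* p, float* q)
{
    *p = 1;
    *q = 2.0f;
    return *p;
}
```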

Restrict Keyword

The restrict keyword is a type qualifier used mainly in pointer declarations. It does not add any new functionality; instead, it tells the compiler about a potential optimization. Qualifying a pointer with __restrict informs the compiler that the pointer is the only way to access the object it points to, so the compiler does not need to account for accesses to that object through other pointers.

Note: If you use the restrict keyword and violate this condition, the behavior is undefined.

The following is another example with pointers that, by default, have no aliasing.

Figure 8: No Aliasing Example

Apply the restrict keyword to improve performance. In the following example, the restrict pointer has no memory dependencies with other pointers.

Figure 9: No Memory Dependencies with Other Pointers
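As a minimal sketch (the function scale and its parameters are hypothetical), qualifying both the output and input pointers with __restrict tells the compiler that stores to out[i] cannot change any element of in. The compiler is then free to load later inputs before earlier results are stored; whether it actually does so depends on the target and optimization level.

```cpp
#include <cassert>

// Hypothetical example: out and in are __restrict, so stores to out[i]
// cannot change any element of in, and loads and stores from different
// iterations may be reordered or overlapped.
void scale(int* __restrict out, const int* __restrict in, int n, int k)
{
    for (int i = 0; i < n; ++i) {
        out[i] = in[i] * k;
    }
}
```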

Restrict Qualification

The C standard (since C99) provides a pointer qualifier, restrict, which C++ code for the AI Engine compiler spells __restrict. It is intended to allow more aggressive compiler optimization by explicitly stating that whatever the pointer references is independent of all other variables. For example:

int a; // global variable
void foo(int* __restrict p, int* q)
{
  for (...) { ... *p += a + *q; ...}
}

The compiler can now analyze foo knowing that *p does not refer to the same object as *q or a. Therefore, a and *q can be loaded once, before the loop.
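The following is a runnable sketch of the foo example. The loop bound n is an added, hypothetical parameter that replaces the elided loop header, and foo_hoisted is a hypothetical hand-written version showing the transformation the __restrict qualifier permits: a + *q is computed once, before the loop.

```cpp
#include <cassert>

int a = 3; // global variable

// Runnable variant of foo; the loop bound n is a hypothetical addition.
void foo(int* __restrict p, int* q, int n)
{
    for (int i = 0; i < n; ++i) {
        *p += a + *q;
    }
}

// What the compiler is allowed to do with foo: because *p cannot be a or
// *q, the sum a + *q is loop-invariant and can be loaded once.
void foo_hoisted(int* __restrict p, int* q, int n)
{
    const int sum = a + *q; // loaded once, before the loop
    int acc = *p;
    for (int i = 0; i < n; ++i) {
        acc += sum;
    }
    *p = acc;
}
```

Both versions produce the same result whenever the __restrict contract holds; they can differ only if p points to a or *q, which the qualifier rules out.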

Currently, the compiler front end does not disambiguate between different accesses to the same array. As a result, when one element of an array is updated, it assumes that the complete array has changed value. The __restrict qualifier can override this conservative assumption, which is useful when you want to obtain multiple independent pointers to the same array.

void foo(int A[])
{
  int* __restrict rA = A; // force independent access
  for (int i = ...)
    rA[i] = ... A[i];
}

In this example, the __restrict qualifier allows software pipelining of the loop: the next array element can be loaded while the previous one is still being stored. To maximize the impact of the __restrict qualifier, the compiler front end, by default, inserts a chess_copy operation in the initializer, as if it were written:

int* __restrict rA = chess_copy(A);

This copy is needed to keep both pointers distinct within the optimizer (for example, so that common subexpression elimination does not merge them). To disable this behavior in the AI Engine compiler front end, use the option -mllvm -chess-implicit-chess_copy=false.

In summary, chess_copy creates two pointers, while __restrict tells the compiler not to consider any mutual dependencies between the loads and stores through these pointers. For __restrict pointers with local scope, the mutual independence assumption holds only during the lifetime of the __restrict pointer.

Pointers derived from a __restrict pointer (such as rA+1, or pointers obtained through pointer intrinsics) keep their restrictness; that is, they are considered to point to the same restricted memory region.

Note: Details of chess_copy are available in the Chess Compiler User Manual, which can be found in the AI Engine lounge.

Undefined Behavior

As shown in the previous topics, the restrict keyword can improve performance. However, using it inappropriately causes problems. A __restrict child pointer must be used in a different block-level scope from its parent pointer, as with pointers p and q in the following example.

Working Example 1

Figure 10: Use of Restrict Keyword
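The figure's code is not reproduced here. As a minimal sketch of the same pattern (the function name is hypothetical), the restrict child q below is declared and used in an inner block scope, and the parent p is accessed only after that scope has ended, so the two pointers are never used in the same scope.

```cpp
#include <cassert>

// Hypothetical example: the restrict child q lives in an inner block
// scope; the parent p is used only after that scope has ended.
int child_then_parent(int* p)
{
    {
        int* __restrict q = p; // child restrict pointer
        *q += 1;
        *q *= 2;
    } // the __restrict assumption for q ends here
    return *p; // parent access outside the child's scope
}
```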

Using a parent pointer in the same scope as its restrict child can break the __restrict contract and produce undefined behavior, as with pointers p and q in the following example.

Figure 11: Undefined Behavior

Working Example 2

The same scoping rule applies to load operations, as shown in the green text (return *p;) in the following figure.

Figure 12: Load Operation
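A minimal sketch of the load case (the function name and the fill pattern are hypothetical): the restrict child q writes the buffer inside an inner scope, and the parent p performs its load only after q's scope has ended.

```cpp
#include <cassert>

// Hypothetical example: the restrict child q fills the buffer inside an
// inner scope; the parent p loads the result only after q's scope ends.
int fill_then_load(int* p, int n)
{
    {
        int* __restrict q = p;
        for (int i = 0; i < n; ++i) {
            q[i] = i * i;
        }
    }
    return p[n - 1]; // load through the parent pointer
}
```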

Undefined behavior occurs when the restrict pointers are used within the same scope, as with pointers p and q in the following example.

Figure 13: Restrict Pointers in Same Scope

Working Example with Inline Function

The following code shows a working inline function call, in which pointers p and q are used in different scopes.

Figure 14: Inline Function Calls
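As a minimal sketch of the inline case (both function names are hypothetical), the __restrict parameter q below is scoped to the body of the inline helper. The caller touches the parent pointer p only after the call returns, so p and q are never used in the same scope, even if the call is inlined.

```cpp
#include <cassert>

// Hypothetical inline helper: the scope of the __restrict parameter q
// is the body of the function, even after the call is inlined.
inline void double_all(int* __restrict q, int n)
{
    for (int i = 0; i < n; ++i) {
        q[i] *= 2;
    }
}

// The caller uses the parent pointer p only after the call returns.
int double_then_sum(int* p, int n)
{
    double_all(p, n);
    int sum = 0;
    for (int i = 0; i < n; ++i) {
        sum += p[i];
    }
    return sum;
}
```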

Undefined behavior occurs when the restrict pointers are used within the same scope, as with pointers p and q in the following example.

Figure 15: Inline Function Calls in Same Scope

Scope of Restrict Keyword in Inline Function

When there are no other accesses within the scope, declaring a restrict pointer provides no performance benefit.

Figure 16: Working Example with No Performance Benefits

As a special case, the parent pointer can be used in the same scope when the accesses do not alias, as in the following example. Here, the parent pointer p is used, but it accesses a different location from the restrict pointer, so this is acceptable.

Figure 17: Special Case—Non-aliasing Accesses
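A minimal sketch of non-aliasing accesses (the function name and the offset of 4 are hypothetical): the restrict child q covers p[4] onward, while the parent p accesses only p[0]. Both pointers are used in the same scope, but they never touch the same location.

```cpp
#include <cassert>

// Hypothetical example: q covers p[4] onward, while the parent p
// accesses only p[0], so the two pointers never touch the same location.
int disjoint_access(int* p)
{
    int* __restrict q = p + 4;
    q[0] = 7; // writes p[4]
    p[0] = 1; // writes p[0]; does not alias q[0]
    return q[0] + p[0];
}
```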

Benefits of Using the Restrict Keyword for Read/Modify/Write Loops

The following example works without the restrict keyword, but has poor performance.

Figure 18: Example Without Restrict Keyword

Adding the restrict keyword tells the compiler that every iteration accesses a different location: there is no aliasing between iterations (__restrict), while aliasing within an iteration is preserved by the data dependency. The increased parallelization results in improved performance.

Figure 19: Add Restrict Keyword
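The figure's loop is not reproduced here. As a minimal sketch of a read/modify/write loop (the function scale_and_offset and its parameters are hypothetical), each iteration loads data[i] and stores back to the same element. The data dependency keeps the load ahead of the store within an iteration, and __restrict allows the compiler to overlap different iterations; whether it actually does so depends on the compiler and target.

```cpp
#include <cassert>

// Hypothetical read/modify/write loop: within one iteration, the store to
// data[i] depends on the load of data[i]; across iterations, __restrict
// tells the compiler there is no aliasing, so iterations may overlap.
void scale_and_offset(int* __restrict data, int n, int scale, int offset)
{
    for (int i = 0; i < n; ++i) {
        data[i] = data[i] * scale + offset;
    }
}
```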

Derived Pointers

Pointers derived from a restrict pointer are considered restrict pointers and point to the same restricted memory region. In the following example, rq2, which is derived from the restrict pointer rq1, is also a restrict pointer and points to the same universe.

Figure 20: Pointers to Same Restricted Memory Region
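A minimal sketch of a derived pointer (only the names rq1 and rq2 come from the text; the function fill_pairs is hypothetical): rq2 is computed from the restrict pointer rq1, so it points into the same restricted region and accesses through it are treated as accesses based on rq1.

```cpp
#include <cassert>

// Hypothetical example: rq2 is derived from the restrict pointer rq1, so
// it points into the same restricted region and is treated as restrict too.
void fill_pairs(int* __restrict rq1, int n)
{
    int* rq2 = rq1 + 1; // derived pointer: same restricted region as rq1
    for (int i = 0; i + 1 < n; i += 2) {
        rq1[i] = i;  // even elements
        rq2[i] = -i; // odd elements (rq2[i] is rq1[i + 1])
    }
}
```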

Summary

Proper use of the restrict keyword (__restrict) in AI Engine kernel programming can improve performance without introducing undefined behavior in your code. However, be aware that restrict pointers used in the same scope as their parent pointers can result in undefined behavior in your design.

Figure 21: Restrict Keyword Use Summary