Configuring the System Architecture

In SDAccel Compilation Flow and Execution Model, you learned about the two distinct phases of the SDAccel™ environment kernel build process:

  1. Compilation stage: The compilation process is controlled by the xocc -c option. At the end of the compilation stage, one or more kernel functions are compiled into separate .xo files. At this stage, the xocc compiler extracts the hardware intent from the C/C++ code and associated pragmas. Refer to the SDx Command and Utility Reference Guide for more information on the xocc compiler.
  2. Linking stage: The linking stage is controlled by the xocc -l option. During the linking process, all the .xo files are integrated into the FPGA hardware, producing the FPGA binary (.xclbin) that the host program loads.

If needed, the kernel linking process can be customized to improve the SDAccel environment runtime performance. This chapter introduces a few such techniques.
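
For example, a minimal two-step build for a kernel foo might look like the following sketch. The platform and file names here are placeholders for illustration, not values from this guide:

# Compilation stage: compile the kernel source into an .xo file
xocc -c -t hw --platform <platform> -k foo -o foo.xo foo.cpp

# Linking stage: integrate the .xo file with the platform to build the FPGA binary
xocc -l -t hw --platform <platform> -o foo.xclbin foo.xo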

Multiple Instances of a Kernel

By default, a single hardware instance is implemented from a kernel. If the host executes the same kernel multiple times, those executions run sequentially on the single hardware instance. However, you can customize the linking stage to create multiple hardware instances from a single kernel. This can improve execution performance because the kernel calls can then run concurrently, overlapping their execution on separate hardware instances.

Multiple instances of the kernel can be created by using the xocc --nk switch during linking.

For example, for a kernel named foo, two hardware instances can be implemented as follows:

# xocc --nk <kernel_name>:<number_of_instances>
xocc --nk foo:2 

By default, the implemented instance names are <kernel_name>_1 and <kernel_name>_2. However, you can optionally change the default instance names as shown below:

# xocc --nk <kernel_name>:<number_of_instances>:<name_1>.<name_2>…<name_N>
xocc --nk foo:3:fooA.fooB.fooC

This example implements three identical copies, or hardware instances, of the kernel foo, named fooA, fooB, and fooC, in the FPGA programmable logic.
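
On the host side, the runtime can then dispatch overlapping executions to the available hardware instances. The following minimal OpenCL host-code sketch illustrates the idea for a two-instance build of foo; it assumes the context, device, and program have already been created, and the argument setup is elided:

#include <CL/cl.h>

// Sketch: enqueue the same kernel twice on an out-of-order command queue.
// With two hardware instances (foo_1 and foo_2) created via --nk foo:2,
// the runtime is free to schedule the two executions on separate
// instances so that they overlap.
void run_twice(cl_context context, cl_device_id device, cl_program program)
{
    cl_int err;
    cl_command_queue queue = clCreateCommandQueue(
        context, device, CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE, &err);
    cl_kernel kernel = clCreateKernel(program, "foo", &err);

    // Real code would set per-call arguments here with clSetKernelArg().

    clEnqueueTask(queue, kernel, 0, NULL, NULL); // first execution
    clEnqueueTask(queue, kernel, 0, NULL, NULL); // second execution
    clFinish(queue);                             // wait for both to finish

    clReleaseKernel(kernel);
    clReleaseCommandQueue(queue);
}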

Connecting Kernel Ports to Global Memory

By default, all kernel memory interfaces are connected to the same global memory bank. As a result, only one kernel port at a time can transfer data to and from the memory bank, limiting the performance of the application.

However, all off-the-shelf SDAccel platforms contain multiple global memory banks. During the linking stage, it is possible to specify for each kernel port (or interface) which global memory bank it should be connected to.

Proper configuration of kernel to memory connectivity is important to maximize bandwidth, optimize data transfers, and improve overall performance.

Consider the following example:

void cnn( int *image,   // Read-Only Image
          int *weights, // Read-Only Weight Matrix
          int *out,     // Output Filters/Images
          ...           // Other input or output ports
          )
{
  #pragma HLS INTERFACE m_axi port=image offset=slave bundle=gmem
  #pragma HLS INTERFACE m_axi port=weights offset=slave bundle=gmem
  #pragma HLS INTERFACE m_axi port=out offset=slave bundle=gmem

The example shows two memory interface inputs for the kernel: image and weights. If both are connected to the same memory bank, a concurrent transfer of both of these inputs into the kernel is not possible.

The following steps are needed to implement separate memory bank connections for the image and weights inputs:

  1. Specify separate bundle names for these inputs, as discussed in Memory Data Inputs and Outputs. For reference, the code is shown here again.
    void cnn( int *image,   // Read-Only Image
              int *weights, // Read-Only Weight Matrix
              int *out,     // Output Filters/Images
              ...           // Other input or output ports
              )
    {
      #pragma HLS INTERFACE m_axi port=image offset=slave bundle=gmem
      #pragma HLS INTERFACE m_axi port=weights offset=slave bundle=gmem1
      #pragma HLS INTERFACE m_axi port=out offset=slave bundle=gmem
    
    IMPORTANT: When specifying a bundle= name, you should use all lowercase characters to be able to assign it to a specific memory bank using the --sp option.

    In the example above, the memory interface inputs image and weights are assigned different bundle names, gmem and gmem1 respectively.

  2. Specify the kernel port to global memory connection during the linking stage with the --sp switch:
    --sp <kernel_instance_name>.<interface_name>:<bank_name>

    Where:

    • <kernel_instance_name> is the instance name of the kernel as specified by the --nk option, described in Multiple Instances of a Kernel.
    • <interface_name> is the name of the interface bundle defined by the HLS INTERFACE pragma: the bundle= name prefixed with m_axi_ (for example, m_axi_gmem).
      Note: The port names themselves (image, weights) can also be used as the <interface_name>.
      TIP: If the port is not specified as part of a bundle, then the <interface_name> is the specified port= name, without the m_axi_ prefix.
    • <bank_name> is denoted as DDR[0], DDR[1], etc. For a platform with four DDR banks, the bank names are DDR[0], DDR[1], DDR[2], and DDR[3]. Some platforms also provide support for PLRAM and HBM memory banks.

    For the above example, considering a single instance of the cnn kernel, the --sp switch can be specified as follows:

    --sp cnn_1.m_axi_gmem:DDR[0] \
    --sp cnn_1.m_axi_gmem1:DDR[1]
    Note: Up to 15 kernel ports can be connected to a given memory bank. Therefore, if the design contains more than 15 ports in total, you cannot rely on the default mapping, and must use the --sp option to distribute the connections across different banks.
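
Putting the options together, a hypothetical link command that builds a single instance of the cnn kernel and maps its two bundles to separate banks might look like the following; the platform and file names are placeholders:

# Hypothetical complete link command; platform and file names are placeholders
xocc -l -t hw --platform <platform> \
  --sp cnn_1.m_axi_gmem:DDR[0] \
  --sp cnn_1.m_axi_gmem1:DDR[1] \
  -o cnn.xclbin cnn.xo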

Summary

This section discussed two powerful ways to customize the kernel linking process to improve system performance during execution:

  1. Consider creating multiple instances of a kernel in the FPGA fabric by specifying the xocc --nk switch if the kernel is called multiple times from the host code.
  2. Consider using the xocc --sp switch to customize the connections between kernel memory interfaces and global memory banks to achieve concurrent access.

Depending on the host and kernel design, these options can be exploited to improve the kernel acceleration on Xilinx® FPGAs.