Topological Optimization
This section covers topological optimization. It looks at the attributes that govern the coarse layout and implementation of multiple compute units, and at their impact on performance.
Multiple Compute Units
Depending on available resources on the target device, multiple compute units of the same kernel (or different kernels) can be created to run in parallel, which improves the system processing time and throughput.
Different kernels are provided as separate .xo files on the xocc link line. Multiple compute units of a kernel can be added by using the --nk option:
xocc -l --nk <kernel_name:number(:compute_unit_name1.compute_unit_name2...)>
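To make the placeholder syntax concrete, the following C++ sketch assembles an --nk argument from its parts. The helper name make_nk_option and the compute unit names used with it are hypothetical and not part of any Xilinx tool; the function only formats the string described above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of xocc): formats the --nk option as
// --nk <kernel_name>:<number>[:<cu_name1>.<cu_name2>...].
std::string make_nk_option(const std::string& kernel_name,
                           unsigned count,
                           const std::vector<std::string>& cu_names = {}) {
    // Assumption of this sketch: at least one compute unit is requested.
    assert(count >= 1);

    std::string opt = "--nk " + kernel_name + ":" + std::to_string(count);
    for (std::size_t i = 0; i < cu_names.size(); ++i) {
        // The name list is introduced by ':' and its entries are separated by '.'.
        opt += (i == 0 ? ":" : ".");
        opt += cu_names[i];
    }
    return opt;
}
```

For example, make_nk_option("apply_watermark", 2, {"wm_a", "wm_b"}) produces the text --nk apply_watermark:2:wm_a.wm_b, which would then be appended to the xocc -l link line.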
Using Multiple DDR Banks
Acceleration cards supported in the SDAccel™ environment provide one, two, or four DDR banks, and up to 80 GB/s of raw DDR bandwidth. For kernels that move large amounts of data between the FPGA and DDR, Xilinx® recommends that you direct the SDAccel compiler and runtime library to use multiple DDR banks.
In addition to DDR banks, the host application can access PLRAM to transfer data directly to a kernel. This feature is enabled using the xocc --sp option with compatible platforms.
To take advantage of multiple DDR banks, you need to assign CL memory buffers to different banks in the host code, and configure the xclbin file to match the bank assignment on the xocc command line.
The following block diagram shows the Global Memory Two Banks example in the "kernel_to_gmem" category of the SDAccel Getting Started Examples on GitHub, which connects the input pointer to DDR bank 0 and the output pointer to DDR bank 1.
Assigning DDR Bank in Host Code
Bank assignment in host code is supported by a Xilinx vendor extension. The following code snippet shows the required header file, and assigns the input and output buffers to DDR bank 0 and bank 1, respectively:
#include <CL/cl_ext.h>
#include <cstdlib>   // EXIT_FAILURE
#include <iostream>  // std::cout
…
int main(int argc, char** argv)
{
    …
    cl_mem_ext_ptr_t inExt, outExt;      // Declaring two extensions for both buffers
    inExt.flags = 0 | XCL_MEM_TOPOLOGY;  // Specify Bank0 memory for the input buffer
    outExt.flags = 1 | XCL_MEM_TOPOLOGY; // Specify Bank1 memory for the output buffer
    inExt.obj = 0; outExt.obj = 0;       // Setting obj and param to zero
    inExt.param = 0; outExt.param = 0;
    int err;
    // Allocate buffer in Bank0 of global memory for the input image using the Xilinx extension
    cl_mem buffer_inImage = clCreateBuffer(world.context, CL_MEM_READ_ONLY | CL_MEM_EXT_PTR_XILINX,
                                           image_size_bytes, &inExt, &err);
    if (err != CL_SUCCESS) {
        std::cout << "Error: Failed to allocate device Memory" << std::endl;
        return EXIT_FAILURE;
    }
    // Allocate buffer in Bank1 of global memory for the output image using the Xilinx extension
    // (&err is passed here so that the check below tests this call, not the previous one)
    cl_mem buffer_outImage = clCreateBuffer(world.context, CL_MEM_WRITE_ONLY | CL_MEM_EXT_PTR_XILINX,
                                            image_size_bytes, &outExt, &err);
    if (err != CL_SUCCESS) {
        std::cout << "Error: Failed to allocate device Memory" << std::endl;
        return EXIT_FAILURE;
    }
    …
}
cl_mem_ext_ptr_t is a struct as defined below:
typedef struct{
    unsigned flags;
    void *obj;
    void *param;
} cl_mem_ext_ptr_t;
- Valid values for flags are:
  - XCL_MEM_DDR_BANK0
  - XCL_MEM_DDR_BANK1
  - XCL_MEM_DDR_BANK2
  - XCL_MEM_DDR_BANK3
  - <id> | XCL_MEM_TOPOLOGY
  Note: The <id> is determined by looking at the Memory Configuration section in the xxx.xclbin.info file generated next to the xxx.xclbin file. In the xxx.xclbin.info file, the global memory (DDR, PLRAM, etc.) is listed with an index representing the <id>.
- obj is the pointer to the associated host memory allocated for the CL memory buffer, and is used only if the CL_MEM_USE_HOST_PTR flag is passed to the clCreateBuffer API; otherwise set it to NULL.
- param is reserved for future use. Always assign it to 0 or NULL.
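The flag encoding described above can be sketched as a small helper. The snippet is deliberately self-contained so that it compiles without the Xilinx headers: ext_ptr_sketch_t mirrors the struct shown above, and kMemTopologySketch stands in for XCL_MEM_TOPOLOGY with an assumed value of bit 31. Both names, and the helper make_bank_ext, are hypothetical. In real host code, include CL/cl_ext.h, use the vendor definitions, and take the <id> from the xclbin.info file.

```cpp
#include <cassert>

// Stand-ins so the sketch builds without <CL/cl_ext.h>.
typedef struct {
    unsigned flags;
    void *obj;
    void *param;
} ext_ptr_sketch_t;                            // mirrors cl_mem_ext_ptr_t

// Assumed value of XCL_MEM_TOPOLOGY; check the vendor header before relying on it.
const unsigned kMemTopologySketch = 1u << 31;

// Builds an extension struct that selects a global memory by its index (<id>).
// host_ptr should be non-null only when CL_MEM_USE_HOST_PTR is also used.
ext_ptr_sketch_t make_bank_ext(unsigned memory_id, void *host_ptr = nullptr) {
    // The index must not overlap the topology flag bit.
    assert((memory_id & kMemTopologySketch) == 0);

    ext_ptr_sketch_t ext;
    ext.flags = memory_id | kMemTopologySketch;  // <id> | XCL_MEM_TOPOLOGY
    ext.obj = host_ptr;
    ext.param = nullptr;                         // reserved, always null
    return ext;
}
```

With these stand-ins, make_bank_ext(0) and make_bank_ext(1) correspond to the inExt and outExt settings in the host code above.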
Assigning Global Memory for Kernel Code
Creating Multiple AXI Interfaces
OpenCL™ kernels, C/C++ kernels, and RTL kernels have different methods for assigning function parameters to AXI interfaces.
- For OpenCL kernels, the --max_memory_ports option is required to generate one AXI4 interface for each global pointer on the kernel argument. The AXI4 interface name is based on the order of the global pointers on the argument list.
  The following code is taken from the example gmem_2banks_ocl in the kernel_to_gmem category from the SDAccel Getting Started Examples on GitHub:
  __kernel __attribute__ ((reqd_work_group_size(1, 1, 1)))
  void apply_watermark(__global const TYPE * __restrict input,
                       __global TYPE * __restrict output,
                       int width, int height)
  { ... }
  In this example, the first global pointer input is assigned the AXI4 name M_AXI_GMEM0, and the second global pointer output is assigned the name M_AXI_GMEM1.
- For C/C++ kernels, multiple AXI4 interfaces are generated by specifying different "bundle" names in the HLS INTERFACE pragma for different global pointers. Refer to the SDAccel Environment Programmers Guide for more information.
  The following is a code snippet from the gmem_2banks_c example that assigns the input pointer to the bundle gmem0 and the output pointer to the bundle gmem1. The bundle name can be any valid C string, and the AXI4 interface name generated will be M_AXI_<bundle_name>. For this example, the input pointer will have the AXI4 interface name M_AXI_gmem0, and the output pointer will have M_AXI_gmem1.
  #pragma HLS INTERFACE m_axi port=input offset=slave bundle=gmem0
  #pragma HLS INTERFACE m_axi port=output offset=slave bundle=gmem1
- For RTL kernels, the port names are generated during the import process by the RTL kernel wizard. The default names proposed by the RTL kernel wizard are m00_axi and m01_axi. If not changed, these names have to be used when assigning a DDR bank through the --sp option.
Assigning AXI Interfaces to DDR Banks
Assign each AXI4 interface to a DDR bank with the --sp option, and specify in which SLR the kernel is placed. Refer to the XOCC command in the SDx Command and Utility Reference Guide for details of the --sp command option, and the SDAccel Environment User Guide for details on SLR placement.
AXI4 interfaces are connected to DDR banks using the --sp option. The --sp option value is in the format <kernel_instance_name>.<interface_name>:<DDR_bank_name>.
Valid DDR bank names for the --sp option are:
- DDR[0]
- DDR[1]
- DDR[2]
- DDR[3]
The following command line example connects the input pointer (M_AXI_GMEM0) to DDR bank 0 and the output pointer (M_AXI_GMEM1) to DDR bank 1:
xocc --max_memory_ports apply_watermark \
     --sp apply_watermark_1.m_axi_gmem0:DDR[0] \
     --sp apply_watermark_1.m_axi_gmem1:DDR[1]
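When many interfaces have to be mapped, it can help to generate the --sp arguments from a build script instead of typing them by hand. The following C++ sketch shows the string format only; the helper name make_sp_option is hypothetical and is not part of xocc.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of xocc): formats one --sp option as
// --sp <kernel_instance_name>.<interface_name>:DDR[<bank>].
std::string make_sp_option(const std::string& kernel_instance,
                           const std::string& interface_name,
                           unsigned ddr_bank) {
    // Only DDR[0] through DDR[3] are listed as valid bank names above.
    assert(ddr_bank <= 3);

    return "--sp " + kernel_instance + "." + interface_name +
           ":DDR[" + std::to_string(ddr_bank) + "]";
}
```

Calling make_sp_option("apply_watermark_1", "m_axi_gmem0", 0) reproduces the first --sp argument of the command line above.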
You can use the Device Hardware Transaction view to observe the actual DDR Bank communication, and to analyze DDR usage.
Assigning AXI Interfaces to PLRAM
Some platforms support PLRAMs. In these cases, use the same --sp option as described in Assigning AXI Interfaces to DDR Banks, but use the name PLRAM[id]. Valid names supported by specific platforms can be found in the Memory Configuration section of the xclbin.info file generated alongside the xclbin file.
Assigning Kernels to SLR regions
Assigning ports to DDR banks requires that the kernel be physically routed on the FPGA to connect to the assigned DDR bank. Currently, large FPGAs use stacked silicon devices with several Super Logic Regions (SLRs). By default, the SDAccel environment places the compute units in the same SLR as the shell. This might not always be desirable, especially when the DDR banks in use are in another SLR. As a result, Xilinx recommends using the --slr option to map kernels close to the DDR memory they use. For example, the apply_watermark_1 kernel above can be mapped to SLR 1 by applying the following link option:
xocc -l --slr apply_watermark_1:SLR1
To better understand the platform attributes, such as the number of DDRs and SLR regions, use platforminfo. For more information, refer to the SDx Command and Utility Reference Guide (UG1279).