Building the System

Building the system requires building both the hardware (kernels) and the software (host code) sides of the system. The Project Editor view, shown below, gives a top-level view of the build configuration. It provides general information about the active build configuration, including the project name, current platform, and selected system configuration (OS and runtime). It also displays several build options, including the selected build target and options for enabling host and kernel debugging. For more details on build targets, see Build Targets; for details on using the debug options, see Debugging Applications and Kernels.

Figure: Project Editor View

The bottom portion of the Editor view lists the kernels currently used in the project. The kernels are listed under their binary container. In the above example, the kernel krnl_vadd has been added to binary_container_1. To add a binary container, left-click the icon. You can rename the binary container by clicking the default name and entering a new name.

To add a kernel to the binary container, left-click the icon in the Hardware Functions window. This displays a list of kernels defined in the project. Select the kernel from the Add Hardware Functions dialog box, as shown in the following figure.

Figure: Adding Hardware Functions to a Binary Container

In the Compute Units column, next to the kernel, enter a value to instantiate multiple instances of the kernel (called compute units) as described in Creating Multiple Instances of a Kernel.

With the various options of the active build configuration specified, you can start the build process by clicking on the Build () command.

The SDAccel™ build process generates the host application executable (.exe) and the FPGA binary (.xclbin). The SDAccel environment manages two separate independent build flows:

  • Host code (software) build
  • Kernel code (hardware) build

SDAccel uses a standard compilation and linking process for both these software and hardware elements of the project. The steps to build both the host and kernel code to generate the selected build target are described in the following sections.

Building the Host Application

The host code (written in C/C++ using OpenCL™ APIs) is compiled and linked by the Xilinx® C++ (xcpp) compiler, which generates a host executable (.exe) that runs on the host CPU.

TIP: xcpp is based on GCC, and therefore supports many standard GCC options which are not documented here. For information refer to the GCC Option Index.

Compiling the Host Application

Each host application source file is compiled using the -c option, which generates an object file (.o).

xcpp ... -c <file_name1> ... <file_nameN>
The name of the output object file can optionally be specified with the -o option.
xcpp ... -o <output_file_name>
You can produce debugging information using the -g option.
xcpp ... -g
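Putting these options together, the compile step for a hypothetical two-file host application might look like the following sketch (the file names host.cpp and helpers.cpp are illustrative, not taken from the text above; add any include paths your installation requires):

```shell
# Sketch only: file names are hypothetical examples.
xcpp -c -g host.cpp -o host.o        # compile the main host source with debug info
xcpp -c -g helpers.cpp -o helpers.o  # compile an additional source file
```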

Linking the Host Application

The generated object files (.o) are linked with the Xilinx SDAccel runtime shared library to create the executable (.exe). The object files are passed to the linker, and the runtime library is specified using the -l option.
xcpp <object_file1.o> ... <object_fileN.o> -l<runtime_library>
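As a hedged sketch, the link step for the hypothetical two-file application above might look as follows (the object file names are illustrative, and <runtime_library> is a placeholder for the runtime library name used by your installation):

```shell
# Sketch only: file names and the library name are placeholders.
xcpp -g host.o helpers.o -o host.exe -l<runtime_library>
```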

In the GUI flow, the host code and the kernel code are compiled and linked by clicking on the Build () command.

Building the FPGA Binary

The kernel code is written in C, C++, OpenCL C, or RTL, and is built by the xocc compiler, a command-line utility modeled after GCC. The final output of xocc is the FPGA binary (.xclbin), which links the kernel .xo files and the hardware platform (.dsa). Generating the .xclbin is a two-step build process requiring kernel compilation and linking.

xocc can be used standalone (or, ideally, in scripts or a build system such as make), and is also fully supported by the SDx™ IDE. See the SDAccel Environment Getting Started Tutorial (UG1021) for more information.

Compiling the Kernels

During compilation, xocc compiles kernel accelerator functions (written in C/C++ or OpenCL C) into Xilinx object (.xo) files. Each kernel is compiled into a separate .xo file. This is the -c/--compile mode of xocc.

Kernels written in RTL are compiled using the package_xo command line utility. This utility, similar to xocc -c, also generates .xo files which are subsequently used in the linking stage. See RTL Kernels for more information.
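For illustration, a single C/C++ or OpenCL C kernel might be compiled to a .xo file with a command along these lines (a sketch only: the <platform> value and the file names are placeholders, not values from this document):

```shell
# Sketch only: <platform> and file names are placeholders.
xocc -c -t sw_emu --platform <platform> -k krnl_vadd \
     krnl_vadd.cl -o binary_container_1/krnl_vadd.xo
```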

Build Target

The compilation is dependent on the selected build target, which is discussed in greater detail in Build Targets. You can specify the build target using the xocc --target option as shown below.

xocc --target sw_emu|hw_emu|hw ...
  • For software emulation (sw_emu), the kernel source code is used during emulation.
  • For hardware emulation (hw_emu), the synthesized RTL code is used for simulation in the hardware emulation flow.
  • For system build (hw), xocc generates the FPGA binary and the system can be run on hardware.

Linking the Kernels

As discussed above, the kernel compilation process results in a Xilinx object file (.xo) whether the kernel is described in OpenCL C, C, C++, or RTL. During the linking stage, .xo files from different kernels are linked with the shell to create the FPGA binary container file (.xclbin) which is needed by the host code.

The xocc command to link files is:
$ xocc ... -l
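A fuller sketch of the link step, combining one or more compiled .xo files into the FPGA binary, might look as follows (the <platform> value and file names are placeholders, not values from this document):

```shell
# Sketch only: <platform> and file names are placeholders.
xocc -l -t hw --platform <platform> \
     binary_container_1/krnl_vadd.xo \
     -o binary_container_1.xclbin
```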

Creating Multiple Instances of a Kernel

During the linking stage, you can specify the number of instances of a kernel, referred to as a compute unit, through the --nk xocc switch. This allows the same kernel function to run in parallel at application runtime to improve the performance of the host application, using different device resources on the FPGA.

Note: For additional information on the --nk options, see SDAccel Environment Programmers Guide (UG1277) and SDx Command and Utility Reference Guide (UG1279).
In the command-line flow, the xocc --nk option specifies the number of instances of a given kernel to instantiate into the .xclbin file. The syntax of the command is as follows:
$ xocc --nk <kernel name>:<no of instances>:<name1>.<name2>…<nameN>
For example, the kernel foo is instantiated three times with compute unit names fooA, fooB, and fooC:
$ xocc --nk foo:3:fooA.fooB.fooC
TIP: While the kernel instance name is optional, it is highly recommended to specify one as it is required for options like --sp.

In the GUI flow, the number of compute units can be specified by right-clicking the top-level kernel within the Assistant view, and selecting Settings.

From within the Project Settings dialog box, select the desired kernel to instantiate and update the Compute units value. In the following figure, the kernel, krnl_vadd, will be instantiated three times (that is, three CUs).

Figure: Instantiate Multiple Compute Units

In the figure above, three compute units of the krnl_vadd kernel will be linked into the FPGA binary (.xclbin), addressable as krnl_vadd_1, krnl_vadd_2, and krnl_vadd_3.

Mapping Kernel Interfaces to Memory Resources

During the link phase, the memory ports of the kernels are connected to memory resources, which include PLRAM and DDR. By default, all kernel memory ports are connected to the same DDR bank. As a result, only one memory interface can transfer data to and from the DDR bank at a time, limiting overall performance. If the FPGA contains only one global memory bank, this is the only option. However, if the device contains multiple banks, you can customize the memory bank connections. For additional information, see the SDAccel Environment Programmers Guide (UG1277) and SDx Command and Utility Reference Guide (UG1279).

Global memory is the DDR memory accessible by a platform. SDAccel platforms can have access to multiple global memory banks. In applications with multiple kernel instances running concurrently, this can result in significant performance gains. Even if there is only one compute unit in the device, by mapping its input and output ports to different banks you can improve overall performance by enabling simultaneous accesses to input and output data.

Specifying the desired kernel port to memory bank mapping requires taking the following steps:

  1. In the host application, allocate buffers using a vendor extension pointer.
  2. During xocc linking, use the --sp option to map the kernel interface to the desired memory bank.

Details of coding the host application can be found in the SDAccel Environment Programmers Guide, in "Memory Data Transfer to/from the FPGA Device." In short, you must create buffers using a cl_mem_ext_ptr_t vendor extension pointer. The vendor extension pointer is used to indicate which kernel argument this buffer maps to. The runtime uses this information in conjunction with data in the FPGA binary to determine in which memory bank the buffer should be allocated.

During xocc linking, the xocc --sp option specifies the assignment of kernel ports to available memory resources, overriding the default assignments.

The directive to assign a compute unit's memory interface to a memory resource is:

--sp <COMPUTE_UNIT>.<MEM_INTERFACE>:<MEMORY>

Where

  • COMPUTE_UNIT is the name of the compute unit (CU)
  • MEM_INTERFACE is the name of one of the compute unit's memory interfaces or function arguments
  • MEMORY is the memory resource

It is necessary to have a separate directive for each memory interface connection.

TIP: To obtain kernel information, including kernel, port, and argument names, use the kernelinfo command line utility if you have the .xo file, or platforminfo if you have the .xclbin file. For more information, see Useful Command Line Utilities.

For example, xocc … --sp vadd_1.m_axi_gmem:DDR[3] assigns the memory interface m_axi_gmem of the CU named vadd_1 to DDR[3] memory.

Note: Starting with release 2018.3, memory resources are specified using vector formatting, with the resource index enclosed in square brackets (that is, [..]). For example, the DDR memory resources of a device with four (4) DDR banks are specified as DDR[0], DDR[1], DDR[2], and DDR[3]. PLRAM is specified in a similar fashion. While release 2018.3 supports the legacy sptag names (that is, bank<n>) for platforms available in 2018.2.xdf and any associated updates in 2018.3, this support will be deprecated in subsequent releases. All new platforms in 2018.3 require the vector syntax format and do not support legacy sptag names.
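As a combined sketch, the following hypothetical link command creates two compute units of a kernel vadd and maps each CU's m_axi_gmem interface to a different DDR bank, with a separate --sp directive for each interface connection (the <platform> value and file names are placeholders, not values from this document):

```shell
# Sketch only: <platform> and file names are placeholders.
xocc -l -t hw --platform <platform> \
     --nk vadd:2:vadd_1.vadd_2 \
     --sp vadd_1.m_axi_gmem:DDR[0] \
     --sp vadd_2.m_axi_gmem:DDR[1] \
     vadd.xo -o vadd.xclbin
```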

The --sp switch can be added through the SDx GUI similar to the process outlined in Creating Multiple Instances of a Kernel. Right-click the top-level kernel in the Assistant view, and select Settings. From within the Project Settings dialog box, enter the --sp option in the XOCC Linker Options field.

To add directives to the xocc compilation through the GUI, from within the Assistant, right-click the desired kernel under System and select Settings.

This displays the hardware function settings dialog box, where you can change the memory interface mapping under the Compute Unit Settings area. To change the memory resource mapping of a CU for a particular argument, click the Memory setting of that argument and select the desired memory resource. The following figure shows the argument a being selected.

Figure: Compute Unit Memory Setting

To select the identical memory resource for all CU arguments, click the memory resource for the CU (that is, krnl_vadd_1 in the example above) and select the desired memory resource.

IMPORTANT: When using the --sp option to assign kernel interfaces to memory banks, you must specify the --sp option for all interfaces of the kernel. Refer to "Customization of DDR Bank to Kernel Connection" in the SDAccel Environment Programmers Guide for more information.

Allocating Compute Units to SLRs

A Compute Unit (CU) is allocated to a super logic region (SLR) during xocc linking using the --slr directive. The syntax of the command line directive is:

--slr <COMPUTE_UNIT>:<SLR_NUM>

where COMPUTE_UNIT is the name of the CU and SLR_NUM is the SLR number to which the CU is assigned.

For example, xocc … --slr vadd_1:SLR2 assigns the CU named vadd_1 to SLR2.

The --slr directive must be applied separately for each CU in the design. For instance, in the following example, three invocations of the --slr directive are used to assign all three CUs to SLRs; krnl_vadd_1 and krnl_vadd_2 are assigned to SLR1 while krnl_vadd_3 is assigned to SLR2.

--slr krnl_vadd_1:SLR1 --slr krnl_vadd_2:SLR1 --slr krnl_vadd_3:SLR2
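In the context of a full link command, those directives might be applied as in this sketch (the <platform> value and file names are placeholders, not values from this document):

```shell
# Sketch only: <platform> and file names are placeholders.
xocc -l -t hw --platform <platform> \
     --nk krnl_vadd:3:krnl_vadd_1.krnl_vadd_2.krnl_vadd_3 \
     --slr krnl_vadd_1:SLR1 --slr krnl_vadd_2:SLR1 --slr krnl_vadd_3:SLR2 \
     krnl_vadd.xo -o binary_container_1.xclbin
```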

In the absence of an --slr directive for a CU, the tools are free to place the CU in any SLR.

To allocate a CU to an SLR in the GUI flow, right-click the desired kernel under System or Emulation-HW configurations and select Settings as shown in the following figure.

Figure: xocc Link Settings

This displays the hardware function settings dialog box. Under the Compute Unit Settings area, you can change the SLR where the CU is allocated by clicking the SLR setting of the respective CU and selecting the desired SLR from the menu, as shown. Selecting Auto allows the tools the freedom to place the CU in any SLR.

Figure: Compute Unit SLR Setting

Controlling Implementation Results

When compiling or linking, you can exercise fine-grained control over the hardware generated by SDAccel for hardware emulation and system builds using the --xp switch.

The --xp switch is paired with parameters to configure the Vivado® Design Suite. For instance, the --xp switch can configure the optimization, placement, and timing results of the hardware implementation.

The --xp can also be used to set up emulation and compile options. Specific examples of these parameters include setting the clock margin, specifying the depth of FIFOs used in the kernel dataflow region, and specifying the number of outstanding writes and reads to buffer on the kernel AXI interface. A full list of parameters and valid values can be found in the SDx Command and Utility Reference Guide.

TIP: Familiarity with the Vivado Design Suite User Guide: High-Level Synthesis (UG902) and the tool suite is necessary to make the most use of these parameters. See the Vivado Design Suite User Guide: Implementation (UG904) for more information.
In the command line flow, parameters are specified as param:<param_name>=<value>, where:
  • param: Required keyword.
  • param_name: Name of a parameter to apply.
  • value: Appropriate value for the parameter.
IMPORTANT: The xocc linker does not check the validity of the parameter or value. Be careful to apply valid values or the downstream tools might not work properly.

You must repeat the --xp switch for each param used in the xocc command. For example:

$ xocc --xp param:compiler.enableDSAIntegrityCheck=true \
     --xp param:prop:kernel.foo.kernel_flags="-std=c++0x"

You can specify param values in an xocc.ini file with each option specified on a separate line (without the --xp switch).

An xocc.ini file is an initialization file that contains --xp settings. Place the file in the same directory as the build configuration.
param:compiler.enableDSAIntegrityCheck=true
param:prop:kernel.foo.kernel_flags="-std=c++0x"

In the GUI flow, if no xocc.ini file is present, the application uses the GUI build settings. In a Makefile flow, if no xocc.ini file is present, the configurations within the Makefile are used.

The --xp switch can be added through the SDx GUI similar to that outlined in Creating Multiple Instances of a Kernel. Right-click the top-level kernel in the Assistant view, and select Settings. From within the Project Settings dialog box, enter the --xp option in the XOCC Linker Options field.

You can also add xocc compiler options and --xp parameters to kernels by right-clicking the kernel in the Assistant view. The following image demonstrates the --xp setting for the krnl_vadd kernel.

Figure: Assistant XOCC Compile Settings

Controlling Report Generation

The xocc -R switch controls the level of report generation during the link stage for hardware emulation and system targets. Builds that generate fewer reports will typically run more quickly.

The command line option is as follows:

$ xocc -R <report_level>

Where <report_level> is one of the following report_level options:

  • -R0: Minimal reports and no intermediate design checkpoints (DCP)
  • -R1: Includes R0 reports plus:
    • Identifies design characteristics to review for each kernel (report_failfast)
    • Identifies design characteristics to review for full design post-opt (report_failfast)
    • Saves post-opt DCP
  • -R2: Includes R1 reports plus:
    • The Vivado default reporting including DCP after each implementation step
    • Design characteristics to review for each SLR after placement (report_failfast)
TIP: report_failfast is a utility that highlights potential device utilization challenges, clock constraint problems, and potentially unreachable target frequencies.

The -R switch can also be added through the SDx GUI as described in Creating Multiple Instances of a Kernel:

  • Right-click the top-level kernel in the Assistant view and select Settings.
  • From within the Project Settings dialog box, enter the -R option in the XOCC Linker Options field.

Build Targets

The SDAccel build target defines the nature of the FPGA binary generated by the build process. There are three different build targets: two emulation targets (software and hardware emulation) used for debug and validation purposes, and the default hardware target used to generate the actual FPGA binary.

Software Emulation

The main goal of software emulation is to ensure functional correctness and to partition the application into kernels. For software emulation, both the host code and the kernel code are compiled to run on the host x86 processor. The programmer model of iterative algorithm refinement through fast compile and run loops is preserved. Software emulation has compile and execution times comparable to those of a native CPU application. Refer to the SDAccel Environment Debugging Guide for more information on running software emulation.

In the context of the SDAccel development environment, software emulation on a CPU is the same as the iterative development process that is typical of CPU/GPU programming. In this type of development style, a programmer continuously compiles and runs an application as it is being developed.

For RTL kernels, software emulation can be supported if a C model is associated with the kernel. The RTL kernel wizard packaging step provides an option to associate C model files with the RTL kernel for support of software emulation flows.

Hardware Emulation

While the software emulation flow is a good measure of functional correctness, it does not guarantee correctness on the FPGA execution target. The hardware emulation flow enables the programmer to check the correctness of the logic generated for the custom compute units before deployment on hardware, where a compute unit is an instantiation of a kernel.

The SDAccel environment generates at least one custom compute unit for each kernel in an application. Each kernel is compiled to a hardware model (RTL). During emulation kernels are executed with a hardware simulator, but the rest of the system still uses a C simulator. This allows the SDAccel environment to test the functionality of the logic that will be executed on the FPGA compute fabric.

In addition, hardware emulation provides performance and resource estimation, allowing the programmer to get an insight into the design.

In hardware emulation, compile and execution times are longer than in software emulation; thus, Xilinx recommends that you use small data sets for debug and validation.

IMPORTANT: The DDR memory model and the memory interface generator (MIG) model used in hardware emulation are high-level simulation models. These models are good for simulation performance; however, they approximate latency values and are not cycle-accurate like the kernels. Consequently, any performance numbers shown in the profile summary report are approximate, and should be used only as general guidance and for comparing relative performance between different kernel implementations.

System

When the build target is system, xocc generates the FPGA binary for the device by running synthesis and implementation on the design. The binary includes custom logic for every compute unit in the binary container. Therefore, it is normal for this build step to run for a longer period of time than the other steps in the SDAccel build flow. However, because the kernels will be running on actual hardware, their execution times will be extremely fast.

The generation of custom compute units uses the Vivado High-Level Synthesis (HLS) tool, which is the compute unit generator in the application compilation flow. Automatic optimization of a compute unit for maximum performance is not possible for all coding styles without additional user input to the compiler. The SDAccel Environment Profiling and Optimization Guide discusses the additional user input that can be provided to the SDAccel environment to optimize the implementation of kernel operations into a custom compute unit.

After all compute units have been generated, these units are connected to the infrastructure elements provided by the target device in the solution. The infrastructure elements in a device are all of the memory, control, and I/O data planes which the device developer has defined to support an OpenCL application. The SDAccel environment combines the custom compute units and the base device infrastructure to generate an FPGA binary which is used to program the Xilinx device during application execution.

IMPORTANT: The SDAccel environment always generates a valid FPGA hardware design and performs default connections from the kernel to global memory. Xilinx recommends explicitly defining optimal connections. See Kernel SLR and DDR Memory Assignments for details.

Specifying a Target

You can specify the target build from the command-line with the following command:

xocc --target sw_emu|hw_emu|hw ...

Similarly, from within the GUI, the build target can be specified by selecting the Active build configuration pull-down tab in the Project Editor window. This provides three choices (see the following figure):

  • Emulation-SW
  • Emulation-HW
  • System

Figure: Active Build Configuration

TIP: You can also assign the compilation target from the Build () command, or from the Project > Build Configurations > Set Active menu command.

After setting the active build configuration, build the system from the Project > Build Project menu command.

The recommended build flow is detailed in Debugging Flows.