Building the System
Building the system requires building both the hardware (kernels) and the software (host code) sides of the system. The Project Editor view, shown below, gives a top-level view of the build configuration. It provides general information about the active build configuration, including the project name, current platform, and selected system configuration (OS and runtime). It also displays several build options, including the selected build target and options for enabling host and kernel debugging. For more details on build targets, see Build Targets. Debugging Applications and Kernels gives details on using the debug options.
The bottom portion of the Editor view lists the current kernels used in the project. The kernels are listed under the binary container. In the above example, the kernel krnl_vadd has been added to binary_container_1. To add a binary container, left-click the icon. You can rename the binary container by clicking the default name and entering a new name.
To add a kernel to the binary container, left-click the icon located in the Hardware Functions window. It displays a list of kernels defined in the project. Select the kernel from the Add Hardware Functions dialog box as shown in the following figure.
In the Compute Units column, next to the kernel, enter a value to instantiate multiple instances of the kernel (called compute units) as described in Creating Multiple Instances of a Kernel.
With the various options of the active build configuration specified, you can start the build process by clicking the Build command.
The SDAccel™ build process generates the host application executable (.exe) and the FPGA binary (.xclbin). The SDAccel environment manages two separate, independent build flows:
- Host code (software) build
- Kernel code (hardware) build
SDAccel uses a standard compilation and linking process for both the software and hardware elements of the project. The steps to build both the host and kernel code to generate the selected build target are described in the following sections.
Building the Host Application
The host code (written in C/C++ using OpenCL™ APIs) is compiled and linked by the Xilinx® C++ compiler (xcpp) to generate a host executable (.exe file) which executes on the host CPU. xcpp is based on GCC, and therefore supports many standard GCC options which are not documented here. For more information, refer to the GCC Option Index.
Compiling the Host Application
Each host application source file is compiled using the -c option, which generates an object file (.o).
xcpp ... -c <file_name1> ... <file_nameN>
The name of the output file can be specified using the -o option.
xcpp ... -o <output_file_name>
You can produce debugging information using the -g option.
xcpp ... -g
Linking the Host Application
The generated object files (.o) are linked to create the host executable using the -l option.
xcpp ... -l <object_file1.o> ... <object_fileN.o>
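For illustration, a minimal host build might look like the following sketch (it assumes a single source file; host.cpp, host.o, and host.exe are placeholder names, and xcpp accepts standard GCC usage):
xcpp -c -g host.cpp -o host.o
xcpp host.o -o host.exe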
In the GUI flow, the host code and the kernel code are compiled and linked by clicking the Build command.
Building the FPGA Binary
The kernel code is written in C, C++, OpenCL C, or RTL, and is built by the xocc compiler, a command line utility modeled after GCC. The final output of xocc is the FPGA binary (.xclbin), which links the kernel .xo files and the hardware platform (.dsa). Generation of the .xclbin is a two-step build process requiring kernel compilation and linking. xocc can be used standalone (or ideally in scripts or a build system like make), and is also fully supported by the SDx™ IDE. See the SDAccel Environment Getting Started Tutorial (UG1021) for more information.
Compiling the Kernels
During compilation, xocc compiles kernel accelerator functions (written in C/C++ or OpenCL) into Xilinx object (.xo) files. Each kernel is compiled into a separate .xo file. This is the -c/--compile mode of xocc.
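For example, the following command is a sketch of compiling a single kernel into an .xo file; the kernel name, source file, platform, and output name are placeholders:
$ xocc --platform <platform> --target hw --compile -k krnl_vadd -o krnl_vadd.xo vadd.cl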
Kernels written in RTL are compiled using the package_xo command line utility. This utility, similar to xocc -c, also generates .xo files which are subsequently used in the linking stage. See RTL Kernels for more information.
Build Target
The compilation is dependent on the selected build target, which is discussed in greater detail in Build Targets. You can specify the build target using the xocc --target option as shown below.
xocc --target sw_emu|hw_emu|hw ...
- For software emulation (sw_emu), the kernel source code is used during emulation.
- For hardware emulation (hw_emu), the synthesized RTL code is used for simulation in the hardware emulation flow.
- For system build (hw), xocc generates the FPGA binary, and the system can be run on hardware.
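To run an emulation build, the compiled host program is pointed at the emulated device. The following is a hedged sketch (it assumes the emconfigutil utility and the XCL_EMULATION_MODE environment variable provided with the SDAccel tools; file names are placeholders):
$ emconfigutil --platform <platform>
$ XCL_EMULATION_MODE=sw_emu ./host.exe binary_container_1.xclbin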
Linking the Kernels
As discussed above, the kernel compilation process results in a Xilinx object file (.xo) whether the kernel is described in OpenCL C, C, C++, or RTL. During the linking stage, .xo files from different kernels are linked with the shell to create the FPGA binary container file (.xclbin) which is needed by the host code.
The xocc command to link files is:
$ xocc ... -l
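For example, the following sketch links a single kernel object file against the target platform to produce the binary container (all names are placeholders):
$ xocc --platform <platform> --target hw --link -o binary_container_1.xclbin krnl_vadd.xo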
Creating Multiple Instances of a Kernel
During the linking stage, you can specify the number of instances of a kernel, referred to as compute units, through the xocc --nk switch. This allows the same kernel function to run in parallel at application runtime to improve the performance of the host application, using different device resources on the FPGA.
For more information on the --nk option, see the SDAccel Environment Programmers Guide (UG1277) and the SDx Command and Utility Reference Guide (UG1279).
The xocc --nk option specifies the number of instances of a given kernel to instantiate into the .xclbin file. The syntax of the command is as follows:
$ xocc --nk <kernel name>:<no of instances>:<name1>.<name2>…<nameN>
In the following example, the kernel foo is instantiated three times, with compute unit names fooA, fooB, and fooC:
$ xocc --nk foo:3:fooA.fooB.fooC
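With multiple compute units in the FPGA binary, the host application can exploit them by enqueuing the kernel several times. The following host-code sketch is illustrative only (variable names are placeholders and error checking is omitted); an out-of-order command queue leaves the runtime free to dispatch each enqueued task to a different compute unit:
// Create an out-of-order queue so independent invocations can overlap.
cl_command_queue q = clCreateCommandQueue(context, device,
    CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE, &err);
for (int i = 0; i < 3; ++i) {
    // Set per-invocation arguments, then enqueue one task;
    // the runtime can map each task to a different compute unit.
    clSetKernelArg(kernel, 0, sizeof(cl_mem), &bufs[i]);
    clEnqueueTask(q, kernel, 0, NULL, NULL);
}
clFinish(q); // wait for all invocations to complete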
Each compute unit's memory interfaces can then be mapped to specific memory resources using the --sp option, described in Mapping Kernel Interfaces to Memory Resources below.
In the GUI flow, the number of compute units can be specified by right-clicking the top-level kernel within the Assistant view and selecting Settings. From within the Project Settings dialog box, select the desired kernel to instantiate and update the Compute units value. In the following figure, the kernel krnl_vadd will be instantiated three times (that is, three CUs).
In the figure above, three compute units of the krnl_vadd kernel will be linked into the FPGA binary (.xclbin), addressable as krnl_vadd_1, krnl_vadd_2, and krnl_vadd_3.
Mapping Kernel Interfaces to Memory Resources
The link phase is when the memory ports of the kernels are connected to memory resources which include PLRAM and DDR. By default, all kernel memory ports are connected to the same DDR bank. As a result, only one memory interface can transfer data to and from the DDR bank at one time, limiting overall performance. If the FPGA contains only one global memory bank, this is the only option. However, if the device contains multiple banks, you can customize the memory bank connections. For additional information, see SDAccel Environment Programmers Guide (UG1277) and SDx Command and Utility Reference Guide (UG1279).
Global memory is the DDR memory accessible by a platform. SDAccel platforms can have access to multiple global memory banks. In applications with multiple kernel instances running concurrently, this can result in significant performance gains. Even if there is only one compute unit in the device, by mapping its input and output ports to different banks you can improve overall performance by enabling simultaneous accesses to input and output data.
Specifying the desired kernel port to memory bank mapping requires taking the following steps:
- In the host application, allocate buffers using a vendor extension pointer.
- During xocc linking, use the --sp option to map the kernel interface to the desired memory bank.
Details of coding the host application can be found in the SDAccel Environment Programmers Guide, in "Memory Data Transfer to/from the FPGA Device." In short, you must create buffers using a cl_mem_ext_ptr_t vendor extension pointer. The vendor extension pointer is used to indicate which kernel argument the buffer maps to. The runtime uses this information, in conjunction with data in the FPGA binary, to determine in which memory bank the buffer should be allocated.
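A minimal sketch of such an allocation follows (illustrative only: variable names are placeholders, error checking is omitted, and the exact header and field usage should be confirmed against the Programmers Guide):
#include <CL/cl_ext.h> // Xilinx vendor extensions (cl_mem_ext_ptr_t)

cl_mem_ext_ptr_t ext;
ext.param = kernel; // the cl_kernel whose argument this buffer binds to
ext.flags = 0;      // index of that kernel argument
ext.obj   = NULL;   // optional host pointer
cl_mem buf = clCreateBuffer(context,
    CL_MEM_READ_ONLY | CL_MEM_EXT_PTR_XILINX,
    size, &ext, &err);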
During xocc linking, the --sp option specifies the assignment of kernel ports to available memory resources, overriding the default assignments.
The directive to assign a compute unit's memory interface to a memory resource is:
--sp <COMPUTE_UNIT>.<MEM_INTERFACE>:<MEMORY>
Where:
- COMPUTE_UNIT is the name of the compute unit (CU)
- MEM_INTERFACE is the name of one of the compute unit's memory interfaces or function arguments
- MEMORY is the memory resource
It is necessary to have a separate directive for each memory interface connection.
To determine the names of the memory interfaces, you can use the kernelinfo utility if you have the .xo file, or the platforminfo utility if you have the .xclbin file. For more information on these tools, see Useful Command Line Utilities.
For example, xocc … --sp vadd_1.m_axi_gmem:DDR[3] assigns the memory interface m_axi_gmem of the CU named vadd_1 to DDR[3] memory.
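Because a separate directive is required for each connection, mapping several interfaces of a single CU to different banks looks like the following sketch (the argument names in1, in2, and out are hypothetical):
$ xocc -l ... --sp krnl_vadd_1.in1:DDR[0] --sp krnl_vadd_1.in2:DDR[1] --sp krnl_vadd_1.out:DDR[2]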
DDR memory resources are specified in vector syntax as DDR[0], DDR[1], DDR[2], and DDR[3]. PLRAM is specified in a similar fashion. While release 2018.3 supports the legacy sptag names (that is, bank<n>) for platforms available in 2018.2.xdf and any associated updates in 2018.3, this support will be deprecated in subsequent releases. New platforms in 2018.3 do not support legacy sptag names and require the vector syntax format.
The --sp switch can be added through the SDx GUI in a similar manner to the process outlined in Creating Multiple Instances of a Kernel. Right-click the top-level kernel in the Assistant view, and select Settings. From within the Project Settings dialog box, enter the --sp option in the XOCC Linker Options field.
To add directives to the xocc compilation through the GUI, from within the Assistant, right-click the desired kernel under System and select Settings. This displays the hardware function settings dialog window where you can change the memory interface mapping under the Compute Unit Settings area. To change the memory resource mapping of a CU for a particular argument, click the Memory setting of the respective argument and change it to the desired memory resource. The following figure shows the a argument being selected.
To select the identical memory resource for all CU arguments, click the memory resource for the CU (that is, krnl_vadd_1 in the example above) and select the desired memory resource.
When using the --sp option to assign kernel interfaces to memory banks, you must specify the --sp option for all interfaces of the kernel. Refer to "Customization of DDR Bank to Kernel Connection" in the SDAccel Environment Programmers Guide for more information.
Allocating Compute Units to SLRs
A compute unit (CU) is allocated to a super logic region (SLR) during xocc linking using the --slr directive. The syntax of the command line directive is:
--slr <COMPUTE_UNIT>:<SLR_NUM>
where COMPUTE_UNIT is the name of the CU and SLR_NUM is the SLR number to which the CU is assigned.
For example, xocc … --slr vadd_1:SLR2 assigns the CU named vadd_1 to SLR2.
The --slr directive must be applied separately for each CU in the design. For instance, in the following example, three invocations of the --slr directive are used to assign all three CUs to SLRs; krnl_vadd_1 and krnl_vadd_2 are assigned to SLR1, while krnl_vadd_3 is assigned to SLR2.
--slr krnl_vadd_1:SLR1 --slr krnl_vadd_2:SLR1 --slr krnl_vadd_3:SLR2
In the absence of an --slr directive for a CU, the tools are free to place the CU in any SLR.
To allocate a CU to an SLR in the GUI flow, right-click the desired kernel under System or Emulation-HW configurations and select Settings as shown in the following figure.
This displays the hardware function settings dialog window. Under the Compute Unit Settings area, you can change the SLR to which the CU is allocated by clicking the SLR setting of the respective CU and selecting the desired SLR from the menu as shown. Selecting Auto allows the tools the freedom to place the CU in any SLR.
Controlling Implementation Results
When compiling or linking, you can exercise fine-grain control over the hardware generated by SDAccel for hardware emulation and system builds using the --xp switch.
The --xp switch is paired with parameters to configure the Vivado® Design Suite. For instance, the --xp switch can configure the optimization, placement, and timing results of the hardware implementation.
The --xp switch can also be used to set up emulation and compile options. Specific examples of these parameters include setting the clock margin, specifying the depth of FIFOs used in the kernel dataflow region, and specifying the number of outstanding writes and reads to buffer on the kernel AXI interface. A full list of parameters and valid values can be found in the SDx Command and Utility Reference Guide.
Parameters are specified as param:<param_name>=<value>, where:
- param: Required keyword.
- param_name: Name of a parameter to apply.
- value: Appropriate value for the parameter.
The xocc linker does not check the validity of the parameter or value. Be careful to apply valid values or the downstream tools might not work properly.
You must repeat the --xp switch for each param used in the xocc command, as shown below:
$ xocc --xp param:compiler.enableDSAIntegrityCheck=true --xp param:prop:kernel.foo.kernel_flags="-std=c++0x"
You can specify param values in an xocc.ini file, with each option specified on a separate line (without the --xp switch).
When present, the xocc.ini file is used in place of other --xp settings. Locate the file in the same directory as the build configuration. For example:
param:compiler.enableDSAIntegrityCheck=true
param:prop:kernel.foo.kernel_flags="-std=c++0x"
Under the GUI flow, if no xocc.ini file is present, the application uses the GUI build settings. Under a Makefile flow, if no xocc.ini file is present, the configurations within the Makefile are used.
The --xp switch can be added through the SDx GUI in a similar manner to the process outlined in Creating Multiple Instances of a Kernel. Right-click the top-level kernel in the Assistant view, and select Settings. From within the Project Settings dialog box, enter the --xp option in the XOCC Linker Options field.
You can also add xocc compiler options and --xp parameters to kernels by right-clicking the kernel in the Assistant view. The following image demonstrates the --xp setting for the krnl_vadd kernel.
Controlling Report Generation
The xocc -R switch controls the level of report generation during the link stage for hardware emulation and system targets. Builds that generate fewer reports typically run more quickly.
The command line option is as follows:
$ xocc -R <report_level>
Where <report_level> is one of the following options:
- -R0: Minimal reports and no intermediate design checkpoints (DCP)
- -R1: Includes R0 reports plus:
  - Identifies design characteristics to review for each kernel (report_failfast)
  - Identifies design characteristics to review for the full design post-opt (report_failfast)
  - Saves the post-opt DCP
- -R2: Includes R1 reports plus:
  - The Vivado default reporting, including a DCP after each implementation step
  - Design characteristics to review for each SLR after placement (report_failfast)
report_failfast is a utility that highlights potential device utilization challenges, clock constraint problems, and potentially unreachable target frequency (MHz).
The -R switch can also be added through the SDx GUI as described in Creating Multiple Instances of a Kernel:
- Right-click the top-level kernel in the Assistant view and select Settings.
- From within the Project Settings dialog box, enter the -R option in the XOCC Linker Options field.
Build Targets
The SDAccel build target defines the nature of the FPGA binary generated by the build process. There are three different build targets: two emulation targets (software and hardware emulation) used for debug and validation purposes, and the default hardware target used to generate the actual FPGA binary.
Software Emulation
The main goal of software emulation is to ensure functional correctness and to partition the application into kernels. For software emulation, both the host code and the kernel code are compiled to run on the host x86 processor. The programmer model of iterative algorithm refinement through fast compile and run loops is preserved, with compile and execution times comparable to those of a native CPU application. Refer to the SDAccel Environment Debugging Guide for more information on running software emulation.
In the context of the SDAccel development environment, software emulation on a CPU is the same as the iterative development process that is typical of CPU/GPU programming. In this type of development style, a programmer continuously compiles and runs an application as it is being developed.
For RTL kernels, software emulation can be supported if a C model is associated with the kernel. The RTL kernel wizard packaging step provides an option to associate C model files with the RTL kernel for support of software emulation flows.
Hardware Emulation
While the software emulation flow is a good measure of functional correctness, it does not guarantee correctness on the FPGA execution target. The hardware emulation flow enables the programmer to check the correctness of the logic generated for the custom compute units before deployment on hardware, where a compute unit is an instantiation of a kernel.
The SDAccel environment generates at least one custom compute unit for each kernel in an application. Each kernel is compiled to a hardware model (RTL). During emulation, kernels are executed with a hardware simulator, but the rest of the system still uses a C simulator. This allows the SDAccel environment to test the functionality of the logic that will be executed on the FPGA compute fabric.
In addition, hardware emulation provides performance and resource estimation, allowing the programmer to get an insight into the design.
In hardware emulation, compile and execution times are longer than in software emulation; thus, Xilinx recommends that you use small data sets for debug and validation.
System
When the build target is system, xocc generates the FPGA binary for the device by running synthesis and implementation on the design. The binary includes custom logic for every compute unit in the binary container. Therefore, it is normal for this build step to run for a longer period of time than the other steps in the SDAccel build flow. However, because the kernels will be running on actual hardware, their execution times will be extremely fast.
The generation of custom compute units uses the Vivado High-Level Synthesis (HLS) tool, which is the compute unit generator in the application compilation flow. Automatic optimization of a compute unit for maximum performance is not possible for all coding styles without additional user input to the compiler. The SDAccel Environment Profiling and Optimization Guide discusses the additional user input that can be provided to the SDAccel environment to optimize the implementation of kernel operations into a custom compute unit.
After all compute units have been generated, these units are connected to the infrastructure elements provided by the target device in the solution. The infrastructure elements in a device are all of the memory, control, and I/O data planes which the device developer has defined to support an OpenCL application. The SDAccel environment combines the custom compute units and the base device infrastructure to generate an FPGA binary which is used to program the Xilinx device during application execution.
Specifying a Target
You can specify the target build from the command-line with the following command:
xocc --target sw_emu|hw_emu|hw ...
Similarly, from within the GUI, the build target can be specified by selecting the Active build configuration pull-down tab in the Project Editor window. This provides three choices (see the following figure):
- Emulation-SW
- Emulation-HW
- System
After setting the active build configuration, build the system using the Build command. The recommended build flow is detailed in Debugging Flows.