Building the System
Building the system requires building both the hardware (kernels) and the software (host code) sides of the system. The Project Editor view, shown below, gives a top-level view of the build configuration. It provides general information about the active build configuration, including the project name, current platform, and selected system configuration (OS and runtime). It also displays several build options, including the selected build target and options for enabling host and kernel debugging. For more details on build targets, see Build Targets; Debugging Applications and Kernels gives details on using the debug options.
The bottom portion of the Editor view lists the current kernels used in the project. The kernels are listed under the binary container. In the above example, the kernel krnl_vadd has been added to binary_container_1. To add a binary container, left-click the icon. You can rename the binary container by clicking the default name and entering a new name.
To add a kernel to the binary container, left-click the icon located in the Hardware Functions window. It displays a list of kernels defined in the project. Select the kernel from the Add Hardware Functions dialog box as shown in the following figure.
In the Compute Units column, next to the kernel, enter a value to instantiate multiple instances of the kernel (called compute units) as described in Creating Multiple Instances of a Kernel.
With the various options of the active build configuration specified, you can start the build process by clicking the Build command.
The SDAccel™ build process generates the host application executable (.exe) and the FPGA binary (.xclbin). The SDAccel environment manages two separate independent build flows:
- Host code (software) build
- Kernel code (hardware) build
SDAccel uses a standard compilation and linking process for both these software and hardware elements of the project. The steps to build both the host and kernel code to generate the selected build target are described in the following sections.
Building the Host Application
The host application, written in C/C++ using OpenCL™ API calls, is built using the Xilinx® C++ compiler (xcpp), which is based on the GNU Compiler Collection (GCC). Each source file is compiled to an object file (.o) and linked with the Xilinx SDAccel runtime shared library to create the executable (.exe), which executes on the host CPU.
Because xcpp is based on GCC, it supports many standard GCC options, which are not documented here. For more information, refer to the GCC Option Index.
Compiling the Host Application
Each host application source file is compiled using the -c option, generating an object file (.o):
xcpp ... -c <file_name1> ... <file_nameN>
The name of the output file is specified using the -o option:
xcpp ... -o <output_file_name>
Debugging information is produced using the -g option:
xcpp ... -g
Linking the Host Application
The generated object files (.o) are linked using the -l option:
xcpp ... -l <object_file1.o> ... <object_fileN.o>
To compile and link in a single step, the -c and -l options are not required; only the source input files are needed.
In the GUI flow, the host code and the kernel code are compiled and linked by clicking the Build command.
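The two-step host build described above can be sketched as a small shell script. The version below is a dry run: it only assembles and prints the command strings (the file names host.cpp, utils.cpp, and host.exe are hypothetical), so it runs without the SDx tools installed. To execute the real build, run the assembled commands directly.

```shell
# Dry-run sketch of the two-step host build (hypothetical file names).
# XCPP would normally be the path to the SDx xcpp compiler.
XCPP="xcpp"

# Step 1: compile each source file to an object file (-g adds debug info).
compile_host="$XCPP -g -c host.cpp utils.cpp"

# Step 2: link the object files into the host executable, following the
# -l linking syntax shown above.
link_host="$XCPP -l host.o utils.o -o host.exe"

echo "$compile_host"
echo "$link_host"
```

Splitting compile and link this way keeps incremental rebuilds fast: only changed source files need recompiling before the final link.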
Building the Hardware
The kernel code, written in C, C++, OpenCL C, or RTL, is built by the xocc compiler, a command line utility modeled after GCC. The final output of xocc is the FPGA binary (.xclbin), which links the kernel .xo files and the hardware platform (.dsa). Generation of the .xclbin is a two-step build process requiring kernel compilation and linking.
xocc can be used standalone (or, ideally, in scripts or a build system such as make), and it is also fully supported by the SDx™ IDE.
Build Target
Compilation depends on the selected build target, which is discussed in greater detail in Build Targets. You can specify the build target using the xocc --target option as shown below.
xocc --target sw_emu|hw_emu|hw ...
- For software emulation (sw_emu), the kernel source code is used during emulation.
- For hardware emulation (hw_emu), the synthesized RTL code is used for simulation in the hardware emulation flow.
- For system build (hw), xocc generates the FPGA binary, and the system can be run on hardware.
Compiling the Kernels
During compilation, xocc compiles kernel accelerator functions (written in C/C++ or OpenCL C) into Xilinx object (.xo) files. Each kernel is compiled into a separate .xo file. This is the -c/--compile mode of xocc.
Kernels written in RTL are compiled using the package_xo command line utility. This utility, similar to xocc -c, also generates .xo files, which are subsequently used in the linking stage. See RTL Kernels for more information.
Linking the Kernels
As discussed above, the kernel compilation process results in a Xilinx object file (.xo) whether the kernel is described in OpenCL C, C, C++, or RTL. During the linking stage, .xo files from different kernels are linked with the shell to create the FPGA binary container file (.xclbin) which is needed by the host code.
The xocc command to link files is:
$ xocc -l <kernel_object_file>.xo -o <binary_platform_file>.xclbin
where one or more kernel_object_file inputs are given, and binary_platform_file is the name of the .xclbin output file.
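The two-step kernel build can likewise be sketched as a dry-run script. Kernel, source, and container names (krnl_vadd.cl, krnl_vadd.xo, binary_container_1.xclbin) are hypothetical, and only the command strings are assembled and printed, so the sketch runs without xocc installed.

```shell
# Dry-run sketch of the two-step kernel build (hypothetical names).
XOCC="xocc"
TARGET="hw_emu"   # one of: sw_emu, hw_emu, hw

# Step 1: compile the kernel source into a Xilinx object (.xo) file.
compile_krnl="$XOCC --target $TARGET -c krnl_vadd.cl -o krnl_vadd.xo"

# Step 2: link the .xo file into the FPGA binary (.xclbin).
link_krnl="$XOCC --target $TARGET -l krnl_vadd.xo -o binary_container_1.xclbin"

echo "$compile_krnl"
echo "$link_krnl"
```

Keeping the target in a variable makes it easy to rebuild the same kernel for software emulation, hardware emulation, or hardware by changing a single line.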
Creating Multiple Instances of a Kernel
During the linking stage, you can specify the number of instances of a kernel, referred to as compute units, through the --nk switch of xocc. This allows the same kernel function to run in parallel at application runtime, using different device resources on the FPGA, to improve the performance of the host application.
For more information on the --nk option, see SDAccel Environment Programmers Guide (UG1277) and SDx Command and Utility Reference Guide (UG1279).
The xocc --nk option specifies the number of instances of a given kernel to instantiate into the .xclbin file. The syntax of the command is as follows:
$ xocc --nk <kernel name>:<no of instances>:<name1>.<name2>…<nameN>
For example, in the following command, the kernel foo is instantiated three times, with compute unit names fooA, fooB, and fooC:
$ xocc --nk foo:3:fooA.fooB.fooC
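In a full link command, the --nk switch simply sits alongside the usual input and output files. The dry-run sketch below uses hypothetical file names (foo.xo, foo.xclbin) and only assembles the command string:

```shell
# Dry-run: link three compute units of kernel foo (hypothetical files).
nk_opt="--nk foo:3:fooA.fooB.fooC"
link_cmd="xocc -l $nk_opt foo.xo -o foo.xclbin"
echo "$link_cmd"
```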
In the GUI flow, the number of compute units can be specified by right-clicking the top-level kernel within the Assistant view and selecting Settings.
From within the Project Settings dialog box, select the desired kernel to instantiate and update the Compute units value. In the following figure, the kernel krnl_vadd will be instantiated three times (that is, three CUs).
In the figure above, three compute units of the krnl_vadd kernel will be linked into the FPGA binary (.xclbin), addressable as krnl_vadd_1, krnl_vadd_2, and krnl_vadd_3.
To access the various instances of the kernel, use the OpenCL API clCreateSubDevices
in the host code to divide
the device into multiple sub-devices containing one kernel instance per sub-device. For
specific details, see "Sub-devices" section in SDAccel Environment Programmers Guide (UG1277).
Mapping Kernel Interfaces to Memory Resources
The link phase is when the memory ports of the kernels are connected to memory
resources which include PLRAM and DDR. If not specified, connections to these resources
will be completed automatically during xocc
linking. However, Xilinx recommends specifying these connections for optimal performance.
For additional information, see SDAccel Environment Programmers Guide (UG1277) and SDx
Command and Utility Reference Guide (UG1279).
SDAccel platforms can have access to various memory resources. For instance, by mapping the input and output ports of a compute unit to different memory resources, you can improve overall performance by enabling simultaneous access to input and output data.
Use the xocc --sp
option during linking to
map the interface from a compute unit to a memory resource.
Details of coding the host application can be found in the "Memory Data Transfer to/from the FPGA Device" section in the SDAccel Environment Programmers Guide.
The directive to assign a compute unit's memory interface to a memory resource is:
--sp <compute_unit>.<mem_interface>:<memory>
where:
- compute_unit is the name of the compute unit (CU)
- mem_interface is the name of one of the compute unit's memory interfaces or function arguments
- memory is the memory resource
It is necessary to have a separate directive for each memory interface connection.
The names of the memory interfaces can be found using the kernelinfo utility if you have the .xo file, or the platforminfo utility if you have the .xclbin file. For more information on these tools, see the SDx Command and Utility Reference Guide (UG1279).
For example, the following command maps the interface m_axi_gmem of a CU named vadd_1 to DDR[3] memory:
xocc … --sp vadd_1.m_axi_gmem:DDR[3]
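Because a separate --sp directive is required for each memory interface connection, a link command typically carries several of them. The dry-run sketch below (the interface names in1, in2, and out, and the file names, are hypothetical) only assembles and prints the command string:

```shell
# Dry-run: map each interface of CU vadd_1 to its own memory resource
# (interface and file names are hypothetical).
sp_opts="--sp vadd_1.in1:DDR[0] --sp vadd_1.in2:DDR[1] --sp vadd_1.out:DDR[3]"
link_cmd="xocc -l $sp_opts vadd.xo -o vadd.xclbin"
echo "$link_cmd"
```

Mapping the input and output interfaces to different DDR banks, as sketched here, is one way to enable the simultaneous input/output access described above.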
In the SDx GUI, the --sp switch can be added in a manner similar to the process outlined in Creating Multiple Instances of a Kernel. Right-click the top-level kernel in the Assistant view and select Settings. From within the Project Settings dialog box, enter the --sp option in the XOCC Linker Options field.
To add directives to the xocc
compilation through the GUI,
from within the Assistant, right-click the
desired kernel under System and select
Settings.
This displays the hardware function settings dialog window, where you can change the memory interface mapping under the Compute Unit Settings area. To change the memory resource mapping of a CU for a particular argument, click the Memory setting of the respective argument and change it to the desired memory resource. The following figure shows argument a being selected.
To select the identical memory resource for all CU arguments, click the memory resource for the CU (that is, krnl_vadd_1 in the example above) and select the desired memory resource.
When using the --sp option to assign kernel interfaces to memory banks, you must specify the --sp option for all interfaces of the kernel. Refer to "Customization of DDR Bank to Kernel Connection" in the SDAccel Environment Programmers Guide for more information.
Kernel to Kernel Streaming Connection
Kernel to kernel (K2K) streaming provides direct streams between kernels. You must specify the stream connections between the source and destination kernel stream interfaces. This is done during xocc linking through the --sc option as shown below:
xocc -l --sc <kernel_instance_name>.<source_streaming_port>:<kernel_instance_name>.<destination_streaming_port>
For example, to connect the streaming ports of the following two kernels:
- Instance name CU_A, with an output streaming port called data_out
- Instance name CU_B, with an input streaming port called data_in
use the following command:
xocc -l --sc CU_A.data_out:CU_B.data_in
Allocating Compute Units to SLRs
A Compute Unit (CU) is allocated to a super logic region (SLR) during xocc
linking using the --slr
directive. The syntax of the command line directive is:
--slr <compute_unit>:<SLR_NUM>
where compute_unit
is the name of the CU and
SLR_NUM
is the SLR number to which the CU is
assigned.
For example, xocc … --slr vadd_1:SLR2
assigns the CU named vadd_1
to SLR2.
The --slr
directive must be applied
separately for each CU in the design. For instance, in the following example, three
invocations of the --slr
directive are used to assign
all three CUs to SLRs; krnl_vadd_1
and krnl_vadd_2
are assigned to SLR1 while krnl_vadd_3
is assigned to SLR2.
--slr krnl_vadd_1:SLR1 --slr krnl_vadd_2:SLR1 --slr krnl_vadd_3:SLR2
In the absence of an --slr
directive for a
CU, the tools are free to place the CU in any SLR. See Kernel SLR and DDR Memory Assignments
for CU SLR mapping recommendations.
In the SDx GUI, to allocate a CU to an SLR in the GUI flow, right-click the desired kernel under System or Emulation-HW configurations and select Settings as shown in the following figure.
This displays the hardware function settings dialog window. Under the Compute Unit Settings area, you can change the SLR to which the CU is allocated by clicking the SLR setting of the respective CU and selecting the desired SLR from the menu as shown. Selecting Auto allows the tools the freedom to place the CU in any SLR.
Controlling Implementation Results
When compiling or linking, you can exercise fine-grained control over the hardware generated by SDAccel for hardware emulation and system builds using the --xp switch.
The --xp
switch is paired with parameters to
configure the Vivado® Design Suite. For instance, the
--xp
switch can configure the optimization,
placement and timing results of the hardware implementation.
The --xp switch can also be used to set up emulation and compile options. Specific examples of these parameters include setting the clock margin, specifying the depth of FIFOs used in the kernel dataflow region, and specifying the number of outstanding writes and reads to buffer on the kernel AXI interface. A full list of parameters and valid values can be found in the SDx Command and Utility Reference Guide.
Parameters are specified as param:<param_name>=<value>, where:
- param: Required keyword.
- param_name: Name of the parameter to apply.
- value: Appropriate value for the parameter.
The xocc linker does not check the validity of the parameter or value. Be careful to apply valid values, or the downstream tools might not work properly.
You must repeat the --xp switch for each param used in the xocc command. For example:
$ xocc --xp param:compiler.enableDSAIntegrityCheck=true --xp param:prop:kernel.foo.kernel_flags="-std=c++0x"
You can specify param values in an xocc.ini file, with each option specified on a separate line (without the --xp switch). If an xocc.ini file is present, the tool automatically applies its --xp settings. Locate the file in the same directory as the build configuration. For example:
param:compiler.enableDSAIntegrityCheck=true
param:prop:kernel.foo.kernel_flags="-std=c++0x"
Under the GUI flow, if no xocc.ini
is present, the application uses the GUI build settings. Under a Makefile
flow, if no xocc.ini file is present, it will use the configurations within the
Makefile.
In the SDx GUI, the --xp switch can be added in a manner similar to the process outlined in Creating Multiple Instances of a Kernel. Right-click the top-level kernel in the Assistant view and select Settings. From within the Project Settings dialog box, enter the --xp option in the XOCC Linker Options field.
You can also add xocc
compiler options and --xp
parameters to kernels by right-clicking the kernel in
the Assistant view. The following image demonstrates the --xp
setting for the krnl_vadd
kernel.
Controlling Report Generation
The xocc
-R
switch controls the level of report generation during
the link stage for hardware emulation and system targets. Builds that generate fewer
reports will typically run more quickly.
The command line option is as follows:
$ xocc -R <report_level>
Where <report_level> is one of the following options:
- -R0: Minimal reports and no intermediate design checkpoints (DCP)
- -R1: Includes R0 reports plus:
  - Identifies design characteristics to review for each kernel (report_failfast)
  - Identifies design characteristics to review for the full design post-opt (report_failfast)
  - Saves the post-opt DCP
- -R2: Includes R1 reports plus:
  - The Vivado default reporting, including a DCP after each implementation step
  - Design characteristics to review for each SLR after placement (report_failfast)
report_failfast is a utility that highlights potential device utilization challenges, clock constraint problems, and potentially unreachable target frequencies (MHz).
The -R switch can also be added through the SDx GUI as described in Creating Multiple Instances of a Kernel:
- Right-click the top-level kernel in the Assistant view and select Settings.
- From within the Project Settings dialog box, enter the -R option in the XOCC Linker Options field.
Build Targets
The SDAccel build target defines the nature of the FPGA binary generated by the build process. There are three different build targets: two emulation targets (software and hardware emulation), used for debug and validation purposes, and the default hardware target, used to generate the actual FPGA binary.
Software Emulation
The main goal of software emulation is to ensure functional correctness and to partition the application into kernels. For software emulation, both the host code and the kernel code are compiled to run on the host x86 processor. The programmer model of iterative algorithm refinement through fast compile-and-run loops is preserved, with compile and execution times comparable to those of a native CPU application. Refer to the SDAccel Environment Debugging Guide for more information on running software emulation.
In the context of the SDAccel development environment, software emulation on a CPU is the same as the iterative development process that is typical of CPU/GPU programming. In this type of development style, a programmer continuously compiles and runs an application as it is being developed.
For RTL kernels, software emulation can be supported if a C model is associated with the kernel. The RTL kernel wizard packaging step provides an option to associate C model files with the RTL kernel for support of software emulation flows.
Hardware Emulation
While the software emulation flow is a good measure of functional correctness, it does not guarantee correctness on the FPGA execution target. The hardware emulation flow enables the programmer to check the correctness of the logic generated for the custom compute units before deployment on hardware, where a compute unit is an instantiation of a kernel.
The SDAccel environment generates at least one custom compute unit for each kernel in an application. Each kernel is compiled to a hardware model (RTL). During emulation kernels are executed with a hardware simulator, but the rest of the system still uses a C simulator. This allows the SDAccel environment to test the functionality of the logic that will be executed on the FPGA compute fabric.
In addition, hardware emulation provides performance and resource estimation, allowing the programmer to get an insight into the design.
Compile and execution times for hardware emulation are longer than for software emulation; thus, Xilinx recommends that you use small data sets for debug and validation.
System
When the build target is system, xocc
generates the FPGA binary for the device by running synthesis and implementation on the
design. The binary includes custom logic for every compute unit in the binary container.
Therefore, it is normal for this build step to run for a longer period of time than the
other steps in the SDAccel build flow. However,
because the kernels will be running on actual hardware, their execution times will be
extremely fast.
The generation of custom compute units uses the Vivado High-Level Synthesis (HLS) tool, which is the compute unit generator in the application compilation flow. Automatic optimization of a compute unit for maximum performance is not possible for all coding styles without additional user input to the compiler. The SDAccel Environment Profiling and Optimization Guide discusses the additional user input that can be provided to the SDAccel environment to optimize the implementation of kernel operations into a custom compute unit.
After all compute units have been generated, these units are connected to the infrastructure elements provided by the target device in the solution. The infrastructure elements in a device are all of the memory, control, and I/O data planes which the device developer has defined to support an OpenCL application. The SDAccel environment combines the custom compute units and the base device infrastructure to generate an FPGA binary which is used to program the Xilinx device during application execution.
Specifying a Target
You can specify the target build from the command-line with the following command:
xocc --target sw_emu|hw_emu|hw ...
Similarly, from within the GUI, the build target can be specified by selecting the Active build configuration pull-down tab in the Project Editor window. This provides three choices (see the following figure):
- Emulation-SW
- Emulation-HW
- System
After setting the active build configuration, build the system using the menu command. The recommended build flow is detailed in Debugging Flows.