Migrating to a New Target Platform

This migration content is intended for users who need to migrate their accelerated Vitis™ technology application from one target platform to another. For example, moving an application from an Alveo™ U200 Data Center accelerator card to an Alveo U280 card.

Design Migration

When migrating an application implemented in one target platform to another, it is important to understand the differences between the target platforms and the impact those differences have on the design.

Key considerations:

  • Is there a change in the release?
  • Does the new target platform contain a different FPGA device?
  • Do the kernels need to be redistributed across the Super Logic Regions (SLRs)?
  • Does the design meet the required frequency (timing) performance in the new platform?

The following diagram summarizes the migration flow described in this guide and the topics to consider during the migration process.

Figure 1: Target Platform Migration Flowchart
IMPORTANT: Before starting to migrate a design, it is important to understand the architecture of an FPGA and the target platform.

Understanding an FPGA Architecture

Before migrating any design to a new target platform, you should have a fundamental understanding of the FPGA architecture. The following diagram shows the floorplan of a Xilinx® FPGA device. The concepts to understand are:

  • SSI Devices
  • SLRs
  • SLR routing resources
  • Memory interfaces
Figure 2: Physical View of Xilinx FPGA with Four SLR Regions
TIP: The FPGA floorplan shown above is for an SSI device with four SLRs, where each SLR contains a DDR memory interface.

Stacked Silicon Interconnect Devices

An SSI device is one in which multiple silicon dies are connected together through silicon interconnect, and packaged into a single device. An SSI device enables high-bandwidth connectivity between multiple die by providing a much greater number of connections. It also imposes much lower latency and consumes dramatically lower power than either a multiple-FPGA or a multi-chip module approach, while enabling the integration of massive quantities of interconnect logic, transceivers, and on-chip resources within a single package. The advantages of SSI devices are detailed in Xilinx Stacked Silicon Interconnect Technology Delivers Breakthrough FPGA Capacity, Bandwidth, and Power Efficiency (WP380).

Super Logic Region

An SLR is a single FPGA die slice contained in an SSI device. Multiple SLR components are assembled to make up an SSI device. Each SLR contains the active circuitry common to most Xilinx FPGA devices. This circuitry includes large numbers of:

  • LUTs
  • Registers
  • I/O Components
  • Gigabit Transceivers
  • Block Memory
  • DSP Blocks

One or more kernels can be implemented within an SLR. A single kernel can be placed across multiple SLRs if needed.

SLR Routing Resources

The custom hardware implemented on the FPGA is connected via on-chip routing resources. There are two types of routing resources in an SSI device:

Intra-SLR Resources
Intra-SLR routing resources are the fast resources used to connect the hardware logic. The Vitis technology automatically uses the optimal resources to connect the hardware elements when implementing kernels.
Super Long Line (SLL) Resources
SLLs are routing resources running between SLRs, used to connect logic from one region to the next. These routing resources are slower than intra-SLR routes. However, when a kernel is placed in one SLR and the DDR it connects to is in another, the Vitis technology automatically implements dedicated hardware to use SLL routing resources without any impact on performance. More information on managing placement is provided in Modifying Kernel Placement.

Memory Interfaces

Each SLR contains one or more memory interfaces. These memory interfaces are used to connect to the DDR memory where the data in the host buffers is copied before kernel execution. Each kernel reads data from the DDR memory and writes the results back to the same DDR memory. The memory interface connects to the pins on the FPGA and includes the memory controller logic.

Understanding Target Platforms

In the Vitis technology, a target platform is the hardware design that is implemented onto the FPGA before any custom logic or accelerators are added. The target platform defines the attributes of the FPGA and is composed of two regions:

  • Static region which contains kernel and device management logic.
  • Dynamic region where the custom logic of the accelerated kernels is placed.

The figure below shows an FPGA with the target platform applied.

Figure 3: Target Platform on an FPGA with Four SLR Regions

The static region of the target platform, which cannot be modified, contains the logic required to operate the FPGA and to transfer data to and from the dynamic region. The static region, shown above in gray, might exist within a single SLR, or, as in the above example, might span multiple SLRs. The static region contains:

  • DDR memory interface controllers
  • PCIe® interface logic
  • XDMA logic
  • Firewall logic, etc.

The dynamic region is the area shown in white above. This region contains all the reconfigurable components of the target platform and is the region where all the accelerator kernels are placed.

Because the static region consumes some of the hardware resources available on the device, the custom logic to be implemented in the dynamic region can only use the remaining resources. In the example shown above, the target platform defines that all four DDR memory interfaces on the FPGA can be used. This will require resources for the memory controller used in the DDR interface.

Details on how much logic can be implemented in the dynamic region of each target platform are provided in the Vitis Software Platform Release Notes. This topic is also addressed in Modifying Kernel Placement.

Migrating Releases

Before migrating to a new target platform, you should also determine whether you will need to target the new platform with a different release of the Vitis technology. If you intend to target a new release, Xilinx highly recommends first targeting the existing platform with the new software release to confirm that no changes are required, and then migrating to the new target platform.

There are two steps to follow when targeting a new release with an existing platform:

  • Host Code Migration
  • Release Migration
IMPORTANT: Before migrating to a new release, Xilinx recommends that you review the Vitis Software Platform Release Notes.

Host Code Migration

The XILINX_XRT environment variable is used to specify the location of the XRT library environment and must be set before you compile the host code. After the XRT library environment has been installed, the XILINX_XRT environment variable can be set by sourcing the /opt/xilinx/xrt/setup.csh or /opt/xilinx/xrt/setup.sh file, as appropriate. Also ensure that your LD_LIBRARY_PATH variable points to the XRT library installation area.

To compile and run the host code, source the <INSTALL_DIR>/settings64.csh or <INSTALL_DIR>/settings64.sh file from the Vitis installation.
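
For example, in a bash shell, both environments might be set up as follows (<INSTALL_DIR> is the location of your Vitis installation):

# Set up the XRT environment; this defines XILINX_XRT and updates LD_LIBRARY_PATH
source /opt/xilinx/xrt/setup.sh

# Set up the Vitis tools environment
source <INSTALL_DIR>/settings64.sh

# Optional: confirm the variables point to the expected locations
echo $XILINX_XRT
echo $LD_LIBRARY_PATH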

If you are using the GUI, it will automatically incorporate the new XRT library location and generate the makefile when you build your project.

However, if you are using your own custom makefile, you must use the XILINX_XRT environment variable to set up the XRT library.

  • Include directories are now specified as: -I${XILINX_XRT}/include and -I${XILINX_XRT}/include/CL
  • Library path is now: -L${XILINX_XRT}/lib
  • OpenCL library is libxilinxopencl.so; use -lxilinxopencl in your makefile (see the example below)
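
As a minimal sketch, a host compile line using these settings might look like the following, where host.cpp and host are placeholder names and any additional libraries your application needs would be appended:

g++ -std=c++14 \
  -I${XILINX_XRT}/include -I${XILINX_XRT}/include/CL \
  -o host host.cpp \
  -L${XILINX_XRT}/lib -lxilinxopencl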

Release Migration

After migrating the host code, build the code on the existing target platform using the new release of the Vitis technology. Verify that you can run the project in the Vitis unified software platform using the new release, and ensure that it completes successfully and meets the timing requirements.
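
For example, rebuilding an existing kernel and device binary against the current platform with the new release might use commands similar to the following; the platform, kernel, and file names shown are placeholders:

# Compile the kernel source into a Xilinx object (.xo) file
v++ -c -t hw --platform <existing_platform> -k kernel_A -o kernel_A.xo kernel_A.cpp

# Link the kernel object into the device binary (.xclbin)
v++ -l -t hw --platform <existing_platform> -o kernel.xclbin kernel_A.xo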

Issues which can occur when using a new release are:

  • Changes to C libraries or library files.
  • Changes to kernel path names.
  • Changes to the HLS pragmas or pragma options embedded in the kernel code.
  • Changes to C/C++/OpenCL compiler support.
  • Changes to the performance of kernels: this might require adjustments to the pragmas in the existing kernel code.

Address these issues using the same techniques you would use during the development of any kernel. At this stage, ensure the throughput performance of the target platform using the new release meets your requirements. If there are changes to the final timing (the maximum clock frequency), you can address these when you have moved to the new target platform. This is covered in Address Timing.

Modifying Kernel Placement

The primary issue when targeting a new platform is ensuring that an existing kernel placement will work in the new target platform. Each target platform has an FPGA with a defined static region, and as shown in the figure below, the target platforms can be different.

  • The target platform on the left has four SLRs, and the static region is spread across all four SLRs.
  • The target platform on the right has only three SLRs, and the static region is fully contained in SLR1.
Figure 4: Comparison of Target Platforms

This section explains how to modify the placement of the kernels.

Implications of a New Hardware Platform

The figure below highlights the issue of kernel placement when migrating to a new target platform. In the example below:

  • The existing kernel, kernel_B, is too large to fit into SLR1 of the new target platform because most of that SLR is consumed by the static region.
  • The existing kernel, kernel_D, must be relocated to a different SLR because the new target platform does not have four SLRs like the existing platform.
Figure 5: Migrating Platforms – Kernel Placement

When migrating to a new platform, you need to take the following actions:

  • Determine the resources available in each SLR of the new target platform.
  • Determine the resources required by each kernel in the design.
  • Assign each kernel to an SLR with sufficient resources, and connect it to the appropriate memory bank.

These items are addressed in the remainder of this section.

Determining Where to Place the Kernels

To determine where to place kernels, two pieces of information are required:

  • Resources available in each SLR of the hardware platform (.xsa).
  • Resources required for each kernel.

With these two pieces of information you will then determine which kernel or kernels can be placed in each SLR of the target platform.

Keep in mind when performing these calculations that 10% of the available resources can be used by system infrastructure:

  • Infrastructure logic can be used to connect a kernel to a DDR interface if it has to cross an SLR boundary.
  • In an FPGA, resources are also used for signal routing. It is never possible to use 100% of all available resources in an FPGA because signal routing also requires resources.

Available SLR Resources

The resources available in each SLR of the various platforms supported by a release can be found in the Vitis Software Platform Release Notes. The following table shows an example target platform; a brief sizing example follows the table. In this example:

  • SLR description indicates which SLR contains static and/or dynamic regions.
  • Resources available in each SLR (LUTs, Registers, RAM, etc.) are listed.

This allows you to determine what resources are available in each SLR.

Table 1. SLR Resources of a Hardware Platform
SLR description
  • SLR 0: Bottom of device; dedicated to dynamic region.
  • SLR 1: Middle of device; shared by dynamic and static region resources.
  • SLR 2: Top of device; dedicated to dynamic region.

Dynamic region Pblock name
  • SLR 0: pfa_top_i_dynamic_region_pblock_dynamic_SLR0
  • SLR 1: pfa_top_i_dynamic_region_pblock_dynamic_SLR1
  • SLR 2: pfa_top_i_dynamic_region_pblock_dynamic_SLR2

Compute unit placement syntax
  • SLR 0: set_property CONFIG.SLR_ASSIGNMENTS SLR0 [get_bd_cells <cu_name>]
  • SLR 1: set_property CONFIG.SLR_ASSIGNMENTS SLR1 [get_bd_cells <cu_name>]
  • SLR 2: set_property CONFIG.SLR_ASSIGNMENTS SLR2 [get_bd_cells <cu_name>]

Global memory resources available in dynamic region (memory channels; system port name)
  • SLR 0: bank0 (16 GB DDR4)
  • SLR 1: bank1 (16 GB DDR4, in static region); bank2 (16 GB DDR4, in dynamic region)
  • SLR 2: bank3 (16 GB DDR4)

Approximate available fabric resources in dynamic region
                     SLR 0    SLR 1    SLR 2
  CLB LUT            388K     199K     388K
  CLB Register       776K     399K     776K
  Block RAM Tile     720      420      720
  UltraRAM           320      160      320
  DSP                2280     1320     2280
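
As a rough, hypothetical sizing check using the figures in Table 1: SLR 1 offers approximately 199K CLB LUTs in its dynamic region. Reserving 10% for system infrastructure leaves roughly 179K LUTs for kernels, whereas SLR 0 and SLR 2 each leave roughly 349K LUTs after the same allowance. A kernel requiring more LUTs than a single SLR can provide must be split across SLRs or assigned to a less utilized SLR.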

Kernel Resources

The resources for each kernel can be obtained from the System Estimate report.

The System Estimate report is available in the Assistant view after either the Hardware Emulation or Hardware run completes. An example of this report is shown below.

Figure 6: System Estimate Report


  • FF refers to the CLB Registers noted in the platform resources for each SLR.
  • LUT refers to the CLB LUTs noted in the platform resources for each SLR.
  • DSP refers to the DSPs noted in the platform resources for each SLR.
  • BRAM refers to the block RAM Tile noted in the platform resources for each SLR.

This information can help you determine the proper SLR assignments for each kernel.

Assigning Kernels to SLRs

Each kernel in a design can be assigned to an SLR region using the connectivity.slr option in a configuration file specified for the v++ --config command line option. Refer to Assigning Compute Units to SLRs for more information.

When placing kernels, Xilinx recommends assigning the specific DDR memory bank that the kernel will connect to using the connectivity.sp config option as described in Mapping Kernel Ports to Memory.

For example, the figure below shows an existing target platform that has four SLRs, and a new target platform with three SLRs. The static region is also structured differently between the two platforms. In this migration example:

  • Kernel_A is mapped to SLR0.
  • Kernel_B, which no longer fits in SLR1, is remapped to SLR0, where there are available resources.
  • Kernel_C is mapped to SLR2.
  • Kernel_D is remapped to SLR2, where there are available resources.

The kernel mappings are illustrated in the figure below.

Figure 7: Mapping of Kernels Across SLRs

Specifying Kernel Placement

For the example above, the configuration file to assign the kernels would be similar to the following:

[connectivity]
nk=kernel:4:kernel_A.kernel_B.kernel_C.kernel_D

slr=kernel_A:SLR0
slr=kernel_B:SLR0
slr=kernel_C:SLR2
slr=kernel_D:SLR2

The v++ command line to place each of the kernels as shown in the figure above would be:

v++ -l --config config.cfg ...

Specifying Kernel DDR Interfaces

You should also specify the kernel DDR memory interface when specifying kernel placements. Specifying the DDR interface ensures automatic pipelining of kernel connections to a DDR interface in a different SLR, preventing the timing degradation that can reduce the maximum clock frequency.

In this example, using the kernel placements in the above figure:

  • Kernel_A is connected to Memory Bank 0.
  • Kernel_B is connected to Memory Bank 1.
  • Kernel_C is connected to Memory Bank 2.
  • Kernel_D is connected to Memory Bank 1.

The configuration file to make these connections would be as follows, and is passed to the v++ command using the --config option:

[connectivity]
nk=kernel:4:kernel_A.kernel_B.kernel_C.kernel_D

slr=kernel_A:SLR0
slr=kernel_B:SLR0
slr=kernel_C:SLR2
slr=kernel_D:SLR2

sp=kernel_A.arg1:DDR[0]
sp=kernel_B.arg1:DDR[1]
sp=kernel_C.arg1:DDR[2]
sp=kernel_D.arg1:DDR[1]
IMPORTANT: When using the connectivity.sp option to assign kernel ports to memory banks, you must map all interfaces/ports of the kernel. Refer to Mapping Kernel Ports to Memory for more information.

Address Timing

Perform a system run. If it completes with no timing violations, the migration is successful.

If timing has not been met you might need to specify some custom constraints to help meet timing. Refer to UltraFast Design Methodology Timing Closure Quick Reference Guide (UG1292) for more information on meeting timing.

Custom Constraints

Custom Tcl constraints for floorplanning, placement, and timing of the kernels will need to be reviewed in the context of the new target platform (.xsa). For example, if a kernel needs to be moved to a different SLR in the new target platform, the placement constraints for that kernel will also need to be modified.

In general, timing is expected to be comparable between different target platforms that are based on the same Virtex® UltraScale+™ VU9P device. Any custom Tcl constraints for timing closure will need to be evaluated and might need to be modified for the new platform.

Custom constraints can be passed to the Vivado® tools using the [advanced] directives of the v++ configuration file specified by the --config option. Refer to Managing Vivado Synthesis and Implementation Results for more information.

Timing Closure Considerations

Design performance and timing closure can vary when moving across Vitis releases or target platforms, especially when one of the following conditions is true:

  • Floorplan constraints were needed to close timing.
  • Device or SLR resource utilization was higher than the typical guideline:
    • LUT utilization was higher than 70%
    • DSP, RAMB, and UltraRAM utilization was higher than 80%
    • FD utilization was higher than 50%
  • High effort compilation strategies were needed to close timing.

The utilization guidelines provide a threshold above which compilation of the design can take longer, or performance can be lower than initially estimated. For larger designs, which usually require more than one SLR, specify the kernel/DDR association with the v++ --config option, as described in Mapping Kernel Ports to Memory, while verifying that any floorplan constraint ensures the following:
  • The utilization of each SLR is below the recommended guidelines.
  • The utilization is balanced across SLRs if one type of hardware resource needs to be higher than the guideline.

For designs with overall high utilization, increasing the amount of pipelining in the kernels, at the cost of higher latency, can greatly help timing closure and achieve higher performance.

To quickly review all of the aspects listed above, use the fail-fast reports generated throughout the Vitis application acceleration development flow with the -R option, as described below (refer to Controlling Report Generation for more information):

  • v++ -R 1
    • report_failfast is run at the end of each kernel synthesis step
    • report_failfast is run after opt_design on the entire design
    • opt_design DCP is saved
  • v++ -R 2
    • Same reports as with -R 1, plus:
    • report_failfast is run post-placement for each SLR
    • Additional reports and intermediate DCPs are generated
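
For example, a hardware link command that enables the more detailed reporting level might be:

v++ -l -R 2 --config config.cfg ...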

All reports and DCPs can be found in the implementation directory, including kernel synthesis reports:

<runDir>/_x/link/vivado/prj/prj.runs/impl_1

For more information about timing closure and the fail-fast report, see the UltraFast Design Methodology Guide for Xilinx FPGAs and SoCs (UG949).