Introduction to the Vitis Environment for Acceleration
Accelerated Flow Application Development Using the Vitis Software Platform
The Vitis™ unified software platform is a new tool that combines all aspects of Xilinx® software development into one unified environment. The Vitis software platform supports both the Vitis embedded software development flow, for Xilinx Software Development Kit (SDK) users looking to move into the next generation technology, and the Vitis application acceleration development flow, for software developers looking to use the latest in Xilinx FPGA-based software acceleration. This content is primarily concerned with the application acceleration flow, and use of the Vitis core development kit and Xilinx Runtime (XRT).
The Vitis application acceleration development flow provides a framework for developing and delivering FPGA accelerated applications using standard programming languages for both software and hardware components. The software component, or host program, is developed using C/C++ to run on x86 or embedded processors, with OpenCL™ API calls to manage runtime interactions with the accelerator. The hardware component, or kernel, can be developed using C/C++, OpenCL C, or RTL. The Vitis software platform accommodates various methodologies, letting you start by developing either the application or the kernel.
As shown in the figure above, the Vitis unified software platform consists of the following features and elements:
- Vitis technology targets acceleration hardware platforms, such as the Alveo™ Data Center accelerator cards, or Zynq® UltraScale+™ MPSoC and Zynq®-7000 SoC based embedded processor platforms.
- XRT provides an API and drivers for your host program to connect with the target platform, and handles transactions between your host program and accelerated kernels.
- Vitis core development kit provides the software development tool stack, such as compilers and cross-compilers, to build your host program and kernel code, analyzers to let you profile and analyze the performance of your application, and debuggers to help you locate and fix any problems in your application.
- Vitis accelerated libraries provide performance-optimized FPGA acceleration with minimal code changes, and without the need to reimplement your algorithms to harness the benefits of Xilinx adaptive computing. Vitis accelerated libraries are available for common functions of math, statistics, linear algebra and DSP, and also for domain specific applications, like vision and image processing, quantitative finance, database, data analytics, and data compression. For more information on Vitis accelerated libraries, refer to https://xilinx.github.io/Vitis_Libraries/.
FPGA Acceleration
Xilinx FPGAs offer many advantages over traditional CPU/GPU acceleration, including a custom architecture capable of implementing any function that can run on a processor, resulting in better performance at lower power dissipation. When compared with processor architectures, the structures that comprise the programmable logic (PL) fabric in a Xilinx device enable a high degree of parallelism in application execution.
To realize the advantages of software acceleration on a Xilinx device, you should look to accelerate large compute-intensive portions of your application in hardware. Implementing these functions in custom hardware gives you an ideal balance between performance and power.
For more information on how to architect an application for optimal performance and other recommended design techniques, review the Methodology for Accelerating Applications with the Vitis Software Platform.
Execution Model
In the Vitis core development kit, an application program is split between a host application and hardware accelerated kernels with a communication channel between them. The host program, written in C/C++ and using API abstractions like OpenCL, runs on a host processor (such as an x86 server or an Arm processor for embedded platforms), while hardware accelerated kernels run within the programmable logic (PL) region of a Xilinx device.
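As a rough illustration of the kernel side of this split, the following is a minimal sketch of a C++ kernel, assuming a hypothetical kernel named `vadd` that is not taken from this document. Pointer arguments stand for buffers in global memory and the scalar argument stands for a value the host sets before launch. The `run_vadd_on_cpu` helper is also hypothetical and exists only so the function can be exercised on an ordinary processor.

```cpp
#include <cassert>
#include <vector>

// Hypothetical kernel for illustration. extern "C" keeps the symbol name
// unmangled so the kernel can be referred to by its plain name.
extern "C" void vadd(const int* a, const int* b, int* out, int size) {
    // a, b, and out model buffers that live in global memory;
    // size models a scalar argument supplied by the host.
    for (int i = 0; i < size; ++i) {
        out[i] = a[i] + b[i];
    }
}

// Hypothetical helper: calls the kernel function directly on the CPU,
// with std::vector storage standing in for global memory buffers.
std::vector<int> run_vadd_on_cpu(const std::vector<int>& a,
                                 const std::vector<int>& b) {
    assert(a.size() == b.size());
    std::vector<int> out(a.size());
    vadd(a.data(), b.data(), out.data(), static_cast<int>(a.size()));
    return out;
}
```

Calling `run_vadd_on_cpu({1, 2, 3}, {10, 20, 30})` adds the two inputs element by element. In a real design the host would not call the kernel function directly; it would go through the XRT-managed API calls described in this section.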
The API calls, managed by XRT, are used to process transactions between the host program and the hardware accelerators. Communication between the host and the kernel, including control and data transfers, occurs across the PCIe® bus or an AXI bus for embedded platforms. While control information is transferred between specific memory locations in the hardware, global memory is used to transfer data between the host program and the kernels. Global memory is accessible by both the host processor and hardware accelerators, while host memory is only accessible by the host application.
For instance, in a typical application, the host first transfers data to be operated on by the kernel from host memory into global memory. The kernel subsequently operates on the data, storing results back to the global memory. Upon kernel completion, the host transfers the results back into the host memory. Data transfers between the host and global memory introduce latency, which can be costly to the overall application. To achieve acceleration in a real system, the benefits achieved by the hardware acceleration kernels must outweigh the added latency of the data transfers.
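The trade-off described above can be put into simple arithmetic. The sketch below is only a back-of-the-envelope model, assuming the transfers and the kernel run one after another with no overlap; the struct and function names are hypothetical and the timings would come from your own measurements.

```cpp
#include <cassert>

// All times must use the same unit (for example, milliseconds).
struct OffloadTimes {
    double host_to_global;  // moving input data from host memory to global memory
    double kernel;          // kernel execution time
    double global_to_host;  // moving results from global memory back to host memory
};

// Total time for one offloaded run, assuming no overlap between phases.
double offload_total(const OffloadTimes& t) {
    return t.host_to_global + t.kernel + t.global_to_host;
}

// Speedup relative to running the same work on the CPU alone.
// A value above 1.0 means the offload pays for its transfer overhead.
double offload_speedup(double cpu_time, const OffloadTimes& t) {
    const double total = offload_total(t);
    assert(total > 0.0);
    return cpu_time / total;
}
```

For example, with 2 ms in, 1 ms of kernel time, and 1 ms out, the offloaded run takes 4 ms. A function that takes 8 ms on the CPU would see a 2x speedup, while one that takes 3 ms on the CPU would be slower when offloaded.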
The target platform contains the FPGA accelerated kernels, global memory, and the direct memory access (DMA) for memory transfers. Kernels can have one or more global memory interfaces and are programmable. The Vitis core development kit execution model can be broken down into the following steps:
- The host program writes the data needed by a kernel into the global memory of the attached device through the PCIe interface on an Alveo Data Center accelerator card, or through the AXI bus on an embedded platform.
- The host program sets up the kernel with its input parameters.
- The host program triggers the execution of the kernel function on the FPGA.
- The kernel performs the required computation while reading data from global memory, as necessary.
- The kernel writes data back to global memory and notifies the host that it has completed its task.
- The host program reads data back from global memory into the host memory and continues processing as needed.
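The steps above can be sketched with a small mock that runs entirely on the CPU. This is not the OpenCL or XRT API: `MockDevice` and `offload_scale` are hypothetical names, global memory is modeled with plain vectors, and the kernel runs synchronously, whereas a real host program would enqueue these operations through XRT.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the target platform.
struct MockDevice {
    std::vector<int> global_in;   // models an input buffer in global memory
    std::vector<int> global_out;  // models an output buffer in global memory
    int scale = 1;                // models a scalar kernel argument
    bool done = false;            // models the completion notice to the host

    // The kernel reads from global memory, computes, writes its results
    // back to global memory, and signals completion.
    void run_kernel() {
        global_out.resize(global_in.size());
        for (std::size_t i = 0; i < global_in.size(); ++i) {
            global_out[i] = global_in[i] * scale;
        }
        done = true;
    }
};

// Host-side flow, following the listed steps in order.
std::vector<int> offload_scale(MockDevice& dev,
                               const std::vector<int>& host_in,
                               int scale) {
    dev.done = false;
    dev.global_in = host_in;  // write input data into global memory
    dev.scale = scale;        // set up the kernel's input parameters
    dev.run_kernel();         // trigger kernel execution (synchronous in this mock)
    assert(dev.done);         // the host observes the completion notice
    return dev.global_out;    // read the results back into host memory
}
```

Each copy into or out of `MockDevice` corresponds to a transfer across PCIe or AXI on real hardware, which is where the transfer latency discussed earlier comes from.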
The FPGA can accommodate multiple kernel instances on the accelerator, both different types of kernels, and multiple instances of the same kernel. XRT transparently orchestrates the interactions between the host program and kernels in the accelerator. XRT architecture documentation is available at https://xilinx.github.io/XRT/.
Build Process
The Vitis core development kit offers all of the features of a standard software development environment:
- Compiler or cross-compiler for host applications running on x86 or Arm® processors.
- Cross-compilers for building the FPGA binary.
- Debugging environment to help identify and resolve issues in the code.
- Performance profilers to identify bottlenecks and help you optimize the application.
The build process follows a standard compilation and linking process for both the host program and the kernel code. As shown in the following figure, the host program is built using the GNU C++ compiler (g++) or the GNU C++ Arm cross-compiler for MPSoC-based devices. The FPGA binary is built using the Vitis compiler.
Host Program Build Process
The main application is compiled and linked with the g++ compiler, using the following two-step process:
- Compile any required code into object files (.o).
- Link the object files (.o) with the XRT shared library to create the executable.
For details on this topic, refer to Building the Host Program.
FPGA Binary Build Process
Kernels can be described in C/C++ or OpenCL C code, or can be created from packaged RTL designs. As shown in the figure above, each hardware kernel is independently compiled to a Xilinx object (.xo) file using the Vitis compiler (v++) command, the Vitis HLS tool, or the package_xo command for RTL kernels.
Xilinx object (.xo) files are linked with the target hardware platform by the v++ --link command to create an FPGA binary file (.xclbin) that is loaded into the Xilinx device on the target platform. For Alveo Data Center accelerator cards, the .xclbin file is the required build object for booting and running the system.
For embedded processor platforms, an additional step is required to build the system. The Vitis compiler package process (v++ --package) gathers the necessary elements to build a boot image for running emulation and debug, or for booting and running on the target hardware, as shown below.
The key to building the FPGA binary is to determine the build target you are producing. For more information, refer to Build Targets.
For a detailed explanation of the build process, refer to Building the Device Binary.
Build Targets
The Vitis compiler build process generates the host program executable and the FPGA binary (.xclbin). The nature of the FPGA binary is determined by the build target.
- When the build target is software or hardware emulation, the Vitis compiler generates simulation models of the kernels in the FPGA binary. These emulation targets let you build, run, and iterate on the design in relatively quick cycles while debugging the application and evaluating performance.
- When the build target is the hardware system, the Vitis compiler generates the .xclbin for the hardware accelerator, using the Vivado Design Suite to run synthesis and implementation. It runs these tools with predefined settings proven to provide good quality of results. Using the Vitis core development kit does not require knowledge of these tools; however, hardware-savvy developers can use all of their available features to implement kernels.
The Vitis compiler provides three build targets: two emulation targets used for debug and validation purposes, and the default hardware target used to generate the actual FPGA binary:
- Software Emulation (sw_emu) - Both the host application code and the kernel code are compiled to run on the host processor. This allows iterative algorithm refinement through fast build-and-run loops. This target is useful for identifying syntax errors, performing source-level debugging of the kernel code running together with the application, and verifying the behavior of the system.
- Hardware Emulation (hw_emu) - The kernel code is compiled into a hardware model (RTL), which is run in a dedicated simulator. This build-and-run loop takes longer but provides a detailed, cycle-accurate view of kernel activity. This target is useful for testing the functionality of the logic that will go in the FPGA and getting initial performance estimates.
- Hardware (hw) - The kernel code is compiled into a hardware model (RTL) and then implemented on the FPGA, resulting in a binary that will run on the actual FPGA.
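A host program sometimes needs to know which of these targets it is running against, for instance to shrink a test data set during emulation. The sketch below assumes the `XCL_EMULATION_MODE` environment variable, which is not mentioned in this section but is what XRT conventionally reads to select `sw_emu` or `hw_emu` at run time. The enum and function names are hypothetical, and treating an unset or unrecognized value as the hardware target is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical enum mirroring the three build targets.
enum class BuildTarget { SwEmu, HwEmu, Hw };

// Maps an XCL_EMULATION_MODE value to a build target.
// Assumption: an unset (nullptr) or unrecognized value means hardware.
BuildTarget target_from_mode(const char* mode) {
    if (mode == nullptr) {
        return BuildTarget::Hw;
    }
    const std::string m(mode);
    if (m == "sw_emu") {
        return BuildTarget::SwEmu;
    }
    if (m == "hw_emu") {
        return BuildTarget::HwEmu;
    }
    return BuildTarget::Hw;
}

// Reads the environment of the running host program.
BuildTarget current_target() {
    return target_from_mode(std::getenv("XCL_EMULATION_MODE"));
}
```

Keeping the string mapping in `target_from_mode` separate from the environment lookup makes the logic easy to check without changing the process environment.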
Tutorials and Examples
To help you quickly get started with the Vitis core development kit, you can find tutorials, example applications, and hardware kernels in the following repositories on http://github.com/Xilinx.
- Vitis Application Acceleration Development Flow Tutorials: Provides a number of tutorials that can be worked through to teach specific concepts regarding the tool flow and application development. The Getting Started pathway tutorials are an excellent place to start as a new user.
- Vitis Examples: Hosts many examples to demonstrate good design practices, coding guidelines, design patterns for common applications, and most importantly, optimization techniques to maximize application performance. The on-boarding examples are divided into several main categories. Each category has various key concepts illustrated by individual examples in both OpenCL™ C and C/C++ frameworks, when applicable. All examples include a Makefile to enable building for software emulation, hardware emulation, and running on hardware, and a README.md file with a detailed explanation of the example.
Now that you have an idea of the elements of the Vitis core development kit and how to write and build an application for acceleration, review the best approach for your design problem.