Programming Model

The Vitis™ core development kit supports heterogeneous computing using a Xilinx-provided programming interface or the industry-standard OpenCL™ framework (https://www.khronos.org/opencl/). The host program executes on the processor (x86 or Arm®) and offloads compute-intensive tasks through the Xilinx Runtime (XRT) to execute on a hardware kernel running on the programmable logic (PL) of a Xilinx device.

Device Topology

In the Vitis core development kit, targeted devices can include Xilinx® MPSoCs or UltraScale+™ FPGAs connected to a processor, such as an x86 host through a PCIe bus, or an Arm processor through an AXI4 interface. The FPGA contains a programmable region that implements and executes hardware kernels.

The FPGA platform contains one or more global memory banks. Data transfers from the CPU to kernels, and from kernels to the CPU, happen through these global memory banks. The kernels running in the FPGA can have one or more memory interfaces (m_axi). The connections from the global memory banks to those memory interfaces are configurable and are defined through the Vitis linking options, as described in Linking the Kernels. Kernels can also use streaming interfaces (axis) to stream data directly from one kernel to the next. Streaming connections are also managed through v++ linking options.
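As a minimal sketch of these linking options, the following v++ configuration file (passed with --config) uses the sp option to assign the m_axi interfaces of a hypothetical compute unit to specific memory banks, and the sc option to connect the axis output of one hypothetical kernel instance to the axis input of another; all kernel, port, and bank names are placeholders:

    [connectivity]
    # Map the m_axi interfaces of mm_kernel_1 to specific global memory banks
    sp=mm_kernel_1.m_axi_gmem0:DDR[0]
    sp=mm_kernel_1.m_axi_gmem1:DDR[1]
    # Connect the axis output of producer_1 directly to the axis input of consumer_1
    sc=producer_1.out_stream:consumer_1.in_stream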

Multiple kernels can be implemented in the PL of the Xilinx device, allowing for significant application acceleration. A single kernel can also be instantiated multiple times. The number of instances of a kernel is programmable, and determined by linking options specified when building the FPGA binary. For more information on specifying these options, refer to Linking the Kernels.
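For example, a hypothetical kernel krnl_vadd could be instantiated three times by adding the nk option to the v++ configuration file used at link time; the kernel and compute unit names below are placeholders:

    [connectivity]
    # Instantiate three compute units of krnl_vadd in the FPGA binary
    nk=krnl_vadd:3:krnl_vadd_1.krnl_vadd_2.krnl_vadd_3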

Kernel Properties

In the Vitis application acceleration development flow, kernels are the processing elements executing in the PL region of the Xilinx device. The Vitis software platform supports kernels written in C/C++, RTL, or OpenCL C/C++. Regardless of the source language, all kernels have the same properties and must adhere to the same set of requirements.

Kernels can be either software-controllable or non-software-controlled. A software-controllable kernel is controlled through software, such as the host application, while a non-software-controlled kernel is unmanaged by software and is instead data driven.

SW-Controllable Kernels

Software-controllable kernels expose a programmable register interface, allowing a host software application to interact with kernels through register reads and writes. These are the most common and widely applicable types of kernels. There are two types of SW-controllable kernels: user-managed and XRT-managed.

Note: XRT-managed kernels are a specialized form of user-managed kernels.

The primary difference between user-managed and XRT-managed kernels relates to the kernel execution mode. Because XRT relies on the ap_ctrl_chain and ap_ctrl_hs execution protocols generated by Vitis HLS, XRT-managed kernels are a better fit for C++ developers, as described in C/C++ Kernels and in Compiling Kernels with Vitis HLS. Alternatively, user-managed kernels can support the many different user-defined execution protocols found in existing Vivado RTL IP, and so are a better fit for RTL designers working with RTL Kernels.

The Vitis application acceleration development flow supports host programs written using the XRT native C/C++ API, which supports both user-managed kernels and XRT-managed kernels, as well as some advanced designs such as never-ending kernels. It also supports host applications using the OpenCL API for XRT-managed kernels. The next sections briefly describe the programming API and the different hardware interfaces required for XRT-managed or user-managed kernels.

Table 1. Software Control Using the XRT API
XRT-Managed Kernels
  • The object class for an XRT-managed kernel is xrt::kernel
  • The software application communicates with the XRT-managed kernel using higher-level commands such as set_arg, run, and wait
  • The user does not need to know the low-level details of the programmable registers and kernel execution protocols
  • Control and status registers provide XRT with a known interface to interact with the kernel, which makes these high-level commands possible
  • If needed, it is also possible to control an XRT-managed kernel as a user-managed kernel (using atomic register reads and writes)
  • The OpenCL API can also be used with XRT-managed kernels (cl::kernel)

User-Managed Kernels
  • The object class for a user-managed kernel is xrt::ip
  • The software application communicates with the user-managed kernel using atomic register reads and writes through the AXI4-Lite interface
  • The application developer is responsible for knowing the address offset and purpose of each register in the kernel, and for using them properly
  • There are no checks, high-level controls, or profiling capabilities. The user is responsible for running simulations for performance analysis and debugging.
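The following host-code sketch uses the XRT native C++ API to contrast the two models; the xclbin file name, the kernel name vadd, the IP name my_ip, the argument list, and the register offsets are all assumptions made for illustration:

    #include <xrt/xrt_device.h>
    #include <xrt/xrt_kernel.h>
    #include <xrt/xrt_bo.h>
    #include <xrt/xrt_ip.h>

    void control_example() {
        auto device = xrt::device(0);                       // open the first device
        auto uuid   = device.load_xclbin("binary.xclbin");  // program the FPGA binary

        // XRT-managed kernel: high-level commands, no register-level knowledge needed
        auto vadd = xrt::kernel(device, uuid, "vadd");
        auto bo   = xrt::bo(device, 1024 * sizeof(int), vadd.group_id(0));
        int host_data[1024] = {};
        bo.write(host_data);                                // copy host data into the buffer object
        bo.sync(XCL_BO_SYNC_BO_TO_DEVICE);                  // move the buffer to global memory
        auto run = vadd(bo, 1024);                          // set the arguments and start the kernel
        run.wait();                                         // block until the run completes

        // User-managed kernel: atomic register reads and writes over AXI4-Lite
        auto ip = xrt::ip(device, uuid, "my_ip");
        ip.write_register(0x10, 42);                        // register map is defined by the kernel designer
        auto status = ip.read_register(0x00);
        (void)status;
    }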

Design Languages

SW-controllable kernels can be developed using either RTL or C/C++:

RTL
User-managed kernels are the most natural and recommended type of kernel for RTL developers. They offer greater flexibility and a wider range of control possibilities, and have fewer requirements than XRT-managed kernels. For more information, see RTL Kernels.
C++
XRT-managed kernels are the default and recommended type of kernel for C/C++ developers as described in C/C++ Kernels. The Vitis compiler, using Vitis HLS, automatically generates interfaces compatible with the high-level XRT API, leaving fewer details for the developer to worry about.
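For example, the following C++ sketch of a hypothetical vadd kernel relies on Vitis HLS to infer the AXI4-Lite control interface and the default XRT-managed execution protocol, while the INTERFACE pragmas bundle the pointer arguments onto AXI4 memory-mapped interfaces; the kernel and bundle names are placeholders:

    // Hypothetical XRT-managed kernel written in C++ for Vitis HLS
    extern "C" {
    void vadd(const int* in1, const int* in2, int* out, int size) {
    #pragma HLS INTERFACE m_axi port=in1 bundle=gmem0
    #pragma HLS INTERFACE m_axi port=in2 bundle=gmem1
    #pragma HLS INTERFACE m_axi port=out bundle=gmem0
        // The scalar argument and the block-level control protocol are mapped to
        // the AXI4-Lite interface automatically in the Vitis kernel flow.
        for (int i = 0; i < size; ++i) {
    #pragma HLS PIPELINE II=1
            out[i] = in1[i] + in2[i];
        }
    }
    }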

HW Interfaces

The kernel interfaces are used to exchange data with the host application, other kernels, or device I/Os. Both user-managed and XRT-managed kernels have exactly the same interface requirements, as listed here:

Programmable interface
AXI4-Lite slave interface. Kernels can only have a single AXI4-Lite interface.
Data interfaces
Any number and combination of AXI4 memory-mapped and AXI4-Stream interfaces.
Clock and resets
As described in Clock and Reset Requirements.
TIP: XRT-managed kernels have specific requirements for control registers in the AXI4-Lite interface (including start and stop bits) as described in Control Requirements for XRT-Managed Kernels. User-managed kernels can implement whatever control structure the user specifies.

The following table describes the type of interface required based on the characteristics of the data movement in your application.

Table 2. Kernel Interface Types
Register (AXI4-Lite)
  • Register interfaces must be implemented using a single AXI4-Lite interface.
  • Designed for transferring scalars between the host application and the kernel.
  • Register reads and writes are initiated by the host application.
  • The kernel acts as a slave.

Memory Mapped (M_AXI)
  • Memory-mapped interfaces must be implemented using one or more AXI4 master interfaces.
  • Designed for bi-directional data transfers with global memory (DDR, PLRAM, HBM).
  • Introduces additional latency for memory transfers.
  • The kernel acts as a master accessing data stored in global memory.
  • The host application allocates the buffer for the size of the dataset.
  • The base address of the buffer is provided by the host application to the kernel via its AXI4-Lite interface.

Streaming (AXI4-Stream)
  • Streaming interfaces must be implemented using one or more AXI4-Stream interfaces.
  • Designed for uni-directional data transfers between kernels.
  • The access pattern is sequential.
  • Does not use global memory.
  • The data set is unbounded.
  • A sideband signal can be used to indicate the last value in the stream.
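For illustration of the streaming interface, the following HLS C++ sketch forwards an unbounded AXI4-Stream from one kernel to the next and uses the TLAST sideband signal to detect the last value; the kernel name, port names, and 32-bit data width are assumptions:

    #include <ap_axi_sdata.h>
    #include <hls_stream.h>

    typedef ap_axiu<32, 0, 0, 0> pkt_t;        // 32-bit data word plus TLAST sideband

    extern "C" {
    // Hypothetical kernel that forwards a stream from one kernel to the next
    void passthrough(hls::stream<pkt_t>& in, hls::stream<pkt_t>& out) {
    #pragma HLS INTERFACE axis port=in
    #pragma HLS INTERFACE axis port=out
        pkt_t pkt;
        do {
    #pragma HLS PIPELINE II=1
            pkt = in.read();                   // sequential, blocking read from the upstream kernel
            out.write(pkt);                    // blocking write to the downstream kernel
        } while (!pkt.last);                   // TLAST marks the last value in the stream
    }
    }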

Execution Modes

User-managed kernels have no predefined execution mode. It is up to the kernel designer to implement the control protocol and the execution mechanism. It is the application developer's responsibility to manage the operation of the kernel by executing appropriate sequences of register reads and writes from the host application, in accordance with the user-defined control protocol of the kernel.

XRT-managed kernels, as described in Supported Kernel Execution Models in the XRT documentation, provide defined kernel execution modes supporting overlapping execution of the kernel, or sequential execution.

  • A kernel is started by the host application using an API call. When the kernel is ready for new data it notifies the host application through bits in the control register.
  • The default control protocol, ap_ctrl_chain, allows multiple executions of the same kernel to be overlapped and run in a pipelined fashion, improving overall application throughput.
  • If required, overlapping execution can be disabled by using the ap_ctrl_hs control protocol, which forces kernels to run sequentially, waiting until the prior run has completed before starting the next run (see the sketch after this list).
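Assuming an HLS C++ kernel (the kernel and bundle names below are placeholders), the block-level control protocol is selected with the INTERFACE pragma on the function return, as in this sketch:

    // Hypothetical HLS C++ kernel showing how the control protocol can be selected
    extern "C" {
    void my_kernel(int* data, int size) {
    #pragma HLS INTERFACE m_axi port=data bundle=gmem0
    // Default for XRT-managed kernels: overlapped, pipelined execution
    #pragma HLS INTERFACE ap_ctrl_chain port=return
    // Alternative: strictly sequential execution, one run at a time
    // #pragma HLS INTERFACE ap_ctrl_hs port=return
        for (int i = 0; i < size; ++i)
            data[i] += 1;
    }
    }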

Non-Software Controlled Kernels

These kernels are present in the device but are not visible to, or directly accessible by, the software application. They do not have a programmable register interface, and they must have at least one AXI4-Stream interface, through which they synchronize with the rest of the system.

Non-software controlled kernels are considered an advanced feature and should only be used when a software controllable kernel cannot be used. Because they do not have a programmable register interface, control-related information needs to be passed through the data interfaces of the kernel.

Non-software controlled kernels do not require a software API, because the host application does not interact directly with the kernel. These kernels can be developed as either RTL Kernels or C/C++ Kernels, as described in Streaming Data in User-Managed Never-Ending Kernels.
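A minimal C++ sketch of such a kernel, assuming hypothetical names and a 32-bit data width, uses only AXI4-Stream interfaces and the ap_ctrl_none protocol so that no AXI4-Lite interface is generated and execution is driven entirely by the arrival of data:

    #include <ap_axi_sdata.h>
    #include <hls_stream.h>

    typedef ap_axiu<32, 0, 0, 0> pkt_t;       // 32-bit data word plus sideband signals

    extern "C" {
    // Hypothetical free-running kernel: no AXI4-Lite interface, streams only
    void increment(hls::stream<pkt_t>& in, hls::stream<pkt_t>& out) {
    #pragma HLS INTERFACE axis port=in
    #pragma HLS INTERFACE axis port=out
    #pragma HLS INTERFACE ap_ctrl_none port=return
        while (true) {
    #pragma HLS PIPELINE II=1
            pkt_t pkt = in.read();            // blocks until the upstream kernel produces data
            pkt.data += 1;
            out.write(pkt);                   // blocks if the downstream kernel applies backpressure
        }
    }
    }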

HW Interfaces

The kernel interfaces are used to exchange data with the host application, other kernels, or device I/Os. Non-software controlled kernels have the interface requirements listed here:

Programmable interface
There is no AXI4-Lite interface.
Data interfaces
At least one AXI4-Stream interface.
Clock and resets
As described in Clock and Reset Requirements.

Clock and Reset Requirements

These clock and reset requirements apply to both software controllable and non-software controllable kernels.

Table 3. Requirements
C/C++/OpenCL C Kernel
  • C/C++ kernels do not require any input from the user on clock and reset ports. The HLS tool always generates RTL with the clock port ap_clk and the reset port ap_rst_n.
  • HLS kernels can only have one clock and one reset.

RTL Kernel
  • RTL kernels require at least one clock port, but a kernel can have multiple clocks. The number of clocks that an RTL kernel can have is primarily determined by the number of clocks that the platform supports. Most data center platforms only support two clocks, but most embedded platforms can have multiple clocks.
  • An active-Low reset port can optionally be associated with a clock through the ASSOCIATED_RESET parameter on the clock.