RTL Kernels

In the Vitis application acceleration development flow, C++ source code can be compiled into Xilinx® object (XO) files that can be linked with a target platform into an FPGA executable (XCLBIN). RTL IP from the Vivado® Design Suite can also be packaged as XO files that can be linked into an XCLBIN, as long as they adhere to Vivado IP Packaging guidelines, and requirements of the Vitis compiler. Those requirements are described here.

Requirements of an RTL Kernel

An RTL module must meet both interface and software requirements to be used as an RTL kernel within the Vitis tools. For more information on kernel properties, see Kernel Properties.

It might be necessary to revise the RTL module or Vivado IP packaging to meet the kernel requirements outlined in the following sections.

Kernel Interface Requirements

To satisfy the Vitis core development kit execution model, an RTL kernel must adhere to the requirements described in Kernel Properties. The RTL kernel must have at least one clock interface port to supply a clock to the kernel logic. The various interface requirements are summarized in the following table.

IMPORTANT: In some cases, the port names must be written exactly as shown.
Table 1. RTL Kernel Interface and Port Requirements
Port or Interface Description Comment
ap_clk Primary clock input port
  • Required port.
  • Name must be exact.
ap_clk_2 Secondary optional clock input port
  • Optional port.
  • Name must be exact.
ap_rst_n Primary active-Low reset input port
  • Optional port.
  • Name must be exact.
  • This signal should be internally pipelined to improve timing.
  • This signal is driven by a synchronous reset in the ap_clk clock domain.
ap_rst_n_2 secondary optional active-Low reset input
  • Optional port.
  • Name must be exact.
  • This signal should be internally pipelined to improve timing.
  • This signal is driven by a synchronous reset in the ap_clk_2 clock domain.
interrupt Active-High interrupt.
  • Optional port.
  • Name must be exact.
s_axi_control One (and only one) AXI4-Lite slave control interface
  • Required port*
  • Name must be exact; case sensitive.
Note: * The port is generally required, though there are exceptions such as the Free-Running Kernel.
AXI4_MASTER One or more AXI4 master interfaces for global memory access
  • Optional port.
  • All AXI4 master interfaces must have 64-bit addresses (32 bits on Zynq-7000 devices).
  • The RTL kernel developer is responsible for partitioning global memory spaces. Each partition in the global memory becomes a kernel argument. The memory offset for each partition must be set by a control register programmable via the AXI4-Lite slave interface.
  • AXI4 masters must not use Wrap or Fixed burst types and must not use narrow (sub-size) bursts meaning AxSIZE should match the width of the AXI data bus.
  • Any user logic or RTL code that does not conform to the requirements above, must be wrapped or bridged to satisfy these requirements.
AXI4_STREAM One or more AXI4-Streaminterfaces for one-way data transfers between kernels or between the host application and kernels.
  • Optional port.
  • Cannot be used with bi-directional ports.
  • Use the STREAM interface template in the Vivado Design Suite.
  • Refer to AXI4-Stream Interfaces in Vitis High-Level Synthesis User Guide (UG1399) for additional information on interface requirements.

Kernel Controls

The following table outlines the required register map such that a kernel can be used within the Vitis tools and XRT. The control register is required by kernels that specify ap_ctrl_hs and ap_ctrl_chain execution models, while the interrupt related registers are only required for designs with interrupts. All user-defined registers must begin at location 0x10; locations below this are reserved.

If your RTL design has a different execution model, it must be adapted to ensure that it will operate in this manner.

TIP: Kernels that specify ap_ctrl_none do not require the control registers described below.
Table 2. Register Address Map
Offset Name Description
0x0 Control Controls and provides kernel status.
0x4 Global Interrupt Enable Used to enable interrupt to the host.
0x8 IP Interrupt Enable Used to control which IP generated signal are used to generate an interrupt.
0xC IP Interrupt Status Provides interrupt status.
0x10 Kernel arguments This would include scalars and global memory arguments for instance.

The following table shows the control signals that are accessed through the control register (offset 0x0). The available signals are used by the different control protocols as explained in Supported Kernel Execution Models in the XRT documentation. For the sequential execution mode ap_ctrl_hs, for example, the host typically writes 0x00000001 to the offset 0 control register which sets Bit 0, clears Bits 1 and 2, and polls on reading ap_done signal until it is a 1.

Table 3. Control Register Signals
Bit Name Description
0 ap_start Asserted when the kernel can start processing data. Cleared on handshake with ap_done being asserted.
1 ap_done Asserted when the kernel has completed operation. Cleared on read.
2 ap_idle Asserted when the kernel is idle.
3 ap_ready Asserted by the kernel when it is ready to accept the new data
4 ap_continue Asserted by the XRT to allow kernel keep running
31:5 Reserved Reserved

The control register or its signals are determined by the kernel execution mode (ap_ctrl_hs or ap_ctrl_chain).

TIP: The control register is not required when the kernel execution mode is ap_ctrl_none, as described in Free-Running Kernel.

The following interrupt related registers are only required if the kernel has an interrupt.

Table 4. Global Interrupt Enable (0x4)
Bit Name Description
0 Global Interrupt Enable When asserted, along with the IP Interrupt Enable bit, the interrupt is enabled.
31:1 Reserved Reserved
Table 5. IP Interrupt Enable (0x8)
Bit Name Description
0 Interrupt Enable When asserted, along with the Global Interrupt Enable bit, the interrupt is enabled.
31:1 Reserved Reserved
Table 6. IP Interrupt Status (0xC)
Bit Name Description
0 Interrupt Status Toggle on write.
31:1 Reserved Reserved

Interrupt

RTL kernels can optionally have an interrupt port containing a single interrupt. The port name must be called interrupt and be active-High. It is enabled when both the global interrupt enable (GIE) and interrupt enable register (IER) bits are asserted in the Control Register block.

By default, the IER uses the internal ap_done signal to trigger an interrupt. Further, the interrupt is cleared only when writing a one to bit-0 of the IP Interrupt Status Register.

This logic should be reflected in the Verilog code for the RTL kernel, and also in the associated component.xml and kernel.xml files. The kernel.xml file is stored inside the kernel.xo file and is generated automatically when using the package_xo command or RTL Kernel Wizard.

RTL Kernel Development Flow

This section explains the two-step process for creating RTL kernels for the Vitis core development kit, which includes:

  1. Package the RTL block as a standard Vivado IP.
  2. Package the RTL kernel into a Xilinx Object (XO) file.

The packaged XO file is a container encapsulating the Vivado IP object (including source files) and associated kernel XML file. Using the Vitis compiler, the XO file can be combined with other kernels, and linked with the target platform and built for hardware or hardware emulation flows.

IMPORTANT: An RTL kernel will not support software emulation unless you provide a C-model for the kernel as explained in Creating the XO File from the RTL Kernel.

Package the RTL Code as a Vivado IP

RTL kernels must be packaged as a Vivado IP that can be used with the IP integrator. For details on IP packaging in the Vivado tool, see the Vivado Design Suite User Guide: Creating and Packaging Custom IP (UG1118).

The following required interfaces for the RTL kernel must be packaged:

  • The AXI4-Lite interface name must be packaged as S_AXI_CONTROL, but the underlying AXI ports can be named differently.
  • Any memory-mapped AXI4 interfaces must be packaged as AXI4 master endpoints with 64-bit address support.
    Note: Xilinx strongly recommends that AXI4 interfaces be packaged with AXI meta data HAS_BURST=0 and SUPPORTS_NARROW_BURST=0. These properties can be set in an IP-level bd.tcl file. This indicates wrap and fixed burst type is not used, and narrow (sub-size burst) is not used.
  • You can also implement the AXI4-Stream interface.
  • ap_clk and ap_clk_2 must be packaged as clock interfaces (ap_clk_2 is only required when the RTL kernel has two clocks).
  • ap_rst_n and ap_rst_n_2 must be packaged as active-Low reset interfaces (when the RTL kernel has a reset).
  • ap_clk must be associated with all AXI4-Lite, AXI4, AXI4-Stream interfaces, and also any reset signals, ap_rst_n, on the kernel.

To package the IP, use the following steps:

  1. Create and package a new IP.
    1. From a Vivado project, with your RTL source files added, select Tools > Create and Package New IP.
    2. Select Package your current project, and click Next.

      You can select the default location for your IP, or choose a different location.

    3. To open the Package IP window, select Finish.
  2. Associate the clock to the AXI interfaces.

    In the Ports and Interfaces section of the Package IP window, you can associate the ap_clk with the AXI4 interfaces, and reset signal if needed.

    1. Right-click an interface, and select Associate Clocks.

      This opens the Associate Clocks dialog box which lists the ap_clk, and perhaps ap_clk_2.

    2. Select the ap_clk and click OK to associate it with the interface.
    3. Make sure to repeat this step to associate ap_clk with each of the AXI interfaces, and the reset.
  3. Add the control registers and offsets.

    The kernel requires control registers as discussed in Kernel Controls. The following table shows a list of the required registers.

    Table 7. Address Map
    Register Name Description Address Offset Size
    CTRL Control Signals.
    IMPORTANT: The CTRL register and <kernel_args> are required on all kernels. The interrupt related registers are only required for designs with interrupts.
    0x000 32
    GIER Global Interrupt Enable Register. Used to enable interrupt to the host. 0x004 32
    IP_IER IP Interrupt Enable Register. Used to control which IP generated signal are used to generate an interrupt. 0x008 32
    IP_ISR IP Interrupt Status Register. Provides interrupt status. 0x00C 32
    <kernel_args> This includes a separate entry for each kernel argument as needed on the software function interface. All user-defined registers must begin at location 0x10; locations below this are reserved. 0x010 32/64

    Scalar arguments are 32-bits wide.m_axi and axis interfaces are 64 bits wide.

    1. To create the address map described in the table, select the Addressing and Memory section of the Package IP window. Right-click in the Address Blocks and select the Add Register command.

      This opens the Add Register dialog box in which you can enter one of the register names from the table above.

    2. Repeat as needed to add all required registers.
      This creates a Registers table in the Addressing and Memory section. You can edit the table to add the Description, Address Offset, and Size to each register. The Registers table should look similar to the following example.

    3. Finally, select the register for each of the pointer arguments from your table, right-click and select the Add Register Parameter command. Enter the name ASSOCIATED_BUSIF into the dialog box that opens, and click OK.

      This lets you define an association between the register and the AXI4 Interface. In the value field of the added parameter, enter the name of the m_axi interface assigned to the specific argument you are defining. In the example above, the argument A uses the m00_axi interface, and the argument B uses the m01_axi interface.

  4. Add required properties to the IP:
    The IP requires a few standard properties that you can add to your core. The easiest way to do this is by using the following commands from the Vivado Tcl Console.
    
    set_property sdx_kernel true [ipx::current_core]
    set_property sdx_kernel_type rtl [ipx::current_core]
    
  5. At this point you should be ready to package your IP.
    1. Select the Review and Package section of the Package IP window, review the Summary and After Packaging sections, and make whatever changes are needed.
      IMPORTANT: You must enable the generation of an IP archive file. If the After Packaging section indicates An archive will not be generated., you must select the Edit packaging settings link and enable the Create archive of IP setting.
    2. When you are ready, click Package IP.

      The Vivado tool packages your kernel IP and opens a dialog box to inform you of success. You can go on to package the kernel using the package_xo command, as described in Creating the XO File from the RTL Kernel.

  6. To test if the RTL kernel is packaged correctly for the IP integrator, try to instantiate the packaged kernel IP into a block design in the IP integrator. For information on the tool, refer to Vivado Design Suite User Guide: Designing IP Subsystems Using IP Integrator (UG994).
  7. The kernel IP should show the various interfaces described above. Examine the IP in the canvas view. The properties of the AXI interface can be viewed by selecting the interface on the canvas. Then in the Block Interface Properties window, select the Properties tab and expand the CONFIG table entry. If an interface is to be read-only or write-only, the unused AXI channels can be removed and the READ_WRITE_MODE is set to read-only or write-only.
  8. If the RTL kernel has constraints which refer to constraints in the static area such as clocks, then the RTL kernel constraint file needs to be marked as late processing order to ensure RTL kernel constraints are correctly applied.

    There are two methods to mark constraints as late processing order:

    1. If the constraints are given in a .ttcl file, add <: setFileProcessingOrder "late" :> to the .ttcl preamble section of the file as follows:
      <: set ComponentName [getComponentNameString] :>
      <: setOutputDirectory "./" :>
      <: setFileName $ComponentName :>
      <: setFileExtension ".xdc" :>
      <: setFileProcessingOrder "late" :>
      
    2. If constraints are defined in an .xdc file, then add the following four lines starting at <spirit:define> in the component.xml. The four lines in the component.xml need to be next to the area where the .xdc file is called. In the following example, my_ip_constraint.xdc file is being called with the subsequent late processing order defined.
      <spirit:file>
              <spirit:name>ttcl/my_ip_constraint.xdc</spirit:name>
              <spirit:userFileType>ttcl</spirit:userFileType>
              <spirit:userFileType>USED_IN_implementation</spirit:userFileType>
              <spirit:userFileType>USED_IN_synthesis</spirit:userFileType>
              <spirit:define>
                   <spirit:name>processing_order</spirit:name>
                   <spirit:value>late</spirit:value>
              </spirit:define>
      </spirit:file>

Creating the XO File from the RTL Kernel

The final step is to package the RTL IP into a Xilinx object (XO) file, so the kernel can be used in the Vitis core development kit. This is done using the package_xo Tcl command in the Vivado Design Suite.

After packaging the IP, the package_xo command is run from within the Vivado tool. The package_xo command uses the component.xml file from the IP to create the necessary kernel.xml if possible. The Vivado tool runs design rule checks as a pre-processor for package_xo to determine that everything is available and either processes the IP to create the XO file, or returns errors indicating any issues that might exist.

The following example packages an RTL kernel IP named test_sincos, found in the specified IP directory, into an object file named test.xo, creating the required kernel.xml file, and using the ap_ctrl_chain protocol:

package_xo -xo_path ./test.xo -kernel_name test_sincos \
-ctrl_protocol ap_ctrl_chain -ip_directory ./ip/

The output of the package_xo command is the test.xo file, that can be added as a source file to the v++ --link command as discussed in Building and Running the Application, or added to an application project as discussed in Using the Vitis IDE.

In some cases, you might find it necessary to provide a kernel.xml file for your IP, as specified in the requirements described in RTL Kernel XML File. You can use the -kernel_xml option to specify the file for the package_xo command. In this case, the package_xo command uses the kernel.xml as specified. The following example shows this command.

package_xo -xo_path ./test.xo -kernel_name test_sincos \
-kernel_xml ./src/kernel.xml -ip_directory ./ip/

To use the RTL kernel during software emulation, you must provide a C-model for the kernel. The C-model must have a function prototype that compiles in hardware to the same interface used in your RTL kernel. However, the C-model does not need to be synthesizeable by the HLS tool.

You can use the package_xo -kernel_files option to add a C-model to the packaged RTL kernel:

package_xo -xo_path ./test.xo -kernel_name test_sincos -kernel_xml ./src/kernel.xml \
-ip_directory ./ip/ -kernel_files ./imports/sincos_cmodel.cpp 

The package_xo command packages the C-model files into cpu_sources inside the XO. The following C-model file suffixes are automatically recognized:

  • .cl = OpenCL
  • .c, .cpp, .cxx = C/C++

Design Recommendations for RTL Kernels

While the RTL Kernel Wizard assists in packaging RTL designs for use within the Vitis core development kit, the underlying RTL kernels should be designed with recommendations from the UltraFast Design Methodology Guide for Xilinx FPGAs and SoCs (UG949).

In addition to adhering to the interface and packaging requirements, the kernels should be designed with the following performance goals in mind:

Memory Performance Optimizations for AXI4 Interface

The AXI4 interfaces typically connects to DDR memory controllers in the platform.

Note: For optimal frequency and resource usage, it is recommended that one interface is used per memory controller.

For best performance from the memory controller, the following is the recommended AXI interface behavior:

  • Use an AXI data width that matches the native memory controller AXI data width, typically 512-bits.
  • Do not use WRAP, FIXED, or sub-sized bursts.
  • Use burst transfer as large as possible (up to 4k byte AXI4 protocol limit).
  • Avoid use of deasserted write strobes. Deasserted write strobes can cause error-correction code (ECC) logic in the DDR memory controller to perform read-modify-write operations.
  • Use pipelined AXI transactions.
  • Avoid using threads if an AXI interface is only connected to one DDR controller.
  • Avoid generating write address commands if the kernel does not have the ability to deliver the full write transaction (non-blocking write requests).
  • Avoid generating read address commands if the kernel does not have the capacity to accept all the read data without back pressure (non-blocking read requests).
  • If a read-only or write-only interfaces are desired, the ports of the unused channels can be commented out in the top level RTL file before the project is packaged into a kernel.
  • Using multiple threads can cause larger resource requirements in the infrastructure IP between the kernel and the memory controllers.

Managing Clocks in an RTL Kernel

An RTL kernel can have up to two external clock interfaces; a primary clock, ap_clk, and an optional secondary clock, ap_clk_2. Both clocks can be used for clocking internal logic. However, all external RTL kernel interfaces must be clocked on the primary clock. Both primary and secondary clocks support independent automatic frequency scaling.

Note: When targeting an embedded processor platform or an Alveo accelerator card from the 2020.2 release, you can use the techniques described in Managing Clock Frequencies to map any number of kernel clocks to clock frequencies from the hardware platform.

If you require additional clocks within the RTL kernel, a frequency synthesizer such as the Clocking Wizard IP or MMCM/PLL primitive can be instantiated within the RTL kernel. Therefore, your RTL kernel can use just the primary clock, both primary and secondary clock, or primary and secondary clock along with an internal frequency synthesizer. The following shows the advantages and disadvantages of using these three RTL kernel clocking methods:

  • Single input clock: ap_clk
    • External interfaces and internal kernel logic run at the same frequency.
    • No clock-domain-crossing (CDC) issues.
    • Frequency of ap_clk can automatically be scaled to allow kernel to meet timing.
  • Two input clocks: ap_clk and ap_clk_2
    • Kernel logic can run at either clock frequency.
    • Need proper CDC technique in the RTL kernel to move from one frequency to another.
    • Both ap_clk and ap_clk_2 can automatically scale their frequencies independently to allow the kernel to meet timing.
  • Using a frequency synthesizer inside the kernel:
    • Additional device resources required to generate clocks.
    • Must have ap_clk and optionally ap_clk_2 interfaces.
    • Generated clocks can have different frequencies for different CUs.
    • Kernel logic can run at any available clock frequency.
    • Need proper CDC technique to move from one frequency to another.

When using a frequency synthesizer in the RTL kernel there are some constraints you should be aware of:

  1. RTL external interfaces are clocked at ap_clk.
  2. The frequency synthesizer can have multiple output clocks that are used as internal clocks to the RTL kernel.
  3. You must provide a Tcl script to downgrade DRCs related to clock resource placement in Vivado placement to prevent a DRC error from occurring. Refer to CLOCK_DEDICATED_ROUTE in the Vivado Design Suite Properties Reference Guide (UG912) for more information. The following is an example of the needed Tcl command that you will add to your Tcl script:
    set_property CLOCK_DEDICATED_ROUTE ANY_CMT_COLUMN 
    [get_nets pfm_top_i/static_region/base_clocking/clkwiz_kernel/inst/CLK_CORE_DRP_I/clk_inst/clk_out1
    Note: This constraint should be edited to reflect the clock structure of your target platform.
  4. Specify the Tcl script from step 3 for use by Vivado implementation, after optimization, by using the v++ --vivado.prop option as described in --vivado Options. The following option specifies a Tcl script for use by Vivado implementation, after completing the optimization step:
    --vivado.prop:run.impl_1.STEPS.OPT_DESIGN.TCL.POST={<PATH>/<Script_Name>.tcl}
  5. Specify the two global clock input frequencies which can be used by the kernels (RTL or HLS-based). Use the v++ --kernel_frequency option to ensure the kernel input clock frequency is as expected. For example to specify one clock use:
    v++ --kernel_frequency 250
    For two clocks, you can specify multiple frequencies based on the clock ID. The primary clock has clock ID 0 and the secondary has clock ID 1.
    v++ --kernel_frequency 0:250|1:500
    TIP: Ensure that the PLL or MMCM output clock is locked before RTL kernel operations. Use the locked signal in the RTL kernel to ensure the clock is operating correctly.
After adding the frequency synthesizer to an RTL kernel, the generated clocks are not automatically scalable. Ensure the RTL kernel passes timing requirements, or v++ will return an error like the following:
ERROR: [VPL-1] design did not meet timing - Design did not meet timing. One 
or more unscalable system clocks did not meet their required target 
frequency. Please try specifying a clock frequency lower than 300 MHz using 
the '--kernel_frequency' switch for the next compilation. For all system 
clocks, this design is using 0 nanoseconds as the threshold worst negative 
slack (WNS) value. List of system clocks with timing failure.

In this case you will need to change the internal clock frequency, or optimize the kernel logic to meet timing.

Quality of Results Considerations

The following recommendations help improve results for timing and area:

  • Pipeline all reset inputs and internally distribute resets avoiding high fanout nets.
  • Reset only essential control logic flip-flops.
  • Consider registering input and output signals to the extent possible.
  • Understand the size of the kernel relative to the capacity of the target platforms to ensure fit, especially if multiple kernels will be instantiated.
  • Recognize platforms that use stacked silicon interconnect (SSI) technology. These devices have multiple die and any logic that must cross between them should be flip-flop to flip-flop timing paths.

Debug and Verification Considerations

  • RTL kernels should be verified in their own test bench using advanced verification techniques including verification components, randomization, and protocol checkers. The AXI Verification IP (VIP) is available in the Vivado IP catalog and can help with the verification of AXI interfaces. The RTL kernel example designs contain an AXI VIP-based test bench with sample stimulus files.
  • Hardware emulation should be used to test the host code software integration or to view the interaction between multiple kernels.