RTL Kernels
In the Vitis application acceleration development flow, C++ source code can be compiled into Xilinx® object (XO) files that can be linked with a target platform into an FPGA executable (XCLBIN). RTL IP from the Vivado® Design Suite can also be packaged as XO files that can be linked into an XCLBIN, as long as they adhere to Vivado IP Packaging guidelines, and requirements of the Vitis compiler. Those requirements are described here.
Requirements of an RTL Kernel
An RTL module must meet both interface and software requirements to be used as an RTL kernel within the Vitis tools. For more information on kernel properties, see Kernel Properties.
It might be necessary to revise the RTL module or Vivado IP packaging to meet the kernel requirements outlined in the following sections.
Kernel Interface Requirements
To satisfy the Vitis core development kit execution model, an RTL kernel must adhere to the requirements described in Kernel Properties. The RTL kernel must have at least one clock interface port to supply a clock to the kernel logic. The various interface requirements are summarized in the following table.
Port or Interface | Description | Comment |
---|---|---|
ap_clk | Primary clock input port | |
ap_clk_2 | Secondary optional clock input port | |
ap_rst_n | Primary active-Low reset input port | |
ap_rst_n_2 | Secondary optional active-Low reset input port | |
interrupt | Active-High interrupt | |
s_axi_control | One (and only one) AXI4-Lite slave control interface | Note: The port is generally required, though there are exceptions such as the Free-Running Kernel. |
AXI4_MASTER | One or more AXI4 master interfaces for global memory access | |
AXI4_STREAM | One or more AXI4-Stream interfaces for one-way data transfers between kernels or between the host application and kernels | |
Kernel Controls
The following table outlines the register map required for a kernel to be used within the Vitis tools and XRT. The control register is required by kernels that specify the ap_ctrl_hs and ap_ctrl_chain execution models, while the interrupt-related registers are only required for designs with interrupts. All user-defined registers must begin at location 0x10; locations below this are reserved.
If your RTL design has a different execution model, it must be adapted to ensure that it will operate in this manner.
Offset | Name | Description |
---|---|---|
0x0 | Control | Controls and provides kernel status. |
0x4 | Global Interrupt Enable | Used to enable interrupt to the host. |
0x8 | IP Interrupt Enable | Used to control which IP-generated signals are used to generate an interrupt. |
0xC | IP Interrupt Status | Provides interrupt status. |
0x10 | Kernel arguments | This would include scalars and global memory arguments for instance. |
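The rule that user-defined registers start at 0x10 can be checked mechanically before packaging. The following is a minimal sketch, not part of the Vitis tools: the function name, the register names in the usage note, and the 4-byte alignment check are assumptions made for illustration. The 32/64-bit sizes follow the address map described later in this section.

```python
RESERVED_END = 0x10  # offsets below this hold the control and interrupt registers


def check_user_registers(regs):
    """Return a list of problems found in a user-defined register map.

    regs maps a register name to (offset, size_in_bits).
    """
    problems = []
    spans = []
    for name, (offset, bits) in regs.items():
        if bits not in (32, 64):
            problems.append(f"{name}: size must be 32 or 64 bits")
            continue
        if offset < RESERVED_END:
            problems.append(f"{name}: offset {offset:#x} is in the reserved range")
        if offset % 4 != 0:
            # Assumption: registers sit on 32-bit word boundaries.
            problems.append(f"{name}: offset {offset:#x} is not 32-bit aligned")
        spans.append((offset, offset + bits // 8, name))
    # Sort by start offset and compare neighbours to detect overlaps.
    spans.sort()
    for (_, end_a, name_a), (start_b, _, name_b) in zip(spans, spans[1:]):
        if start_b < end_a:
            problems.append(f"{name_a} overlaps {name_b}")
    return problems
```

For example, a hypothetical map with one scalar at 0x10 and two pointer arguments at 0x18 and 0x20 passes with an empty problem list, while any register placed at 0x0C is reported as being in the reserved range.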
The following table shows the control signals that are accessed through the control register (offset 0x0). The available signals are used by the different control protocols as explained in Supported Kernel Execution Models in the XRT documentation. For the sequential execution mode ap_ctrl_hs, for example, the host typically writes 0x00000001 to the control register at offset 0x0, which sets bit 0 and clears bits 1 and 2, and then polls the register until the ap_done bit reads 1.
Bit | Name | Description |
---|---|---|
0 | ap_start | Asserted when the kernel can start processing data. Cleared on handshake with ap_done being asserted. |
1 | ap_done | Asserted when the kernel has completed operation. Cleared on read. |
2 | ap_idle | Asserted when the kernel is idle. |
3 | ap_ready | Asserted by the kernel when it is ready to accept new data. |
4 | ap_continue | Asserted by XRT to allow the kernel to keep running. |
31:5 | Reserved | Reserved |
Which of these control register signals are used depends on the kernel execution mode (ap_ctrl_hs or ap_ctrl_chain).
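The ap_ctrl_hs sequence described above (write 0x00000001, then poll for ap_done) can be illustrated with a small software model. This is a sketch under stated assumptions: FakeKernel is a hypothetical stand-in for the hardware that reports completion after a fixed number of polls, and the read/write methods are not an XRT API. Only the bit positions and the clear-on-read behavior of ap_done are taken from the table.

```python
# Bit masks for the control register at offset 0x0 (bits 0 to 4 in the table).
AP_START, AP_DONE, AP_IDLE, AP_READY, AP_CONTINUE = (1 << n for n in range(5))
CTRL_OFFSET = 0x0


def decode_ctrl(value):
    """Name the control bits that are set in a control register value."""
    names = ["ap_start", "ap_done", "ap_idle", "ap_ready", "ap_continue"]
    return [name for bit, name in enumerate(names) if value & (1 << bit)]


class FakeKernel:
    """Hypothetical register model: the kernel finishes after a few polls."""

    def __init__(self, polls_until_done=3):
        self.ctrl = AP_IDLE
        self.remaining = 0
        self.polls_until_done = polls_until_done

    def write(self, offset, value):
        if offset == CTRL_OFFSET and value & AP_START:
            self.ctrl = AP_START  # running: ap_start set, ap_idle cleared
            self.remaining = self.polls_until_done

    def read(self, offset):
        if offset != CTRL_OFFSET:
            return 0
        if self.ctrl & AP_START:
            self.remaining -= 1
            if self.remaining <= 0:
                # ap_start is cleared on the handshake with ap_done.
                self.ctrl = AP_DONE | AP_IDLE
        value = self.ctrl
        self.ctrl &= ~AP_DONE  # ap_done is cleared on read
        return value


def run_once(dev, max_polls=1000):
    """Start the kernel and poll until ap_done reads 1. Returns the poll count."""
    dev.write(CTRL_OFFSET, 0x00000001)
    for polls in range(1, max_polls + 1):
        if dev.read(CTRL_OFFSET) & AP_DONE:
            return polls
    raise TimeoutError("kernel did not report ap_done")
```

Because ap_done is cleared on read, the host in this model must act on the first read that shows the bit set; a second read returns only ap_idle.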
The following interrupt related registers are only required if the kernel has an interrupt.
Bit | Name | Description |
---|---|---|
0 | Global Interrupt Enable | When asserted, along with the IP Interrupt Enable bit, the interrupt is enabled. |
31:1 | Reserved | Reserved |
Bit | Name | Description |
---|---|---|
0 | Interrupt Enable | When asserted, along with the Global Interrupt Enable bit, the interrupt is enabled. |
31:1 | Reserved | Reserved |
Bit | Name | Description |
---|---|---|
0 | Interrupt Status | Toggle on write. |
31:1 | Reserved | Reserved |
Interrupt
RTL kernels can optionally have an interrupt port containing a single interrupt. The port must be named interrupt and be active-High. It is enabled when both the global interrupt enable (GIE) and interrupt enable register (IER) bits are asserted in the Control Register block.
By default, the IER uses the internal ap_done signal to trigger an interrupt. Further, the interrupt is cleared only when writing a one to bit 0 of the IP Interrupt Status Register.
This logic should be reflected in the Verilog code for the RTL kernel, and also in the associated component.xml and kernel.xml files. The kernel.xml file is stored inside the kernel.xo file and is generated automatically when using the package_xo command or RTL Kernel Wizard.
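The interrupt behavior described above can be summarized in a short behavioral model. This is a sketch only, not the RTL itself: the class and method names are hypothetical, and the choice to latch the status bit only while the IER bit is set is an assumption about how the enable gates the ap_done source.

```python
class InterruptRegs:
    """Behavioral model of the three interrupt registers (bit 0 only)."""

    def __init__(self):
        self.gier = 0  # offset 0x4: Global Interrupt Enable
        self.ier = 0   # offset 0x8: IP Interrupt Enable (ap_done source)
        self.isr = 0   # offset 0xC: IP Interrupt Status

    def write(self, offset, value):
        if offset == 0x4:
            self.gier = value & 1
        elif offset == 0x8:
            self.ier = value & 1
        elif offset == 0xC:
            # Toggle on write: writing 1 to a set status bit clears it.
            self.isr ^= value & 1

    def on_ap_done(self):
        """Called when the kernel asserts ap_done."""
        if self.ier:  # assumption: status latches only when the source is enabled
            self.isr = 1

    @property
    def interrupt(self):
        """Level of the active-High interrupt port."""
        return bool(self.gier and self.isr)
```

In this model the interrupt stays asserted after ap_done until the host writes a one to bit 0 of the status register at 0xC, which matches the clearing rule stated above.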
RTL Kernel Development Flow
This section explains the two-step process for creating RTL kernels for the Vitis core development kit, which includes:
- Package the RTL block as a standard Vivado IP.
- Package the RTL kernel into a Xilinx Object (XO) file.
The packaged XO file is a container encapsulating the Vivado IP object (including source files) and associated kernel XML file. Using the Vitis compiler, the XO file can be combined with other kernels, and linked with the target platform and built for hardware or hardware emulation flows.
Package the RTL Code as a Vivado IP
RTL kernels must be packaged as a Vivado IP that can be used with the IP integrator. For details on IP packaging in the Vivado tool, see the Vivado Design Suite User Guide: Creating and Packaging Custom IP (UG1118).
The following required interfaces for the RTL kernel must be packaged:
- The AXI4-Lite interface name must be packaged as S_AXI_CONTROL, but the underlying AXI ports can be named differently.
- Any memory-mapped AXI4 interfaces must be packaged as AXI4 master endpoints with 64-bit address support.
Note: Xilinx strongly recommends that AXI4 interfaces be packaged with AXI metadata HAS_BURST=0 and SUPPORTS_NARROW_BURST=0. These properties can be set in an IP-level bd.tcl file. They indicate that wrap and fixed burst types are not used, and that narrow (sub-size) bursts are not used.
- You can also implement the AXI4-Stream interface.
- ap_clk and ap_clk_2 must be packaged as clock interfaces (ap_clk_2 is only required when the RTL kernel has two clocks).
- ap_rst_n and ap_rst_n_2 must be packaged as active-Low reset interfaces (when the RTL kernel has a reset).
- ap_clk must be associated with all AXI4-Lite, AXI4, and AXI4-Stream interfaces, and also with any reset signals, ap_rst_n, on the kernel.
To package the IP, use the following steps:
- Create and package a new IP.
- From a Vivado project, with your RTL source files added, select .
- Select Package your current project, and click Next.
You can select the default location for your IP, or choose a different location.
- To open the Package IP window, select Finish.
- Associate the clock to the AXI interfaces.
In the Ports and Interfaces section of the Package IP window, you can associate ap_clk with the AXI4 interfaces, and with the reset signal if needed.
- Right-click an interface, and select Associate Clocks.
This opens the Associate Clocks dialog box, which lists ap_clk, and perhaps ap_clk_2.
- Select ap_clk and click OK to associate it with the interface.
- Make sure to repeat this step to associate ap_clk with each of the AXI interfaces, and the reset.
- Add the control registers and offsets.
The kernel requires control registers as discussed in Kernel Controls. The following table shows a list of the required registers.
Table 7. Address Map
Register Name | Description | Address Offset | Size |
---|---|---|---|
CTRL | Control Signals. | 0x000 | 32 |
GIER | Global Interrupt Enable Register. Used to enable interrupt to the host. | 0x004 | 32 |
IP_IER | IP Interrupt Enable Register. Used to control which IP-generated signals are used to generate an interrupt. | 0x008 | 32 |
IP_ISR | IP Interrupt Status Register. Provides interrupt status. | 0x00C | 32 |
<kernel_args> | This includes a separate entry for each kernel argument as needed on the software function interface. All user-defined registers must begin at location 0x10; locations below this are reserved. | 0x010 | 32/64 |
IMPORTANT: The CTRL register and <kernel_args> are required on all kernels. The interrupt-related registers are only required for designs with interrupts.
Scalar arguments are 32 bits wide. m_axi and axis interfaces are 64 bits wide.
- To create the address map described in the table, select the Addressing and Memory section of the Package IP window. Right-click in the Address Blocks and select the Add Register command.
This opens the Add Register dialog box in which you can enter one of the register names from the table above.
- Repeat as needed to add all required registers.
This creates a Registers table in the Addressing and Memory section. You can edit the table to add the Description, Address Offset, and Size to each register. The Registers table should look similar to the following example.
- Finally, select the register for each of the pointer arguments from your table, right-click and select the Add Register Parameter command. Enter the name ASSOCIATED_BUSIF into the dialog box that opens, and click OK.
This lets you define an association between the register and the AXI4 interface. In the value field of the added parameter, enter the name of the m_axi interface assigned to the specific argument you are defining. In the example above, the argument A uses the m00_axi interface, and the argument B uses the m01_axi interface.
- Add required properties to the IP:
The IP requires a few standard properties that you can add to your core. The easiest way to do this is by using the following commands from the Vivado Tcl Console.
set_property sdx_kernel true [ipx::current_core]
set_property sdx_kernel_type rtl [ipx::current_core]
- At this point you should be ready to package your IP.
- Select the Review and Package section of the Package IP window, review the Summary and After Packaging sections, and make whatever changes are needed.
IMPORTANT: You must enable the generation of an IP archive file. If the After Packaging section indicates An archive will not be generated., you must select the Edit packaging settings link and enable the Create archive of IP setting.
- When you are ready, click Package IP.
The Vivado tool packages your kernel IP and opens a dialog box to inform you of success. You can go on to package the kernel using the package_xo command, as described in Creating the XO File from the RTL Kernel.
- To test if the RTL kernel is packaged correctly for the IP integrator, try to instantiate the packaged kernel IP into a block design in the IP integrator. For information on the tool, refer to Vivado Design Suite User Guide: Designing IP Subsystems Using IP Integrator (UG994).
- The kernel IP should show the various interfaces described above. Examine the IP in the canvas view. The properties of the AXI interface can be viewed by selecting the interface on the canvas. Then in the Block Interface Properties window, select the Properties tab and expand the CONFIG table entry. If an interface is to be read-only or write-only, the unused AXI channels can be removed and READ_WRITE_MODE set to read-only or write-only.
- If the RTL kernel has constraints which refer to constraints in the static area such as clocks, then the RTL kernel constraint file needs to be marked as late processing order to ensure RTL kernel constraints are correctly applied.
There are two methods to mark constraints as late processing order:
- If the constraints are given in a .ttcl file, add <: setFileProcessingOrder "late" :> to the .ttcl preamble section of the file as follows:
<: set ComponentName [getComponentNameString] :>
<: setOutputDirectory "./" :>
<: setFileName $ComponentName :>
<: setFileExtension ".xdc" :>
<: setFileProcessingOrder "late" :>
- If constraints are defined in an .xdc file, then add the following four lines starting at <spirit:define> in the component.xml. The four lines in the component.xml need to be next to the area where the .xdc file is called. In the following example, the my_ip_constraint.xdc file is being called with the subsequent late processing order defined.
<spirit:file>
  <spirit:name>ttcl/my_ip_constraint.xdc</spirit:name>
  <spirit:userFileType>ttcl</spirit:userFileType>
  <spirit:userFileType>USED_IN_implementation</spirit:userFileType>
  <spirit:userFileType>USED_IN_synthesis</spirit:userFileType>
  <spirit:define>
    <spirit:name>processing_order</spirit:name>
    <spirit:value>late</spirit:value>
  </spirit:define>
</spirit:file>
Creating the XO File from the RTL Kernel
The final step is to package the RTL IP into a Xilinx object (XO) file, so the kernel can be used in the Vitis core development kit. This is done using the package_xo Tcl command in the Vivado Design Suite.
After packaging the IP, the package_xo command is run from within the Vivado tool. The package_xo command uses the component.xml file from the IP to create the necessary kernel.xml if possible. The Vivado tool runs design rule checks as a pre-processor for package_xo to determine that everything is available, and either processes the IP to create the XO file or returns errors indicating any issues that might exist.
The following example packages an RTL kernel IP named test_sincos, found in the specified IP directory, into an object file named test.xo, creating the required kernel.xml file, and using the ap_ctrl_chain protocol:
package_xo -xo_path ./test.xo -kernel_name test_sincos \
-ctrl_protocol ap_ctrl_chain -ip_directory ./ip/
The output of the package_xo command is the test.xo file, which can be added as a source file to the v++ --link command as discussed in Building and Running the Application, or added to an application project as discussed in Using the Vitis IDE.
In some cases, you might find it necessary to provide a kernel.xml file for your IP, as specified in the requirements described in RTL Kernel XML File. You can use the -kernel_xml option to specify the file for the package_xo command. In this case, the package_xo command uses the kernel.xml as specified. The following example shows this command.
package_xo -xo_path ./test.xo -kernel_name test_sincos \
-kernel_xml ./src/kernel.xml -ip_directory ./ip/
To use the RTL kernel during software emulation, you must provide a C-model for the kernel. The C-model must have a function prototype that compiles in hardware to the same interface used in your RTL kernel. However, the C-model does not need to be synthesizable by the HLS tool.
You can use the package_xo -kernel_files option to add a C-model to the packaged RTL kernel:
package_xo -xo_path ./test.xo -kernel_name test_sincos -kernel_xml ./src/kernel.xml \
-ip_directory ./ip/ -kernel_files ./imports/sincos_cmodel.cpp
The package_xo command packages the C-model files into cpu_sources inside the XO. The following C-model file suffixes are automatically recognized:
- .cl = OpenCL
- .c, .cpp, .cxx = C/C++
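When package_xo is driven from a build script, the command line can be assembled programmatically. The following is a minimal sketch: the helper name and its parameters are hypothetical, and it only uses the options shown in the examples above. It also checks a C-model against the recognized suffixes before adding it.

```python
import os

# Suffixes listed above as automatically recognized for C-model files.
RECOGNIZED_SUFFIXES = {".cl", ".c", ".cpp", ".cxx"}


def package_xo_command(xo_path, kernel_name, ip_directory,
                       ctrl_protocol=None, kernel_xml=None, kernel_file=None):
    """Build a package_xo command string from the options used in this section."""
    parts = ["package_xo", "-xo_path", xo_path, "-kernel_name", kernel_name]
    if ctrl_protocol:
        parts += ["-ctrl_protocol", ctrl_protocol]
    if kernel_xml:
        parts += ["-kernel_xml", kernel_xml]
    parts += ["-ip_directory", ip_directory]
    if kernel_file:
        suffix = os.path.splitext(kernel_file)[1]
        if suffix not in RECOGNIZED_SUFFIXES:
            raise ValueError(f"unrecognized C-model suffix: {suffix!r}")
        parts += ["-kernel_files", kernel_file]
    return " ".join(parts)


if __name__ == "__main__":
    # Reproduces the first example in this section as a single line.
    print(package_xo_command("./test.xo", "test_sincos", "./ip/",
                             ctrl_protocol="ap_ctrl_chain"))
```

The resulting string is intended to be written into a Tcl script that Vivado runs; the sketch does not invoke Vivado itself.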
Design Recommendations for RTL Kernels
While the RTL Kernel Wizard assists in packaging RTL designs for use within the Vitis core development kit, the underlying RTL kernels should be designed with recommendations from the UltraFast Design Methodology Guide for Xilinx FPGAs and SoCs (UG949).
In addition to adhering to the interface and packaging requirements, the kernels should be designed with the following performance goals in mind:
Memory Performance Optimizations for AXI4 Interface
The AXI4 interfaces typically connect to DDR memory controllers in the platform.
For best performance from the memory controller, the following is the recommended AXI interface behavior:
- Use an AXI data width that matches the native memory controller AXI data width, typically 512 bits.
- Do not use WRAP, FIXED, or sub-sized bursts.
- Use burst transfers as large as possible (up to the 4 KB AXI4 protocol limit).
- Avoid use of deasserted write strobes. Deasserted write strobes can cause error-correction code (ECC) logic in the DDR memory controller to perform read-modify-write operations.
- Use pipelined AXI transactions.
- Avoid using threads if an AXI interface is only connected to one DDR controller.
- Avoid generating write address commands if the kernel does not have the ability to deliver the full write transaction (non-blocking write requests).
- Avoid generating read address commands if the kernel does not have the capacity to accept all the read data without back pressure (non-blocking read requests).
- If read-only or write-only interfaces are desired, the ports of the unused channels can be commented out in the top-level RTL file before the project is packaged into a kernel.
- Using multiple threads can cause larger resource requirements in the infrastructure IP between the kernel and the memory controllers.
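The burst recommendations above (full-width beats, incrementing bursts only, as large as possible within the 4 KB limit) can be illustrated by a small planning function. This is a sketch under stated assumptions: the function name is hypothetical, the 64-byte beat corresponds to a 512-bit data width, and 256 beats is the AXI4 maximum length for incrementing bursts. A real kernel would implement the equivalent logic in RTL.

```python
def split_into_bursts(addr, nbytes, beat_bytes=64, max_beats=256):
    """Split a transfer into INCR bursts that never cross a 4 KB boundary.

    Returns a list of (start_address, number_of_beats) tuples.
    """
    if addr % beat_bytes or nbytes % beat_bytes:
        # Keeping transfers beat-aligned avoids narrow bursts and
        # deasserted write strobes.
        raise ValueError("address and length must be multiples of the beat size")
    end = addr + nbytes
    bursts = []
    while addr < end:
        boundary = (addr // 4096 + 1) * 4096       # next 4 KB boundary
        chunk = min(end, boundary) - addr          # stop at boundary or end
        chunk = min(chunk, max_beats * beat_bytes) # respect max burst length
        bursts.append((addr, chunk // beat_bytes))
        addr += chunk
    return bursts
```

With 64-byte beats, the 4 KB limit is reached at 64 beats, so an aligned 8 KB transfer starting at address 0 becomes two 64-beat bursts.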
Managing Clocks in an RTL Kernel
An RTL kernel can have up to two external clock interfaces: a primary clock, ap_clk, and an optional secondary clock, ap_clk_2. Both clocks can be used for clocking internal logic. However, all external RTL kernel interfaces must be clocked on the primary clock. Both primary and secondary clocks support independent automatic frequency scaling.
If you require additional clocks within the RTL kernel, a frequency synthesizer such as the Clocking Wizard IP or MMCM/PLL primitive can be instantiated within the RTL kernel. Therefore, your RTL kernel can use just the primary clock, both primary and secondary clock, or primary and secondary clock along with an internal frequency synthesizer. The following shows the advantages and disadvantages of using these three RTL kernel clocking methods:
- Single input clock: ap_clk
- External interfaces and internal kernel logic run at the same frequency.
- No clock-domain-crossing (CDC) issues.
- Frequency of ap_clk can automatically be scaled to allow kernel to meet timing.
- Two input clocks: ap_clk and ap_clk_2
- Kernel logic can run at either clock frequency.
- Need proper CDC technique in the RTL kernel to move from one frequency to another.
- Both ap_clk and ap_clk_2 can automatically scale their frequencies independently to allow the kernel to meet timing.
- Using a frequency synthesizer inside the kernel:
- Additional device resources required to generate clocks.
- Must have ap_clk and optionally ap_clk_2 interfaces.
- Generated clocks can have different frequencies for different CUs.
- Kernel logic can run at any available clock frequency.
- Need proper CDC technique to move from one frequency to another.
When using a frequency synthesizer in the RTL kernel there are some constraints you should be aware of:
- RTL external interfaces are clocked at ap_clk.
- The frequency synthesizer can have multiple output clocks that are used as internal clocks to the RTL kernel.
- You must provide a Tcl script to downgrade DRCs related to clock resource placement during Vivado placement, to prevent a DRC error from occurring. Refer to CLOCK_DEDICATED_ROUTE in the Vivado Design Suite Properties Reference Guide (UG912) for more information. The following is an example of the needed Tcl command that you will add to your Tcl script:
set_property CLOCK_DEDICATED_ROUTE ANY_CMT_COLUMN [get_nets pfm_top_i/static_region/base_clocking/clkwiz_kernel/inst/CLK_CORE_DRP_I/clk_inst/clk_out1]
Note: This constraint should be edited to reflect the clock structure of your target platform.
- Specify the Tcl script from the previous step for use by Vivado implementation, after optimization, by using the v++ --vivado.prop option as described in --vivado Options. The following option specifies a Tcl script for use by Vivado implementation, after completing the optimization step:
--vivado.prop:run.impl_1.STEPS.OPT_DESIGN.TCL.POST={<PATH>/<Script_Name>.tcl}
- Specify the two global clock input frequencies which can be used by the kernels (RTL or HLS-based). Use the v++ --kernel_frequency option to ensure the kernel input clock frequency is as expected. For example, to specify one clock use:
v++ --kernel_frequency 250
For two clocks, you can specify multiple frequencies based on the clock ID. The primary clock has clock ID 0 and the secondary has clock ID 1.
v++ --kernel_frequency 0:250|1:500
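The two forms of the --kernel_frequency value can be generated from a single helper in a build script. This is a minimal sketch: the function name is hypothetical, and it only reproduces the two formats shown above (a single frequency, or clock ID and frequency pairs joined with a vertical bar).

```python
def kernel_frequency_arg(freqs):
    """Format a --kernel_frequency option.

    freqs is either one frequency in MHz, or a dict mapping
    clock ID (0 = primary, 1 = secondary) to a frequency in MHz.
    """
    if isinstance(freqs, int):
        return f"--kernel_frequency {freqs}"
    for clock_id in freqs:
        if clock_id not in (0, 1):
            raise ValueError(f"unknown clock ID: {clock_id}")
    pairs = "|".join(f"{cid}:{mhz}" for cid, mhz in sorted(freqs.items()))
    return f"--kernel_frequency {pairs}"
```

Note that the vertical bar is a pipe character in most shells, so the two-clock value generally needs quoting when the command is typed directly on a command line.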
TIP: Ensure that the PLL or MMCM output clock is locked before RTL kernel operations. Use the locked signal in the RTL kernel to ensure the clock is operating correctly.
If the design does not meet timing, v++ will return an error like the following:
ERROR: [VPL-1] design did not meet timing - Design did not meet timing. One
or more unscalable system clocks did not meet their required target
frequency. Please try specifying a clock frequency lower than 300 MHz using
the '--kernel_frequency' switch for the next compilation. For all system
clocks, this design is using 0 nanoseconds as the threshold worst negative
slack (WNS) value. List of system clocks with timing failure.
In this case you will need to change the internal clock frequency, or optimize the kernel logic to meet timing.
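When choosing a lower frequency after a timing failure, a first-order estimate can be derived from the target frequency and the reported worst negative slack (WNS). The following sketch uses the standard relation between clock period and slack; the function name is hypothetical, and the result is only a starting point because placement and routing change when the design is rebuilt.

```python
def estimate_fmax_mhz(target_mhz, wns_ns):
    """Estimate the achievable clock frequency from the target and WNS.

    A negative wns_ns means the critical path is longer than the period.
    """
    period_ns = 1000.0 / target_mhz
    critical_path_ns = period_ns - wns_ns
    return 1000.0 / critical_path_ns
```

For example, a 300 MHz target with a WNS of -0.5 ns gives an estimate of roughly 261 MHz.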
Quality of Results Considerations
The following recommendations help improve results for timing and area:
- Pipeline all reset inputs and internally distribute resets avoiding high fanout nets.
- Reset only essential control logic flip-flops.
- Consider registering input and output signals to the extent possible.
- Understand the size of the kernel relative to the capacity of the target platforms to ensure fit, especially if multiple kernels will be instantiated.
- Recognize platforms that use stacked silicon interconnect (SSI) technology. These devices have multiple die and any logic that must cross between them should be flip-flop to flip-flop timing paths.
Debug and Verification Considerations
- RTL kernels should be verified in their own test bench using advanced verification techniques including verification components, randomization, and protocol checkers. The AXI Verification IP (VIP) is available in the Vivado IP catalog and can help with the verification of AXI interfaces. The RTL kernel example designs contain an AXI VIP-based test bench with sample stimulus files.
- Hardware emulation should be used to test the host code software integration or to view the interaction between multiple kernels.