Defining Interfaces

Introduction to Interface Synthesis

The arguments of the top-level function in a Vitis HLS design are synthesized into interfaces and ports that group multiple signals to define the communication protocol between the HLS design and components external to the design. Vitis HLS defines interfaces automatically, using industry standards to specify the protocol used. The type of interface that Vitis HLS creates depends on the data type and direction of the top-level function arguments, the target flow for the active solution, the default interface configuration settings specified by config_interface, and any INTERFACE pragmas or directives you specify.

TIP: Interfaces can be manually assigned using the INTERFACE pragma or directive. Refer to Adding Pragmas and Directives for more information.
The target flows supported by Vitis HLS, as described in Vitis HLS Process Overview, include:
  • The Vivado IP flow, which is the default flow for the tool
  • The Vitis Kernel flow, which is the bottom-up design flow for the Vitis Application Acceleration Development flow
You can specify the target flow when creating a project solution, as described in Creating a New Vitis HLS Project, or by using the following command:
open_solution -flow_target [vitis | vivado]
The interface defines three elements of the kernel:
  1. The interface defines channels for data to flow into or out of the HLS design. Data can flow from a variety of sources external to the kernel or IP, such as a host application, an external camera or sensor, or from another kernel or IP implemented on the Xilinx device. The default channels for Vitis kernels are AXI adapters as described in Interfaces for Vitis Kernel Flow.
  2. The interface defines the port protocol that is used to control the flow of data through the data channel, defining when the data is valid and can be read or can be written, as defined in Port-Level I/O Protocols.
    TIP: These port protocols can be customized in the Vivado IP flow, but in most cases they are fixed and cannot be changed in the Vitis kernel flow.
  3. The interface also defines the execution control scheme for the HLS design, specifying the operation of the kernel or IP as pipelined or sequential, as defined in Block-Level Control Protocols.

As described in Designing Efficient Kernels, the choice and configuration of interfaces is key to the success of your design. However, Vitis HLS tries to simplify the process by selecting default interfaces for each target flow. For more information on the defaults used, refer to Interfaces for Vivado IP Flow or Interfaces for Vitis Kernel Flow, as appropriate to your design.

After synthesis completes, you can review the mapping of the software arguments of your C/C++ code to hardware ports or interfaces in the SW I/O Information section of the Synthesis Summary report.

Interfaces for Vitis Kernel Flow

The Vitis kernel flow provides support for compiled kernel objects (.xo) that are controlled by software from a host application through the Xilinx Run Time (XRT). As described in Kernel Properties in the Vitis Unified Software Platform Documentation (UG1416), this flow has very specific interface requirements that Vitis HLS must meet.

Vitis HLS supports memory, stream, and register interface paradigms, where each paradigm follows a certain interface protocol and uses an adapter to communicate with the external world.
  • Memory Paradigm (m_axi): the data is accessed by the kernel through memory such as DDR, HBM, PLRAM/BRAM/URAM.

  • Stream Paradigm (axis): the data is streamed into the kernel from another streaming source, such as video processor or another kernel, and can also be streamed out of the kernel.

  • Register Paradigm (s_axilite): the data is accessed by the kernel through register interfaces, using software register reads and writes.

The Vitis kernel flow implements the following interfaces by default:

C-Argument Type          Paradigm   Interface Protocol (I/O/Inout)
Scalar (pass by value)   Register   AXI4-Lite (s_axilite)
Array                    Memory     AXI4 Memory Mapped (m_axi)
Pointer to array         Memory     AXI4 Memory Mapped (m_axi)
Pointer to scalar        Register   AXI4-Lite (s_axilite)
Reference                Register   AXI4-Lite (s_axilite)
hls::stream              Stream     AXI4-Stream (axis)
IMPORTANT: A pointer to an array is implemented as an m_axi interface for data transfer. A pointer to a scalar is implemented using the s_axilite interface. A scalar value passed as a constant does not need read access, while a pointer to a scalar value needs both read and write access. The s_axilite interface implements an additional internal protocol depending on the C argument type. This internal implementation can be controlled using Port-Level I/O Protocols. However, you should not modify the default port protocols in the Vitis kernel flow unless necessary.

The default execution mode for Vitis kernel flow is pipelined execution, which enables overlapping execution of a kernel to improve throughput. This is specified by the ap_ctrl_chain block control protocol on the s_axilite interface.

TIP: The Vitis environment supports kernels with all of the supported block control protocols as described in Block-Level Control Protocols.

The vadd function in the following code provides an example of interface synthesis.

#define VDATA_SIZE 16

typedef struct v_datatype { unsigned int data[VDATA_SIZE]; } v_dt;

extern "C" {
void vadd(const v_dt* in1, // Read-Only Vector 1
          const v_dt* in2, // Read-Only Vector 2
          v_dt* out_r, // Output Result for Addition
          const unsigned int size // Number of integers in each vector
) {

   // Number of v_dt blocks needed to hold size integers, rounded up
   // (also correct when size is 0)
   unsigned int vSize = (size + VDATA_SIZE - 1) / VDATA_SIZE;

   // Vitis HLS automatically pipelines this loop
   vadd1:
   for (unsigned int i = 0; i < vSize; i++) {
      vadd2:
      for (int k = 0; k < VDATA_SIZE; k++) {
         out_r[i].data[k] = in1[i].data[k] + in2[i].data[k];
      }
   }
}
}

The vadd function includes:

  • Two pointer inputs: in1 and in2
  • A pointer, out_r, to which the results are written
  • A scalar value, size

With the default interface synthesis settings used by Vitis HLS for the Vitis kernel flow, the design is synthesized into an RTL block with the ports and interfaces shown in the following figure.

Figure 1: RTL Ports After Default Interface Synthesis

The tool creates three types of interface ports on the RTL design to handle the flow of both data and control.

  • Clock, Reset, and Interrupt ports: ap_clk, ap_rst_n, and interrupt are added to the kernel.
  • AXI4-Lite interface: the s_axi_control interface, which contains scalar arguments such as size, manages the address offsets for the m_axi interface, and defines the block control protocol.
  • AXI4 memory-mapped interface: the m_axi_gmem interface, which contains the pointer arguments in1, in2, and out_r.

Details of M_AXI Interfaces for Vitis

AXI4 memory-mapped (m_axi) interfaces allow kernels to read and write data in global memory (DDR, HBM, PLRAM). Memory-mapped interfaces are a convenient way of sharing data across different elements of the accelerated application, such as between the host and kernel, or between kernels on the accelerator card. The main advantages of m_axi interfaces are listed below:
  • The interface has independent read and write channels
  • It supports burst-based accesses with potential performance of ~19 GB/s
  • It provides a queue for outstanding transactions
Understanding Burst Access
AXI4 memory-mapped interfaces support high throughput bursts of up to 4K bytes with just a single address phase. With burst mode transfers, Vitis HLS reads or writes data using a single base address followed by multiple sequential data samples, which makes this mode capable of higher data throughput. Burst mode of operation is possible when you use the C memcpy function or a pipelined for loop. Refer to Controlling AXI4 Burst Behavior or Optimizing Burst Transfers for more information.
Automatic Port Widening and Port Width Alignment

As discussed in Automatic Port Width Resizing, Vitis HLS can automatically widen a port to facilitate data transfers and improve burst access, if the tool can detect a burst access. Therefore, all the preconditions needed for bursting, as described in Optimizing Burst Transfers, are also needed for port resizing.

In the Vitis kernel flow, automatic port width resizing is enabled by default with the following configuration commands (notice that one command is specified in bits and the other in bytes):
config_interface -m_axi_max_widen_bitwidth 512
config_interface -m_axi_alignment_byte_size 64
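The two settings above describe the same width in different units: 512 bits is 64 bytes. The following sketch spells out that arithmetic as compile-time checks. It assumes 32-bit unsigned int elements, as in the vadd example, and uses the 4K-byte burst limit stated in Understanding Burst Access; the helper names elems_per_beat and beats_per_burst are hypothetical and are not part of any Vitis HLS API.

```cpp
#include <cstdint>

// Values taken from the config_interface commands above.
constexpr unsigned kMaxWidenBits = 512;  // -m_axi_max_widen_bitwidth (bits)
constexpr unsigned kAlignBytes   = 64;   // -m_axi_alignment_byte_size (bytes)

// Hypothetical helper: number of elements of elem_bytes each that fit
// in one beat of a port that is port_bits wide.
constexpr unsigned elems_per_beat(unsigned port_bits, unsigned elem_bytes) {
    return (port_bits / 8) / elem_bytes;
}

// Hypothetical helper: number of beats needed to move burst_bytes
// over a port that is port_bits wide.
constexpr unsigned beats_per_burst(unsigned burst_bytes, unsigned port_bits) {
    return burst_bytes / (port_bits / 8);
}

// 512 bits and 64 bytes describe the same width.
static_assert(kMaxWidenBits / 8 == kAlignBytes, "bits vs. bytes mismatch");
// A 512-bit port carries sixteen 32-bit integers per beat.
static_assert(elems_per_beat(kMaxWidenBits, sizeof(std::uint32_t)) == 16, "");
// A 4K-byte burst on a 512-bit port takes 64 beats.
static_assert(beats_per_burst(4096, kMaxWidenBits) == 64, "");
```

In other words, widening a 32-bit port to 512 bits lets each beat carry 16 elements instead of one, which is why the widening depends on the tool first recognizing a burst.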
Rules for Offset
IMPORTANT: In the Vitis kernel flow, the default mode of operation is offset=slave and default_slave_interface=s_axilite, and it should not be changed.

Specifying the offset correctly lets the HLS kernel integrate correctly into the Vitis system. Refer to Offset and Modes of Operation for more information.

Bundle Interfaces - Performance vs. Resource Utilization

By default, Vitis HLS groups function arguments with compatible options into a single m_axi interface adapter as described in M_AXI Bundles. Bundling ports into a single interface helps save device resources by eliminating AXI4 logic, which can be necessary when working in congested designs.

However, a single interface bundle can limit the performance of the kernel because all the memory transfers have to go through a single interface. The m_axi interface has independent READ and WRITE channels, so a single interface can read and write simultaneously, though only at one location. Using multiple bundles lets you increase the bandwidth and throughput of the kernel by creating multiple interfaces to connect to memory banks.
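As an illustration, the vadd example from Interfaces for Vitis Kernel Flow could give each pointer argument its own bundle, so that the two reads and the write go through separate m_axi adapters. This is a sketch only: the bundle names gmem0, gmem1, and gmem2 are hypothetical, and whether separate bundles actually increase bandwidth depends on how the resulting interfaces are connected to memory banks in the system.

```cpp
#include <cassert>

#define VDATA_SIZE 16

typedef struct v_datatype { unsigned int data[VDATA_SIZE]; } v_dt;

extern "C" {
void vadd(const v_dt* in1, const v_dt* in2, v_dt* out_r,
          const unsigned int size) {
// One m_axi adapter per pointer argument (bundle names are illustrative).
#pragma HLS INTERFACE mode=m_axi port=in1 bundle=gmem0
#pragma HLS INTERFACE mode=m_axi port=in2 bundle=gmem1
#pragma HLS INTERFACE mode=m_axi port=out_r bundle=gmem2

   // Number of v_dt blocks needed to hold size integers, rounded up.
   unsigned int vSize = (size + VDATA_SIZE - 1) / VDATA_SIZE;

   vadd1:
   for (unsigned int i = 0; i < vSize; i++) {
      vadd2:
      for (int k = 0; k < VDATA_SIZE; k++) {
         out_r[i].data[k] = in1[i].data[k] + in2[i].data[k];
      }
   }
}
}
```

The trade-off is the one described above: three adapters cost more device resources than one shared bundle, in exchange for independent read and write paths per argument.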

Details of S_AXILITE Interfaces for Vitis

In C++, a function starts to process data when the function is called from a parent function. The function call is pushed onto the stack when called, and removed from the stack when processing is complete to return control to the calling function. This process ensures the parent knows the status of the child.

Since the host and kernel occupy two separate compute spaces in the Vitis kernel flow, the "stack" is managed by the Xilinx Run Time (XRT), and communication is managed through the s_axilite interface. The kernel is software-controlled through XRT by reading and writing the control registers of an s_axilite interface, as described in S_AXILITE Control Register Map. The interface provides the following features:

Control Protocols
The block control protocol defines control registers in the s_axilite interface that let you set control signals to manage execution and operation of the kernel.
Scalar Arguments
Scalar inputs on a kernel are typical, and can be thought of as programming constants or parameters. The host application transfers these values through the s_axilite interface.
Pointers to Scalar Arguments
Vitis HLS lets you read from or write to a pointer to a scalar value when it is assigned to an s_axilite interface. Pointers are assigned by default to m_axi interfaces, so this requires you to manually assign the pointer to the s_axilite interface using the INTERFACE pragma or directive:
int top(int *a, int *b) {
#pragma HLS interface s_axilite port=a
  return *a += *b;  // illustrative body: *a is read and written through s_axilite
}
Rules for Offset
Note: The Vitis kernel flow determines the required offsets. Do not specify the offset option in that flow.
Rules for Bundle
The Vitis kernel flow supports only a single s_axilite interface, which means that all s_axilite interfaces must be bundled together.
  • When no bundle is specified, the tool automatically creates a default bundle named control.
  • If you manually specify the bundle name, you must apply the same bundle to all s_axilite interfaces so that a single bundle is created.
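For example, if you choose to name the s_axilite bundle explicitly, every s_axilite port, including the function return used for block control, takes the same bundle name. The function below is a hypothetical sketch: the scale function is illustrative only, and the sketch uses the bundle name control for all three s_axilite ports.

```cpp
#include <cassert>

// Hypothetical kernel: multiplies n values in place by factor.
// All s_axilite ports, including the block-control port (return),
// share one bundle, so a single s_axilite interface is created.
extern "C" void scale(int* data, int factor, int n) {
#pragma HLS INTERFACE mode=m_axi port=data
#pragma HLS INTERFACE mode=s_axilite port=factor bundle=control
#pragma HLS INTERFACE mode=s_axilite port=n bundle=control
#pragma HLS INTERFACE mode=s_axilite port=return bundle=control
    for (int i = 0; i < n; i++) {
        data[i] *= factor;
    }
}
```

Leaving out bundle= on all three pragmas would give the same single-interface result through the default bundle; mixing named and unnamed bundles is what the rule above forbids.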

Details of AXIS Interfaces for Vitis

The AXI4-Stream protocol (AXIS) defines a single unidirectional channel for streaming data in a sequential manner. AXI4-Stream interfaces can burst an unlimited amount of data, which significantly improves performance. Unlike the AXI4 memory-mapped interface, which needs an address to read or write memory, the AXIS interface simply passes data to another AXIS interface without an address, and so uses fewer device resources. Combined, these features make the streaming interface a lightweight, high-performance interface.

The AXI4-Stream protocol uses an industry-standard ready/valid handshake between a producer and a consumer, as shown in the figure below. The producer asserts TVALID when it has data available, and the consumer asserts TREADY when it can accept data; a data transfer takes place on each clock cycle in which both signals are high. Transfers continue in this way until either TVALID or TREADY is deasserted, or the producer asserts TLAST to indicate the last data beat of the transfer.

Figure 2: AXI4-Stream Handshake
IMPORTANT: The AXIS interface can only be assigned to the top-level arguments (ports) of a kernel or IP, and cannot be assigned to the arguments of functions internal to the design. Streaming channels used inside the HLS design should use hls::stream and not an AXIS interface.
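The ready/valid handshake described above can be modeled cycle by cycle in ordinary C++: a beat is transferred only in a cycle where TVALID and TREADY are both high, and the beat that carries TLAST ends the transfer. This is a behavioral sketch for illustration, not Vitis HLS interface code; the Cycle struct and the count_beats function are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-cycle snapshot of the AXI4-Stream handshake signals.
struct Cycle {
    bool tvalid;  // driven by the producer
    bool tready;  // driven by the consumer
    bool tlast;   // driven by the producer; meaningful only with tvalid
};

// Counts the beats transferred up to and including the first beat that
// carries TLAST. A beat transfers only when TVALID and TREADY are both high.
std::size_t count_beats(const std::vector<Cycle>& trace) {
    std::size_t beats = 0;
    for (const Cycle& c : trace) {
        if (c.tvalid && c.tready) {
            ++beats;
            if (c.tlast) {
                break;  // last beat of the transfer
            }
        }
    }
    return beats;
}
```

Cycles where only one side is ready simply stall the transfer. In the actual protocol, once the producer asserts TVALID it must keep it asserted, with the data stable, until the consumer accepts the beat.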

You should define the streaming data type using hls::stream<T_data_type>, and use the ap_axis struct type to implement the AXIS interface. As explained in AXI4-Stream Interfaces, the ap_axis struct lets you implement the interface with or without side channels.

TIP: You should not define your own struct to model the AXIS signals (side channels, TLAST, TVALID). Instead, you can overload the TDATA signal to implement your own data type.

Interfaces for Vivado IP Flow

The Vivado IP flow supports a wide variety of I/O protocols and handshakes because it must support FPGA designs for a wide variety of applications. This flow supports a traditional system design flow where multiple IP are integrated into a system, and IP can be generated through Vitis HLS. In this IP flow, there are two modes of control for execution of the system:
  • Software Control: The system is controlled through a software application running on an embedded Arm processor or external x86 processor, using drivers to access elements of the hardware design, and reading and writing registers in the hardware to control the execution of IP in the system.
  • Self-Synchronous: In this mode the IP exposes signals that are used to start and stop the IP. These signals are driven by other IP or by other elements of the system design that handle the execution of the IP.

The Vivado IP flow supports memory, stream, and register interface paradigms where each paradigm supports different interface protocols to communicate with the external world, as shown in the following table. Note that while the Vitis kernel flow supports only the AXI4 interface adapters, this flow supports a number of different interface types.

Paradigm   Supported Interface Protocols                                    Description
Memory     ap_memory, BRAM, AXI4 Memory Mapped (m_axi)                      Data is accessed by the kernel through memory such as DDR, HBM, PLRAM/BRAM/URAM.
Stream     ap_fifo, AXI4-Stream (axis)                                      Data is streamed into the kernel from another streaming source, such as a video processor or another kernel, and can also be streamed out of the kernel.
Register   ap_none, ap_hs, ap_ack, ap_ovld, ap_vld, AXI4-Lite (s_axilite)   Data is accessed by the kernel through register interfaces, using register reads and writes.

The default interfaces are defined by the C-argument type in the top-level function, and the default paradigm, as shown in the following table.

C-Argument Type                   Supported Paradigms        Default Paradigm   Default Interface Protocol
                                                                                Input       Output      Inout
Scalar variable (pass by value)   Register                   Register           ap_none     N/A         N/A
Array                             Memory, Stream             Memory             ap_memory   ap_memory   ap_memory
Pointer                           Memory, Stream, Register   Register           ap_none     ap_vld      ap_ovld
Reference                         Register                   Register           ap_none     ap_vld      ap_vld
hls::stream                       Stream                     Stream             ap_fifo     ap_fifo     N/A

The default execution mode for Vivado IP flow is sequential execution, which requires the HLS IP to complete one iteration before starting the next. This is specified by the ap_ctrl_hs block control protocol. The control protocol can be changed as specified in Block-Level Control Protocols.

The vadd function in the following code provides an example of interface synthesis in the Vivado IP flow.

#define VDATA_SIZE 16

typedef struct v_datatype { unsigned int data[VDATA_SIZE]; } v_dt;

extern "C" {
void vadd(const v_dt* in1, // Read-Only Vector 1
          const v_dt* in2, // Read-Only Vector 2
          v_dt* out_r, // Output Result for Addition
          const unsigned int size // Number of integers in each vector
) {

   // Number of v_dt blocks needed to hold size integers, rounded up
   // (also correct when size is 0)
   unsigned int vSize = (size + VDATA_SIZE - 1) / VDATA_SIZE;

   // Vitis HLS automatically pipelines this loop
   vadd1:
   for (unsigned int i = 0; i < vSize; i++) {
      vadd2:
      for (int k = 0; k < VDATA_SIZE; k++) {
         out_r[i].data[k] = in1[i].data[k] + in2[i].data[k];
      }
   }
}
}

The vadd function includes:

  • Two pointer inputs: in1 and in2
  • A pointer, out_r, to which the results are written
  • A scalar value, size

With the default interface synthesis settings used for the Vivado IP flow, the design is synthesized into an RTL block with the ports and interfaces shown in the following figure.

Figure 3: RTL Ports After Default Interface Synthesis

In the default Vivado IP flow the tool creates three types of interface ports on the RTL design to handle the flow of both data and control.

  • Clock and Reset ports: ap_clk and ap_rst are added to the kernel.
  • Block-level control protocol: The ap_ctrl interface is implemented as an s_axilite interface.
  • Port-level interface protocols: These are created for each argument in the top-level function and for the function return (if the function returns a value). As shown in the table above, most of the arguments use the ap_none port protocol and so have no control signals. In the vadd example above, these ports include in1, in2, and size. However, the out_r_o output port uses the ap_vld protocol and so is associated with the out_r_o_ap_vld signal.

AP_Memory in the Vivado IP Flow

The ap_memory protocol is the default interface protocol for the memory paradigm described in the tables above. In the Vivado IP flow, it is used for communicating with memory resources such as BRAM and URAM. The ap_memory protocol also follows an address phase and a data phase: it first requests to read or write the resource and waits until it receives an acknowledgment that the resource is available, and then initiates the read or write data transfer phase.

An important consideration for ap_memory is that it can only perform a single-beat data transfer to a single address, unlike m_axi, which can perform burst accesses. This makes ap_memory a lightweight protocol compared to the others.

  • Memory Resources: By default, Vitis HLS implements a protocol to communicate with a single-port RAM resource. You can control the implementation of the protocol by specifying the storage_type as part of the INTERFACE pragma or directive. The storage_type lets you explicitly define which type of RAM is used, and which RAM ports are created (single-port or dual-port). If no storage_type is specified, Vitis HLS uses:
    • A single-port RAM by default.
    • A dual-port RAM if it reduces the initiation interval or latency.
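A sketch of requesting an ap_memory interface with an explicit storage type follows. The value ram_2p (a dual-port RAM) is used here as an assumption for illustration; check the INTERFACE pragma reference for the storage_type values your tool version supports. The pair_sum function and its array sizes are hypothetical.

```cpp
#include <cassert>

// Hypothetical IP: sums adjacent pairs from an input memory.
// Each iteration reads two elements of in, the kind of access pattern for
// which a dual-port RAM can help reduce the initiation interval.
void pair_sum(const int in[64], int out[32]) {
#pragma HLS INTERFACE mode=ap_memory port=in storage_type=ram_2p
#pragma HLS INTERFACE mode=ap_memory port=out
    for (int i = 0; i < 32; i++) {
#pragma HLS PIPELINE
        out[i] = in[2 * i] + in[2 * i + 1];
    }
}
```

The out port keeps the default storage type because it is written only once per iteration, so a single port is sufficient.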

M_AXI Interfaces in the Vivado IP Flow

AXI4 memory-mapped (m_axi) interfaces allow an IP to read and write data in global memory (DDR, HBM, PLRAM). Memory-mapped interfaces are a convenient way of sharing data across multiple IP. The main advantages of m_axi interfaces are listed below:
  • The interface has independent read and write channels
  • It supports burst-based accesses with potential performance of ~19 GB/s
  • It provides a queue for outstanding transactions
Understanding Burst Access
AXI4 memory-mapped interfaces support high throughput bursts of up to 4K bytes with just a single address phase. With burst mode transfers, Vitis HLS reads or writes data using a single base address followed by multiple sequential data samples, which makes this mode capable of higher data throughput. Burst mode of operation is possible when you use the C memcpy function or a pipelined for loop. Refer to Controlling AXI4 Burst Behavior or Optimizing Burst Transfers for more information.
Automatic Port Widening and Port Width Alignment
As discussed in Automatic Port Width Resizing, Vitis HLS can automatically widen a port to facilitate data transfers and improve burst access when all the preconditions needed for bursting are present. In the Vivado IP flow, the following configuration settings disable automatic port width resizing by default. To enable this feature, you must change these configuration options (notice that one command is specified in bits and the other in bytes):
config_interface -m_axi_max_widen_bitwidth 0
config_interface -m_axi_alignment_byte_size 0
Specifying Alignment for Vivado IP mode

The alignment of an m_axi port lets the port read and write memory according to the specified alignment. Choosing the correct alignment is important: at best it affects performance, and at worst it can affect functionality.

Aligned memory access means that the pointer (or the start address of the data) is a multiple of a type-specific value called the alignment. The alignment is the natural address multiple at which the type must or should be stored in memory (e.g., for performance reasons). For example, the Intel 32-bit architecture stores 32-bit words, each of 4 bytes, in memory, so the data is aligned to a one-word, or 4-byte, boundary.

The alignment should be consistent across the system. When the IP operates in AXI4 master mode, it determines the alignment, which should be specified explicitly, as with the 4-byte alignment of the Intel 32-bit architecture. When the IP operates in slave mode, its alignment should match that of the master.
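The alignment rule above reduces to modular arithmetic on addresses: an address is aligned to N bytes when it is a multiple of N. The helpers below are a hypothetical sketch of that check; they assume N is a power of two, as typical alignments are, and they are not part of any Vitis HLS API.

```cpp
#include <cstdint>

// True when addr is a multiple of align. Assumes align is a power of two,
// so an aligned address has all of its low log2(align) bits equal to zero.
constexpr bool is_aligned(std::uint64_t addr, std::uint64_t align) {
    return (addr & (align - 1)) == 0;
}

// Rounds addr up to the next multiple of align (power of two assumed).
constexpr std::uint64_t align_up(std::uint64_t addr, std::uint64_t align) {
    return (addr + align - 1) & ~(align - 1);
}

// The 4-byte example from the text: 0x1004 is word-aligned, 0x1006 is not.
static_assert(is_aligned(0x1004, 4), "");
static_assert(!is_aligned(0x1006, 4), "");
```

For a 64-byte alignment, as used by default in the Vitis kernel flow, the same check requires the low six address bits to be zero.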

Rules for Offset

The default for m_axi offset is offset=direct and default_slave_interface=s_axilite. However, in the Vivado IP flow you can change it as described in Offset and Modes of Operation.

Bundle Interfaces - Performance vs. Resource Utilization

By default, Vitis HLS groups function arguments with compatible options into a single m_axi interface adapter as described in M_AXI Bundles. Bundling ports into a single interface helps save device resources by eliminating AXI4 logic, which can be necessary when working in congested designs.

However, a single interface bundle can limit the performance of the IP because all the memory transfers have to go through a single interface. The m_axi interface has independent READ and WRITE channels, so a single interface can read and write simultaneously, though only at one location. Using multiple bundles lets you increase performance by creating multiple interfaces to connect to memory banks.

S_AXILITE in the Vivado IP Flow

In the Vivado IP flow, the default execution control is managed by register reads and writes through an s_axilite interface using the default ap_ctrl_hs control protocol. The IP is software controlled by reading and writing the control registers of an s_axilite interface as described in S_AXILITE Control Register Map.

The s_axilite interface provides the following features:

Control Protocols
The s_axilite interface implements the block control protocol, as specified in Block-Level Control Protocols.
Scalar Arguments
Scalar arguments from the top-level function can be mapped to an s_axilite interface which creates a register for the value as described in S_AXILITE Control Register Map. The software can perform reads/writes to this register space.
Rules for Offset
The Vivado IP flow defines the size, or range of addresses, assigned to a port based on the data type of the associated C argument in the top-level function. However, the tool also lets you manually define the offset, as described in S_AXILITE Offset Option.
Rules for Bundle
In the Vivado IP flow you can specify multiple bundles using the s_axilite interface, and this will create a separate interface adapter for each bundle you have defined. However, there are some rules related to using multiple bundles that you should be familiar with as explained in S_AXILITE Bundle Rules.

AP_FIFO in the Vivado IP Flow

In the Vivado IP flow, the ap_fifo protocol is the default interface protocol for the streaming paradigm. It is used on the interface to communicate with a FIFO memory resource, and can also be used as a communication channel between different functions inside the IP. Use this protocol only if the data is accessed sequentially; Xilinx strongly recommends using hls::stream<data type>, which implements a FIFO.

TIP: The <data type> should not be the same as the T_data_type, which should only be used on the interface.
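A minimal sketch of the sequential-access requirement follows. It uses arrays rather than hls::stream so that it compiles with a standard C++ compiler without the Vitis HLS headers. Each array element is read or written exactly once, in increasing index order, which is the access pattern an ap_fifo port requires. The running_sum function is hypothetical.

```cpp
#include <cassert>

// Hypothetical IP: running sum over a sequentially accessed input.
// Both ports use ap_fifo, so each element is read or written once, in order.
void running_sum(const int in[16], int out[16]) {
#pragma HLS INTERFACE mode=ap_fifo port=in
#pragma HLS INTERFACE mode=ap_fifo port=out
    int acc = 0;
    for (int i = 0; i < 16; i++) {
        acc += in[i];   // reads in[0], in[1], ... exactly once each
        out[i] = acc;   // writes out[0], out[1], ... exactly once each
    }
}
```

Reading in[i] twice, or reading elements out of order, would not map onto a FIFO; such code needs a memory interface such as ap_memory instead.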

AXIS Interfaces in the Vivado IP Flow

The AXI4-Stream protocol (axis) is an alternative for streaming interfaces, and defines a single unidirectional channel for streaming data in a sequential manner. Unlike the m_axi protocol, AXI4-Stream interfaces can burst an unlimited amount of data, which significantly improves performance. And because the axis interface simply passes data to another axis interface without needing an address, it uses fewer device resources than the address-based AXI4 memory-mapped interface. Combined, these features make the streaming interface a lightweight, high-performance interface, as described in AXI4-Stream Interfaces.

AXI Adapter Interface Protocols

IMPORTANT: As discussed in Interfaces for Vitis Kernel Flow, the AXI4 adapter interfaces are the default interfaces used by Vitis HLS for the Vitis Application Acceleration Development flow, though they are also supported in the Vivado IP flow. The AXI4-Stream Accelerator Adapter is a soft Xilinx® LogiCORE™ Intellectual Property (IP) core used as an infrastructure block for connecting hardware accelerators to embedded CPUs.

The AXI4 interfaces supported by Vitis HLS include the AXI4-Stream interface (axis), AXI4-Lite (s_axilite), and AXI4 master (m_axi) interfaces. For a complete description of the AXI4 interfaces, including timing and ports, see the Vivado Design Suite: AXI Reference Guide (UG1037).

m_axi
Specify on arrays and pointers (and references in C++) only. The m_axi mode specifies an AXI4 Memory Mapped interface.
TIP: You can bundle multiple arguments into a single m_axi interface.
s_axilite
Specify this protocol on any type of argument except streams. The s_axilite mode specifies an AXI4-Lite slave interface.
TIP: You can bundle multiple arguments into a single s_axilite interface.
axis
Specify this protocol on input arguments or output arguments only, not on input/output arguments. The axis mode specifies an AXI4-Stream interface.

AXI4 Master Interface

AXI4 memory-mapped (m_axi) interfaces allow kernels to read and write data in global memory (DDR, HBM, PLRAM). Memory-mapped interfaces are a convenient way of sharing data across different elements of the accelerated application, such as between the host and kernel, or between kernels on the accelerator card. The main advantages for m_axi interfaces are listed below:
  • The interface has separate and independent read and write channels
  • It supports burst-based accesses with potential performance of ~19 GB/s
  • It provides support for outstanding transactions

In the Vitis Kernel flow the m_axi interface is assigned by default to pointer and array arguments. In this flow it supports the following default features:

  • Pointer and array arguments are automatically mapped to the m_axi interface
  • The default mode of operation is offset=slave in the Vitis flow and should not be changed
  • All pointer and array arguments are mapped to a single interface bundle to conserve device resources, and the ports share the interface's read and write access over time
  • The default alignment in the Vitis flow is set to 64 bytes
  • The maximum read/write burst length is set to 16 by default
While not used by default in the Vivado IP flow, when the m_axi interface is specified it has the following default features:
  • The default operation mode is offset=direct, but you can change it as described in Offset and Modes of Operation
  • Assigned pointer and array arguments are mapped to a single interface bundle to conserve device resources, and share the interface over time
  • The default alignment in Vivado IP flow is set to 1 byte
  • The maximum read/write burst length is set to 16 by default

In both the Vivado IP flow and Vitis kernel flow, the INTERFACE pragma or directive can be used to modify default values as needed.

You can use an AXI4 master interface on array or pointer/reference arguments, which Vitis HLS implements in one of the following modes:

  • Individual data transfers
  • Burst mode data transfers

With individual data transfers, Vitis HLS reads or writes a single element of data for each address. The following example shows a single read and single write operation. In this example, Vitis HLS generates an address on the AXI interface to read a single data value and an address to write a single data value. The interface transfers one data value per address.

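void bus (int *d) {
 static int acc = 0;   // accumulator state persists across calls

 acc += *d;            // single read from the address of d
 *d  = acc;            // single write back to the same address
}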

With burst mode transfers, Vitis HLS reads or writes data using a single base address followed by multiple sequential data samples, which makes this mode capable of higher data throughput. Burst mode of operation is possible when you use the C memcpy function or a pipelined for loop. Refer to Optimizing Burst Transfers for more information.

IMPORTANT: The C memcpy function is only supported for synthesis when used to transfer data to or from a top-level function argument specified with an AXI4 master interface.

The following example shows a burst-mode copy using the memcpy function. The top-level function argument a is specified as an AXI4 master interface.

#include <string.h>

void example(volatile int *a){

//Port a is assigned to an AXI4 master interface
#pragma HLS INTERFACE mode=m_axi depth=50 port=a
#pragma HLS INTERFACE mode=s_axilite port=return

 int i;
 int buff[50];

//memcpy creates a burst access to memory
 memcpy(buff,(const int*)a,50*sizeof(int));

 for(i=0; i < 50; i++){
   buff[i] = buff[i] + 100;
 }

 memcpy((int *)a,buff,50*sizeof(int));
}

When this example is synthesized, it results in the interface shown in the following figure.

Note: In this figure, the AXI4 interfaces are collapsed.
Figure 4: AXI4 Interface

The following example shows the same code as the preceding example but uses a for loop to copy the data out:

#include <string.h>

void example(volatile int *a){

//Port a is assigned to an AXI4 master interface
#pragma HLS INTERFACE mode=m_axi depth=50 port=a
#pragma HLS INTERFACE mode=s_axilite port=return

 int i;
 int buff[50];

//memcpy creates a burst access to memory
 memcpy(buff,(const int*)a,50*sizeof(int));

 for(i=0; i < 50; i++){
   buff[i] = buff[i] + 100;
 }

//a pipelined for loop creates a burst write to memory
 for(i=0; i < 50; i++){
#pragma HLS PIPELINE
   a[i] = buff[i];
 }
}

When using a for loop to implement burst reads or writes, follow these requirements:

  • Pipeline the loop
  • Access addresses in increasing order
  • Do not place accesses inside a conditional statement
  • For nested loops, do not flatten loops, because this inhibits the burst operation
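The requirements above can be illustrated with two loops that perform the same copy. This is a sketch under stated assumptions: the first loop follows the rules (pipelined, increasing addresses, no conditional access), while the second reads the same addresses in decreasing order and so, per the rules above, would not be expected to be inferred as a burst. The function names are hypothetical; check the synthesis report for the burst behavior your tool version actually infers.

```cpp
#include <cassert>

// Burst-friendly: pipelined loop, addresses read in increasing order,
// no accesses inside a conditional statement.
void copy_forward(const int* src, int dst[64]) {
#pragma HLS INTERFACE mode=m_axi port=src depth=64
    for (int i = 0; i < 64; i++) {
#pragma HLS PIPELINE
        dst[i] = src[i];
    }
}

// Same result, but src is read in decreasing address order,
// which breaks the "increasing order" requirement for bursting.
void copy_reverse(const int* src, int dst[64]) {
#pragma HLS INTERFACE mode=m_axi port=src depth=64
    for (int i = 63; i >= 0; i--) {
#pragma HLS PIPELINE
        dst[i] = src[i];
    }
}
```

Both functions produce identical output in C simulation, which is why burst behavior has to be checked in the synthesis results rather than in a functional test.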
Note: Only one read and one write are allowed in a for loop unless the ports are bundled in different AXI interfaces. The following example shows how to perform two reads in burst mode using different AXI interfaces.

In the following example, Vitis HLS implements the port reads as burst transfers. Port a is specified without using the bundle option and is implemented in the default AXI interface. Port b is specified using a named bundle and is implemented in a separate AXI interface called d2_port.

void example(volatile int *a, int *b){

#pragma HLS INTERFACE mode=s_axilite port=return
#pragma HLS INTERFACE mode=m_axi depth=50 port=a
#pragma HLS INTERFACE mode=m_axi depth=50 port=b bundle=d2_port

 int i;
 int buff[50];

//copy data in: a and b are read in the same pipelined loop
//through separate AXI interfaces
 for(i=0; i < 50; i++){
#pragma HLS PIPELINE
   buff[i] = a[i] + b[i];
 }
 // ...
}

Offset and Modes of Operation

IMPORTANT: In the Vitis kernel flow the default mode of operation is offset=slave and should not be changed.

The AXI4 master interface has read and write address channels that can be used to access specific addresses. By default, the m_axi interface starts all read and write operations from address 0x00000000. For example, given the following code, the design reads data from addresses 0x00000000 to 0x000000C7 (50 32-bit words, or 200 bytes, covering 50 word addresses). The design then writes data back to the same addresses.

#include <stdio.h>
#include <string.h>
 
void example(volatile int *a){
   
#pragma HLS INTERFACE mode=m_axi port=a depth=50
   
  int i;
  int buff[50];
   
  //memcpy creates a burst access to memory
  //multiple calls of memcpy cannot be pipelined and will be scheduled sequentially
  //memcpy requires a local buffer to store the results of the memory transaction
  memcpy(buff,(const int*)a,50*sizeof(int));
   
  for(i=0; i < 50; i++){
    buff[i] = buff[i] + 100;
  }
   
  memcpy((int *)a,buff,50*sizeof(int));
}
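The address range quoted above follows from simple arithmetic: 50 words of 4 bytes each is 200 bytes, so the last byte address is 0x00000000 + 200 - 1 = 199 = 0xC7. A small helper (hypothetical, for illustration only) makes the calculation explicit for any base address:

```c
#include <stdint.h>

/* Last byte address touched when accessing num_words words of
   word_bytes bytes each, starting at base_addr. */
uint64_t last_byte_addr(uint64_t base_addr, uint64_t num_words,
                        uint64_t word_bytes) {
  return base_addr + num_words * word_bytes - 1;
}
```

For example, last_byte_addr(0, 50, 4) gives 0xC7, matching the range described above.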

The base address can be configured statically, for instance in the Vivado IP integrator, or dynamically at run time by the application or another IP.

The m_axi interface can act both as a master that initiates transactions and as a slave interface that receives data and sends acknowledgments. Depending on the mode specified with the offset option of the INTERFACE pragma, an HLS IP can use one of several approaches to set the base address.

TIP: The config_interface -m_axi_offset command provides a global setting for the offset, which can be overridden for specific m_axi interfaces using the INTERFACE pragma offset option.
  • Master Mode: When acting as a master interface with different offset options, the m_axi interface start address can be either hard-coded or set at run time.
    • offset=off: Vitis HLS sets a base address for the m_axi interface when the IP is used in the Vivado IP integrator tool. One disadvantage with this approach is that you cannot change the base address during run time. See Customizing AXI4 Master Interfaces in IP Integrator for setting the base address.
      The following example is synthesized with offset=off, the default for the Vivado IP flow.
      void example(volatile int *a){
      #pragma HLS INTERFACE m_axi depth=50 port=a offset=off
         
        int i;
        int buff[50];
         
        //memcpy creates a burst access to memory
        //multiple calls of memcpy cannot be pipelined and will be scheduled sequentially
        //memcpy requires a local buffer to store the results of the memory transaction
        memcpy(buff,(const int*)a,50*sizeof(int));
         
        for(i=0; i < 50; i++){
          buff[i] = buff[i] + 100;
        }
         
        memcpy((int *)a,buff,50*sizeof(int));
      }
    • offset=direct: Vitis HLS generates a port on the IP for setting the address. Note the addition of port a, as shown in the figure below. This lets you update the address at run time, so a single m_axi interface can read and write different locations. For example, consider one HLS module that reads data from an ADC into RAM and another HLS module that processes that data. Because you can change the address on the module, while one module is processing the initial dataset, the other module can be reading more data into a different address.
      void example(volatile int *a){
      #pragma HLS INTERFACE m_axi depth=50 port=a offset=direct
      ...
      }
    Figure 5: offset=direct
  • Slave Mode: The slave mode for an interface is set with offset=slave. In this mode, the IP is controlled by the host application or a microcontroller through the s_axilite interface. This is the default for the Vitis kernel flow, and it can also be used in the Vivado IP flow. Here is the flow of operation:
    1. Initially, the host/CPU starts the IP or kernel using the block-level control protocol, which is mapped to the s_axilite adapter.
    2. The host will send the scalars and address offsets for the m_axi interfaces through the s_axilite adapter.
    3. The m_axi adapter will read the start address from the s_axilite adapter and store it in a queue.
    4. The HLS design starts to read the data from the global memory.

As shown in the figure below, the HLS design will have both the s_axilite adapter for the base address, and the m_axi to perform read and write transfer to the global memory.

Figure 6: AXI Adapters in Slave Mode
Offset Rules

The following are rules associated with the offset option:

  • Fully Specified Offset: When the user explicitly sets the offset value the tool uses the specified settings. The user can also set different offset values for different m_axi interfaces in the design, and the tool will use the specified offsets.
    #pragma HLS INTERFACE s_axilite port=return
    #pragma HLS INTERFACE mode=m_axi bundle=BUS_A port=out offset=direct
    #pragma HLS INTERFACE mode=m_axi bundle=BUS_B port=in1 offset=slave
    #pragma HLS INTERFACE mode=m_axi bundle=BUS_C port=in2 offset=off
  • No Offset Specified: If there are no offsets specified in the INTERFACE pragma, the tool will defer to the setting specified by config_interface -m_axi_offset.
    Note: If the global m_axi_offset setting is specified, and the design has an s_axilite interface, the global setting is ignored and offset=slave is assumed.
    void top(int *a) {
    #pragma HLS interface mode=m_axi port=a
    #pragma HLS interface mode=s_axilite port=a
    }
Controlling the Address Offset in an AXI4 Interface

By default, the AXI4 master interface starts all read and write operations from address 0x00000000. For example, given the following code, the design reads data from addresses 0x00000000 to 0x000000C7 (50 32-bit words, or 200 bytes, covering 50 word addresses). The design then writes data back to the same addresses.

#include <string.h>

void example(volatile int *a){

#pragma HLS INTERFACE mode=m_axi depth=50 port=a 
#pragma HLS INTERFACE mode=s_axilite port=return bundle=AXILiteS

 int i;
 int buff[50];

 memcpy(buff,(const int*)a,50*sizeof(int));

 for(i=0; i < 50; i++){
 buff[i] = buff[i] + 100;
 }
 memcpy((int *)a,buff,50*sizeof(int));
}

To apply an address offset, use the -offset option with the INTERFACE directive, and specify one of the following options:

  • off: Does not apply an offset address. This is the default for the Vivado IP flow.
  • direct: Adds a 32-bit port to the design for applying an address offset.
  • slave: Adds a 32-bit register inside the AXI4-Lite interface for applying an address offset.

In the final RTL, Vitis HLS applies the address offset directly to any read or write address generated by the AXI4 master interface. This allows the design to access any address location in the system.
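Conceptually, the offset acts as a base that is added to every address the m_axi port generates. The following is a simplified software model under stated assumptions: byte-addressed memory, fixed-size elements, and no modeling of burst splitting or AXI handshakes. The struct and function names are hypothetical and are not part of the generated RTL or driver.

```c
#include <stdint.h>

/* Hypothetical model of an m_axi port whose generated addresses are
   offset by a base address supplied through a direct port or an
   s_axilite register. */
typedef struct {
  uint64_t offset; /* base address written by the host or another IP */
} maxi_port_model;

/* Byte address issued for element index i of an argument whose
   elements are word_bytes wide: offset + i * word_bytes. */
uint64_t maxi_effective_addr(const maxi_port_model *p, uint64_t i,
                             uint64_t word_bytes) {
  return p->offset + i * word_bytes;
}
```

With offset set to 0, the model reproduces the default 0x00000000 to 0x000000C7 range of the earlier example; with offset 0x40000000, the same accesses land at 0x40000000 to 0x400000C7.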

If you use the slave option in an AXI interface, you must use an AXI4-Lite port on the design interface. Xilinx recommends that you implement the AXI4-Lite interface using the following pragma:

#pragma HLS INTERFACE mode=s_axilite port=return

In addition, if you use the slave option with several AXI4-Lite interfaces, you must ensure that the AXI master port offset register is bundled into the correct AXI4-Lite interface. In the following example, port a is implemented as an AXI master interface with offset=slave, and the design has two AXI4-Lite interfaces called AXI_Lite_1 and AXI_Lite_2:

#pragma HLS INTERFACE mode=m_axi port=a depth=50 offset=slave 
#pragma HLS INTERFACE mode=s_axilite port=return bundle=AXI_Lite_1
#pragma HLS INTERFACE mode=s_axilite port=b bundle=AXI_Lite_2

The following INTERFACE directive is required to ensure that the offset register for port a is bundled into the AXI4-Lite interface called AXI_Lite_1:

#pragma HLS INTERFACE mode=s_axilite port=a bundle=AXI_Lite_1

M_AXI Bundles

Vitis HLS groups function arguments with compatible options into a single m_axi interface adapter. Bundling ports into a single interface helps save FPGA resources by eliminating AXI logic, but it can limit kernel performance because all memory transfers must go through a single interface. The m_axi interface has independent READ and WRITE channels, so a single interface can read and write simultaneously, though only at one location. Using multiple bundles, you can increase the bandwidth and throughput of the kernel by creating multiple interfaces that connect to multiple memory banks.

In the following example, all the pointer arguments are grouped into a single m_axi adapter using the interface option bundle=BUS_A, and a single s_axilite adapter is added for the m_axi offsets, the scalar argument size, and the function return.

extern "C" {
void vadd(const unsigned int *in1, // Read-Only Vector 1
          const unsigned int *in2, // Read-Only Vector 2
          unsigned int *out,     // Output Result
          int size                 // Size in integer
          ) {
 
#pragma HLS INTERFACE mode=m_axi bundle=BUS_A port=out
#pragma HLS INTERFACE mode=m_axi bundle=BUS_A port=in1
#pragma HLS INTERFACE mode=m_axi bundle=BUS_A port=in2
#pragma HLS INTERFACE mode=s_axilite port=in1
#pragma HLS INTERFACE mode=s_axilite port=in2
#pragma HLS INTERFACE mode=s_axilite port=out
#pragma HLS INTERFACE mode=s_axilite port=size
#pragma HLS INTERFACE mode=s_axilite port=return
  // Kernel body not shown.
}
}
Figure 7: MAXI and S_AXILITE

You can also choose to bundle function arguments into separate interface adapters as shown in the following code. Here the argument in2 is grouped into a separate interface adapter with bundle=BUS_B. This creates a new m_axi interface adapter for port in2.

extern "C" {
void vadd(const unsigned int *in1, // Read-Only Vector 1
          const unsigned int *in2, // Read-Only Vector 2
          unsigned int *out,     // Output Result
          int size                 // Size in integer
          ) {
 
#pragma HLS INTERFACE mode=m_axi bundle=BUS_A port=out
#pragma HLS INTERFACE mode=m_axi bundle=BUS_A port=in1
#pragma HLS INTERFACE mode=m_axi bundle=BUS_B port=in2
#pragma HLS INTERFACE mode=s_axilite port=in1
#pragma HLS INTERFACE mode=s_axilite port=in2
#pragma HLS INTERFACE mode=s_axilite port=out
#pragma HLS INTERFACE mode=s_axilite port=size
#pragma HLS INTERFACE mode=s_axilite port=return
  // Kernel body not shown.
}
}
Figure 8: 2 MAXI Bundles
Bundle Rules

The global configuration command config_interface -m_axi_auto_max_ports controls how many m_axi interface bundles the tool creates. When it is set to false, the number of bundles is limited to the minimum required, and the tool groups compatible ports into a single m_axi interface. The default setting is disabled (false), but you can enable it to maximize bandwidth by creating a separate m_axi adapter for each port.

With m_axi_auto_max_ports disabled, the following are some rules for how the tool handles bundles under different circumstances:

  1. Default Bundle Name: The tool groups all interface ports with no bundle name into a single m_axi interface port using the tool default name bundle=<default> (gmem in the messages below), and names the RTL port m_axi_<default>. The following pragmas:
    #pragma HLS INTERFACE mode=m_axi port=a depth=50
    #pragma HLS INTERFACE mode=m_axi port=b depth=50
    #pragma HLS INTERFACE mode=m_axi port=c depth=50
    

    Result in the following messages:

    INFO: [RTGEN 206-500] Setting interface mode on port 'example/gmem' to 'm_axi'.
    INFO: [RTGEN 206-500] Setting interface mode on port 'example/gmem' to 'm_axi'.
    INFO: [RTGEN 206-500] Setting interface mode on port 'example/gmem' to 'm_axi'.
    
  2. User-Specified Bundle Names: The tool groups all interface ports with the same user-specified bundle=<string> into the same m_axi interface port, and names the RTL port m_axi_<string>. Ports without bundle assignments are grouped into the default bundle as described above. The following pragmas:
    #pragma HLS INTERFACE mode=m_axi port=a depth=50 bundle=BUS_A
    #pragma HLS INTERFACE mode=m_axi port=b depth=50
    #pragma HLS INTERFACE mode=m_axi port=c depth=50
    

    Result in the following messages:

    INFO: [RTGEN 206-500] Setting interface mode on port 'example/BUS_A' to 'm_axi'.
    INFO: [RTGEN 206-500] Setting interface mode on port 'example/gmem' to 'm_axi'.
    INFO: [RTGEN 206-500] Setting interface mode on port 'example/gmem' to 'm_axi'.
    
    IMPORTANT: If you bundle incompatible interfaces Vitis HLS issues a message and ignores the bundle assignment.
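With m_axi_auto_max_ports disabled, the two rules above reduce to counting distinct bundle names, treating ports without a bundle as members of the default bundle (gmem in the messages shown). The following sketch models only that counting rule. It assumes all ports are compatible (the tool ignores incompatible bundle assignments, as noted above), and count_maxi_adapters is a hypothetical helper, not a tool API.

```c
#include <string.h>

#define DEFAULT_BUNDLE "gmem"

/* A port without a bundle name (NULL) falls into the default bundle. */
static const char *resolve_bundle(const char *bundle) {
  return (bundle != NULL) ? bundle : DEFAULT_BUNDLE;
}

/* Counts the distinct m_axi adapters for n ports, assuming
   m_axi_auto_max_ports is disabled and all ports are compatible. */
int count_maxi_adapters(const char *const bundles[], int n) {
  int count = 0;
  for (int i = 0; i < n; i++) {
    const char *name = resolve_bundle(bundles[i]);
    int seen = 0;
    for (int j = 0; j < i; j++) {
      if (strcmp(name, resolve_bundle(bundles[j])) == 0) {
        seen = 1;
        break;
      }
    }
    if (!seen) {
      count++;
    }
  }
  return count;
}
```

For the rule 2 pragmas (BUS_A, default, default), the sketch returns 2, matching the two adapters, BUS_A and gmem, reported in the messages.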

Controlling AXI4 Burst Behavior

An optimal AXI4 interface is one in which the design never stalls while waiting to access the bus, and after bus access is granted, the bus never stalls while waiting for the design to read/write. To create the optimal AXI4 interface, the following options are provided in the INTERFACE pragma or directive to specify the behavior of the bursts and optimize the efficiency of the AXI4 interface. Refer to Optimizing Burst Transfers for more information on burst transfers.

Some of these options use internal storage to buffer data and may have an impact on area and resources:

  • latency: Specifies the expected latency of the AXI4 interface, allowing the design to initiate a bus request a number of cycles (latency) before the read or write is expected. If this figure is too low, the design will be ready too soon and may stall waiting for the bus. If this figure is too high, bus access may be granted but the bus may stall waiting on the design to start the access.
  • max_read_burst_length: Specifies the maximum number of data values read during a burst transfer.
  • num_read_outstanding: Specifies how many read requests can be made to the AXI4 bus, without a response, before the design stalls. This implies internal storage in the design, a FIFO of size: num_read_outstanding*max_read_burst_length*word_size.
  • max_write_burst_length: Specifies the maximum number of data values written during a burst transfer.
  • num_write_outstanding: Specifies how many write requests can be made to the AXI4 bus, without a response, before the design stalls. This implies internal storage in the design, a FIFO of size: num_write_outstanding*max_write_burst_length*word_size.

The following example can be used to help explain these options:

#pragma HLS interface mode=m_axi port=input offset=slave bundle=gmem0 \
  depth=1024*1024*16/(512/8) \
  latency=100 \
  num_read_outstanding=32 \
  num_write_outstanding=32 \
  max_read_burst_length=16 \
  max_write_burst_length=16

The interface is specified as having a latency of 100. Vitis HLS seeks to schedule the request for burst access 100 clock cycles before the design is ready to access the AXI4 bus. To further improve bus efficiency, the options num_write_outstanding and num_read_outstanding ensure the design contains enough buffering to store up to 32 read and write accesses. This allows the design to continue processing until the bus requests are serviced. Finally, the options max_read_burst_length and max_write_burst_length ensure the maximum burst size is 16 and that the AXI4 interface does not hold the bus for longer than this.
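The buffering implied by the outstanding-request options can be estimated with the FIFO formula given above. The following sketch assumes the port is 512 bits wide (64 bytes per word), which is inferred from the 512/8 term in the depth expression rather than stated explicitly; the helper names are hypothetical.

```c
#include <stdint.h>

/* FIFO storage implied by outstanding requests, in bytes:
   num_outstanding * max_burst_length * word_size. */
uint64_t outstanding_fifo_bytes(uint64_t num_outstanding,
                                uint64_t max_burst_length,
                                uint64_t word_bytes) {
  return num_outstanding * max_burst_length * word_bytes;
}

/* Evaluates a depth expression of the form total_bytes / (word_bits / 8),
   that is, the number of word_bits-wide words in total_bytes. */
uint64_t depth_in_words(uint64_t total_bytes, uint64_t word_bits) {
  return total_bytes / (word_bits / 8);
}
```

For the example, each FIFO works out to 32 * 16 * 64 = 32768 bytes (32 KiB), and the depth expression evaluates to 16,777,216 / 64 = 262144 words.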

These options allow the behavior of the AXI4 interface to be optimized for the system in which it will operate. The efficiency of the operation does depend on these values being set accurately.

Automatic Port Width Resizing

In the Vitis kernel flow, Vitis HLS can automatically resize m_axi interface ports to 512 bits to improve burst access. However, automatic port width resizing supports only standard C data types; it does not support arbitrary-precision or aggregate types such as ap_int, ap_uint, struct, or array.

IMPORTANT: Structs on the interface prevent automatic widening of the port. You must break the struct into individual elements to enable this feature.

Vitis HLS controls automatic port width resizing using the following two commands:

  • config_interface -m_axi_max_widen_bitwidth <N>: Directs the tool to automatically widen bursts on M-AXI interfaces up to the specified bitwidth. The value of <N> must be a power of two up to 1024, or 0 to disable widening.
  • config_interface -m_axi_alignment_byte_size <N>: Directs the tool to assume that pointers mapped to m_axi interfaces are aligned to at least the provided width in bytes (a power of two). Because burst widening requires strong alignment, this assumption can help automatic burst widening.
In the Vitis Kernel flow automatic port width resizing is enabled by default with the following:
config_interface -m_axi_max_widen_bitwidth 512
config_interface -m_axi_alignment_byte_size 64
In the Vivado IP flow this feature is disabled by default:
config_interface -m_axi_max_widen_bitwidth 0
config_interface -m_axi_alignment_byte_size 0

Automatic port width resizing will only re-size the port if a burst access can be seen by the tool. Therefore all the preconditions needed for bursting, as described in Optimizing Burst Transfers, are also needed for port resizing. These conditions include:

  • Accesses must be in monotonically increasing order, both in terms of the memory location being accessed and in time. You cannot access a memory location that lies between two previously accessed memory locations; that is, accesses must not overlap.
  • The access pattern from the global memory should be in sequential order, and with the following additional requirements:
    • The sequential accesses need to be on a non-vector type
    • The start of the sequential accesses needs to be aligned to the widen word size
    • The length of the sequential accesses needs to be divisible by the widen factor
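The alignment and length conditions come down to simple arithmetic. In the following sketch, the widen factor is the number of elements that fit in one widened word (for example, 512 / 32 = 16 int elements). The function is a hypothetical illustration: it checks only the alignment and divisibility conditions listed above, not the monotonic-order requirement, and it is not how the tool itself performs the analysis.

```c
#include <stdbool.h>
#include <stdint.h>

static bool is_pow2(uint64_t x) { return x != 0 && (x & (x - 1)) == 0; }

/* Returns true if a sequential access of length_elems elements of
   elem_bytes each, starting at start_byte_addr, is aligned to the widened
   word and has a length divisible by the widen factor. */
bool widening_preconditions_met(uint64_t start_byte_addr,
                                uint64_t length_elems, uint64_t elem_bytes,
                                uint64_t widen_bits) {
  uint64_t widen_bytes = widen_bits / 8;
  if (!is_pow2(widen_bits) || elem_bytes == 0 || widen_bytes < elem_bytes) {
    return false;
  }
  uint64_t widen_factor = widen_bytes / elem_bytes; /* elements per wide word */
  bool aligned = (start_byte_addr % widen_bytes) == 0;
  bool divisible = (length_elems % widen_factor) == 0;
  return aligned && divisible;
}
```

For a 512-bit port and int elements, an access that starts at byte address 0 and reads 128 elements satisfies both conditions, while one starting at byte address 4, or one reading 100 elements, does not.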

The following code example is used in the calculations that follow:

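// Excerpt from the body of a vadd kernel: the enclosing function and its
// extern "C" block open before this point and close with the last two braces.
vadd_pipeline: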
  for (int i = 0; i < iterations; i++) {
#pragma HLS LOOP_TRIPCOUNT min = c_len/c_n max = c_len/c_n

  // Pipelining loops that access only one variable is the ideal way to
  // increase the global memory bandwidth.
  read_a:
    for (int x = 0; x < N; ++x) {
#pragma HLS LOOP_TRIPCOUNT min = c_n max = c_n
#pragma HLS PIPELINE II = 1
      result[x] = a[i * N + x];
    }

  read_b:
    for (int x = 0; x < N; ++x) {
#pragma HLS LOOP_TRIPCOUNT min = c_n max = c_n
#pragma HLS PIPELINE II = 1
      result[x] += b[i * N + x];
    }

  write_c:
    for (int x = 0; x < N; ++x) {
#pragma HLS LOOP_TRIPCOUNT min = c_n max = c_n
#pragma HLS PIPELINE II = 1
      c[i * N + x] = result[x];
    }
  }
}
}

For the code above, the tool determines the width of the automatically widened port in three steps:

  1. The tool checks the number of accesses in the read_a loop. There is one access per loop iteration, so the optimization determines the interface bit-width as 32 = 32 * 1 (bit-width of the int variable * number of accesses).
  2. The tool tries to reach the default maximum specified by config_interface -m_axi_max_widen_bitwidth 512, using the following expression:
    length = ceil((loop bound of the inner loops) * (loop bound of the outer loops)) * (element bit-width * number of access patterns)
    • In the above code, the outer loop is an imperfect loop, so there are no burst transfers on the outer loop. The length therefore includes only the inner loop, and the formula is shortened to:
      length = ceil(loop bound of the inner loops) * (element bit-width * number of access patterns)

      or: length = ceil(128) * 32 = 4096, where 128 is the inner-loop bound and 32 is the width from step 1.

  3. Is the calculated length a power of 2? If yes, the length is capped to the width specified by m_axi_max_widen_bitwidth. Here 4096 is a power of 2, so the port is widened to 512 bits.
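The three steps can be expressed as a short calculation. The following sketch mirrors them for the inner-loop-only case described above. The function name is hypothetical, and the fallback when the length is not a power of 2 (returning the unwidened step-1 width) is an assumption, since the steps above do not say what happens in that case.

```c
#include <stdint.h>

static int is_pow2_u64(uint64_t x) { return x != 0 && (x & (x - 1)) == 0; }

uint64_t widened_port_bits(uint64_t elem_bits, uint64_t accesses_per_iter,
                           uint64_t inner_bound, uint64_t max_widen_bits) {
  /* Step 1: width needed per loop iteration. */
  uint64_t base_bits = elem_bits * accesses_per_iter;
  /* Step 2: length over the inner loop only (outer loop is imperfect). */
  uint64_t length = inner_bound * base_bits;
  /* Step 3: cap a power-of-2 length at the configured maximum. */
  if (!is_pow2_u64(length)) {
    return base_bits; /* assumption: no widening in this case */
  }
  return (length < max_widen_bits) ? length : max_widen_bits;
}
```

For the read_a loop, widened_port_bits(32, 1, 128, 512) yields 512, matching the result above.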

There are pros and cons to consider when using automatic port width resizing. The feature improves read latency from DDR because the tool reads a wide vector instead of a single element of the data type's size. However, it also uses more resources, because it must buffer the wide vector and shift the data to match the datapath width.

Creating an AXI4 Interface with 32-bit Address

By default, Vitis HLS implements the AXI4 port with a 64-bit address bus. However, some devices, such as the Zynq-7000, have a 32-bit address bus. In this case, you can implement the AXI4 interface with a 32-bit address bus by disabling the m_axi_addr64 interface configuration option as follows:
  1. Select Solution > Solution Settings.
  2. In the Solution Settings dialog box, click the General category, and either edit the existing config_interface command or click Add to add one.
  3. In the Edit or Add dialog box, select config_interface, and disable m_axi_addr64.
IMPORTANT: When you disable the m_axi_addr64 option, Vitis HLS implements all AXI4 interfaces in the design with a 32-bit address bus.

Customizing AXI4 Master Interfaces in IP Integrator

When you incorporate an HLS RTL design that uses an AXI4 master interface into a design in the Vivado IP integrator, you can customize the block. From the block diagram in IP integrator, select the HLS block, right-click, and select Customize Block to customize any of the settings provided. A complete description of the AXI4 parameters is provided in the Vivado Design Suite: AXI Reference Guide (UG1037).

The following figure shows the Re-Customize IP dialog box for the design shown below. This design includes an AXI4-Lite port.

Figure 9: Customizing AXI4 Master Interfaces in IP Integrator

AXI4-Lite Interface

Overview

An HLS IP or kernel can be controlled by a host application or an embedded processor using the Slave AXI4-Lite interface (s_axilite), which acts as a system bus for communication between the processor and the kernel. Using the s_axilite interface, the host or an embedded processor can start and stop the kernel, and read or write data to it. When Vitis HLS synthesizes the design, the s_axilite interface is implemented as an adapter that captures the data communicated from the host in registers on the adapter.

The AXI4-Lite interface performs several functions within a Vivado IP or Vitis kernel:

  • It maps a block-level control mechanism which can be used to start and stop the kernel.
  • It provides a channel for passing scalar arguments, pointers to scalar values, function return values, and address offsets for m_axi interfaces from the host to the IP or kernel.
  • For the Vitis Kernel flow:
    • The tool will automatically infer the s_axilite interface pragma to provide offsets to pointer arguments assigned to m_axi interfaces, scalar values, and function return type.
    • Vitis HLS lets you read from or write to a pointer to a scalar value when it is assigned to an s_axilite interface. Pointers are assigned to m_axi interfaces by default, so you must manually assign the pointer to the s_axilite interface using the INTERFACE pragma or directive:
      int top(int *a, int *b) {
      #pragma HLS interface s_axilite port=a
    • Bundle: Do not specify the bundle option for the s_axilite adapter in the Vitis Kernel flow. The tool will create a single s_axilite interface that will serve for the whole design.
      IMPORTANT: HLS will return an error if multiple bundles are specified for the Vitis Kernel flow.
    • Offset: The tool will automatically choose the offsets for the interface. Do not specify any offsets in this flow.
  • For the Vivado IP flow:
    • This flow will not use the s_axilite interface by default.
    • To use the s_axilite as a communication channel for scalar arguments, pointers to scalar values, offset to m_axi pointer address, and function return type, you must manually specify the INTERFACE pragma or directive.
    • Bundle: This flow supports multiple s_axilite interfaces, specified by bundle. Refer to S_AXILITE Bundle Rules for more information.
    • Offset: By default the tool will place the arguments in a sequential order starting from 0x10 in the control register map. Refer to S_AXILITE Offset Option for additional details.

S_AXILITE Example

The following example shows how Vitis HLS implements multiple arguments, including the function return, as an s_axilite interface. Because each pragma uses the same name for the bundle option, each of the ports is grouped into a single interface.

void example(char *a, char *b, char *c)
{
#pragma HLS INTERFACE mode=s_axilite port=return bundle=BUS_A
#pragma HLS INTERFACE mode=s_axilite port=a bundle=BUS_A
#pragma HLS INTERFACE mode=s_axilite port=b bundle=BUS_A
#pragma HLS INTERFACE mode=s_axilite port=c bundle=BUS_A
#pragma HLS INTERFACE mode=ap_vld port=b 

  *c += *a + *b;
}
TIP: If you do not specify the bundle option, Vitis HLS groups all arguments into a single s_axilite bundle and automatically names the port.
The synthesized example will be part of a system that has three important elements as shown in the figure below:
  1. Host application running on an x86 or embedded processor interacting with the IP or kernel
  2. SAXI Lite Adapter: The INTERFACE pragma implements an s_axilite adapter. The adapter has two primary functions: implementing the interface protocol to communicate with the host, and providing a Control Register Map to the IP or kernel.
  3. The HLS engine or function that implements the design logic
Figure 10: S_AXILITE Adapter

By default, Vitis HLS automatically assigns the address for each port that is grouped into an s_axilite interface. The size, or range of addresses assigned to a port is dependent on the argument data type and the port protocol used, as described below. You can also explicitly define the address using the offset option as discussed in S_AXILITE Offset Option.

  • Port a: By default, is implemented as ap_none. One 32-bit word is assigned for the data signal (0x10); only 8 bits are used because the argument data type is char. The remaining bits are unused.
  • Port b: Is implemented as ap_vld, as defined by the INTERFACE pragma in the example. The corresponding registers occupy two 32-bit words (8 bytes), divided into two sections as follows:
    • (0x18) Data signal: One word is assigned for the data signal; only 8 bits are used because the argument data type is char. The remaining bits are unused.
    • (0x1c) Control signal: One word is assigned for the control signal; only bit 0 (b_ap_vld) is used.
  • Port c: Because c is both read and written, it is an inout port and is implemented as ap_ovld by default. The corresponding registers occupy four 32-bit words (16 bytes), divided into four sections:
    • (0x20) Data signal of c_i: One word is assigned for the input data signal; only 8 bits are used because the argument data type is char. The rest are unused.
    • (0x24) Reserved space
    • (0x28) Data signal of c_o: One word is assigned for the output data signal; only 8 bits are used.
    • (0x2c) Control signal of c_o: One word is assigned for the ap_ovld control signal; only bit 0 (c_o_ap_vld) is used.

In operation, the host application initially starts the kernel by writing into the Control address space (0x00). The host/CPU completes the initial setup by writing into the other address spaces associated with the function arguments defined in the example.

The control signal for port b is asserted and only then can the HLS engine read ports a and b (port a is ap_none and does not have a control signal). Until that time the design is stalled and waiting for the valid register to be set for port b. Each time port b is read by the HLS engine the input valid register is cleared and the register resets to logic 0.

After the HLS engine finishes its computation, the output value on port c is stored in the c_o data register (0x28), and the corresponding valid bit (0x2c) is set for the host to read. After the host reads the data, the HLS engine sets the ap_done bit in the Control register (0x00) to mark the end of the IP computation.
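The host-side sequence described above can be sketched against the register offsets listed in the S_AXILITE Control Register Map for this example. The sketch assumes regs points to the s_axilite register space viewed as 32-bit words (on hardware, this would be a memory-mapped pointer provided by the platform). The helper names are hypothetical and are not the generated C driver API. It also writes the argument registers before setting ap_start, so that the inputs are in place when the engine starts; this ordering is an assumption of the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Byte offsets from the Control Register Map for example(a, b, c). */
#define CTRL_ADDR     0x00
#define A_DATA_ADDR   0x10
#define B_DATA_ADDR   0x18
#define B_CTRL_ADDR   0x1c
#define C_I_DATA_ADDR 0x20
#define C_O_DATA_ADDR 0x28
#define C_O_CTRL_ADDR 0x2c

#define AP_START   (1u << 0)
#define B_AP_VLD   (1u << 0)
#define C_O_AP_VLD (1u << 0)

static void reg_write(volatile uint32_t *regs, uint32_t byte_addr, uint32_t v) {
  regs[byte_addr / 4] = v;
}

static uint32_t reg_read(volatile uint32_t *regs, uint32_t byte_addr) {
  return regs[byte_addr / 4];
}

/* Writes the inputs, marks b as valid, then sets ap_start. */
void host_start(volatile uint32_t *regs, uint8_t a, uint8_t b, uint8_t c_in) {
  reg_write(regs, A_DATA_ADDR, a);
  reg_write(regs, B_DATA_ADDR, b);
  reg_write(regs, C_I_DATA_ADDR, c_in);
  reg_write(regs, B_CTRL_ADDR, B_AP_VLD);
  reg_write(regs, CTRL_ADDR, AP_START);
}

/* Returns true and stores the result if c_o_ap_vld is set. On hardware,
   this bit is clear-on-read, so a true result is reported only once. */
bool host_try_read_c(volatile uint32_t *regs, uint8_t *c_out) {
  if ((reg_read(regs, C_O_CTRL_ADDR) & C_O_AP_VLD) == 0) {
    return false;
  }
  *c_out = (uint8_t)(reg_read(regs, C_O_DATA_ADDR) & 0xFFu);
  return true;
}
```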

Vitis HLS reports the assigned addresses in the S_AXILITE Control Register Map, and also provides them in C Driver Files to aid in your software development. Using the s_axilite interface, you can output C driver files for use with code running on an embedded or x86 processor using provided C application program interface (API) functions, to let you control the hardware from your software.

S_AXILITE Control Register Map

Vitis HLS automatically generates a Control Register Map for controlling the Vivado IP or Vitis kernel and the ports grouped into the s_axilite interface. The register map, which is added to the generated RTL files, can be divided into two sections:
  1. Block-level control signals
  2. Function arguments mapped into the s_axilite interface
In the Vitis kernel flow, the block protocol is associated with the s_axilite interface by default. To change the default block protocol, specify the interface pragma as follows:
#pragma HLS INTERFACE mode=ap_ctrl_hs port=return
In the Vivado IP flow though, the block control protocol is assigned to its own interface, ap_ctrl, as seen in Interfaces for Vivado IP Flow. However, if you are using an s_axilite interface in your IP, you can also assign the block control protocol to that interface using the following INTERFACE pragmas, as an example:
#pragma HLS INTERFACE mode=s_axilite port=return bundle=BUS_A
#pragma HLS INTERFACE mode=ap_ctrl_hs port=return bundle=BUS_A

In the Control Register Map, Vitis HLS reserves addresses 0x00 through 0x0C for the block-level protocol signals and interrupt controls, as shown below:

Address   Description
0x00      Control signals
0x04      Global Interrupt Enable Register
0x08      IP Interrupt Enable Register (Read/Write)
0x0c      IP Interrupt Status Register (Read/TOW)

The Control signals register (0x00) contains ap_start, ap_done, ap_ready, and ap_idle; in the case of ap_ctrl_chain, the block protocol also contains ap_continue. These are the block-level interface signals, which are accessed through the s_axilite adapter.

To start the block operation, the ap_start bit in the Control register must be set to 1. The HLS engine then reads any inputs grouped into the AXI4-Lite slave interface from the registers in the interface.

When the block completes the operation, the ap_done, ap_idle, and ap_ready registers are set by the hardware output ports, and the results for any output ports grouped into the s_axilite interface can be read from the appropriate registers.

For function arguments, Vitis HLS automatically assigns the address for each argument or port that is assigned to the s_axilite interface. The tool will assign each port an offset starting from 0x10, the lower addresses being reserved for control signals. The size, or range of addresses assigned to a port is dependent on the argument data type and the port protocol used.

Because the variables grouped into an AXI4-Lite interface are function arguments which do not have a default value in the C code, none of the argument registers in the s_axilite interface can be assigned a default value. The registers can be implemented with a reset using the config_rtl command, but they cannot be assigned any other default value.

The Control Register Map generated by Vitis HLS for the ap_ctrl_hs block control protocol is provided below:

//------------------------Address Info-------------------
// 0x00 : Control signals
//        bit 0  - ap_start (Read/Write/COH)
//        bit 1  - ap_done (Read/COR)
//        bit 2  - ap_idle (Read)
//        bit 3  - ap_ready (Read)
//        bit 7  - auto_restart (Read/Write)
//        others - reserved
// 0x04 : Global Interrupt Enable Register
//        bit 0  - Global Interrupt Enable (Read/Write)
//        others - reserved
// 0x08 : IP Interrupt Enable Register (Read/Write)
//        bit 0  - enable ap_done interrupt (Read/Write)
//        bit 1  - enable ap_ready interrupt (Read/Write)
//        others - reserved
// 0x0c : IP Interrupt Status Register (Read/TOW)
//        bit 0  - ap_done (COR/TOW)
//        bit 1  - ap_ready (COR/TOW)
//        others - reserved
// 0x10 : Data signal of a
//        bit 7~0 - a[7:0] (Read/Write)
//        others  - reserved
// 0x14 : reserved
// 0x18 : Data signal of b
//        bit 7~0 - b[7:0] (Read/Write)
//        others  - reserved
// 0x1c : Control signal of b
//        bit 0  - b_ap_vld (Read/Write/SC)
//        others - reserved
// 0x20 : Data signal of c_i
//        bit 7~0 - c_i[7:0] (Read/Write)
//        others  - reserved
// 0x24 : reserved
// 0x28 : Data signal of c_o
//        bit 7~0 - c_o[7:0] (Read)
//        others  - reserved
// 0x2c : Control signal of c_o
//        bit 0  - c_o_ap_vld (Read/COR)
//        others - reserved
// (SC = Self Clear, COR = Clear on Read, TOW = Toggle on Write, COH = Clear on Handshake)
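To make the bit layout at offset 0x00 concrete, the following C sketch defines masks for the control register bits listed above and decodes a raw value read from that register. The macro, type, and function names are hypothetical and are not the names used in the generated driver header; only the bit positions come from the register map.

```c
#include <stdint.h>

/* Bit positions of the control register at offset 0x00, taken from the
   register map above. The names are illustrative, not generated ones. */
#define CTRL_AP_START     (1u << 0)
#define CTRL_AP_DONE      (1u << 1)
#define CTRL_AP_IDLE      (1u << 2)
#define CTRL_AP_READY     (1u << 3)
#define CTRL_AUTO_RESTART (1u << 7)

/* Decoded view of a single read of the control register. */
typedef struct {
    int start, done, idle, ready, auto_restart;
} ctrl_status;

static ctrl_status decode_ctrl(uint32_t reg)
{
    ctrl_status s;
    s.start        = (reg & CTRL_AP_START) != 0;
    s.done         = (reg & CTRL_AP_DONE) != 0;
    s.idle         = (reg & CTRL_AP_IDLE) != 0;
    s.ready        = (reg & CTRL_AP_READY) != 0;
    s.auto_restart = (reg & CTRL_AUTO_RESTART) != 0;
    return s;
}
```

For example, a value of 0x0E decodes as ap_done, ap_idle, and ap_ready set with ap_start clear. Because ap_done is marked COR, software that polls this register sees the done bit only once per completion.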

S_AXILITE and Port-Level Protocols

Port-level I/O protocols sequence data into and out of the HLS engine from the s_axilite adapter as seen in S_AXILITE Example. In the Vivado IP flow, you can assign port-level I/O protocols to the individual ports and signals bundled into an s_axilite interface. In the Vitis kernel flow, changing the default port-level I/O protocols is not recommended unless necessary. The tool assigns a default port protocol to a port depending on the type and direction of the argument associated with it. The port can contain one or more of the following:
  • Data signal for the argument
  • Valid signal (ap_vld/ap_ovld) to indicate when the data can be read
  • Acknowledge signal (ap_ack) to indicate when the data has been read

The default port protocol assignments for various argument types are as follows:

Argument Type                  Default    Supported
Scalar                         ap_none    ap_ack and ap_vld can also be used
Pointers/References (input)    ap_none    ap_ack and ap_vld can also be used
Pointers/References (output)   ap_vld     ap_none, ap_ack, and ap_ovld can also be used
Pointers/References (inout)    ap_ovld    ap_none, ap_ack, and ap_vld can also be used
IMPORTANT: Arrays default to ap_memory. The bram port protocol is not supported for arrays in an s_axilite interface.

The S_AXILITE Example groups port b into the s_axilite interface and specifies port b as using the ap_vld protocol with INTERFACE pragmas. As a result, the s_axilite adapter contains a register for the port b data, and a register for the port b input valid signal.

If the input valid register is not set to logic 1, the data in the b data register is not considered valid, and the design stalls and waits for the valid register to be set. Each time port b is read, Vitis HLS automatically clears the input valid register back to logic 0.
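The stall-and-clear behavior of the b input-valid register can be illustrated with a small software model. The following C sketch is a hypothetical behavioral model, not generated code or an HLS API: a read succeeds only while the valid flag is 1, and a successful read clears the flag, matching the Self Clear (SC) attribute of b_ap_vld in the register map.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of an ap_vld input port in the s_axilite adapter:
   one data register (b[7:0]) plus one input-valid register (b_ap_vld). */
typedef struct {
    uint8_t data;
    bool    vld;
} vld_reg;

/* Host side: write the data first, then set the valid flag. */
static void host_write(vld_reg *r, uint8_t value)
{
    r->data = value;
    r->vld  = true;
}

/* Design side: returns false (the design would stall) while the valid
   flag is 0; otherwise returns the data and self-clears the flag. */
static bool design_try_read(vld_reg *r, uint8_t *out)
{
    if (!r->vld)
        return false;
    *out   = r->data;
    r->vld = false;   /* self clear after the read */
    return true;
}
```

Because the flag clears on every read, the host must set b_ap_vld again before each transaction that should consume a new value of b.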

Note: To simplify the operation of your design, Xilinx recommends that you use the default port protocols associated with the s_axilite interface.

S_AXILITE Bundle Rules

In the S_AXILITE Example all the function arguments are grouped into a single s_axilite interface adapter specified by the bundle=BUS_A option in the INTERFACE pragma. The bundle option simply lets you group ports together into one interface.

In the Vitis kernel flow there should be only a single interface bundle, which the tool typically names s_axi_control. Therefore, do not specify the bundle option in that flow; doing so is likely to cause an error during synthesis. In the Vivado IP flow, however, you can specify multiple bundles using the s_axilite interface, and the tool creates a separate interface adapter for each bundle you define. The following example shows this:
void example(char *a, char *b, char *c)
{
#pragma HLS INTERFACE mode=s_axilite port=a bundle=BUS_A
#pragma HLS INTERFACE mode=s_axilite port=b bundle=BUS_A
#pragma HLS INTERFACE mode=s_axilite port=c bundle=OUT
#pragma HLS INTERFACE mode=s_axilite port=return bundle=BUS_A
#pragma HLS INTERFACE mode=ap_vld port=b
  *c += *a + *b;
}

After synthesis completes, the Synthesis Summary report provides feedback regarding the number of s_axilite adapters generated. The SW-to-HW Mapping section of the report contains the HW info showing the control register offset and the address range for each port.

The following rules apply when using bundles with the s_axilite interface.

  1. Default Bundle Names: This rule explicitly groups all interface ports with no bundle name into the same AXI4-Lite interface port, uses the tool default bundle name, and names the RTL port s_axi_<default>, typically s_axi_control.
    In this example all ports are mapped to the default bundle:
    void top(char *a, char *b, char *c)
    {
    #pragma HLS INTERFACE mode=s_axilite port=a
    #pragma HLS INTERFACE mode=s_axilite port=b
    #pragma HLS INTERFACE mode=s_axilite port=c
         *c += *a + *b;
    }
  2. User-Specified Bundle Names: This rule explicitly groups all interface ports with the same bundle name into the same AXI4-Lite interface port, and names the RTL port s_axi_<string>, where <string> is the specified bundle name.
    The following example results in interfaces named s_axi_BUS_A, s_axi_BUS_B, and s_axi_OUT:
    void example(char *a, char *b, char *c)
    {
    #pragma HLS INTERFACE mode=s_axilite port=a bundle=BUS_A
    #pragma HLS INTERFACE mode=s_axilite port=b bundle=BUS_B
    #pragma HLS INTERFACE mode=s_axilite port=c bundle=OUT
    #pragma HLS INTERFACE mode=s_axilite port=return bundle=OUT
    #pragma HLS INTERFACE mode=ap_vld port=b
         *c += *a + *b;
    }
  3. Partially Specified Bundle Names: If you specify bundle names for some arguments, but leave other arguments unassigned, then the tool will bundle the arguments as follows:
    • Group all ports into the specified bundles as indicated by the INTERFACE pragmas.
    • Group any ports without bundle assignments into a default named bundle. The default name can either be the standard tool default, or an alternative default name if the tool default has already been specified by the user.

    In the following example the user has specified bundle=control, which is the tool default name. In this case, port c will be assigned to s_axi_control as specified by the user, and the remaining ports will be bundled under s_axi_control_r, which is an alternative default name used by the tool.

    void top(char *a, char *b, char *c) {
    #pragma HLS INTERFACE mode=s_axilite port=a
    #pragma HLS INTERFACE mode=s_axilite port=b
    #pragma HLS INTERFACE mode=s_axilite port=c bundle=control
         *c += *a + *b;
    }

S_AXILITE Offset Option

Note: The Vitis kernel flow determines the required offsets. Do not specify the offset option in that flow.

In the Vivado IP flow, Vitis HLS defines the size, or range of addresses assigned to a port in the S_AXILITE Control Register Map depending on the argument data type and the port protocol used. However, the INTERFACE pragma also contains an offset option that lets you specify the address offset in the AXI4-Lite interface.

When specifying the offset for an argument, you must consider the size of the data and reserve additional space for the port control protocol. The address range you reserve is measured in 32-bit words: reserve enough 32-bit words to fit the argument data type, and then reserve one additional word for the control protocol, even for ap_none.

TIP: In the case of the ap_memory protocol for arrays, you do not need to reserve the extra word for the control protocol. In this case, simply reserve enough 32-bit words to fit your argument data type.

For example, to reserve enough space for a double, you need two 32-bit words for the 64-bit data type plus one additional 32-bit word for the control protocol, for a total of three 32-bit words (96 bits, or 12 bytes). If your argument offset starts at 0x020, the next available offset begins at 0x02c (0x020 + 0xc).
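The same arithmetic applies to any argument. The following C sketch computes the next free offset from the data width in bits; the helper names are hypothetical and not part of any Vitis HLS API. It assumes byte addressing with 4 bytes per 32-bit word, and for an ap_memory array it assumes you pass the total number of bits in the array.

```c
#include <stdint.h>

/* Round a data width in bits up to whole 32-bit register words. */
static uint32_t words_for_bits(uint32_t data_bits)
{
    return (data_bits + 31u) / 32u;
}

/* Hypothetical helper: first free byte offset after an argument placed at
   'offset'. Non-ap_memory ports get one extra word for the port control
   protocol (even ap_none); ap_memory arrays do not. */
static uint32_t next_offset(uint32_t offset, uint32_t data_bits, int is_ap_memory)
{
    uint32_t words = words_for_bits(data_bits);
    if (!is_ap_memory)
        words += 1u;          /* control-protocol word */
    return offset + words * 4u;   /* 4 bytes per 32-bit word */
}
```

For the 8-bit argument a at 0x10 in the register map above, the sketch returns 0x18, which matches the offset at which the data register for b begins.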

If you make a mistake in setting the offset of your arguments by not reserving enough address range to fit the data type and the control protocol, Vitis HLS recognizes the error, warns you of the issue, and recovers by moving the misplaced argument register to the end of the Control Register Map. This allows your build to proceed, but the result may not work with a host application or driver written to expect your specified offset.

C Driver Files

When an AXI4-Lite slave interface is implemented, a set of C driver files is automatically created. These C driver files provide a set of APIs that can be integrated into any software running on a CPU and used to communicate with the device through the AXI4-Lite slave interface.

The C driver files are created when the design is packaged as IP in the IP catalog.

Driver files are created for standalone and Linux modes. In standalone mode, the drivers are used in the same way as any other Xilinx standalone drivers. In Linux mode, copy all the C source files (.c) and header files (.h) into the software project.

The driver files and API functions derive their name from the top-level function for synthesis. In the above example, the top-level function is called “example”. If the top-level function was named “DUT” the name “example” would be replaced by “DUT” in the following description. The driver files are created in the packaged IP (located in the impl directory inside the solution).

Table 1. C Driver Files for a Design Named example
File Path               Usage Mode   Description
data/example.mdd        Standalone   Driver definition file.
data/example.tcl        Standalone   Used by SDK to integrate the software into an SDK project.
src/xexample_hw.h       Both         Defines address offsets for all internal registers.
src/xexample.h          Both         API definitions.
src/xexample.c          Both         Standard API implementations.
src/xexample_sinit.c    Standalone   Initialization API implementations.
src/xexample_linux.c    Linux        Initialization API implementations.
src/Makefile            Standalone   Makefile.

In file xexample.h, two structs are defined.

XExample_Config
This is used to hold the configuration information (base address of each AXI4-Lite slave interface) of the IP instance.
XExample
This is used to hold the IP instance pointer. Most APIs take this instance pointer as the first argument.

The standard API implementations are provided in files xexample.c, xexample_sinit.c, xexample_linux.c, and provide functions to perform the following operations.

  • Initialize the device
  • Control the device and query its status
  • Read/write to the registers
  • Set up, monitor, and control the interrupts

Refer to Vitis HLS C Driver Reference for a description of the API functions provided in the C driver files.

IMPORTANT: The C driver APIs always use an unsigned 32-bit type (U32). You might be required to cast the data in the C code into the expected type.
C Driver Files and Float Types

C driver files always use a 32-bit unsigned integer (U32) for data transfers. In the following example, the function uses the float type arguments a and r1. The software sets the value of a and reads back the value of r1:

float caculate(float a, float *r1)
{
#pragma HLS INTERFACE mode=ap_vld register port=r1
#pragma HLS INTERFACE mode=s_axilite port=a 
#pragma HLS INTERFACE mode=s_axilite port=r1 
#pragma HLS INTERFACE mode=s_axilite port=return 

 *r1 = 0.5f*a;
 return (a>0);
}

After synthesis, Vitis HLS groups all ports into the default AXI4-Lite interface and creates C driver files. However, as shown in the following example, the driver files use type U32:

// API to set the value of A
void XCaculate_SetA(XCaculate *InstancePtr, u32 Data) {
    Xil_AssertVoid(InstancePtr != NULL);
    Xil_AssertVoid(InstancePtr->IsReady == XIL_COMPONENT_IS_READY);
    XCaculate_WriteReg(InstancePtr->Hls_periph_bus_BaseAddress,
                       XCACULATE_HLS_PERIPH_BUS_ADDR_A_DATA, Data);
}

// API to get the value of R1
u32 XCaculate_GetR1(XCaculate *InstancePtr) {
    u32 Data;

    Xil_AssertNonvoid(InstancePtr != NULL);
    Xil_AssertNonvoid(InstancePtr->IsReady == XIL_COMPONENT_IS_READY);

    Data = XCaculate_ReadReg(InstancePtr->Hls_periph_bus_BaseAddress,
                             XCACULATE_HLS_PERIPH_BUS_ADDR_R1_DATA);
    return Data;
}

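If you pass float values directly to these functions, the C compiler converts them numerically to u32 (for example, 3.0f becomes 3), so the values written and read do not match the float bit patterns that the hardware ports use. When using these functions in software, you can use the following casts in the code: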

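float a = 3.0f, r1;
u32 ur1;
XCaculate caculate;   // driver instance, assumed to be initialized earlier

// cast float "a" to type U32 by reinterpreting its bits
XCaculate_SetA(&caculate, *((u32*)&a));
ur1 = XCaculate_GetR1(&caculate);

// cast return type U32 to float type for "r1"
r1 = *((float*)&ur1);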
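A portable alternative to the pointer casts above is to copy the bytes with memcpy, which avoids undefined behavior under the C strict-aliasing rules while producing the same bit pattern. The following C sketch uses uint32_t in place of the driver's u32 type, and the helper names are hypothetical. With it, you would pass float_to_bits(a) to XCaculate_SetA and apply bits_to_float to the value returned by XCaculate_GetR1.

```c
#include <stdint.h>
#include <string.h>

/* The bit copy only makes sense if float is exactly 32 bits wide. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Reinterpret the IEEE-754 bits of a float as a 32-bit register word. */
static uint32_t float_to_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Reinterpret a 32-bit register word as a float. */
static float bits_to_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}
```

For example, 3.0f is written to the register as 0x40400000, and a register value of 0x3FC00000 reads back as 1.5f (the expected r1 for a = 3.0f).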

Controlling Hardware

TIP: The example provided below demonstrates the ap_ctrl_hs block control protocol, which is the default for the Vivado IP flow. Refer to Block-Level Control Protocols for more information and a description of the ap_ctrl_chain protocol which is the default for the Vitis kernel flow.

In this example, the hardware header file xexample_hw.h provides a complete list of the memory mapped locations for the ports grouped into the AXI4-Lite slave interface, as described in S_AXILITE Control Register Map.

// 0x00 : Control signals
//        bit 0  - ap_start (Read/Write/SC)
//        bit 1  - ap_done (Read/COR)
//        bit 2  - ap_idle (Read)
//        bit 3  - ap_ready (Read)
//        bit 7  - auto_restart (Read/Write)
//        others - reserved
// 0x04 : Global Interrupt Enable Register
//        bit 0  - Global Interrupt Enable (Read/Write)
//        others - reserved
// 0x08 : IP Interrupt Enable Register (Read/Write)
//        bit 0  - Channel 0 (ap_done)
//        bit 1  - Channel 1 (ap_ready)
// 0x0c : IP Interrupt Status Register (Read/TOW)
//        bit 0  - Channel 0 (ap_done)
//        others - reserved
// 0x10 : Data signal of a
//        bit 7~0 - a[7:0] (Read/Write)
//        others  - reserved
// 0x14 : reserved
// 0x18 : Data signal of b
//        bit 7~0 - b[7:0] (Read/Write)
//        others  - reserved
// 0x1c : reserved
// 0x20 : Data signal of c_i
//        bit 7~0 - c_i[7:0] (Read/Write)
//        others  - reserved
// 0x24 : reserved
// 0x28 : Data signal of c_o
//        bit 7~0 - c_o[7:0] (Read)
//        others  - reserved
// 0x2c : Control signal of c_o
//        bit 0  - c_o_ap_vld (Read/COR)
//        others - reserved
// (SC = Self Clear, COR = Clear on Read, TOW = Toggle on Write, COH = Clear on Handshake)

To correctly program the registers in the s_axilite interface, you must understand how the hardware ports operate with the default port protocols, or the custom protocols as described in S_AXILITE and Port-Level Protocols.

For example, to start the block operation, the ap_start register must be set to 1. The device then reads any inputs grouped into the AXI4-Lite slave interface from the registers in the interface. When the block completes operation, the ap_done, ap_idle, and ap_ready registers are set by the hardware output ports, and the results for any output ports grouped into the AXI4-Lite slave interface can be read from the appropriate registers.

The implementation of function argument c in the example highlights the importance of understanding how the hardware ports operate. Function argument c is both read and written, and is therefore implemented as separate input and output ports c_i and c_o, as explained in S_AXILITE Example.

The first recommended flow for programming the s_axilite interface is for a one-time execution of the function:

  • Use the interrupt functions in the standard API implementations provided in the C Driver Files to configure how the interrupt operates.
  • Load the register values for the block input ports. In the above example this is performed using API functions XExample_Set_a, XExample_Set_b, and XExample_Set_c_i.
  • Set the ap_start bit to 1 using XExample_Start to start executing the function. This register is self-clearing as noted in the header file above. After one transaction, the block will suspend operation.
  • Allow the function to execute. Address any interrupts which are generated.
  • Read the output registers. In the above example this is performed using API functions XExample_Get_c_o_vld, to confirm the data is valid, and XExample_Get_c_o.
    Note: The registers in the s_axilite interface obey the same I/O protocol as the ports. In this case, the output valid register is set to logic 1 to indicate that the data is valid.
  • Repeat for the next transaction.
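At the register level, the one-time flow comes down to setting ap_start and then polling ap_done at offset 0x00. The following C sketch runs against a hypothetical in-memory register array instead of the device, with fake_hw_step standing in for the hardware; none of these names come from the generated driver. Preserving bit 7 (auto_restart) when setting ap_start is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define CTRL_OFFSET   0x00u        /* control register offset */
#define AP_START      (1u << 0)
#define AP_DONE       (1u << 1)
#define AP_IDLE       (1u << 2)
#define AUTO_RESTART  (1u << 7)

/* Hypothetical stand-in for the AXI4-Lite register space (word-indexed). */
static uint32_t regs[16];

static uint32_t rd(uint32_t off)             { return regs[off / 4u]; }
static void     wr(uint32_t off, uint32_t v) { regs[off / 4u] = v; }

/* Stand-in for the hardware: a started block finishes immediately,
   clearing ap_start and setting ap_done and ap_idle. */
static void fake_hw_step(void)
{
    uint32_t c = rd(CTRL_OFFSET);
    if (c & AP_START)
        wr(CTRL_OFFSET, (c & AUTO_RESTART) | AP_DONE | AP_IDLE);
}

/* Set ap_start without disturbing the auto_restart bit. */
static void start(void)
{
    wr(CTRL_OFFSET, (rd(CTRL_OFFSET) & AUTO_RESTART) | AP_START);
}

/* Poll until ap_done is seen; ap_done is clear-on-read, which the model
   imitates by clearing the bit after it has been observed. */
static bool wait_done(int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        fake_hw_step();
        uint32_t c = rd(CTRL_OFFSET);
        if (c & AP_DONE) {
            wr(CTRL_OFFSET, c & ~AP_DONE);
            return true;
        }
    }
    return false;
}
```

On real hardware, the driver's read and write functions replace rd and wr, and the hardware itself updates the status bits.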

The second recommended flow is for continuous execution of the block. In this mode, the input ports included in the AXI4-Lite interface should only be ports which perform configuration. The block will typically run much faster than a CPU. If the block must wait for inputs, the block will spend most of its time waiting:

  • Use the interrupt functions to configure how the interrupt operates.
  • Load the register values for the block input ports. In the above example this is performed using API functions XExample_Set_a, XExample_Set_b, and XExample_Set_c_i.
  • Enable automatic restart using the API function XExample_EnableAutoRestart.
  • Allow the function to execute. The individual port I/O protocols will synchronize the data being processed through the block.
  • Address any interrupts which are generated. The output registers could be accessed during this operation but the data may change often.
  • Use the API function XExample_DisableAutoRestart to prevent any more executions.
  • Read the output registers. In the above example this is performed using API functions XExample_Get_c_o and XExample_Get_c_o_vld.

Controlling Software

The API functions can be used in the software running on the CPU to control the hardware block. An overview of the process is:

  • Create an instance of the hardware
  • Look up the device configuration
  • Initialize the device
  • Set the input parameters of the HLS block
  • Start the device and read the results

An example application is shown below.

#include "xexample.h"    // Device driver for HLS HW block
#include "xparameters.h"
#include "xil_printf.h"  // print() and xil_printf()

// HLS HW instance
XExample HlsExample;
XExample_Config *ExamplePtr;

int main() {
 int status;
 int res_hw;

// Look up the device configuration
 ExamplePtr = XExample_LookupConfig(XPAR_XEXAMPLE_0_DEVICE_ID);
 if (!ExamplePtr) {
  print("ERROR: Lookup of accelerator configuration failed.\n\r");
  return XST_FAILURE;
 }

// Initialize the device
 status = XExample_CfgInitialize(&HlsExample, ExamplePtr);
 if (status != XST_SUCCESS) {
  print("ERROR: Could not initialize accelerator.\n\r");
  return XST_FAILURE;
 }

// Set the input parameters of the HLS block
 XExample_Set_a(&HlsExample, 42);
 XExample_Set_b(&HlsExample, 12);
 XExample_Set_c_i(&HlsExample, 1);

// Start the device and read the results
 XExample_Start(&HlsExample);
 while (XExample_Get_c_o_vld(&HlsExample) == 0)
  ;  // wait for valid data output
 res_hw = XExample_Get_c_o(&HlsExample);
 print("Detected HLS peripheral complete. Result received.\n\r");
 xil_printf("Result: %d\n\r", res_hw);
 return XST_SUCCESS;
}

Control Clock and Reset in AXI4-Lite Interfaces

Note: If you instantiate the slave AXI4-Lite register file in a bus fabric that uses a different clock frequency, Vivado IP integrator automatically generates a clock domain crossing (CDC) slice that performs the same function as the control clock described below, which makes the clock option unnecessary.

By default, Vitis HLS uses the same clock for the AXI4-Lite interface and the synthesized design. Vitis HLS connects all registers in the AXI4-Lite interface to the clock used for the synthesized logic (ap_clk).

Optionally, you can use the INTERFACE directive clock option to specify a separate clock for each AXI4-Lite port. When connecting the clock to the AXI4-Lite interface, the following requirements apply:

  • AXI4-Lite interface clock must be synchronous to the clock used for the synthesized logic (ap_clk). That is, both clocks must be derived from the same master generator clock.
  • AXI4-Lite interface clock frequency must be equal to or less than the frequency of the clock used for the synthesized logic (ap_clk).

If you use the clock option with the INTERFACE directive, you only need to specify the clock option on one function argument in each bundle. Vitis HLS implements all other function arguments in the bundle with the same clock and reset. Vitis HLS names the generated reset signal with the prefix ap_rst_ followed by the clock name. The generated reset signal is active-Low independent of the config_rtl command.

The following example shows how Vitis HLS groups function arguments a and b into an AXI4-Lite port with a clock named AXI_clk1 and an associated reset port.

// Default AXI-Lite interface implemented with independent clock called AXI_clk1
#pragma HLS interface mode=s_axilite port=a clock=AXI_clk1
#pragma HLS interface mode=s_axilite port=b

In the following example, Vitis HLS groups function arguments c and d into AXI4-Lite port CTRL1 with a separate clock called AXI_clk2 and an associated reset port.

// CTRL1 AXI-Lite bundle implemented with a separate clock (called AXI_clk2)
#pragma HLS interface mode=s_axilite port=c bundle=CTRL1 clock=AXI_clk2
#pragma HLS interface mode=s_axilite port=d bundle=CTRL1

Customizing AXI4-Lite Slave Interfaces in IP Integrator

When an HLS RTL design using an AXI4-Lite slave interface is incorporated into a design in Vivado IP integrator, you can customize the block. From the block diagram in IP integrator, select the HLS block, right-click, and select Customize Block.

The address width is configured by default to the minimum required size. Modify this setting to connect to blocks with address sizes less than 32 bits.

Figure 11: Customizing AXI4-Lite Slave Interfaces in IP Integrator

AXI4-Stream Interfaces

An AXI4-Stream interface can be applied to any input argument and any array or pointer output argument. Because an AXI4-Stream interface transfers data in a sequential streaming manner, it cannot be used with arguments that are both read and written. In terms of data layout, the data type of the AXI4-Stream is aligned to the next byte. For example, if the size of the data type is 12 bits, it will be extended to 16 bits. Depending on whether a signed/unsigned interface is selected, the extended bits are either sign-extended or zero-extended. If the stream data type is a user-defined struct, the struct is aggregated and aligned to the size of the largest data element within the struct.
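The two extension rules can be illustrated for a 12-bit value carried in a 16-bit field. The following C sketch models only the bit-level result; it is not how Vitis HLS implements the interface, and the function names are hypothetical.

```c
#include <stdint.h>

/* Sketch (hypothetical helpers): widen a 12-bit value, held in the low
   bits of 'raw', to the 16-bit byte-aligned width described above. */

/* Unsigned interface: the 4 extension bits are filled with 0. */
static uint16_t zero_extend12(uint16_t raw)
{
    return (uint16_t)(raw & 0x0FFFu);
}

/* Signed interface: the 4 extension bits copy bit 11 (the sign bit).
   Subtracting 2^12 from a negative 12-bit pattern yields the same
   two's-complement value in 16 bits. */
static int16_t sign_extend12(uint16_t raw)
{
    int32_t v = (int32_t)(raw & 0x0FFFu);
    if (v & 0x0800)
        v -= 0x1000;
    return (int16_t)v;
}
```

For the 12-bit pattern 0xFFF, the unsigned interface carries 0x0FFF (4095), while the signed interface carries 0xFFFF (-1).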

TIP: The maximum supported port width is 4096 bits, even for aggregated structs or reshaped arrays.

The following code examples show how the packed alignment depends on your struct type. If the struct contains only char type, as shown in the following example, then it will be packed with alignment of one byte. Total size of the struct will be two bytes:

struct A {
  char foo;
  char bar;
};

However, if the struct has elements with different data types, as shown below, then it will be packed and aligned to the size of the largest data element, or four bytes in this example. Element bar will be padded with three bytes resulting in a total size of eight bytes for the struct:

struct A {
  int foo;
  char bar;
};
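On common host ABIs with a 4-byte int, a C compiler lays out these two structs the same way, so sizeof and offsetof give a quick sanity check of the sizes quoted above. This is a host-side check under that assumption, not output from Vitis HLS; the structs are renamed A_chars and A_mixed so that both can coexist in one file.

```c
#include <stddef.h>

/* The two structs from the text, renamed so both can be declared. */
struct A_chars { char foo; char bar; };   /* only char members: 1-byte alignment */
struct A_mixed { int foo;  char bar; };   /* aligned to the 4-byte int */

/* Sizes and the offset of bar, assuming a 4-byte int. */
static const size_t a_chars_size       = sizeof(struct A_chars);
static const size_t a_mixed_size       = sizeof(struct A_mixed);
static const size_t a_mixed_bar_offset = offsetof(struct A_mixed, bar);
```

With a 4-byte int, A_chars occupies two bytes, while A_mixed places bar at byte 4 and pads it with three bytes, for a total of eight bytes.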
IMPORTANT: Structs contained in AXI4-Stream interfaces (axis) are aggregated by default, and the stream itself cannot be disaggregated. If separate streams for member elements of the struct are desired then this must be manually coded as separate elements, resulting in a separate axis interface for each element.

How AXI4-Stream is Implemented

The AXI4-Stream interface is implemented as a struct type in Vitis HLS and has the following signature (defined in ap_axi_sdata.h):

template <typename T, size_t WUser, size_t WId, size_t WDest> struct axis { .. };

Where:

T
Stream data type
WUser
Width of the TUSER signal
WId
Width of the TID signal
WDest
Width of the TDEST signal

When the stream data type (T) is a simple integer type, there are two predefined types of AXI4-Stream implementations available:

  • A signed implementation of the AXI4-Stream class (or more simply ap_axis<WData, WUser, WId, WDest>)
    hls::axis<ap_int<WData>, WUser, WId, WDest>
  • An unsigned implementation of the AXI4-Stream class (or more simply ap_axiu<WData, WUser, WId, WDest>)
    hls::axis<ap_uint<WData>, WUser, WId, WDest>

The value specified for the WUser, WId, and WDest template parameters controls the usage of side-channel signals in the AXI4-Stream interface.

When the hls::axis class is used, the generated RTL will typically contain the actual data signal TDATA, and the following additional signals: TVALID, TREADY, TKEEP, TSTRB, TLAST, TUSER, TID, and TDEST.

TVALID, TREADY, and TLAST are necessary control signals for the AXI4-Stream protocol. TKEEP, TSTRB, TUSER, TID, and TDEST signals are special signals that can be used to pass around additional bookkeeping data.

TIP: If WUser, WId, and WDest are set to 0, the generated RTL will not include the TUSER, TID, and TDEST signals in the interface.

How AXI4-Stream Works

AXI4-Stream is a protocol designed for transporting arbitrary unidirectional data. In an AXI4-Stream, one TDATA-wide word (a beat) can be transferred per clock cycle. The producer asserts TVALID when it has data on TDATA, and the consumer asserts TREADY when it can accept data; a transfer occurs on each clock cycle in which both TVALID and TREADY are High. Along with TDATA, the producer drives TLAST (and TUSER, if needed to carry additional user-defined sideband data). TLAST marks the last transfer of a packet, so the consumer keeps consuming the incoming TDATA until TLAST is asserted.
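The handshake rule can be illustrated with a small cycle-level model in C. This is a behavioral sketch with hypothetical names, not RTL and not a Vitis HLS API: it only records which cycles complete a transfer for given TVALID and TREADY patterns.

```c
#include <stdbool.h>
#include <stddef.h>

/* Cycle-by-cycle sketch of the AXI4-Stream handshake. On cycle c the
   producer drives tvalid[c] and the consumer drives tready[c]; one beat
   of TDATA moves only when both are high. The packet has 'nbeats' beats
   and TLAST is driven with the final one. Returns the number of beats
   moved within 'ncycles' cycles and stores the cycle on which the TLAST
   beat moved in *last_cycle (-1 if the packet did not finish). */
static size_t transfer_beats(const bool *tvalid, const bool *tready,
                             size_t ncycles, size_t nbeats, long *last_cycle)
{
    size_t moved = 0;
    *last_cycle = -1;
    for (size_t c = 0; c < ncycles && moved < nbeats; c++) {
        if (tvalid[c] && tready[c]) {   /* handshake completes this cycle */
            moved++;
            if (moved == nbeats)        /* this beat carried TLAST */
                *last_cycle = (long)c;
        }
    }
    return moved;
}
```

Cycles where either side deasserts its signal simply stall the stream; no data is lost or duplicated, which is why the consumer can rely on TLAST alone to find the end of a packet.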

Figure 12: AXI4-Stream Handshake

AXI4-Stream has additional optional features, such as sending positional data with the TKEEP and TSTRB ports, which make it possible to multiplex both the data position and the data itself on the TDATA signal. Using the TID and TDEST signals, you can route streams, because these fields roughly correspond to a stream identifier and a stream destination identifier. Refer to Vivado Design Suite: AXI Reference Guide (UG1037) or the AMBA AXI4-Stream Protocol Specification (ARM IHI 0051A) for more information.

Registered AXI4-Stream Interfaces

By default, AXI4-Stream interfaces are implemented as registered interfaces to ensure that no combinational feedback paths are created when multiple HLS IP blocks with AXI4-Stream interfaces are integrated into a larger design. For AXI4-Stream interfaces, four register modes are provided to control how the interface registers are implemented:

Forward
Only the TDATA and TVALID signals are registered.
Reverse
Only the TREADY signal is registered.
Both
All signals (TDATA, TREADY, and TVALID) are registered. This is the default.
Off
None of the port signals are registered.

The AXI4-Stream side-channel signals are considered to be data signals and are registered whenever TDATA is registered.

Note: When connecting HLS generated IP blocks with AXI4-Stream interfaces at least one interface should be implemented as a registered interface or the blocks should be connected via an AXI4-Stream Register Slice.

There are two basic methods to use an AXI4-Stream in your design:

  • Use an AXI4-Stream without side-channels.
  • Use an AXI4-Stream with side-channels.

This second use model provides additional functionality, allowing the optional side-channels which are part of the AXI4-Stream standard, to be used directly in your C/C++ code.

AXI4-Stream Interfaces without Side-Channels

An AXI4-Stream is used without side-channels when the ap_axis or ap_axiu data type of the function argument does not contain any AXI4 side-channel elements (that is, when the WUser, WId, and WDest parameters are set to 0). In the following example, both interfaces are implemented using an AXI4-Stream:

#include "ap_axi_sdata.h"
#include "hls_stream.h"

typedef ap_axiu<32, 0, 0, 0> trans_pkt;

void example(hls::stream< trans_pkt > &A, hls::stream< trans_pkt > &B)
{
#pragma HLS INTERFACE mode=axis port=A
#pragma HLS INTERFACE mode=axis port=B
    trans_pkt tmp;
    A.read(tmp);
    tmp.data += 5;
    B.write(tmp);
}

After synthesis, both arguments are implemented with a data port (TDATA) and the standard AXI4-Stream protocol ports, TVALID, TREADY, TKEEP, TLAST, and TSTRB, as shown in the following figure.

Figure 13: AXI4-Stream Interfaces without Side-Channels
TIP: If you specify an hls::stream object with a data type other than ap_axis or ap_axiu, the tool will infer an AXI4-Stream interface without the TLAST signal, or any of the side-channel signals. This implementation of the AXI4-Stream interface consumes fewer device resources, but offers no visibility into when the stream is ending.

Multiple variables can be combined into the same AXI4-Stream interface by using a struct, which Vitis HLS aggregates by default. Aggregating the elements of a struct into a single wide vector allows all elements of the struct to be implemented in the same AXI4-Stream interface.

AXI4-Stream Interfaces with Side-Channels

The following example shows how the side-channels can be used directly in the C/C++ code and implemented on the interface. The code uses #include "ap_axi_sdata.h" to provide an API to handle the side-channels of the AXI4-Stream interface. In the following example, an unsigned 32-bit data type (ap_axiu) is used:

#include "ap_axi_sdata.h"
#include "ap_int.h"
#include "hls_stream.h"

#define DWIDTH 32

typedef ap_axiu<DWIDTH, 1, 1, 1> trans_pkt;

extern "C"{
    void krnl_stream_vmult(hls::stream<trans_pkt> &A,
                           hls::stream<trans_pkt> &B) {
#pragma HLS INTERFACE mode=axis port=A
#pragma HLS INTERFACE mode=axis port=B
#pragma HLS INTERFACE mode=s_axilite port=return bundle=control
        bool eos = false;
        
        vmult: do {
#pragma HLS PIPELINE II=1
            trans_pkt t2 = A.read();
            
            // Packet for Output
            trans_pkt t_out;
            
            // Reading data from input packet
            ap_uint<DWIDTH> in2 = t2.data;
            ap_uint<DWIDTH> tmpOut = in2 * 5;

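            // Setting data and configuration to output packet
            t_out.data = tmpOut;
            t_out.last = t2.last;
            t_out.keep = -1; // Enabling all bytes
            t_out.strb = -1; // Marking all bytes as data bytes
            // Passing the remaining side-channels through unchanged
            t_out.user = t2.user;
            t_out.id = t2.id;
            t_out.dest = t2.dest;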
            // Writing packet to output stream
            B.write(t_out);
            if (t2.last) {
               eos = true;
            }
        } while (eos == false);
    }
}

After synthesis, both the A and B arguments are implemented with data ports, the standard AXI4-Stream protocol ports, TVALID and TREADY and all of the optional ports described in the struct.

Figure 14: AXI4-Stream Interfaces with Side-Channels

Coding Style for Array to Stream

You should perform all the operations on temp variables. Read the input stream, process the temp variable, and write the output stream, as shown in the example below. This approach lets you preserve the sequential reading and writing of the stream of data, rather than attempting multiple or random reads or writes.

struct A {
  short varA;
  int varB;
};
 
void dut(A in[N], A out[N], bool flag) {
  #pragma HLS interface mode=axis port=in,out
  for (unsigned i=0; i<N; i++) {
    A tmp = in[i];
    if (flag)
      tmp.varB += 5;
    out[i] = tmp;
  }
}

Not following this coding style can lead to functional failures in stream processing.

Port-Level I/O Protocols

IMPORTANT: The port-level I/O protocols of interfaces defined in the Vitis kernel flow are set by design and should not be modified as a general rule.

By default, input pointers and pass-by-value arguments are implemented as simple wire ports with no associated handshaking signal. For example, in the vadd function discussed in Interfaces for Vivado IP Flow, the input ports are implemented without an I/O protocol, only a data port. If the port has no I/O protocol (by default or by design), the input data must be held stable until it is read.

By default, output pointers are implemented with an associated output valid signal to indicate when the output data is valid. In the vadd function example, the output port is implemented with an associated output valid port (out_r_o_ap_vld) which indicates when the data on the port is valid and can be read. If there is no I/O protocol associated with the output port, it is difficult to know when to read the data.
TIP: It is always a good idea to use an I/O protocol on an output.

Function arguments which are both read from and written to are split into separate input and output ports. In the vadd function example, the out_r argument is implemented as both an input port out_r_i, and an output port out_r_o with associated I/O protocol port out_r_o_ap_vld.

If the function has a return value, an output port ap_return is implemented to provide the return value. When the RTL design completes one transaction (equivalent to one execution of the C/C++ function), the block-level protocol indicates that the function is complete with the ap_done signal. This also indicates that the data on port ap_return is valid and can be read.

Note: The return value of the top-level function cannot be a pointer.
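The following sketch applies the defaults above to a hypothetical top-level function, accumulate (it is not the vadd example referenced earlier), which has a pass-by-value input, an in-out pointer, and a return value. The port names in the comments are expectations derived from the naming pattern of the vadd ports (out_r_i, out_r_o, out_r_o_ap_vld); confirm them in the synthesis report for your own design.

```cpp
// Hypothetical top-level function used only to illustrate the default port mapping.
// Expected RTL ports under the defaults described above (verify in the report):
//   in1          : pass-by-value input -> data port only, no I/O protocol
//   acc          : read and written    -> input acc_i, output acc_o with acc_o_ap_vld
//   return value :                     -> ap_return, valid when ap_done is High
int accumulate(int in1, int *acc) {
  int updated = *acc + in1;  // the read of *acc uses the input half (acc_i)
  *acc = updated;            // the write of *acc uses the output half (acc_o)
  return updated;            // the returned value appears on ap_return
}
```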

For the example code, the timing behavior is shown in the following figure (assuming that the target technology and clock frequency allow a single addition per clock cycle).

Figure 15: RTL Port Timing with Default Synthesis
  • The design starts when ap_start is asserted High.
  • The ap_idle signal is asserted Low to indicate the design is operating.
  • The input data is read at any clock after the first cycle. Vitis HLS schedules when the reads occur. The ap_ready signal is asserted High when all inputs have been read.
  • When output sum is calculated, the associated output handshake (sum_o_ap_vld) indicates that the data is valid.
  • When the function completes, ap_done is asserted. This also indicates that the data on ap_return is valid.
  • Port ap_idle is asserted High to indicate that the design is waiting to start again.

Port-Level I/O: No Protocol

The ap_none mode specifies that no I/O protocol is added to the port. When this mode is specified, the argument is implemented as a data port with no other associated signals. The ap_none mode is the default for scalar inputs.

ap_none

The ap_none port-level I/O protocol is the simplest interface type and has no other signals associated with it. Neither the input nor output data signals have associated control ports that indicate when data is read or written. The only ports in the RTL design are those specified in the source code.

An ap_none interface does not require additional hardware overhead. However, the ap_none interface does require the following:

  • Producer blocks to do one of the following:
    • Provide data to the input port at the correct time
    • Hold data for the length of a transaction until the design completes
  • Consumer blocks to read output ports at the correct time
Note: The ap_none interface cannot be used with array arguments.
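A minimal sketch of requesting ap_none explicitly with the INTERFACE pragma, using the mode=/port= form that appears earlier in this section. The function scale and its arguments are hypothetical. Overriding the output y with ap_none removes its valid signal, so, as listed above, the consumer must know by other means (for example, the block-level ap_done) when to read it.

```cpp
// Hypothetical function: every argument is a plain data port with no handshake.
// The producer must hold gain and x stable until the design completes, and the
// consumer must read y at the correct time, because y has no valid signal.
void scale(int gain, int x, int *y) {
#pragma HLS interface mode=ap_none port=gain
#pragma HLS interface mode=ap_none port=x
#pragma HLS interface mode=ap_none port=y
  *y = gain * x;
}
```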

Port-Level I/O: Wire Handshakes

Interface mode ap_hs includes a two-way handshake signal with the data port. The handshake is an industry-standard valid and acknowledge handshake. Mode ap_vld is the same but has only a valid port, and ap_ack has only an acknowledge port.

Mode ap_ovld is for use with in-out arguments. When the in-out argument is split into separate input and output ports, mode ap_none is applied to the input port and ap_vld is applied to the output port. This is the default for pointer arguments that are both read and written.

The ap_hs mode can be applied to arrays that are read or written in sequential order. If Vitis HLS can determine the read or write accesses are not sequential, it will halt synthesis with an error. If the access order cannot be determined, Vitis HLS will issue a warning.
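The following hypothetical sketch applies ap_hs to an array argument that is read strictly in sequential order, which is the case the text above allows. Reading the same array out of order (for example, revisiting an earlier index) is the kind of access that would be expected to stop synthesis with an error. The names sum_samples and kLen are assumptions for illustration.

```cpp
const int kLen = 8;  // hypothetical array length

// samples is read exactly once per element, in increasing index order,
// so it satisfies the sequential-access requirement for ap_hs.
int sum_samples(const int samples[kLen]) {
#pragma HLS interface mode=ap_hs port=samples
  int total = 0;
  for (int i = 0; i < kLen; i++) {
    total += samples[i];  // one sequential read per element
  }
  return total;
}
```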

ap_hs (ap_ack, ap_vld, and ap_ovld)

The ap_hs port-level I/O protocol provides the greatest flexibility in the development process, allowing both bottom-up and top-down design flows. Two-way handshakes safely perform all intra-block communication, and manual intervention or assumptions are not required for correct operation. The ap_hs port-level I/O protocol provides the following signals:

  • Data port
  • Valid signal to indicate when the data signal is valid and can be read
  • Acknowledge signal to indicate when the data has been read

The following figure shows how an ap_hs interface behaves for both an input and output port. In this example, the input port is named in, and the output port is named out.

Note: The control signal names are based on the original port name. For example, the valid port for data input in is named in_vld.
Figure 16: Behavior of ap_hs Interface

For inputs, the following occurs:

  • After start is applied, the block begins normal operation.
  • If the design is ready for input data but the input valid is Low, the design stalls and waits for the input valid to be asserted to indicate a new input value is present.
    Note: The preceding figure shows this behavior. In this example, the design is ready to read data input in on clock cycle 4 and stalls waiting for the input valid before reading the data.
  • When the input valid is asserted High, an output acknowledge is asserted High to indicate the data was read.

For outputs, the following occurs:

  • After start is applied, the block begins normal operation.
  • When an output port is written to, its associated output valid signal is simultaneously asserted to indicate valid data is present on the port.
  • If the associated input acknowledge is Low, the design stalls and waits for the input acknowledge to be asserted.
  • When the input acknowledge is asserted, indicating the data has been read, the output valid is deasserted on the next clock edge.

ap_ack

The ap_ack port-level I/O protocol is a subset of the ap_hs interface type. The ap_ack port-level I/O protocol provides the following signals:

  • Data port
  • Acknowledge signal to indicate when data is consumed
    • For input arguments, the design generates an output acknowledge that is active-High in the cycle the input is read.
    • For output arguments, Vitis HLS implements an input acknowledge port to confirm the output was read.
    Note: After a write operation, the design stalls and waits until the input acknowledge is asserted High, which indicates the output was read by a consumer block. However, there is no associated output port to indicate when the data can be consumed.
CAUTION: You cannot use C/RTL co-simulation to verify designs that use ap_ack on an output port.

ap_vld

The ap_vld protocol is a subset of the ap_hs interface type. The ap_vld port-level I/O protocol provides the following signals:

  • Data port
  • Valid signal to indicate when the data signal is valid and can be read
    • For input arguments, the design reads the data port as soon as the valid is active. Even if the design is not ready to read new data, the design samples the data port and holds the data internally until needed.
    • For output arguments, Vitis HLS implements an output valid port to indicate when the data on the output port is valid.
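The input rule above (sample as soon as valid is active, hold until needed) can be sketched as a small C++ model. This is not generated RTL; the class name ApVldInput is hypothetical, and the choice that a newer sample overwrites an unused older one is an assumption made only to keep the sketch complete.

```cpp
// Software sketch of the ap_vld input behavior: the data port is captured in any
// cycle where in_vld is High, even if the design is not yet ready to use it.
class ApVldInput {
 public:
  // Call once per clock cycle with the current port values.
  void clock(bool in_vld, int in_data) {
    if (in_vld) {
      held_ = in_data;  // captured immediately while valid is active
      has_value_ = true;  // assumption: a newer sample replaces an unused one
    }
  }

  // Called when the design is ready to use the input.
  // Returns false if no value has been captured since the last call.
  bool consume(int *value) {
    if (!has_value_) return false;
    *value = held_;
    has_value_ = false;
    return true;
  }

 private:
  int held_ = 0;
  bool has_value_ = false;
};
```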

ap_ovld

The ap_ovld protocol is a subset of the ap_hs interface type. The ap_ovld port-level I/O protocol provides the following signals:

  • Data port
  • Valid signal to indicate when the data signal is valid and can be read
    • For input arguments and the input half of inout arguments, the design defaults to type ap_none.
    • For output arguments and the output half of inout arguments, the design implements type ap_vld.
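As a sketch, the hypothetical function below requests ap_ovld explicitly on an in-out pointer. Per the list above, the input half of count would be implemented as ap_none and the output half as ap_vld. Because the text earlier in this section describes this as the default for pointers that are both read and written, the pragma here mainly documents intent.

```cpp
// Hypothetical in-out argument: *count is both read and written, so it is split
// into an input port (ap_none) and an output port with a valid signal (ap_vld).
void increment(int *count) {
#pragma HLS interface mode=ap_ovld port=count
  *count = *count + 1;
}
```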

Port-Level I/O: Memory Interface Protocol

Array arguments are implemented by default as an ap_memory interface. This is a standard block RAM interface with data, address, chip-enable, and write-enable ports.

An ap_memory interface can be implemented as a single-port or dual-port interface. If Vitis HLS can determine that using a dual-port interface will reduce the initiation interval, it automatically implements a dual-port interface. The BIND_STORAGE pragma or directive is used to specify the memory resource: if it specifies a single-port block RAM for the array, a single-port interface is implemented. Conversely, if a dual-port interface is specified using the BIND_STORAGE pragma and Vitis HLS determines that this interface provides no benefit, it automatically implements a single-port interface.

If the array is accessed sequentially, an ap_fifo interface can be used. As with the ap_hs interface, Vitis HLS halts if it determines that the data access is not sequential, reports a warning if it cannot determine whether the access is sequential, and issues no message if it determines that the access is sequential. The ap_fifo interface can be used only for reading or writing, not both.

ap_memory, bram

The ap_memory and bram interface port-level I/O protocols are used to implement array arguments. This type of port-level I/O protocol can communicate with memory elements (for example, RAMs and ROMs) when the implementation requires random accesses to the memory address locations.

Note: If you only need sequential access to the memory element, use the ap_fifo interface instead. The ap_fifo interface reduces the hardware overhead, because address generation is not performed.

The ap_memory and bram interface port-level I/O protocols are identical. The only difference is the way Vivado IP integrator shows the blocks:

  • The ap_memory interface appears as discrete ports.
  • The bram interface appears as a single, grouped port. In IP integrator, you can use a single connection to create connections to all ports.

When using an ap_memory interface, specify the array targets using the BIND_STORAGE pragma. If no target is specified for the arrays, Vitis HLS determines whether to use a single or dual-port RAM interface.

TIP: Before running synthesis, ensure array arguments are targeted to the correct memory type using the BIND_STORAGE pragma. Re-synthesizing with corrected memories can result in a different schedule and RTL.
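A sketch, under stated assumptions, of targeting an array argument with BIND_STORAGE as recommended above. The function lookup_sum and the constant kDepth are hypothetical. type=ram_1p with impl=bram requests a single-port block RAM, so a single-port ap_memory interface would be expected; check the BIND_STORAGE pragma reference for your tool release to confirm the options it accepts for top-level arguments.

```cpp
const int kDepth = 256;  // hypothetical memory depth

// d is read at data-dependent (random) addresses, so it keeps an ap_memory
// interface rather than ap_fifo. idx is also an array and defaults to ap_memory.
int lookup_sum(const int d[kDepth], const unsigned char idx[4]) {
#pragma HLS interface mode=ap_memory port=d
#pragma HLS bind_storage variable=d type=ram_1p impl=bram
  int total = 0;
  for (int i = 0; i < 4; i++) {
    total += d[idx[i]];  // random access into d
  }
  return total;
}
```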

The following figure shows an array named d specified as a single-port block RAM. The port names are based on the C/C++ function argument. For example, if the C/C++ argument is d, the chip-enable is d_ce, and the input data is d_q0 based on the output/q port of the BRAM.

Figure 17: Behavior of ap_memory Interface

After reset, the following occurs:

  • After start is applied, the block begins normal operation.
  • Reads are performed by applying an address on the output address ports while asserting the output signal d_ce.
    Note: For a default block RAM, the design expects the input data d_q0 to be available in the next clock cycle. You can use the BIND_STORAGE pragma to indicate the RAM has a longer read latency.
  • Write operations are performed by asserting output ports d_ce and d_we while simultaneously applying the address and output data d_d0.
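The read and write sequences above can be sketched as a small C++ cycle model of the single-port block RAM on the other side of the port. This is not generated RTL: the class name, the addr parameter (standing in for the address port), and the choice to leave d_q0 unchanged during a write are assumptions for illustration. The one-cycle read latency matches the default block RAM behavior described in the note above.

```cpp
#include <vector>

// Cycle-level sketch of a single-port block RAM with a read latency of one cycle.
class SinglePortRam {
 public:
  explicit SinglePortRam(int depth) : mem_(depth, 0) {}

  // Models one rising clock edge with the given port values.
  // Returns the value on d_q0 in the cycle after this edge.
  int clock(bool d_ce, bool d_we, int addr, int d_d0) {
    if (d_ce) {
      if (d_we) {
        mem_[addr] = d_d0;  // write: d_ce and d_we asserted with address and data
      } else {
        q0_ = mem_[addr];   // read: data appears on d_q0 one cycle later
      }
    }
    return q0_;             // d_q0 holds its value when d_ce is Low or on writes
  }

 private:
  std::vector<int> mem_;
  int q0_ = 0;
};
```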

ap_fifo

The ap_fifo interface is the most hardware-efficient approach when the design requires access to a memory element and the access is always performed in a sequential manner, that is, no random access is required. The ap_fifo port-level I/O protocol supports the following:

  • Allows the port to be connected to a FIFO
  • Enables complete, two-way empty-full communication
  • Works for arrays, pointers, and pass-by-reference argument types
Note: Functions that can use an ap_fifo interface often use pointers and might access the same variable multiple times. To understand the importance of the volatile qualifier when using this coding style, see Multi-Access Pointers on the Interface.

In the following example, in1 is a pointer that accesses the current address, then two addresses above the current address, and finally one address below.

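void foo(int* in1, ...) {
  int data1, data2, data3;
  // ...
  data1 = *in1;        // current address
  data2 = *(in1 + 2);  // two addresses above the current address
  data3 = *(in1 - 1);  // one address below the current address
  // ...
}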

If in1 is specified as an ap_fifo interface, Vitis HLS checks the accesses, determines the accesses are not in sequential order, issues an error, and halts. To read from non-sequential address locations, use an ap_memory or bram interface.
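For contrast with foo, the hypothetical function below reads each element of in exactly once and writes each element of out exactly once, in increasing index order, and never both reads and writes the same argument, so both arguments meet the requirements stated in this section for ap_fifo. The names double_all and kCount are assumptions for illustration.

```cpp
const int kCount = 16;  // hypothetical transfer length

// in is read-only and out is write-only, and both are accessed sequentially.
void double_all(const int in[kCount], int out[kCount]) {
#pragma HLS interface mode=ap_fifo port=in
#pragma HLS interface mode=ap_fifo port=out
  for (int i = 0; i < kCount; i++) {
    out[i] = 2 * in[i];
  }
}
```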

You cannot specify an ap_fifo interface on an argument that is both read from and written to. You can only specify an ap_fifo interface on an input or an output argument. A design with input argument in and output argument out specified as ap_fifo interfaces behaves as shown in the following figure.

Figure 18: Behavior of ap_fifo Interface

For inputs, the following occurs:

  • After ap_start is applied, the block begins normal operation.
  • If the input port is ready to be read but the FIFO is empty as indicated by input port in_empty_n Low, the design stalls and waits for data to become available.
  • When the FIFO contains data as indicated by input port in_empty_n High, an output acknowledge in_read is asserted High to indicate the data was read in this cycle.

For outputs, the following occurs:

  • After start is applied, the block begins normal operation.
  • If an output port is ready to be written to but the FIFO is full, as indicated by out_full_n Low, the data is placed on the output port but the design stalls and waits for space to become available in the FIFO.
  • When space becomes available in the FIFO as indicated by out_full_n High, the output acknowledge signal out_write is asserted to indicate the output data is valid.
  • If the top-level function or the top-level loop is pipelined using the -rewind option, Vitis HLS creates an additional output port with the suffix _lwr. When the last write to the FIFO interface completes, the _lwr port goes active-High.
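The empty_n/in_read and full_n/out_write rules above can be sketched with a bounded software FIFO. This is a behavioral model, not the generated RTL; the class and method names are hypothetical, and a false return from try_read or try_write stands for a cycle in which the design stalls.

```cpp
#include <cstddef>
#include <deque>

// Bounded FIFO as seen from an ap_fifo port. The _n suffix marks negated flags:
// empty_n High means data is available, full_n High means there is space.
class BoundedFifo {
 public:
  explicit BoundedFifo(std::size_t depth) : depth_(depth) {}

  bool empty_n() const { return !q_.empty(); }
  bool full_n() const { return q_.size() < depth_; }

  // Models in_read: succeeds only when empty_n is High; otherwise the reader stalls.
  bool try_read(int *value) {
    if (!empty_n()) return false;
    *value = q_.front();
    q_.pop_front();
    return true;
  }

  // Models out_write: succeeds only when full_n is High; otherwise the writer stalls.
  bool try_write(int value) {
    if (!full_n()) return false;
    q_.push_back(value);
    return true;
  }

 private:
  std::size_t depth_;
  std::deque<int> q_;
};
```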

Block-Level Control Protocols

The execution mode of a Vitis kernel or Vivado IP is specified by the block-level control protocol. Execution modes of kernels include:
  • Pipelined execution (ap_ctrl_chain), permitting overlapping kernel runs to begin processing additional data as soon as the kernel is ready.
  • Sequential execution (ap_ctrl_hs), requiring the kernel to complete one execution before beginning another.
  • Data-driven execution (ap_ctrl_none), which enables the kernel to run when data is available and stall when data is not.
You can specify the block-level control protocol on the function or the function return. If the C/C++ code does not return a value, you can still specify the control protocol on the function return. If the C/C++ code uses a function return, Vitis HLS creates an output port ap_return for the return value.
TIP: When the function return is specified as an AXI4-Lite interface (s_axilite) all the ports in the control protocol are bundled into the s_axilite interface. This is a common practice for software-controllable kernels or IP when an application or software driver is used to configure and control when the block starts and stops operation. This is a requirement of XRT and the Vitis kernel flow.

The ap_ctrl_hs block-level control protocol is the default for the Vivado IP flow. Interfaces for Vivado IP Flow shows the resulting RTL ports and behavior when Vitis HLS implements ap_ctrl_hs on a function.

The ap_ctrl_chain control protocol is the default for the Vitis kernel flow as explained in Interfaces for Vitis Kernel Flow. It is similar to ap_ctrl_hs but provides an additional input signal ap_continue to apply back pressure. Xilinx recommends using the ap_ctrl_chain block-level I/O protocol when chaining Vitis HLS blocks together.

TIP: Refer to Supported Kernel Execution Models for more information on how XRT uses these control protocols.
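A minimal sketch of where the block-level protocol is specified: on the function return (port=return), even for a void function, as described above. The function pass_through is hypothetical. The commented-out line shows the s_axilite alternative mentioned in the TIP, which bundles the control ports into an AXI4-Lite interface.

```cpp
// Hypothetical top-level function: the block-level protocol is attached to
// port=return although the function returns void.
void pass_through(const int in[64], int out[64]) {
#pragma HLS interface mode=ap_ctrl_chain port=return
  // For a software-controlled kernel or IP, use AXI4-Lite control instead:
  // #pragma HLS interface mode=s_axilite port=return
  for (int i = 0; i < 64; i++) {
    out[i] = in[i];
  }
}
```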

ap_ctrl_hs

The following figure shows the behavior of the block-level handshake signals created by the ap_ctrl_hs control protocol for a non-pipelined design.

Figure 19: Behavior of ap_ctrl_hs Interface

After reset, the following occurs:

  1. The block waits for ap_start to go High before it begins operation.
  2. Output ap_idle goes Low immediately to indicate the design is no longer idle.
  3. The ap_start signal must remain High until ap_ready goes High. Once ap_ready goes High:
    • If ap_start remains High the design will start the next transaction.
    • If ap_start is taken Low, the design will complete the current transaction and halt operation.
  4. Data can be read on the input ports.
  5. Data can be written to the output ports.
    Note: The input and output ports can also specify a port-level I/O protocol that is independent of the control protocol. For details, see Port-Level I/O Protocols.
  6. Output ap_done goes High when the block completes operation.
    Note: If there is an ap_return port, the data on this port is valid when ap_done is High. Therefore, the ap_done signal also indicates when the data on output ap_return is valid.
  7. When the design is ready to accept new inputs, the ap_ready signal goes High. Following is additional information about the ap_ready signal:
    • The ap_ready signal is inactive until the design starts operation.
    • In non-pipelined designs, the ap_ready signal is asserted at the same time as ap_done.
    • In pipelined designs, the ap_ready signal might go High at any cycle after ap_start is sampled High. This depends on how the design is pipelined.
    • If the ap_start signal is Low when ap_ready is High, the design executes until ap_done is High and then stops operation.
    • If the ap_start signal is High when ap_ready is High, the next transaction starts immediately, and the design continues to operate.
  8. The ap_idle signal indicates when the design is idle and not operating. Following is additional information about the ap_idle signal:
    • If the ap_start signal is Low when ap_ready is High, the design stops operation, and the ap_idle signal goes High one cycle after ap_done.
    • If the ap_start signal is High when ap_ready is High, the design continues to operate, and the ap_idle signal remains Low.
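The numbered sequence above can be sketched as a cycle-level C++ model for a non-pipelined block. The function name and the latency parameter are hypothetical, and the model simplifies timing: ap_ready and ap_done are asserted together in the last cycle of each transaction, the value of ap_start in that cycle decides whether the next transaction follows, and, when it does not, ap_idle goes High one cycle after ap_done.

```cpp
#include <vector>

// Per-cycle values of the ap_ctrl_hs output signals.
struct CtrlHsCycle {
  bool ap_idle;
  bool ap_ready;
  bool ap_done;
};

// Sketch (not generated RTL) of ap_ctrl_hs for a non-pipelined block whose
// transactions each take 'latency' cycles (hypothetical parameter, latency >= 1).
// ap_start[c] is the value driven on ap_start in cycle c.
std::vector<CtrlHsCycle> simulate_ctrl_hs(const std::vector<bool> &ap_start, int latency) {
  std::vector<CtrlHsCycle> trace;
  bool busy = false;
  bool restart = false;  // ap_start was High in the cycle ap_ready was High
  int remaining = 0;
  for (bool start : ap_start) {
    if (!busy && (start || restart)) {  // begin a transaction
      busy = true;
      remaining = latency;
    }
    restart = false;
    CtrlHsCycle c{!busy, false, false};  // ap_idle is Low while operating
    if (busy && --remaining == 0) {
      c.ap_ready = true;  // non-pipelined: ap_ready coincides with ap_done
      c.ap_done = true;
      busy = false;
      restart = start;    // High here: the next transaction starts immediately
    }
    trace.push_back(c);
  }
  return trace;
}
```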

ap_ctrl_chain

The ap_ctrl_chain control protocol is similar to the ap_ctrl_hs protocol but provides an additional input port named ap_continue. An active-High ap_continue signal indicates that the downstream block that consumes the output data is ready for new data inputs. If the downstream block is not able to consume new data inputs, the ap_continue signal is Low, which prevents upstream blocks from generating additional data.

The ap_ready port of the downstream block can directly drive the ap_continue port. Following is additional information about the ap_continue port:

  • If the ap_continue signal is High when ap_done is High, the design continues operating. The behavior of the other block-level control signals is identical to those described in the ap_ctrl_hs block-level I/O protocol.
  • If the ap_continue signal is Low when ap_done is High, the design stops operating, the ap_done signal remains High, and data remains valid on the ap_return port if the ap_return port is present.

In the following figure, the first transaction completes, and the second transaction starts immediately because ap_continue is High when ap_done is High. However, the design halts at the end of the second transaction until ap_continue is asserted High.
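The ap_continue rule can be sketched the same way. This hypothetical model is not generated RTL; it assumes ap_start is held High, so only ap_continue throttles the block, and latency is an illustrative per-transaction cycle count. ap_done stays High after a transaction completes until it is seen together with ap_continue High, which lets the next transaction begin.

```cpp
#include <cstddef>
#include <vector>

// Returns ap_done per cycle for the given ap_continue sequence (latency >= 1).
std::vector<bool> simulate_ap_done(const std::vector<bool> &ap_continue, int latency) {
  std::vector<bool> done(ap_continue.size(), false);
  int remaining = latency;
  bool holding = false;  // transaction finished, waiting for ap_continue
  for (std::size_t c = 0; c < ap_continue.size(); c++) {
    if (!holding && --remaining == 0) holding = true;  // transaction completes
    done[c] = holding;  // ap_done remains High while holding
    if (holding && ap_continue[c]) {  // downstream ready: start the next transaction
      holding = false;
      remaining = latency;
    }
  }
  return done;
}
```

For example, with latency 2 and ap_continue = 1,1,1,0,0,1,0, the model returns ap_done = 0,1,0,1,1,1,0: the second transaction starts right after the first, and ap_done is then held High until ap_continue returns High.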

Figure 20: Behavior of ap_ctrl_chain Interface

ap_ctrl_none

If you specify the ap_ctrl_none control protocol, the handshake signal ports (ap_start, ap_idle, ap_ready, and ap_done) are not created. You can use this protocol to create a block without control signals as used in data driven kernels.

IMPORTANT: If you use the ap_ctrl_none control protocol in your design, you must meet at least one of the conditions for C/RTL co-simulation as described in Interface Synthesis Requirements to verify the RTL design. If at least one of these conditions is not met, C/RTL co-simulation halts with the following message:
@E [SIM-345] Cosim only supports the following 'ap_ctrl_none' designs: (1) combinational designs; (2) pipelined design with task interval of 1; (3) designs with array streaming or hls_stream ports.
@E [SIM-4] *** C/RTL co-simulation finished: FAIL ***
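As a hedged sketch, the hypothetical function below removes the block-level handshake ports with ap_ctrl_none. It has no loops or state, so it is intended to fall under condition (1), combinational designs; whether synthesis actually produces a zero-latency design depends on the clock target, so check the reported latency before relying on C/RTL co-simulation.

```cpp
// Hypothetical saturating 8-bit adder with no ap_start/ap_idle/ap_ready/ap_done.
void add_sat(unsigned char a, unsigned char b, unsigned char *y) {
#pragma HLS interface mode=ap_ctrl_none port=return
  unsigned int s = static_cast<unsigned int>(a) + b;  // widen to avoid wraparound
  *y = s > 255u ? 255 : static_cast<unsigned char>(s);  // clamp to the 8-bit range
}
```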

Managing Interfaces with SSI Technology Devices

Certain Xilinx devices use stacked silicon interconnect (SSI) technology. In these devices, the total available resources are divided over multiple super logic regions (SLRs). The connections between SLRs use super long line (SLL) routes. SLL routes incur delay costs that are typically greater than those of standard FPGA routing. To ensure designs operate at maximum performance, use the following guidelines:

  • Register all signals that cross between SLRs at both the SLR output and SLR input.
  • You do not need to register a signal if it enters or exits an SLR via an I/O buffer.
  • Ensure that the logic created by Vitis HLS fits within a single SLR.
Note: When you select an SSI technology device as the target technology, the utilization report includes details on both the SLR usage and the total device usage.

If the logic is contained within a single SLR, Vitis HLS provides the -register_all_io option of the config_rtl command. If the option is enabled, all inputs and outputs are registered. If it is disabled, none of the inputs or outputs are registered.