Managing Interface Synthesis

Introduction to Interface Synthesis

The parameters of the software functions defined in a Vitis HLS design are synthesized into ports in the RTL code. The parameters of the top-level function are synthesized into interfaces and ports that group multiple signals to encapsulate the communication protocol between the HLS design and the logic external to the design. Vitis HLS defines interfaces automatically, using industry standards to specify the protocol used. The types of interfaces that Vitis HLS creates depend on the data type and direction of the parameters of the top-level function, the target flow for the active solution, the default interface configuration settings as specified by config_interface, and any specified INTERFACE pragmas or directives.

TIP: The interface types and attributes can be manually assigned using the INTERFACE pragma or directive.
The target flows supported by Vitis HLS as described in Vitis HLS Process Overview include:
  • The Vivado IP flow which is the default flow for the tool
  • The Vitis Kernel flow for the Vitis Application Acceleration Development flow
You can specify the target flow when creating a project solution, as described in Creating a New Vitis HLS Project, or by using the following command:
open_solution -flow_target [vitis | vivado]

The interface provides channels for data from outside the IP or kernel to flow into or out of the design. Data can flow from a variety of sources external to the kernel, such as a host application, an external camera or sensor, or from another kernel or IP implemented on the Xilinx device. However, the interface also provides a control scheme for the IP or kernel to be used in the operation of the block as a unit, and to manage the flow of data across a specific channel or port. These control signals are defined by the block protocol and the port protocol as explained in Block and Port Interface Protocols. The interface defines the size and performance characteristics of the data channel into the hardware design, controls the flow of data through the channels, and also controls the operation of the block.

You can see that the choice and configuration of interfaces is a key to the success of your design. However, Vitis HLS tries to simplify the process by selecting default interfaces for the target flows. For more information on the defaults used refer to Default Interfaces for Vivado IP Flow or Default Interfaces for Vitis Kernel Flow as appropriate to your design.

After synthesis completes you can review the mapping of the software arguments of your C/C++ code to hardware ports or interfaces in the SW I/O Information section of the Synthesis Summary report.

Block and Port Interface Protocols

Block-Level Control Protocols

Block-level interface protocols provide a mechanism for controlling the operation of the Vivado IP or Vitis kernel from other RTL modules, from software applications or drivers, or from the Xilinx Run Time (XRT) in the Vitis Application Acceleration Development flow. Port-level interface protocols provide a similar mechanism for controlling the flow of data through individual ports on the IP or kernel. The specified protocols are keywords directing the HLS tool as to which protocol to implement when generating the RTL output.

Vitis HLS uses block-level control protocols ap_ctrl_chain, ap_ctrl_hs, and ap_ctrl_none to specify if the RTL is implemented with block-level handshake signals, and what signals to include. Block-level handshake signals specify the following:

  • When the design can start to perform the operation
  • When the operation ends
  • When the design is idle and ready for new inputs

By default, Vitis HLS adds a block-level interface protocol to the synthesized design to control the block. The ports of a block-level interface control when the block can start processing data (ap_start), indicate when it is ready to accept new inputs (ap_ready), and indicate if the design is idle (ap_idle) or has completed operation (ap_done). These are discussed in detail in Block-Level I/O Protocols.

Port-Level Control Protocols

Port-level I/O protocols are control signals assigned to the data ports of the Vivado IP or Vitis kernel. The I/O protocol created depends on the type of C/C++ argument and on the target flow. While block-level protocols control when the kernel is started and when it can accept data, the port-level I/O protocols are used to control the flow of data through specific ports.

Port-level protocols come in a variety of standards implemented to support the different designs and applications supported by Xilinx devices. For example, by default in the Vivado IP flow input pointers and pass-by-value arguments are implemented as simple wire ports with no associated handshaking signal, indicated by port protocol ap_none. Output pointers are implemented with an associated signal to indicate when the data is valid, using interface mode ap_vld.

TIP: It is always a good idea to specify a port protocol on an output.

As described in Default Interfaces for Vitis Kernel Flow, AXI4 interfaces are the default. The m_axi interface mode specifies an AXI4 master I/O protocol for arrays and pointers (and references in C++) only. The s_axilite interface mode specifies an AXI4-Lite slave I/O protocol for most other types. However, the ports assigned to these interfaces can also have port protocols to indicate when data is valid and when it has been consumed. Additional details are provided in Port-Level I/O Protocols.

Clock and Reset Ports

If the design takes more than 1 cycle to complete operation, a clock-enable port (ap_ce) can optionally be added to the entire block using the config_interface command, or in the Vitis HLS GUI using the Solution > Solution Settings > General command.

The operation of the reset is described in Controlling the Reset Behavior, and can be modified using the config_rtl command, also available in the Solutions Settings dialog box.

Default Interfaces for Vivado IP Flow

The Vivado IP flow supports a wide variety of I/O protocols and handshakes due to the requirement of supporting FPGA design for a wide variety of applications. This flow implements the following interfaces by default:

  • Scalar inputs: ap_none interface mode
  • Array: ap_memory interface mode
  • Pointers or References:
    • Input: ap_none interface mode
    • InOut: ap_ovld interface mode
    • Output: ap_vld interface mode
  • Arguments specified as hls::stream: ap_fifo interface mode
  • Function Return: ap_return port using the ap_none interface mode
  • Block Protocol: ap_ctrl_hs

The sum_io function in the following code provides an example of interface synthesis.

#include "sum_io.h"

dout_t sum_io(din_t in1, din_t in2, dio_t *sum) {

 dout_t temp;

 *sum = in1 + in2 + *sum;
 temp = in1 + in2;

 return  temp;
}

The sum_io function includes:

  • Two pass-by-value inputs: in1 and in2.
  • A pointer: sum that is both read from and written to.
  • A function return assigned the value of temp.

With the default interface synthesis settings used for the Vivado IP flow, the design is synthesized into an RTL block with the ports and interfaces shown in the following figure.

Figure 1: RTL Ports After Default Interface Synthesis

In the default Vivado IP flow the tool creates two types of interface ports on the RTL design to handle the flow of both data and control.

  • Block-level interface protocol: The ap_ctrl interface in the preceding figure has been expanded to show the signals provided by the default ap_ctrl_hs protocol: ap_start, ap_done, ap_ready, and ap_idle.
  • Port-level interface protocols: These are created for each argument in the top-level function and the function return (if the function returns a value). As explained above most of the arguments use a port protocol of ap_none, and so have no control signals. In the sum_io example above these ports include: in1, in2, sum_i, and ap_return. However, the output port uses the ap_vld protocol and so the sum_o output is associated with the sum_o_ap_vld signal.
    Note: The inout argument, sum, has been split into separate input and output ports, sum_i and sum_o.
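The defaults listed above can also be written out explicitly with INTERFACE pragmas. The following is a minimal sketch, not the contents of the original example: sum_io.h is not shown in this section, so the three argument types are assumed here to be plain int. The pragmas only restate the default modes described above, so they are not expected to change the resulting ports.

```cpp
// Assumption: sum_io.h is not shown in this section, so the argument
// types are declared here as plain int purely for illustration.
typedef int din_t;
typedef int dout_t;
typedef int dio_t;

dout_t sum_io(din_t in1, din_t in2, dio_t *sum) {
// Pass-by-value inputs: simple wire ports with no handshake signal.
#pragma HLS INTERFACE ap_none port=in1
#pragma HLS INTERFACE ap_none port=in2
// Pointer that is read and written: split into sum_i and sum_o,
// with sum_o_ap_vld indicating when the output data is valid.
#pragma HLS INTERFACE ap_ovld port=sum
// Block-level protocol: ap_start, ap_done, ap_ready, and ap_idle.
#pragma HLS INTERFACE ap_ctrl_hs port=return

  dout_t temp;

  *sum = in1 + in2 + *sum;
  temp = in1 + in2;

  return temp;
}
```

When the file is compiled as ordinary C++, for example in a test bench, the pragmas are ignored. Starting from *sum equal to 10, a call with in1 = 1 and in2 = 2 returns 3 and leaves *sum at 13.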

Default Interfaces for Vitis Kernel Flow

The Vitis kernel flow provides support for compiled kernel objects (.xo) for control by the Xilinx Run Time (XRT) and integration with a host application. This flow has very specific interface requirements that Vitis HLS must meet. This flow implements the following interfaces by default:

  • Scalar inputs: AXI4-Lite interface (s_axilite)
  • Pointers to an Array: AXI4 memory mapped interface (m_axi) to access the memory, and the s_axilite interface to receive the offset into the memory address space.
  • Arguments specified as hls::stream: AXI4-Stream interface (axis)
  • Function Return: ap_return port added to the s_axilite interface
  • Block Protocol: ap_ctrl_chain specified on the s_axilite interface.

The s_axilite interface is special in the Vitis kernel flow. It handles the input of scalar arguments from the software function into the kernel as well as any function return value; but it also specifies offsets for m_axi interfaces and handles the block control protocol.

The sum_io function in the following code provides an example of interface synthesis.

#include "sum_io.h"

dout_t sum_io(din_t in1, din_t in2, dio_t *sum) {

 dout_t temp;

 *sum = in1 + in2 + *sum;
 temp = in1 + in2;

 return  temp;
}

The sum_io function includes:

  • Two pass-by-value inputs: in1 and in2.
  • A pointer: sum that is both read from and written to.
  • A function return assigned the value of temp.

With the default interface synthesis settings used by Vitis HLS for the Vitis kernel flow, the design is synthesized into an RTL block with the ports and interfaces shown in the following figure.

Figure 2: RTL Ports After Default Interface Synthesis

In the default Vitis kernel flow the tool creates the following types of interface ports on the RTL design to handle the flow of both data and control.

  • Clock, Reset, and Interrupt ports: ap_clk and ap_rst_n and interrupt are added to the kernel.
  • AXI4-Lite interface: s_axi_control interface to handle data values for scalar arguments in1 and in2, and to handle the function return value. The interface is expanded to show the various ports associated with it.
  • AXI4 memory mapped interface: m_axi_gmem interface to handle the sum argument.
  • Block-Level interface protocol: The default ap_ctrl_chain protocol is associated with the s_axi_control interface.
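For comparison with the inferred defaults, the same interfaces can be spelled out with INTERFACE pragmas. This is a hedged sketch rather than the original example source: the int typedefs stand in for sum_io.h, which is not shown here, and bundle=gmem is an assumption chosen to match the m_axi_gmem name in the figure. No bundle is given for s_axilite, in line with the single control interface used by this flow.

```cpp
// Assumption: sum_io.h is not shown in this section, so the argument
// types are declared here as plain int purely for illustration.
typedef int din_t;
typedef int dout_t;
typedef int dio_t;

dout_t sum_io(din_t in1, din_t in2, dio_t *sum) {
// Pointer argument: data moves over an AXI4 memory mapped interface,
// and the address offset is received through the s_axilite interface.
#pragma HLS INTERFACE m_axi port=sum offset=slave bundle=gmem
#pragma HLS INTERFACE s_axilite port=sum
// Scalar arguments and the function return use the s_axilite interface.
#pragma HLS INTERFACE s_axilite port=in1
#pragma HLS INTERFACE s_axilite port=in2
#pragma HLS INTERFACE s_axilite port=return
// Block-level protocol, accessed through the s_axilite interface.
#pragma HLS INTERFACE ap_ctrl_chain port=return

  dout_t temp;

  *sum = in1 + in2 + *sum;
  temp = in1 + in2;

  return temp;
}
```

The function body is unchanged from the example above, so its C behavior is the same in both flows. Only the generated interfaces differ.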

Details of Interface Synthesis

The following sections provide additional details of how to add and configure interfaces, and details of the available block-level and port-level protocols, including waveform diagrams.

Specifying Interfaces

As discussed previously, the type of interfaces that Vitis HLS creates depends on the data type and direction of the arguments of the top-level function, the target flow for the active solution, the default interface configuration settings, and any specified INTERFACE pragmas or directives.

IMPORTANT: If you specify an illegal interface, Vitis HLS issues a message and implements the default interface mode.

The configuration settings are defined by the config_interface command. You can change the defaults defined in the configuration settings by selecting the Solution > Solution Settings menu command as described in Setting Configuration Options.

The INTERFACE pragma or directive lets you specify details for a specific function argument, or interface port, augmenting the default configuration or overriding it as needed. To specify the interface mode for an argument, open the source code in the editor so that the Directives view is displayed. In the Directives view of the Vitis HLS IDE, right-click the argument of the top-level function and select Insert Directive to open the Vitis HLS Directive Editor dialog box. Select INTERFACE for the Directive as shown in the following figure.

Figure 3: Vitis HLS Directive Editor Dialog Box

The various options displayed for the INTERFACE directive change depending on the specific interface mode you select. Refer to set_directive_interface for details on the various options, or select the Help command in the Directive Editor dialog box. Following are some items of interest:

Destination
Specifies whether the INTERFACE is added as a directive to the directives.tcl script, or as a pragma to the source code. Refer to Using Directives in Scripts vs. Pragmas in Code for more information.
mode
The interface mode specifies the port control protocol used by the selected argument. Refer to Port-Level I/O Protocols for additional information on the different available port protocols.
register
If you select this option, all pass-by-value reads are performed in the first cycle of operation. For output ports, the register option guarantees the output is registered. You can apply the register option to any function in the design. For memory, FIFO, and AXI4 interfaces, the register option has no effect.
port
Specifies the selected port or argument the INTERFACE is applied to.
depth
This option specifies how many samples are provided to the design by the test bench and how many output values the test bench must store. Use whichever number is greater, the samples or output values.
Note: For cases in which a pointer is read from or written to multiple times within a single transaction, the depth option is required for C/RTL co-simulation. The depth option is not required for arrays or when using the hls::stream construct. It is only required when using pointers on the interface.

If the depth option is set too small, the C/RTL co-simulation might deadlock as follows:

  • The input reads might stall waiting for data that the test bench cannot provide.
  • The output writes might stall when trying to write data, because the storage is full.
offset
This option is used for AXI4 interfaces, and specifies the memory address offset for the specified argument.
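As an illustration of the depth option described above, the following sketch accesses each pointer argument several times in a single call, which is the case where depth is needed for C/RTL co-simulation. The function name running_sum, the choice of the m_axi mode, and the sample count of 4 are assumptions made for this example.

```cpp
// Hypothetical top-level function: running sum over N samples.
const int N = 4;  // samples read and written per call (assumed value)

void running_sum(const int *in, int *out) {
// Each pointer is accessed N times per call, so depth tells C/RTL
// co-simulation how many samples the test bench provides and stores.
#pragma HLS INTERFACE m_axi port=in  depth=4
#pragma HLS INTERFACE m_axi port=out depth=4
  int acc = 0;
  for (int i = 0; i < N; ++i) {
    acc += in[i];   // one read per iteration
    out[i] = acc;   // one write per iteration
  }
}
```

Here the number of input samples and output values is the same. If they differed, the larger of the two would be used for depth.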

AXI Adapter Interface Protocols

IMPORTANT: As discussed in Default Interfaces for Vitis Kernel Flow, the AXI4 adapter interfaces are the default interfaces used by Vitis HLS for the Vitis Application Acceleration Development flow.

The AXI4 interfaces supported by Vitis HLS include the AXI4-Stream interface (axis), AXI4-Lite (s_axilite), and AXI4 master (m_axi) interfaces. For a complete description of the AXI4 interfaces, including timing and ports, see the Vivado Design Suite: AXI Reference Guide (UG1037).

s_axilite
Specify this protocol on any type of argument except streams. The s_axilite mode specifies an AXI4-Lite slave interface.
TIP: You can bundle multiple arguments into a single s_axilite interface.
m_axi
Specify on arrays and pointers (and references in C++) only. The m_axi mode specifies an AXI4 Memory Mapped interface.
TIP: You can bundle multiple arguments into a single m_axi interface.
axis
Specify this protocol on input arguments or output arguments only, not on input/output arguments. The axis mode specifies an AXI4-Stream interface.

AXI4-Lite Interface

Overview

An HLS IP or kernel can be controlled by a host application, or embedded processor using the Slave AXI4-Lite interface (s_axilite) which acts as a system bus for communication between the processor and the kernel. Using the s_axilite interface the host or an embedded processor can start and stop the kernel, and read or write data to it. When Vitis HLS synthesizes the design the s_axilite interface is implemented as an adapter that captures the data that was communicated from the host in registers on the adapter.

The AXI4-Lite interface performs several functions within a Vivado IP or Vitis kernel:

  • It maps a block-level control mechanism which can be used to start and stop the kernel.
  • It provides a channel for passing scalar arguments, function return values, and address offsets for m_axi interfaces from the host to the IP or kernel
  • For the Vitis Kernel flow:
    • The tool will automatically infer the s_axilite interface pragma for pointer arguments, scalars, and function return type.
    • Bundle: Do not specify the bundle option for the s_axilite adapter in the Vitis Kernel flow. The tool will create a single s_axilite interface that will serve for the whole design.
      IMPORTANT: HLS will return an error if multiple bundles are specified for the Vitis Kernel flow.
    • Offset: The tool will automatically choose the offsets for the interface. Do not specify any offsets in this flow.
  • For the Vivado IP flow:
    • This flow will not use the s_axilite interface by default.
    • To use the s_axilite as a communication channel for scalar arguments, m_axi pointer address, and function return type, you must manually specify the INTERFACE pragma or directive.
    • Bundle: This flow supports multiple s_axilite interfaces, specified by bundle. Refer to S_AXILITE Bundle Rules for more information.
    • Offset: By default the tool will place the arguments in a sequential order starting from 0x10 in the control register map. Refer to S_AXILITE Offset Option for additional details.
S_AXILITE Example

The following example shows how Vitis HLS implements multiple arguments, including the function return, as an s_axilite interface. Because each pragma uses the same name for the bundle option, each of the ports is grouped into a single interface.

void example(char *a, char *b, char *c)
{
#pragma HLS INTERFACE s_axilite port=return bundle=BUS_A
#pragma HLS INTERFACE s_axilite port=a      bundle=BUS_A
#pragma HLS INTERFACE s_axilite port=b      bundle=BUS_A
#pragma HLS INTERFACE s_axilite port=c      bundle=BUS_A
#pragma HLS INTERFACE ap_vld port=b 

  *c += *a + *b;
}
TIP: If you do not specify the bundle option, Vitis HLS groups all arguments into a single s_axilite bundle and automatically names the port.
The synthesized example will be part of a system that has three important elements as shown in the figure below:
  1. Host application running on an x86 or embedded processor interacting with the IP or kernel
  2. S_AXILITE Adapter: The INTERFACE pragma implements an s_axilite adapter. The adapter has two primary functions: implementing the interface protocol to communicate with the host, and providing a Control Register Map to the IP or kernel.
  3. The HLS engine or function that implements the design logic
Figure 4: S_AXILITE Adapter

By default, Vitis HLS automatically assigns the address for each port that is grouped into an s_axilite interface. The size, or range of addresses assigned to a port is dependent on the argument data type and the port protocol used, as described below. You can also explicitly define the address using the offset option as discussed in S_AXILITE Offset Option.

  • Port a: Implemented as ap_none by default. One 32-bit word is assigned for the data signal, of which only 8 bits are used because the argument data type is char. The remaining bits are unused.
  • Port b: Implemented as ap_vld, as defined by the INTERFACE pragma in the example. The port occupies two 32-bit words in the register map:
    • (0x18) Data signal: One word is assigned for the data signal, of which only 8 bits are used because the argument data type is char. The remaining bits are unused.
    • (0x1c) Control signal: One word is assigned for the control signal, of which only bit 0 (b_ap_vld) is used.
  • Port c: Implemented as ap_ovld by default, with the valid signal on the output. The port occupies four 32-bit words in the register map:
    • (0x20) Data signal of c_i: One word is assigned for the input data signal, of which only 8 bits are used because the argument data type is char. The remaining bits are unused.
    • (0x24) Reserved space
    • (0x28) Data signal of c_o: One word is assigned for the output data signal, of which only 8 bits are used because the argument data type is char. The remaining bits are unused.
    • (0x2c) Control signal of c_o: One word is assigned for the ap_ovld control signal, of which only bit 0 (c_o_ap_vld) is used.

In operation the host application will initially start the kernel by writing into the Control address space (0x00). The host/CPU completes the initial setup by writing into the other address spaces which are associated with the various function arguments as defined in the example.

The control signal for port b is asserted and only then can the HLS engine read ports a and b (port a is ap_none and does not have a control signal). Until that time the design is stalled and waiting for the valid register to be set for port b. Each time port b is read by the HLS engine the input valid register is cleared and the register resets to logic 0.

After the HLS engine finishes its computation, the output value on port C is stored in the control register and the corresponding valid bit is set for the host to read. After the host reads the data, the HLS engine will write the ap_done bit in the Control register (0x00) to mark the end of the IP computation.
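The sequence described above can be sketched in software. The following is a behavioral illustration only: MockAdapter is a hypothetical stand-in for the s_axilite adapter and HLS engine, not generated driver code, and its timing is simplified. The register offsets are those listed in the Control Register Map for this example. Clearing ap_start when the engine runs is a modeling choice.

```cpp
#include <cstdint>

using u32 = std::uint32_t;

// Offsets from the Control Register Map generated for the example.
constexpr u32 CTRL = 0x00, A_DATA = 0x10, B_DATA = 0x18, B_VLD = 0x1c,
              C_I_DATA = 0x20, C_O_DATA = 0x28, C_O_VLD = 0x2c;
constexpr u32 AP_START = 1u << 0, AP_DONE = 1u << 1;

// Hypothetical software stand-in for the adapter plus the HLS engine.
class MockAdapter {
 public:
  void write(u32 offset, u32 value) {
    regs_[offset / 4] = value;
    run_engine();
  }
  u32 read(u32 offset) {
    const u32 value = regs_[offset / 4];
    if (offset == C_O_VLD) regs_[offset / 4] = 0;        // clear on read
    if (offset == C_O_DATA) regs_[CTRL / 4] |= AP_DONE;  // done after host read
    return value;
  }

 private:
  void run_engine() {
    const bool started = (regs_[CTRL / 4] & AP_START) != 0;
    const bool b_valid = (regs_[B_VLD / 4] & 1u) != 0;
    if (!started || !b_valid) return;  // stalled until b is marked valid
    regs_[B_VLD / 4] = 0;              // reading b clears its valid register
    regs_[CTRL / 4] &= ~AP_START;      // modeling choice: start is consumed
    const u32 sum = regs_[C_I_DATA / 4] + regs_[A_DATA / 4] + regs_[B_DATA / 4];
    regs_[C_O_DATA / 4] = sum & 0xffu; // char result: low 8 bits only
    regs_[C_O_VLD / 4] = 1;            // output valid for the host to read
  }
  u32 regs_[0x30 / 4] = {};
};

// Host-side sequence: start, write the arguments, mark b valid, read c.
u32 run_example(MockAdapter &hw, u32 a, u32 b, u32 c) {
  hw.write(CTRL, AP_START);
  hw.write(A_DATA, a);
  hw.write(C_I_DATA, c);
  hw.write(B_DATA, b);
  hw.write(B_VLD, 1);
  while ((hw.read(C_O_VLD) & 1u) == 0) {
  }
  return hw.read(C_O_DATA);
}
```

With a = 3, b = 4, and an initial c of 5, run_example returns 12 in this model, and the ap_done bit is set in the Control register afterward.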

Vitis HLS reports the assigned addresses in the S_AXILITE Control Register Map, and also provides them in C Driver Files to aid in your software development. Using the s_axilite interface, you can output C driver files for use with code running on an embedded or x86 processor using provided C application program interface (API) functions, to let you control the hardware from your software.

S_AXILITE Control Register Map
Vitis HLS automatically generates a Control Register Map for controlling the Vivado IP or Vitis kernel, and the ports grouped into s_axilite interface. The register map, which is added to the generated RTL files, can be divided into two sections:
  1. Block-level control signals
  2. Function arguments mapped into the s_axilite interface
In the Vitis kernel flow, the block protocol is associated with the s_axilite interface by default. To change the default block protocol, specify the interface pragma as follows:
#pragma HLS INTERFACE ap_ctrl_hs port=return
In the Vivado IP flow though, the block control protocol is assigned to its own interface, ap_ctrl, as seen in Default Interfaces for Vivado IP Flow. However, if you are using an s_axilite interface in your IP, you can also assign the block control protocol to that interface using the following INTERFACE pragmas, as an example:
#pragma HLS INTERFACE s_axilite port=return bundle=BUS_A
#pragma HLS INTERFACE ap_ctrl_hs port=return bundle=BUS_A

In the Control Register Map, Vitis HLS reserves addresses 0x00 through 0x0C for the block-level protocol signals and interrupt controls, as shown below:

Address   Description
0x00      Control signals
0x04      Global Interrupt Enable Register
0x08      IP Interrupt Enable Register (Read/Write)
0x0c      IP Interrupt Status Register (Read/TOW)

The Control signals register (0x00) contains ap_start, ap_done, ap_ready, and ap_idle; with the ap_ctrl_chain block protocol it also contains ap_continue. These are the block-level interface signals, which are accessed through the s_axilite adapter.

To start the block operation, the ap_start bit in the Control register must be set to 1. The HLS engine then proceeds and reads any inputs grouped into the AXI4-Lite slave interface from the registers in the interface.

When the block completes the operation, the ap_done, ap_idle, and ap_ready registers are set by the hardware output ports, and the results for any output ports grouped into the s_axilite interface can be read from the appropriate registers.

For function arguments, Vitis HLS automatically assigns the address for each argument or port that is assigned to the s_axilite interface. The tool will assign each port an offset starting from 0x10, the lower addresses being reserved for control signals. The size, or range of addresses assigned to a port is dependent on the argument data type and the port protocol used.

Because the variables grouped into an AXI4-Lite interface are function arguments which do not have a default value in the C code, none of the argument registers in the s_axilite interface can be assigned a default value. The registers can be implemented with a reset using the config_rtl command, but they cannot be assigned any other default value.

The Control Register Map generated by Vitis HLS for the previous example is provided below:

//------------------------Address Info-------------------
// 0x00 : Control signals
//        bit 0  - ap_start (Read/Write/COH)
//        bit 1  - ap_done (Read/COR)
//        bit 2  - ap_idle (Read)
//        bit 3  - ap_ready (Read)
//        bit 7  - auto_restart (Read/Write)
//        others - reserved
// 0x04 : Global Interrupt Enable Register
//        bit 0  - Global Interrupt Enable (Read/Write)
//        others - reserved
// 0x08 : IP Interrupt Enable Register (Read/Write)
//        bit 0  - enable ap_done interrupt (Read/Write)
//        bit 1  - enable ap_ready interrupt (Read/Write)
//        others - reserved
// 0x0c : IP Interrupt Status Register (Read/TOW)
//        bit 0  - ap_done (COR/TOW)
//        bit 1  - ap_ready (COR/TOW)
//        others - reserved
// 0x10 : Data signal of a
//        bit 7~0 - a[7:0] (Read/Write)
//        others  - reserved
// 0x14 : reserved
// 0x18 : Data signal of b
//        bit 7~0 - b[7:0] (Read/Write)
//        others  - reserved
// 0x1c : Control signal of b
//        bit 0  - b_ap_vld (Read/Write/SC)
//        others - reserved
// 0x20 : Data signal of c_i
//        bit 7~0 - c_i[7:0] (Read/Write)
//        others  - reserved
// 0x24 : reserved
// 0x28 : Data signal of c_o
//        bit 7~0 - c_o[7:0] (Read)
//        others  - reserved
// 0x2c : Control signal of c_o
//        bit 0  - c_o_ap_vld (Read/COR)
//        others - reserved
// (SC = Self Clear, COR = Clear on Read, TOW = Toggle on Write, COH = Clear on Handshake)
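Software that polls the block usually needs to pick individual bits out of the Control signals register. The sketch below decodes a value read from address 0x00 using the bit positions listed in the map above. The ControlStatus struct and the decode_control helper are hypothetical names introduced for this illustration and are not part of the generated driver files.

```cpp
#include <cstdint>

// Bit positions of the Control signals register (0x00), from the map above.
constexpr std::uint32_t AP_START     = 1u << 0;
constexpr std::uint32_t AP_DONE      = 1u << 1;
constexpr std::uint32_t AP_IDLE      = 1u << 2;
constexpr std::uint32_t AP_READY     = 1u << 3;
constexpr std::uint32_t AUTO_RESTART = 1u << 7;

// Hypothetical view of the block-level status bits.
struct ControlStatus {
  bool start;
  bool done;
  bool idle;
  bool ready;
  bool auto_restart;
};

// Hypothetical helper: split a raw register value into its flags.
inline ControlStatus decode_control(std::uint32_t word) {
  ControlStatus s;
  s.start        = (word & AP_START) != 0;
  s.done         = (word & AP_DONE) != 0;
  s.idle         = (word & AP_IDLE) != 0;
  s.ready        = (word & AP_READY) != 0;
  s.auto_restart = (word & AUTO_RESTART) != 0;
  return s;
}
```

For example, a raw value of 0x04 decodes as idle only, and 0x81 decodes as ap_start with auto_restart enabled. Because ap_done is clear on read, a polling loop should decode each value it reads rather than reading the register a second time.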
S_AXILITE and Port-Level Protocols
IMPORTANT: The Vitis kernel flow does not support port protocol assignments on the s_axilite interface. Only the default assignments are supported.
In the Vivado IP flow, you can assign port-level I/O protocols to the individual ports and signals bundled into an s_axilite interface. Port-level I/O protocols sequence data into and out of the HLS engine from the s_axilite adapter as seen in S_AXILITE Example. The tool assigns a default port protocol to a port depending on the type and direction of the argument associated with it. The port can contain one or more of the following:
  • Data signal for the argument
  • Valid signal (ap_vld/ap_ovld) to indicate when the data can be read
  • Acknowledge signal (ap_ack) to indicate when the data has been read

The default port protocol assignments for various argument types are as follows:

Argument Type          Default    Supported
Scalar                 ap_none    ap_ack and ap_vld
Pointers/References
  Inputs               ap_none    ap_ack and ap_vld
  Outputs              ap_vld     ap_none, ap_ack, and ap_ovld
  Inouts               ap_ovld    ap_none, ap_ack, and ap_vld
IMPORTANT: Arrays default to ap_memory. The bram port protocol is not supported for arrays in an s_axilite interface.

The example groups port b into the s_axilite interface and specifies port b as using the ap_vld protocol with INTERFACE pragmas. As a result, the s_axilite adapter contains a register for the port b data, and a register for the port b input valid signal.

If the input valid register is not set to logic 1, the data in the b data register is not considered valid, and the design stalls and waits for the valid register to be set. Each time port b is read, Vitis HLS automatically clears the input valid register and resets the register to logic 0.

Note: To simplify the operation of your design, Xilinx recommends that you use the default port protocols associated with the s_axilite interface.
S_AXILITE Bundle Rules

In the S_AXILITE Example all the function arguments are grouped into a single s_axilite interface adapter specified by the bundle=BUS_A option in the INTERFACE pragma. The bundle option simply lets you group ports together into one interface.

In the Vitis kernel flow there should only be a single interface bundle, commonly named s_axi_control by the tool. So you should not specify the bundle option in that flow, or you will probably encounter an error during synthesis. However, in the Vivado IP flow you can specify multiple bundles using the s_axilite interface, and this will create a separate interface adapter for each bundle you have defined. The following example shows this:
void example(char *a, char *b, char *c)
{
#pragma HLS INTERFACE s_axilite port=a bundle=BUS_A
#pragma HLS INTERFACE s_axilite port=b bundle=BUS_A
#pragma HLS INTERFACE s_axilite port=c bundle=OUT
#pragma HLS INTERFACE s_axilite port=return bundle=BUS_A
#pragma HLS INTERFACE ap_vld port=b
  *c += *a + *b;
}

After synthesis completes, the Synthesis Summary report provides feedback regarding the number of s_axilite adapters generated. The SW-to-HW Mapping section of the report contains the HW info showing the control register offset and the address range for each port.

However, there are some rules related to using bundles with the s_axilite interface.

  1. Default Bundle Names: This rule explicitly groups all interface ports with no bundle name into the same AXI4-Lite interface port, uses the tool default bundle name, and names the RTL port s_axi_<default>, typically s_axi_control.
    In this example all ports are mapped to the default bundle:
    void top(char *a, char *b, char *c)
    {
    #pragma HLS INTERFACE s_axilite port=a
    #pragma HLS INTERFACE s_axilite port=b
    #pragma HLS INTERFACE s_axilite port=c
         *c += *a + *b;
    }
  2. User-Specified Bundle Names: This rule explicitly groups all interface ports with the same bundle name into the same AXI4-Lite interface port, and names the RTL port s_axi_<string>, where <string> is the specified bundle name.
    The following example results in interfaces named s_axi_BUS_A, s_axi_BUS_B, and s_axi_OUT:
    void example(char *a, char *b, char *c)
    {
    #pragma HLS INTERFACE s_axilite port=a bundle=BUS_A
    #pragma HLS INTERFACE s_axilite port=b bundle=BUS_B
    #pragma HLS INTERFACE s_axilite port=c bundle=OUT
    #pragma HLS INTERFACE s_axilite port=return bundle=OUT
    #pragma HLS INTERFACE ap_vld port=b
         *c += *a + *b;
    }
  3. Partially Specified Bundle Names: If you specify bundle names for some arguments, but leave other arguments unassigned, then the tool will bundle the arguments as follows:
    • Group all ports into the specified bundles as indicated by the INTERFACE pragmas.
    • Group any ports without bundle assignments into a default named bundle. The default name can either be the standard tool default, or an alternative default name if the tool default has already been specified by the user.

    In the following example the user has specified bundle=control, which is the tool default name. In this case, port c will be assigned to s_axi_control as specified by the user, and the remaining ports will be bundled under s_axi_control_r, which is an alternative default name used by the tool.

    void top(char *a, char *b, char *c) {
    #pragma HLS INTERFACE s_axilite port=a
    #pragma HLS INTERFACE s_axilite port=b
    #pragma HLS INTERFACE s_axilite port=c bundle=control
    }
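The three bundle rules can be summarized as a small naming function. This is a hypothetical model of the behavior described above, written only to make the rules concrete. It is not tool code, and the function name axilite_port_names is an assumption. The names control and control_r are taken from the text.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of the bundle rules. Each entry pairs a port name
// with its bundle option; an empty string means no bundle was given.
std::map<std::string, std::string> axilite_port_names(
    const std::vector<std::pair<std::string, std::string>> &ports) {
  const std::string tool_default = "control";

  // Rule 3: if the user already used the tool default name, unassigned
  // ports fall back to the alternative default name.
  bool default_taken = false;
  for (const auto &p : ports) {
    if (p.second == tool_default) default_taken = true;
  }
  const std::string fallback =
      default_taken ? tool_default + "_r" : tool_default;

  // Rules 1 and 2: ports share an interface when their bundle names match.
  std::map<std::string, std::string> result;
  for (const auto &p : ports) {
    result[p.first] = "s_axi_" + (p.second.empty() ? fallback : p.second);
  }
  return result;
}
```

Applied to the last example, where only port c specifies bundle=control, this model maps c to s_axi_control and both a and b to s_axi_control_r.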
S_AXILITE Offset Option
Note: The Vitis kernel flow determines the required offsets. Do not specify the offset option in that flow.

In the Vivado IP flow, Vitis HLS defines the size, or range of addresses assigned to a port in the S_AXILITE Control Register Map depending on the argument data type and the port protocol used. However, the INTERFACE pragma also contains an offset option that lets you specify the address offset in the AXI4-Lite interface.

When specifying the offset for your argument, you must consider the size of your data and reserve some extra space for the port control protocol. The range of addresses you reserve should be based on a 32-bit word. You should reserve enough 32-bit words to fit your argument data type, and reserve one additional word for the control protocol, even for ap_none.

TIP: In the case of the ap_memory protocol for arrays, you do not need to reserve the extra word for the control protocol. In this case, simply reserve enough 32-bit words to fit your argument data type.

For example, to reserve enough space for a double you need to reserve 2 32-bit words for the 64-bit data type, and then reserve an additional 32-bit word for the control protocol. So you need to reserve a total of 3 32-bit words, or 96 bits. If your argument offset starts at 0x020, then the next available offset would begin at 0x02c, in order to reserve the required address range for your argument.
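The arithmetic in this example can be checked with a small helper. The following is a minimal sketch, assuming byte-addressed offsets and the reservation rule stated above; the function names words_to_reserve and next_offset are hypothetical and are not part of any Vitis HLS API.

```c
#include <assert.h>
#include <stdint.h>

/* Number of 32-bit words to reserve for an argument of data_bits bits.
 * needs_ctrl_word is 1 for scalar port protocols (including ap_none)
 * and 0 for arrays that use the ap_memory protocol.                  */
static uint32_t words_to_reserve(uint32_t data_bits, int needs_ctrl_word) {
    assert(data_bits > 0);
    uint32_t data_words = (data_bits + 31u) / 32u; /* round up to whole words */
    return data_words + (needs_ctrl_word ? 1u : 0u);
}

/* Next free byte offset after an argument placed at start_offset. */
static uint32_t next_offset(uint32_t start_offset, uint32_t data_bits,
                            int needs_ctrl_word) {
    return start_offset + 4u * words_to_reserve(data_bits, needs_ctrl_word);
}
```

For a double at offset 0x020, words_to_reserve(64, 1) gives 3 words and next_offset(0x020, 64, 1) gives 0x02c, matching the figures above.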

If you make a mistake in setting the offset of your arguments, by not reserving enough address range to fit your data type and the control protocol, Vitis HLS will recognize the error, will warn you of the issue, and will recover by moving your misplaced argument register to the end of the Control Register Map. This will allow your build to proceed, but may not work with your host application or driver if they were written to your specified offset.

C Driver Files

When an AXI4-Lite slave interface is implemented, a set of C driver files is automatically created. These C driver files provide a set of APIs that can be integrated into any software running on a CPU and used to communicate with the device through the AXI4-Lite slave interface.

The C driver files are created when the design is packaged as IP in the IP catalog.

Driver files are created for standalone and Linux modes. In standalone mode the drivers are used in the same way as any other Xilinx standalone drivers. In Linux mode, copy all the C files (.c) and header files (.h) into the software project.

The driver files and API functions derive their name from the top-level function for synthesis. In the above example, the top-level function is called “example”. If the top-level function was named “DUT” the name “example” would be replaced by “DUT” in the following description. The driver files are created in the packaged IP (located in the impl directory inside the solution).

Table 1. C Driver Files for a Design Named example
File Path               Usage Mode   Description
data/example.mdd        Standalone   Driver definition file.
data/example.tcl        Standalone   Used by SDK to integrate the software into an SDK project.
src/xexample_hw.h       Both         Defines address offsets for all internal registers.
src/xexample.h          Both         API definitions.
src/xexample.c          Both         Standard API implementations.
src/xexample_sinit.c    Standalone   Initialization API implementations.
src/xexample_linux.c    Linux        Initialization API implementations.
src/Makefile            Standalone   Makefile.

In file xexample.h, two structs are defined.

XExample_Config
This is used to hold the configuration information (base address of each AXI4-Lite slave interface) of the IP instance.
XExample
This is used to hold the IP instance pointer. Most APIs take this instance pointer as the first argument.

The API implementations are provided in the files xexample.c, xexample_sinit.c, and xexample_linux.c, and provide functions to perform the following operations.

  • Initialize the device
  • Control the device and query its status
  • Read/write to the registers
  • Set up, monitor, and control the interrupts

Refer to Vitis HLS C Driver Reference for a description of the API functions provided in the C driver files.

IMPORTANT: The C driver APIs always use an unsigned 32-bit type (U32). You might be required to cast the data in the C code into the expected type.
C Driver Files and Float Types

C driver files always use a 32-bit unsigned integer (U32) for data transfers. In the following example, the function uses float type arguments a and r1. It sets the value of a and returns the value of r1:

float caculate(float a, float *r1)
{
#pragma HLS INTERFACE ap_vld register port=r1
#pragma HLS INTERFACE s_axilite port=a 
#pragma HLS INTERFACE s_axilite port=r1 
#pragma HLS INTERFACE s_axilite port=return 

 *r1 = 0.5f*a;
 return (a>0);
}

After synthesis, Vitis HLS groups all ports into the default AXI4-Lite interface and creates C driver files. However, as shown in the following example, the driver files use type U32:

// API to set the value of A
void XCaculate_SetA(XCaculate *InstancePtr, u32 Data) {
    Xil_AssertVoid(InstancePtr != NULL);
    Xil_AssertVoid(InstancePtr->IsReady == XIL_COMPONENT_IS_READY);
    XCaculate_WriteReg(InstancePtr->Hls_periph_bus_BaseAddress, 
XCACULATE_HLS_PERIPH_BUS_ADDR_A_DATA, Data);
}

// API to get the value of R1
u32 XCaculate_GetR1(XCaculate *InstancePtr) {
    u32 Data;

    Xil_AssertNonvoid(InstancePtr != NULL);
    Xil_AssertNonvoid(InstancePtr->IsReady == XIL_COMPONENT_IS_READY);

    Data = XCaculate_ReadReg(InstancePtr->Hls_periph_bus_BaseAddress, 
XCACULATE_HLS_PERIPH_BUS_ADDR_R1_DATA);
    return Data;
}

If you pass float values directly to these functions, C converts the numeric value to an integer instead of preserving the float bit pattern, so the values written and read do not match the expected float values. When using these functions in software, you can use the following casts in the code:

float a=3.0f,r1;
u32 ua,ur1;

// cast float “a” to type U32
XCaculate_SetA(&caculate,*((u32*)&a));
ur1=XCaculate_GetR1(&caculate);

// cast return type U32 to float type for “r1”
r1=*((float*)&ur1);
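The pointer casts above reinterpret the bits of a float as an integer, which some compilers may treat as a strict-aliasing violation. A possible alternative, shown here only as a sketch, is to copy the bits with memcpy. The helper names float_to_u32 and u32_to_float are hypothetical, and the code assumes a 32-bit IEEE-754 float.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return the raw 32-bit pattern of a float (no numeric conversion). */
static uint32_t float_to_u32(float f) {
    uint32_t u;
    assert(sizeof(f) == sizeof(u)); /* assumption: float is 32 bits wide */
    memcpy(&u, &f, sizeof(u));
    return u;
}

/* Rebuild a float from a raw 32-bit pattern read from a register. */
static float u32_to_float(uint32_t u) {
    float f;
    assert(sizeof(f) == sizeof(u));
    memcpy(&f, &u, sizeof(f));
    return f;
}
```

With these helpers, the value passed to XCaculate_SetA would be float_to_u32(a), and the value returned by XCaculate_GetR1 would be converted with u32_to_float.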
Controlling Hardware

In this example, the hardware header file xexample_hw.h provides a complete list of the memory mapped locations for the ports grouped into the AXI4-Lite slave interface, as described in S_AXILITE Control Register Map.

// 0x00 : Control signals
//        bit 0  - ap_start (Read/Write/SC)
//        bit 1  - ap_done (Read/COR)
//        bit 2  - ap_idle (Read)
//        bit 3  - ap_ready (Read)
//        bit 7  - auto_restart (Read/Write)
//        others - reserved
// 0x04 : Global Interrupt Enable Register
//        bit 0  - Global Interrupt Enable (Read/Write)
//        others - reserved
// 0x08 : IP Interrupt Enable Register (Read/Write)
//        bit 0  - Channel 0 (ap_done)
//        bit 1  - Channel 1 (ap_ready)
// 0x0c : IP Interrupt Status Register (Read/TOW)
//        bit 0  - Channel 0 (ap_done)
//        others - reserved
// 0x10 : Data signal of a
//        bit 7~0 - a[7:0] (Read/Write)
//        others  - reserved
// 0x14 : reserved
// 0x18 : Data signal of b
//        bit 7~0 - b[7:0] (Read/Write)
//        others  - reserved
// 0x1c : reserved
// 0x20 : Data signal of c_i
//        bit 7~0 - c_i[7:0] (Read/Write)
//        others  - reserved
// 0x24 : reserved
// 0x28 : Data signal of c_o
//        bit 7~0 - c_o[7:0] (Read)
//        others  - reserved
// 0x2c : Control signal of c_o
//        bit 0  - c_o_ap_vld (Read/COR)
//        others - reserved
// (SC = Self Clear, COR = Clear on Read, TOW = Toggle on Write,
//  COH = Clear on Handshake)

To correctly program the registers in the s_axilite interface, you must understand how the hardware ports operate with the default port protocols, or the custom protocols as described in S_AXILITE and Port-Level Protocols.

For example, to start the block operation the ap_start register must be set to 1. The device then reads any inputs grouped into the AXI4-Lite slave interface from the registers in the interface. When the block completes operation, the hardware output ports set the ap_done, ap_idle, and ap_ready registers, and the results for any output ports grouped into the AXI4-Lite slave interface can be read from the appropriate registers.
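The bit positions of the control register at offset 0x00 can be expressed as masks. The following sketch uses the bit layout from the header listing above; the macro names, the ctrl_bits struct, and the two helper functions are hypothetical and are not taken from the generated driver.

```c
#include <assert.h>
#include <stdint.h>

/* Bit masks for the control register at offset 0x00 (layout from the listing). */
#define AP_START_MASK     (1u << 0)
#define AP_DONE_MASK      (1u << 1)
#define AP_IDLE_MASK      (1u << 2)
#define AP_READY_MASK     (1u << 3)
#define AUTO_RESTART_MASK (1u << 7)

typedef struct {
    int start, done, idle, ready, auto_restart;
} ctrl_bits;

/* Split a control register value into its individual flags. */
static ctrl_bits decode_ctrl(uint32_t ctrl) {
    ctrl_bits b;
    b.start        = (ctrl & AP_START_MASK) != 0;
    b.done         = (ctrl & AP_DONE_MASK) != 0;
    b.idle         = (ctrl & AP_IDLE_MASK) != 0;
    b.ready        = (ctrl & AP_READY_MASK) != 0;
    b.auto_restart = (ctrl & AUTO_RESTART_MASK) != 0;
    return b;
}

/* Value to write to start the block, optionally with auto_restart set. */
static uint32_t ctrl_start_word(int auto_restart) {
    assert(auto_restart == 0 || auto_restart == 1);
    return AP_START_MASK | (auto_restart ? AUTO_RESTART_MASK : 0u);
}
```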

The implementation of function argument c in the example highlights the importance of understanding how the hardware ports operate. Function argument c is both read and written, and is therefore implemented as separate input and output ports c_i and c_o, as explained in S_AXILITE Example.

The first recommended flow for programming the s_axilite interface is for a one-time execution of the function:

  • Use the interrupt function standard API implementations provided in the C Driver Files to determine how you want the interrupt to operate.
  • Load the register values for the block input ports. In the above example this is performed using API functions XExample_Set_a, XExample_Set_b, and XExample_Set_c_i.
  • Set the ap_start bit to 1 using XExample_Start to start executing the function. This register is self-clearing as noted in the header file above. After one transaction, the block will suspend operation.
  • Allow the function to execute. Address any interrupts which are generated.
  • Read the output registers. In the above example this is performed using API functions XExample_Get_c_o_vld, to confirm the data is valid, and XExample_Get_c_o.
    Note: The registers in the s_axilite interface obey the same I/O protocol as the ports. In this case, the output valid is set to logic 1 when the data is valid.
  • Repeat for the next transaction.
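The one-time execution sequence can be exercised without hardware by using a small software model of the register file. The sketch below is purely illustrative: the register offsets come from the header listing above, but the mock_write, mock_read, and run_once functions are hypothetical stand-ins for the generated driver calls, and the model overwrites the whole control register, so it does not represent auto_restart.

```c
#include <assert.h>
#include <stdint.h>

/* Register offsets taken from the xexample_hw.h listing. */
#define REG_CTRL    0x00u
#define REG_A       0x10u
#define REG_B       0x18u
#define REG_C_I     0x20u
#define REG_C_O     0x28u
#define REG_C_O_VLD 0x2cu

#define AP_START 0x01u
#define AP_DONE  0x02u
#define AP_IDLE  0x04u
#define AP_READY 0x08u

/* Hypothetical software model of the register file (not real hardware). */
static uint32_t regs[(REG_C_O_VLD / 4u) + 1u];

static void mock_write(uint32_t offset, uint32_t value) {
    assert(offset % 4u == 0u && offset <= REG_C_O_VLD);
    regs[offset / 4u] = value;
    if (offset == REG_CTRL && (value & AP_START)) {
        /* Model one transaction of *c += *a + *b on 8-bit data. */
        regs[REG_C_O / 4u] =
            (regs[REG_C_I / 4u] + regs[REG_A / 4u] + regs[REG_B / 4u]) & 0xFFu;
        regs[REG_C_O_VLD / 4u] = 1u;
        /* ap_start self-clears; done, idle, and ready are raised. */
        regs[REG_CTRL / 4u] = AP_DONE | AP_IDLE | AP_READY;
    }
}

static uint32_t mock_read(uint32_t offset) {
    assert(offset % 4u == 0u && offset <= REG_C_O_VLD);
    uint32_t value = regs[offset / 4u];
    if (offset == REG_C_O_VLD) {
        regs[offset / 4u] = 0u;        /* c_o_ap_vld is clear on read */
    } else if (offset == REG_CTRL) {
        regs[offset / 4u] &= ~AP_DONE; /* ap_done is clear on read */
    }
    return value;
}

/* One-time execution: load inputs, start, wait for valid, read the result. */
static uint32_t run_once(uint32_t a, uint32_t b, uint32_t c_in) {
    mock_write(REG_A, a);
    mock_write(REG_B, b);
    mock_write(REG_C_I, c_in);
    mock_write(REG_CTRL, AP_START);
    while (mock_read(REG_C_O_VLD) == 0u) {
        /* poll until the output valid bit is set */
    }
    return mock_read(REG_C_O);
}
```

Note that the valid bit is polled before the data register is read. Because the valid bit clears on read, reading it a second time after a successful poll returns 0.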

The second recommended flow is for continuous execution of the block. In this mode, the input ports included in the AXI4-Lite interface should only be ports which perform configuration. The block will typically run much faster than a CPU. If the block must wait for inputs, the block will spend most of its time waiting:

  • Use the interrupt function to determine how you wish the interrupt to operate.
  • Load the register values for the block input ports. In the above example this is performed using API functions XExample_Set_a, XExample_Set_b, and XExample_Set_c_i.
  • Set the auto-start function using API XExample_EnableAutoRestart.
  • Allow the function to execute. The individual port I/O protocols will synchronize the data being processed through the block.
  • Address any interrupts which are generated. The output registers could be accessed during this operation but the data may change often.
  • Use the API function XExample_DisableAutoRestart to prevent any more executions.
  • Read the output registers. In the above example this is performed using API functions XExample_Get_c_o and XExample_Get_c_o_vld.
Controlling Software

The API functions can be used in the software running on the CPU to control the hardware block. An overview of the process is:

  • Create an instance of the hardware
  • Look Up the device configuration
  • Initialize the device
  • Set the input parameters of the HLS block
  • Start the device and read the results

An example application is shown below.

#include "xexample.h"    // Device driver for HLS HW block
#include "xparameters.h" // Device IDs for the system
#include "xil_printf.h"  // Declares print()

// HLS HW instance
XExample HlsExample;
XExample_Config *ExamplePtr;

int main() {
 int res_hw;
 int status;

 // Look up the device configuration
 ExamplePtr = XExample_LookupConfig(XPAR_XEXAMPLE_0_DEVICE_ID);
 if (!ExamplePtr) {
  print("ERROR: Lookup of accelerator configuration failed.\n\r");
  return XST_FAILURE;
 }

 // Initialize the device
 status = XExample_CfgInitialize(&HlsExample, ExamplePtr);
 if (status != XST_SUCCESS) {
  print("ERROR: Could not initialize accelerator.\n\r");
  return XST_FAILURE;
 }

 // Set the input parameters of the HLS block
 XExample_Set_a(&HlsExample, 42);
 XExample_Set_b(&HlsExample, 12);
 XExample_Set_c_i(&HlsExample, 1);

 // Start the device
 XExample_Start(&HlsExample);

 // Wait for valid output data, then read the result
 while (XExample_Get_c_o_vld(&HlsExample) == 0) {
 }
 res_hw = XExample_Get_c_o(&HlsExample);
 print("Detected HLS peripheral complete. Result received.\n\r");
 return XST_SUCCESS;
}
Control Clock and Reset in AXI4-Lite Interfaces
Note: If you instantiate the slave AXI4-Lite register file in a bus fabric that uses a different clock frequency, Vivado IP integrator will automatically generate a clock domain crossing (CDC) slice that performs the same function as the control clock described below, making use of the option unnecessary.

By default, Vitis HLS uses the same clock for the AXI4-Lite interface and the synthesized design. Vitis HLS connects all registers in the AXI4-Lite interface to the clock used for the synthesized logic (ap_clk).

Optionally, you can use the INTERFACE directive clock option to specify a separate clock for each AXI4-Lite port. When connecting the clock to the AXI4-Lite interface, you must observe the following requirements:

  • AXI4-Lite interface clock must be synchronous to the clock used for the synthesized logic (ap_clk). That is, both clocks must be derived from the same master generator clock.
  • AXI4-Lite interface clock frequency must be equal to or less than the frequency of the clock used for the synthesized logic (ap_clk).

If you use the clock option with the INTERFACE directive, you only need to specify the clock option on one function argument in each bundle. Vitis HLS implements all other function arguments in the bundle with the same clock and reset. Vitis HLS names the generated reset signal with the prefix ap_rst_ followed by the clock name. The generated reset signal is active-Low independent of the config_rtl command.

The following example shows how Vitis HLS groups function arguments a and b into an AXI4-Lite port with a clock named AXI_clk1 and an associated reset port.

// Default AXI-Lite interface implemented with independent clock called AXI_clk1
#pragma HLS interface s_axilite port=a clock=AXI_clk1
#pragma HLS interface s_axilite port=b

In the following example, Vitis HLS groups function arguments c and d into AXI4-Lite port CTRL1 with a separate clock called AXI_clk2 and an associated reset port.

// CTRL1 AXI-Lite bundle implemented with a separate clock (called AXI_clk2)
#pragma HLS interface s_axilite port=c bundle=CTRL1 clock=AXI_clk2
#pragma HLS interface s_axilite port=d bundle=CTRL1
Customizing AXI4-Lite Slave Interfaces in IP Integrator

When an HLS RTL design using an AXI4-Lite slave interface is incorporated into a design in Vivado IP integrator, you can customize the block. From the block diagram in IP integrator, select the HLS block, right-click with the mouse button and select Customize Block.

The address width is by default configured to the minimum required size. Modify this to connect to blocks with address sizes less than 32-bit.

Figure 5: Customizing AXI4-Lite Slave Interfaces in IP Integrator

AXI4 Master Interface

AXI4 memory-mapped (m_axi) interfaces allow kernels to read and write data in global memory (DDR, HBM, PLRAM). Memory-mapped interfaces are a convenient way of sharing data across different elements of the accelerated application, such as between the host and kernel, or between kernels on the accelerator card. The main advantages of m_axi interfaces are listed below:
  • The interface has separate and independent read and write channels
  • It supports burst-based accesses with potential performance of ~19 GB/s
  • It provides support for outstanding transactions

In the Vitis Kernel flow the m_axi interface is assigned by default to pointer and array arguments. In this flow it supports the following default features:

  • Pointer and array arguments are automatically mapped to the m_axi interface
  • The default mode of operation is offset=slave in the Vitis flow and should not be changed
  • All pointer and array arguments are mapped to a single interface bundle to conserve device resources, and ports share read and write access across the time it is active
  • The default alignment in the Vitis flow is set to 64 bytes
  • The maximum read/write burst length is set to 16 by default
While not used by default in the Vivado IP flow, when the m_axi interface is specified it has the following default features:
  • The default operation mode is offset=off but you can change it as described in Offset and Modes of Operation
  • Assigned pointer and array arguments are mapped to a single interface bundle to conserve device resources, and share the interface across the time it is active
  • The default alignment in Vivado IP flow is set to 1 byte
  • The maximum read/write burst length is set to 16 by default

In both the Vivado IP flow and Vitis kernel flow, the INTERFACE pragma or directive can be used to modify default values as needed.

You can use an AXI4 master interface on array or pointer/reference arguments, which Vitis HLS implements in one of the following modes:

  • Individual data transfers
  • Burst mode data transfers

With individual data transfers, Vitis HLS reads or writes a single element of data for each address. The following example shows a single read and single write operation. In this example, Vitis HLS generates an address on the AXI interface to read a single data value and an address to write a single data value. The interface transfers one data value per address.

void bus (int *d) {
 static int acc = 0;

 acc += *d;
 *d  = acc;
}

With burst mode transfers, Vitis HLS reads or writes data using a single base address followed by multiple sequential data samples, which makes this mode capable of higher data throughput. Burst mode of operation is possible when you use the C memcpy function or a pipelined for loop. Refer to Optimizing Burst Transfers for more information.

IMPORTANT: The C memcpy function is only supported for synthesis when used to transfer data to or from a top-level function argument specified with an AXI4 master interface.

The following example shows a burst mode copy using the memcpy function. The top-level function argument a is specified as an AXI4 master interface.

void example(volatile int *a){

//Port a is assigned to an AXI4 master interface
#pragma HLS INTERFACE m_axi depth=50 port=a
#pragma HLS INTERFACE s_axilite port=return

 int i;
 int buff[50];

//memcpy creates a burst access to memory
 memcpy(buff,(const int*)a,50*sizeof(int));

 for(i=0; i < 50; i++){
 buff[i] = buff[i] + 100;
 }

 memcpy((int *)a,buff,50*sizeof(int));
}

When this example is synthesized, it results in the interface shown in the following figure.

Note: In this figure, the AXI4 interfaces are collapsed.
Figure 6: AXI4 Interface

The following example shows the same code as the preceding example but uses a for loop to copy the data out:

void example(volatile int *a){

#pragma HLS INTERFACE m_axi depth=50 port=a
#pragma HLS INTERFACE s_axilite port=return

//Port a is assigned to an AXI4 master interface

 int i;
 int buff[50];

//memcpy creates a burst access to memory
 memcpy(buff,(const int*)a,50*sizeof(int));

 for(i=0; i < 50; i++){
 buff[i] = buff[i] + 100;
 }

 for(i=0; i < 50; i++){
#pragma HLS PIPELINE
 a[i] = buff[i];
 }
}

When using a for loop to implement burst reads or writes, follow these requirements:

  • Pipeline the loop
  • Access addresses in increasing order
  • Do not place accesses inside a conditional statement
  • For nested loops, do not flatten loops, because this inhibits the burst operation
Note: Only one read and one write is allowed in a for loop unless the ports are bundled in different AXI ports. The following example shows how to perform two reads in burst mode using different AXI interfaces.

In the following example, Vitis HLS implements the port reads as burst transfers. Port a is specified without using the bundle option and is implemented in the default AXI interface. Port b is specified using a named bundle and is implemented in a separate AXI interface called d2_port.

void example(volatile int *a, int *b){

#pragma HLS INTERFACE s_axilite port=return 
#pragma HLS INTERFACE m_axi depth=50 port=a
#pragma HLS INTERFACE m_axi depth=50 port=b bundle=d2_port


 int i;
 int buff[50];

//copy data in
 for(i=0; i < 50; i++){
#pragma HLS PIPELINE
 buff[i] = a[i] + b[i];
 }
...
 }
Offset and Modes of Operation
IMPORTANT: In the Vitis kernel flow the default mode of operation is offset=slave and should not be changed.

The AXI4 Master interface has a read/write address channel that can be used to read/write specific addresses. By default the m_axi interface starts all read and write operations from the address 0x00000000. For example, given the following code, the design reads data from addresses 0x00000000 to 0x000000C7, which covers 50 32-bit words (200 bytes). The design then writes data back to the same addresses.

#include <stdio.h>
#include <string.h>
 
void example(volatile int *a){
   
#pragma HLS INTERFACE m_axi port=a depth=50
   
  int i;
  int buff[50];
   
  //memcpy creates a burst access to memory
  //multiple calls of memcpy cannot be pipelined and will be scheduled sequentially
  //memcpy requires a local buffer to store the results of the memory transaction
  memcpy(buff,(const int*)a,50*sizeof(int));
   
  for(i=0; i < 50; i++){
    buff[i] = buff[i] + 100;
  }
   
  memcpy((int *)a,buff,50*sizeof(int));
}
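The address range quoted for this example follows from the element count and size. As a quick check, the following sketch computes the last byte address touched by a transfer; the helper name last_byte_address is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Last byte address touched when count elements of elem_bytes bytes
 * are transferred starting at byte address base.                    */
static uint32_t last_byte_address(uint32_t base, uint32_t count,
                                  uint32_t elem_bytes) {
    assert(count > 0 && elem_bytes > 0);
    return base + count * elem_bytes - 1u;
}
```

For 50 elements of 4 bytes starting at 0x00000000, the result is 0x000000C7, which is byte 199 of the 200-byte range.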

The tool provides the capability to let the base address be configured statically in the Vivado IP for instance, or dynamically by the application or another IP during run time.

The m_axi interface can be both a master initiating transactions, and also a slave interface that receives the data and sends acknowledgment. Depending on the mode specified with the offset option of the INTERFACE pragma, an HLS IP can use multiple approaches to set the base address.

TIP: The config_interface -m_axi_offset command provides a global setting for the offset, that can be overridden for specific m_axi interfaces using the INTERFACE pragma offset option.
  • Master Mode: When acting as a master interface with different offset options, the m_axi interface start address can be either hard-coded or set at run time.
    • offset=off: Vitis HLS sets a base address for the m_axi interface when the IP is used in the Vivado IP integrator tool. One disadvantage with this approach is that you cannot change the base address during run time. See Customizing AXI4 Master Interfaces in IP Integrator for setting the base address.
      The following example is synthesized with offset=off, the default for the Vivado IP flow.
      void example(volatile int *a){
      #pragma HLS INTERFACE m_axi depth=50 port=a offset=off
         
        int i;
        int buff[50];
         
        //memcpy creates a burst access to memory
        //multiple calls of memcpy cannot be pipelined and will be scheduled sequentially
        //memcpy requires a local buffer to store the results of the memory transaction
        memcpy(buff,(const int*)a,50*sizeof(int));
         
        for(i=0; i < 50; i++){
          buff[i] = buff[i] + 100;
        }
         
        memcpy((int *)a,buff,50*sizeof(int));
      }
    • offset=direct: Vitis HLS generates a port on the IP for setting the address. Note the addition of the a port as shown in the figure below. This lets you update the address at run time, so you can have one m_axi interface reading and writing different locations. For example, consider one HLS module that reads data from an ADC into RAM and another HLS module that processes that data. Because you can change the address on the module, while one HLS module is processing the initial dataset the other module can be reading more data into a different address.
      void example(volatile int *a){
      #pragma HLS INTERFACE m_axi depth=50 port=a offset=direct
      ...
      }
    Figure 7: offset=direct
  • Slave Mode: The slave mode for an interface is set with offset=slave. In this mode the IP will be controlled by the host application, or the micro-controller through the s_axilite interface. This is the default for the Vitis kernel flow, and can also be used in the Vivado IP flow. Here is the flow of operation:
    1. Initially, the Host/CPU will start the IP or kernel using the block-level control protocol which is mapped to the s_axilite adapter.
    2. The host will send the scalars and address offsets for the m_axi interfaces through the s_axilite adapter.
    3. The m_axi adapter will read the start address from the s_axilite adapter and store it in a queue.
    4. The HLS design starts to read the data from the global memory.

As shown in the figure below, the HLS design will have both the s_axilite adapter for the base address, and the m_axi to perform read and write transfer to the global memory.

Figure 8: AXI Adapters in Slave Mode

The following are rules associated with the offset option:

  • Fully Specified Offset: When the user explicitly sets the offset value the tool uses the specified settings. The user can also set different offset values for different m_axi interfaces in the design, and the tool will use the specified offsets.
    #pragma HLS INTERFACE s_axilite port=return
    #pragma HLS INTERFACE m_axi bundle=BUS_A port=out_r offset=direct
    #pragma HLS INTERFACE m_axi bundle=BUS_B port=in1 offset=slave
    #pragma HLS INTERFACE m_axi bundle=BUS_C port=in2 offset=off
  • No Offset Specified: If there are no offsets specified in the INTERFACE pragma, the tool will defer to the setting specified by config_interface -m_axi_offset.
    Note: If the global m_axi_offset setting is specified, and the design has an s_axilite interface, the global setting is ignored and offset=slave is assumed.
    void top(int *a) {
    #pragma HLS interface m_axi port=a
    #pragma HLS interface s_axilite port=a
    }
Controlling the Address Offset in an AXI4 Interface

By default, the AXI4 master interface starts all read and write operations from address 0x00000000. For example, given the following code, the design reads data from addresses 0x00000000 to 0x000000C7, which covers 50 32-bit words (200 bytes). The design then writes data back to the same addresses.

void example(volatile int *a){

#pragma HLS INTERFACE m_axi depth=50 port=a 
#pragma HLS INTERFACE s_axilite port=return bundle=AXILiteS

 int i;
 int buff[50];

 memcpy(buff,(const int*)a,50*sizeof(int));

 for(i=0; i < 50; i++){
 buff[i] = buff[i] + 100;
 }
 memcpy((int *)a,buff,50*sizeof(int));
}

To apply an address offset, use the -offset option with the INTERFACE directive, and specify one of the following options:

  • off: Does not apply an offset address. This is the default.
  • direct: Adds a 32-bit port to the design for applying an address offset.
  • slave: Adds a 32-bit register inside the AXI4-Lite interface for applying an address offset.

In the final RTL, Vitis HLS applies the address offset directly to any read or write address generated by the AXI4 master interface. This allows the design to access any address location in the system.
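The effect of applying the offset can be illustrated with a small calculation. This is only a sketch of the addressing arithmetic, assuming the offset is a byte address that is added to the byte position of each element; the function name effective_address is hypothetical and does not correspond to generated RTL or driver code.

```c
#include <assert.h>
#include <stdint.h>

/* Byte address of element index when an offset is applied to the interface.
 * Assumption: offset is a byte address and elements are elem_bytes wide.  */
static uint32_t effective_address(uint32_t offset, uint32_t index,
                                  uint32_t elem_bytes) {
    assert(elem_bytes > 0);
    return offset + index * elem_bytes;
}
```

With an offset of 0x10000000 and 4-byte elements, element 0 is at 0x10000000 and element 49 is at 0x100000C4, so the same 50-word transfer now targets a different region of memory.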

If you use the slave option in an AXI interface, you must use an AXI4-Lite port on the design interface. Xilinx recommends that you implement the AXI4-Lite interface using the following pragma:

#pragma HLS INTERFACE s_axilite port=return

In addition, if you use the slave option and you used several AXI4-Lite interfaces, you must ensure that the AXI master port offset register is bundled into the correct AXI4-Lite interface. In the following example, port a is implemented as an AXI master interface with an offset and AXI4-Lite interfaces called AXI_Lite_1 and AXI_Lite_2:

#pragma HLS INTERFACE m_axi port=a depth=50 offset=slave 
#pragma HLS INTERFACE s_axilite port=return bundle=AXI_Lite_1
#pragma HLS INTERFACE s_axilite port=b bundle=AXI_Lite_2

The following INTERFACE directive is required to ensure that the offset register for port a is bundled into the AXI4-Lite interface called AXI_Lite_1:

#pragma HLS INTERFACE s_axilite port=a bundle=AXI_Lite_1
M_AXI Bundles

Vitis HLS groups function arguments with compatible options into a single m_axi interface adapter. Bundling ports into a single interface helps save FPGA resources by eliminating AXI logic, but it can limit the performance of the kernel because all the memory transfers have to go through a single interface. The m_axi interface has independent READ and WRITE channels, so a single interface can read and write simultaneously, though only at one location. Using multiple bundles the bandwidth and throughput of the kernel can be increased by creating multiple interfaces to connect to multiple memory banks.

In the following example all the pointer arguments are grouped into a single m_axi adapter using the interface option bundle=BUS_A. The tool also adds a single s_axilite adapter for the m_axi offsets, the scalar argument size, and the function return.

extern "C" {
void vadd(const unsigned int *in1, // Read-Only Vector 1
          const unsigned int *in2, // Read-Only Vector 2
          unsigned int *out_r,     // Output Result
          int size                 // Size in integer
          ) {
 
#pragma HLS INTERFACE m_axi bundle=BUS_A port=out_r
#pragma HLS INTERFACE m_axi bundle=BUS_A port=in1
#pragma HLS INTERFACE m_axi bundle=BUS_A port=in2
#pragma HLS INTERFACE s_axilite port=in1
#pragma HLS INTERFACE s_axilite port=in2
#pragma HLS INTERFACE s_axilite port=out_r
#pragma HLS INTERFACE s_axilite port=size
#pragma HLS INTERFACE s_axilite port=return
...
}
}
Figure 9: MAXI and S_AXILITE

You can also choose to bundle function arguments into separate interface adapters as shown in the following code. Here the argument in2 is grouped into a separate interface adapter with bundle=BUS_B. This creates a new m_axi interface adapter for port in2.

extern "C" {
void vadd(const unsigned int *in1, // Read-Only Vector 1
          const unsigned int *in2, // Read-Only Vector 2
          unsigned int *out_r,     // Output Result
          int size                 // Size in integer
          ) {
 
#pragma HLS INTERFACE m_axi bundle=BUS_A port=out_r
#pragma HLS INTERFACE m_axi bundle=BUS_A port=in1
#pragma HLS INTERFACE m_axi bundle=BUS_B port=in2
#pragma HLS INTERFACE s_axilite port=in1
#pragma HLS INTERFACE s_axilite port=in2
#pragma HLS INTERFACE s_axilite port=out_r
#pragma HLS INTERFACE s_axilite port=size
#pragma HLS INTERFACE s_axilite port=return
...
}
}
Figure 10: 2 MAXI Bundles

The global configuration command config_interface -m_axi_auto_max_ports false will limit the number of interface bundles to the minimum required. It will allow the tool to group compatible ports into a single m_axi interface. The default setting for this command is disabled (false), but you can enable it to maximize bandwidth by creating a separate m_axi adapter for each port.

With m_axi_auto_max_ports disabled, the following are some rules for how the tool handles bundles under different circumstances:

  1. Default Bundle Name: The tool groups all interface ports with no bundle name into a single m_axi interface port using the tool default name bundle=<default>, and names the RTL port m_axi_<default>. The following pragmas:
    #pragma HLS INTERFACE m_axi port=a depth=50
    #pragma HLS INTERFACE m_axi port=b depth=50
    #pragma HLS INTERFACE m_axi port=c depth=50
    

    Result in the following messages:

    INFO: [RTGEN 206-500] Setting interface mode on port 'example/gmem' to 'm_axi'.
    INFO: [RTGEN 206-500] Setting interface mode on port 'example/gmem' to 'm_axi'.
    INFO: [RTGEN 206-500] Setting interface mode on port 'example/gmem' to 'm_axi'.
    
  2. User-Specified Bundle Names: The tool groups all interface ports with the same user-specified bundle=<string> into the same m_axi interface port, and names the RTL port the value specified by m_axi_<string>. Ports without bundle assignments are grouped into the default bundle as described above. The following pragmas:
    #pragma HLS INTERFACE m_axi port=a depth=50 bundle=BUS_A
    #pragma HLS INTERFACE m_axi port=b depth=50
    #pragma HLS INTERFACE m_axi port=c depth=50
    

    Result in the following messages:

    INFO: [RTGEN 206-500] Setting interface mode on port 'example/BUS_A' to 'm_axi'.
    INFO: [RTGEN 206-500] Setting interface mode on port 'example/gmem' to 'm_axi'.
    INFO: [RTGEN 206-500] Setting interface mode on port 'example/gmem' to 'm_axi'.
    
    IMPORTANT: If you bundle incompatible interfaces Vitis HLS issues a message and ignores the bundle assignment.
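The grouping rules above can be sketched as a count of the distinct m_axi interfaces that result from a set of bundle assignments. This is a simplified illustration only: it assumes m_axi_auto_max_ports is disabled, takes the default bundle name gmem from the messages shown above, and ignores the case of incompatible interfaces. The function names resolve_bundle and count_m_axi_interfaces are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Default bundle name, as it appears in the tool messages above. */
#define DEFAULT_BUNDLE "gmem"

/* A port with no bundle option (NULL) falls into the default bundle. */
static const char *resolve_bundle(const char *bundle) {
    return bundle != NULL ? bundle : DEFAULT_BUNDLE;
}

/* Count distinct m_axi interfaces for n ports, one bundle entry per port. */
static size_t count_m_axi_interfaces(const char *const bundles[], size_t n) {
    assert(bundles != NULL || n == 0);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        int seen = 0;
        /* A bundle name counts once, the first time it appears. */
        for (size_t j = 0; j < i; j++) {
            if (strcmp(resolve_bundle(bundles[i]),
                       resolve_bundle(bundles[j])) == 0) {
                seen = 1;
                break;
            }
        }
        if (!seen) {
            count++;
        }
    }
    return count;
}
```

For the second rule above, the assignments {BUS_A, none, none} produce two interfaces, m_axi_BUS_A and m_axi_gmem.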
Controlling AXI4 Burst Behavior

An optimal AXI4 interface is one in which the design never stalls while waiting to access the bus, and after bus access is granted, the bus never stalls while waiting for the design to read/write. To create the optimal AXI4 interface, the following options are provided in the INTERFACE pragma or directive to specify the behavior of the bursts and optimize the efficiency of the AXI4 interface. Refer to Optimizing Burst Transfers for more information on burst transfers.

Some of these options use internal storage to buffer data and may have an impact on area and resources:

  • latency: Specifies the expected latency of the AXI4 interface, allowing the design to initiate a bus request a number of cycles (latency) before the read or write is expected. If this figure is too low, the design will be ready too soon and may stall waiting for the bus. If this figure is too high, bus access may be granted but the bus may stall waiting on the design to start the access.
  • max_read_burst_length: Specifies the maximum number of data values read during a burst transfer.
  • num_read_outstanding: Specifies how many read requests can be made to the AXI4 bus, without a response, before the design stalls. This implies internal storage in the design, a FIFO of size: num_read_outstanding*max_read_burst_length*word_size.
  • max_write_burst_length: Specifies the maximum number of data values written during a burst transfer.
  • num_write_outstanding: Specifies how many write requests can be made to the AXI4 bus, without a response, before the design stalls. This implies internal storage in the design, a FIFO of size: num_write_outstanding*max_write_burst_length*word_size.

The following example can be used to help explain these options:

 #pragma HLS interface m_axi port=input offset=slave bundle=gmem0 \
     depth=1024*1024*16/(512/8) \
     latency=100 \
     num_read_outstanding=32 \
     num_write_outstanding=32 \
     max_read_burst_length=16 \
     max_write_burst_length=16

The interface is specified as having a latency of 100. Vitis HLS seeks to schedule the request for burst access 100 clock cycles before the design is ready to access the AXI4 bus. To further improve bus efficiency, the options num_write_outstanding and num_read_outstanding ensure the design contains enough buffering to store up to 32 outstanding read requests and 32 outstanding write requests. This allows the design to continue processing until the bus requests are serviced. Finally, the options max_read_burst_length and max_write_burst_length ensure the maximum burst size is 16 and that the AXI4 interface does not hold the bus for longer than this.
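
The buffering implied by these settings can be estimated with the FIFO-size expression given for num_read_outstanding and num_write_outstanding. The following is a minimal sketch in plain C++, not part of Vitis HLS. The helper names burst_fifo_bits and burst_fifo_bytes are hypothetical, and the 512-bit word size used in the usage note is an assumption suggested by the depth expression in the example.

```cpp
#include <cassert>
#include <cstdint>

// Storage, in bits, implied by the outstanding-request settings:
// num_outstanding * max_burst_length * word_size.
// Hypothetical helper for illustration; not a Vitis HLS function.
inline std::uint64_t burst_fifo_bits(std::uint64_t num_outstanding,
                                     std::uint64_t max_burst_length,
                                     std::uint64_t word_bits) {
    return num_outstanding * max_burst_length * word_bits;
}

// Same estimate in bytes; the word size must be a whole number of bytes.
inline std::uint64_t burst_fifo_bytes(std::uint64_t num_outstanding,
                                      std::uint64_t max_burst_length,
                                      std::uint64_t word_bits) {
    assert(word_bits % 8 == 0);
    return burst_fifo_bits(num_outstanding, max_burst_length, word_bits) / 8;
}
```

With 32 outstanding requests, a maximum burst length of 16, and an assumed 512-bit word, burst_fifo_bits(32, 16, 512) evaluates to 262144 bits (32768 bytes) for each direction, which shows why these options can affect area.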

These options allow the behavior of the AXI4 interface to be optimized for the system in which it will operate. The efficiency of the operation does depend on these values being set accurately.

Automatic Port Width Resizing

In the Vitis tool flow, Vitis HLS can automatically resize m_axi interface ports to 512 bits to improve burst access. However, automatic port width resizing supports only standard C data types. It does not support arbitrary-precision or aggregate types such as ap_int, ap_uint, struct, or array.

IMPORTANT: Structs on the interface prevent automatic widening of the port. You must break the struct into individual elements to enable this feature.

Vitis HLS controls automatic port width resizing using the following two commands:

  • config_interface -m_axi_max_widen_bitwidth <N>: Directs the tool to automatically widen bursts on M-AXI interfaces up to the specified bit-width. The value of <N> must be a power of two no greater than 1024, or 0 to disable widening.
  • config_interface -m_axi_alignment_byte_size <N>: Directs the tool to assume that pointers mapped to m_axi interfaces are aligned to at least the specified number of bytes, which must be a power of two. Burst widening requires strong alignment properties, so this setting can help automatic burst widening.
In the Vitis Kernel flow automatic port width resizing is enabled by default with the following:
config_interface -m_axi_max_widen_bitwidth 512
config_interface -m_axi_alignment_byte_size 64
In the Vivado IP flow this feature is disabled by default:
config_interface -m_axi_max_widen_bitwidth 0
config_interface -m_axi_alignment_byte_size 0

Automatic port width resizing will only re-size the port if a burst access can be seen by the tool. Therefore all the preconditions needed for bursting, as described in Optimizing Burst Transfers, are also needed for port resizing. These conditions include:

  • Accesses must be in monotonically increasing order, both in terms of the memory location being accessed and in time. You cannot access a memory location that lies between two previously accessed memory locations; in other words, accesses must not overlap.
  • The access pattern from the global memory should be in sequential order, and with the following additional requirements:
    • The sequential accesses need to be on a non-vector type
    • The start of the sequential accesses needs to be aligned to the widen word size
    • The length of the sequential accesses needs to be divisible by the widen factor
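
The two arithmetic requirements in the list above (start alignment and length divisibility) can be checked with a few lines of plain C++. This is a sketch under stated assumptions, not the tool's actual analysis: the helper name widening_preconditions_hold is hypothetical, and the widen factor is assumed to be the widened port width divided by the element width.

```cpp
#include <cassert>
#include <cstdint>

// Checks the alignment and divisibility requirements for a run of
// sequential accesses. Hypothetical helper; not part of Vitis HLS.
//   start_byte : byte address of the first element accessed
//   num_elems  : number of consecutive elements accessed
//   elem_bits  : width of one element (for example, 32 for int)
//   widen_bits : candidate widened port width (for example, 512)
inline bool widening_preconditions_hold(std::uint64_t start_byte,
                                        std::uint64_t num_elems,
                                        unsigned elem_bits,
                                        unsigned widen_bits) {
    assert(elem_bits > 0 && elem_bits % 8 == 0);
    assert(widen_bits >= elem_bits && widen_bits % elem_bits == 0);
    const std::uint64_t widen_bytes = widen_bits / 8;
    // Assumption: widen factor = widened width / element width.
    const std::uint64_t widen_factor = widen_bits / elem_bits;
    const bool start_aligned = (start_byte % widen_bytes) == 0;
    const bool length_divisible = (num_elems % widen_factor) == 0;
    return start_aligned && length_divisible;
}
```

For 32-bit elements and a 512-bit port, a run of 128 elements starting at byte 0 passes both checks, while a run starting at byte 4, or a run of 100 elements, does not.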

The following code example is used in the calculations that follow:

vadd_pipeline:
  for (int i = 0; i < iterations; i++) {
#pragma HLS LOOP_TRIPCOUNT min = c_len/c_n max = c_len/c_n

  // Pipelining loops that access only one variable is the ideal way to
  // increase the global memory bandwidth.
  read_a:
    for (int x = 0; x < N; ++x) {
#pragma HLS LOOP_TRIPCOUNT min = c_n max = c_n
#pragma HLS PIPELINE II = 1
      result[x] = a[i * N + x];
    }

  read_b:
    for (int x = 0; x < N; ++x) {
#pragma HLS LOOP_TRIPCOUNT min = c_n max = c_n
#pragma HLS PIPELINE II = 1
      result[x] += b[i * N + x];
    }

  write_c:
    for (int x = 0; x < N; ++x) {
#pragma HLS LOOP_TRIPCOUNT min = c_n max = c_n
#pragma HLS PIPELINE II = 1
      c[i * N + x] = result[x];
    }
  }

The tool determines the widened port width for the code above in three steps:

  1. First, the tool counts the access patterns in the read_a loop. There is one access per loop iteration, so the optimization determines the interface bit-width as 32 * 1 = 32 (bit-width of the int variable * number of accesses).
  2. The tool tries to reach the default maximum specified by config_interface -m_axi_max_widen_bitwidth 512, using the following expression:
    length = ceil((loop-bound of inner loops) * (loop-bound of outer loops)) * #(access-patterns)
    • In the above code, the outer loop is an imperfect loop, so there are no burst transfers across the outer loop. The length therefore includes only the inner loop, and the formula shortens to:
      length = ceil(loop-bound of inner loops) * #(access-patterns)

      or, for an inner loop bound of 128 and the 32-bit width from step 1: length = ceil(128) * 32 = 4096

  3. Finally, the tool checks whether the calculated length is a power of 2. If it is, the length is capped to the width specified by m_axi_max_widen_bitwidth.
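
The three steps can be sketched in plain C++ as follows. This is an illustration of the arithmetic only, not the tool's implementation. The function names are hypothetical, and two behaviors are assumptions because the text does not state them: the sketch keeps the unwidened width when the length is not a power of 2, and it treats a maximum of 0 as "widening disabled".

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

inline bool is_power_of_two(std::uint64_t v) {
    return v != 0 && (v & (v - 1)) == 0;
}

// Hypothetical helper that follows the three steps described above.
inline std::uint64_t widened_port_bits(double inner_loop_bound,
                                       std::uint64_t elem_bits,
                                       std::uint64_t accesses_per_iteration,
                                       std::uint64_t max_widen_bits) {
    assert(inner_loop_bound > 0 && elem_bits > 0 && accesses_per_iteration > 0);
    // Step 1: base interface width = element width * accesses per iteration.
    const std::uint64_t base_bits = elem_bits * accesses_per_iteration;
    // Assumption: a maximum of 0 means widening is disabled.
    if (max_widen_bits == 0) return base_bits;
    // Step 2: length from the inner loop bound only (imperfect outer loop).
    const std::uint64_t length =
        static_cast<std::uint64_t>(std::ceil(inner_loop_bound)) * base_bits;
    // Step 3: a power-of-2 length is capped to the configured maximum.
    if (is_power_of_two(length)) return std::min(length, max_widen_bits);
    // Assumption: otherwise the sketch leaves the port at its base width.
    return base_bits;
}
```

For the example values, widened_port_bits(128, 32, 1, 512) computes a length of 4096, which is a power of 2, and returns the capped width of 512.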

Consider the trade-offs before using automatic port width resizing. The feature improves the read latency from the DDR because the tool reads a wide vector instead of a single element of the data type. It also consumes more resources, because the design must buffer the wide vector and shift the data to match the data path size.

Creating an AXI4 Interface with 32-bit Address Capability
By default, Vitis HLS implements the AXI4 port with a 64-bit address bus. Optionally, you can implement the AXI4 interface with a 32-bit address bus by disabling the m_axi_addr64 interface configuration option as follows:
  1. Select Solution > Solution Settings.
  2. In the Solution Settings dialog box, click the General category, and Edit the existing config_interface command, or click Add to add one.
  3. In the Edit or Add dialog box, select config_interface, and disable m_axi_addr64.
IMPORTANT: When you disable the m_axi_addr64 option, Vitis HLS implements all AXI4 interfaces in the design with a 32-bit address bus.
Customizing AXI4 Master Interfaces in IP Integrator

When you incorporate an HLS RTL design that uses an AXI4 master interface into a design in the Vivado IP integrator, you can customize the block. From the block diagram in IP integrator, select the HLS block, right-click, and select Customize Block to customize any of the settings provided. A complete description of the AXI4 parameters is provided in the Vivado Design Suite: AXI Reference Guide (UG1037).

The following figure shows the Re-Customize IP dialog box for the design shown below. This design includes an AXI4-Lite port.

Figure 11: Customizing AXI4 Master Interfaces in IP Integrator

AXI4-Stream Interfaces

An AXI4-Stream interface can be applied to any input argument and any array or pointer output argument. Because an AXI4-Stream interface transfers data in a sequential streaming manner, it cannot be used with arguments that are both read and written. In terms of data layout, the data type of the AXI4-Stream is aligned to the next byte. For example, if the size of the data type is 12 bits, it will be extended to 16 bits. Depending on whether a signed/unsigned interface is selected, the extended bits are either sign-extended or zero-extended. If the stream data type is a user-defined struct, the struct is aggregated and aligned to the size of the largest data element within the struct.

The following code examples show how the packed alignment depends on your struct type. If the struct contains only char type, as shown in the following example, then it will be packed with alignment of one byte. Total size of the struct will be two bytes:

struct A {
  char foo;
  char bar;
};

However, if the struct has elements with different data types, as shown below, then it will be packed and aligned to the size of the largest data element, or four bytes in this example. Element bar will be padded with three bytes resulting in a total size of eight bytes for the struct:

struct A {
  int foo;
  char bar;
};
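
The byte alignment of the stream data type and the two struct sizes above can be reproduced with ordinary host C++. This is a sketch by analogy, not a statement about the tool's internals: the helper padded_to_byte_bits and the struct names are hypothetical, and the sizeof checks assume a typical host ABI with a 4-byte int, where the compiler applies the same largest-member alignment rule.

```cpp
#include <cassert>
#include <cstddef>

// Rounds a data width up to the next whole byte, as described for the
// AXI4-Stream data type. Hypothetical helper for illustration.
inline unsigned padded_to_byte_bits(unsigned data_bits) {
    assert(data_bits > 0);
    return ((data_bits + 7) / 8) * 8;
}

// Host-side copies of the two structs shown above, renamed to coexist.
struct OnlyChars {
    char foo;
    char bar;
};

struct IntAndChar {
    int foo;
    char bar;
};

// Assumes a 4-byte int, which holds on mainstream host ABIs.
static_assert(sizeof(OnlyChars) == 2, "two chars pack into two bytes");
static_assert(sizeof(IntAndChar) == 8, "char is padded with three bytes");
```

Here padded_to_byte_bits(12) returns 16, matching the 12-bit example in the text.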

By default, user-defined structs in streams are aggregated. However, you can disaggregate the struct and infer a stream for each element of the struct, using the following steps:

  1. Specify the DISAGGREGATE pragma or directive for the struct.
  2. Specify the AXI4-Stream INTERFACE pragma or directive for each element of the disaggregated struct.

The result will be one AXI4-Stream for every member of the struct in the interface.

How AXI4-Stream is Implemented

The AXI4-Stream interface is implemented as a struct type in Vitis HLS and has the following signature (defined in ap_axi_sdata.h):

template <typename T, size_t WUser, size_t WId, size_t WDest> struct axis { .. };

Where:

T
Stream data type
WUser
Width of the TUSER signal
WId
Width of the TID signal
WDest
Width of the TDEST signal

When the stream data type (T) is a simple integer type, there are two predefined types of AXI4-Stream implementations available:

  • A signed implementation of the AXI4-Stream class (or more simply ap_axis<WData, WUser, WId, WDest>)
    hls::axis<ap_int<WData>, WUser, WId, WDest>
  • An unsigned implementation of the AXI4-Stream class (or more simply ap_axiu<WData, WUser, WId, WDest>)
    hls::axis<ap_uint<WData>, WUser, WId, WDest>

The value specified for the WUser, WId, and WDest template parameters controls the usage of side-channel signals in the AXI4-Stream interface.

When the hls::axis class is used, the generated RTL will typically contain the actual data signal TDATA, and the following additional signals: TVALID, TREADY, TKEEP, TSTRB, TLAST, TUSER, TID, and TDEST.

TVALID, TREADY, and TLAST are necessary control signals for the AXI4-Stream protocol. TKEEP, TSTRB, TUSER, TID, and TDEST signals are special signals that can be used to pass around additional bookkeeping data.

TIP: If WUser, WId, and WDest are set to 0, the generated RTL will not include the TUSER, TID, and TDEST signals in the interface.
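
The relationship between the template parameters and the generated signals can be summarized in a small plain C++ helper. This is an illustrative sketch only: axis_signal_names is a hypothetical function, not part of ap_axi_sdata.h, and it simply encodes the rule stated in the text above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Lists the interface signals described above for a given choice of
// side-channel widths. Hypothetical helper for illustration only.
inline std::vector<std::string> axis_signal_names(std::size_t w_user,
                                                  std::size_t w_id,
                                                  std::size_t w_dest) {
    std::vector<std::string> names = {"TDATA", "TVALID", "TREADY",
                                      "TKEEP", "TSTRB",  "TLAST"};
    // A width of 0 removes the corresponding side-channel signal.
    if (w_user > 0) names.push_back("TUSER");
    if (w_id > 0) names.push_back("TID");
    if (w_dest > 0) names.push_back("TDEST");
    return names;
}

// Example check: widths of 0, 0, 0 leave only the six base signals.
inline void check_no_side_channels() {
    assert(axis_signal_names(0, 0, 0).size() == 6);
}
```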
How AXI4-Stream Works

AXI4-Stream is a protocol designed for transporting arbitrary unidirectional data. In an AXI4-Stream, TDATA-width bits are transferred per clock cycle. The transfer starts once the producer asserts the TVALID signal and the consumer responds by asserting the TREADY signal (once it has consumed the initial TDATA). At this point, the producer starts sending TDATA and TLAST (and TUSER if needed to carry additional user-defined sideband data). TLAST marks the last transfer of the stream, so the consumer keeps consuming the incoming TDATA until TLAST is asserted.

Figure 12: AXI4-Stream Handshake
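
The handshake can be illustrated with a small cycle-based model in plain C++. This is a behavioral sketch, not RTL and not generated by the tool: the types Beat and TransferLog and the function simulate_stream are hypothetical, the producer is assumed to hold TVALID High whenever it has data, and the consumer's TREADY follows a repeating pattern supplied by the caller.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One transfer on the stream: payload plus the TLAST flag.
struct Beat {
    std::uint32_t tdata;
    bool tlast;
};

struct TransferLog {
    std::vector<std::uint32_t> received;  // TDATA values accepted
    std::size_t cycles;                   // clock cycles simulated
};

// A beat moves only in a cycle where TVALID and TREADY are both High.
// The consumer stops after the beat that carries TLAST.
inline TransferLog simulate_stream(const std::vector<Beat>& beats,
                                   const std::vector<bool>& ready_pattern) {
    // The pattern must assert TREADY at least once, or nothing ever moves.
    assert(std::find(ready_pattern.begin(), ready_pattern.end(), true) !=
           ready_pattern.end());
    TransferLog log{{}, 0};
    std::size_t next = 0;
    bool done = beats.empty();
    while (!done) {
        const bool tvalid = next < beats.size();
        const bool tready = ready_pattern[log.cycles % ready_pattern.size()];
        if (tvalid && tready) {
            log.received.push_back(beats[next].tdata);
            done = beats[next].tlast || next + 1 == beats.size();
            ++next;
        }
        ++log.cycles;
    }
    return log;
}
```

With three beats and a consumer that is ready every other cycle, the model accepts all three values in five cycles; with a consumer that is always ready, it needs three.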

AXI4-Stream has additional optional features, such as sending positional data with the TKEEP and TSTRB ports, which makes it possible to multiplex both the data position and the data itself on the TDATA signal. Using the TID and TDEST signals, you can route streams, because these fields roughly correspond to the stream identifier and the stream destination identifier. Refer to Vivado Design Suite: AXI Reference Guide (UG1037) or the AMBA AXI4-Stream Protocol Specification (ARM IHI 0051A) for more information.

Registered AXI4-Stream Interfaces

As a default, AXI4-Stream interfaces are always implemented as registered interfaces to ensure that no combinational feedback paths are created when multiple HLS IP blocks with AXI4-Stream interfaces are integrated into a larger design. For AXI4-Stream interfaces, four types of register modes are provided to control how the interface registers are implemented:

Forward
Only the TDATA and TVALID signals are registered.
Reverse
Only the TREADY signal is registered.
Both
All signals (TDATA, TREADY, and TVALID) are registered. This is the default.
Off
None of the port signals are registered.

The AXI4-Stream side-channel signals are considered to be data signals and are registered whenever TDATA is registered.

Note: When connecting HLS generated IP blocks with AXI4-Stream interfaces at least one interface should be implemented as a registered interface or the blocks should be connected via an AXI4-Stream Register Slice.

There are two basic methods to use an AXI4-Stream in your design:

  • Use an AXI4-Stream without side-channels.
  • Use an AXI4-Stream with side-channels.

This second use model provides additional functionality, allowing the optional side-channels which are part of the AXI4-Stream standard, to be used directly in your C/C++ code.

AXI4-Stream Interfaces without Side-Channels

An AXI4-Stream is used without side-channels when the function argument, ap_axis or ap_axiu data type, does not contain any AXI4 side-channel elements (that is, when the WUser, WId, and WDest parameters are set to 0). In the following example, both interfaces are implemented using an AXI4-Stream:

#include "ap_axi_sdata.h"
#include "hls_stream.h"

typedef ap_axiu<32, 0, 0, 0> trans_pkt;

void example(hls::stream< trans_pkt > &A, hls::stream< trans_pkt > &B)
{
#pragma HLS INTERFACE axis port=A
#pragma HLS INTERFACE axis port=B
    trans_pkt tmp;
    A.read(tmp);
    tmp.data += 5;
    B.write(tmp);
}

After synthesis, both arguments are implemented with a data port (TDATA) and the standard AXI4-Stream protocol ports, TVALID, TREADY, TKEEP, TLAST, and TSTRB, as shown in the following figure.

Figure 13: AXI4-Stream Interfaces without Side-Channels
TIP: If you specify an hls::stream object with a data type other than ap_axis or ap_axiu, the tool will infer an AXI4-Stream interface without the TLAST signal, or any of the side-channel signals. This implementation of the AXI4-Stream interface consumes fewer device resources, but offers no visibility into when the stream is ending.

Multiple variables can be combined into the same AXI4-Stream interface by using a struct, which is aggregated by Vitis HLS by default. Aggregating the elements of a struct into a single wide vector allows all elements of the struct to be implemented in the same AXI4-Stream interface.

AXI4-Stream Interfaces with Side-Channels

The following example shows how the side-channels can be used directly in the C/C++ code and implemented on the interface. The code uses #include "ap_axi_sdata.h" to provide an API to handle the side-channels of the AXI4-Stream interface. In the following example, an unsigned 32-bit data type is used:

#include "ap_axi_sdata.h"
#include "ap_int.h"
#include "hls_stream.h"

#define DWIDTH 32

typedef ap_axiu<DWIDTH, 1, 1, 1> trans_pkt;

extern "C"{
    void krnl_stream_vmult(hls::stream<trans_pkt> &A,
                           hls::stream<trans_pkt> &B) {
#pragma HLS INTERFACE axis port=A
#pragma HLS INTERFACE axis port=B
#pragma HLS INTERFACE s_axilite port=return bundle=control
        bool eos = false;
        
        vmult: do {
#pragma HLS PIPELINE II=1
            trans_pkt t2 = A.read();
            
            // Packet for Output
            trans_pkt t_out;
            
            // Reading data from input packet
            ap_uint<DWIDTH> in2 = t2.data;
            ap_uint<DWIDTH> tmpOut = in2 * 5;

            // Setting data and configuration to output packet
            t_out.data = tmpOut;
            t_out.last = t2.last;
            t_out.keep = -1; //Enabling all bytes
            // Writing packet to output stream
            B.write(t_out);
            if (t2.last) {
               eos = true;
            }
        } while (eos == false);
    }
}

After synthesis, both the A and B arguments are implemented with data ports, the standard AXI4-Stream protocol ports TVALID and TREADY, and all of the optional ports described in the struct.

Figure 14: AXI4-Stream Interfaces with Side-Channels

Port-Level I/O Protocols

By default, input pointers and pass-by-value arguments are implemented as simple wire ports with no associated handshaking signal. For example, in the sum_io function discussed in Default Interfaces for Vivado IP Flow, the input ports are implemented without an I/O protocol, only a data port. If the port has no I/O protocol (by default or by design), the input data must be held stable until it is read.

By default output pointers are implemented with an associated output valid signal to indicate when the output data is valid. In the sum_io function example, the output port is implemented with an associated output valid port (sum_o_ap_vld) which indicates when the data on the port is valid and can be read. If there is no I/O protocol associated with the output port, it is difficult to know when to read the data.
TIP: It is always a good idea to use an I/O protocol on an output.

Function arguments which are both read from and written to are split into separate input and output ports. In the sum_io function example, the sum argument is implemented as both an input port sum_i, and an output port sum_o with associated I/O protocol port sum_o_ap_vld.

If the function has a return value, an output port ap_return is implemented to provide the return value. When the RTL design completes one transaction, which is equivalent to one execution of the C/C++ function, the block-level protocols indicate the function is complete with the ap_done signal. This also indicates the data on port ap_return is valid and can be read.

Note: The return value of the top-level function cannot be a pointer.

The following figure shows the timing behavior of the example code, assuming that the target technology and clock frequency allow a single addition per clock cycle.

Figure 15: RTL Port Timing with Default Synthesis
  • The design starts when ap_start is asserted High.
  • The ap_idle signal is asserted Low to indicate the design is operating.
  • The input data is read at any clock after the first cycle. Vitis HLS schedules when the reads occur. The ap_ready signal is asserted High when all inputs have been read.
  • When output sum is calculated, the associated output handshake (sum_o_ap_vld) indicates that the data is valid.
  • When the function completes, ap_done is asserted. This also indicates that the data on ap_return is valid.
  • Port ap_idle is asserted High to indicate that the design is waiting to start again.
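
The sequence of block-level signals in the list above can be mimicked with a small state model in plain C++. This is a simplified behavioral sketch, not the generated RTL: the names CtrlOutputs and BlockModel are hypothetical, the block is assumed to be non-pipelined with a fixed latency, and ap_ready is assumed to be asserted in the same cycle as ap_done.

```cpp
#include <cassert>

// Block-level outputs observed in one clock cycle.
struct CtrlOutputs {
    bool ap_idle;
    bool ap_ready;
    bool ap_done;
};

// Hypothetical model of a non-pipelined block with a fixed latency.
class BlockModel {
public:
    explicit BlockModel(unsigned latency_cycles) : latency_(latency_cycles) {
        assert(latency_cycles > 0);
    }

    // Advances one clock cycle with the given ap_start level and
    // returns the outputs for that cycle.
    CtrlOutputs clock(bool ap_start) {
        if (remaining_ == 0) {
            if (!ap_start) return {true, false, false};  // waiting to start
            remaining_ = latency_;                       // begin a transaction
        }
        --remaining_;
        const bool finished = (remaining_ == 0);
        // Simplification: ap_ready and ap_done rise together.
        return {false, finished, finished};
    }

private:
    unsigned latency_;
    unsigned remaining_ = 0;
};
```

For a latency of three cycles, the model reports ap_idle High until ap_start is sampled High, keeps ap_idle Low for three cycles, asserts ap_done and ap_ready in the third, and returns to idle if ap_start is then Low.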

Port-Level I/O: No Protocol

The ap_none mode specifies that no I/O protocol be added to the port. When this is specified, the argument is implemented as a data port with no other associated signals. The ap_none mode is the default for scalar inputs.

ap_none

The ap_none port-level I/O protocol is the simplest interface type and has no other signals associated with it. Neither the input nor output data signals have associated control ports that indicate when data is read or written. The only ports in the RTL design are those specified in the source code.

An ap_none interface does not require additional hardware overhead. However, the ap_none interface does require the following:

  • Producer blocks to do one of the following:
    • Provide data to the input port at the correct time
    • Hold data for the length of a transaction until the design completes
  • Consumer blocks to read output ports at the correct time
Note: The ap_none interface cannot be used with array arguments.

Port-Level I/O: Wire Handshakes

Interface mode ap_hs includes a two-way handshake signal with the data port. The handshake is an industry standard valid and acknowledge handshake. Mode ap_vld is the same but only has a valid port, and ap_ack only has an acknowledge port.

Mode ap_ovld is for use with in-out arguments. When the in-out is split into separate input and output ports, mode ap_none is applied to the input port and ap_vld applied to the output port. This is the default for pointer arguments that are both read and written.

The ap_hs mode can be applied to arrays that are read or written in sequential order. If Vitis HLS can determine the read or write accesses are not sequential, it will halt synthesis with an error. If the access order cannot be determined, Vitis HLS will issue a warning.

ap_hs (ap_ack, ap_vld, and ap_ovld)

The ap_hs port-level I/O protocol provides the greatest flexibility in the development process, allowing both bottom-up and top-down design flows. Two-way handshakes safely perform all intra-block communication, and manual intervention or assumptions are not required for correct operation. The ap_hs port-level I/O protocol provides the following signals:

  • Data port
  • Valid signal to indicate when the data signal is valid and can be read
  • Acknowledge signal to indicate when the data has been read

The following figure shows how an ap_hs interface behaves for both an input and output port. In this example, the input port is named in, and the output port is named out.

Note: The control signals names are based on the original port name. For example, the valid port for data input in is named in_vld.
Figure 16: Behavior of ap_hs Interface

For inputs, the following occurs:

  • After start is applied, the block begins normal operation.
  • If the design is ready for input data but the input valid is Low, the design stalls and waits for the input valid to be asserted to indicate a new input value is present.
    Note: The preceding figure shows this behavior. In this example, the design is ready to read data input in on clock cycle 4 and stalls waiting for the input valid before reading the data.
  • When the input valid is asserted High, an output acknowledge is asserted High to indicate the data was read.

For outputs, the following occurs:

  • After start is applied, the block begins normal operation.
  • When an output port is written to, its associated output valid signal is simultaneously asserted to indicate valid data is present on the port.
  • If the associated input acknowledge is Low, the design stalls and waits for the input acknowledge to be asserted.
  • When the input acknowledge is asserted, indicating the data has been read, the output valid is deasserted on the next clock edge.
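
The stall behavior on an ap_hs input can be illustrated with a short plain C++ model. This is a sketch under simplifying assumptions, not the generated RTL: HsReadResult and hs_input_read are hypothetical names, the valid waveform is supplied as one value per clock cycle, and the acknowledge is modeled as asserted in the first cycle where the design is ready and the valid is High.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Outcome of one read on an ap_hs input port.
struct HsReadResult {
    std::size_t ack_cycle;     // cycle in which the acknowledge is asserted
    std::size_t stall_cycles;  // cycles spent waiting with valid Low
};

// The design becomes ready to read at ready_cycle and stalls until
// in_vld is High. Hypothetical helper for illustration.
inline HsReadResult hs_input_read(const std::vector<bool>& in_vld,
                                  std::size_t ready_cycle) {
    assert(ready_cycle < in_vld.size());
    std::size_t cycle = ready_cycle;
    while (cycle < in_vld.size() && !in_vld[cycle]) ++cycle;
    // The supplied waveform must eventually assert the valid signal.
    assert(cycle < in_vld.size());
    return {cycle, cycle - ready_cycle};
}
```

If the design is ready on cycle 4 and the valid first goes High on cycle 6, the model reports the acknowledge on cycle 6 after two stall cycles.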
ap_ack

The ap_ack port-level I/O protocol is a subset of the ap_hs interface type. The ap_ack port-level I/O protocol provides the following signals:

  • Data port
  • Acknowledge signal to indicate when data is consumed
    • For input arguments, the design generates an output acknowledge that is active-High in the cycle the input is read.
    • For output arguments, Vitis HLS implements an input acknowledge port to confirm the output was read.
    Note: After a write operation, the design stalls and waits until the input acknowledge is asserted High, which indicates the output was read by a consumer block. However, there is no associated output port to indicate when the data can be consumed.
CAUTION: You cannot use C/RTL co-simulation to verify designs that use ap_ack on an output port.
ap_vld

The ap_vld is a subset of the ap_hs interface type. The ap_vld port-level I/O protocol provides the following signals:

  • Data port
  • Valid signal to indicate when the data signal is valid and can be read
    • For input arguments, the design reads the data port as soon as the valid is active. Even if the design is not ready to read new data, the design samples the data port and holds the data internally until needed.
    • For output arguments, Vitis HLS implements an output valid port to indicate when the data on the output port is valid.
ap_ovld

The ap_ovld is a subset of the ap_hs interface type. The ap_ovld port-level I/O protocol provides the following signals:

  • Data port
  • Valid signal to indicate when the data signal is valid and can be read
    • For input arguments and the input half of inout arguments, the design defaults to type ap_none.
    • For output arguments and the output half of inout arguments, the design implements type ap_vld.

Port-Level I/O: Memory Interface Protocol

Array arguments are implemented by default as an ap_memory interface. This is a standard block RAM interface with data, address, chip-enable, and write-enable ports.

An ap_memory interface can be implemented as a single-port or dual-port interface. If Vitis HLS can determine that using a dual-port interface will reduce the initiation interval, it will automatically implement a dual-port interface. The BIND_STORAGE pragma or directive is used to specify the memory resource, and if this directive is specified on the array with a single-port block RAM, a single-port interface will be implemented. Conversely, if a dual-port interface is specified using the BIND_STORAGE pragma and Vitis HLS determines this interface provides no benefit, it will automatically implement a single-port interface.

If the array is accessed in a sequential manner, an ap_fifo interface can be used. As with the ap_hs interface, Vitis HLS halts if it determines the data access is not sequential, reports a warning if it cannot determine whether the access is sequential, and issues no message if it determines the access is sequential. The ap_fifo interface can only be used for reading or writing, not both.

ap_memory, bram

The ap_memory and bram interface port-level I/O protocols are used to implement array arguments. This type of port-level I/O protocol can communicate with memory elements (for example, RAMs and ROMs) when the implementation requires random accesses to the memory address locations.

Note: If you only need sequential access to the memory element, use the ap_fifo interface instead. The ap_fifo interface reduces the hardware overhead, because address generation is not performed.

The ap_memory and bram interface port-level I/O protocols are identical. The only difference is the way Vivado IP integrator shows the blocks:

  • The ap_memory interface appears as discrete ports.
  • The bram interface appears as a single, grouped port. In IP integrator, you can use a single connection to create connections to all ports.

When using an ap_memory interface, specify the array targets using the BIND_STORAGE pragma. If no target is specified for the arrays, Vitis HLS determines whether to use a single or dual-port RAM interface.

TIP: Before running synthesis, ensure array arguments are targeted to the correct memory type using the BIND_STORAGE pragma. Re-synthesizing with corrected memories can result in a different schedule and RTL.

The following figure shows an array named d specified as a single-port block RAM. The port names are based on the C/C++ function argument. For example, if the C/C++ argument is d, the chip-enable is d_ce, and the input data is d_q0 based on the output/q port of the BRAM.

Figure 17: Behavior of ap_memory Interface

After reset, the following occurs:

  • After start is applied, the block begins normal operation.
  • Reads are performed by applying an address on the output address ports while asserting the output signal d_ce.
    Note: For a default block RAM, the design expects the input data d_q0 to be available in the next clock cycle. You can use the BIND_STORAGE pragma to indicate the RAM has a longer read latency.
  • Write operations are performed by asserting output ports d_ce and d_we while simultaneously applying the address and output data d_d0.
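
The read and write behavior described above can be modeled from the memory side in plain C++. This is a behavioral sketch, not the interface generated by the tool: the class SinglePortRam is hypothetical, the word width is fixed at 32 bits for simplicity, the read latency is one cycle as for a default block RAM, and the output is assumed to hold its previous value during writes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical single-port RAM model with a one-cycle read latency.
class SinglePortRam {
public:
    explicit SinglePortRam(std::size_t depth) : mem_(depth, 0) {}

    // One rising clock edge. With ce High and we Low, the addressed word
    // is available on q0() after this edge, that is, in the next cycle.
    // With ce and we both High, d0 is stored at the address.
    void clock(bool ce, bool we, std::size_t address, std::uint32_t d0) {
        if (!ce) return;  // chip-enable Low: no access
        assert(address < mem_.size());
        if (we) {
            mem_[address] = d0;  // assumption: q0 keeps its old value
        } else {
            q0_ = mem_[address];
        }
    }

    std::uint32_t q0() const { return q0_; }

private:
    std::vector<std::uint32_t> mem_;
    std::uint32_t q0_ = 0;
};
```

A write of 42 to address 3 followed by a read of address 3 makes 42 visible on q0() after the read's clock edge.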
ap_fifo

The ap_fifo interface is the most hardware-efficient approach when the design requires access to a memory element and the access is always performed in a sequential manner, that is, no random access is required. The ap_fifo port-level I/O protocol supports the following:

  • Allows the port to be connected to a FIFO
  • Enables complete, two-way empty-full communication
  • Works for arrays, pointers, and pass-by-reference argument types
Note: Functions that can use an ap_fifo interface often use pointers and might access the same variable multiple times. To understand the importance of the volatile qualifier when using this coding style, see Multi-Access Pointers on the Interface.

In the following example, in1 is a pointer that accesses the current address, then two addresses above the current address, and finally one address below.

void foo(int* in1, ...) {
 int data1, data2, data3;  
       ...
 data1= *in1; 
 data2= *(in1+2);
 data3= *(in1-1);
 ...
}

If in1 is specified as an ap_fifo interface, Vitis HLS checks the accesses, determines the accesses are not in sequential order, issues an error, and halts. To read from non-sequential address locations, use an ap_memory or bram interface.
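
The kind of check described above can be illustrated in plain C++. This is a sketch, not the analysis performed by Vitis HLS: the function is_sequential is hypothetical, and it assumes that "sequential" means each accessed offset is exactly one greater than the previous one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true when every access offset is one above the previous offset.
// Hypothetical helper; offsets are relative to the pointer argument.
inline bool is_sequential(const std::vector<long>& offsets) {
    for (std::size_t i = 1; i < offsets.size(); ++i) {
        if (offsets[i] != offsets[i - 1] + 1) return false;
    }
    return true;
}

// The accesses in foo use offsets 0, +2, and -1, which are not sequential.
inline void check_foo_access_order() {
    assert(!is_sequential({0, 2, -1}));
}
```

An access order such as 0, 1, 2, 3 passes the check, which corresponds to the pattern an ap_fifo interface can serve.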

You cannot specify an ap_fifo interface on an argument that is both read from and written to. You can only specify an ap_fifo interface on an input or an output argument. A design with input argument in and output argument out specified as ap_fifo interfaces behaves as shown in the following figure.

Figure 18: Behavior of ap_fifo Interface

For inputs, the following occurs:

  • After ap_start is applied, the block begins normal operation.
  • If the input port is ready to be read but the FIFO is empty as indicated by input port in_empty_n Low, the design stalls and waits for data to become available.
  • When the FIFO contains data as indicated by input port in_empty_n High, an output acknowledge in_read is asserted High to indicate the data was read in this cycle.

For outputs, the following occurs:

  • After start is applied, the block begins normal operation.
  • If an output port is ready to be written to but the FIFO is full as indicated by out_full_n Low, the data is placed on the output port but the design stalls and waits for the space to become available in the FIFO.
  • When space becomes available in the FIFO as indicated by out_full_n High, the output acknowledge signal out_write is asserted to indicate the output data is valid.
  • If the top-level function or the top-level loop is pipelined using the -rewind option, Vitis HLS creates an additional output port with the suffix _lwr. When the last write to the FIFO interface completes, the _lwr port goes active-High.
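
The empty and full flags that drive these stalls can be modeled with a bounded queue in plain C++. This is a behavioral sketch only: the class FifoModel is hypothetical, the method names mirror the active-Low flag naming used above (empty_n and full_n), and a returned false stands for a cycle in which the design would stall.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical bounded FIFO exposing the flags used by ap_fifo ports.
class FifoModel {
public:
    explicit FifoModel(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    bool empty_n() const { return !data_.empty(); }           // High: data available
    bool full_n() const { return data_.size() < capacity_; }  // High: space available

    // Reader side: pops a value when data is available; returns false
    // when the FIFO is empty and the reader must stall.
    bool try_read(std::uint32_t& value) {
        if (!empty_n()) return false;
        value = data_.front();
        data_.pop_front();
        return true;
    }

    // Writer side: stores a value when space is available; returns false
    // when the FIFO is full and the writer must stall.
    bool try_write(std::uint32_t value) {
        if (!full_n()) return false;
        data_.push_back(value);
        return true;
    }

private:
    std::size_t capacity_;
    std::deque<std::uint32_t> data_;
};
```

With a capacity of two, a third write fails until a read frees a slot, and values are read back in the order they were written.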

Block-Level I/O Protocols

You can specify block-level I/O protocols on the function or the function return. If the C/C++ code does not return a value, you can still specify the block-level I/O protocol on the function return. If the C/C++ code uses a function return, Vitis HLS creates an output port ap_return for the return value.
TIP: When the function return is specified as an AXI4-Lite interface (s_axilite) all the ports in the block-level interface are grouped into the s_axilite interface. This is a common practice when another device, such as a CPU, is used to configure and control when the block starts and stops operation, and is a requirement of XRT.

The ap_ctrl_hs block-level I/O protocol is the default for the Vivado IP flow. Default Interfaces for Vivado IP Flow shows the resulting RTL ports and behavior when Vitis HLS implements ap_ctrl_hs on a function.

The ap_ctrl_chain control protocol is similar to ap_ctrl_hs but provides an additional input signal ap_continue to apply back pressure. Xilinx recommends using the ap_ctrl_chain block-level I/O protocol when chaining Vitis HLS blocks together. It is also the default for the Vitis Kernel flow. Refer to Supported Kernel Execution Models for more information on how XRT uses these control protocols.
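
As a minimal sketch, a block-level protocol can be requested by applying the INTERFACE pragma to the function return. The function scale_and_offset and its arguments are hypothetical. The positional pragma form is shown; newer releases also accept mode=ap_ctrl_chain.

```cpp
#include <cassert>

// Hypothetical top-level function. The return value is expected to become the
// ap_return output port, and the block-level protocol is attached to `return`.
int scale_and_offset(int x, int gain, int offset) {
#pragma HLS INTERFACE ap_ctrl_chain port=return
    return x * gain + offset;
}

// Software-only check: the pragma does not change the C/C++ result.
void check_scale_and_offset() {
    assert(scale_and_offset(3, 4, 5) == 17);
}
```

Replacing ap_ctrl_chain with ap_ctrl_hs in the pragma selects the other handshake protocol without any change to the function body.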

ap_ctrl_hs

The following figure shows the behavior of the block-level handshake signals created by the ap_ctrl_hs I/O protocol for a non-pipelined design.

Figure 19: Behavior of ap_ctrl_hs Interface

After reset, the following occurs:

  1. The block waits for ap_start to go High before it begins operation.
  2. Output ap_idle goes Low immediately to indicate the design is no longer idle.
  3. The ap_start signal must remain High until ap_ready goes High. Once ap_ready goes High:
    • If ap_start remains High the design will start the next transaction.
    • If ap_start is taken Low, the design will complete the current transaction and halt operation.
  4. Data can be read on the input ports.
  5. Data can be written to the output ports.
    Note: The input and output ports can also specify a port-level I/O protocol that is independent of this block-level I/O protocol. For details, see Port-Level I/O Protocols.
  6. Output ap_done goes High when the block completes operation.
    Note: If there is an ap_return port, the data on this port is valid when ap_done is High. Therefore, the ap_done signal also indicates when the data on output ap_return is valid.
  7. When the design is ready to accept new inputs, the ap_ready signal goes High. Following is additional information about the ap_ready signal:
    • The ap_ready signal is inactive until the design starts operation.
    • In non-pipelined designs, the ap_ready signal is asserted at the same time as ap_done.
    • In pipelined designs, the ap_ready signal might go High at any cycle after ap_start is sampled High. This depends on how the design is pipelined.
    • If the ap_start signal is Low when ap_ready is High, the design executes until ap_done is High and then stops operation.
    • If the ap_start signal is High when ap_ready is High, the next transaction starts immediately, and the design continues to operate.
  8. The ap_idle signal indicates when the design is idle and not operating. Following is additional information about the ap_idle signal:
    • If the ap_start signal is Low when ap_ready is High, the design stops operation, and the ap_idle signal goes High one cycle after ap_done.
    • If the ap_start signal is High when ap_ready is High, the design continues to operate, and the ap_idle signal remains Low.
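
The sequence above can be sketched as a small cycle-level software model. This is only an illustration, assuming a non-pipelined block with a fixed latency. The names CtrlOut, ApCtrlHsModel, and clock are hypothetical and are not part of Vitis HLS. The model follows the rules as stated in the text, not the generated RTL.

```cpp
#include <cassert>

// Levels of the block-level outputs during one clock cycle.
struct CtrlOut {
    bool ap_idle;
    bool ap_ready;
    bool ap_done;
};

// Simplified, hypothetical model of a non-pipelined block that needs
// `latency` cycles per transaction.
class ApCtrlHsModel {
public:
    explicit ApCtrlHsModel(int latency) : latency_(latency) {
        assert(latency > 0);
    }

    // Advance one clock cycle with the given ap_start level.
    CtrlOut clock(bool ap_start) {
        if (!busy_) {
            if (!ap_start) {
                return {true, false, false};  // idle, waiting for ap_start
            }
            busy_ = true;  // ap_start seen High: ap_idle drops in this cycle
            elapsed_ = 0;
        }
        ++elapsed_;
        if (elapsed_ < latency_) {
            return {false, false, false};  // transaction in progress
        }
        // Final cycle. Non-pipelined, so ap_ready rises together with ap_done.
        // ap_start High here starts the next transaction; Low returns to idle.
        busy_ = ap_start;
        elapsed_ = 0;
        return {false, true, true};
    }

private:
    int latency_;
    int elapsed_ = 0;
    bool busy_ = false;
};
```

With a latency of 3, holding ap_start High for two cycles and taking it Low in the third yields ap_done and ap_ready High in that third cycle, and ap_idle High in the cycle after it.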

ap_ctrl_chain

The ap_ctrl_chain block-level I/O protocol is similar to the ap_ctrl_hs protocol but provides an additional input port named ap_continue. An active-High ap_continue signal indicates that the downstream block that consumes the output data is ready for new data inputs. If the downstream block is not able to consume new data inputs, the ap_continue signal is Low, which prevents upstream blocks from generating additional data.

The ap_ready port of the downstream block can directly drive the ap_continue port. Following is additional information about the ap_continue port:

  • If the ap_continue signal is High when ap_done is High, the design continues operating. The behavior of the other block-level I/O signals is identical to those described in the ap_ctrl_hs block-level I/O protocol.
  • If the ap_continue signal is Low when ap_done is High, the design stops operating, the ap_done signal remains High, and data remains valid on the ap_return port if the ap_return port is present.

In the following figure, the first transaction completes, and the second transaction starts immediately because ap_continue is High when ap_done is High. However, the design halts at the end of the second transaction until ap_continue is asserted High.

Figure 20: Behavior of ap_ctrl_chain Interface
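
The back-pressure behavior can be sketched as a cycle-level software model. This is only an illustration, assuming a non-pipelined block with a fixed latency. The class ApCtrlChainModel and its clock method are hypothetical names, and only the ap_done output is modeled.

```cpp
#include <cassert>

// Simplified, hypothetical model of ap_done under the ap_ctrl_chain protocol.
class ApCtrlChainModel {
public:
    explicit ApCtrlChainModel(int latency) : latency_(latency) {
        assert(latency > 0);
    }

    // Advance one clock cycle; returns the ap_done level for that cycle.
    bool clock(bool ap_start, bool ap_continue) {
        if (!done_pending_) {
            if (!busy_) {
                if (!ap_start) {
                    return false;  // idle, waiting for ap_start
                }
                busy_ = true;
                elapsed_ = 0;
            }
            if (++elapsed_ < latency_) {
                return false;  // transaction in progress
            }
            done_pending_ = true;  // result is available from this cycle on
        }
        // ap_done is High and is released only when ap_continue is High.
        if (ap_continue) {
            done_pending_ = false;
            busy_ = ap_start;  // next transaction starts if ap_start is High
            elapsed_ = 0;
        }
        return true;
    }

private:
    int latency_;
    int elapsed_ = 0;
    bool busy_ = false;
    bool done_pending_ = false;
};
```

With a latency of 2 and ap_start held High, the model completes the first transaction normally. If ap_continue is Low when the second transaction finishes, it keeps ap_done High and starts no new transaction until ap_continue goes High.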

ap_ctrl_none

If you specify the ap_ctrl_none block-level I/O protocol, the handshake signal ports (ap_start, ap_idle, ap_ready, and ap_done) are not created. You can use this protocol to create a block without control signals, as used in free-running kernels.

IMPORTANT: If you use the ap_ctrl_none block-level I/O protocol on your design, you must meet at least one of the conditions for C/RTL co-simulation as described in Interface Synthesis Requirements to verify the RTL design. If none of these conditions is met, C/RTL co-simulation halts with the following message:
@E [SIM-345] Cosim only supports the following 'ap_ctrl_none' designs: (1) 
combinational designs; (2) pipelined design with task interval of 1; (3) designs with 
array streaming or hls_stream ports.
@E [SIM-4] *** C/RTL co-simulation finished: FAIL ***
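
As a minimal sketch, the following hypothetical function requests ap_ctrl_none on the function return. The name max_of is an assumption for illustration. A compare-and-select like this is expected to synthesize to combinational logic, which is the first design style listed in the message, but confirm the result in the synthesis report.

```cpp
#include <cassert>

// Hypothetical top-level function with no block-level handshake ports.
int max_of(int a, int b) {
#pragma HLS INTERFACE ap_ctrl_none port=return
    return (a > b) ? a : b;
}

// Software-only check, usable as a starting point for a C test bench.
void check_max_of() {
    assert(max_of(2, 7) == 7);
    assert(max_of(-1, -5) == -1);
}
```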

Managing Interfaces with SSI Technology Devices

Certain Xilinx devices use stacked silicon interconnect (SSI) technology. In these devices, the total available resources are divided over multiple super logic regions (SLRs). The connections between SLRs use super long line (SLL) routes. SLL routes incur delay costs that are typically greater than standard FPGA routing. To ensure designs operate at maximum performance, use the following guidelines:

  • Register all signals that cross between SLRs at both the SLR output and SLR input.
  • You do not need to register a signal if it enters or exits an SLR via an I/O buffer.
  • Ensure that the logic created by Vitis HLS fits within a single SLR.
Note: When you select an SSI technology device as the target technology, the utilization report includes details on both the SLR usage and the total device usage.

If the logic is contained within a single SLR, Vitis HLS provides a -register_all_io option to the config_rtl command. When the option is enabled, all inputs and outputs are registered. When it is disabled, none of the inputs or outputs are registered.