Defining Interfaces
Introduction to Interface Synthesis
The arguments of the top-level function in a Vitis HLS design are synthesized into interfaces and ports that group multiple signals to define the communication protocol between the HLS design and components external to the design. Vitis HLS defines interfaces automatically, using industry standards to specify the protocol used. The type of interface that Vitis HLS creates depends on the data type and direction of the parameters of the top-level function, the target flow for the active solution, the default interface configuration settings as specified by config_interface, and any specified INTERFACE pragmas or directives.
Vitis HLS supports two target flows:
- The Vivado IP flow, which is the default flow for the tool
- The Vitis Kernel flow, which is the bottom-up design flow for the Vitis Application Acceleration Development flow
The target flow is selected when the solution is opened:
open_solution -flow_target [vitis | vivado]
The interface defines the following elements of the kernel or IP:
- The interface defines channels for data to flow into or out of the HLS design. Data can flow from a variety of sources external to the kernel or IP, such as a host application, an external camera or sensor, or another kernel or IP implemented on the Xilinx device. The default channels for Vitis kernels are AXI adapters as described in Interfaces for Vitis Kernel Flow.
- The interface defines the port protocol that is used to control the flow of data through the data channel, defining when the data is valid and can be read or written, as defined in Port-Level I/O Protocols.
  TIP: These port protocols can be customized in the Vivado IP flow, but in most cases are fixed and cannot be changed in the Vitis kernel flow.
- The interface also defines the execution control scheme for the HLS design, specifying the operation of the kernel or IP as pipelined or sequential, as defined in Block-Level Control Protocols.
As described in Designing Efficient Kernels, the choice and configuration of interfaces is key to the success of your design. However, Vitis HLS tries to simplify the process by selecting default interfaces for the target flows. For more information on the defaults used, refer to Interfaces for Vivado IP Flow or Interfaces for Vitis Kernel Flow as appropriate to your design.
After synthesis completes, you can review the mapping of the software arguments of your C/C++ code to hardware ports or interfaces in the SW I/O Information section of the Synthesis Summary report.
Interfaces for Vitis Kernel Flow
The Vitis kernel flow provides support for compiled kernel objects (.xo) for software control from a host application and by the Xilinx Run Time (XRT). As described in Kernel Properties in the Vitis Unified Software Platform Documentation (UG1416), this flow has very specific interface requirements that Vitis HLS must meet.
The flow supports three interface paradigms:
- Memory Paradigm (m_axi): The data is accessed by the kernel through memory such as DDR, HBM, PLRAM/BRAM/URAM.
- Stream Paradigm (axis): The data is streamed into the kernel from another streaming source, such as a video processor or another kernel, and can also be streamed out of the kernel.
- Register Paradigm (s_axilite): The data is accessed by the kernel through register interfaces, using software register reads and writes.
The Vitis kernel flow implements the following interfaces by default:
C-argument type | Paradigm | Interface protocol (I/O/Inout) |
---|---|---|
Scalar (pass by value) | Register | AXI4-Lite (s_axilite) |
Array | Memory | AXI4 Memory Mapped (m_axi) |
Pointer to array | Memory | m_axi |
Pointer to scalar | Register | s_axilite |
Reference | Register | s_axilite |
hls::stream | Stream | AXI4-Stream (axis) |
As the table shows, a pointer to an array is implemented as an m_axi interface for data transfer. The pointer to a scalar is implemented using the s_axilite interface. A scalar value passed as a constant does not need read access, while a pointer to a scalar value needs both read/write access. The s_axilite interface implements an additional internal protocol depending upon the C argument type. This internal implementation can be controlled using Port-Level I/O Protocols. However, you should not modify the default port protocols in the Vitis kernel flow unless necessary.
The default execution mode for the Vitis kernel flow is pipelined execution, which enables overlapping execution of a kernel to improve throughput. This is specified by the ap_ctrl_chain block control protocol on the s_axilite interface.
The vadd function in the following code provides an example of interface synthesis.
#define VDATA_SIZE 16
typedef struct v_datatype { unsigned int data[VDATA_SIZE]; } v_dt;
extern "C" {
void vadd(const v_dt* in1, // Read-Only Vector 1
const v_dt* in2, // Read-Only Vector 2
v_dt* out_r, // Output Result for Addition
const unsigned int size // Size in integer
) {
unsigned int vSize = ((size - 1) / VDATA_SIZE) + 1;
// Auto-pipeline is going to apply pipeline to this loop
vadd1:
for (int i = 0; i < vSize; i++) {
vadd2:
for (int k = 0; k < VDATA_SIZE; k++) {
out_r[i].data[k] = in1[i].data[k] + in2[i].data[k];
}
}
}
}
The vadd function includes:
- Two pointer inputs: in1 and in2
- A pointer: out_r that the results are written to
- A scalar value: size
With the default interface synthesis settings used by Vitis HLS for the Vitis kernel flow, the design is synthesized into an RTL block with the ports and interfaces shown in the following figure.
The tool creates three types of interface ports on the RTL design to handle the flow of both data and control.
- Clock, Reset, and Interrupt ports: ap_clk, ap_rst_n, and interrupt are added to the kernel.
- AXI4-Lite interface: The s_axi_control interface, which contains the scalar arguments like size, manages address offsets for the m_axi interface, and defines the block control protocol.
- AXI4 memory mapped interface: The m_axi_gmem interface, which contains the pointer arguments in1, in2, and out_r.
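The default mapping described above can also be written out with explicit INTERFACE pragmas. The following is a minimal sketch, not the tool's generated output: the function name `vadd_explicit` is hypothetical, it operates on plain `unsigned int` pointers rather than the `v_dt` struct, and the bundle names `gmem` and `control` are assumptions inferred from the `m_axi_gmem` and `s_axi_control` port names. A standard C++ compiler ignores the HLS pragmas, so the function can also be exercised in software.

```cpp
// Hypothetical variant of vadd with the interface assignments spelled out.
// Bundle names (gmem, control) are assumptions based on the port names above.
extern "C" {
void vadd_explicit(const unsigned int* in1,   // Read-only vector 1
                   const unsigned int* in2,   // Read-only vector 2
                   unsigned int* out_r,       // Output result
                   const unsigned int size) { // Number of elements
// Pointer arguments share one AXI4 memory-mapped adapter
#pragma HLS INTERFACE mode=m_axi port=in1 bundle=gmem
#pragma HLS INTERFACE mode=m_axi port=in2 bundle=gmem
#pragma HLS INTERFACE mode=m_axi port=out_r bundle=gmem
// Scalar argument and block-level control go through AXI4-Lite
#pragma HLS INTERFACE mode=s_axilite port=size bundle=control
#pragma HLS INTERFACE mode=s_axilite port=return bundle=control
    for (unsigned int i = 0; i < size; i++) {
        out_r[i] = in1[i] + in2[i];
    }
}
}
```

In the Vitis kernel flow these assignments are normally left to the tool; writing them out is mainly useful for documenting intent.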
Details of M_AXI Interfaces for Vitis
AXI4 memory-mapped (m_axi) interfaces allow kernels to read and write data in global memory (DDR, HBM, PLRAM). Memory-mapped interfaces are a convenient way of sharing data across different elements of the accelerated application, such as between the host and kernel, or between kernels on the accelerator card. The main advantages of m_axi interfaces are listed below:
- The interface has independent read and write channels
- It supports burst-based accesses with potential performance of ~19 GB/s
- It provides a queue for outstanding transactions
- Understanding Burst Access: AXI4 memory-mapped interfaces support high throughput bursts of up to 4K bytes with just a single address phase. With burst mode transfers, Vitis HLS reads or writes data using a single base address followed by multiple sequential data samples, which makes this mode capable of higher data throughput. Burst mode of operation is possible when you use the C memcpy function or a pipelined for loop. Refer to Controlling AXI4 Burst Behavior or Optimizing Burst Transfers for more information.
- Automatic Port Widening and Port Width Alignment: As discussed in Automatic Port Width Resizing, Vitis HLS can automatically widen a port to facilitate data transfers and improve burst access, provided the tool can infer a burst access. Therefore all the preconditions needed for bursting, as described in Optimizing Burst Transfers, are also needed for port resizing.
  In the Vitis Kernel flow automatic port width resizing is enabled by default with the following configuration commands (notice that one command is specified in bits and the other in bytes):
  config_interface -m_axi_max_widen_bitwidth 512
  config_interface -m_axi_alignment_byte_size 64
- Rules for Offset: IMPORTANT: In the Vitis kernel flow the default mode of operation is offset=slave and default_slave_interface=s_axilite and should not be changed.
  The correct specification of the offset lets the HLS kernel integrate correctly into the Vitis system. Refer to Offset and Modes of Operation for more information.
- Bundle Interfaces - Performance vs. Resource Utilization: By default, Vitis HLS groups function arguments with compatible options into a single m_axi interface adapter as described in M_AXI Bundles. Bundling ports into a single interface helps save device resources by eliminating AXI4 logic, which can be necessary when working in congested designs.
  However, a single interface bundle can limit the performance of the kernel because all the memory transfers have to go through a single interface. The m_axi interface has independent READ and WRITE channels, so a single interface can read and write simultaneously, though only at one location. Using multiple bundles lets you increase the bandwidth and throughput of the kernel by creating multiple interfaces to connect to memory banks.
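The bundle trade-off described above can be sketched as follows. This is an illustrative example under stated assumptions: the function name `vadd_multi_bundle` and the bundle names `gmem0`, `gmem1`, and `gmem2` are hypothetical. Giving each pointer its own bundle requests a separate m_axi adapter per argument, so the two reads no longer share one read channel.

```cpp
// Hypothetical example: one m_axi bundle per pointer argument.
// Bundle names gmem0..gmem2 are illustrative assumptions.
extern "C" {
void vadd_multi_bundle(const unsigned int* in1, const unsigned int* in2,
                       unsigned int* out_r, const unsigned int size) {
#pragma HLS INTERFACE mode=m_axi port=in1 bundle=gmem0
#pragma HLS INTERFACE mode=m_axi port=in2 bundle=gmem1
#pragma HLS INTERFACE mode=m_axi port=out_r bundle=gmem2
    for (unsigned int i = 0; i < size; i++) {
#pragma HLS PIPELINE
        // Both reads can be issued in parallel because they use different adapters
        out_r[i] = in1[i] + in2[i];
    }
}
}
```

The cost is additional AXI4 adapter logic for each bundle, which is the resource side of the trade-off.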
Details of S_AXILITE Interfaces for Vitis
In C++, a function starts to process data when the function is called from a parent function. The function call is pushed onto the stack when called, and removed from the stack when processing is complete to return control to the calling function. This process ensures the parent knows the status of the child.
Since the host and kernel occupy two separate compute spaces in the Vitis kernel flow, the "stack" is managed by the Xilinx Run Time (XRT), and communication is managed through the s_axilite interface. The kernel is software controlled through XRT by reading and writing the control registers of an s_axilite interface as described in S_AXILITE Control Register Map. The interface provides the following features:
- Control Protocols: The block control protocol defines control registers in the s_axilite interface that let you set control signals to manage execution and operation of the kernel.
- Scalar Arguments: Scalar inputs on a kernel are typical, and can be thought of as programming constants or parameters. The host application transfers these values through the s_axilite interface.
- Pointers to Scalar Arguments: Vitis HLS lets you read from or write to a pointer to a scalar value when assigned to an s_axilite interface. Pointers are assigned by default to m_axi interfaces, so this requires you to manually assign the pointer to the s_axilite interface using the INTERFACE pragma or directive:
  int top(int *a, int *b) {
  #pragma HLS interface s_axilite port=a
- Rules for Offset: Note: The Vitis kernel flow determines the required offsets. Do not specify the offset option in that flow.
- Rules for Bundle: The Vitis kernel flow supports only a single s_axilite interface, which means that all s_axilite interfaces must be bundled together.
  - When no bundle is specified the tool automatically creates a default bundle named Control.
  - If for some reason you want to manually specify the bundle name, you must apply the same bundle to all s_axilite interfaces to create a single bundle.
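As a sketch of the pointer-to-scalar case described above, the following hypothetical function `scale_by_two` assigns both pointer arguments to the s_axilite interface. The function name and body are assumptions for illustration; no bundle or offset option is given, in line with the rules above.

```cpp
// Hypothetical example: pointers to scalars mapped to s_axilite instead of
// the default m_axi. No bundle or offset is specified, so the tool defaults apply.
extern "C" {
void scale_by_two(int* a, int* b) {
#pragma HLS INTERFACE mode=s_axilite port=a
#pragma HLS INTERFACE mode=s_axilite port=b
#pragma HLS INTERFACE mode=s_axilite port=return
    int in = *a;  // read the scalar through the register interface
    *b = in * 2;  // write the result back through the register interface
}
}
```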
Details of AXIS Interfaces for Vitis
The AXI4-Stream protocol (AXIS) defines a single uni-directional channel for streaming data in a sequential manner. The AXI4-Stream interfaces can burst an unlimited amount of data, which significantly improves performance. Unlike the AXI4 memory-mapped interface which needs an address to read/write the memory, the AXIS interface simply passes data to another AXIS interface without needing an address, and so uses fewer device resources. Combined, these features make the streaming interface a light-weight high performance interface.
AXI4-Stream works on an industry-standard ready/valid handshake between a producer and consumer, as shown in the figure below. The data transfer starts once the producer asserts the TVALID signal and the consumer responds by asserting the TREADY signal. This handshake of data and control continues until either TREADY or TVALID is set low, or the producer asserts the TLAST signal indicating the last data packet of the transfer.
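The handshake can be modeled in software to make the rule concrete: a data beat moves only in a clock cycle where TVALID and TREADY are both high. The following is a minimal behavioral sketch, not HLS code; the `Cycle` struct and the `transferred_beats` function are hypothetical names introduced for illustration.

```cpp
#include <vector>

// One clock cycle of a simplified AXI4-Stream channel (illustrative model).
struct Cycle {
    bool tvalid;  // producer has valid data on the bus
    bool tready;  // consumer is able to accept data
    int tdata;    // payload presented by the producer
};

// Returns the payloads that are actually transferred: a beat is accepted
// only in cycles where both TVALID and TREADY are high.
std::vector<int> transferred_beats(const std::vector<Cycle>& cycles) {
    std::vector<int> out;
    for (const Cycle& c : cycles) {
        if (c.tvalid && c.tready) {
            out.push_back(c.tdata);
        }
    }
    return out;
}
```

In this model a producer that holds TVALID high while the consumer is not ready simply keeps presenting the same data until TREADY rises.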
hls::stream and not an AXIS interface. You should define the streaming data type using hls::stream<T_data_type>, and use the ap_axis struct type to implement the AXIS interface. As explained in AXI4-Stream Interfaces, the ap_axis struct lets you choose the implementation of the interface with or without side-channels:
- AXI4-Stream Interfaces without Side-Channels implements the AXIS interface as a very light-weight interface using fewer resources
- AXI4-Stream Interfaces with Side-Channels implements a full featured interface providing greater control
Interfaces for Vivado IP Flow
- Software Control: The system is controlled through a software application running on an embedded Arm processor or external x86 processor, using drivers to access elements of the hardware design, and reading and writing registers in the hardware to control the execution of IP in the system.
- Self Synchronous: In this mode the IP exposes signals that are used for starting and stopping the kernel. These signals are driven by other IP or other elements of the system design that handle the execution of the IP.
The Vivado IP flow supports memory, stream, and register interface paradigms where each paradigm supports different interface protocols to communicate with the external world, as shown in the following table. Note that while the Vitis kernel flow supports only the AXI4 interface adapters, this flow supports a number of different interface types.
Paradigm | Description | Supported Interface Protocols |
---|---|---|
Memory | Data is accessed by the kernel through memory such as DDR, HBM, PLRAM/BRAM/URAM. | ap_memory, BRAM, AXI4 Memory Mapped (m_axi) |
Stream | Data is streamed into the kernel from another streaming source, such as a video processor or another kernel, and can also be streamed out of the kernel. | ap_fifo, AXI4-Stream (axis) |
Register | Data is accessed by the kernel through register interfaces performed by register reads and writes. | ap_none, ap_hs, ap_ack, ap_ovld, ap_vld, and AXI4-Lite adapter (s_axilite) |
The default interfaces are defined by the C-argument type in the top-level function, and the default paradigm, as shown in the following table.
C-Argument Type | Supported Paradigms | Default Paradigm | Default Interface Protocol (Input) | Default Interface Protocol (Output) | Default Interface Protocol (Inout) |
---|---|---|---|---|---|
Scalar variable (pass by value) | Register | Register | ap_none | N/A | N/A |
Array | Memory, Stream | Memory | ap_memory | ap_memory | ap_memory |
Pointer | Memory, Stream, Register | Register | ap_none | ap_vld | ap_ovld |
Reference | Register | Register | ap_none | ap_vld | ap_vld |
hls::stream | Stream | Stream | ap_fifo | ap_fifo | N/A |
The default execution mode for the Vivado IP flow is sequential execution, which requires the HLS IP to complete one iteration before starting the next. This is specified by the ap_ctrl_hs block control protocol. The control protocol can be changed as specified in Block-Level Control Protocols.
The vadd function in the following code provides an example of interface synthesis in the Vivado IP flow.
#define VDATA_SIZE 16
typedef struct v_datatype { unsigned int data[VDATA_SIZE]; } v_dt;
extern "C" {
void vadd(const v_dt* in1, // Read-Only Vector 1
const v_dt* in2, // Read-Only Vector 2
v_dt* out_r, // Output Result for Addition
const unsigned int size // Size in integer
) {
unsigned int vSize = ((size - 1) / VDATA_SIZE) + 1;
// Auto-pipeline is going to apply pipeline to this loop
vadd1:
for (int i = 0; i < vSize; i++) {
vadd2:
for (int k = 0; k < VDATA_SIZE; k++) {
out_r[i].data[k] = in1[i].data[k] + in2[i].data[k];
}
}
}
}
The vadd function includes:
- Two pointer inputs: in1 and in2
- A pointer: out_r that the results are written to
- A scalar value: size
With the default interface synthesis settings used for the Vivado IP flow, the design is synthesized into an RTL block with the ports and interfaces shown in the following figure.
In the default Vivado IP flow the tool creates three types of interface ports on the RTL design to handle the flow of both data and control.
- Clock and Reset ports: ap_clk and ap_rst are added to the kernel.
- Block-level control protocol: The ap_ctrl interface is implemented as an s_axilite interface.
- Port-level interface protocols: These are created for each argument in the top-level function and the function return (if the function returns a value). As explained in the table above, most of the arguments use a port protocol of ap_none, and so have no control signals. In the vadd example above these ports include in1, in2, and size. However, the out_r_o output port uses the ap_vld protocol and so is associated with the out_r_o_ap_vld signal.
AP_Memory in the Vivado IP Flow
The ap_memory interface is the default interface for the memory paradigm described in the tables above. In the Vivado IP flow it is used for communicating with memory resources such as BRAM and URAM. The ap_memory protocol also follows the address and data phase. The protocol initially requests to read/write the resource and waits until it receives an acknowledgment of the resource availability. It then initiates the data transfer phase of read/write.
An important consideration for ap_memory is that it can only perform a single-beat data transfer to a single address, unlike m_axi, which can perform burst accesses. This makes ap_memory a lightweight protocol compared to the others.
- Memory Resources: By default Vitis HLS implements a protocol to communicate with a single-port RAM resource. You can control the implementation of the protocol by specifying the storage_type as part of the INTERFACE pragma or directive. The storage_type lets you explicitly define which type of RAM is used, and which RAM ports are created (single-port or dual-port). If no storage_type is specified Vitis HLS uses:
  - A single-port RAM by default.
  - A dual-port RAM if it reduces the initiation interval or latency.
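As an illustration of the storage_type option, the following sketch requests a dual-port RAM interface for an array argument. The function name `sum16` is hypothetical, and `ram_2p` is used here on the assumption that it is the dual-port RAM storage type accepted by the INTERFACE pragma in your tool version; check the pragma reference before relying on it.

```cpp
// Hypothetical top-level function for the Vivado IP flow.
// storage_type=ram_2p (assumed value) asks for a dual-port RAM interface on d.
int sum16(int d[16]) {
#pragma HLS INTERFACE mode=ap_memory port=d storage_type=ram_2p
    int acc = 0;
    for (int i = 0; i < 16; i++) {
        acc += d[i];  // each access is a single-beat read at one address
    }
    return acc;
}
```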
M_AXI Interfaces in the Vivado IP Flow
AXI4 memory-mapped (m_axi) interfaces allow an IP to read and write data in global memory (DDR, HBM, PLRAM). Memory-mapped interfaces are a convenient way of sharing data across multiple IP. The main advantages of m_axi interfaces are listed below:
- The interface has independent read and write channels
- It supports burst-based accesses with potential performance of ~19 GB/s
- It provides a queue for outstanding transactions
- Understanding Burst Access: AXI4 memory-mapped interfaces support high throughput bursts of up to 4K bytes with just a single address phase. With burst mode transfers, Vitis HLS reads or writes data using a single base address followed by multiple sequential data samples, which makes this mode capable of higher data throughput. Burst mode of operation is possible when you use the C memcpy function or a pipelined for loop. Refer to Controlling AXI4 Burst Behavior or Optimizing Burst Transfers for more information.
- Automatic Port Widening and Port Width Alignment: As discussed in Automatic Port Width Resizing, Vitis HLS can automatically widen a port to facilitate data transfers and improve burst access when all the preconditions needed for bursting are present. In the Vivado IP flow the following configuration settings disable automatic port width resizing by default. To enable this feature you must change these configuration options (notice that one command is specified in bits and the other in bytes):
  config_interface -m_axi_max_widen_bitwidth 0
  config_interface -m_axi_alignment_byte_size 0
- Specifying Alignment for Vivado IP mode: The alignment for an m_axi port allows the port to read and write memory according to the specified alignment. Choosing the correct alignment is important because it affects performance in the best case and functionality in the worst case.
  Aligned memory access means that the pointer (or the start address of the data) is a multiple of a type-specific value called the alignment. The alignment is the natural address multiple where the type must be, or for performance reasons should be, stored in memory. For example, a 32-bit Intel architecture stores 32-bit words of 4 bytes each, so the data is aligned to a one-word or 4-byte boundary.
  The alignment should be consistent in the system. The alignment is determined when the IP is operating in AXI4 master mode and should be specified, as in the 32-bit Intel example with 4-byte alignment. When the IP is operating in slave mode the alignment should match the alignment of the master.
- Rules for Offset: The default for m_axi offset is offset=direct and default_slave_interface=s_axilite. However, in the Vivado IP flow you can change it as described in Offset and Modes of Operation.
- Bundle Interfaces - Performance vs. Resource Utilization: By default, Vitis HLS groups function arguments with compatible options into a single m_axi interface adapter as described in M_AXI Bundles. Bundling ports into a single interface helps save device resources by eliminating AXI4 logic, which can be necessary when working in congested designs.
  However, a single interface bundle can limit the performance of the IP because all the memory transfers have to go through a single interface. The m_axi interface has independent READ and WRITE channels, so a single interface can read and write simultaneously, though only at one location. Using multiple bundles lets you increase performance by creating multiple interfaces to connect to memory banks.
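The alignment rule described under Specifying Alignment for Vivado IP mode reduces to a divisibility check on the start address. The helper below is a hypothetical illustration of that definition, not part of any Vitis HLS API.

```cpp
#include <cstdint>

// Returns true when addr is a multiple of alignment_bytes.
// Assumes alignment_bytes is greater than zero.
bool is_aligned(std::uint64_t addr, std::uint64_t alignment_bytes) {
    return addr % alignment_bytes == 0;
}
```

For example, address 0x1004 satisfies a 4-byte alignment but not a 64-byte alignment, so a buffer starting there would match the 4-byte example above but not a 64-byte setting.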
S_AXILITE in the Vivado IP Flow
In the Vivado IP flow, the default execution control is managed by register reads and writes through an s_axilite interface using the default ap_ctrl_hs control protocol. The IP is software controlled by reading and writing the control registers of an s_axilite interface as described in S_AXILITE Control Register Map.
The s_axilite interface provides the following features:
- Control Protocols: The block control protocol as specified in Block-Level Control Protocols.
- Scalar Arguments: Scalar arguments from the top-level function can be mapped to an s_axilite interface, which creates a register for the value as described in S_AXILITE Control Register Map. The software can perform reads/writes to this register space.
- Rules for Offset: The Vivado IP flow defines the size, or range of addresses assigned to a port, based on the data type of the associated C-argument in the top-level function. However, the tool also lets you manually define the offset size as described in S_AXILITE Offset Option.
- Rules for Bundle: In the Vivado IP flow you can specify multiple bundles using the s_axilite interface, and this will create a separate interface adapter for each bundle you have defined. However, there are some rules related to using multiple bundles that you should be familiar with, as explained in S_AXILITE Bundle Rules.
AP_FIFO in the Vivado IP Flow
In the Vivado IP flow, the ap_fifo interface protocol is the default interface for the streaming paradigm on the interface for communication with a memory resource FIFO, and can also be used as a communication channel between different functions inside the IP. This protocol should only be used if the data is accessed sequentially, and Xilinx strongly recommends using the hls::stream<data type> which implements a FIFO.
The <data type> should not be the same as the T_data_type, which should only be used on the interface.
AXIS Interfaces in the Vivado IP Flow
The AXI4-Stream protocol (axis) is an alternative for streaming interfaces, and defines a single uni-directional channel for streaming data in a sequential manner. Unlike the m_axi protocol, the AXI4-Stream interfaces can burst an unlimited amount of data, which significantly improves performance. Unlike the AXI4 memory-mapped interface, which needs an address to read/write the memory, the axis interface simply passes data to another axis interface without needing an address, and so uses fewer device resources. Combined, these features make the streaming interface a light-weight high performance interface as described in AXI4-Stream Interfaces.
AXI Adapter Interface Protocols
The AXI4 interfaces supported by Vitis HLS include the AXI4-Stream interface (axis), AXI4-Lite (s_axilite), and AXI4 master (m_axi) interfaces. For a complete description of the AXI4 interfaces, including timing and ports, see the Vivado Design Suite: AXI Reference Guide (UG1037).
- m_axi: Specify on arrays and pointers (and references in C++) only. The m_axi mode specifies an AXI4 Memory Mapped interface.
  TIP: You can bundle multiple arguments into a single m_axi interface.
- s_axilite: Specify this protocol on any type of argument except streams. The s_axilite mode specifies an AXI4-Lite slave interface.
  TIP: You can bundle multiple arguments into a single s_axilite interface.
- axis: Specify this protocol on input arguments or output arguments only, not on input/output arguments. The axis mode specifies an AXI4-Stream interface.
AXI4 Master Interface
AXI4 memory-mapped (m_axi) interfaces allow kernels to read and write data in global memory (DDR, HBM, PLRAM). Memory-mapped interfaces are a convenient way of sharing data across different elements of the accelerated application, such as between the host and kernel, or between kernels on the accelerator card. The main advantages of m_axi interfaces are listed below:
- The interface has separate and independent read and write channels
- It supports burst-based accesses with potential performance of ~19 GB/s
- It provides support for outstanding transactions
In the Vitis Kernel flow the m_axi interface is assigned by default to pointer and array arguments. In this flow it supports the following default features:
- Pointer and array arguments are automatically mapped to the m_axi interface
- The default mode of operation is offset=slave in the Vitis flow and should not be changed
- All pointer and array arguments are mapped to a single interface bundle to conserve device resources, and ports share read and write access across the time it is active
- The default alignment in the Vitis flow is set to 64 bytes
- The maximum read/write burst length is set to 16 by default
In the Vivado IP flow, when an m_axi interface is specified it has the following default features:
- The default operation mode is offset=off but you can change it as described in Offset and Modes of Operation
- Assigned pointer and array arguments are mapped to a single interface bundle to conserve device resources, and share the interface across the time it is active
- The default alignment in Vivado IP flow is set to 1 byte
- The maximum read/write burst length is set to 16 by default
In both the Vivado IP flow and Vitis kernel flow, the INTERFACE pragma or directive can be used to modify default values as needed.
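As a sketch of overriding these defaults, the following hypothetical function `copy_block` raises the maximum burst length from the default of 16 to 64 on each port. The function name, the depth of 256, and the value 64 are assumptions chosen for illustration.

```cpp
// Hypothetical example: override the default m_axi burst limits of 16.
void copy_block(const int* src, int* dst) {
#pragma HLS INTERFACE mode=m_axi port=src depth=256 max_read_burst_length=64
#pragma HLS INTERFACE mode=m_axi port=dst depth=256 max_write_burst_length=64
    for (int i = 0; i < 256; i++) {
#pragma HLS PIPELINE
        // One read and one write per iteration, at increasing addresses
        dst[i] = src[i];
    }
}
```

Longer bursts can improve throughput at the cost of larger internal buffers in the adapter, so the value is worth tuning rather than maximizing blindly.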
You can use an AXI4 master interface on array or pointer/reference arguments, which Vitis HLS implements in one of the following modes:
- Individual data transfers
- Burst mode data transfers
With individual data transfers, Vitis HLS reads or writes a single element of data for each address. The following example shows a single read and single write operation. In this example, Vitis HLS generates an address on the AXI interface to read a single data value and an address to write a single data value. The interface transfers one data value per address.
void bus (int *d) {
static int acc = 0;
acc += *d;
*d = acc;
}
With burst mode transfers, Vitis HLS reads or writes data using a single base address followed by multiple sequential data samples, which makes this mode capable of higher data throughput. Burst mode of operation is possible when you use the C memcpy function or a pipelined for loop. Refer to Optimizing Burst Transfers for more information.
The memcpy function is only supported for synthesis when used to transfer data to or from a top-level function argument specified with an AXI4 master interface.
The following example shows a burst mode copy using the memcpy function. The top-level function argument a is specified as an AXI4 master interface.
void example(volatile int *a){
//Port a is assigned to an AXI4 master interface
#pragma HLS INTERFACE mode=m_axi depth=50 port=a
#pragma HLS INTERFACE mode=s_axilite port=return
int i;
int buff[50];
//memcpy creates a burst access to memory
memcpy(buff,(const int*)a,50*sizeof(int));
for(i=0; i < 50; i++){
buff[i] = buff[i] + 100;
}
memcpy((int *)a,buff,50*sizeof(int));
}
When this example is synthesized, it results in the interface shown in the following figure.
The following example shows the same code as the preceding example but uses a for loop to copy the data out:
void example(volatile int *a){
#pragma HLS INTERFACE mode=m_axi depth=50 port=a
#pragma HLS INTERFACE mode=s_axilite port=return
//Port a is assigned to an AXI4 master interface
int i;
int buff[50];
//memcpy creates a burst access to memory
memcpy(buff,(const int*)a,50*sizeof(int));
for(i=0; i < 50; i++){
buff[i] = buff[i] + 100;
}
for(i=0; i < 50; i++){
#pragma HLS PIPELINE
a[i] = buff[i];
}
}
When using a for loop to implement burst reads or writes, follow these requirements:
- Pipeline the loop
- Access addresses in increasing order
- Do not place accesses inside a conditional statement
- For nested loops, do not flatten loops, because this inhibits the burst operation
Only one read and one write is allowed in a for loop unless the ports are bundled in different AXI ports. The following example shows how to perform two reads in burst mode using different AXI interfaces.
In the following example, Vitis HLS implements the port reads as burst transfers. Port a is specified without using the bundle option and is implemented in the default AXI interface. Port b is specified using a named bundle and is implemented in a separate AXI interface called d2_port.
void example(volatile int *a, int *b){
#pragma HLS INTERFACE s_axilite port=return
#pragma HLS INTERFACE mode=m_axi depth=50 port=a
#pragma HLS INTERFACE mode=m_axi depth=50 port=b bundle=d2_port
int i;
int buff[50];
//copy data in
for(i=0; i < 50; i++){
#pragma HLS PIPELINE
buff[i] = a[i] + b[i];
}
...
}
Offset and Modes of Operation
The AXI4 Master interface has a read/write address channel that can be used to read/write specific addresses. By default the m_axi interface starts all read and write operations from the address 0x00000000. For example, given the following code, the design reads data from addresses 0x00000000 to 0x000000C7 (50 32-bit words, or 200 bytes), which represents 50 address values. The design then writes data back to the same addresses.
#include <stdio.h>
#include <string.h>
void example(volatile int *a){
#pragma HLS INTERFACE mode=m_axi port=a depth=50
int i;
int buff[50];
//memcpy creates a burst access to memory
//multiple calls of memcpy cannot be pipelined and will be scheduled sequentially
//memcpy requires a local buffer to store the results of the memory transaction
memcpy(buff,(const int*)a,50*sizeof(int));
for(i=0; i < 50; i++){
buff[i] = buff[i] + 100;
}
memcpy((int *)a,buff,50*sizeof(int));
}
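The address range quoted for this example follows from simple arithmetic: 50 words of 4 bytes each occupy 200 bytes, so the last byte sits at base + 200 - 1 = 0xC7. The helper below is a hypothetical illustration of that calculation, not part of the tool.

```cpp
#include <cstdint>

// Last byte address touched by `count` elements of `elem_bytes` bytes each,
// starting at `base`. Assumes count > 0 and elem_bytes > 0.
std::uint32_t last_byte_address(std::uint32_t base, std::uint32_t count,
                                std::uint32_t elem_bytes) {
    return base + count * elem_bytes - 1;
}
```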
The tool provides the capability to let the base address be configured statically in the Vivado IP for instance, or dynamically by the application or another IP during run time.
The m_axi interface can be both a master initiating transactions, and also a slave interface that receives the data and sends acknowledgment. Depending on the mode specified with the offset option of the INTERFACE pragma, an HLS IP can use multiple approaches to set the base address.
The config_interface -m_axi_offset command provides a global setting for the offset, which can be overridden for specific m_axi interfaces using the INTERFACE pragma offset option.
- Master Mode: When acting as a master interface with different offset options, the m_axi interface start address can be either hard-coded or set at run time.
  - offset=off: Vitis HLS sets a base address for the m_axi interface when the IP is used in the Vivado IP integrator tool. One disadvantage with this approach is that you cannot change the base address during run time. See Customizing AXI4 Master Interfaces in IP Integrator for setting the base address.
    The following example is synthesized with offset=off, the default for the Vivado IP flow.
    void example(volatile int *a){
    #pragma HLS INTERFACE m_axi depth=50 port=a offset=off
    int i;
    int buff[50];
    //memcpy creates a burst access to memory
    //multiple calls of memcpy cannot be pipelined and will be scheduled sequentially
    //memcpy requires a local buffer to store the results of the memory transaction
    memcpy(buff,(const int*)a,50*sizeof(int));
    for(i=0; i < 50; i++){
    buff[i] = buff[i] + 100;
    }
    memcpy((int *)a,buff,50*sizeof(int));
    }
  - offset=direct: Vitis HLS generates a port on the IP for setting the address. Note the addition of the a port as shown in the figure below. This lets you update the address at run time, so you can have one m_axi interface reading and writing different locations. For example, consider an HLS module that reads data from an ADC into RAM, and an HLS module that processes that data. Since you can change the address on the module, while one HLS module is processing the initial dataset the other module can be reading more data into a different address.
    void example(volatile int *a){
    #pragma HLS INTERFACE m_axi depth=50 port=a offset=direct
    ...
    }
- Slave Mode: The slave mode for an interface is set with offset=slave. In this mode the IP is controlled by the host application, or by a microcontroller, through the s_axilite interface. This is the default for the Vitis kernel flow, and can also be used in the Vivado IP flow. The flow of operation is as follows:
  - Initially, the host/CPU starts the IP or kernel using the block-level control protocol, which is mapped to the s_axilite adapter.
  - The host sends the scalars and address offsets for the m_axi interfaces through the s_axilite adapter.
  - The m_axi adapter reads the start address from the s_axilite adapter and stores it in a queue.
  - The HLS design starts to read the data from the global memory.
As shown in the figure below, the HLS design has both the s_axilite adapter for the base address, and the m_axi adapter to perform read and write transfers to the global memory.
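The slave-mode sequence can be pictured with a small software model. This is only an illustrative sketch, not generated driver code: `AxiLiteRegs`, `run_kernel_model`, and the use of a byte vector as global memory are all assumptions made for the example. The host writes the base address into a register, starts the block, and the modeled kernel then reads and writes 50 words at that base address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the registers exposed through the s_axilite adapter.
struct AxiLiteRegs {
    bool ap_start = false;       // block-level start, written by the host
    bool ap_done = false;        // block-level done, set by the kernel model
    std::uint64_t a_offset = 0;  // base address for the m_axi port `a`
};

// Software model of the example kernel with offset=slave: the base address
// comes from the register, not from a hard-coded value.
bool run_kernel_model(AxiLiteRegs& regs, std::vector<std::uint8_t>& global_mem) {
    constexpr std::size_t kWords = 50;
    int buff[kWords];
    if (!regs.ap_start) return false;  // nothing to do until the host starts us
    // Refuse to run if the 200-byte window does not fit in the modeled memory.
    if (regs.a_offset > global_mem.size() ||
        global_mem.size() - regs.a_offset < sizeof(buff)) return false;
    std::memcpy(buff, global_mem.data() + regs.a_offset, sizeof(buff));
    for (std::size_t i = 0; i < kWords; ++i) buff[i] += 100;
    std::memcpy(global_mem.data() + regs.a_offset, buff, sizeof(buff));
    regs.ap_start = false;
    regs.ap_done = true;
    return true;
}
```

A host-side caller would set `a_offset`, set `ap_start`, invoke the model, and then read the results back from the same base address.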
Offset Rules
The following rules are associated with the offset option:
- Fully Specified Offset: When the user explicitly sets the offset value, the tool uses the specified settings. The user can also set different offset values for different m_axi interfaces in the design, and the tool will use the specified offsets.
  #pragma HLS INTERFACE s_axilite port=return
  #pragma HLS INTERFACE mode=m_axi bundle=BUS_A port=out offset=direct
  #pragma HLS INTERFACE mode=m_axi bundle=BUS_B port=in1 offset=slave
  #pragma HLS INTERFACE mode=m_axi bundle=BUS_C port=in2 offset=off
- No Offset Specified: If no offset is specified in the INTERFACE pragma, the tool defers to the setting specified by config_interface -m_axi_offset.
  Note: If the global m_axi_offset setting is specified, and the design has an s_axilite interface, the global setting is ignored and offset=slave is assumed.
  void top(int *a) {
  #pragma HLS interface mode=m_axi port=a
  #pragma HLS interface mode=s_axilite port=a
  }
Controlling the Address Offset in an AXI4 Interface
By default, the AXI4 master interface starts all read and write operations from address 0x00000000. For example, given the following code, the design reads data from addresses 0x00000000 to 0x000000C7 (50 32-bit words, or 200 bytes). The design then writes data back to the same addresses.
#include <string.h>
void example(volatile int *a){
#pragma HLS INTERFACE mode=m_axi depth=50 port=a
#pragma HLS INTERFACE mode=s_axilite port=return bundle=AXILiteS
  int i;
  int buff[50];
  memcpy(buff,(const int*)a,50*sizeof(int));
  for(i=0; i < 50; i++){
    buff[i] = buff[i] + 100;
  }
  memcpy((int *)a,buff,50*sizeof(int));
}
To apply an address offset, use the offset option with the INTERFACE pragma or directive, and specify one of the following values:
- off: Does not apply an offset address. This is the default in the Vivado IP flow.
- direct: Adds a 32-bit port to the design for applying an address offset.
- slave: Adds a 32-bit register inside the AXI4-Lite interface for applying an address offset.
In the final RTL, Vitis HLS applies the address offset directly to any read or write address generated by the AXI4 master interface. This allows the design to access any address location in the system.
If you use the slave option in an AXI interface, you must use an AXI4-Lite port on the design interface. Xilinx recommends that you implement the AXI4-Lite interface using the following pragma:
#pragma HLS INTERFACE mode=s_axilite port=return
In addition, if you use the slave option and several AXI4-Lite interfaces, you must ensure that the AXI master port offset register is bundled into the correct AXI4-Lite interface. In the following example, port a is implemented as an AXI master interface with an offset, alongside AXI4-Lite interfaces called AXI_Lite_1 and AXI_Lite_2:
#pragma HLS INTERFACE mode=m_axi port=a depth=50 offset=slave
#pragma HLS INTERFACE mode=s_axilite port=return bundle=AXI_Lite_1
#pragma HLS INTERFACE mode=s_axilite port=b bundle=AXI_Lite_2
The following INTERFACE directive is required to ensure that the offset register for port a is bundled into the AXI4-Lite interface called AXI_Lite_1:
#pragma HLS INTERFACE mode=s_axilite port=a bundle=AXI_Lite_1
M_AXI Bundles
Vitis HLS groups function arguments with compatible options into a single m_axi interface adapter. Bundling ports into a single interface helps save FPGA resources by eliminating AXI logic, but it can limit the performance of the kernel because all the memory transfers have to go through a single interface. The m_axi interface has independent READ and WRITE channels, so a single interface can read and write simultaneously, though only at one location. Using multiple bundles increases the bandwidth and throughput of the kernel by creating multiple interfaces that connect to multiple memory banks.
In the following example, all the pointer arguments are grouped into a single m_axi adapter using the interface option bundle=BUS_A, and a single s_axilite adapter is added for the m_axi offsets, the scalar argument size, and the function return.
extern "C" {
void vadd(const unsigned int *in1, // Read-Only Vector 1
          const unsigned int *in2, // Read-Only Vector 2
          unsigned int *out,       // Output Result
          int size                 // Size in integer
          ) {
#pragma HLS INTERFACE mode=m_axi bundle=BUS_A port=out
#pragma HLS INTERFACE mode=m_axi bundle=BUS_A port=in1
#pragma HLS INTERFACE mode=m_axi bundle=BUS_A port=in2
#pragma HLS INTERFACE mode=s_axilite port=in1
#pragma HLS INTERFACE mode=s_axilite port=in2
#pragma HLS INTERFACE mode=s_axilite port=out
#pragma HLS INTERFACE mode=s_axilite port=size
#pragma HLS INTERFACE mode=s_axilite port=return
You can also choose to bundle function arguments into separate interface adapters, as shown in the following code. Here the argument in2 is grouped into a separate interface adapter with bundle=BUS_B. This creates a new m_axi interface adapter for port in2.
extern "C" {
void vadd(const unsigned int *in1, // Read-Only Vector 1
          const unsigned int *in2, // Read-Only Vector 2
          unsigned int *out,       // Output Result
          int size                 // Size in integer
          ) {
#pragma HLS INTERFACE mode=m_axi bundle=BUS_A port=out
#pragma HLS INTERFACE mode=m_axi bundle=BUS_A port=in1
#pragma HLS INTERFACE mode=m_axi bundle=BUS_B port=in2
#pragma HLS INTERFACE mode=s_axilite port=in1
#pragma HLS INTERFACE mode=s_axilite port=in2
#pragma HLS INTERFACE mode=s_axilite port=out
#pragma HLS INTERFACE mode=s_axilite port=size
#pragma HLS INTERFACE mode=s_axilite port=return
Bundle Rules
The global configuration command config_interface -m_axi_auto_max_ports false limits the number of interface bundles to the minimum required, allowing the tool to group compatible ports into a single m_axi interface. This option is disabled (false) by default, but you can enable it to maximize bandwidth by creating a separate m_axi adapter for each port.
With m_axi_auto_max_ports disabled, the following are some rules for how the tool handles bundles under different circumstances:
- Default Bundle Name: The tool groups all interface ports with no bundle name into a single m_axi interface port using the tool default name bundle=<default>, and names the RTL port m_axi_<default>. The following pragmas:
  #pragma HLS INTERFACE mode=m_axi port=a depth=50
  #pragma HLS INTERFACE mode=m_axi port=b depth=50
  #pragma HLS INTERFACE mode=m_axi port=c depth=50
  Result in the following messages:
  INFO: [RTGEN 206-500] Setting interface mode on port 'example/gmem' to 'm_axi'.
  INFO: [RTGEN 206-500] Setting interface mode on port 'example/gmem' to 'm_axi'.
  INFO: [RTGEN 206-500] Setting interface mode on port 'example/gmem' to 'm_axi'.
- User-Specified Bundle Names: The tool groups all interface ports with the same user-specified bundle=<string> into the same m_axi interface port, and names the RTL port m_axi_<string>. Ports without bundle assignments are grouped into the default bundle as described above. The following pragmas:
  #pragma HLS INTERFACE mode=m_axi port=a depth=50 bundle=BUS_A
  #pragma HLS INTERFACE mode=m_axi port=b depth=50
  #pragma HLS INTERFACE mode=m_axi port=c depth=50
  Result in the following messages:
  INFO: [RTGEN 206-500] Setting interface mode on port 'example/BUS_A' to 'm_axi'.
  INFO: [RTGEN 206-500] Setting interface mode on port 'example/gmem' to 'm_axi'.
  INFO: [RTGEN 206-500] Setting interface mode on port 'example/gmem' to 'm_axi'.
IMPORTANT: If you bundle incompatible interfaces Vitis HLS issues a message and ignores the bundle assignment.
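The naming rule for m_axi bundles can be summarized in a few lines. This is a sketch of the rule as described above, not tool code: `m_axi_port_name` is a hypothetical helper, and the default bundle name "gmem" is an assumption read off the INFO messages in the examples.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: RTL port name for an m_axi argument, given the value of
// its bundle option (empty string when no bundle was specified).
inline std::string m_axi_port_name(const std::string& bundle) {
    // Assumption: the tool default bundle name is "gmem", as the messages suggest.
    const std::string kDefaultBundle = "gmem";
    return "m_axi_" + (bundle.empty() ? kDefaultBundle : bundle);
}
```

Under these assumptions, `m_axi_port_name("BUS_A")` yields `m_axi_BUS_A`, and arguments without a bundle share `m_axi_gmem`.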
Controlling AXI4 Burst Behavior
An optimal AXI4 interface is one in which the design never stalls while waiting to access the bus, and after bus access is granted, the bus never stalls while waiting for the design to read/write. To create the optimal AXI4 interface, the following options are provided in the INTERFACE pragma or directive to specify the behavior of the bursts and optimize the efficiency of the AXI4 interface. Refer to Optimizing Burst Transfers for more information on burst transfers.
Some of these options use internal storage to buffer data and may have an impact on area and resources:
- latency: Specifies the expected latency of the AXI4 interface, allowing the design to initiate a bus request a number of cycles (latency) before the read or write is expected. If this figure is too low, the design will be ready too soon and may stall waiting for the bus. If this figure is too high, bus access may be granted but the bus may stall waiting on the design to start the access.
- max_read_burst_length: Specifies the maximum number of data values read during a burst transfer.
- num_read_outstanding: Specifies how many read requests can be made to the AXI4 bus, without a response, before the design stalls. This implies internal storage in the design, a FIFO of size num_read_outstanding*max_read_burst_length*word_size.
- max_write_burst_length: Specifies the maximum number of data values written during a burst transfer.
- num_write_outstanding: Specifies how many write requests can be made to the AXI4 bus, without a response, before the design stalls. This implies internal storage in the design, a FIFO of size num_write_outstanding*max_write_burst_length*word_size.
The following example can be used to help explain these options:
#pragma HLS interface mode=m_axi port=input offset=slave bundle=gmem0 \
    depth=1024*1024*16/(512/8) \
    latency=100 \
    num_read_outstanding=32 \
    num_write_outstanding=32 \
    max_read_burst_length=16 \
    max_write_burst_length=16
The interface is specified as having a latency of 100. Vitis HLS seeks to schedule the request for burst access 100 clock cycles before the design is ready to access the AXI4 bus. To further improve bus efficiency, the options num_write_outstanding and num_read_outstanding ensure the design contains enough buffering to store up to 32 read and 32 write accesses. This allows the design to continue processing until the bus requests are serviced. Finally, the options max_read_burst_length and max_write_burst_length ensure the maximum burst size is 16 and that the AXI4 interface does not hold the bus for longer than this.
These options allow the behavior of the AXI4 interface to be optimized for the system in which it will operate. The efficiency of the operation does depend on these values being set accurately.
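The internal storage implied by the outstanding-request options can be estimated from the FIFO size expression given for them. The following is a minimal sketch; `burst_fifo_bytes` is a hypothetical helper, and the word size is left as a parameter because it depends on the data width of the port.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: FIFO size in bytes implied by
// num_outstanding * max_burst_length * word_size.
constexpr std::size_t burst_fifo_bytes(std::size_t num_outstanding,
                                       std::size_t max_burst_length,
                                       std::size_t word_bytes) {
    return num_outstanding * max_burst_length * word_bytes;
}

// 32 outstanding requests of 16 beats each, with 4-byte (32-bit) words.
static_assert(burst_fifo_bytes(32, 16, 4) == 2048, "2 KiB of buffering");
```

With the same settings and a 64-byte (512-bit) word, the estimate grows to 32768 bytes per direction, which is why these options can have a visible impact on area.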
Automatic Port Width Resizing
In the Vitis tool flow, Vitis HLS can automatically resize m_axi interface ports to 512 bits to improve burst access. However, automatic port width resizing only supports standard C data types and does not support other types such as ap_int, ap_uint, struct, or array.
Vitis HLS controls automatic port width resizing using the following two commands:
- config_interface -m_axi_max_widen_bitwidth <N>: Directs the tool to automatically widen bursts on M-AXI interfaces up to the specified bit-width. The value of <N> must be a power of two between 0 and 1024.
- config_interface -m_axi_alignment_byte_size <N>: Directs the tool to assume that pointers mapped to m_axi interfaces are aligned to at least the specified width in bytes (a power of two). Burst widening requires strong alignment properties, so this setting can help automatic burst widening.
For example, the following settings allow widening up to 512 bits with 64-byte alignment:
config_interface -m_axi_max_widen_bitwidth 512
config_interface -m_axi_alignment_byte_size 64
Setting both values to 0 disables the feature:
config_interface -m_axi_max_widen_bitwidth 0
config_interface -m_axi_alignment_byte_size 0
Automatic port width resizing will only re-size the port if a burst access can be seen by the tool. Therefore all the preconditions needed for bursting, as described in Optimizing Burst Transfers, are also needed for port resizing. These conditions include:
- The accesses must be in monotonically increasing order, both in terms of the memory location being accessed and in time. You cannot access a memory location that lies between two previously accessed memory locations; that is, accesses must not overlap.
- The access pattern from the global memory should be in sequential order, with the following additional requirements:
  - The sequential accesses need to be on a non-vector type
  - The start of the sequential accesses needs to be aligned to the widen word size
  - The length of the sequential accesses needs to be divisible by the widen factor
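The two numeric requirements in the list above (aligned start, length divisible by the widen factor) can be checked with simple arithmetic. The following sketch uses a hypothetical helper, `meets_widen_alignment`, and assumes the widen factor is the widened width divided by the element width; it does not model the other burst preconditions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical check of the alignment and length requirements for widening.
// start_byte_addr: byte address of the first access
// num_elems:       number of sequential elements accessed
// elem_bits:       width of one element (for example 32 for int)
// widen_bits:      target port width (for example 512)
inline bool meets_widen_alignment(std::uint64_t start_byte_addr,
                                  std::uint64_t num_elems,
                                  unsigned elem_bits,
                                  unsigned widen_bits) {
    if (elem_bits == 0 || widen_bits < 8 || widen_bits % elem_bits != 0) return false;
    const unsigned widen_factor = widen_bits / elem_bits;  // elements per wide word
    const unsigned widen_bytes = widen_bits / 8;           // bytes per wide word
    return start_byte_addr % widen_bytes == 0 && num_elems % widen_factor == 0;
}
```

For 32-bit elements and a 512-bit port, the widen factor is 16, so a run of 128 elements starting at a 64-byte boundary passes, while a run of 100 elements does not.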
The following code example is used in the calculations that follow:
vadd_pipeline:
for (int i = 0; i < iterations; i++) {
#pragma HLS LOOP_TRIPCOUNT min = c_len/c_n max = c_len/c_n
// Pipelining loops that access only one variable is the ideal way to
// increase the global memory bandwidth.
read_a:
for (int x = 0; x < N; ++x) {
#pragma HLS LOOP_TRIPCOUNT min = c_n max = c_n
#pragma HLS PIPELINE II = 1
result[x] = a[i * N + x];
}
read_b:
for (int x = 0; x < N; ++x) {
#pragma HLS LOOP_TRIPCOUNT min = c_n max = c_n
#pragma HLS PIPELINE II = 1
result[x] += b[i * N + x];
}
write_c:
for (int x = 0; x < N; ++x) {
#pragma HLS LOOP_TRIPCOUNT min = c_n max = c_n
#pragma HLS PIPELINE II = 1
c[i * N + x] = result[x];
}
}
}
}
The tool determines the widened port width for the code above in three steps:
- The tool checks the number of access patterns in the read_a loop. There is one access per loop iteration, so the optimization determines the interface bit-width as 32 = 32 * 1 (bit-width of the int variable * accesses).
- The tool tries to reach the default maximum specified by config_interface -m_axi_max_widen_bitwidth 512, using the following expression:
  length = (ceil((loop-bound of index inner loops) * (loop-bound of index - outer loops)) * #(of access-patterns))
  In the above code, the outer loop is an imperfect loop, so there will not be burst transfers on the outer loop, and the length only includes the inner loop. The formula is therefore shortened to:
  length = (ceil((loop-bound of index inner loops)) * #(of access-patterns))
  or: length = ceil(128) * 32 = 4096
- Is the calculated length a power of 2? If yes, then the length is capped to the width specified by m_axi_max_widen_bitwidth.
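The three steps can be traced numerically. The sketch below is only a restatement of the arithmetic in the steps above, assuming an inner loop bound of 128 and a 32-bit access as in the calculation; the function names are hypothetical, and the case where the length is not a power of two is deliberately left out because the text does not describe it.

```cpp
#include <cassert>
#include <cstdint>

// Step 2: burst length in bits for the inner loop only.
constexpr std::uint64_t burst_length_bits(std::uint64_t inner_loop_bound,
                                          std::uint64_t access_bits) {
    return inner_loop_bound * access_bits;
}

// Step 3, first half: power-of-two test.
constexpr bool is_power_of_two(std::uint64_t v) {
    return v != 0 && (v & (v - 1)) == 0;
}

// Step 3, second half: cap the length to m_axi_max_widen_bitwidth.
constexpr std::uint64_t capped_width_bits(std::uint64_t length_bits,
                                          std::uint64_t max_widen_bits) {
    return length_bits < max_widen_bits ? length_bits : max_widen_bits;
}

static_assert(burst_length_bits(128, 32) == 4096, "length from the text");
static_assert(is_power_of_two(4096), "4096 is a power of two");
static_assert(capped_width_bits(4096, 512) == 512, "capped to 512 bits");
```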
There are pros and cons to automatic port width resizing that you should consider before using this feature. It improves the read latency from DDR, because the tool reads one wide vector instead of individual elements of the data type size. However, it also consumes more resources, because the design must buffer the wide vector and shift the data to match the data path size.
Creating an AXI4 Interface with 32-bit Address
To create an AXI4 interface with a 32-bit address, disable the m_axi_addr64 interface configuration option as follows:
- Open the Solution Settings dialog box.
- In the Solution Settings dialog box, click the General category, and edit the existing config_interface command, or click Add to add one.
- In the Edit or Add dialog box, select config_interface, and disable m_axi_addr64.
Customizing AXI4 Master Interfaces in IP Integrator
When you incorporate an HLS RTL design that uses an AXI4 master interface into a design in the Vivado IP integrator, you can customize the block. From the block diagram in IP integrator, select the HLS block, right-click, and select Customize Block to customize any of the settings provided. A complete description of the AXI4 parameters is provided in the Vivado Design Suite: AXI Reference Guide (UG1037).
The following figure shows the Re-Customize IP dialog box for the design shown below. This design includes an AXI4-Lite port.
AXI4-Lite Interface
Overview
An HLS IP or kernel can be controlled by a host application or embedded processor using the Slave AXI4-Lite interface (s_axilite), which acts as a system bus for communication between the processor and the kernel. Using the s_axilite interface, the host or an embedded processor can start and stop the kernel, and read or write data to it. When Vitis HLS synthesizes the design, the s_axilite interface is implemented as an adapter that captures the data communicated from the host in registers on the adapter.
The AXI4-Lite interface performs several functions within a Vivado IP or Vitis kernel:
- It maps a block-level control mechanism which can be used to start and stop the kernel.
- It provides a channel for passing scalar arguments, pointers to scalar values, function return values, and address offsets for m_axi interfaces from the host to the IP or kernel.
- For the Vitis Kernel flow:
  - The tool automatically infers the s_axilite interface pragma to provide offsets to pointer arguments assigned to m_axi interfaces, scalar values, and function return type.
  - Vitis HLS lets you read from or write to a pointer to a scalar value when assigned to an s_axilite interface. Pointers are assigned by default to m_axi interfaces, so this requires you to manually assign the pointer to the s_axilite interface using the INTERFACE pragma or directive:
    int top(int *a, int *b) {
    #pragma HLS interface s_axilite port=a
  - Bundle: Do not specify the bundle option for the s_axilite adapter in the Vitis Kernel flow. The tool creates a single s_axilite interface that serves the whole design. IMPORTANT: HLS will return an error if multiple bundles are specified for the Vitis Kernel flow.
  - Offset: The tool automatically chooses the offsets for the interface. Do not specify any offsets in this flow.
- For the Vivado IP flow:
  - This flow does not use the s_axilite interface by default.
  - To use the s_axilite interface as a communication channel for scalar arguments, pointers to scalar values, offsets to m_axi pointer addresses, and function return type, you must manually specify the INTERFACE pragma or directive.
  - Bundle: This flow supports multiple s_axilite interfaces, specified by bundle. Refer to S_AXILITE Bundle Rules for more information.
  - Offset: By default the tool places the arguments in sequential order starting from 0x10 in the control register map. Refer to S_AXILITE Offset Option for additional details.
S_AXILITE Example
The following example shows how Vitis HLS implements multiple arguments, including the function return, as an s_axilite interface. Because each pragma uses the same name for the bundle option, each of the ports is grouped into a single interface.
void example(char *a, char *b, char *c)
{
#pragma HLS INTERFACE mode=s_axilite port=return bundle=BUS_A
#pragma HLS INTERFACE mode=s_axilite port=a bundle=BUS_A
#pragma HLS INTERFACE mode=s_axilite port=b bundle=BUS_A
#pragma HLS INTERFACE mode=s_axilite port=c bundle=BUS_A
#pragma HLS INTERFACE mode=ap_vld port=b
  *c += *a + *b;
}
If you do not specify the bundle option, Vitis HLS groups all arguments into a single s_axilite bundle and automatically names the port.
A system that uses the synthesized example has three elements:
- Host application running on an x86 or embedded processor interacting with the IP or kernel
- SAXI Lite Adapter: The INTERFACE pragma implements an s_axilite adapter. The adapter has two primary functions: implementing the interface protocol to communicate with the host, and providing a Control Register Map to the IP or kernel.
- The HLS engine or function that implements the design logic
By default, Vitis HLS automatically assigns the address for each port that is grouped into an s_axilite interface. The size, or range of addresses, assigned to a port depends on the argument data type and the port protocol used, as described below. You can also explicitly define the address using the offset option, as discussed in S_AXILITE Offset Option.
- Port a: By default, is implemented as ap_none. One word is assigned for the data signal, and only 8 bits are used because the argument data type is char. The remaining bits are unused.
- Port b: Is implemented as ap_vld, as defined by the INTERFACE pragma in the example. The corresponding control register spans two 32-bit words (8 bytes) and is divided into two sections as follows:
  - (0x18) Data signal: One word is assigned for the data signal, and only 8 bits are used because the argument data type is char. The remaining bits are unused.
  - (0x1c) Control signal: One word is assigned for the control signal.
- Port c: By default, is implemented as ap_ovld for the output. The corresponding control register spans four 32-bit words (16 bytes) and is divided into the following sections:
  - (0x20) Data signal of c_i: One word is assigned for the input data signal, and only 8 bits are used because the argument data type is char. The remaining bits are unused.
  - (0x24) Reserved space
  - (0x28) Data signal of c_o: One word is assigned for the output data signal, and only 8 bits are used because the argument data type is char. The remaining bits are unused.
  - (0x2c) Control signal of c_o: One word is assigned for the ap_ovld control signal.
In operation, the host application first starts the kernel by writing into the Control address space (0x00). The host/CPU completes the initial setup by writing into the other address spaces, which are associated with the various function arguments as defined in the example.
The HLS engine can read ports a and b only after the control signal for port b is asserted (port a is ap_none and does not have a control signal). Until that time the design is stalled, waiting for the valid register to be set for port b. Each time port b is read by the HLS engine, the input valid register is cleared and resets to logic 0.
After the HLS engine finishes its computation, the output value on port c is stored in the control register and the corresponding valid bit is set for the host to read. After the host reads the data, the HLS engine writes the ap_done bit in the Control register (0x00) to mark the end of the IP computation.
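The handshake described above can be pictured with a small software model of the adapter registers. This is a simplified sketch under stated assumptions, not generated driver code: the register offsets are the ones listed for the example, while `AdapterModel`, `engine_step`, and the way ap_start and ap_done are updated in one step are illustrative simplifications.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Byte offsets as listed for the example ports.
enum RegOffset : std::uint32_t {
    REG_CTRL = 0x00, REG_A = 0x10, REG_B = 0x18, REG_B_VLD = 0x1c,
    REG_C_I = 0x20, REG_C_O = 0x28, REG_C_O_VLD = 0x2c
};

// Hypothetical model of the adapter: twelve 32-bit words cover 0x00..0x2c.
struct AdapterModel {
    std::array<std::uint32_t, 12> words{};
    std::uint32_t& at(std::uint32_t byte_offset) { return words[byte_offset / 4]; }
};

// One engine step: stalls (returns false) until ap_start and b's valid bit are set.
inline bool engine_step(AdapterModel& m) {
    const bool started = (m.at(REG_CTRL) & 0x1u) != 0;
    const bool b_valid = (m.at(REG_B_VLD) & 0x1u) != 0;
    if (!started || !b_valid) return false;
    // Models *c += *a + *b on 8-bit data.
    const std::uint32_t sum = m.at(REG_C_I) + m.at(REG_A) + m.at(REG_B);
    m.at(REG_C_O) = sum & 0xFFu;
    m.at(REG_B_VLD) = 0;    // input valid register clears when b is read
    m.at(REG_C_O_VLD) = 1;  // output valid bit set for the host
    m.at(REG_CTRL) = 0x2u;  // simplification: ap_start cleared, ap_done set
    return true;
}
```

In this model the host writes a, b, and c_i, sets bit 0 of the Control register, and the engine only proceeds once the valid register for b is also written.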
Vitis HLS reports the assigned addresses in the S_AXILITE Control Register Map, and also provides them in C Driver Files to aid in your software development. Using the s_axilite interface, you can output C driver files for use with code running on an embedded or x86 processor. The provided C application program interface (API) functions let you control the hardware from your software.
S_AXILITE Control Register Map
Vitis HLS generates a Control Register Map for the ports grouped into the s_axilite interface. The register map, which is added to the generated RTL files, can be divided into two sections:
- Block-level control signals
- Function arguments mapped into the s_axilite interface
In the Vitis kernel flow, the block-level control protocol is associated with the s_axilite interface by default. To change the default block protocol, specify the interface pragma as follows:
#pragma HLS INTERFACE mode=ap_ctrl_hs port=return
In the Vivado IP flow, the block-level control protocol is implemented as its own interface, ap_ctrl, as seen in Interfaces for Vivado IP Flow. However, if you are using an s_axilite interface in your IP, you can also assign the block control protocol to that interface using the following INTERFACE pragmas, as an example:
#pragma HLS INTERFACE mode=s_axilite port=return bundle=BUS_A
#pragma HLS INTERFACE mode=ap_ctrl_hs port=return bundle=BUS_A
In the Control Register Map, Vitis HLS reserves addresses 0x00 through 0x0C for the block-level protocol signals and interrupt controls, as shown below:
Address | Description |
---|---|
0x00 | Control signals |
0x04 | Global Interrupt Enable Register |
0x08 | IP Interrupt Enable Register (Read/Write) |
0x0c | IP Interrupt Status Register (Read/TOW) |
The Control signals register (0x00) contains ap_start, ap_done, ap_ready, and ap_idle; for the ap_ctrl_chain block protocol it also contains ap_continue. These are the block-level interface signals, which are accessed through the s_axilite adapter.
To start the block operation, the ap_start bit in the Control register must be set to 1. The HLS engine then proceeds and reads any inputs grouped into the AXI4-Lite slave interface from the registers in the interface.
When the block completes the operation, the ap_done, ap_idle, and ap_ready registers are set by the hardware output ports, and the results for any output ports grouped into the s_axilite interface can be read from the appropriate register.
For function arguments, Vitis HLS automatically assigns the address for each argument or port that is assigned to the s_axilite interface. The tool assigns each port an offset starting from 0x10, the lower addresses being reserved for control signals. The size, or range of addresses, assigned to a port depends on the argument data type and the port protocol used.
Because the variables grouped into an AXI4-Lite interface are function arguments, which do not have a default value in the C code, none of the argument registers in the s_axilite interface can be assigned a default value. The registers can be implemented with a reset using the config_rtl command, but they cannot be assigned any other default value.
The Control Register Map generated by Vitis HLS for the ap_ctrl_hs
block control protocol is provided below:
//------------------------Address Info-------------------
// 0x00 : Control signals
// bit 0 - ap_start (Read/Write/COH)
// bit 1 - ap_done (Read/COR)
// bit 2 - ap_idle (Read)
// bit 3 - ap_ready (Read)
// bit 7 - auto_restart (Read/Write)
// others - reserved
// 0x04 : Global Interrupt Enable Register
// bit 0 - Global Interrupt Enable (Read/Write)
// others - reserved
// 0x08 : IP Interrupt Enable Register (Read/Write)
// bit 0 - enable ap_done interrupt (Read/Write)
// bit 1 - enable ap_ready interrupt (Read/Write)
// others - reserved
// 0x0c : IP Interrupt Status Register (Read/TOW)
// bit 0 - ap_done (COR/TOW)
// bit 1 - ap_ready (COR/TOW)
// others - reserved
// 0x10 : Data signal of a
// bit 7~0 - a[7:0] (Read/Write)
// others - reserved
// 0x14 : reserved
// 0x18 : Data signal of b
// bit 7~0 - b[7:0] (Read/Write)
// others - reserved
// 0x1c : Control signal of b
// bit 0 - b_ap_vld (Read/Write/SC)
// others - reserved
// 0x20 : Data signal of c_i
// bit 7~0 - c_i[7:0] (Read/Write)
// others - reserved
// 0x24 : reserved
// 0x28 : Data signal of c_o
// bit 7~0 - c_o[7:0] (Read)
// others - reserved
// 0x2c : Control signal of c_o
// bit 0 - c_o_ap_vld (Read/COR)
// others - reserved
// (SC = Self Clear, COR = Clear on Read, TOW = Toggle on Write, COH = Clear on Handshake)
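Host software typically reads the Control register at 0x00 and tests individual bits. The following sketch decodes the bit positions listed in the register map above; the `ControlStatus` struct and `decode_control` function are hypothetical names for illustration and are not part of the generated C driver files.

```cpp
#include <cassert>
#include <cstdint>

// Bit positions of the Control register (0x00), taken from the register map.
namespace ctrl {
constexpr std::uint32_t AP_START     = 1u << 0;
constexpr std::uint32_t AP_DONE      = 1u << 1;
constexpr std::uint32_t AP_IDLE      = 1u << 2;
constexpr std::uint32_t AP_READY     = 1u << 3;
constexpr std::uint32_t AUTO_RESTART = 1u << 7;
}  // namespace ctrl

// Hypothetical decoded view of one Control register read.
struct ControlStatus {
    bool start;
    bool done;
    bool idle;
    bool ready;
    bool auto_restart;
};

inline ControlStatus decode_control(std::uint32_t value) {
    ControlStatus s;
    s.start        = (value & ctrl::AP_START) != 0;
    s.done         = (value & ctrl::AP_DONE) != 0;
    s.idle         = (value & ctrl::AP_IDLE) != 0;
    s.ready        = (value & ctrl::AP_READY) != 0;
    s.auto_restart = (value & ctrl::AUTO_RESTART) != 0;
    return s;
}
```

Note that ap_done is listed as Clear on Read, so real host code should decode a single read of the register rather than reading it once per bit.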
S_AXILITE and Port-Level Protocols
Port-level I/O protocols apply to the ports grouped into an s_axilite adapter, as seen in S_AXILITE Example. In the Vivado IP flow, you can assign port-level I/O protocols to the individual ports and signals bundled into an s_axilite interface. In the Vitis kernel flow, changing the default port-level I/O protocols is not recommended unless necessary. The tool assigns a default port protocol to a port depending on the type and direction of the argument associated with it. The port can contain one or more of the following:
- Data signal for the argument
- Valid signal (ap_vld/ap_ovld) to indicate when the data can be read
- Acknowledge signal (ap_ack) to indicate when the data has been read
The default port protocol assignments for various argument types are as follows:
Argument Type | Default | Supported |
---|---|---|
Scalar | ap_none | ap_ack and ap_vld can also be used |
Pointers/References: Inputs | ap_none | ap_ack and ap_vld |
Pointers/References: Outputs | ap_vld | ap_none, ap_ack, and ap_ovld can also be used |
Pointers/References: Inouts | ap_ovld | ap_none, ap_ack, and ap_vld are also supported |
Arrays default to ap_memory. The bram port protocol is not supported for arrays in an s_axilite interface.
The S_AXILITE Example groups port b into the s_axilite interface and specifies port b as using the ap_vld protocol with INTERFACE pragmas. As a result, the s_axilite adapter contains a register for the port b data, and a register for the port b input valid signal.
If the input valid register is not set to logic 1, the data in the b data register is not considered valid, and the design stalls and waits for the valid register to be set. Each time port b is read, Vitis HLS automatically clears the input valid register and resets the register to logic 0.
S_AXILITE Bundle Rules
In the S_AXILITE Example, all the function arguments are grouped into a single s_axilite interface adapter specified by the bundle=BUS_A option in the INTERFACE pragma. The bundle option simply lets you group ports together into one interface.
In the Vitis kernel flow there should only be a single interface bundle, commonly named s_axi_control by the tool. You should therefore not specify the bundle option in that flow, or you will probably encounter an error during synthesis. However, in the Vivado IP flow you can specify multiple bundles using the s_axilite interface, and this creates a separate interface adapter for each bundle you have defined. The following example shows this:
void example(char *a, char *b, char *c)
{
#pragma HLS INTERFACE mode=s_axilite port=a bundle=BUS_A
#pragma HLS INTERFACE mode=s_axilite port=b bundle=BUS_A
#pragma HLS INTERFACE mode=s_axilite port=c bundle=OUT
#pragma HLS INTERFACE mode=s_axilite port=return bundle=BUS_A
#pragma HLS INTERFACE mode=ap_vld port=b
  *c += *a + *b;
}
After synthesis completes, the Synthesis Summary report provides feedback regarding the number of s_axilite adapters generated. The SW-to-HW Mapping section of the report contains the HW info showing the control register offset and the address range for each port.
However, there are some rules related to using bundles with the s_axilite interface.
- Default Bundle Names: This rule explicitly groups all
interface ports with no bundle name into the same AXI4-Lite interface port, uses the
tool default bundle name, and names the RTL port
s_axi_<default>
, typicallys_axi_control
.In this example all ports are mapped to the default bundle:void top(char *a, char *b, char *c) { #pragma HLS INTERFACE mode=s_axilite port=a #pragma HLS INTERFACE mode=s_axilite port=b #pragma HLS INTERFACE mode=s_axilite port=c *c += *a + *b; }
- User-Specified Bundle Names: This rule explicitly
groups all interface ports with the same
bundle
name into the same AXI4-Lite interface port, and names the RTL port the value specified bys_axi_<string>
.The following example results in interfaces nameds_axi_BUS_A
,s_axi_BUS_B
, ands_axi_OUT
:void example(char *a, char *b, char *c) { #pragma HLS INTERFACE mode=s_axilite port=a bundle=BUS_A #pragma HLS INTERFACE mode=s_axilite port=b bundle=BUS_B #pragma HLS INTERFACE mode=s_axilite port=c bundle=OUT #pragma HLS INTERFACE mode=s_axilite port=return bundle=OUT #pragma HLS INTERFACE mode=ap_vld port=b *c += *a + *b; }
- Partially Specified Bundle Names: If you specify
bundle names for some arguments, but leave other arguments unassigned, then the tool will bundle the arguments as follows:
- Group all ports into the specified bundles as indicated by the INTERFACE pragmas.
- Group any ports without bundle assignments into a default named bundle. The default name can either be the standard tool default, or an alternative default name if the tool default has already been specified by the user.
In the following example the user has specified bundle=control, which is the tool default name. In this case, port c will be assigned to s_axi_control as specified by the user, and the remaining ports will be bundled under s_axi_control_r, which is an alternative default name used by the tool.
void top(char *a, char *b, char *c)
{
#pragma HLS INTERFACE mode=s_axilite port=a
#pragma HLS INTERFACE mode=s_axilite port=b
#pragma HLS INTERFACE mode=s_axilite port=c bundle=control
}
S_AXILITE Offset Option
The Vitis kernel flow determines the required offsets for the interface, so do not specify the offset
option in that flow. In the Vivado IP flow,
Vitis HLS defines the size, or
range of addresses assigned to a port in the S_AXILITE Control Register Map depending on the argument data type and
the port protocol used. However, the INTERFACE pragma also contains an
offset
option that lets you specify
the address offset in the AXI4-Lite
interface.
When specifying the offset for your argument, you must
consider the size of your data and reserve some extra for the port control
protocol. The range of addresses you reserve should be based on a 32-bit
word. You should reserve enough 32-bit words to fit your argument data type,
and reserve one additional word for the control protocol, even for
ap_none.
When using the ap_memory protocol for arrays,
you do not need to reserve the extra word for the control protocol. In this
case, simply reserve enough 32-bit words to fit your argument data type.
For example, to reserve enough space for a double you need to reserve two 32-bit words for the 64-bit data type, and then reserve an additional 32-bit word for the control protocol. So you need to reserve a total of three 32-bit words, or 96 bits (12 bytes). If your argument offset starts at 0x020, then the next available offset would begin at 0x02c, in order to reserve the required address range for your argument.
If you make a mistake in setting the offset of your arguments, by not reserving enough address range to fit your data type and the control protocol, Vitis HLS will recognize the error, will warn you of the issue, and will recover by moving your misplaced argument register to the end of the Control Register Map. This will allow your build to proceed, but may not work with your host application or driver if they were written to your specified offset.
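The address arithmetic described above can be checked with a small calculation. The following is a minimal sketch, not part of the tool or its drivers: the helper names data_words and next_offset are hypothetical, and the ctrl_word flag simply encodes the rule that one extra 32-bit word is reserved for the control protocol except when ap_memory is used for arrays.

```c
#include <stdint.h>

/* Number of 32-bit words needed to hold data_bits bits of argument data. */
uint32_t data_words(uint32_t data_bits)
{
    return (data_bits + 31u) / 32u;
}

/*
 * Hypothetical helper: byte offset of the first free address after an
 * argument that starts at offset. Pass ctrl_word = 1 to reserve one extra
 * 32-bit word for the port control protocol, or 0 for ap_memory arrays.
 */
uint32_t next_offset(uint32_t offset, uint32_t data_bits, int ctrl_word)
{
    uint32_t words = data_words(data_bits) + (ctrl_word ? 1u : 0u);
    return offset + 4u * words;   /* each word occupies 4 byte addresses */
}
```

For the double in the example above, next_offset(0x020, 64, 1) gives 0x02c. An 8-bit argument at 0x10 gives 0x18.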
C Driver Files
When an AXI4-Lite slave interface is implemented, a set of C driver files is automatically created. These C driver files provide a set of APIs that can be integrated into any software running on a CPU and used to communicate with the device via the AXI4-Lite slave interface.
The C driver files are created when the design is packaged as IP in the IP catalog.
Driver files are created for standalone and Linux modes. In standalone mode the drivers are used in the same way as any other Xilinx standalone drivers. In Linux mode, copy all the C files (.c) and header files (.h) into the software project.
The driver files and API functions derive their name from the top-level function
for synthesis. In the above example, the top-level function is called “example”. If
the top-level function was named “DUT” the name “example” would be replaced by “DUT”
in the following description. The driver files are created in the packaged IP
(located in the impl
directory inside the
solution).
File Path | Usage Mode | Description |
---|---|---|
data/example.mdd | Standalone | Driver definition file. |
data/example.tcl | Standalone | Used by SDK to integrate the software into an SDK project. |
src/xexample_hw.h | Both | Defines address offsets for all internal registers. |
src/xexample.h | Both | API definitions |
src/xexample.c | Both | Standard API implementations |
src/xexample_sinit.c | Standalone | Initialization API implementations |
src/xexample_linux.c | Linux | Initialization API implementations |
src/Makefile | Standalone | Makefile |
In file xexample.h, two structs are defined.
- XExample_Config
- This is used to hold the configuration information (base address of each AXI4-Lite slave interface) of the IP instance.
- XExample
- This is used to hold the IP instance pointer. Most APIs take this instance pointer as the first argument.
The standard API implementations are provided in files xexample.c, xexample_sinit.c, xexample_linux.c, and provide functions to perform the following operations.
- Initialize the device
- Control the device and query its status
- Read/write to the registers
- Set up, monitor, and control the interrupts
Refer to Vitis HLS C Driver Reference for a description of the API functions provided in the C driver files.
C Driver Files and Float Types
C driver files always use a 32-bit unsigned integer (U32) for data
transfers. In the following example, the function uses float type arguments
a and r1. It sets the value of a and returns the value of r1:
float caculate(float a, float *r1)
{
#pragma HLS INTERFACE mode=ap_vld register port=r1
#pragma HLS INTERFACE mode=s_axilite port=a
#pragma HLS INTERFACE mode=s_axilite port=r1
#pragma HLS INTERFACE mode=s_axilite port=return
*r1 = 0.5f*a;
return (a>0);
}
After synthesis, Vitis HLS groups all ports into the default AXI4-Lite interface and creates C driver files. However, as shown in the following example, the driver files use type U32:
// API to set the value of A
void XCaculate_SetA(XCaculate *InstancePtr, u32 Data) {
Xil_AssertVoid(InstancePtr != NULL);
Xil_AssertVoid(InstancePtr->IsReady == XIL_COMPONENT_IS_READY);
XCaculate_WriteReg(InstancePtr->Hls_periph_bus_BaseAddress,
XCACULATE_HLS_PERIPH_BUS_ADDR_A_DATA, Data);
}
// API to get the value of R1
u32 XCaculate_GetR1(XCaculate *InstancePtr) {
u32 Data;
Xil_AssertNonvoid(InstancePtr != NULL);
Xil_AssertNonvoid(InstancePtr->IsReady == XIL_COMPONENT_IS_READY);
Data = XCaculate_ReadReg(InstancePtr->Hls_periph_bus_BaseAddress,
XCACULATE_HLS_PERIPH_BUS_ADDR_R1_DATA);
return Data;
}
If these functions work directly with float types, the write and read values are not consistent with expected float type. When using these functions in software, you can use the following casts in the code:
float a=3.0f,r1;
u32 ua,ur1;
// cast float “a” to type U32
XCaculate_SetA(&caculate,*((u32*)&a));
ur1=XCaculate_GetR1(&caculate);
// cast return type U32 to float type for “r1”
r1=*((float*)&ur1);
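The pointer casts above reinterpret the bits of a float as a 32-bit integer. As an alternative, the same reinterpretation can be written with memcpy, which avoids type-punning through pointers. The following is a minimal sketch: the helper names float_to_u32 and u32_to_float are hypothetical and are not part of the generated driver, and the code assumes a 32-bit IEEE-754 float.

```c
#include <stdint.h>
#include <string.h>

/* Copy the bit pattern of a float into a 32-bit unsigned integer. */
uint32_t float_to_u32(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);   /* assumes sizeof(float) == 4 */
    return bits;
}

/* Copy a 32-bit pattern read from a register back into a float. */
float u32_to_float(uint32_t bits)
{
    float value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

With these helpers, the value passed to XCaculate_SetA would be float_to_u32(a), and the value returned by XCaculate_GetR1 would be converted with u32_to_float.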
Controlling Hardware
The following description assumes the ap_ctrl_hs block
control protocol, which is the default for the Vivado IP flow. Refer to Block-Level Control Protocols for more information and a description
of the ap_ctrl_chain protocol which is the
default for the Vitis kernel flow. In this example, the hardware header file xexample_hw.h
provides a complete list of the memory
mapped locations for the ports grouped into the AXI4-Lite slave interface, as described in S_AXILITE Control Register Map.
// 0x00 : Control signals
// bit 0 - ap_start (Read/Write/SC)
// bit 1 - ap_done (Read/COR)
// bit 2 - ap_idle (Read)
// bit 3 - ap_ready (Read)
// bit 7 - auto_restart (Read/Write)
// others - reserved
// 0x04 : Global Interrupt Enable Register
// bit 0 - Global Interrupt Enable (Read/Write)
// others - reserved
// 0x08 : IP Interrupt Enable Register (Read/Write)
// bit 0 - Channel 0 (ap_done)
// bit 1 - Channel 1 (ap_ready)
// 0x0c : IP Interrupt Status Register (Read/TOW)
// bit 0 - Channel 0 (ap_done)
// others - reserved
// 0x10 : Data signal of a
// bit 7~0 - a[7:0] (Read/Write)
// others - reserved
// 0x14 : reserved
// 0x18 : Data signal of b
// bit 7~0 - b[7:0] (Read/Write)
// others - reserved
// 0x1c : reserved
// 0x20 : Data signal of c_i
// bit 7~0 - c_i[7:0] (Read/Write)
// others - reserved
// 0x24 : reserved
// 0x28 : Data signal of c_o
// bit 7~0 - c_o[7:0] (Read)
// others - reserved
// 0x2c : Control signal of c_o
// bit 0 - c_o_ap_vld (Read/COR)
// others - reserved
// (SC = Self Clear, COR = Clear on Read, TOW = Toggle on Write, COH = Clear on Handshake)
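The control bits at offset 0x00 in the register map above can be expressed as masks. The following is a minimal sketch, not the contents of the generated xexample_hw.h: the macro and function names are hypothetical, and only the bit positions are taken from the map.

```c
#include <stdint.h>

/* Bit positions of the control register at offset 0x00, from the map above. */
#define CTRL_AP_START      (1u << 0)
#define CTRL_AP_DONE       (1u << 1)
#define CTRL_AP_IDLE       (1u << 2)
#define CTRL_AP_READY      (1u << 3)
#define CTRL_AUTO_RESTART  (1u << 7)

/* Hypothetical helpers that decode a value read from offset 0x00. */
int ctrl_is_done(uint32_t ctrl)  { return (ctrl & CTRL_AP_DONE) != 0; }
int ctrl_is_idle(uint32_t ctrl)  { return (ctrl & CTRL_AP_IDLE) != 0; }
int ctrl_is_ready(uint32_t ctrl) { return (ctrl & CTRL_AP_READY) != 0; }

/* Value to write to offset 0x00 to start the block, optionally with auto-restart. */
uint32_t ctrl_start_value(int auto_restart)
{
    return CTRL_AP_START | (auto_restart ? CTRL_AUTO_RESTART : 0u);
}
```

Because ap_done is marked clear-on-read in the map, software that polls this register should decode every flag it needs from a single read.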
To correctly program the registers in the s_axilite
interface, you must understand how the hardware
ports operate with the default port protocols, or the custom protocols as
described in S_AXILITE and Port-Level Protocols.
For example, to start the block operation the ap_start
register must be set to 1. The
device will then proceed and read any inputs grouped into the AXI4-Lite slave interface from the
registers in the interface. When the block completes operation, the ap_done, ap_idle
and ap_ready
registers will be set by the hardware output ports, and the results for any
output ports grouped into the AXI4-Lite
slave interface can be read from the appropriate registers.
The implementation of function argument c
in the example highlights the importance of
understanding how the hardware ports operate. Function argument c
is both read and written to, and is
therefore implemented as separate input and output ports c_i
and c_o, as explained in S_AXILITE Example.
The first recommended flow for programming the s_axilite
interface is for a one-time
execution of the function:
- Use the interrupt function standard API implementations provided in the C Driver Files to determine how you want the interrupt to operate.
- Load the register values for the block input ports. In the above example this is performed using API functions XExample_Set_a, XExample_Set_b, and XExample_Set_c_i.
- Set the ap_start bit to 1 using XExample_Start to start executing the function. This register is self-clearing as noted in the header file above. After one transaction, the block will suspend operation.
- Allow the function to execute. Address any interrupts which are generated.
- Read the output registers. In the above example this is performed using API functions XExample_Get_c_o_vld, to confirm the data is valid, and XExample_Get_c_o.
Note: The registers in the s_axilite interface obey the same I/O protocol as the ports. In this case, the output valid is set to logic 1 to indicate if the data is valid.
- Repeat for the next transaction.
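The one-time execution flow can be illustrated without hardware by modeling the register map in software. The following is a minimal sketch under stated assumptions: the register file regs, the helpers reg_write and reg_read, and mock_hw_run (a software stand-in for the synthesized block) are all hypothetical, and only the offsets come from the register map shown earlier. A real application would use the generated XExample_* driver functions instead.

```c
#include <stdint.h>

/* Register offsets taken from the register map shown earlier. */
enum {
    REG_CTRL = 0x00, REG_A = 0x10, REG_B = 0x18,
    REG_C_I = 0x20, REG_C_O = 0x28, REG_C_O_VLD = 0x2c
};

static uint32_t regs[0x30 / 4];   /* mock register file, one entry per word */

static void reg_write(uint32_t offset, uint32_t value)
{
    regs[offset / 4] = value;
}

static uint32_t reg_read(uint32_t offset)
{
    uint32_t value = regs[offset / 4];
    if (offset == REG_C_O_VLD)
        regs[offset / 4] = 0;     /* c_o_ap_vld is clear-on-read */
    return value;
}

/* Software stand-in for the hardware: one transaction of *c += *a + *b. */
static void mock_hw_run(void)
{
    if ((regs[REG_CTRL / 4] & 1u) == 0)
        return;                   /* ap_start not set, nothing to do */
    regs[REG_C_O / 4] =
        (regs[REG_C_I / 4] + regs[REG_A / 4] + regs[REG_B / 4]) & 0xffu;
    regs[REG_C_O_VLD / 4] = 1u;
    regs[REG_CTRL / 4] = 0x2u | 0x4u;   /* ap_start self-clears; ap_done and ap_idle set */
}

/* One-time execution: load inputs, start, wait for valid, read the result. */
uint32_t run_once(uint32_t a, uint32_t b, uint32_t c_i)
{
    reg_write(REG_A, a);
    reg_write(REG_B, b);
    reg_write(REG_C_I, c_i);
    reg_write(REG_CTRL, 1u);      /* set ap_start */
    mock_hw_run();                /* real hardware would run on its own */
    while (reg_read(REG_C_O_VLD) == 0)
        ;                         /* poll until the output is valid */
    return reg_read(REG_C_O);
}
```

The sketch checks the valid register before reading the data register, which mirrors the order of the API calls in the final read step of the list above.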
The second recommended flow is for continuous execution of the block. In this mode, the input ports included in the AXI4-Lite interface should only be ports which perform configuration. The block will typically run much faster than a CPU. If the block must wait for inputs, the block will spend most of its time waiting:
- Use the interrupt function to determine how you wish the interrupt to operate.
- Load the register values for the block input ports. In the above example this is performed using API functions XExample_Set_a, XExample_Set_b, and XExample_Set_c_i.
- Set the auto-start function using API XExample_EnableAutoRestart.
- Allow the function to execute. The individual port I/O protocols will synchronize the data being processed through the block.
- Address any interrupts which are generated. The output registers could be accessed during this operation but the data may change often.
- Use the API function XExample_DisableAutoRestart to prevent any more executions.
- Read the output registers. In the above example this is performed using API functions XExample_Get_c_o and XExample_Get_c_o_vld.
Controlling Software
The API functions can be used in the software running on the CPU to control the hardware block. An overview of the process is:
- Create an instance of the hardware
- Look Up the device configuration
- Initialize the device
- Set the input parameters of the HLS block
- Start the device and read the results
An example application is shown below.
#include "xexample.h" // Device driver for HLS HW block
#include "xparameters.h"
// HLS HW instance
XExample HlsExample;
XExample_Config *ExamplePtr;
int main() {
    int res_hw;
    int status;
    // Look Up the device configuration
    ExamplePtr = XExample_LookupConfig(XPAR_XEXAMPLE_0_DEVICE_ID);
    if (!ExamplePtr) {
        print("ERROR: Lookup of accelerator configuration failed.\n\r");
        return XST_FAILURE;
    }
    // Initialize the Device
    status = XExample_CfgInitialize(&HlsExample, ExamplePtr);
    if (status != XST_SUCCESS) {
        print("ERROR: Could not initialize accelerator.\n\r");
        return XST_FAILURE;
    }
    // Set the input parameters of the HLS block
    XExample_Set_a(&HlsExample, 42);
    XExample_Set_b(&HlsExample, 12);
    XExample_Set_c_i(&HlsExample, 1);
    // Start the device and read the results
    XExample_Start(&HlsExample);
    // Wait for valid data output before reading the result
    while (XExample_Get_c_o_vld(&HlsExample) == 0)
        ;
    res_hw = XExample_Get_c_o(&HlsExample);
    print("Detected HLS peripheral complete. Result received.\n\r");
    return 0;
}
Control Clock and Reset in AXI4-Lite Interfaces
By default, Vitis HLS uses the same
clock for the AXI4-Lite
interface and the synthesized design. Vitis HLS connects all registers in the AXI4-Lite interface to the clock used for
the synthesized logic (ap_clk
).
Optionally, you can use the INTERFACE directive clock
option to specify a
separate clock for each AXI4-Lite port. When connecting the clock
to the AXI4-Lite interface,
you must use the following protocols:
- AXI4-Lite interface clock must be synchronous to the clock used for the synthesized logic (ap_clk). That is, both clocks must be derived from the same master generator clock.
- AXI4-Lite interface clock frequency must be equal to or less than the frequency of the clock used for the synthesized logic (ap_clk).
If you use the clock
option with the INTERFACE directive, you only need to specify
the clock
option on one
function argument in each bundle. Vitis HLS implements all other function
arguments in the bundle with the same clock and reset. Vitis HLS names the
generated reset signal with the prefix ap_rst_
followed by the clock name. The
generated reset signal is active-Low independent of the config_rtl
command.
The following example shows how Vitis HLS groups function arguments
a
and b
into an AXI4-Lite port with a
clock named AXI_clk1
and an
associated reset port.
// Default AXI-Lite interface implemented with independent clock called AXI_clk1
#pragma HLS interface mode=s_axilite port=a clock=AXI_clk1
#pragma HLS interface mode=s_axilite port=b
In the following example, Vitis HLS groups function arguments
c
and d
into AXI4-Lite port CTRL1
with a separate clock
called AXI_clk2
and an
associated reset port.
// CTRL1 AXI-Lite bundle implemented with a separate clock (called AXI_clk2)
#pragma HLS interface mode=s_axilite port=c bundle=CTRL1 clock=AXI_clk2
#pragma HLS interface mode=s_axilite port=d bundle=CTRL1
Customizing AXI4-Lite Slave Interfaces in IP Integrator
When an HLS RTL design using an AXI4-Lite slave interface is incorporated into a design in Vivado IP integrator, you can customize the block. From the block diagram in IP integrator, select the HLS block, right-click with the mouse button and select Customize Block.
The address width is by default configured to the minimum required size. Modify this to connect to blocks with address sizes less than 32-bit.
AXI4-Stream Interfaces
An AXI4-Stream interface can be applied to any input argument and any array or pointer output argument. Because an AXI4-Stream interface transfers data in a sequential streaming manner, it cannot be used with arguments that are both read and written. In terms of data layout, the data type of the AXI4-Stream is aligned to the next byte. For example, if the size of the data type is 12 bits, it will be extended to 16 bits. Depending on whether a signed/unsigned interface is selected, the extended bits are either sign-extended or zero-extended. If the stream data type is a user-defined struct, the struct is aggregated and aligned to the size of the largest data element within the struct.
The following code examples show how the packed alignment depends on your struct type. If the struct contains only char type, as shown in the following example, then it will be packed with alignment of one byte. Total size of the struct will be two bytes:
struct A {
char foo;
char bar;
};
However, if the struct has elements with different data types, as shown
below, then it will be packed and aligned to the size of the largest data element, or four
bytes in this example. Element bar
will be padded with three
bytes resulting in a total size of eight bytes for the struct:
struct A {
int foo;
char bar;
};
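The two alignment rules described above, rounding a data width up to the next byte and padding a struct to a multiple of its largest element, can be written as simple arithmetic. The following is a minimal sketch: aligned_bits and padded_size are hypothetical helper names, and padded_size is a simplification that only models the tail padding described in the text.

```c
#include <stddef.h>

/* Round a bit width up to the next whole byte, for example 12 bits to 16 bits. */
unsigned aligned_bits(unsigned bits)
{
    return ((bits + 7u) / 8u) * 8u;
}

/*
 * Hypothetical helper: total size in bytes after padding raw_bytes of
 * member data up to a multiple of the largest member size.
 */
size_t padded_size(size_t raw_bytes, size_t largest_member_bytes)
{
    size_t rem = raw_bytes % largest_member_bytes;
    return rem ? raw_bytes + (largest_member_bytes - rem) : raw_bytes;
}
```

For the struct with an int and a char, padded_size(5, 4) returns 8, which matches the three padding bytes described above. For the struct with two char members, padded_size(2, 1) returns 2.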
How AXI4-Stream is Implemented
The AXI4-Stream interface is implemented as a struct type in Vitis HLS and has the following signature (defined in ap_axi_sdata.h):
template <typename T, size_t WUser, size_t WId, size_t WDest> struct axis { .. };
Where:
T
- Stream data type
WUser
- Width of the TUSER signal
WId
- Width of the TID signal
WDest
- Width of the TDEST signal
When the stream data type (T) is a simple
integer type, there are two predefined types of AXI4-Stream implementations available:
- A signed implementation of the AXI4-Stream class, hls::axis<ap_int<WData>, WUser, WId, WDest> (or more simply ap_axis<WData, WUser, WId, WDest>)
- An unsigned implementation of the AXI4-Stream class, hls::axis<ap_uint<WData>, WUser, WId, WDest> (or more simply ap_axiu<WData, WUser, WId, WDest>)
The value specified for the WUser
, WId
, and WDest
template
parameters controls the usage of side-channel signals in the AXI4-Stream interface.
When the hls::axis
class is used, the
generated RTL will typically contain the actual data signal TDATA, and
the following additional signals: TVALID, TREADY,
TKEEP, TSTRB, TLAST,
TUSER, TID, and TDEST.
TVALID, TREADY, and TLAST are necessary control signals for the AXI4-Stream protocol. TKEEP, TSTRB, TUSER, TID, and TDEST signals are special signals that can be used to pass around additional bookkeeping data.
If WUser, WId, and WDest
are set to 0, the generated RTL will not include the
TUSER, TID, and TDEST signals in
the interface.
How AXI4-Stream Works
AXI4-Stream is a protocol designed for transporting arbitrary unidirectional data. In an AXI4-Stream, one TDATA-wide word is transferred per clock cycle. The transfer is started once the producer sends the TVALID signal and the consumer responds by sending the TREADY signal (once it has consumed the initial TDATA). At this point, the producer will start sending TDATA and TLAST (and TUSER if needed to carry additional user-defined sideband data). TLAST signals the last transfer of the stream, so the consumer keeps consuming the incoming TDATA until TLAST is asserted.
AXI4-Stream has additional optional features like sending positional data with TKEEP and TSTRB ports which makes it possible to multiplex both the data position and data itself on the TDATA signal. Using the TID and TDEST signals, you can route streams, as these fields roughly correspond to stream identifier and stream destination identifier. Refer to Vivado Design Suite: AXI Reference Guide (UG1037) or the AMBA AXI4-Stream Protocol Specification (ARM IHI 0051A) for more information.
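The handshake described above can be modeled cycle by cycle in software. The following is a minimal sketch, not an HLS design: struct beat and consume are hypothetical names, the producer is assumed to hold TVALID high while it has data, and the consumer's TREADY pattern is supplied as an array. A beat transfers only in a cycle where both TVALID and TREADY are high, and the consumer stops at TLAST.

```c
#include <stddef.h>
#include <stdint.h>

/* One beat on the stream: payload plus the TLAST marker. */
struct beat {
    uint32_t tdata;
    int      tlast;
};

/*
 * Model of a consumer draining a stream. tready[cycle] is the consumer's
 * TREADY in that cycle. Returns the number of beats received up to and
 * including the beat that carries TLAST, and stores their sum in *sum.
 */
size_t consume(const struct beat *beats, size_t n_beats,
               const int *tready, size_t n_cycles, uint32_t *sum)
{
    size_t sent = 0;
    *sum = 0;
    for (size_t cycle = 0; cycle < n_cycles && sent < n_beats; cycle++) {
        int tvalid = 1;                   /* producer has data pending */
        if (tvalid && tready[cycle]) {    /* transfer happens this cycle */
            *sum += beats[sent].tdata;
            if (beats[sent++].tlast)
                break;                    /* TLAST ends the stream */
        }
    }
    return sent;
}
```

With three beats and a TREADY pattern of 1, 0, 1, 0, 1, the three transfers take five cycles because the producer must hold each beat while TREADY is low.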
Registered AXI4-Stream Interfaces
As a default, AXI4-Stream interfaces are always implemented as registered interfaces to ensure that no combinational feedback paths are created when multiple HLS IP blocks with AXI4-Stream interfaces are integrated into a larger design. For AXI4-Stream interfaces, four types of register modes are provided to control how the interface registers are implemented:
- Forward
- Only the TDATA and TVALID signals are registered.
- Reverse
- Only the TREADY signal is registered.
- Both
- All signals (TDATA, TREADY, and TVALID) are registered. This is the default.
- Off
- None of the port signals are registered.
The AXI4-Stream side-channel signals are considered to be data signals and are registered whenever TDATA is registered.
There are two basic methods to use an AXI4-Stream in your design:
- Use an AXI4-Stream without side-channels.
- Use an AXI4-Stream with side-channels.
This second use model provides additional functionality, allowing the optional side-channels which are part of the AXI4-Stream standard, to be used directly in your C/C++ code.
AXI4-Stream Interfaces without Side-Channels
An AXI4-Stream is used without
side-channels when the function argument, ap_axis
or
ap_axiu
data type, does not contain any AXI4 side-channel elements (that is, when the WUser
, WId
, and WDest
parameters are set to 0). In the following example,
both interfaces are implemented using an AXI4-Stream:
#include "ap_axi_sdata.h"
#include "hls_stream.h"
typedef ap_axiu<32, 0, 0, 0> trans_pkt;
void example(hls::stream< trans_pkt > &A, hls::stream< trans_pkt > &B)
{
#pragma HLS INTERFACE mode=axis port=A
#pragma HLS INTERFACE mode=axis port=B
trans_pkt tmp;
A.read(tmp);
tmp.data += 5;
B.write(tmp);
}
After synthesis, both arguments are implemented with a data port (TDATA) and the standard AXI4-Stream protocol ports, TVALID, TREADY, TKEEP, TLAST, and TSTRB, as shown in the following figure.
If you specify an hls::stream
object with a data type other than ap_axis
or ap_axiu, the
tool will infer an AXI4-Stream interface without the
TLAST signal, or any of the side-channel signals.
This implementation of the AXI4-Stream interface
consumes fewer device resources, but offers no visibility into when the stream is
ending.
Multiple variables can be combined into the same AXI4-Stream interface by using a struct, which is aggregated by Vitis HLS by default. Aggregating the elements of a struct into a single wide-vector allows all elements of the struct to be implemented in the same AXI4-Stream interface.
AXI4-Stream Interfaces with Side-Channels
The following example shows how the side-channels can be used directly in the C/C++ code and implemented on the interface. The code uses #include "ap_axi_sdata.h" to provide an API to handle the side-channels of the AXI4-Stream interface. In the following example an unsigned 32-bit data type (ap_axiu) is used:
#include "ap_axi_sdata.h"
#include "ap_int.h"
#include "hls_stream.h"
#define DWIDTH 32
typedef ap_axiu<DWIDTH, 1, 1, 1> trans_pkt;
extern "C"{
void krnl_stream_vmult(hls::stream<trans_pkt> &A,
hls::stream<trans_pkt> &B) {
#pragma HLS INTERFACE mode=axis port=A
#pragma HLS INTERFACE mode=axis port=B
#pragma HLS INTERFACE mode=s_axilite port=return bundle=control
bool eos = false;
vmult: do {
#pragma HLS PIPELINE II=1
trans_pkt t2 = A.read();
// Packet for Output
trans_pkt t_out;
// Reading data from input packet
ap_uint<DWIDTH> in2 = t2.data;
ap_uint<DWIDTH> tmpOut = in2 * 5;
// Setting data and configuration to output packet
t_out.data = tmpOut;
t_out.last = t2.last;
t_out.keep = -1; //Enabling all bytes
// Writing packet to output stream
B.write(t_out);
if (t2.last) {
eos = true;
}
} while (eos == false);
}
}
After synthesis, both the A and B arguments are implemented with data ports, the standard AXI4-Stream protocol ports, TVALID and TREADY and all of the optional ports described in the struct.
Coding Style for Array to Stream
You should perform all the operations on temp variables. Read the input stream, process the temp variable, and write the output stream, as shown in the example below. This approach lets you preserve the sequential reading and writing of the stream of data, rather than attempting multiple or random reads or writes.
struct A {
short varA;
int varB;
};
void dut(A in[N], A out[N], bool flag) {
#pragma HLS interface mode=axis port=in,out
for (unsigned i=0; i<N; i++) {
A tmp = in[i];
if (flag)
tmp.varB += 5;
out[i] = tmp;
}
}
If this coding style is not adhered to, it will lead to functional failures of the stream processing.
Port-Level I/O Protocols
By default, input pointers and pass-by-value arguments are implemented as
simple wire ports with no associated handshaking signal. For example, in the vadd
function discussed in Interfaces for Vivado IP Flow, the input ports are implemented without an I/O
protocol, only a data port. If the port has no I/O protocol (by default or by design),
the input data must be held stable until it is read.
In the vadd
function example, the output port is implemented with an associated
output valid port (out_r_o_ap_vld) which indicates
when the data on the port is valid and can be read. If there is no I/O protocol
associated with the output port, it is difficult to know when to read the data.
Function arguments which are both read from and written to are split into
separate input and output ports. In the vadd function
example, the out_r argument is implemented as both an
input port out_r_i, and an output port out_r_o
with associated I/O protocol port out_r_o_ap_vld.
If the function has a return value, an output port ap_return is implemented to provide the return value. When the RTL design completes one transaction, which is equivalent to one execution of the C/C++ function, the block-level protocols indicate the function is complete with the ap_done signal. This also indicates the data on port ap_return is valid and can be read.
For the example code shown the timing behavior is shown in the following figure (assuming that the target technology and clock frequency allow a single addition per clock cycle).
- The design starts when ap_start is asserted High.
- The ap_idle signal is asserted Low to indicate the design is operating.
- The input data is read at any clock after the first cycle. Vitis HLS schedules when the reads occur. The ap_ready signal is asserted High when all inputs have been read.
- When output sum is calculated, the associated output handshake (sum_o_ap_vld) indicates that the data is valid.
- When the function completes, ap_done is asserted. This also indicates that the data on ap_return is valid.
- Port ap_idle is asserted High to indicate that the design is waiting to start again.
Port-Level I/O: No Protocol
The ap_none mode
specifies that no I/O protocol
be added to the port. When this is specified the argument is implemented as a data port with
no other associated signals. The ap_none
mode is the default
for scalar inputs.
ap_none
The ap_none
port-level I/O protocol is
the simplest interface type and has no other signals associated with it. Neither the input
nor output data signals have associated control ports that indicate when data is read or
written. The only ports in the RTL design are those specified in the source code.
An ap_none interface does not require
additional hardware overhead. However, the ap_none interface
does require the following:
- Producer blocks to do one of the following:
- Provide data to the input port at the correct time
- Hold data for the length of a transaction until the design completes
- Consumer blocks to read output ports at the correct time
The ap_none interface cannot be used with array arguments.
Port-Level I/O: Wire Handshakes
Interface mode ap_hs includes a two-way handshake signal with the data port. The handshake is an industry standard valid and acknowledge handshake. Mode ap_vld is the same but only has a valid port, and ap_ack only has an acknowledge port.
Mode ap_ovld is for use with in-out arguments. When the in-out is split into separate input and output ports, mode ap_none is applied to the input port and ap_vld applied to the output port. This is the default for pointer arguments that are both read and written.
The ap_hs mode can be applied to arrays that are read or written in sequential order. If Vitis HLS can determine the read or write accesses are not sequential, it will halt synthesis with an error. If the access order cannot be determined, Vitis HLS will issue a warning.
ap_hs (ap_ack, ap_vld, and ap_ovld)
The ap_hs port-level I/O protocol provides the greatest flexibility in the development process, allowing both bottom-up and top-down design flows. Two-way handshakes safely perform all intra-block communication, and manual intervention or assumptions are not required for correct operation. The ap_hs port-level I/O protocol provides the following signals:
- Data port
- Valid signal to indicate when the data signal is valid and can be read
- Acknowledge signal to indicate when the data has been read
The following figure shows how an ap_hs interface behaves for both an input and output port. In this example, the input port is named in, and the output port is named out.
For inputs, the following occurs:
- After start is applied, the block begins normal operation.
- If the design is ready for input data but the input valid is Low, the design stalls and waits for the input valid to be asserted to indicate a new input value is present.
Note: The preceding figure shows this behavior. In this example, the design is ready to read data input in on clock cycle 4 and stalls waiting for the input valid before reading the data.
- When the input valid is asserted High, an output acknowledge is asserted High to indicate the data was read.
For outputs, the following occurs:
- After start is applied, the block begins normal operation.
- When an output port is written to, its associated output valid signal is simultaneously asserted to indicate valid data is present on the port.
- If the associated input acknowledge is Low, the design stalls and waits for the input acknowledge to be asserted.
- When the input acknowledge is asserted, indicating the data has been read, the output valid is deasserted on the next clock edge.
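The output-side behavior in the list above can be expressed as a small cycle count. The following is a minimal sketch of that behavior, not generated RTL: valid_cycles is a hypothetical helper, and ack[i] models the input acknowledge in cycle i, counted from the cycle in which the design writes the port.

```c
#include <stddef.h>

/*
 * Model of one ap_hs output write. The design asserts valid in the cycle
 * it writes the port and stalls while the acknowledge stays Low.
 * Returns the number of cycles valid is held High, or 0 if no
 * acknowledge arrives within n_cycles.
 */
size_t valid_cycles(const int *ack, size_t n_cycles)
{
    for (size_t i = 0; i < n_cycles; i++) {
        if (ack[i])
            return i + 1;   /* valid is deasserted on the next clock edge */
    }
    return 0;               /* still stalled at the end of the window */
}
```

An acknowledge in the write cycle itself gives a single cycle of valid. Each cycle the consumer delays its acknowledge adds one stall cycle to the design.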
ap_ack
The ap_ack port-level I/O protocol is a subset of the ap_hs interface type. The ap_ack port-level I/O protocol provides the following signals:
- Data port
- Acknowledge signal to indicate when data is consumed
- For input arguments, the design generates an output acknowledge that is active-High in the cycle the input is read.
- For output arguments, Vitis HLS implements an input acknowledge port to confirm the output was read.
Note: After a write operation, the design stalls and waits until the input acknowledge is asserted High, which indicates the output was read by a consumer block. However, there is no associated output port to indicate when the data can be consumed.
ap_vld
The ap_vld is a subset of the ap_hs interface type. The ap_vld port-level I/O protocol provides the following signals:
- Data port
- Valid signal to indicate when the data signal is valid and can
be read
- For input arguments, the design reads the data port as soon as the valid is active. Even if the design is not ready to read new data, the design samples the data port and holds the data internally until needed.
- For output arguments, Vitis HLS implements an output valid port to indicate when the data on the output port is valid.
ap_ovld
The ap_ovld is a subset of the ap_hs interface type. The ap_ovld port-level I/O protocol provides the following signals:
- Data port
- Valid signal to indicate when the data signal is valid and can
be read
- For input arguments and the input half of inout arguments, the design defaults to type ap_none.
- For output arguments and the output half of inout arguments, the design implements type ap_vld.
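The relationship between the wire handshake modes can be summarized by which control signals accompany the data port. The following is a minimal sketch that restates the descriptions above as lookup functions. The enum and function names are hypothetical and do not correspond to any Vitis HLS API.

```c
/* Port-level modes discussed in this section. */
enum port_mode { AP_NONE, AP_VLD, AP_ACK, AP_HS, AP_OVLD };

/*
 * Nonzero when a port in the given mode carries a valid signal.
 * is_output selects the direction, which only matters for ap_ovld.
 */
int has_valid(enum port_mode mode, int is_output)
{
    switch (mode) {
    case AP_VLD:
    case AP_HS:
        return 1;
    case AP_OVLD:
        return is_output;   /* input side defaults to ap_none */
    default:
        return 0;
    }
}

/* Nonzero when a port in the given mode carries an acknowledge signal. */
int has_ack(enum port_mode mode)
{
    return mode == AP_ACK || mode == AP_HS;
}
```

Only ap_hs carries both signals, which is why the text describes ap_ack, ap_vld, and ap_ovld as subsets of it.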
Port-Level I/O: Memory Interface Protocol
Array arguments are implemented by default as an ap_memory interface. This is a standard block RAM interface with data, address, chip-enable, and write-enable ports.
An ap_memory interface can be implemented as a single-port or dual-port interface. If Vitis HLS can determine that using a dual-port interface will reduce the initiation interval, it will automatically implement a dual-port interface. The BIND_STORAGE pragma or directive is used to specify the memory resource, and if this directive is specified on the array with a single-port block RAM, a single-port interface will be implemented. Conversely, if a dual-port interface is specified using the BIND_STORAGE pragma and Vitis HLS determines this interface provides no benefit, it will automatically implement a single-port interface.
If the array is accessed in a sequential manner an ap_fifo interface can be used. As with the ap_hs interface, Vitis HLS will halt if it determines the data access is not sequential, report a warning if it cannot determine if the access is sequential or issue no message if it determines the access is sequential. The ap_fifo interface can only be used for reading or writing, not both.
ap_memory, bram
The ap_memory and bram interface port-level I/O protocols are used to implement array arguments. This type of port-level I/O protocol can communicate with memory elements (for example, RAMs and ROMs) when the implementation requires random accesses to the memory address locations.
The ap_memory and bram interface port-level I/O protocols are identical. The only difference is the way Vivado IP integrator shows the blocks:
- The ap_memory interface appears as discrete ports.
- The bram interface appears as a single, grouped port. In IP integrator, you can use a single connection to create connections to all ports.
When using an ap_memory interface, specify the array targets using the BIND_STORAGE pragma. If no target is specified for the arrays, Vitis HLS determines whether to use a single or dual-port RAM interface.
The following figure shows an array named d specified as a single-port block RAM. The port names are based on the C/C++ function argument. For example, if the C/C++ argument is d, the chip-enable is d_ce, and the input data is d_q0 based on the output/q port of the BRAM.
After reset, the following occurs:
- After ap_start is applied, the block begins normal operation.
- Reads are performed by applying an address on the output address ports while asserting the output signal d_ce.
Note: For a default block RAM, the design expects the input data d_q0 to be available in the next clock cycle. You can use the BIND_STORAGE pragma to indicate the RAM has a longer read latency.
- Write operations are performed by asserting output ports d_ce and d_we while simultaneously applying the address and output data d_d0.
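A minimal sketch of an array argument mapped to a single-port block RAM interface might look like the following. The function name increment_all and the depth N are hypothetical, and the pragma option values (mode=ap_memory, type=ram_1p, impl=bram) are assumptions about the pragma syntax that should be verified against your tool release.

```cpp
constexpr int N = 16;  // illustrative array depth, not from the original text

// Hypothetical top-level function with an array argument named d.
void increment_all(int d[N]) {
// Array arguments default to ap_memory; the pragma only makes this explicit.
#pragma HLS INTERFACE mode=ap_memory port=d
// Assumed option values: request a single-port block RAM for d, which is
// expected to yield ports such as d_ce, d_we, d_d0, and d_q0.
#pragma HLS BIND_STORAGE variable=d type=ram_1p impl=bram
    for (int i = 0; i < N; ++i) {
        d[i] = d[i] + 1;  // one read and one write per address
    }
}
```

Because each iteration both reads and writes the same array, this access pattern needs the random-access capability of ap_memory and would not suit an ap_fifo interface.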
ap_fifo
An ap_fifo interface is the most hardware-efficient approach when the design requires access to a memory element and the access is always performed in a sequential manner, that is, no random access is required. The ap_fifo port-level I/O protocol supports the following:
- Allows the port to be connected to a FIFO
- Enables complete, two-way empty-full communication
- Works for arrays, pointers, and pass-by-reference argument types
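Note: Functions that use an ap_fifo interface often use pointers and can access the same variable multiple times. For the importance of the volatile qualifier when using this coding style, see Multi-Access Pointers on the Interface.
In the following example, in1 is a pointer that accesses the current address, then two addresses above the current address, and finally one address below.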
void foo(int* in1, ...) {
  int data1, data2, data3;
  ...
  data1 = *in1;
  data2 = *(in1+2);
  data3 = *(in1-1);
  ...
}
If in1 is specified as an ap_fifo interface, Vitis HLS checks the accesses, determines the accesses are not in sequential order, issues an error, and halts. To read from non-sequential address locations, use an ap_memory or bram interface.
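By contrast, a function whose accesses are strictly in order is a candidate for ap_fifo. The following is a sketch under stated assumptions: the function name running_sum and the length LEN are hypothetical, and the mode= pragma spelling should be checked against your tool release.

```cpp
constexpr int LEN = 8;  // illustrative stream length

// Hypothetical top-level function: in is only read, out is only written,
// and each element is touched exactly once in ascending order.
void running_sum(const int in[LEN], int out[LEN]) {
#pragma HLS INTERFACE mode=ap_fifo port=in
#pragma HLS INTERFACE mode=ap_fifo port=out
    int acc = 0;
    for (int i = 0; i < LEN; ++i) {
        acc += in[i];   // sequential read, no address arithmetic
        out[i] = acc;   // sequential write, never read back
    }
}
```

The running total is kept in the local variable acc rather than read back from out, which keeps out a pure output and in a pure input.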
You cannot specify an ap_fifo interface on an argument that is both read from and written to. You can only specify an ap_fifo interface on an input or an output argument. A design with input argument in and output argument out specified as ap_fifo interfaces behaves as shown in the following figure.
For inputs, the following occurs:
- After ap_start is applied, the block begins normal operation.
- If the input port is ready to be read but the FIFO is empty as indicated by input port in_empty_n Low, the design stalls and waits for data to become available.
- When the FIFO contains data as indicated by input port in_empty_n High, an output acknowledge in_read is asserted High to indicate the data was read in this cycle.
For outputs, the following occurs:
- After ap_start is applied, the block begins normal operation.
- If an output port is ready to be written to but the FIFO is full as indicated by out_full_n Low, the data is placed on the output port but the design stalls and waits for the space to become available in the FIFO.
- When space becomes available in the FIFO as indicated by out_full_n High, the output acknowledge signal out_write is asserted to indicate the output data is valid.
- If the top-level function or the top-level loop is pipelined using the -rewind option, Vitis HLS creates an additional output port with the suffix _lwr. When the last write to the FIFO interface completes, the _lwr port goes active-High.
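The input-side handshake described above can be sketched as a small cycle-by-cycle software model. This is not generated RTL or a tool API. It is a simplified illustration that assumes the design wants to read on every cycle until it has consumed the requested number of words, and the names ReadTrace and simulate_fifo_reads are hypothetical.

```cpp
#include <vector>

// Result of the simplified model: the in_read value for each cycle and
// the number of cycles the design spent stalled on an empty FIFO.
struct ReadTrace {
    std::vector<bool> in_read;
    int stall_cycles = 0;
};

// in_empty_n holds one sample per clock cycle (true means the FIFO has data).
// reads_wanted is how many words the modeled design needs to consume.
ReadTrace simulate_fifo_reads(const std::vector<bool> &in_empty_n,
                              int reads_wanted) {
    ReadTrace trace;
    int reads_done = 0;
    for (bool has_data : in_empty_n) {
        const bool wants_read = reads_done < reads_wanted;
        // in_read is asserted only when the design wants data and the
        // FIFO is not empty (in_empty_n High).
        const bool read_now = wants_read && has_data;
        trace.in_read.push_back(read_now);
        if (read_now) {
            ++reads_done;
        } else if (wants_read) {
            ++trace.stall_cycles;  // in_empty_n Low: design stalls and waits
        }
    }
    return trace;
}
```

The output side follows the same pattern with out_full_n in place of in_empty_n and out_write in place of in_read.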
Block-Level Control Protocols
The block-level control protocol specifies the execution mode of the kernel or IP:
- Pipelined execution (ap_ctrl_chain) permitting overlapping kernel runs to begin processing additional data as soon as the kernel is ready.
- Sequential execution (ap_ctrl_hs) requiring the kernel to complete one run before beginning another.
- Data driven execution (ap_ctrl_none) which enables the kernel to run when data is available, and stall when data is not.
The ap_ctrl_hs block-level control protocol is the default for the Vivado IP flow. Interfaces for Vivado IP Flow shows the resulting RTL ports and behavior when Vitis HLS implements ap_ctrl_hs on a function.
The ap_ctrl_chain control protocol is the default for the Vitis kernel flow as explained in Interfaces for Vitis Kernel Flow. It is similar to ap_ctrl_hs but provides an additional input signal ap_continue to apply back pressure. Xilinx recommends using the ap_ctrl_chain block-level I/O protocol when chaining Vitis HLS blocks together.
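As an illustration, a block-level protocol is typically requested through an INTERFACE pragma on the function return. The sketch below assumes a hypothetical top-level function add_chain, and the mode= pragma spelling is an assumption to verify for your tool release.

```cpp
// Hypothetical top-level function; the name is illustrative only.
int add_chain(int a, int b) {
// Assumed pragma spelling: block-level protocols are attached to port=return.
// ap_ctrl_chain is expected to add ap_continue alongside ap_start, ap_ready,
// ap_done, and ap_idle.
#pragma HLS INTERFACE mode=ap_ctrl_chain port=return
    return a + b;  // the return value maps to the ap_return port
}
```

Replacing ap_ctrl_chain with ap_ctrl_hs in the same pragma would request the sequential protocol instead.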
ap_ctrl_hs
The following figure shows the behavior of the block-level handshake signals created by the ap_ctrl_hs control protocol for a non-pipelined design.
After reset, the following occurs:
- The block waits for ap_start to go High before it begins operation.
- Output ap_idle goes Low immediately to indicate the design is no longer idle.
- The ap_start signal must remain High until ap_ready goes High. Once ap_ready goes High:
  - If ap_start remains High, the design will start the next transaction.
  - If ap_start is taken Low, the design will complete the current transaction and halt operation.
- Data can be read on the input ports.
- Data can be written to the output ports.
Note: The input and output ports can also specify a port-level I/O protocol that is independent of the control protocol. For details, see Port-Level I/O Protocols.
- Output ap_done goes High when the block completes operation.
Note: If there is an ap_return port, the data on this port is valid when ap_done is High. Therefore, the ap_done signal also indicates when the data on output ap_return is valid.
- When the design is ready to accept new inputs, the ap_ready signal goes High. Following is additional information about the ap_ready signal:
  - The ap_ready signal is inactive until the design starts operation.
  - In non-pipelined designs, the ap_ready signal is asserted at the same time as ap_done.
  - In pipelined designs, the ap_ready signal might go High at any cycle after ap_start is sampled High. This depends on how the design is pipelined.
  - If the ap_start signal is Low when ap_ready is High, the design executes until ap_done is High and then stops operation.
  - If the ap_start signal is High when ap_ready is High, the next transaction starts immediately, and the design continues to operate.
- The ap_idle signal indicates when the design is idle and not operating. Following is additional information about the ap_idle signal:
  - If the ap_start signal is Low when ap_ready is High, the design stops operation, and the ap_idle signal goes High one cycle after ap_done.
  - If the ap_start signal is High when ap_ready is High, the design continues to operate, and the ap_idle signal remains Low.
ap_ctrl_chain
The ap_ctrl_chain control protocol is similar to the ap_ctrl_hs protocol but provides an additional input port named ap_continue. An active-High ap_continue signal indicates that the downstream block that consumes the output data is ready for new data inputs. If the downstream block is not able to consume new data inputs, the ap_continue signal is Low, which prevents upstream blocks from generating additional data.
The ap_ready port of the downstream block can directly drive the ap_continue port. Following is additional information about the ap_continue port:
- If the ap_continue signal is High when ap_done is High, the design continues operating. The behavior of the other block-level control signals is identical to those described in the ap_ctrl_hs block-level I/O protocol.
- If the ap_continue signal is Low when ap_done is High, the design stops operating, the ap_done signal remains High, and data remains valid on the ap_return port if the ap_return port is present.
In the following figure, the first transaction completes, and the second transaction starts immediately because ap_continue is High when ap_done is High. However, the design halts at the end of the second transaction until ap_continue is asserted High.
ap_ctrl_none
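If you specify the ap_ctrl_none control protocol, the handshake signal ports (ap_start, ap_idle, ap_ready, and ap_done) are not created. You can use this protocol to create a block without control signals, as used in data driven kernels.
C/RTL co-simulation supports only certain ap_ctrl_none designs. If the design does not meet one of the conditions listed in the following message, co-simulation fails: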
@E [SIM-345] Cosim only supports the following 'ap_ctrl_none' designs: (1)
combinational designs; (2) pipelined design with task interval of 1; (3) designs with
array streaming or hls_stream ports.
@E [SIM-4] *** C/RTL co-simulation finished: FAIL ***
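As a hedged illustration of condition (1), a function with no loops or internal state would typically synthesize to combinational logic. The function name select_max is hypothetical, the mode= pragma spelling is an assumption, and whether a given function is actually implemented as combinational logic depends on the tool and the target clock.

```cpp
// Hypothetical top-level function; intended as a purely combinational block.
int select_max(int a, int b) {
// Assumed pragma spelling: ap_ctrl_none on port=return requests that no
// ap_start, ap_idle, ap_ready, or ap_done ports are created.
#pragma HLS INTERFACE mode=ap_ctrl_none port=return
    return (a > b) ? a : b;  // no loops, no state
}
```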
Managing Interfaces with SSI Technology Devices
Certain Xilinx devices use stacked silicon interconnect (SSI) technology. In these devices, the total available resources are divided over multiple super logic regions (SLRs). The connections between SLRs use super long line (SLL) routes. SLL routes incur delay costs that are typically greater than standard FPGA routing. To ensure designs operate at maximum performance, use the following guidelines:
- Register all signals that cross between SLRs at both the SLR output and SLR input.
- You do not need to register a signal if it enters or exits an SLR via an I/O buffer.
- Ensure that the logic created by Vitis HLS fits within a single SLR.
If the logic is contained within a single SLR, you can use the -register_all_io option of the config_rtl command to control I/O registering. If the option is enabled, all inputs and outputs are registered. If disabled, none of the inputs or outputs are registered.