Managing Interface Synthesis
Introduction to Interface Synthesis
The parameters of the software functions defined in a Vitis HLS design are synthesized into ports in the RTL code. The parameters of the top-level function are synthesized into interfaces and ports that group multiple signals to encapsulate the communication protocol between the HLS design and components external to the design. Vitis HLS defines interfaces automatically, using industry standards to specify the protocol used. The type of interfaces that Vitis HLS creates depends on the data type and direction of the parameters of the top-level function, the target flow for the active solution, the default interface configuration settings as specified by config_interface, and any specified INTERFACE pragmas or directives.
Vitis HLS supports two target flows:
- The Vivado IP flow, which is the default flow for the tool
- The Vitis Kernel flow, used for the Vitis Application Acceleration Development flow
The target flow is selected when opening the solution:
open_solution -flow_target [vitis | vivado]
The interface provides channels for data from outside the IP or kernel to flow into or out of the design. Data can flow from a variety of sources external to the kernel, such as a host application, an external camera or sensor, or another kernel or IP implemented on the Xilinx device. The interface also provides a control scheme for operating the IP or kernel as a unit, and for managing the flow of data across a specific channel or port. These control signals are defined by the block protocol and the port protocol, as explained in Block and Port Interface Protocols. In short, the interface defines the size and performance characteristics of the data channel into the hardware design, controls the flow of data through the channels, and controls the operation of the block.
You can see that the choice and configuration of interfaces is a key to the success of your design. However, Vitis HLS tries to simplify the process by selecting default interfaces for the target flows. For more information on the defaults used refer to Default Interfaces for Vivado IP Flow or Default Interfaces for Vitis Kernel Flow as appropriate to your design.
After synthesis completes you can review the mapping of the software arguments of your C/C++ code to hardware ports or interfaces in the SW I/O Information section of the Synthesis Summary report.
Block and Port Interface Protocols
Block-Level Control Protocols
Block-level interface protocols provide a mechanism for controlling the operation of the Vivado IP or Vitis kernel from other RTL modules, from software applications or drivers, or from the Xilinx Run Time (XRT) in the Vitis Application Acceleration Development flow. Port-level interface protocols provide a similar mechanism for controlling the flow of data through individual ports on the IP or kernel. The specified protocols are keywords directing the HLS tool as to which protocol to implement when generating the RTL output.
Vitis HLS uses block-level control protocols ap_ctrl_chain, ap_ctrl_hs, and ap_ctrl_none to specify if the RTL is implemented with block-level handshake signals, and what signals to include. Block-level handshake signals specify the following:
- When the design can start to perform the operation
- When the operation ends
- When the design is idle and ready for new inputs
By default, Vitis HLS adds a block-level interface protocol to the synthesized design to control the block. The ports of a block-level interface control when the block can start processing data (ap_start), indicate when it is ready to accept new inputs (ap_ready), and indicate if the design is idle (ap_idle) or has completed operation (ap_done). These are discussed in detail in Block-Level I/O Protocols.
Port-Level Control Protocols
Port-level I/O protocols are control signals assigned to the data ports of the Vivado IP or Vitis kernel. The I/O protocol created depends on the type of C/C++ argument and on the target flow. While block-level protocols control when the kernel is started and when it can accept data, the port-level I/O protocols are used to control the flow of data through specific ports.
Port-level protocols come in a variety of standards implemented to support the different designs and applications supported by Xilinx devices. For example, by default in the Vivado IP flow, input pointers and pass-by-value arguments are implemented as simple wire ports with no associated handshaking signal, indicated by port protocol ap_none. Output pointers are implemented with an associated signal to indicate when the data is valid, using interface mode ap_vld.
As described in Default Interfaces for Vitis Kernel Flow, AXI4 interfaces are the default for that flow. The m_axi interface mode specifies an AXI4 master I/O protocol for arrays and pointers (and references in C++) only. The s_axilite interface mode specifies an AXI4-Lite slave I/O protocol for most other types. However, the ports assigned to these interfaces can also have port protocols to indicate when data is valid and when it has been consumed. Additional details are provided in Port-Level I/O Protocols.
Clock and Reset Ports
If the design takes more than one cycle to complete operation, a clock-enable port (ap_ce) can optionally be added to the entire block using the config_interface command, or from the Vitis HLS GUI.
The operation of the reset is described in Controlling the Reset Behavior, and can be modified using the config_rtl command, also available in the Solution Settings dialog box.
Default Interfaces for Vivado IP Flow
The Vivado IP flow supports a wide variety of I/O protocols and handshakes due to the requirement of supporting FPGA design for a wide variety of applications. This flow implements the following interfaces by default:
- Scalar inputs: ap_none interface mode
- Array: ap_memory interface mode
- Pointers or Reference:
  - Input: ap_none interface mode
  - InOut: ap_ovld interface mode
  - Output: ap_vld interface mode
- Arguments specified as hls::stream: ap_fifo interface mode
- Function Return: ap_return port using the ap_none interface mode
- Block Protocol: ap_ctrl_hs
The sum_io function in the following code provides an example of interface synthesis.
#include "sum_io.h"

dout_t sum_io(din_t in1, din_t in2, dio_t *sum) {
    dout_t temp;
    *sum = in1 + in2 + *sum;
    temp = in1 + in2;
    return temp;
}
The sum_io function includes:
- Two pass-by-value inputs: in1 and in2.
- A pointer: sum that is both read from and written to.
- A function return assigned the value of temp.
With the default interface synthesis settings used for the Vivado IP flow, the design is synthesized into an RTL block with the ports and interfaces shown in the following figure.
In the default Vivado IP flow the tool creates two types of interface ports on the RTL design to handle the flow of both data and control.
- Block-level interface protocol: The ap_ctrl interface in the preceding figure has been expanded to show the signals provided by the default ap_ctrl_hs protocol: ap_start, ap_done, ap_ready, and ap_idle.
- Port-level interface protocols: These are created for each argument in the top-level function and the function return (if the function returns a value). As explained above, most of the arguments use a port protocol of ap_none, and so have no control signals. In the sum_io example above these ports include: in1, in2, sum_i, and ap_return. However, the output port uses the ap_vld protocol, and so the sum_o output is associated with the sum_o_ap_vld signal.
Note: The inout argument, sum, has been split into input and output ports.
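The default assignments listed above can also be written out explicitly as INTERFACE pragmas. The following is a minimal sketch, assuming plain int typedefs in place of the contents of sum_io.h (which is not shown in this section). The pragmas only restate the Vivado IP flow defaults, and a standard C++ compiler ignores them, so the function can still be exercised in C simulation.

```cpp
#include <cassert>

// Assumed stand-ins for the types declared in sum_io.h (not shown here).
typedef int din_t;
typedef int dio_t;
typedef int dout_t;

dout_t sum_io(din_t in1, din_t in2, dio_t *sum) {
// These pragmas restate the Vivado IP flow defaults described above.
#pragma HLS INTERFACE ap_ctrl_hs port=return
#pragma HLS INTERFACE ap_none port=in1
#pragma HLS INTERFACE ap_none port=in2
#pragma HLS INTERFACE ap_ovld port=sum
    assert(sum != nullptr);
    dout_t temp;
    *sum = in1 + in2 + *sum;  // inout: read (sum_i) and written (sum_o)
    temp = in1 + in2;
    return temp;              // becomes the ap_return port
}
```

Calling sum_io(2, 3, &acc) with acc initially 10 returns 5 and leaves acc at 15, which is the behavior a C test bench would check before synthesis.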
Default Interfaces for Vitis Kernel Flow
The Vitis kernel flow provides support for compiled kernel objects (.xo) for control by the Xilinx Run Time (XRT) and integration with a host application. This flow has very specific interface requirements that Vitis HLS must meet. This flow implements the following interfaces by default:
- Scalar inputs: AXI4-Lite interface (s_axilite)
- Pointers to an Array: AXI4 memory mapped interface (m_axi) to access the memory, and the s_axilite interface to receive the offset into the memory address space.
- Arguments specified as hls::stream: AXI4-Stream interface (axis)
- Function Return: ap_return port added to the s_axilite interface
- Block Protocol: ap_ctrl_chain specified on the s_axilite interface.
The s_axilite interface is special in the Vitis kernel flow. It handles the input of scalar arguments from the software function into the kernel as well as any function return value; but it also specifies offsets for m_axi interfaces and handles the block control protocol.
The sum_io function in the following code provides an example of interface synthesis.
#include "sum_io.h"

dout_t sum_io(din_t in1, din_t in2, dio_t *sum) {
    dout_t temp;
    *sum = in1 + in2 + *sum;
    temp = in1 + in2;
    return temp;
}
The sum_io function includes:
- Two pass-by-value inputs: in1 and in2.
- A pointer: sum that is both read from and written to.
- A function return assigned the value of temp.
With the default interface synthesis settings used by Vitis HLS for the Vitis kernel flow, the design is synthesized into an RTL block with the ports and interfaces shown in the following figure.
In the default Vitis kernel flow the tool creates several types of interface ports on the RTL design to handle the flow of both data and control.
- Clock, Reset, and Interrupt ports: ap_clk, ap_rst_n, and interrupt are added to the kernel.
- AXI4-Lite interface: s_axi_control interface to handle data values for scalar arguments in1 and in2, and to handle the function return value. The interface is expanded to show the various ports associated with it.
- AXI4 memory mapped interface: m_axi_gmem interface to handle the sum argument.
- Block-Level interface protocol: The default ap_ctrl_chain protocol is associated with the s_axi_control interface.
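For comparison, the Vitis kernel flow defaults described above can be spelled out with explicit pragmas. This is only a sketch under assumptions: the int typedefs stand in for sum_io.h, and the m_axi bundle name gmem is chosen to match the m_axi_gmem interface named above. The tool infers these interfaces on its own in this flow, so the pragmas are illustrative rather than required.

```cpp
#include <cassert>

// Assumed stand-ins for the types declared in sum_io.h (not shown here).
typedef int din_t;
typedef int dio_t;
typedef int dout_t;

dout_t sum_io(din_t in1, din_t in2, dio_t *sum) {
// Pointer argument: data moves over m_axi, its address offset over s_axilite.
#pragma HLS INTERFACE m_axi port=sum offset=slave bundle=gmem
#pragma HLS INTERFACE s_axilite port=sum
// Scalars and the function return travel through the s_axilite adapter.
#pragma HLS INTERFACE s_axilite port=in1
#pragma HLS INTERFACE s_axilite port=in2
#pragma HLS INTERFACE s_axilite port=return
// Block-level protocol associated with the s_axilite interface.
#pragma HLS INTERFACE ap_ctrl_chain port=return
    assert(sum != nullptr);
    dout_t temp;
    *sum = in1 + in2 + *sum;
    temp = in1 + in2;
    return temp;
}
```

No bundle is given on the s_axilite pragmas, so the tool is left to create its single control interface for the kernel.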
Details of Interface Synthesis
The following sections provide additional details of how to add and configure interfaces, and details of the available block-level and port-level protocols, including waveform diagrams.
Specifying Interfaces
As discussed previously, the type of interfaces that Vitis HLS creates depends on the data type and direction of the arguments of the top-level function, the target flow for the active solution, the default interface configuration settings, and any specified INTERFACE pragmas or directives.
The configuration settings are defined by the config_interface command. You can change the defaults defined in the configuration settings as described in Setting Configuration Options.
The INTERFACE pragma or directive lets you specify details for a specific function argument, or interface port, augmenting the default configuration or overriding it as needed. To specify the interface mode for an argument in the Vitis HLS IDE, open the source file in the code editor to display the Directives view. Right-click the argument of the top-level function in the Directives view, and select Insert Directive to open the Vitis HLS Directive Editor dialog box. Select INTERFACE for the Directive as shown in the following figure.
The various options displayed for the INTERFACE directive change depending on the specific interface mode you select. Refer to set_directive_interface for details on the various options, or select the Help command in the Directive Editor dialog box. Following are some items of interest:
- Destination: Specifies whether the INTERFACE is added as a directive to the directives.tcl script, or as a pragma to the source code. Refer to Using Directives in Scripts vs. Pragmas in Code for more information.
- mode: The interface mode specifies the port control protocol used by the selected argument. Refer to Port-Level I/O Protocols for additional information on the different available port protocols.
- register: If you select this option, all pass-by-value reads are performed in the first cycle of operation. For output ports, the register option guarantees the output is registered. You can apply the register option to any function in the design. For memory, FIFO, and AXI4 interfaces, the register option has no effect.
- port: Specifies the selected port or argument the INTERFACE is applied to.
- depth: This option specifies how many samples are provided to the design by the test bench and how many output values the test bench must store. Use whichever number is greater, the samples or output values.
  Note: For cases in which a pointer is read from or written to multiple times within a single transaction, the depth option is required for C/RTL co-simulation. The depth option is not required for arrays or when using the hls::stream construct. It is only required when using pointers on the interface.
  If the depth option is set too small, the C/RTL co-simulation might deadlock as follows:
  - The input reads might stall waiting for data that the test bench cannot provide.
  - The output writes might stall when trying to write data, because the storage is full.
- offset: This option is used for AXI4 interfaces, and specifies the memory address offset for the specified argument.
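The depth option is easiest to see on a pointer that is read several times in one call. The sketch below is illustrative only: the function name accumulate, the sample count of 8, and the choice of an m_axi interface are assumptions made for the example.

```cpp
#include <cassert>

const int N = 8;  // hypothetical number of samples read per transaction

// The pointer is read N times in a single call, so C/RTL co-simulation
// needs depth to know how many samples the test bench must supply.
int accumulate(const int *data) {
// Keep the depth value in sync with N.
#pragma HLS INTERFACE m_axi port=data depth=8
    assert(data != nullptr);
    int total = 0;
    for (int i = 0; i < N; ++i) {
        total += data[i];
    }
    return total;
}
```

If depth were smaller than the number of reads, the co-simulation could stall in the way described in the list above, because the test bench would not provide enough input samples.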
AXI Adapter Interface Protocols
The AXI4 interfaces supported by Vitis HLS include the AXI4-Stream interface (axis), AXI4-Lite (s_axilite), and AXI4 master (m_axi) interfaces. For a complete description of the AXI4 interfaces, including timing and ports, see the Vivado Design Suite: AXI Reference Guide (UG1037).
- s_axilite: Specify this protocol on any type of argument except streams. The s_axilite mode specifies an AXI4-Lite slave interface.
  TIP: You can bundle multiple arguments into a single s_axilite interface.
- m_axi: Specify on arrays and pointers (and references in C++) only. The m_axi mode specifies an AXI4 Memory Mapped interface.
  TIP: You can bundle multiple arguments into a single m_axi interface.
- axis: Specify this protocol on input arguments or output arguments only, not on input/output arguments. The axis mode specifies an AXI4-Stream interface.
AXI4-Lite Interface
Overview
An HLS IP or kernel can be controlled by a host application or an embedded processor using the AXI4-Lite slave interface (s_axilite), which acts as a system bus for communication between the processor and the kernel. Using the s_axilite interface, the host or an embedded processor can start and stop the kernel, and read or write data to it. When Vitis HLS synthesizes the design, the s_axilite interface is implemented as an adapter that captures the data communicated from the host in registers on the adapter.
The AXI4-Lite interface performs several functions within a Vivado IP or Vitis kernel:
- It maps a block-level control mechanism which can be used to start and stop the kernel.
- It provides a channel for passing scalar arguments, function return values, and address offsets for m_axi interfaces from the host to the IP or kernel.
- For the Vitis Kernel flow:
  - The tool will automatically infer the s_axilite interface pragma for pointer arguments, scalars, and function return type.
  - Bundle: Do not specify the bundle option for the s_axilite adapter in the Vitis Kernel flow. The tool will create a single s_axilite interface that will serve for the whole design.
    IMPORTANT: HLS will return an error if multiple bundles are specified for the Vitis Kernel flow.
  - Offset: The tool will automatically choose the offsets for the interface. Do not specify any offsets in this flow.
- For the Vivado IP flow:
  - This flow will not use the s_axilite interface by default.
  - To use the s_axilite as a communication channel for scalar arguments, m_axi pointer address, and function return type, you must manually specify the INTERFACE pragma or directive.
  - Bundle: This flow supports multiple s_axilite interfaces, specified by bundle. Refer to S_AXILITE Bundle Rules for more information.
  - Offset: By default the tool will place the arguments in a sequential order starting from 0x10 in the control register map. Refer to S_AXILITE Offset Option for additional details.
S_AXILITE Example
The following example shows how Vitis HLS implements multiple arguments, including the function return, as an s_axilite interface. Because each pragma uses the same name for the bundle option, each of the ports is grouped into a single interface.
void example(char *a, char *b, char *c)
{
#pragma HLS INTERFACE s_axilite port=return bundle=BUS_A
#pragma HLS INTERFACE s_axilite port=a bundle=BUS_A
#pragma HLS INTERFACE s_axilite port=b bundle=BUS_A
#pragma HLS INTERFACE s_axilite port=c bundle=BUS_A
#pragma HLS INTERFACE ap_vld port=b
    *c += *a + *b;
}
If you do not specify the bundle option, Vitis HLS groups all arguments into a single s_axilite bundle and automatically names the port.
The synthesized example is part of a system with three main elements:
- Host application running on an x86 or embedded processor interacting with the IP or kernel
- SAXI Lite Adapter: The INTERFACE pragma implements an s_axilite adapter. The adapter has two primary functions: implementing the interface protocol to communicate with the host, and providing a Control Register Map to the IP or kernel.
- The HLS engine or function that implements the design logic
By default, Vitis HLS automatically assigns the address for each port that is grouped into an s_axilite interface. The size, or range of addresses assigned to a port is dependent on the argument data type and the port protocol used, as described below. You can also explicitly define the address using the offset option as discussed in S_AXILITE Offset Option.
- Port a: By default, is implemented as ap_none. One 32-bit word is assigned for the data signal, of which only 8 bits are used because the argument data type is char. The remaining bits are unused.
- Port b: Is implemented as ap_vld, as defined by the INTERFACE pragma in the example. It occupies two 32-bit words in the control register map:
  - (0x18) Data signal: One word is assigned for the data signal, of which only 8 bits are used because the argument data type is char. The remaining bits are unused.
  - (0x1c) Control signal: One word is assigned for the control signal.
- Port c: By default, is implemented as ap_ovld because it is both read from and written to. It occupies four 32-bit words in the control register map:
  - (0x20) Data signal of c_i: One word is assigned for the input data signal, of which only 8 bits are used because the argument data type is char. The remaining bits are unused.
  - (0x24) Reserved space
  - (0x28) Data signal of c_o: One word is assigned for the output data signal, of which only 8 bits are used because the argument data type is char. The remaining bits are unused.
  - (0x2c) Control signal of c_o: One word is assigned for the ap_ovld control signal.
In operation the host application will initially start the kernel by writing into the Control address space (0x00). The host/CPU completes the initial setup by writing into the other address spaces which are associated with the various function arguments as defined in the example.
The control signal for port b is asserted, and only then can the HLS engine read ports a and b (port a is ap_none and does not have a control signal). Until that time the design is stalled and waiting for the valid register to be set for port b.
Each time port b is read by the HLS engine, the input valid register is cleared and the register resets to logic 0.
After the HLS engine finishes its computation, the output value on port c is stored in the control register and the corresponding valid bit is set for the host to read. After the host reads the data, the HLS engine will write the ap_done bit in the Control register (0x00) to mark the end of the IP computation.
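The sequence above can be walked through in software. The following sketch uses a hypothetical MockAdapter type as a stand-in for the s_axilite adapter and HLS engine; only the register offsets come from the example's register map. As a simplification, the mock sets ap_done as soon as the result is stored, so it does not reproduce the exact timing of the hardware.

```cpp
#include <cassert>
#include <cstdint>

// Offsets copied from the Control Register Map of the example.
enum : uint32_t {
    REG_CTRL    = 0x00,  // bit 0: ap_start, bit 1: ap_done
    REG_A       = 0x10,
    REG_B       = 0x18,
    REG_B_VLD   = 0x1c,
    REG_C_I     = 0x20,
    REG_C_O     = 0x28,
    REG_C_O_VLD = 0x2c
};

// Hypothetical software stand-in for the s_axilite adapter and HLS engine.
struct MockAdapter {
    uint32_t regs[12] = {0};  // one 32-bit word each, 0x00 through 0x2c

    void write(uint32_t offset, uint32_t value) { regs[offset / 4] = value; }
    uint32_t read(uint32_t offset) const { return regs[offset / 4]; }

    // One attempt by the engine to run. Returns false while it is stalled.
    bool step() {
        const bool started = (read(REG_CTRL) & 0x1u) != 0;
        const bool b_valid = (read(REG_B_VLD) & 0x1u) != 0;
        if (!started || !b_valid) {
            return false;  // waiting for ap_start and for b to be valid
        }
        const uint8_t a = static_cast<uint8_t>(read(REG_A));
        const uint8_t b = static_cast<uint8_t>(read(REG_B));
        const uint8_t c = static_cast<uint8_t>(read(REG_C_I));
        write(REG_B_VLD, 0);  // reading b clears its input valid register
        write(REG_C_O, static_cast<uint8_t>(c + a + b));  // *c += *a + *b
        write(REG_C_O_VLD, 1);  // result is now valid for the host
        write(REG_CTRL, 0x2);   // ap_start cleared, ap_done set
        return true;
    }
};

// Host-side sequence: start, write arguments, validate b, read the result.
uint8_t run_example(MockAdapter &hw, uint8_t a, uint8_t b, uint8_t c_in) {
    hw.write(REG_CTRL, 0x1);
    hw.write(REG_A, a);
    hw.write(REG_C_I, c_in);
    hw.write(REG_B, b);
    const bool ran_early = hw.step();  // stalls: b_ap_vld is still 0
    hw.write(REG_B_VLD, 1);
    const bool ran = hw.step();        // now the engine can read a and b
    assert(!ran_early && ran);
    (void)ran_early;
    (void)ran;
    return static_cast<uint8_t>(hw.read(REG_C_O));
}
```

With a = 3, b = 4, and an initial c of 5, run_example returns 12, and the valid register for b reads back as 0 afterwards.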
Vitis HLS reports the assigned addresses in the S_AXILITE Control Register Map, and also provides them in C Driver Files to aid in your software development. Using the s_axilite interface, you can output C driver files for use with code running on an embedded or x86 processor. The provided C application program interface (API) functions let you control the hardware from your software.
S_AXILITE Control Register Map
Vitis HLS generates a Control Register Map for the ports grouped into an s_axilite interface. The register map, which is added to the generated RTL files, can be divided into two sections:
- Block-level control signals
- Function arguments mapped into the s_axilite interface
In the Vitis kernel flow, the block protocol is associated with the s_axilite interface by default. To change the default block protocol, specify the interface pragma as follows:
#pragma HLS INTERFACE ap_ctrl_hs port=return
In the Vivado IP flow, the block control protocol is implemented on its own interface, ap_ctrl, as seen in Default Interfaces for Vivado IP Flow. However, if you are using an s_axilite interface in your IP, you can also assign the block control protocol to that interface using the following INTERFACE pragmas, as an example:
#pragma HLS INTERFACE s_axilite port=return bundle=BUS_A
#pragma HLS INTERFACE ap_ctrl_hs port=return bundle=BUS_A
In the Control Register Map, Vitis HLS reserves addresses 0x00 through 0x0c for the block-level protocol signals and interrupt controls, as shown below:
Address | Description |
---|---|
0x00 | Control signals |
0x04 | Global Interrupt Enable Register |
0x08 | IP Interrupt Enable Register (Read/Write) |
0x0c | IP Interrupt Status Register (Read/TOW) |
The Control signals register (0x00) contains ap_start, ap_done, ap_ready, and ap_idle; in the case of the ap_ctrl_chain block protocol, it also contains ap_continue. These are the block-level interface signals which are accessed through the s_axilite adapter.
To start the block operation, the ap_start bit in the Control register must be set to 1. The HLS engine will then proceed and read any inputs grouped into the AXI4-Lite slave interface from the register in the interface.
When the block completes the operation, the ap_done, ap_idle, and ap_ready registers will be set by the hardware output ports, and the results for any output ports grouped into the s_axilite interface can be read from the appropriate register.
For function arguments, Vitis HLS automatically assigns the address for each argument or port that is assigned to the s_axilite interface. The tool will assign each port an offset starting from 0x10, the lower addresses being reserved for control signals. The size, or range of addresses assigned to a port is dependent on the argument data type and the port protocol used.
Because the variables grouped into an AXI4-Lite interface are function arguments which do not have a default value in the C code, none of the argument registers in the s_axilite interface can be assigned a default value. The registers can be implemented with a reset using the config_rtl command, but they cannot be assigned any other default value.
The Control Register Map generated by Vitis HLS for the previous example is provided below:
//------------------------Address Info-------------------
// 0x00 : Control signals
//        bit 0  - ap_start (Read/Write/COH)
//        bit 1  - ap_done (Read/COR)
//        bit 2  - ap_idle (Read)
//        bit 3  - ap_ready (Read)
//        bit 7  - auto_restart (Read/Write)
//        others - reserved
// 0x04 : Global Interrupt Enable Register
//        bit 0  - Global Interrupt Enable (Read/Write)
//        others - reserved
// 0x08 : IP Interrupt Enable Register (Read/Write)
//        bit 0  - enable ap_done interrupt (Read/Write)
//        bit 1  - enable ap_ready interrupt (Read/Write)
//        others - reserved
// 0x0c : IP Interrupt Status Register (Read/TOW)
//        bit 0  - ap_done (COR/TOW)
//        bit 1  - ap_ready (COR/TOW)
//        others - reserved
// 0x10 : Data signal of a
//        bit 7~0 - a[7:0] (Read/Write)
//        others  - reserved
// 0x14 : reserved
// 0x18 : Data signal of b
//        bit 7~0 - b[7:0] (Read/Write)
//        others  - reserved
// 0x1c : Control signal of b
//        bit 0  - b_ap_vld (Read/Write/SC)
//        others - reserved
// 0x20 : Data signal of c_i
//        bit 7~0 - c_i[7:0] (Read/Write)
//        others  - reserved
// 0x24 : reserved
// 0x28 : Data signal of c_o
//        bit 7~0 - c_o[7:0] (Read)
//        others  - reserved
// 0x2c : Control signal of c_o
//        bit 0  - c_o_ap_vld (Read/COR)
//        others - reserved
// (SC = Self Clear, COR = Clear on Read, TOW = Toggle on Write, COH = Clear on Handshake)
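On the software side, the Control signals word at 0x00 is usually read as a whole and then split into its individual bits. The helper below is a small sketch of that decoding, using the bit positions from the register map above; the BlockStatus type and decode_control name are hypothetical and are not part of the generated driver.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical view of the Control signals register (offset 0x00).
struct BlockStatus {
    bool ap_start;      // bit 0
    bool ap_done;       // bit 1
    bool ap_idle;       // bit 2
    bool ap_ready;      // bit 3
    bool auto_restart;  // bit 7
};

// Split a raw 32-bit control word into its block-level signals.
BlockStatus decode_control(uint32_t ctrl) {
    BlockStatus s;
    s.ap_start     = ((ctrl >> 0) & 1u) != 0;
    s.ap_done      = ((ctrl >> 1) & 1u) != 0;
    s.ap_idle      = ((ctrl >> 2) & 1u) != 0;
    s.ap_ready     = ((ctrl >> 3) & 1u) != 0;
    s.auto_restart = ((ctrl >> 7) & 1u) != 0;
    return s;
}
```

Note that ap_done is marked clear-on-read in the map, so host code that polls this register should capture the value once and decode it, rather than reading the register again for each bit.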
S_AXILITE and Port-Level Protocols
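In the Vitis kernel flow, do not specify port-level protocols for the s_axilite interface. Only the default assignments are supported. In the Vivado IP flow, you can assign port-level I/O protocols to the individual ports bundled into an s_axilite interface.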
Port-level I/O protocols sequence data into and out of the HLS engine from the s_axilite adapter as seen in S_AXILITE Example. The tool assigns a default port protocol to a port depending on the type and direction of the argument associated with it. The port can contain one or more of the following:
- Data signal for the argument
- Valid signal (ap_vld/ap_ovld) to indicate when the data can be read
- Acknowledge signal (ap_ack) to indicate when the data has been read
The default port protocol assignments for various argument types are as follows:
Argument Type | Default | Supported |
---|---|---|
scalar | ap_none | ap_ack and ap_vld can also be used |
Pointers/References | | |
Inputs | ap_none | ap_ack and ap_vld |
Outputs | ap_vld | ap_none, ap_ack, and ap_ovld can also be used |
Inouts | ap_ovld | ap_none, ap_ack, and ap_vld are also supported |
The default port protocol for arrays is ap_memory. The bram port protocol is not supported for arrays in an s_axilite interface.
The S_AXILITE Example groups port b into the s_axilite interface and specifies port b as using the ap_vld protocol with INTERFACE pragmas. As a result, the s_axilite adapter contains a register for the port b data, and a register for the port b input valid signal.
If the input valid register is not set to logic 1, the data in the b data register is not considered valid, and the design stalls and waits for the valid register to be set.
Each time port b is read, Vitis HLS automatically clears the input valid register and resets the register to logic 0.
S_AXILITE Bundle Rules
In the S_AXILITE Example all the function arguments are grouped into a single s_axilite interface adapter specified by the bundle=BUS_A option in the INTERFACE pragma. The bundle option simply lets you group ports together into one interface.
In the Vitis kernel flow there should only be a single interface bundle, named s_axi_control by the tool. So you should not specify the bundle option in that flow, or you will probably encounter an error during synthesis. However, in the Vivado IP flow you can specify multiple bundles using the s_axilite interface, and this will create a separate interface adapter for each bundle you have defined. The following example shows this:
void example(char *a, char *b, char *c)
{
#pragma HLS INTERFACE s_axilite port=a bundle=BUS_A
#pragma HLS INTERFACE s_axilite port=b bundle=BUS_A
#pragma HLS INTERFACE s_axilite port=c bundle=OUT
#pragma HLS INTERFACE s_axilite port=return bundle=BUS_A
#pragma HLS INTERFACE ap_vld port=b
    *c += *a + *b;
}
After synthesis completes, the Synthesis Summary report provides feedback regarding the number of s_axilite adapters generated. The SW-to-HW Mapping section of the report contains the HW info showing the control register offset and the address range for each port.
However, there are some rules related to using bundles with the s_axilite interface.
- Default Bundle Names: This rule explicitly groups all interface ports with no bundle name into the same AXI4-Lite interface port, uses the tool default bundle name, and names the RTL port s_axi_<default>, typically s_axi_control.
  In this example all ports are mapped to the default bundle:
  void top(char *a, char *b, char *c) {
  #pragma HLS INTERFACE s_axilite port=a
  #pragma HLS INTERFACE s_axilite port=b
  #pragma HLS INTERFACE s_axilite port=c
      *c += *a + *b;
  }
- User-Specified Bundle Names: This rule explicitly groups all interface ports with the same bundle name into the same AXI4-Lite interface port, and names the RTL port s_axi_<string>, where <string> is the specified bundle name.
  The following example results in interfaces named s_axi_BUS_A, s_axi_BUS_B, and s_axi_OUT:
  void example(char *a, char *b, char *c) {
  #pragma HLS INTERFACE s_axilite port=a bundle=BUS_A
  #pragma HLS INTERFACE s_axilite port=b bundle=BUS_B
  #pragma HLS INTERFACE s_axilite port=c bundle=OUT
  #pragma HLS INTERFACE s_axilite port=return bundle=OUT
  #pragma HLS INTERFACE ap_vld port=b
      *c += *a + *b;
  }
- Partially Specified Bundle Names: If you specify bundle names for some arguments, but leave other arguments unassigned, then the tool will bundle the arguments as follows:
  - Group all ports into the specified bundles as indicated by the INTERFACE pragmas.
  - Group any ports without bundle assignments into a default named bundle. The default name can either be the standard tool default, or an alternative default name if the tool default has already been specified by the user.
  In the following example the user has specified bundle=control, which is the tool default name. In this case, port c will be assigned to s_axi_control as specified by the user, and the remaining ports will be bundled under s_axi_control_r, which is an alternative default name used by the tool.
  void top(char *a, char *b, char *c) {
  #pragma HLS INTERFACE s_axilite port=a
  #pragma HLS INTERFACE s_axilite port=b
  #pragma HLS INTERFACE s_axilite port=c bundle=control
  }
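The three bundle rules above can be summarized as a small naming function. This is only a model of the rules as described in this section, not the tool's implementation; the function name resolve_interfaces and the use of an empty string for "no bundle specified" are assumptions made for the sketch.

```cpp
#include <cassert>
#include <map>
#include <string>

// Maps each port to the RTL interface name it would be grouped under.
// Input: port name -> bundle name, where "" means no bundle option was given.
std::map<std::string, std::string> resolve_interfaces(
        const std::map<std::string, std::string> &ports) {
    const std::string tool_default = "control";

    // Has the user already claimed the tool default name for some port?
    bool default_taken = false;
    for (const auto &p : ports) {
        if (p.second == tool_default) {
            default_taken = true;
        }
    }
    // Unassigned ports fall back to the default, or to the alternative name.
    const std::string fallback =
        default_taken ? tool_default + "_r" : tool_default;

    std::map<std::string, std::string> result;
    for (const auto &p : ports) {
        const std::string &bundle = p.second.empty() ? fallback : p.second;
        result[p.first] = "s_axi_" + bundle;
    }
    return result;
}
```

For the last example above (a and b unassigned, c with bundle=control) the model places c on s_axi_control and a and b on s_axi_control_r.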
S_AXILITE Offset Option
In the Vitis kernel flow the tool automatically assigns the offsets, so do not specify the offset option in that flow. In the Vivado IP flow, Vitis HLS defines the size, or range of addresses assigned to a port in the S_AXILITE Control Register Map depending on the argument data type and the port protocol used. However, the INTERFACE pragma also contains an offset option that lets you specify the address offset in the AXI4-Lite interface.
When specifying the offset for your argument, you must consider the size of your data and reserve some extra space for the port control protocol. The range of addresses you reserve should be based on a 32-bit word. You should reserve enough 32-bit words to fit your argument data type, and reserve one additional word for the control protocol, even for ap_none.
When using the ap_memory protocol for arrays, you do not need to reserve the extra word for the control protocol. In this case, simply reserve enough 32-bit words to fit your argument data type.
For example, to reserve enough space for a double you need to reserve two 32-bit words for the 64-bit data type, and then reserve an additional 32-bit word for the control protocol. So you need to reserve a total of three 32-bit words, or 96 bits. If your argument offset starts at 0x020, then the next available offset would begin at 0x02c, in order to reserve the required address range for your argument.
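The reservation rule above is simple word arithmetic, sketched here as a helper. The function name next_offset and its control_word parameter are hypothetical; the rule itself (round the data up to 32-bit words, then add one word for the control protocol unless ap_memory is used) is the one stated in the text.

```cpp
#include <cassert>
#include <cstdint>

// Returns the first free byte offset after an argument placed at 'start'.
// data_bits:    width of the argument data type in bits
// control_word: reserve one extra 32-bit word for the port control protocol
//               (pass false for arrays using ap_memory)
uint32_t next_offset(uint32_t start, uint32_t data_bits,
                     bool control_word = true) {
    assert(start % 4 == 0);  // offsets are aligned to 32-bit words
    const uint32_t data_words = (data_bits + 31) / 32;  // round up
    const uint32_t words = data_words + (control_word ? 1u : 0u);
    return start + 4 * words;  // 4 bytes per 32-bit word
}
```

For the double in the text, next_offset(0x20, 64) gives 0x2c. The same rule applied to an 8-bit char at 0x10 gives 0x18, which matches where port b follows port a in the register map of the earlier example.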
If you make a mistake in setting the offset of your arguments, by not reserving enough address range to fit your data type and the control protocol, Vitis HLS will recognize the error, warn you of the issue, and recover by moving your misplaced argument register to the end of the Control Register Map. This allows your build to proceed, but the design may not work with your host application or driver if they were written to your specified offset.
C Driver Files
When an AXI4-Lite slave interface is implemented, a set of C driver files are automatically created. These C driver files provide a set of APIs that can be integrated into any software running on a CPU and used to communicate with the device via the AXI4-Lite slave interface.
The C driver files are created when the design is packaged as IP in the IP catalog.
Driver files are created for standalone and Linux modes. In standalone mode the drivers are used in the same way as any other Xilinx standalone drivers. In Linux mode, copy all the C files (.c) and header files (.h) into the software project.
The driver files and API functions derive their name from the top-level function for synthesis. In the above example, the top-level function is called “example”. If the top-level function was named “DUT” the name “example” would be replaced by “DUT” in the following description. The driver files are created in the packaged IP (located in the impl directory inside the solution).
File Path | Usage Mode | Description |
---|---|---|
data/example.mdd | Standalone | Driver definition file. |
data/example.tcl | Standalone | Used by SDK to integrate the software into an SDK project. |
src/xexample_hw.h | Both | Defines address offsets for all internal registers. |
src/xexample.h | Both | API definitions |
src/xexample.c | Both | Standard API implementations |
src/xexample_sinit.c | Standalone | Initialization API implementations |
src/xexample_linux.c | Linux | Initialization API implementations |
src/Makefile | Standalone | Makefile |
In file xexample.h, two structs are defined.
- XExample_Config
- This is used to hold the configuration information (base address of each AXI4-Lite slave interface) of the IP instance.
- XExample
- This is used to hold the IP instance pointer. Most APIs take this instance pointer as the first argument.
The standard API implementations are provided in files xexample.c, xexample_sinit.c, xexample_linux.c, and provide functions to perform the following operations.
- Initialize the device
- Control the device and query its status
- Read/write to the registers
- Set up, monitor, and control the interrupts
Refer to Vitis HLS C Driver Reference for a description of the API functions provided in the C driver files.
C Driver Files and Float Types
C driver files always use a 32-bit unsigned integer (U32) for data transfers. In the following example, the function uses float type arguments a and r1. It sets the value of a and returns the value of r1:
float caculate(float a, float *r1)
{
#pragma HLS INTERFACE ap_vld register port=r1
#pragma HLS INTERFACE s_axilite port=a
#pragma HLS INTERFACE s_axilite port=r1
#pragma HLS INTERFACE s_axilite port=return
*r1 = 0.5f*a;
return (a>0);
}
After synthesis, Vitis HLS groups all ports into the default AXI4-Lite interface and creates C driver files. However, as shown in the following example, the driver files use type U32:
// API to set the value of A
void XCaculate_SetA(XCaculate *InstancePtr, u32 Data) {
Xil_AssertVoid(InstancePtr != NULL);
Xil_AssertVoid(InstancePtr->IsReady == XIL_COMPONENT_IS_READY);
XCaculate_WriteReg(InstancePtr->Hls_periph_bus_BaseAddress,
XCACULATE_HLS_PERIPH_BUS_ADDR_A_DATA, Data);
}
// API to get the value of R1
u32 XCaculate_GetR1(XCaculate *InstancePtr) {
u32 Data;
Xil_AssertNonvoid(InstancePtr != NULL);
Xil_AssertNonvoid(InstancePtr->IsReady == XIL_COMPONENT_IS_READY);
Data = XCaculate_ReadReg(InstancePtr->Hls_periph_bus_BaseAddress,
XCACULATE_HLS_PERIPH_BUS_ADDR_R1_DATA);
return Data;
}
If these functions are used directly with float values, the values written and read are not interpreted as the expected float type, because the driver transfers raw 32-bit words. When using these functions in software, you can use the following casts in the code:
float a=3.0f,r1;
u32 ur1;
// cast float "a" to type U32
XCaculate_SetA(&caculate,*((u32*)&a));
ur1=XCaculate_GetR1(&caculate);
// cast return type U32 to float type for "r1"
r1=*((float*)&ur1);
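As an alternative to pointer casts, the bit pattern can be copied with memcpy, which avoids strict-aliasing concerns in portable C. The following is a minimal sketch, assuming a 32-bit IEEE-754 float; the helper names `float_to_u32` and `u32_to_float` are hypothetical and are not part of the generated driver.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a float as a 32-bit unsigned integer. */
uint32_t float_to_u32(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Reinterpret a 32-bit unsigned integer as a float. */
float u32_to_float(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}
```

With these helpers, the value passed to XCaculate_SetA would be `float_to_u32(a)`, and the value returned by XCaculate_GetR1 would be converted with `u32_to_float`.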
Controlling Hardware
In this example, the hardware header file xexample_hw.h provides a complete list of the memory mapped locations for the ports grouped into the AXI4-Lite slave interface, as described in S_AXILITE Control Register Map.
// 0x00 : Control signals
// bit 0 - ap_start (Read/Write/SC)
// bit 1 - ap_done (Read/COR)
// bit 2 - ap_idle (Read)
// bit 3 - ap_ready (Read)
// bit 7 - auto_restart (Read/Write)
// others - reserved
// 0x04 : Global Interrupt Enable Register
// bit 0 - Global Interrupt Enable (Read/Write)
// others - reserved
// 0x08 : IP Interrupt Enable Register (Read/Write)
// bit 0 - Channel 0 (ap_done)
// bit 1 - Channel 1 (ap_ready)
// 0x0c : IP Interrupt Status Register (Read/TOW)
// bit 0 - Channel 0 (ap_done)
// others - reserved
// 0x10 : Data signal of a
// bit 7~0 - a[7:0] (Read/Write)
// others - reserved
// 0x14 : reserved
// 0x18 : Data signal of b
// bit 7~0 - b[7:0] (Read/Write)
// others - reserved
// 0x1c : reserved
// 0x20 : Data signal of c_i
// bit 7~0 - c_i[7:0] (Read/Write)
// others - reserved
// 0x24 : reserved
// 0x28 : Data signal of c_o
// bit 7~0 - c_o[7:0] (Read)
// others - reserved
// 0x2c : Control signal of c_o
// bit 0 - c_o_ap_vld (Read/COR)
// others - reserved
// (SC = Self Clear, COR = Clear on Read, TOW = Toggle on Write, COH = Clear on Handshake)
To correctly program the registers in the s_axilite interface, you must understand how the hardware ports operate with the default port protocols, or the custom protocols as described in S_AXILITE and Port-Level Protocols.
For example, to start the block operation the ap_start register must be set to 1. The device will then proceed and read any inputs grouped into the AXI4-Lite slave interface from the register in the interface. When the block completes operation, the ap_done, ap_idle and ap_ready registers will be set by the hardware output ports, and the results for any output ports grouped into the AXI4-Lite slave interface can be read from the appropriate register.
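The bit positions of the control register at offset 0x00 can be captured as masks. The following is a sketch based only on the register map listed in the header file comments above; the macro and function names are hypothetical, and preserving auto_restart when starting is an assumption about how software might choose to set ap_start, not a statement about the generated driver.

```c
#include <stdint.h>

/* Bit masks for the control register at offset 0x00 (from the register map). */
#define CTRL_AP_START      (1u << 0)  /* Read/Write, self-clearing */
#define CTRL_AP_DONE       (1u << 1)  /* Read, clear on read */
#define CTRL_AP_IDLE       (1u << 2)  /* Read */
#define CTRL_AP_READY      (1u << 3)  /* Read */
#define CTRL_AUTO_RESTART  (1u << 7)  /* Read/Write */

/* Returns nonzero when the given control register value reports ap_done. */
int ctrl_is_done(uint32_t ctrl) { return (ctrl & CTRL_AP_DONE) != 0; }

/* Returns nonzero when the given control register value reports ap_idle. */
int ctrl_is_idle(uint32_t ctrl) { return (ctrl & CTRL_AP_IDLE) != 0; }

/* Value to write to start the block while keeping the current
 * auto_restart setting (assumed policy for this sketch). */
uint32_t ctrl_start_value(uint32_t ctrl) {
    return (ctrl & CTRL_AUTO_RESTART) | CTRL_AP_START;
}
```

These helpers operate on a register value that has already been read; they do not access the hardware themselves.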
The implementation of function argument c in the example highlights the importance of understanding how the hardware ports operate. Function argument c is both read and written to, and is therefore implemented as separate input and output ports c_i and c_o, as explained in S_AXILITE Example.
The first recommended flow for programming the s_axilite interface is for a one-time execution of the function:
- Use the interrupt function standard API implementations provided in the C Driver Files to determine how you want the interrupt to operate.
- Load the register values for the block input ports. In the above example this is performed using API functions XExample_Set_a, XExample_Set_b, and XExample_Set_c_i.
- Set the ap_start bit to 1 using XExample_Start to start executing the function. This register is self-clearing as noted in the header file above. After one transaction, the block will suspend operation.
- Allow the function to execute. Address any interrupts which are generated.
- Read the output registers. In the above example this is performed using API functions XExample_Get_c_o_vld, to confirm the data is valid, and XExample_Get_c_o.
  Note: The registers in the s_axilite interface obey the same I/O protocol as the ports. In this case, the output valid is set to logic 1 to indicate if the data is valid.
- Repeat for the next transaction.
The second recommended flow is for continuous execution of the block. In this mode, the input ports included in the AXI4-Lite interface should only be ports which perform configuration. The block will typically run much faster than a CPU. If the block must wait for inputs, the block will spend most of its time waiting:
- Use the interrupt function to determine how you wish the interrupt to operate.
- Load the register values for the block input ports. In the above example this is performed using API functions XExample_Set_a, XExample_Set_b, and XExample_Set_c_i.
- Set the auto-start function using API XExample_EnableAutoRestart.
- Allow the function to execute. The individual port I/O protocols will synchronize the data being processed through the block.
- Address any interrupts which are generated. The output registers could be accessed during this operation but the data may change often.
- Use the API function XExample_DisableAutoRestart to prevent any more executions.
- Read the output registers. In the above example this is performed using API functions XExample_Get_c_o and XExample_Get_c_o_vld.
Controlling Software
The API functions can be used in the software running on the CPU to control the hardware block. An overview of the process is:
- Create an instance of the hardware
- Look Up the device configuration
- Initialize the device
- Set the input parameters of the HLS block
- Start the device and read the results
An example application is shown below.
#include "xexample.h" // Device driver for HLS HW block
#include "xparameters.h"
// HLS HW instance
XExample HlsExample;
XExample_Config *ExamplePtr;
int main() {
    int res_hw;
    int status;
    // Look Up the device configuration
    ExamplePtr = XExample_LookupConfig(XPAR_XEXAMPLE_0_DEVICE_ID);
    if (!ExamplePtr) {
        print("ERROR: Lookup of accelerator configuration failed.\n\r");
        return XST_FAILURE;
    }
    // Initialize the Device
    status = XExample_CfgInitialize(&HlsExample, ExamplePtr);
    if (status != XST_SUCCESS) {
        print("ERROR: Could not initialize accelerator.\n\r");
        return XST_FAILURE;
    }
    // Set the input parameters of the HLS block
    XExample_Set_a(&HlsExample, 42);
    XExample_Set_b(&HlsExample, 12);
    XExample_Set_c_i(&HlsExample, 1);
    // Start the device and read the results
    XExample_Start(&HlsExample);
    // Wait for the output valid flag, then read the result
    while (XExample_Get_c_o_vld(&HlsExample) == 0)
        ;
    res_hw = XExample_Get_c_o(&HlsExample);
    print("Detected HLS peripheral complete. Result received.\n\r");
    return XST_SUCCESS;
}
Control Clock and Reset in AXI4-Lite Interfaces
By default, Vitis HLS uses the same clock for the AXI4-Lite interface and the synthesized design. Vitis HLS connects all registers in the AXI4-Lite interface to the clock used for the synthesized logic (ap_clk).
Optionally, you can use the INTERFACE directive clock option to specify a separate clock for each AXI4-Lite port. When connecting the clock to the AXI4-Lite interface, you must use the following protocols:
- AXI4-Lite interface clock must be synchronous to the clock used for the synthesized logic (ap_clk). That is, both clocks must be derived from the same master generator clock.
- AXI4-Lite interface clock frequency must be equal to or less than the frequency of the clock used for the synthesized logic (ap_clk).
If you use the clock option with the INTERFACE directive, you only need to specify the clock option on one function argument in each bundle. Vitis HLS implements all other function arguments in the bundle with the same clock and reset. Vitis HLS names the generated reset signal with the prefix ap_rst_ followed by the clock name. The generated reset signal is active-Low independent of the config_rtl command.
The following example shows how Vitis HLS groups function arguments a and b into an AXI4-Lite port with a clock named AXI_clk1 and an associated reset port.
// Default AXI-Lite interface implemented with independent clock called AXI_clk1
#pragma HLS interface s_axilite port=a clock=AXI_clk1
#pragma HLS interface s_axilite port=b
In the following example, Vitis HLS groups function arguments c and d into AXI4-Lite port CTRL1 with a separate clock called AXI_clk2 and an associated reset port.
// CTRL1 AXI-Lite bundle implemented with a separate clock (called AXI_clk2)
#pragma HLS interface s_axilite port=c bundle=CTRL1 clock=AXI_clk2
#pragma HLS interface s_axilite port=d bundle=CTRL1
Customizing AXI4-Lite Slave Interfaces in IP Integrator
When an HLS RTL design using an AXI4-Lite slave interface is incorporated into a design in Vivado IP integrator, you can customize the block. From the block diagram in IP integrator, select the HLS block, right-click with the mouse button and select Customize Block.
The address width is by default configured to the minimum required size. Modify this to connect to blocks with address sizes less than 32-bit.
AXI4 Master Interface
AXI4 memory-mapped (m_axi) interfaces allow kernels to read and write data in global memory (DDR, HBM, PLRAM). Memory-mapped interfaces are a convenient way of sharing data across different elements of the accelerated application, such as between the host and kernel, or between kernels on the accelerator card. The main advantages for m_axi interfaces are listed below:
- The interface has separate and independent read and write channels
- It supports burst-based accesses with potential performance of ~19 GB/s
- It provides support for outstanding transactions
In the Vitis Kernel flow the m_axi interface is assigned by default to pointer and array arguments. In this flow it supports the following default features:
- Pointer and array arguments are automatically mapped to the m_axi interface
- The default mode of operation is offset=slave in the Vitis flow and should not be changed
- All pointer and array arguments are mapped to a single interface bundle to conserve device resources, and ports share read and write access across the time it is active
- The default alignment in the Vitis flow is set to 64 bytes
- The maximum read/write burst length is set to 16 by default
In the Vivado IP flow, when an m_axi interface is specified it has the following default features:
- The default operation mode is offset=off but you can change it as described in Offset and Modes of Operation
- Assigned pointer and array arguments are mapped to a single interface bundle to conserve device resources, and share the interface across the time it is active
- The default alignment in Vivado IP flow is set to 1 byte
- The maximum read/write burst length is set to 16 by default
In both the Vivado IP flow and Vitis kernel flow, the INTERFACE pragma or directive can be used to modify default values as needed.
You can use an AXI4 master interface on array or pointer/reference arguments, which Vitis HLS implements in one of the following modes:
- Individual data transfers
- Burst mode data transfers
With individual data transfers, Vitis HLS reads or writes a single element of data for each address. The following example shows a single read and single write operation. In this example, Vitis HLS generates an address on the AXI interface to read a single data value and an address to write a single data value. The interface transfers one data value per address.
void bus (int *d) {
static int acc = 0;
acc += *d;
*d = acc;
}
With burst mode transfers, Vitis HLS reads or writes data using a single base address followed by multiple sequential data samples, which makes this mode capable of higher data throughput. Burst mode of operation is possible when you use the C memcpy function or a pipelined for loop. Refer to Optimizing Burst Transfers for more information.
Note: The memcpy function is only supported for synthesis when used to transfer data to or from a top-level function argument specified with an AXI4 master interface.
The following example shows burst mode using the memcpy function. The top-level function argument a is specified as an AXI4 master interface.
#include <string.h> // for memcpy

void example(volatile int *a){
//Port a is assigned to an AXI4 master interface
#pragma HLS INTERFACE m_axi depth=50 port=a
#pragma HLS INTERFACE s_axilite port=return
int i;
int buff[50];
//memcpy creates a burst access to memory
memcpy(buff,(const int*)a,50*sizeof(int));
for(i=0; i < 50; i++){
buff[i] = buff[i] + 100;
}
memcpy((int *)a,buff,50*sizeof(int));
}
When this example is synthesized, it results in the interface shown in the following figure.
The following example shows the same code as the preceding example but uses a for loop to copy the data out:
void example(volatile int *a){
#pragma HLS INTERFACE m_axi depth=50 port=a
#pragma HLS INTERFACE s_axilite port=return
//Port a is assigned to an AXI4 master interface
int i;
int buff[50];
//memcpy creates a burst access to memory
memcpy(buff,(const int*)a,50*sizeof(int));
for(i=0; i < 50; i++){
buff[i] = buff[i] + 100;
}
for(i=0; i < 50; i++){
#pragma HLS PIPELINE
a[i] = buff[i];
}
}
When using a for loop to implement burst reads or writes, follow these requirements:
- Pipeline the loop
- Access addresses in increasing order
- Do not place accesses inside a conditional statement
- For nested loops, do not flatten loops, because this inhibits the burst operation
Note: Only one read and one write is allowed in a for loop unless the ports are bundled in different AXI ports. The following example shows how to perform two reads in burst mode using different AXI interfaces.
In the following example, Vitis HLS implements the port reads as burst transfers. Port a is specified without using the bundle option and is implemented in the default AXI interface. Port b is specified using a named bundle and is implemented in a separate AXI interface called d2_port.
void example(volatile int *a, int *b){
#pragma HLS INTERFACE s_axilite port=return
#pragma HLS INTERFACE m_axi depth=50 port=a
#pragma HLS INTERFACE m_axi depth=50 port=b bundle=d2_port
int i;
int buff[50];
//copy data in
for(i=0; i < 50; i++){
#pragma HLS PIPELINE
buff[i] = a[i] + b[i];
}
...
}
Offset and Modes of Operation
The AXI4 Master interface has a read/write address channel that can be used to read/write specific addresses. By default the m_axi interface starts all read and write operations from the address 0x00000000. For example, given the following code, the design reads data from addresses 0x00000000 to 0x000000C7 (50 32-bit words, which gives 200 bytes), which represents 50 address values. The design then writes data back to the same addresses.
#include <stdio.h>
#include <string.h>
void example(volatile int *a){
#pragma HLS INTERFACE m_axi port=a depth=50
int i;
int buff[50];
//memcpy creates a burst access to memory
//multiple calls of memcpy cannot be pipelined and will be scheduled sequentially
//memcpy requires a local buffer to store the results of the memory transaction
memcpy(buff,(const int*)a,50*sizeof(int));
for(i=0; i < 50; i++){
buff[i] = buff[i] + 100;
}
memcpy((int *)a,buff,50*sizeof(int));
}
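The address range quoted above follows from simple arithmetic: 50 words of 4 bytes each occupy 200 bytes, so the last byte accessed is at base + 200 - 1. The following sketch illustrates this; the helper name `last_byte_address` is hypothetical.

```c
#include <stdint.h>

/* Address of the last byte touched when `count` elements of
 * `elem_bytes` bytes each are accessed starting at `base`. */
uint32_t last_byte_address(uint32_t base, uint32_t count, uint32_t elem_bytes) {
    return base + count * elem_bytes - 1u;
}
```

With a base of 0x00000000, 50 elements, and 4 bytes per element, the result is 0x000000C7.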
The tool lets the base address be configured either statically, for instance in the Vivado IP, or dynamically by the application or another IP during run time.
The m_axi interface can be both a master initiating transactions, and also a slave interface that receives the data and sends acknowledgment. Depending on the mode specified with the offset option of the INTERFACE pragma, an HLS IP can use multiple approaches to set the base address.
The config_interface -m_axi_offset command provides a global setting for the offset, that can be overridden for specific m_axi interfaces using the INTERFACE pragma offset option.
- Master Mode: When acting as a master interface with different offset options, the m_axi interface start address can be either hard-coded or set at run time.
  - offset=off: Vitis HLS sets a base address for the m_axi interface when the IP is used in the Vivado IP integrator tool. One disadvantage with this approach is that you cannot change the base address during run time. See Customizing AXI4 Master Interfaces in IP Integrator for setting the base address.
    The following example is synthesized with offset=off, the default for the Vivado IP flow.
    void example(volatile int *a){
    #pragma HLS INTERFACE m_axi depth=50 port=a offset=off
    int i;
    int buff[50];
    //memcpy creates a burst access to memory
    //multiple calls of memcpy cannot be pipelined and will be scheduled sequentially
    //memcpy requires a local buffer to store the results of the memory transaction
    memcpy(buff,(const int*)a,50*sizeof(int));
    for(i=0; i < 50; i++){
    buff[i] = buff[i] + 100;
    }
    memcpy((int *)a,buff,50*sizeof(int));
    }
  - offset=direct: Vitis HLS generates a port on the IP for setting the address. Note the addition of the a port as shown in the figure below. This lets you update the address at run time, so you can have one m_axi interface reading and writing different locations. For example, consider an HLS module that reads data from an ADC into RAM, and another HLS module that processes that data. Since you can change the address on the module, while one HLS module is processing the initial dataset the other module can be reading more data into a different address.
    void example(volatile int *a){
    #pragma HLS INTERFACE m_axi depth=50 port=a offset=direct
    ...
    }
- Slave Mode: The slave mode for an interface is set with offset=slave. In this mode the IP will be controlled by the host application, or the micro-controller through the s_axilite interface. This is the default for the Vitis kernel flow, and can also be used in the Vivado IP flow. Here is the flow of operation:
  - Initially, the Host/CPU will start the IP or kernel using the block-level control protocol which is mapped to the s_axilite adapter.
  - The host will send the scalars and address offsets for the m_axi interfaces through the s_axilite adapter.
  - The m_axi adapter will read the start address from the s_axilite adapter and store it in a queue.
  - The HLS design starts to read the data from the global memory.
As shown in the figure below, the HLS design will have both the s_axilite adapter for the base address, and the m_axi to perform read and write transfer to the global memory.
The following are rules associated with the offset option:
- Fully Specified Offset: When the user explicitly sets the offset value the tool uses the specified settings. The user can also set different offset values for different m_axi interfaces in the design, and the tool will use the specified offsets.
  #pragma HLS INTERFACE s_axilite port=return
  #pragma HLS INTERFACE m_axi bundle=BUS_A port=out_r offset=direct
  #pragma HLS INTERFACE m_axi bundle=BUS_B port=in1 offset=slave
  #pragma HLS INTERFACE m_axi bundle=BUS_C port=in2 offset=off
- No Offset Specified: If there are no offsets specified in the INTERFACE pragma, the tool will defer to the setting specified by config_interface -m_axi_offset.
  Note: If the global m_axi_offset setting is specified, and the design has an s_axilite interface, the global setting is ignored and offset=slave is assumed.
  void top(int *a) {
  #pragma HLS interface m_axi port=a
  #pragma HLS interface s_axilite port=a
  }
Controlling the Address Offset in an AXI4 Interface
By default, the AXI4 master interface starts all read and write operations from address 0x00000000. For example, given the following code, the design reads data from addresses 0x00000000 to 0x000000C7 (50 32-bit words, which gives 200 bytes), which represents 50 address values. The design then writes data back to the same addresses.
void example(volatile int *a){
#pragma HLS INTERFACE m_axi depth=50 port=a
#pragma HLS INTERFACE s_axilite port=return bundle=AXILiteS
int i;
int buff[50];
memcpy(buff,(const int*)a,50*sizeof(int));
for(i=0; i < 50; i++){
buff[i] = buff[i] + 100;
}
memcpy((int *)a,buff,50*sizeof(int));
}
To apply an address offset, use the -offset option with the INTERFACE directive, and specify one of the following options:
- off: Does not apply an offset address. This is the default.
- direct: Adds a 32-bit port to the design for applying an address offset.
- slave: Adds a 32-bit register inside the AXI4-Lite interface for applying an address offset.
In the final RTL, Vitis HLS applies the address offset directly to any read or write address generated by the AXI4 master interface. This allows the design to access any address location in the system.
If you use the slave option in an AXI interface, you must use an AXI4-Lite port on the design interface. Xilinx recommends that you implement the AXI4-Lite interface using the following pragma:
#pragma HLS INTERFACE s_axilite port=return
In addition, if you use the slave option and you use several AXI4-Lite interfaces, you must ensure that the AXI master port offset register is bundled into the correct AXI4-Lite interface.
In the following example, port a is implemented as an AXI master interface with an offset and AXI4-Lite interfaces called AXI_Lite_1 and AXI_Lite_2:
#pragma HLS INTERFACE m_axi port=a depth=50 offset=slave
#pragma HLS INTERFACE s_axilite port=return bundle=AXI_Lite_1
#pragma HLS INTERFACE s_axilite port=b bundle=AXI_Lite_2
The following INTERFACE directive is required to ensure that the offset register for port a is bundled into the AXI4-Lite interface called AXI_Lite_1:
#pragma HLS INTERFACE s_axilite port=a bundle=AXI_Lite_1
M_AXI Bundles
Vitis HLS groups function arguments with compatible options into a single m_axi interface adapter. Bundling ports into a single interface helps save FPGA resources by eliminating AXI logic, but it can limit the performance of the kernel because all the memory transfers have to go through a single interface. The m_axi interface has independent READ and WRITE channels, so a single interface can read and write simultaneously, though only at one location. By using multiple bundles, you can increase the bandwidth and throughput of the kernel by creating multiple interfaces to connect to multiple memory banks.
In the following example all the pointer arguments are grouped into a single m_axi adapter using the interface option bundle=BUS_A, and a single s_axilite adapter, also named BUS_A, is added for the m_axi offsets, the scalar argument size, and the function return.
extern "C" {
void vadd(const unsigned int *in1, // Read-Only Vector 1
const unsigned int *in2, // Read-Only Vector 2
unsigned int *out_r, // Output Result
int size // Size in integer
) {
#pragma HLS INTERFACE m_axi bundle=BUS_A port=out_r
#pragma HLS INTERFACE m_axi bundle=BUS_A port=in1
#pragma HLS INTERFACE m_axi bundle=BUS_A port=in2
#pragma HLS INTERFACE s_axilite port=in1
#pragma HLS INTERFACE s_axilite port=in2
#pragma HLS INTERFACE s_axilite port=out_r
#pragma HLS INTERFACE s_axilite port=size
#pragma HLS INTERFACE s_axilite port=return
You can also choose to bundle function arguments into separate interface adapters as shown in the following code. Here the argument in2 is grouped into a separate interface adapter with bundle=BUS_B. This creates a new m_axi interface adapter for port in2.
extern "C" {
void vadd(const unsigned int *in1, // Read-Only Vector 1
const unsigned int *in2, // Read-Only Vector 2
unsigned int *out_r, // Output Result
int size // Size in integer
) {
#pragma HLS INTERFACE m_axi bundle=BUS_A port=out_r
#pragma HLS INTERFACE m_axi bundle=BUS_A port=in1
#pragma HLS INTERFACE m_axi bundle=BUS_B port=in2
#pragma HLS INTERFACE s_axilite port=in1
#pragma HLS INTERFACE s_axilite port=in2
#pragma HLS INTERFACE s_axilite port=out_r
#pragma HLS INTERFACE s_axilite port=size
#pragma HLS INTERFACE s_axilite port=return
The global configuration command config_interface -m_axi_auto_max_ports false will limit the number of interface bundles to the minimum required. It will allow the tool to group compatible ports into a single m_axi interface. The default setting for this command is disabled (false), but you can enable it to maximize bandwidth by creating a separate m_axi adapter for each port.
With m_axi_auto_max_ports disabled, the following are some rules for how the tool handles bundles under different circumstances:
- Default Bundle Name: The tool groups all interface ports with no bundle name into a single m_axi interface port using the tool default name bundle=<default>, and names the RTL port m_axi_<default>. The following pragmas:
  #pragma HLS INTERFACE m_axi port=a depth=50
  #pragma HLS INTERFACE m_axi port=b depth=50
  #pragma HLS INTERFACE m_axi port=c depth=50
  Result in the following messages:
  INFO: [RTGEN 206-500] Setting interface mode on port 'example/gmem' to 'm_axi'.
  INFO: [RTGEN 206-500] Setting interface mode on port 'example/gmem' to 'm_axi'.
  INFO: [RTGEN 206-500] Setting interface mode on port 'example/gmem' to 'm_axi'.
- User-Specified Bundle Names: The tool groups all interface ports with the same user-specified bundle=<string> into the same m_axi interface port, and names the RTL port the value specified by m_axi_<string>. Ports without bundle assignments are grouped into the default bundle as described above. The following pragmas:
  #pragma HLS INTERFACE m_axi port=a depth=50 bundle=BUS_A
  #pragma HLS INTERFACE m_axi port=b depth=50
  #pragma HLS INTERFACE m_axi port=c depth=50
  Result in the following messages:
  INFO: [RTGEN 206-500] Setting interface mode on port 'example/BUS_A' to 'm_axi'.
  INFO: [RTGEN 206-500] Setting interface mode on port 'example/gmem' to 'm_axi'.
  INFO: [RTGEN 206-500] Setting interface mode on port 'example/gmem' to 'm_axi'.
IMPORTANT: If you bundle incompatible interfaces Vitis HLS issues a message and ignores the bundle assignment.
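The naming rule described above can be sketched as a small helper that builds the RTL port name from a bundle name. This is only an illustration: the function name `m_axi_port_name` is hypothetical, and the default name gmem is taken from the tool messages shown above.

```c
#include <stdio.h>
#include <string.h>

/* Writes "m_axi_<bundle>" into `out`. A NULL or empty bundle name falls
 * back to "gmem", the default name seen in the tool messages.
 * Returns the value returned by snprintf. */
int m_axi_port_name(const char *bundle, char *out, size_t out_size) {
    const char *name = (bundle != NULL && bundle[0] != '\0') ? bundle : "gmem";
    return snprintf(out, out_size, "m_axi_%s", name);
}
```

For example, a port assigned with bundle=BUS_A maps to m_axi_BUS_A, while a port with no bundle option maps to m_axi_gmem.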
Controlling AXI4 Burst Behavior
An optimal AXI4 interface is one in which the design never stalls while waiting to access the bus, and after bus access is granted, the bus never stalls while waiting for the design to read/write. To create the optimal AXI4 interface, the following options are provided in the INTERFACE pragma or directive to specify the behavior of the bursts and optimize the efficiency of the AXI4 interface. Refer to Optimizing Burst Transfers for more information on burst transfers.
Some of these options use internal storage to buffer data and may have an impact on area and resources:
- latency: Specifies the expected latency of the AXI4 interface, allowing the design to initiate a bus request a number of cycles (latency) before the read or write is expected. If this figure is too low, the design will be ready too soon and may stall waiting for the bus. If this figure is too high, bus access may be granted but the bus may stall waiting on the design to start the access.
- max_read_burst_length: Specifies the maximum number of data values read during a burst transfer.
- num_read_outstanding: Specifies how many read requests can be made to the AXI4 bus, without a response, before the design stalls. This implies internal storage in the design, a FIFO of size: num_read_outstanding*max_read_burst_length*word_size.
- max_write_burst_length: Specifies the maximum number of data values written during a burst transfer.
- num_write_outstanding: Specifies how many write requests can be made to the AXI4 bus, without a response, before the design stalls. This implies internal storage in the design, a FIFO of size: num_write_outstanding*max_write_burst_length*word_size.
The following example can be used to help explain these options:
#pragma HLS interface m_axi port=input offset=slave bundle=gmem0 \
depth=1024*1024*16/(512/8) \
latency=100 \
num_read_outstanding=32 \
num_write_outstanding=32 \
max_read_burst_length=16 \
max_write_burst_length=16
The interface is specified as having a latency of 100. Vitis HLS seeks to schedule the request for burst access 100 clock cycles before the design is ready to access the AXI4 bus. To further improve bus efficiency, the options num_write_outstanding and num_read_outstanding ensure the design contains enough buffering to store up to 32 read and write accesses. This allows the design to continue processing until the bus requests are serviced. Finally, the options max_read_burst_length and max_write_burst_length ensure the maximum burst size is 16 and that the AXI4 interface does not hold the bus for longer than this.
These options allow the behavior of the AXI4 interface to be optimized for the system in which it will operate. The efficiency of the operation does depend on these values being set accurately.
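The internal storage implied by the outstanding-request options can be estimated from the FIFO size formula given in the option list (outstanding requests × maximum burst length × word size). The following is a minimal sketch; the helper name `outstanding_fifo_bytes` is hypothetical, and the 64-byte word size used in the usage note is an assumption inferred from the (512/8) term in the depth expression of the example pragma.

```c
#include <stdint.h>

/* Estimated FIFO storage in bytes for one direction (read or write):
 * outstanding requests * maximum burst length * word size in bytes. */
uint32_t outstanding_fifo_bytes(uint32_t num_outstanding,
                                uint32_t max_burst_length,
                                uint32_t word_bytes) {
    return num_outstanding * max_burst_length * word_bytes;
}
```

Under the assumed 64-byte word, the example settings of 32 outstanding requests and a burst length of 16 give 32 * 16 * 64 = 32768 bytes of buffering per direction, which shows why these options can have a noticeable impact on area and resources.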
Automatic Port Width Resizing
In the Vitis tool flow, Vitis HLS can automatically resize m_axi interface ports to 512 bits to improve burst access. However, automatic port width resizing only supports standard C data types and does not support other types such as ap_int, ap_uint, struct, or array.
Vitis HLS controls automatic port width resizing using the following two commands:
- config_interface -m_axi_max_widen_bitwidth <N>: Directs the tool to automatically widen bursts on M-AXI interfaces up to the specified bitwidth. The value of <N> must be a power-of-two between 0 and 1024.
- config_interface -m_axi_alignment_byte_size <N>: Directs the tool to assume that pointers mapped to m_axi interfaces are aligned to at least the specified width in bytes (a power of two). Burst widening requires strong alignment properties, so this setting can help automatic burst widening.
config_interface -m_axi_max_widen_bitwidth 512
config_interface -m_axi_alignment_byte_size 64
config_interface -m_axi_max_widen_bitwidth 0
config_interface -m_axi_alignment_byte_size 0
Automatic port width resizing will only re-size the port if a burst access can be seen by the tool. Therefore all the preconditions needed for bursting, as described in Optimizing Burst Transfers, are also needed for port resizing. These conditions include:
- The accesses must be in monotonically increasing order (both in terms of the memory location being accessed and in time). You cannot access a memory location that lies between two previously accessed memory locations; that is, there must be no overlap.
- The access pattern from the global memory should be in sequential
order, and with the following additional requirements:
- The sequential accesses need to be on a non-vector type
- The start of the sequential accesses needs to be aligned to the widen word size
- The length of the sequential accesses needs to be divisible by the widen factor
The following code example is used in the calculations that follow:
vadd_pipeline:
for (int i = 0; i < iterations; i++) {
#pragma HLS LOOP_TRIPCOUNT min = c_len/c_n max = c_len/c_n
// Pipelining loops that access only one variable is the ideal way to
// increase the global memory bandwidth.
read_a:
for (int x = 0; x < N; ++x) {
#pragma HLS LOOP_TRIPCOUNT min = c_n max = c_n
#pragma HLS PIPELINE II = 1
result[x] = a[i * N + x];
}
read_b:
for (int x = 0; x < N; ++x) {
#pragma HLS LOOP_TRIPCOUNT min = c_n max = c_n
#pragma HLS PIPELINE II = 1
result[x] += b[i * N + x];
}
write_c:
for (int x = 0; x < N; ++x) {
#pragma HLS LOOP_TRIPCOUNT min = c_n max = c_n
#pragma HLS PIPELINE II = 1
c[i * N + x] = result[x];
}
}
}
}
The automatic width optimization for the code above is performed in three steps:
- First, the tool checks the number of access patterns in the read_a loop. There is one access during one loop iteration, so the optimization determines the interface bit width as 32 = 32 * 1 (bit width of the int variable * accesses).
- The tool tries to reach the default maximum specified by config_interface -m_axi_max_widen_bitwidth 512, using the following expression:
length = (ceil((loop-bound of index inner loops) * (loop-bound of index outer loops)) * #(of access-patterns))
- In the above code, the outer loop is an imperfect loop, so there will not be burst transfers on the outer loop. Therefore the length only includes the inner loop, and the formula is shortened to:
length = (ceil((loop-bound of index inner loops)) * #(of access-patterns))
or: length = ceil(128) * 32 = 4096
- Finally, the tool checks whether the calculated length is a power of 2. If it is, the length is capped to the width specified by m_axi_max_widen_bitwidth.
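The three steps above can be sketched as plain arithmetic. The helper below is a hypothetical illustration and not part of Vitis HLS: the names widened_port_bits and is_power_of_two are invented here, and the behavior when the length is not a power of two (keeping the un-widened width) is an assumption, because the text only describes the power-of-two case.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers that mirror the arithmetic of the three steps above.
// They illustrate the calculation only, not the tool's actual algorithm.
constexpr bool is_power_of_two(std::uint64_t v) {
    return v != 0 && (v & (v - 1)) == 0;
}

constexpr std::uint64_t widened_port_bits(std::uint64_t element_bits,
                                          std::uint64_t accesses_per_iteration,
                                          std::uint64_t inner_loop_bound,
                                          std::uint64_t max_widen_bits) {
    // Step 1: interface width implied by a single loop iteration.
    const std::uint64_t base_bits = element_bits * accesses_per_iteration;
    // A maximum of 0 is treated as "widening disabled".
    if (max_widen_bits == 0) {
        return base_bits;
    }
    // Step 2: length over the inner loop only (the outer loop is imperfect).
    const std::uint64_t length = inner_loop_bound * base_bits;
    // Step 3: a power-of-two length is capped at the configured maximum.
    if (is_power_of_two(length)) {
        return length < max_widen_bits ? length : max_widen_bits;
    }
    // Assumption: otherwise the port keeps its original width.
    return base_bits;
}
```

With the values from the example (32-bit int, one access, inner loop bound of 128, maximum of 512), the length is 4096 bits and the result is capped to 512.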
There are some pros and cons to using the automatic port width resizing which you should consider when using this feature. This feature improves the read latency from the DDR as the tool is reading a big vector, instead of the data type size. It also adds more resources as it needs to buffer the huge vector and shift the data accordingly to the data path size.
Creating an AXI4 Interface with 32-bit Address Capability
To implement the AXI4 interface with a 32-bit address bus, disable the m_axi_addr64 interface configuration option as follows:
- Select .
- In the Solution Settings dialog box, click the General category, and Edit the existing config_interface command, or click Add to add one.
- In the Edit or Add dialog box, select config_interface, and disable m_axi_addr64.
Customizing AXI4 Master Interfaces in IP Integrator
When you incorporate an HLS RTL design that uses an AXI4 master interface into a design in the Vivado IP integrator, you can customize the block. From the block diagram in IP integrator, select the HLS block, right-click, and select Customize Block to customize any of the settings provided. A complete description of the AXI4 parameters is provided in the Vivado Design Suite: AXI Reference Guide (UG1037).
The following figure shows the Re-Customize IP dialog box for the design shown below. This design includes an AXI4-Lite port.
AXI4-Stream Interfaces
An AXI4-Stream interface can be applied to any input argument and any array or pointer output argument. Because an AXI4-Stream interface transfers data in a sequential streaming manner, it cannot be used with arguments that are both read and written. In terms of data layout, the data type of the AXI4-Stream is aligned to the next byte. For example, if the size of the data type is 12 bits, it will be extended to 16 bits. Depending on whether a signed/unsigned interface is selected, the extended bits are either sign-extended or zero-extended. If the stream data type is a user-defined struct, the struct is aggregated and aligned to the size of the largest data element within the struct.
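The byte alignment and extension rule described above can be illustrated with ordinary integer arithmetic. This is a minimal host-side sketch, not Vitis HLS code: the function names are hypothetical, and a 16-bit container is assumed so that the 12-bit example from the text fits.

```cpp
#include <cassert>
#include <cstdint>

// Round a bit width up to the next whole byte, for example 12 -> 16.
constexpr unsigned aligned_bits(unsigned width) {
    return ((width + 7u) / 8u) * 8u;
}

// Zero-extend the low `width` bits (1..16) into a 16-bit container.
constexpr std::uint16_t zero_extend(std::uint16_t value, unsigned width) {
    return static_cast<std::uint16_t>(value & ((1u << width) - 1u));
}

// Sign-extend: replicate bit (width - 1) into the upper bits of the container.
constexpr std::uint16_t sign_extend(std::uint16_t value, unsigned width) {
    const std::uint32_t mask = (1u << width) - 1u;
    const std::uint32_t low = value & mask;
    const std::uint32_t sign = 1u << (width - 1u);
    return static_cast<std::uint16_t>((low ^ sign) - sign);
}
```

For a 12-bit value of 0xFFF, the unsigned case yields 0x0FFF in the 16-bit container, while the signed case yields 0xFFFF.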
The following code examples show how the packed alignment depends on your struct type. If the struct contains only char type, as shown in the following example, then it will be packed with alignment of one byte. Total size of the struct will be two bytes:
struct A {
char foo;
char bar;
};
However, if the struct has elements with different data types, as shown below, then it will be packed and aligned to the size of the largest data element, or four bytes in this example. Element bar will be padded with three bytes, resulting in a total size of eight bytes for the struct:
struct A {
int foo;
char bar;
};
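The two sizes quoted above can be checked on a host compiler. This sketch assumes a typical ABI where int is four bytes and naturally aligned, which matches the layout described in the text; the struct names OnlyChars and Mixed are renamed copies of the two examples so that both can coexist in one file.

```cpp
#include <cassert>
#include <cstddef>

// Renamed copies of the two example structs (names are illustrative).
struct OnlyChars {
    char foo;
    char bar;
};

struct Mixed {
    int foo;
    char bar;
};

// Padding added after `bar` so that the struct size is a multiple of
// the alignment of its largest member.
constexpr std::size_t mixed_padding_bytes =
    sizeof(Mixed) - sizeof(int) - sizeof(char);

// Assumption: 4-byte int, as on common 32-bit and 64-bit host platforms.
static_assert(sizeof(int) == 4, "sketch assumes a 4-byte int");
```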
By default, user-defined structs in streams are aggregated. However, you can disaggregate the struct and infer a stream for each element of the struct, using the following steps:
- Specify the DISAGGREGATE pragma or directive for the struct.
- Specify the AXI4-Stream INTERFACE pragma or directive for each element of the disaggregated struct.
The result will be one AXI4-Stream for every member of the struct in the interface.
How AXI4-Stream is Implemented
The AXI4-Stream interface is implemented as a struct type in Vitis HLS and has the following signature (defined in ap_axi_sdata.h):
template <typename T, size_t WUser, size_t WId, size_t WDest> struct axis { .. };
Where:
T
- Stream data type
WUser
- Width of the TUSER signal
WId
- Width of the TID signal
WDest
- Width of the TDEST signal
When the stream data type (T) is a simple integer type, there are two predefined types of AXI4-Stream implementations available:
- A signed implementation of the AXI4-Stream class (or more simply ap_axis<WData, WUser, WId, WDest>): hls::axis<ap_int<WData>, WUser, WId, WDest>
- An unsigned implementation of the AXI4-Stream class (or more simply ap_axiu<WData, WUser, WId, WDest>): hls::axis<ap_uint<WData>, WUser, WId, WDest>
The values specified for the WUser, WId, and WDest template parameters control the usage of side-channel signals in the AXI4-Stream interface.
When the hls::axis class is used, the generated RTL will typically contain the actual data signal TDATA, and the following additional signals: TVALID, TREADY, TKEEP, TSTRB, TLAST, TUSER, TID, and TDEST.
TVALID, TREADY, and TLAST are necessary control signals for the AXI4-Stream protocol. TKEEP, TSTRB, TUSER, TID, and TDEST signals are special signals that can be used to pass around additional bookkeeping data.
If WUser, WId, and WDest are set to 0, the generated RTL will not include the TUSER, TID, and TDEST signals in the interface.
How AXI4-Stream Works
AXI4-Stream is a protocol designed for transporting arbitrary unidirectional data. In an AXI4-Stream, one TDATA-width word is transferred per clock cycle. A transfer takes place once the producer asserts the TVALID signal and the consumer responds by asserting the TREADY signal. The producer sends TDATA and TLAST (and TUSER if needed to carry additional user-defined sideband data). TLAST marks the last transfer of the stream, so the consumer keeps consuming the incoming TDATA until TLAST is asserted.
AXI4-Stream has additional optional features, such as sending positional data with the TKEEP and TSTRB ports, which makes it possible to multiplex both the data position and the data itself on the TDATA signal. Using the TID and TDEST signals, you can route streams, as these fields roughly correspond to the stream identifier and the stream destination identifier. Refer to Vivado Design Suite: AXI Reference Guide (UG1037) or the AMBA AXI4-Stream Protocol Specification (ARM IHI 0051A) for more information.
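The handshake described above can be modeled at the cycle level in plain C++. The following is a simplified behavioral sketch, not an RTL-accurate model: the names Beat, StreamResult, and transfer_packet are hypothetical, the producer is assumed to hold TVALID High for the whole packet, and the consumer's TREADY follows a repeating pattern supplied by the caller.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Beat {
    std::uint32_t tdata;
    bool tlast;  // High only on the final beat of the packet
};

struct StreamResult {
    std::vector<std::uint32_t> received;
    std::size_t cycles;
};

// Simulate one packet. ready_pattern[c % size] is TREADY in cycle c.
inline StreamResult transfer_packet(const std::vector<std::uint32_t>& payload,
                                    const std::vector<bool>& ready_pattern) {
    StreamResult result{{}, 0};
    // Nothing to send, or a consumer that is never ready: no transfer occurs.
    if (payload.empty() ||
        std::find(ready_pattern.begin(), ready_pattern.end(), true) ==
            ready_pattern.end()) {
        return result;
    }
    std::size_t next = 0;
    bool done = false;
    while (!done) {
        const bool tvalid = true;  // producer always has data in this sketch
        const bool tready = ready_pattern[result.cycles % ready_pattern.size()];
        const Beat beat{payload[next], next + 1 == payload.size()};
        if (tvalid && tready) {    // a beat moves only when both are High
            result.received.push_back(beat.tdata);
            done = beat.tlast;     // consumer stops after seeing TLAST
            ++next;
        }
        ++result.cycles;
    }
    return result;
}
```

A consumer that is ready every cycle receives a three-word packet in three cycles; a consumer that is ready every other cycle needs six.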
Registered AXI4-Stream Interfaces
As a default, AXI4-Stream interfaces are always implemented as registered interfaces to ensure that no combinational feedback paths are created when multiple HLS IP blocks with AXI4-Stream interfaces are integrated into a larger design. For AXI4-Stream interfaces, four types of register modes are provided to control how the interface registers are implemented:
- Forward
- Only the TDATA and TVALID signals are registered.
- Reverse
- Only the TREADY signal is registered.
- Both
- All signals (TDATA, TREADY, and TVALID) are registered. This is the default.
- Off
- None of the port signals are registered.
The AXI4-Stream side-channel signals are considered to be data signals and are registered whenever TDATA is registered.
There are two basic methods to use an AXI4-Stream in your design:
- Use an AXI4-Stream without side-channels.
- Use an AXI4-Stream with side-channels.
This second use model provides additional functionality, allowing the optional side-channels which are part of the AXI4-Stream standard, to be used directly in your C/C++ code.
AXI4-Stream Interfaces without Side-Channels
An AXI4-Stream is used without side-channels when the function argument, of ap_axis or ap_axiu data type, does not contain any AXI4 side-channel elements (that is, when the WUser, WId, and WDest parameters are set to 0). In the following example, both interfaces are implemented using an AXI4-Stream:
#include "ap_axi_sdata.h"
#include "hls_stream.h"
typedef ap_axiu<32, 0, 0, 0> trans_pkt;
void example(hls::stream< trans_pkt > &A, hls::stream< trans_pkt > &B)
{
#pragma HLS INTERFACE axis port=A
#pragma HLS INTERFACE axis port=B
trans_pkt tmp;
A.read(tmp);
tmp.data += 5;
B.write(tmp);
}
After synthesis, both arguments are implemented with a data port (TDATA) and the standard AXI4-Stream protocol ports, TVALID, TREADY, TKEEP, TLAST, and TSTRB, as shown in the following figure.
If you specify an hls::stream object with a data type other than ap_axis or ap_axiu, the tool will infer an AXI4-Stream interface without the TLAST signal or any of the side-channel signals. This implementation of the AXI4-Stream interface consumes fewer device resources, but offers no visibility into when the stream is ending.
Multiple variables can be combined into the same AXI4-Stream interface by using a struct, which is aggregated by Vitis HLS by default. Aggregating the elements of a struct into a single wide vector allows all elements of the struct to be implemented in the same AXI4-Stream interface.
AXI4-Stream Interfaces with Side-Channels
The following example shows how the side-channels can be used directly in the C/C++ code and implemented on the interface. The code includes ap_axi_sdata.h, which provides an API to handle the side-channels of the AXI4-Stream interface. In the following example, an unsigned 32-bit data type (ap_axiu) is used:
#include "ap_axi_sdata.h"
#include "ap_int.h"
#include "hls_stream.h"
#define DWIDTH 32
typedef ap_axiu<DWIDTH, 1, 1, 1> trans_pkt;
extern "C"{
void krnl_stream_vmult(hls::stream<trans_pkt> &A,
hls::stream<trans_pkt> &B) {
#pragma HLS INTERFACE axis port=A
#pragma HLS INTERFACE axis port=B
#pragma HLS INTERFACE s_axilite port=return bundle=control
bool eos = false;
vmult: do {
#pragma HLS PIPELINE II=1
trans_pkt t2 = A.read();
// Packet for Output
trans_pkt t_out;
// Reading data from input packet
ap_uint<DWIDTH> in2 = t2.data;
ap_uint<DWIDTH> tmpOut = in2 * 5;
// Setting data and configuration to output packet
t_out.data = tmpOut;
t_out.last = t2.last;
t_out.keep = -1; //Enabling all bytes
// Writing packet to output stream
B.write(t_out);
if (t2.last) {
eos = true;
}
} while (eos == false);
}
}
After synthesis, both the A and B arguments are implemented with data ports, the standard AXI4-Stream protocol ports TVALID and TREADY, and all of the optional ports described in the struct.
Port-Level I/O Protocols
By default, input pointers and pass-by-value arguments are implemented as simple wire ports with no associated handshaking signal. For example, in the sum_io function discussed in Default Interfaces for Vivado IP Flow, the input ports are implemented without an I/O protocol, only a data port. If the port has no I/O protocol (by default or by design), the input data must be held stable until it is read.
In the sum_io function example, the output port is implemented with an associated output valid port (sum_o_ap_vld) which indicates when the data on the port is valid and can be read. If there is no I/O protocol associated with the output port, it is difficult to know when to read the data.
Function arguments which are both read from and written to are split into separate input and output ports. In the sum_io function example, the sum argument is implemented as both an input port sum_i and an output port sum_o with associated I/O protocol port sum_o_ap_vld.
If the function has a return value, an output port ap_return is implemented to provide the return value. When the RTL design completes one transaction, which is equivalent to one execution of the C/C++ function, the block-level protocol indicates that the function is complete with the ap_done signal. This also indicates the data on port ap_return is valid and can be read.
For the example code shown the timing behavior is shown in the following figure (assuming that the target technology and clock frequency allow a single addition per clock cycle).
- The design starts when ap_start is asserted High.
- The ap_idle signal is asserted Low to indicate the design is operating.
- The input data is read at any clock after the first cycle. Vitis HLS schedules when the reads occur. The ap_ready signal is asserted High when all inputs have been read.
- When output sum is calculated, the associated output handshake (sum_o_ap_vld) indicates that the data is valid.
- When the function completes, ap_done is asserted. This also indicates that the data on ap_return is valid.
- Port ap_idle is asserted High to indicate that the design is waiting to start again.
Port-Level I/O: No Protocol
The ap_none mode specifies that no I/O protocol be added to the port. When this is specified, the argument is implemented as a data port with no other associated signals. The ap_none mode is the default for scalar inputs.
ap_none
The ap_none port-level I/O protocol is the simplest interface type and has no other signals associated with it. Neither the input nor output data signals have associated control ports that indicate when data is read or written. The only ports in the RTL design are those specified in the source code.
An ap_none interface does not require additional hardware overhead. However, the ap_none interface does require the following:
- Producer blocks to do one of the following:
- Provide data to the input port at the correct time
- Hold data for the length of a transaction until the design completes
- Consumer blocks to read output ports at the correct time
The ap_none interface cannot be used with array arguments.
Port-Level I/O: Wire Handshakes
Interface mode ap_hs includes a two-way handshake signal with the data port. The handshake is an industry standard valid and acknowledge handshake. Mode ap_vld is the same but only has a valid port, and ap_ack only has an acknowledge port.
Mode ap_ovld is for use with in-out arguments. When the in-out is split into separate input and output ports, mode ap_none is applied to the input port and ap_vld applied to the output port. This is the default for pointer arguments that are both read and written.
The ap_hs mode can be applied to arrays that are read or written in sequential order. If Vitis HLS can determine the read or write accesses are not sequential, it will halt synthesis with an error. If the access order cannot be determined, Vitis HLS will issue a warning.
ap_hs (ap_ack, ap_vld, and ap_ovld)
The ap_hs port-level I/O protocol provides the greatest flexibility in the development process, allowing both bottom-up and top-down design flows. Two-way handshakes safely perform all intra-block communication, and manual intervention or assumptions are not required for correct operation. The ap_hs port-level I/O protocol provides the following signals:
- Data port
- Valid signal to indicate when the data signal is valid and can be read
- Acknowledge signal to indicate when the data has been read
The following figure shows how an ap_hs interface behaves for both an input and output port. In this example, the input port is named in, and the output port is named out.
For inputs, the following occurs:
- After start is applied, the block begins normal operation.
- If the design is ready for input data but the input valid is Low, the design stalls and waits for the input valid to be asserted to indicate a new input value is present.
Note: The preceding figure shows this behavior. In this example, the design is ready to read data input in on clock cycle 4 and stalls waiting for the input valid before reading the data.
- When the input valid is asserted High, an output acknowledge is asserted High to indicate the data was read.
For outputs, the following occurs:
- After start is applied, the block begins normal operation.
- When an output port is written to, its associated output valid signal is simultaneously asserted to indicate valid data is present on the port.
- If the associated input acknowledge is Low, the design stalls and waits for the input acknowledge to be asserted.
- When the input acknowledge is asserted, indicating the data has been read, the output valid is deasserted on the next clock edge.
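The stall behavior in the two lists above can be sketched as a cycle-level model. This is an illustrative host-side sketch only: the function names are hypothetical, and each vector holds one sampled signal value per clock cycle.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Input port: the design is ready to read from `ready_cycle` onward and
// stalls until the input valid is High. Returns the cycle in which the data
// is read (the cycle where the output acknowledge is High), or -1 if the
// valid never arrives within the supplied waveform.
inline int ap_hs_input_read_cycle(const std::vector<bool>& in_vld,
                                  std::size_t ready_cycle) {
    for (std::size_t c = ready_cycle; c < in_vld.size(); ++c) {
        if (in_vld[c]) {
            return static_cast<int>(c);
        }
    }
    return -1;
}

// Output port: the design writes in `write_cycle`, asserting the output
// valid, and stalls until the input acknowledge is High. Returns the number
// of stall cycles, or -1 if the acknowledge never arrives.
inline int ap_hs_output_stall_cycles(const std::vector<bool>& out_ack,
                                     std::size_t write_cycle) {
    for (std::size_t c = write_cycle; c < out_ack.size(); ++c) {
        if (out_ack[c]) {
            return static_cast<int>(c - write_cycle);
        }
    }
    return -1;
}
```

For instance, a design that is ready on cycle 4 but only sees the input valid on cycle 6 reads the data on cycle 6, having stalled for two cycles.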
ap_ack
The ap_ack port-level I/O protocol is a subset of the ap_hs interface type. The ap_ack port-level I/O protocol provides the following signals:
- Data port
- Acknowledge signal to indicate when data is consumed
- For input arguments, the design generates an output acknowledge that is active-High in the cycle the input is read.
- For output arguments, Vitis HLS implements an input acknowledge port to confirm the output was read.
Note: After a write operation, the design stalls and waits until the input acknowledge is asserted High, which indicates the output was read by a consumer block. However, there is no associated output port to indicate when the data can be consumed.
ap_vld
The ap_vld port-level I/O protocol is a subset of the ap_hs interface type. The ap_vld port-level I/O protocol provides the following signals:
- Data port
- Valid signal to indicate when the data signal is valid and can be read
- For input arguments, the design reads the data port as soon as the valid is active. Even if the design is not ready to read new data, the design samples the data port and holds the data internally until needed.
- For output arguments, Vitis HLS implements an output valid port to indicate when the data on the output port is valid.
ap_ovld
The ap_ovld port-level I/O protocol is a subset of the ap_hs interface type. The ap_ovld port-level I/O protocol provides the following signals:
- Data port
- Valid signal to indicate when the data signal is valid and can be read
- For input arguments and the input half of inout arguments, the design defaults to type ap_none.
- For output arguments and the output half of inout arguments, the design implements type ap_vld.
Port-Level I/O: Memory Interface Protocol
Array arguments are implemented by default as an ap_memory interface. This is a standard block RAM interface with data, address, chip-enable, and write-enable ports.
An ap_memory interface can be implemented as a single-port or dual-port interface. If Vitis HLS can determine that using a dual-port interface will reduce the initiation interval, it will automatically implement a dual-port interface. The BIND_STORAGE pragma or directive is used to specify the memory resource, and if this directive is specified on the array with a single-port block RAM, a single-port interface will be implemented. Conversely, if a dual-port interface is specified using the BIND_STORAGE pragma and Vitis HLS determines this interface provides no benefit, it will automatically implement a single-port interface.
If the array is accessed in a sequential manner an ap_fifo interface can be used. As with the ap_hs interface, Vitis HLS will halt if it determines the data access is not sequential, report a warning if it cannot determine if the access is sequential or issue no message if it determines the access is sequential. The ap_fifo interface can only be used for reading or writing, not both.
ap_memory, bram
The ap_memory and bram interface port-level I/O protocols are used to implement array arguments. This type of port-level I/O protocol can communicate with memory elements (for example, RAMs and ROMs) when the implementation requires random accesses to the memory address locations.
The ap_memory and bram interface port-level I/O protocols are identical. The only difference is the way Vivado IP integrator shows the blocks:
- The ap_memory interface appears as discrete ports.
- The bram interface appears as a single, grouped port. In IP integrator, you can use a single connection to create connections to all ports.
When using an ap_memory interface, specify the array targets using the BIND_STORAGE pragma. If no target is specified for the arrays, Vitis HLS determines whether to use a single or dual-port RAM interface.
The following figure shows an array named d specified as a single-port block RAM. The port names are based on the C/C++ function argument. For example, if the C/C++ argument is d, the chip-enable is d_ce, and the input data is d_q0 based on the output/q port of the BRAM.
After reset, the following occurs:
- After start is applied, the block begins normal operation.
- Reads are performed by applying an address on the output address ports while asserting the output signal d_ce.
Note: For a default block RAM, the design expects the input data d_q0 to be available in the next clock cycle. You can use the BIND_STORAGE pragma to indicate the RAM has a longer read latency.
- Write operations are performed by asserting output ports d_ce and d_we while simultaneously applying the address and output data d_d0.
ap_fifo
The ap_fifo interface is the most hardware-efficient approach when the design requires access to a memory element and the access is always performed in a sequential manner, that is, no random access is required. The ap_fifo port-level I/O protocol supports the following:
- Allows the port to be connected to a FIFO
- Enables complete, two-way empty-full communication
- Works for arrays, pointers, and pass-by-reference argument types
Note: To understand the importance of the volatile qualifier when a pointer is accessed multiple times, see Multi-Access Pointers on the Interface.
In the following example, in1 is a pointer that accesses the current address, then two addresses above the current address, and finally one address below.
void foo(int* in1, ...) {
int data1, data2, data3;
...
data1= *in1;
data2= *(in1+2);
data3= *(in1-1);
...
}
If in1 is specified as an ap_fifo interface, Vitis HLS checks the accesses, determines the accesses are not in sequential order, issues an error, and halts. To read from non-sequential address locations, use an ap_memory or bram interface.
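The idea behind the sequential-order check can be illustrated with a small host-side helper. This is a hypothetical sketch and not how Vitis HLS analyzes code: the function name is invented, and "sequential" is assumed here to mean that every access is exactly one element after the previous one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true when each offset is exactly one element after the previous
// one. Assumption: this strict +1 rule stands in for the tool's notion of
// sequential access on an ap_fifo interface.
inline bool is_sequential(const std::vector<long>& offsets) {
    for (std::size_t i = 1; i < offsets.size(); ++i) {
        if (offsets[i] != offsets[i - 1] + 1) {
            return false;
        }
    }
    return true;
}
```

The accesses in foo use offsets 0, +2, and -1 relative to in1, so they fail this check, whereas offsets 0, 1, 2 would pass.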
You cannot specify an ap_fifo interface on an argument that is both read from and written to. You can only specify an ap_fifo interface on an input or an output argument. A design with input argument in and output argument out specified as ap_fifo interfaces behaves as shown in the following figure.
For inputs, the following occurs:
- After ap_start is applied, the block begins normal operation.
- If the input port is ready to be read but the FIFO is empty as indicated by input port in_empty_n Low, the design stalls and waits for data to become available.
- When the FIFO contains data as indicated by input port in_empty_n High, an output acknowledge in_read is asserted High to indicate the data was read in this cycle.
For outputs, the following occurs:
- After start is applied, the block begins normal operation.
- If an output port is ready to be written to but the FIFO is full as indicated by out_full_n Low, the data is placed on the output port but the design stalls and waits for the space to become available in the FIFO.
- When space becomes available in the FIFO as indicated by out_full_n High, the output acknowledge signal out_write is asserted to indicate the output data is valid.
- If the top-level function or the top-level loop is pipelined using the -rewind option, Vitis HLS creates an additional output port with the suffix _lwr. When the last write to the FIFO interface completes, the _lwr port goes active-High.
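The empty and full signaling described above can be modeled with a bounded queue. The class below is a hypothetical behavioral sketch, not generated RTL: empty_n() and full_n() mirror the active-Low naming of the in_empty_n and out_full_n ports, and a rejected read or write stands in for a stall.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Bounded FIFO with ap_fifo-style status flags (illustrative model only).
class FifoModel {
public:
    explicit FifoModel(std::size_t depth) : depth_(depth) {}

    // High when the FIFO holds data, so a read can proceed.
    bool empty_n() const { return !data_.empty(); }
    // High when the FIFO has space, so a write can proceed.
    bool full_n() const { return data_.size() < depth_; }

    // Returns false when the FIFO is full; the writer would stall.
    bool write(int value) {
        if (!full_n()) {
            return false;
        }
        data_.push_back(value);
        return true;
    }

    // Returns false when the FIFO is empty; the reader would stall.
    bool read(int& value) {
        if (!empty_n()) {
            return false;
        }
        value = data_.front();
        data_.pop_front();
        return true;
    }

private:
    std::size_t depth_;
    std::deque<int> data_;
};
```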
Block-Level I/O Protocols
The ap_ctrl_hs block-level I/O protocol is the default for the Vivado IP flow. Default Interfaces for Vivado IP Flow shows the resulting RTL ports and behavior when Vitis HLS implements ap_ctrl_hs on a function.
The ap_ctrl_chain control protocol is similar to ap_ctrl_hs but provides an additional input signal, ap_continue, to apply back pressure. Xilinx recommends using the ap_ctrl_chain block-level I/O protocol when chaining Vitis HLS blocks together; it is the default for the Vitis Kernel flow. Refer to Supported Kernel Execution Models for more information on how XRT uses these control protocols.
ap_ctrl_hs
The following figure shows the behavior of the block-level handshake signals created by the ap_ctrl_hs I/O protocol for a non-pipelined design.
After reset, the following occurs:
- The block waits for ap_start to go High before it begins operation.
- Output ap_idle goes Low immediately to indicate the design is no longer idle.
- The ap_start signal must remain High until ap_ready goes High. Once ap_ready goes High:
- If ap_start remains High the design will start the next transaction.
- If ap_start is taken Low, the design will complete the current transaction and halt operation.
- Data can be read on the input ports.
- Data can be written to the output ports.
Note: The input and output ports can also specify a port-level I/O protocol that is independent of this block-level I/O protocol. For details, see Port-Level I/O Protocols.
- Output ap_done goes High when the block completes operation.
Note: If there is an ap_return port, the data on this port is valid when ap_done is High. Therefore, the ap_done signal also indicates when the data on output ap_return is valid.
- When the design is ready to accept new inputs, the ap_ready signal goes High. Following is additional information about the ap_ready signal:
- The ap_ready signal is inactive until the design starts operation.
- In non-pipelined designs, the ap_ready signal is asserted at the same time as ap_done.
- In pipelined designs, the ap_ready signal might go High at any cycle after ap_start is sampled High. This depends on how the design is pipelined.
- If the ap_start signal is Low when ap_ready is High, the design executes until ap_done is High and then stops operation.
- If the ap_start signal is High when ap_ready is High, the next transaction starts immediately, and the design continues to operate.
- The ap_idle signal indicates when the design is idle and not operating. Following is additional information about the ap_idle signal:
- If the ap_start signal is Low when ap_ready is High, the design stops operation, and the ap_idle signal goes High one cycle after ap_done.
- If the ap_start signal is High when ap_ready is High, the design continues to operate, and the ap_idle signal remains Low.
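The non-pipelined handshake described above can be approximated with a small state machine. This is a simplified behavioral sketch under stated assumptions, not the generated RTL: the names CtrlOut and ApCtrlHsModel are hypothetical, the block takes a fixed number of cycles per transaction, and the model does not check that ap_start is held High until ap_ready.

```cpp
#include <cassert>

struct CtrlOut {
    bool ap_idle;
    bool ap_ready;
    bool ap_done;
};

// Non-pipelined block with a fixed latency in clock cycles (assumed model).
class ApCtrlHsModel {
public:
    explicit ApCtrlHsModel(unsigned latency)
        : latency_(latency == 0 ? 1 : latency) {}

    // Advance one clock cycle with the given ap_start value.
    CtrlOut tick(bool ap_start) {
        if (remaining_ == 0) {
            if (!ap_start) {
                return CtrlOut{true, false, false};  // idle, waiting for start
            }
            remaining_ = latency_;  // start: ap_idle goes Low immediately
        }
        --remaining_;
        // In a non-pipelined design ap_ready and ap_done rise together.
        const bool finished = (remaining_ == 0);
        return CtrlOut{false, finished, finished};
    }

private:
    unsigned latency_;
    unsigned remaining_ = 0;
};
```

If ap_start is Low in the cycle after ap_done, the model reports ap_idle High; if ap_start is still High, the next transaction begins and ap_idle stays Low.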
ap_ctrl_chain
The ap_ctrl_chain block-level I/O protocol is similar to the ap_ctrl_hs protocol but provides an additional input port named ap_continue. An active-High ap_continue signal indicates that the downstream block that consumes the output data is ready for new data inputs. If the downstream block is not able to consume new data inputs, the ap_continue signal is Low, which prevents upstream blocks from generating additional data.
The ap_ready port of the downstream block can directly drive the ap_continue port. Following is additional information about the ap_continue port:
- If the ap_continue signal is High when ap_done is High, the design continues operating. The behavior of the other block-level I/O signals is identical to those described in the ap_ctrl_hs block-level I/O protocol.
- If the ap_continue signal is Low when ap_done is High, the design stops operating, the ap_done signal remains High, and data remains valid on the ap_return port if the ap_return port is present.
In the following figure, the first transaction completes, and the second transaction starts immediately because ap_continue is High when ap_done is High. However, the design halts at the end of the second transaction until ap_continue is asserted High.
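The effect of ap_continue can be sketched by extending a simple block model with a state that holds ap_done. As before, this is an assumed behavioral model rather than the generated RTL: ChainOut and ApCtrlChainModel are hypothetical names, latency is fixed, and ap_ready is modeled as a single-cycle pulse.

```cpp
#include <cassert>

struct ChainOut {
    bool ap_idle;
    bool ap_ready;
    bool ap_done;
};

// Non-pipelined block with back pressure from ap_continue (assumed model).
class ApCtrlChainModel {
public:
    explicit ApCtrlChainModel(unsigned latency)
        : latency_(latency == 0 ? 1 : latency) {}

    // Advance one clock cycle with the given control inputs.
    ChainOut tick(bool ap_start, bool ap_continue) {
        if (holding_done_) {
            // Output not yet consumed: ap_done stays High until ap_continue
            // is sampled High, then the block is released.
            if (ap_continue) {
                holding_done_ = false;
            }
            return ChainOut{false, false, true};
        }
        if (remaining_ == 0) {
            if (!ap_start) {
                return ChainOut{true, false, false};  // idle
            }
            remaining_ = latency_;  // begin a new transaction
        }
        --remaining_;
        const bool finished = (remaining_ == 0);
        if (finished && !ap_continue) {
            holding_done_ = true;  // downstream not ready: stop and hold
        }
        return ChainOut{false, finished, finished};
    }

private:
    unsigned latency_;
    unsigned remaining_ = 0;
    bool holding_done_ = false;
};
```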
ap_ctrl_none
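If you specify the ap_ctrl_none block-level I/O protocol, the handshake signal ports (ap_start, ap_idle, ap_ready, and ap_done) are not created. You can use this protocol to create a block without control signals, as used in free-running kernels.
A design that uses ap_ctrl_none must meet at least one of the conditions listed in the following message; otherwise, C/RTL co-simulation halts with this error: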
@E [SIM-345] Cosim only supports the following 'ap_ctrl_none' designs: (1)
combinational designs; (2) pipelined design with task interval of 1; (3) designs with
array streaming or hls_stream ports.
@E [SIM-4] *** C/RTL co-simulation finished: FAIL ***
Managing Interfaces with SSI Technology Devices
Certain Xilinx devices use stacked silicon interconnect (SSI) technology. In these devices, the total available resources are divided over multiple super logic regions (SLRs). The connections between SLRs use super long line (SLL) routes. SLL routes incur delay costs that are typically greater than standard FPGA routing. To ensure designs operate at maximum performance, use the following guidelines:
- Register all signals that cross between SLRs at both the SLR output and SLR input.
- You do not need to register a signal if it enters or exits an SLR via an I/O buffer.
- Ensure that the logic created by Vitis HLS fits within a single SLR.
If the logic is contained within a single SLR device, Vitis HLS provides a -register_all_io option to the config_rtl command. If the option is enabled, all inputs and outputs are registered. If disabled, none of the inputs or outputs are registered.