Optimization Directives

Directives, or the set_directive_* commands, can be specified as Tcl commands that are associated with a specific solution or set of solutions, allowing you to customize the synthesis results for the same source code across different solutions. This lets you preserve the original code while engaging in what-if analysis of the design.

Directives must be run either in interactive mode (vitis_hls -i) or as a script using the -f option, as described in vitis_hls Command.

Pragmas are directives that you can apply in the source code, rather than as a Tcl script, and so change the synthesis results for all implementations of your code. There are HLS pragmas for every set_directive command, so you can choose how you want to work with your Vitis HLS project. Refer to HLS Pragmas for information on the different pragmas.

Directives and pragmas are also available through the Vitis HLS IDE for assignment to specific elements of your source code, as described in Adding Pragmas and Directives.

TIP: When running the commands through the IDE, the Tcl commands are added to a script of your project written to solution/constraints/script.tcl.

set_directive_aggregate

Description

This directive collects the data fields of a struct into a single wide scalar. For any arrays declared within the struct, Vitis HLS performs an operation similar to set_directive_array_reshape: it completely partitions and reshapes the array into a wide scalar and packs it with the other elements of the struct.
TIP: Arrays of structs are restructured as arrays of aggregated elements.

The bit alignment of the resulting new wide-word can be inferred from the declaration order of the struct elements. The first element takes the least significant sector of the word and so forth until all fields are mapped.

Note: The AGGREGATE optimization does not pack the structs, and cannot be used on structs that contain other structs.

Syntax

set_directive_aggregate [OPTIONS] <location> <variable>
  • <location> is the location (in the format function[/label]) which contains the variable which will be packed.
  • <variable> is the struct variable to be packed.

Options

-compact [bit | byte | none | auto]
Specifies the alignment of the aggregated struct. Alignment can be at the bit level (packed), at the byte level (padded), none, or determined automatically by the tool (auto), which is the default behavior.

Examples

Aggregates struct pointer AB with three 8-bit fields (typedef struct {unsigned char R, G, B;} pixel) in function func, into a new 24-bit pointer, aligning data at the bit-level.

set_directive_aggregate func AB -compact bit
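
The field ordering described in this section can be modeled in ordinary C++. The sketch below only illustrates the bit layout for the pixel struct from this example with bit-level alignment; the helper name pack_pixel is hypothetical and is not part of Vitis HLS.

```cpp
#include <cstdint>

// Struct from the example: three 8-bit fields declared in the order R, G, B.
typedef struct { unsigned char R, G, B; } pixel;

// Hypothetical helper: models the 24-bit word produced by bit-level
// aggregation. The first declared field (R) takes the least significant
// bits, followed by G, then B.
inline std::uint32_t pack_pixel(const pixel &p) {
  return static_cast<std::uint32_t>(p.R)
       | (static_cast<std::uint32_t>(p.G) << 8)
       | (static_cast<std::uint32_t>(p.B) << 16);
}
```

For a pixel with R=0x11, G=0x22, and B=0x33, this model yields the word 0x332211.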

set_directive_alias

Description

Specify that two or more M_AXI pointer arguments point to the same underlying buffer in memory (DDR or HBM) and indicate any aliasing between the pointers by setting the distance or offset between them.

IMPORTANT: The ALIAS pragma applies to top-level function arguments mapped to M_AXI interfaces.

Vitis HLS considers different pointers to be independent channels and generally does not provide any dependency analysis. However, in cases where the host allocates a single buffer for multiple pointers, this relationship can be communicated through the ALIAS pragma or directive and dependency analysis can be maintained. The ALIAS pragma enables data dependence analysis in Vitis HLS by defining the distance between pointers in the buffer.

Requirements for ALIAS:

  • All ports assigned to an ALIAS pragma must be assigned to M_AXI interfaces and to different bundles, as shown in the example below.
  • Each port can only be used in one ALIAS pragma or directive.
  • The depth of all ports assigned to an ALIAS pragma must be the same.
  • When offset is specified, the number of ports and the number of offsets must be the same: one offset per port.
  • The offset for the INTERFACE must be specified as slave or direct; offset=off is not supported.

Syntax

set_directive_alias [OPTIONS] <location> <ports>
  • <location> is the location string in the format function[/label] that the ALIAS pragma applies to.
  • <ports> specifies the ports to alias.

Options

-distance <integer>
Specifies the difference between the pointer values passed to the ports in the list.
-offset <string>
Specifies the offset of the pointer passed to each port in the ports list with respect to the origin of the array.

Example

For the following function top:

void top(int *arr0, int *arr1, int *arr2, int *arr3, ...) {
  #pragma HLS interface M_AXI port=arr0 bundle=hbm0 depth=0x40000000
  #pragma HLS interface M_AXI port=arr1 bundle=hbm1 depth=0x40000000
  #pragma HLS interface M_AXI port=arr2 bundle=hbm2 depth=0x40000000
  #pragma HLS interface M_AXI port=arr3 bundle=hbm3 depth=0x40000000

The following command defines aliasing for the specified array pointers, and defines the distance between them:

set_directive_alias "top" arr0,arr1,arr2,arr3 -distance 10000000

Alternatively, the following command specifies the offset between pointers, to accomplish the same effect:

set_directive_alias top arr0,arr1,arr2,arr3 -offset 00000000,10000000,20000000,30000000
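
The two commands above describe the same layout: with a fixed distance between consecutive pointers, the offset of each port is its position in the list multiplied by the distance. A minimal C++ sketch of that arithmetic follows; the helper offsets_from_distance is hypothetical, and the example values are treated as plain integers.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the offset of each port from the start of the
// shared buffer when consecutive ports are separated by a fixed distance.
inline std::vector<long long> offsets_from_distance(std::size_t num_ports,
                                                    long long distance) {
  std::vector<long long> offsets;
  for (std::size_t i = 0; i < num_ports; ++i) {
    offsets.push_back(static_cast<long long>(i) * distance);
  }
  return offsets;
}
```

Calling offsets_from_distance(4, 10000000) reproduces the four values passed to -offset in the second command.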

set_directive_allocation

Description

Specifies instance restrictions for resource allocation.

The ALLOCATION pragma or directive can limit the number of RTL instances and hardware resources used to implement specific functions, loops, or operations. For example, if the C/C++ source has four instances of a function foo_sub, the set_directive_allocation command can ensure that there is only one instance of foo_sub in the final RTL. All four instances are implemented using the same RTL block. This reduces resources used by the function, but negatively impacts performance by sharing those resources.

The operations in the C/C++ code, such as additions, multiplications, array reads, and writes, can also be limited by the set_directive_allocation command.

Syntax

set_directive_allocation [OPTIONS] <location> <instances>
  • <location> is the location string in the format function[/label].
  • <instances> is a function or operator.

    The function can be any function in the original C/C++ code that has not been either inlined by the set_directive_inline command or inlined automatically by Vitis HLS.

    For a complete list of operations that can be limited using the ALLOCATION pragma, refer to the config_op command.

Options

-limit <integer>

Sets a maximum limit on the number of instances (of the type defined by the -type option) to be used in the RTL design.

-type [function|operation]
The instance type can be function (default) or operation.

Examples

Given a design foo_top with multiple instances of function foo, limits the number of instances of foo in the RTL to 2.

set_directive_allocation -limit 2 -type function foo_top foo

Limits the number of multipliers used in the implementation of My_func to 1. This limit does not apply to any multipliers that might reside in sub-functions of My_func. To limit the multipliers used in the implementation of any sub-functions, specify an allocation directive on the sub-functions or inline the sub-function into function My_func.

set_directive_allocation -limit 1 -type operation My_func mul
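
For context, the first example might correspond to source code like the sketch below. The body of foo and the four call sites are hypothetical, and the pragma line is the assumed source-code form of the directive in recent Vitis HLS releases; an ordinary C++ compiler ignores it.

```cpp
// Hypothetical sub-function; any function that is not inlined would do.
int foo(int x) { return x * x + 1; }

// Top-level function with four call sites of foo.
int foo_top(int a, int b, int c, int d) {
  // Assumed pragma form of:
  //   set_directive_allocation -limit 2 -type function foo_top foo
#pragma HLS allocation function instances=foo limit=2
  return foo(a) + foo(b) + foo(c) + foo(d);
}
```

The limit changes only how many RTL instances of foo are created; the computed result is the same as without the directive.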

set_directive_array_partition

Description

IMPORTANT: Array_Partition and Array_Reshape pragmas and directives are not supported for M_AXI Interfaces on the top-level function. Instead you can use the hls::vector data types as described in Vector Data Types.

Partitions an array into smaller arrays or individual elements.

This partitioning:
  • Results in RTL with multiple small memories or multiple registers instead of one large memory.
  • Effectively increases the number of read and write ports for the storage.
  • Potentially improves the throughput of the design.
  • Requires more memory instances or registers.

Syntax

set_directive_array_partition [OPTIONS] <location> <array>
  • <location> is the location (in the format function[/label]) which contains the array variable.
  • <array> is the array variable to be partitioned.

Options

-dim <integer>
Note: Relevant for multi-dimensional arrays only.
Specifies which dimension of the array is to be partitioned.
  • If a value of 0 is used, all dimensions are partitioned with the specified options.
  • Any other value partitions only that dimension. For example, if a value 1 is used, only the first dimension is partitioned.
-factor <integer>
Note: Relevant for type block or cyclic partitioning only.
Specifies the number of smaller arrays that are to be created.
-type (block|cyclic|complete)
  • block partitioning creates smaller arrays from consecutive blocks of the original array. This effectively splits the array into N equal blocks where N is the integer defined by the -factor option.
  • cyclic partitioning creates smaller arrays by interleaving elements from the original array. For example, if -factor 3 is used:
    • Element 0 is assigned to the first new array.
    • Element 1 is assigned to the second new array.
    • Element 2 is assigned to the third new array.
    • Element 3 is assigned to the first new array again.
  • complete partitioning decomposes the array into individual elements. For a one-dimensional array, this corresponds to resolving a memory into individual registers. For multi-dimensional arrays, specify the partitioning of each dimension, or use -dim 0 to partition all dimensions.
The default is complete.
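
The block and cyclic schemes differ only in how an original index maps to a smaller array. The following C++ sketch models that mapping; the names Slot, cyclic_slot, and block_slot are hypothetical, and the block case assumes the array size is an exact multiple of the factor.

```cpp
#include <cstddef>

// Position of one element after partitioning: which smaller array (bank)
// holds it, and at which index inside that array.
struct Slot {
  std::size_t bank;
  std::size_t index;
};

// cyclic: elements are interleaved across the banks.
inline Slot cyclic_slot(std::size_t i, std::size_t factor) {
  return {i % factor, i / factor};
}

// block: consecutive elements stay together. Assumes size is an exact
// multiple of factor, so every bank holds size / factor elements.
inline Slot block_slot(std::size_t i, std::size_t size, std::size_t factor) {
  const std::size_t per_bank = size / factor;
  return {i / per_bank, i % per_bank};
}
```

With a factor of 3, cyclic_slot places elements 0, 1, and 2 in banks 0, 1, and 2, and element 3 returns to bank 0, matching the cyclic description above.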

Example 1

Partitions array AB[13] in function func into four arrays. Because four is not an integer factor of 13:

  • Three arrays have three elements.
  • One array has four elements (AB[9:12]).

set_directive_array_partition -type block -factor 4 func AB

Partitions array AB[6][4] in function func into two arrays, each of dimension [6][2].

set_directive_array_partition -type block -factor 2 -dim 2 func AB

Partitions all dimensions of AB[4][10][6] in function func into individual elements.

set_directive_array_partition -type complete -dim 0 func AB

Example 2

Partitioned arrays can be addressed in your code by the new structure of the array, as shown in the following code example:

struct SS
{
  int x[N];
  int y[N];
};
  
int top(SS *a, int b[4][6], SS &c) {...}

set_directive_array_partition top b -type complete -dim 1
set_directive_interface -mode ap_memory top b[0]
set_directive_interface -mode ap_memory top b[1]
set_directive_interface -mode ap_memory top b[2]
set_directive_interface -mode ap_memory top b[3]

set_directive_array_reshape

Description

IMPORTANT: Array_Partition and Array_Reshape pragmas and directives are not supported for M_AXI Interfaces on the top-level function. Instead you can use the hls::vector data types as described in Vector Data Types.

Combines array partitioning with vertical array mapping to create a single new array with fewer elements but wider words.

The set_directive_array_reshape command has the following features:

  • Splits the array into multiple arrays (like set_directive_array_partition).
  • Automatically recombines the arrays vertically to create a new array with wider words.

Syntax

set_directive_array_reshape [OPTIONS] <location> <array>
  • <location> is the location (in the format function[/label]) that contains the array variable.
  • <array> is the array variable to be reshaped.

Options

-dim <integer>
Note: Relevant for multi-dimensional arrays only.
Specifies which dimension of the array is to be reshaped.
  • If the value is set to 0, all dimensions are reshaped with the specified options.
  • Any other value reshapes only that dimension. The default is 1.
-factor <integer>
Note: Relevant for type block or cyclic reshaping only.
Specifies the number of temporary smaller arrays to be created.
-object
Note: Relevant for container arrays only.
Applies reshape on the objects within the container. If the option is specified, all dimensions of the objects will be reshaped, but all dimensions of the container will be kept.
-type (block|cyclic|complete)
  • block reshaping creates smaller arrays from consecutive blocks of the original array. This effectively splits the array into N equal blocks where N is the integer defined by the -factor option and then combines the N blocks into a single array with word-width*N.
  • cyclic reshaping creates smaller arrays by interleaving elements from the original array. For example, if -factor 3 is used, element 0 is assigned to the first new array, element 1 to the second new array, element 2 is assigned to the third new array, and then element 3 is assigned to the first new array again. The final array is a vertical concatenation (word concatenation, to create longer words) of the new arrays into a single array.
  • complete reshaping decomposes the array into temporary individual elements and then recombines them into an array with a wider word. For a one-dimension array this is equivalent to creating a very-wide register (if the original array was N elements of M bits, the result is a register with N*M bits). This is the default.
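
As an illustration of cyclic reshaping, the C++ sketch below models -type cyclic with -factor 4 on an 8-element array of 8-bit values, giving two 32-bit words. The function name reshape_cyclic4 is hypothetical, and placing the lowest-indexed element in the least significant byte of each word is an assumption made for this illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Models cyclic reshaping with factor 4: element i joins temporary array
// (i % 4) at position (i / 4), and the four temporary arrays are then
// concatenated word by word into one array of 32-bit words.
inline std::array<std::uint32_t, 2> reshape_cyclic4(
    const std::array<std::uint8_t, 8> &a) {
  std::array<std::uint32_t, 2> out = {{0, 0}};
  for (std::size_t i = 0; i < a.size(); ++i) {
    const std::size_t lane = i % 4;  // which temporary array the element joins
    const std::size_t word = i / 4;  // index inside that temporary array
    out[word] |= static_cast<std::uint32_t>(a[i]) << (8 * lane);  // assumed byte order
  }
  return out;
}
```

The reshaped array has a quarter as many elements as the original, and each access returns four of the original values at once.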

Example 1

Reshapes 8-bit array AB[17] in function func into a new 32-bit array with five elements.

Because four is not an integer factor of 17:

  • The last element of the array, AB[16], is in the lower eight bits of the reshaped fifth element.
  • The upper 24 bits of the fifth element are unused.

set_directive_array_reshape -type block -factor 4 func AB

Reshapes array AB[6][4] in function func into a new array of dimension [6][2], in which dimension 2 is twice the width.

set_directive_array_reshape -type block -factor 2 -dim 2 func AB

Reshapes 8-bit array AB[4][2][2] in function func into a new single element array (a register), 4*2*2*8 (= 128)-bits wide.

set_directive_array_reshape -type complete -dim 0 func AB

Example 2

Reshaped arrays can be addressed in your code by the new structure of the array, as shown in the following code example:

struct SS
{
  int x[N];
  int y[N];
};
  
int top(SS *a, int b[4][6], SS &c) {...}

set_directive_array_reshape top b -type complete -dim 0
set_directive_interface -mode ap_memory top b[0]

set_directive_bind_op

Description

Vitis HLS implements the operations in the code using specific implementations. The set_directive_bind_op command specifies that for a specified variable, an operation (mul, add, sub) should be mapped to a specific device resource for implementation (impl) in the RTL. If this command is not specified, Vitis HLS automatically determines the resource to use.

For example, to indicate that a specific multiplier operation (mul) is implemented in the device fabric rather than a DSP, you can use the set_directive_bind_op command.

You can also specify the latency of the operation using the -latency option.

IMPORTANT: To use the -latency option, the operation must have an available multi-stage implementation. The HLS tool provides a multi-stage implementation for all basic arithmetic operations (add, subtract, multiply, and divide), and all floating-point operations.

Syntax

set_directive_bind_op [OPTIONS] <location> <variable>
  • <location> is the location (in the format function[/label]) which contains the variable.
  • <variable> is the variable to be assigned. The variable in this case is one that is assigned the result of the operation that is the target of this directive.

Options

-op <value>
Defines the operation to bind to a specific implementation resource. Supported functional operations include: mul, add, sub

Supported floating point operations include: fadd, fsub, fdiv, fexp, flog, fmul, frsqrt, frecip, fsqrt, dadd, dsub, ddiv, dexp, dlog, dmul, drsqrt, drecip, dsqrt, hadd, hsub, hdiv, hmul, and hsqrt

TIP: Floating-point operations include single precision (f), double-precision (d), and half-precision (h).
-impl <value>
Defines the implementation to use for the specified operation.
Supported implementations for functional operations include fabric and dsp.
Supported implementations for floating point operations include: fabric, meddsp, fulldsp, maxdsp, and primitivedsp.
Note: Primitive DSP is only available on Versal devices.
-latency <int>
Defines the default latency for the implementation of the operation. The valid latency varies according to the specified op and impl. The default is -1, which lets Vitis HLS choose the latency.
The tables below reflect the supported combinations of operation, implementation, and latency.
Table 1. Supported Combinations of Functional Operations, Implementation, and Latency
Operation Implementation Min Latency Max Latency
add fabric 0 4
add dsp 0 4
mul fabric 0 4
mul dsp 0 4
sub fabric 0 4
sub dsp 0 0
Table 2. Supported Combinations of Floating Point Operations, Implementation, and Latency
Operation Implementation Min Latency Max Latency
fadd fabric 0 13
fadd fulldsp 0 12
fadd primitivedsp 0 3
fsub fabric 0 13
fsub fulldsp 0 12
fsub primitivedsp 0 3
fdiv fabric 0 29
fexp fabric 0 24
fexp meddsp 0 21
fexp fulldsp 0 30
flog fabric 0 24
flog meddsp 0 23
flog fulldsp 0 29
fmul fabric 0 9
fmul meddsp 0 9
fmul fulldsp 0 9
fmul maxdsp 0 7
fmul primitivedsp 0 4
fsqrt fabric 0 29
frsqrt fabric 0 38
frsqrt fulldsp 0 33
frecip fabric 0 37
frecip fulldsp 0 30
dadd fabric 0 13
dadd fulldsp 0 15
dsub fabric 0 13
dsub fulldsp 0 15
ddiv fabric 0 58
dexp fabric 0 40
dexp meddsp 0 45
dexp fulldsp 0 57
dlog fabric 0 38
dlog meddsp 0 49
dlog fulldsp 0 65
dmul fabric 0 10
dmul meddsp 0 13
dmul fulldsp 0 13
dmul maxdsp 0 14
dsqrt fabric 0 58
drsqrt fulldsp 0 111
drecip fulldsp 0 36
hadd fabric 0 9
hadd meddsp 0 12
hadd fulldsp 0 12
hsub fabric 0 9
hsub meddsp 0 12
hsub fulldsp 0 12
hdiv fabric 0 16
hmul fabric 0 7
hmul fulldsp 0 7
hmul maxdsp 0 9
hsqrt fabric 0 16

Examples

In the following example, a two-stage pipelined multiplier using fabric logic is specified to implement the multiplication for variable <c> of the function foo.

int foo (int a, int b) {
  int c, d;
  c = a*b;
  d = a*c;
  return d;
}

And the set_directive command is as follows:

set_directive_bind_op -op mul -impl fabric -latency 2 "foo" c
TIP: The HLS tool selects the core to use for variable <d>.
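
The same binding can also be expressed in the source code. The sketch below repeats the function foo from this example with what is assumed to be the equivalent bind_op pragma; an ordinary C++ compiler ignores the pragma, so the function can still be compiled and checked in software.

```cpp
int foo(int a, int b) {
  int c, d;
  // Assumed pragma form of the directive above: bind the multiply that
  // produces c to a two-stage multiplier built from fabric logic.
#pragma HLS bind_op variable=c op=mul impl=fabric latency=2
  c = a * b;
  d = a * c;  // no directive applies here, so the tool chooses the core
  return d;
}
```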

set_directive_bind_storage

Description

The set_directive_bind_storage command assigns a variable (array, or function argument) in the code to a specific memory type (type) in the RTL. If the command is not specified, the Vitis HLS tool determines the memory type to assign. The HLS tool implements the memory using specified implementations (impl) in the hardware.

For example, you can use the set_directive_bind_storage command to specify which type of memory, and which implementation to use for an array variable. Also, this allows you to control whether the array is implemented as a single or a dual-port RAM. This usage is important for arrays on the top-level function interface, because the memory type associated with the array determines the number and type of ports needed in the RTL, as discussed in Arrays on the Interface.

You can use the -latency option to specify the latency of the implementation. For block RAMs on the interface, the -latency option allows you to model off-chip, non-standard SRAMs at the interface, for example supporting an SRAM with a latency of 2 or 3. For internal operations, the -latency option allows the operation to be implemented using more pipelined stages. These additional pipeline stages can help resolve timing issues during RTL synthesis.

IMPORTANT: To use the -latency option, the operation must have an available multi-stage implementation. The HLS tool provides a multi-stage implementation for all block RAMs.

For best results, Xilinx recommends that you use -std=c99 for C and -fno-builtin for C and C++. To specify the C compile options, such as -std=c99, use the Tcl command add_files with the -cflags option. Alternatively, select the Edit CFLAGs button in the Project Settings dialog box as described in Creating a New Vitis HLS Project.

Syntax

set_directive_bind_storage [OPTIONS] <location> <variable>
  • <location> is the location (in the format function[/label]) which contains the variable.
  • <variable> is the variable to be assigned.

Options

-type
Defines the type of memory to bind to the specified variable.
Supported types include: fifo, ram_1p, ram_1wnr, ram_2p, ram_s2p, ram_t2p, rom_1p, rom_2p, and rom_np.
Table 3. Storage Types
Type Description
FIFO A FIFO. Vitis HLS determines how to implement this in the RTL, unless the -impl option is specified.
RAM_1P A single-port RAM. Vitis HLS determines how to implement this in the RTL, unless the -impl option is specified.
RAM_1WNR A RAM with 1 write port and N read ports, using N banks internally.
RAM_2P A dual-port RAM that allows read operations on one port and both read and write operations on the other port.
RAM_S2P A dual-port RAM that allows read operations on one port and write operations on the other port.
RAM_T2P A true dual-port RAM with support for both read and write on both ports.
ROM_1P A single-port ROM. Vitis HLS determines how to implement this in the RTL, unless the -impl option is specified.
ROM_2P A dual-port ROM.
ROM_NP A multi-port ROM.
-impl <value>
Defines the implementation for the specified memory type. Supported implementations include: bram, bram_ecc, lutram, uram, uram_ecc, srl, memory, and auto as described below.
Table 4. Supported Implementation
Name Description
MEMORY Generic memory for FIFO, lets the Vivado tool choose the implementation.
URAM UltraRAM resource
URAM_ECC UltraRAM with ECC
SRL Shift Register Logic resource
LUTRAM Distributed RAM resource
BRAM Block RAM resource
BRAM_ECC Block RAM with ECC
AUTO Vitis HLS automatically determines the implementation of the variable.
Table 5. Supported Implementations by FIFO/RAM/ROM
Type         Command/Pragma                                   Scope   Supported Implementations
FIFO         bind_storage (1)                                 local   BRAM, LUTRAM, URAM, MEMORY, SRL
FIFO         config_storage                                   global  AUTOSRL, BRAM, LUTRAM, URAM, MEMORY, SRL
RAM* | ROM*  bind_storage                                     local   AUTO, BRAM, BRAM_ECC, LUTRAM, URAM, URAM_ECC
RAM* | ROM*  config_storage (2)                               global  N/A
RAM_1P       set_directive_interface s_axilite -storage_impl  local   AUTO, BRAM, URAM
             config_interface -m_axi_buffer_impl              global  AUTO, BRAM, LUTRAM, URAM

  1. When no implementation is specified the directive uses AUTOSRL behavior as a default. However, this value cannot be specified.
  2. config_storage only supports FIFO types.
-latency <int>
Defines the default latency for the binding of the storage type to the implementation. The valid latency varies according to the specified type and impl. The default is -1, which lets Vitis HLS choose the latency.
Table 6. Supported Combinations of Memory Type, Implementation, and Latency
Type Implementation Min Latency Max Latency
FIFO BRAM 0 0
FIFO LUTRAM 0 0
FIFO MEMORY 0 0
FIFO SRL 0 0
FIFO URAM 0 0
RAM_1P AUTO 1 3
RAM_1P BRAM 1 3
RAM_1P LUTRAM 1 3
RAM_1P URAM 1 3
RAM_1WNR AUTO 1 3
RAM_1WNR BRAM 1 3
RAM_1WNR LUTRAM 1 3
RAM_1WNR URAM 1 3
RAM_2P AUTO 1 3
RAM_2P BRAM 1 3
RAM_2P LUTRAM 1 3
RAM_2P URAM 1 3
RAM_S2P BRAM 1 3
RAM_S2P BRAM_ECC 1 3
RAM_S2P LUTRAM 1 3
RAM_S2P URAM 1 3
RAM_S2P URAM_ECC 1 3
RAM_T2P BRAM 1 3
RAM_T2P URAM 1 3
ROM_1P AUTO 1 3
ROM_1P BRAM 1 3
ROM_1P LUTRAM 1 3
ROM_2P AUTO 1 3
ROM_2P BRAM 1 3
ROM_2P LUTRAM 1 3
ROM_NP BRAM 1 3
ROM_NP LUTRAM 1 3
IMPORTANT: Any combinations of memory type and implementation that are not listed in the prior table are not supported by set_directive_bind_storage.

Examples

In the following example, the coeffs[128] variable is an argument to the top-level function func_top. The directive specifies that coeffs uses a single port RAM implemented on a BRAM core from the library.

set_directive_bind_storage -type RAM_1P -impl bram "func_top" coeffs
TIP: The ports created in the RTL to access the values of coeffs are defined in the RAM_1P core.
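
In source-code form, the binding from this example might look like the sketch below. The body of func_top is hypothetical and the pragma line is the assumed equivalent of the directive; an ordinary C++ compiler ignores it.

```cpp
// Top-level function from the example; the body is hypothetical.
int func_top(const int coeffs[128]) {
  // Assumed pragma form of the directive above: a single-port RAM
  // implemented with block RAM.
#pragma HLS bind_storage variable=coeffs type=RAM_1P impl=bram
  int sum = 0;
  for (int i = 0; i < 128; ++i) {
    sum += coeffs[i];  // one read per iteration fits a single-port memory
  }
  return sum;
}
```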

set_directive_dataflow

Description

Specifies that dataflow optimization be performed on the functions or loops as described in Exploiting Task Level Parallelism: Dataflow Optimization, improving the concurrency of the RTL implementation.

All operations are performed sequentially in a C/C++ description. In the absence of any directives that limit resources (such as set_directive_allocation), Vitis HLS seeks to minimize latency and improve concurrency. Data dependencies can limit this. For example, functions or loops that access arrays must finish all read/write accesses to the arrays before they complete. This prevents the next function or loop that consumes the data from starting operation.

It is possible for the operations in a function or loop to start operation before the previous function or loop completes all its operations. When the DATAFLOW optimization is specified, the HLS tool analyzes the dataflow between sequential functions or loops and creates channels (based on ping-pong RAMs or FIFOs) that allow consumer functions or loops to start operation before the producer functions or loops have completed. This allows functions or loops to operate in parallel, which decreases latency and improves the throughput of the RTL.

TIP: The config_dataflow command specifies the default memory channel and FIFO depth used in DATAFLOW optimization.

If no initiation interval (number of cycles between the start of one function or loop and the next) is specified, Vitis HLS attempts to minimize the initiation interval and start operation as soon as data is available.

For the DATAFLOW optimization to work, the data must flow through the design from one task to the next. The following coding styles prevent the HLS tool from performing the DATAFLOW optimization. Refer to Dataflow Optimization Limitations for additional details.

  • Single-producer-consumer violations
  • Feedback between tasks
  • Conditional execution of tasks
  • Loops with multiple exit conditions
IMPORTANT: If any of these coding styles are present, the HLS tool issues a message and does not perform DATAFLOW optimization.

Finally, the DATAFLOW optimization has no hierarchical implementation. If a sub-function or loop contains additional tasks that might benefit from the DATAFLOW optimization, you must apply the optimization to the loop, the sub-function, or inline the sub-function.

Syntax

set_directive_dataflow <location> -disable_start_propagation
  • <location> is the location (in the format function[/label]) at which dataflow optimization is to be performed.
  • -disable_start_propagation disables the creation of a start FIFO used to propagate a start token to an internal process. Such FIFOs can sometimes be a bottleneck for performance.

Examples

Specifies dataflow optimization within function foo.

set_directive_dataflow foo
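
A region that suits this optimization might look like the following C++ sketch, in which data flows one way from a single producer to a single consumer. The task names scale and offset and the size N are hypothetical; the pragma is the source-code form of the directive and is ignored by an ordinary compiler.

```cpp
const int N = 16;

// Producer task: writes every element of the intermediate buffer.
void scale(const int in[N], int tmp[N]) {
  for (int i = 0; i < N; ++i) tmp[i] = in[i] * 2;
}

// Consumer task: reads the intermediate buffer in the same order.
void offset(const int tmp[N], int out[N]) {
  for (int i = 0; i < N; ++i) out[i] = tmp[i] + 1;
}

// Region where the dataflow optimization is applied.
void foo(const int in[N], int out[N]) {
#pragma HLS dataflow
  int tmp[N];  // channel with exactly one producer and one consumer
  scale(in, tmp);
  offset(tmp, out);
}
```

In software the two tasks run one after the other; the optimization changes when each task starts in the RTL, not the values written to out.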

set_directive_dependence

Description

Vitis HLS detects dependencies within loops: dependencies within the same iteration of a loop are loop-independent dependencies, and dependencies between different iterations of a loop are loop-carried dependencies.

These dependencies impact when operations can be scheduled, especially during function and loop pipelining.

Loop-independent dependence
The same element is accessed in a single loop iteration.

for (i=0;i<N;i++) {
 A[i]=x;
 y=A[i];
}

Loop-carried dependence
The same element is accessed from a different loop iteration.

for (i=1;i<N;i++) {
 A[i]=A[i-1]*2;
}

Under certain circumstances, such as variable dependent array indexing or when an external requirement needs to be enforced (for example, two inputs are never the same index), the dependence analysis might be too conservative and fail to filter out false dependencies. The set_directive_dependence command allows you to explicitly define the dependencies and eliminate a false dependence as described in Managing Pipeline Dependencies.

IMPORTANT: Specifying a false dependency when the dependency is not false can result in incorrect hardware. Ensure dependencies are correct (true or false) before specifying them.

Syntax

set_directive_dependence -dependent <arg> [OPTIONS] <location>
-dependent (true | false)
This argument should be specified to indicate whether a dependence is true and needs to be enforced, or is false and should be removed. However, when not specified, the tool will return a warning that the value was not specified and will assume a value of false.
<location>
The location in the code, specified as function[/label], where the dependence is defined.

Options

-class (array | pointer)
Specifies a class of variables in which the dependence needs clarification. This is mutually exclusive with the -variable option.
-dependent (true | false)
Specify if a dependence needs to be enforced (true) or removed (false).
-direction (RAW | WAR | WAW)
Note: Relevant only for loop-carried dependencies.
Specifies the direction for a dependence:
RAW (Read-After-Write - true dependence)
The read instruction uses a value written by the write instruction.
WAR (Write-After-Read - anti dependence)
The read instruction gets a value that is overwritten by the write instruction.
WAW (Write-After-Write - output dependence)
Two write instructions write to the same location, in a certain order.
-distance <integer>
Note: Relevant only for loop-carried dependencies where -dependent is set to true.
Specifies the inter-iteration distance for array access.
-type (intra | inter)
Specifies whether the dependence is:
  • Within the same loop iteration (intra), or
  • Between different loop iterations (inter) (default).
-variable <variable>
Defines a specific variable to apply the dependence directive. Mutually exclusive with the -class option.
IMPORTANT: You cannot specify a dependence for function arguments that are bundled with other arguments in an m_axi interface. This is the default configuration for m_axi interfaces on the function. You also cannot specify a dependence for an element of a struct, unless the struct has been disaggregated.

Examples

Removes the dependence on Var1 within the same iteration of loop_1 in function func.

set_directive_dependence -variable Var1 -type intra \
-dependent false func/loop_1

Informs Vitis HLS that, for all arrays in loop_2 of function func, all reads must happen after writes in the same loop iteration.

set_directive_dependence -class array -type intra \
-dependent true -direction RAW func/loop_2
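
The sketch below shows one hypothetical situation in which the first directive would be safe to apply. The read index depends on input data, so the tool may conservatively assume that the read and the write in one iteration can touch the same element, but by construction they never do. The array layout, the sel argument, and the pragma syntax are assumptions made for illustration.

```cpp
const int N = 8;

// Var1 has 2*N entries: reads touch only the lower half and writes only the
// upper half, so a read and a write in one iteration never share an address.
void func(int Var1[2 * N], const unsigned sel[N]) {
loop_1:
  for (int i = 0; i < N; ++i) {
    // Assumed pragma form of the first directive above.
#pragma HLS dependence variable=Var1 type=intra dependent=false
    const unsigned lo = sel[i] % N;  // data-dependent index in [0, N)
    Var1[N + i] = Var1[lo] + 1;
  }
}
```

If the write could ever land in the lower half, the dependence would be real and declaring it false could produce incorrect hardware.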

set_directive_disaggregate

Description

The set_directive_disaggregate command lets you deconstruct a struct variable into its individual elements. The number and type of elements created are determined by the contents of the struct itself.

IMPORTANT: Structs used as arguments to the top-level function are aggregated by default, but can be disaggregated with this directive or pragma. Refer to AXI4-Stream Interfaces for important information about disaggregating structs associated with streams.

Syntax

set_directive_disaggregate <location> <variable>
  • <location> is the location (in the format function[/label]) where the variable to disaggregate is found.
  • <variable> specifies the struct variable name.

Options

This command has no options.

Example 1

In the following example, the struct variable a in function top is disaggregated:

set_directive_disaggregate top a

Example 2

Disaggregated structs can be addressed in your code using standard C/C++ coding style, as shown below. Notice the different methods for accessing the pointer element (a) versus the reference element (c):

struct SS
{
  int x[N];
  int y[N];
};
  
int top(SS *a, int b[4][6], SS &c) {...}

set_directive_disaggregate top a
set_directive_interface -mode s_axilite top a->x
set_directive_interface -mode s_axilite top a->y

set_directive_disaggregate top c
set_directive_interface -mode ap_memory top c.x
set_directive_interface -mode ap_memory top c.y

set_directive_expression_balance

Description

Sometimes C/C++ code is written with a sequence of operations, resulting in a long chain of operations in RTL. With a small clock period, this can increase the latency in the design. By default, the Vitis HLS tool rearranges the operations using associative and commutative properties. As described in Optimizing Logic Expressions, this rearrangement creates a balanced tree that can shorten the chain, potentially reducing latency in the design at the cost of extra hardware.

Expression balancing rearranges operators to construct a balanced tree and reduce latency.

  • For integer operations, expression balancing is on by default but may be disabled.
  • For floating-point operations, expression balancing is off by default but may be enabled.

The set_directive_expression_balance command allows this expression balancing to be turned off, or on, within a specified scope.

Syntax

set_directive_expression_balance [OPTIONS] <location>
  • <location> is the location (in the format function[/label]) where expression balancing should be disabled, or enabled.

Options

-off
Turns off expression balancing at the specified location.
Specifying the set_directive_expression_balance command enables expression balancing in the specified scope. Adding the -off option disables it.

Examples

Disables expression balancing within function My_Func.

set_directive_expression_balance -off My_Func

Explicitly enables expression balancing in function My_Func2.

set_directive_expression_balance My_Func2
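
To make the rearrangement concrete, the sketch below writes the same four-input sum as a chain and as a balanced tree. The function names and the pragma spelling (assumed to be the in-source equivalent of the -off option) are illustrative assumptions, and a standard C++ compiler ignores the pragma.

```cpp
// Chain as written: three dependent additions, ((a + b) + c) + d.
// With balancing turned off, the operations keep this order.
int sum_chain(int a, int b, int c, int d) {
#pragma HLS expression_balance off
  return ((a + b) + c) + d;
}

// Balanced form: (a + b) and (c + d) do not depend on each other,
// so the longest dependency path is two additions instead of three.
int sum_tree(int a, int b, int c, int d) {
  return (a + b) + (c + d);
}
```

For integers both forms return the same value. Floating-point addition is not associative, so regrouping can change the rounded result, which is one plausible reason balancing is off by default for floating-point operations.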

set_directive_function_instantiate

Description

By default:

  • Functions remain as separate hierarchy blocks in the RTL.
  • All instances of a function, at the same level of hierarchy, use the same RTL implementation (block).

The set_directive_function_instantiate command is used to create a unique RTL implementation for each instance of a function, allowing each instance to be optimized.

By default, the following code results in a single RTL implementation of function foo_sub for all three instances.

char foo_sub(char inval, char incr)
{
  return inval + incr;
}
void foo(char inval1, char inval2, char inval3,
  char *outval1, char *outval2, char * outval3)
{
  *outval1 = foo_sub(inval1, 1);
  *outval2 = foo_sub(inval2, 2);
  *outval3 = foo_sub(inval3, 3);
}

Using the directive as shown in the example section below results in three versions of function foo_sub, each independently optimized for variable incr.

Syntax

set_directive_function_instantiate <location> <variable>
  • <location> is the location (in the format function[/label]) where the instances of a function are to be made unique.
  • <variable> specifies the function argument that is to be treated as a constant in each instance.

Options

This command has no options.

Examples

For the example code shown above, the following Tcl (or pragma placed in function foo_sub) allows each instance of function foo_sub to be independently optimized with respect to input incr.

set_directive_function_instantiate foo_sub incr
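
For comparison, the pragma form can be sketched by placing it in the body of foo_sub. The pragma spelling is an assumption about the in-source equivalent of the Tcl command; the rest repeats the example code from the description.

```cpp
char foo_sub(char inval, char incr) {
#pragma HLS function_instantiate variable=incr
  // incr is a different constant at each call site, so each instance can
  // be specialized for its own value (1, 2, or 3 in foo below).
  return inval + incr;
}

void foo(char inval1, char inval2, char inval3,
         char *outval1, char *outval2, char *outval3) {
  *outval1 = foo_sub(inval1, 1);
  *outval2 = foo_sub(inval2, 2);
  *outval3 = foo_sub(inval3, 3);
}
```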

set_directive_inline

Description

Removes a function as a separate entity in the RTL hierarchy. After inlining, the function is dissolved into the calling function, and no longer appears as a separate level of hierarchy.

IMPORTANT: Inlining a child function also dissolves any pragmas or directives applied to that function. In Vitis HLS, any directives applied in the child context are ignored.

In some cases, inlining a function allows operations within the function to be shared and optimized more effectively with the calling function. However, an inlined function cannot be shared or reused, so if the parent function calls the inlined function multiple times, this can increase the area and resource utilization.

By default, inlining is only performed on the next level of function hierarchy.

Syntax

set_directive_inline [OPTIONS] <location>
  • <location> is the location (in the format function[/label]) where inlining is to be performed.

Options

-off
By default, Vitis HLS performs inlining of smaller functions in the code. Using the -off option disables inlining for the specified function.
-recursive
By default, only one level of function inlining is performed. The functions within the specified function are not inlined. The -recursive option inlines all functions recursively within the specified function hierarchy.

Examples

The following example inlines function func_sub1, but no sub-functions called by func_sub1.

set_directive_inline func_sub1

This example inlines function func_sub1, recursively down the hierarchy, except function func_sub2:

set_directive_inline -recursive func_sub1
set_directive_inline -off func_sub2
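
The second example can be sketched in source form as follows. The function bodies, the helper func_leaf, and the caller func_top are hypothetical, and the pragma spellings are assumed to be the in-source equivalents of the -recursive and -off options.

```cpp
// func_sub2 is excluded from inlining and keeps its own level of hierarchy.
int func_sub2(int x) {
#pragma HLS inline off
  return x * 3;
}

// Hypothetical leaf function; it is dissolved by the recursive inlining below.
int func_leaf(int x) { return x + 1; }

// Inlined recursively down the hierarchy, except for func_sub2.
int func_sub1(int x) {
#pragma HLS inline recursive
  return func_leaf(x) + func_sub2(x);
}

// Hypothetical caller of func_sub1.
int func_top(int x) { return func_sub1(x) * 2; }
```

Inlining changes only the RTL hierarchy; the value computed by func_top is the same either way.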

set_directive_interface

Description

Specifies how RTL ports are created from the function description during interface synthesis. For more information, see Defining Interfaces. The ports in the RTL implementation are derived from:

  • Any function-level protocol that is specified.
  • Function arguments and return.
  • Global variables (accessed by the top-level function and defined outside its scope).

Function-level handshakes:

  • Control when the function starts operation.
  • Indicate when function operation:
    • Ends
    • Is idle
    • Is ready for new inputs

The implementation of a function-level protocol:

  • Is controlled by modes ap_ctrl_none, ap_ctrl_hs, or ap_ctrl_chain.
  • Requires only the top-level function name.

Each function argument can be specified to have its own I/O protocol (such as valid handshake or acknowledge handshake).

If a global variable is accessed, but all read and write operations are local to the design, the resource is created in the design. There is no need for an I/O port in the RTL. If, however, the global variable is expected to be an external source or destination, specify its interface in a similar manner as standard function arguments. See the examples below.

TIP: The Vitis HLS tool automatically determines the I/O protocol used by any sub-functions. You cannot specify the INTERFACE pragma or directive for sub-functions.

Syntax

set_directive_interface [OPTIONS] <location> <port>
  • <location> is the location (in the format function[/label]) where the function interface or registered output is to be specified.
  • <port> is the parameter (function argument or global variable) for which the interface has to be synthesized. This is not required when the function-level protocol modes ap_ctrl_none, ap_ctrl_hs, or ap_ctrl_chain are used.

Options

-bundle <string>
By default, the HLS tool groups or bundles function arguments with compatible options into interface ports in the RTL code. All AXI4-Lite (s_axilite) interfaces are bundled into a single AXI4-Lite port whenever possible. Similarly, all function arguments specified as an AXI4 (m_axi) interface are bundled into a single AXI4 port by default.
All interface ports with compatible options, such as mode, offset, and bundle, are grouped into a single interface port. The port name is derived automatically from a combination of the mode and bundle, or is named as specified by -name.
IMPORTANT: When specifying the bundle name you should use all lower-case characters.
-clock <string>
By default, the AXI4-Lite interface clock is the same clock as the system clock. This option is used to specify a separate clock for an AXI4-Lite interface. If the -bundle option is used to group multiple top-level function arguments into a single AXI4-Lite interface, the -clock option need only be specified on one of the bundle members.
-depth <int>
Specifies the maximum number of samples for the test bench to process. This setting indicates the maximum size of the FIFO needed in the verification adapter that Vitis HLS creates for RTL co-simulation. This option is required for pointer interfaces using ap_fifo mode.
-latency <value>
This option can be used on ap_memory and M_AXI interfaces.
  • In an ap_memory interface, the interface option specifies the read latency of the RAM resource driving the interface. By default, a read operation of 1 clock cycle is used. This option allows an external RAM with more than 1 clock cycle of read latency to be modeled.
  • In an M_AXI interface, this option specifies the expected latency of the AXI4 interface, allowing the design to initiate a bus request <value> number of cycles (latency) before the read or write is expected. If this figure is too low, the design will be ready too soon and may stall waiting for the bus. If this figure is too high, bus access may be idle waiting on the design to start the access.
-max_read_burst_length <int>
For use with the M_AXI interface, this option specifies the maximum number of data values read during a burst transfer.
-max_widen_bitwidth <int>
Specifies the maximum bit width available for the interface when automatically widening the interface.
-max_write_burst_length <int>
For use with the M_AXI interface, this option specifies the maximum number of data values written during a burst transfer.
-mode (ap_none|ap_vld|ap_ack|ap_hs|ap_ovld|ap_fifo|ap_memory|bram|axis|s_axilite|m_axi|ap_ctrl_none|ap_ctrl_hs|ap_ctrl_chain|ap_stable)
Following is a summary of how Vitis HLS implements the -mode options.
  • ap_none: No protocol. The interface is a data port.
  • ap_vld: Implements the data port with an associated valid port to indicate when the data is valid for reading or writing.
  • ap_ack: Implements the data port with an associated acknowledge port to acknowledge that the data was read or written.
  • ap_hs: Implements the data port with associated valid and acknowledge ports to provide a two-way handshake to indicate when the data is valid for reading and writing and to acknowledge that the data was read or written.
  • ap_ovld: Implements the output data port with an associated valid port to indicate when the data is valid for reading or writing.
    Note: Vitis HLS implements the input argument or the input half of any read/write arguments with mode ap_none.
  • ap_fifo: Implements the port with a standard FIFO interface using data input and output ports with associated active-Low FIFO empty and full ports.
    Note: You can only use this interface on read arguments or write arguments. The ap_fifo mode does not support bidirectional read/write arguments.
  • ap_memory: Implements array arguments as a standard RAM interface. If you use the RTL design in Vivado IP integrator, the memory interface appears as discrete ports.
  • bram: Implements array arguments as a standard RAM interface. If you use the RTL design in Vivado IP integrator, the memory interface appears as a single port.
  • axis: Implements all ports as an AXI4-Stream interface.
  • s_axilite: Implements all ports as an AXI4-Lite interface. Vitis HLS produces an associated set of C driver files during the Export RTL process.
  • m_axi: Implements all ports as an AXI4 interface. You can use the config_interface command to specify either 32-bit (default) or 64-bit address ports and to control any address offset.
  • ap_ctrl_none: No block-level I/O protocol.
    Note: Using the ap_ctrl_none mode might prevent the design from being verified using the C/C++/RTL co-simulation feature.
  • ap_ctrl_hs: Implements a set of block-level control ports to start the design operation and to indicate when the design is idle, done, and ready for new input data.
    Note: The ap_ctrl_hs mode is the default block-level I/O protocol.
  • ap_ctrl_chain: Implements a set of block-level control ports to start the design operation, continue operation, and indicate when the design is idle, done, and ready for new input data.
  • ap_stable: No protocol. The interface is a data port. Vitis HLS assumes the data port is always stable after reset, which allows internal optimizations to remove unnecessary registers.
-name <string>
Specifies a name for the port which will be used in the generated RTL.
-num_read_outstanding <int>
For use with the M_AXI interface, this option specifies how many read requests can be made to the AXI4 bus, without a response, before the design stalls. This implies internal storage in the design, and a FIFO of size:
num_read_outstanding*max_read_burst_length*word_size
-num_write_outstanding <int>
For use with the M_AXI interface, this option specifies how many write requests can be made to the AXI4 bus, without a response, before the design stalls. This implies internal storage in the design, and a FIFO of size:
num_write_outstanding*max_write_burst_length*word_size
-offset <string>
Controls the address offset in AXI4-Lite (s_axilite) and AXI4 memory mapped (m_axi) interfaces for the specified port.
  • In an s_axilite interface, <string> specifies the address in the register map.
  • In an m_axi interface this option overrides the global option specified by the config_interface -m_axi_offset option, and <string> is specified as:
    • off: Do not generate an offset port.
    • direct: Generate a scalar input offset port.
    • slave: Generate an offset port and automatically map it to an AXI4-Lite slave interface. This is the default offset.
-register
Registers the signal and any associated protocol signals and instructs the signals to persist until at least the last cycle of the function execution. This option applies to the following scalar interfaces for the top-level function:
  • s_axilite
  • ap_fifo
  • ap_none
  • ap_stable
  • ap_hs
  • ap_ack
  • ap_vld
  • ap_ovld
-register_mode (both|forward|reverse|off)
This option applies to AXI4-Stream interfaces, and specifies if registers are placed on the forward path (TDATA and TVALID), the reverse path (TREADY), on both paths (TDATA, TVALID, and TREADY), or if none of the ports signals are to be registered (off). The default is both. AXI4-Stream side-channel signals are considered to be data signals and are registered whenever the TDATA is registered.
-storage_impl=<impl>
For use with s_axilite only. This option defines a storage implementation to assign to the interface.
Supported implementation values include auto, bram, and uram. The default is auto.
TIP: uram is a synchronous memory with only a single clock for two ports. Therefore uram cannot be specified for an s_axilite adapter with a second clock.
-storage_type=<type>
For use with ap_memory and bram interfaces only. This option defines a storage type (for example, RAM_T2P) to assign to the variable.
Supported types include: ram_1p, ram_1wnr, ram_2p, ram_s2p, ram_t2p, rom_1p, rom_2p, and rom_np.
TIP: This can also be specified using the BIND_STORAGE pragma or directive for the variable for objects that are not defined on the interface.
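
As a worked instance of the sizing formula given for -num_read_outstanding and -num_write_outstanding, the sketch below multiplies the three factors. The helper name fifo_bits, the choice of bits as the unit for word_size, and the numeric values are assumptions made for illustration, not tool defaults.

```cpp
#include <cstdint>

// Internal FIFO storage implied by the outstanding-request options:
// num_outstanding * max_burst_length * word_size.
// The result is in bits because word_size is given in bits here (an assumption).
constexpr std::uint64_t fifo_bits(std::uint64_t num_outstanding,
                                  std::uint64_t max_burst_length,
                                  std::uint64_t word_size_bits) {
  return num_outstanding * max_burst_length * word_size_bits;
}

// Illustrative settings only.
// Read side: 16 outstanding requests, bursts of 16 words, 32-bit words.
constexpr std::uint64_t read_fifo = fifo_bits(16, 16, 32);
// Write side: 4 outstanding requests, bursts of 64 words, 64-bit words.
constexpr std::uint64_t write_fifo = fifo_bits(4, 64, 64);

static_assert(read_fifo == 8192, "16 * 16 * 32 bits");
static_assert(write_fifo == 16384, "4 * 64 * 64 bits");
```

Because the three factors multiply, doubling any one of them doubles the implied storage, which is worth checking before raising the outstanding or burst-length settings.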

Examples

Turns off function-level handshakes for function func.

set_directive_interface -mode ap_ctrl_none func

Argument InData in function func is specified to have an ap_vld interface and the input should be registered.

set_directive_interface -mode ap_vld -register func InData

Exposes global variable lookup_table used in function func as a port on the RTL design, with an ap_memory interface.

set_directive_interface -mode ap_memory func lookup_table

set_directive_latency

Description

Specifies a maximum or minimum latency value, or both, on a function, loop, or region.

Vitis HLS always aims for minimum latency. The behavior of the tool when minimum and maximum latency values are specified is as follows:

  • Latency is less than the minimum: If Vitis HLS can achieve less than the minimum specified latency, it extends the latency to the specified value, potentially enabling increased sharing.
  • Latency is greater than the minimum: The constraint is satisfied. No further optimizations are performed.
  • Latency is less than the maximum: The constraint is satisfied. No further optimizations are performed.
  • Latency is greater than the maximum: If Vitis HLS cannot schedule within the maximum limit, it increases effort to achieve the specified constraint. If it still fails to meet the maximum latency, it issues a warning. Vitis HLS then produces a design with the smallest achievable latency.
TIP: You can also use the LATENCY pragma or directive to limit the efforts of the tool to find an optimum solution. Specifying latency constraints for scopes within the code: loops, functions, or regions, reduces the possible solutions within that scope and can improve tool runtime. Refer to Improving Synthesis Runtime and Capacity for more information.

Syntax

set_directive_latency [OPTIONS] <location>
  • <location> is the location (function, loop or region) (in the format function[/label]) to be constrained.

Options

-max <integer>
Specifies the maximum latency.
-min <integer>
Specifies the minimum latency.

Examples

Function foo is specified to have a minimum latency of 4 and a maximum latency of 8.

set_directive_latency -min 4 -max 8 foo

In function foo, loop loop_row is specified to have a maximum latency of 12. Place the pragma in the loop body.

set_directive_latency -max 12 foo/loop_row
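
The two constraints can be sketched in pragma form as shown below. The function bodies are hypothetical, foo_rows is a made-up name used to keep the loop-level example separate, and the pragma spellings are assumed to be the in-source equivalents of -min and -max.

```cpp
// Function-level constraint: latency of foo between 4 and 8 cycles.
int foo(int a, int b) {
#pragma HLS latency min=4 max=8
  return a * b + a;
}

// Loop-level constraint: the pragma is placed in the body of loop_row.
int foo_rows(const int in[8]) {
  int acc = 0;
loop_row:
  for (int i = 0; i < 8; i++) {
#pragma HLS latency max=12
    acc += in[i];
  }
  return acc;
}
```

The constraints affect scheduling only; the computed values are unchanged, so both functions can be checked in ordinary C simulation.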

set_directive_loop_flatten

Description

Flattens nested loops into a single loop hierarchy.

In the RTL implementation, it costs a clock cycle to move between loops in the loop hierarchy. Flattening nested loops allows them to be optimized as a single loop. This saves clock cycles, potentially allowing for greater optimization of the loop body logic.

Note: Apply this directive to the inner-most loop in the loop hierarchy. Only perfect and semi-perfect loops can be flattened in this manner.
  • Perfect loop nests
    • Only the innermost loop has loop body content.
    • There is no logic specified between the loop statements.
    • All loop bounds are constant.
  • Semi-perfect loop nests
    • Only the innermost loop has loop body content.
    • There is no logic specified between the loop statements.
    • The outermost loop bound can be a variable.
  • Imperfect loop nests

    When the inner loop has variable bounds (or the loop body is not exclusively inside the inner loop), try to restructure the code, or unroll the loops in the loop body to create a perfect loop nest.

Syntax

set_directive_loop_flatten [OPTIONS] <location>
  • <location> is the location (inner-most loop), in the format function[/label].

Options

-off
Prevents loop flattening from taking place. This can be used to prevent some loops from being flattened while all others in the specified location are flattened.
IMPORTANT: The presence of the LOOP_FLATTEN pragma or directive enables the optimization. The addition of -off disables it.

Examples

Flattens loop_1 in function foo and all (perfect or semi-perfect) loops above it in the loop hierarchy, into a single loop. Place the pragma in the body of loop_1.

set_directive_loop_flatten foo/loop_1
#pragma HLS loop_flatten

Prevents loop flattening in loop_2 of function foo. Place the pragma in the body of loop_2.

set_directive_loop_flatten -off foo/loop_2
#pragma HLS loop_flatten off
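
A perfect loop nest and a hand-flattened version of it can be sketched as follows. The array dimensions, the loop bodies, and the function foo_flat are assumptions; foo_flat only illustrates the idea of a single loop hierarchy and is not the code the tool generates.

```cpp
constexpr int ROWS = 3;
constexpr int COLS = 4;

// Perfect loop nest: constant bounds, body only in the innermost loop,
// and no logic between the loop statements.
int foo(const int a[ROWS][COLS]) {
  int sum = 0;
loop_0:
  for (int i = 0; i < ROWS; i++) {
  loop_1:
    for (int j = 0; j < COLS; j++) {
#pragma HLS loop_flatten
      sum += a[i][j];
    }
  }
  return sum;
}

// Hand-written single loop over the same ROWS * COLS iterations.
int foo_flat(const int a[ROWS][COLS]) {
  int sum = 0;
  for (int k = 0; k < ROWS * COLS; k++) {
    sum += a[k / COLS][k % COLS];
  }
  return sum;
}
```

Both functions visit the elements in the same order, so they return the same result; the flattened form simply has no transition between an inner and an outer loop.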

set_directive_loop_merge

Description

Merges consecutive loops into a single loop. Merging loops:

  • Reduces the number of clock cycles required in the RTL to transition between the loop-body implementations.
  • Allows the loops to be implemented in parallel (if possible).

The rules for loop merging are:

  • If the loop bounds are variables, they must have the same value (number of iterations).
  • If loop bounds are constants, the maximum constant value is used as the bound of the merged loop.
  • A loop with a variable bound cannot be merged with a loop that has a constant bound.
  • The code between loops to be merged cannot have side effects. Multiple executions of this code should generate the same results.
    • a=b is allowed
    • a=a+1 is not allowed.
  • Loops cannot be merged when they contain FIFO reads. Merging changes the order of the reads. Reads from a FIFO or FIFO interface must always be in sequence.

Syntax

set_directive_loop_merge [OPTIONS] <location>
  • <location> is the location (in the format function[/label]) at which the loops reside.

Options

-force
Forces loops to be merged even when Vitis HLS issues a warning. You must ensure that the merged loop will function correctly.

Examples

Merges all consecutive loops in function foo into a single loop.

set_directive_loop_merge foo

All loops inside loop_2 of function foo (but not loop_2 itself) are merged by using the -force option.

set_directive_loop_merge -force foo/loop_2
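
The effect of merging two consecutive loops with the same constant bound can be sketched as below. The loop bodies, the array size, and the function foo_merged are assumptions; foo_merged is a hand-written illustration of the merged form, and the pragma spelling is assumed to be the in-source equivalent of the Tcl command.

```cpp
constexpr int N = 8;

// Two consecutive loops with the same bound and no code between them.
void foo(const int in[N], int out_a[N], int out_b[N]) {
#pragma HLS loop_merge
loop_a:
  for (int i = 0; i < N; i++) {
    out_a[i] = in[i] + 1;
  }
loop_b:
  for (int i = 0; i < N; i++) {
    out_b[i] = in[i] * 2;
  }
}

// Hand-merged equivalent: one loop carries both bodies, so there is no
// transition between loops and the two statements can run in parallel.
void foo_merged(const int in[N], int out_a[N], int out_b[N]) {
  for (int i = 0; i < N; i++) {
    out_a[i] = in[i] + 1;
    out_b[i] = in[i] * 2;
  }
}
```

The merge is safe here because neither loop reads what the other writes and no FIFO accesses are reordered.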

set_directive_loop_tripcount

Description

The loop tripcount is the total number of iterations performed by a loop. Vitis HLS reports the total latency of each loop (the number of cycles to execute all iterations of the loop). This loop latency is therefore a function of the tripcount (number of loop iterations).

IMPORTANT: The LOOP_TRIPCOUNT pragma or directive is for analysis only, and does not impact the results of synthesis.

The tripcount can be a constant value. It might depend on the value of variables used in the loop expression (for example, x<y) or control statements used inside the loop.

Vitis HLS cannot determine the tripcount in some cases. These cases include, for example, those in which the variables used to determine the tripcount are:

  • Input arguments, or
  • Variables calculated by dynamic operation

In the following example, the maximum number of iterations of the for-loop is determined by the value of input num_samples. The value of num_samples is not defined in the C function, but comes into the function from the outside.

void foo (num_samples, ...) {
   int i;
   ...
   loop_1: for(i=0;i< num_samples;i++) {
     ...
     result = a + b;
   }
}

In cases where the loop latency is unknown or cannot be calculated, set_directive_loop_tripcount allows you to specify minimum, maximum, and average iterations for a loop. This lets the tool analyze how the loop latency contributes to the total design latency in the reports and helps you determine appropriate optimizations for the design.

TIP: If a C assert macro is used to limit the size of a loop variable, Vitis HLS can use it to both define loop limits for reporting and create hardware that is exactly sized to these limits.

Syntax

set_directive_loop_tripcount [OPTIONS] <location>
  • <location> is the location of the loop (in the format function[/label]) at which the tripcount is specified.

Options

-avg <integer>
Specifies the average number of iterations.
-max <integer>
Specifies the maximum number of iterations.
-min <integer>
Specifies the minimum number of iterations.

Examples

loop_1 in function foo is specified to have a minimum tripcount of 12, a maximum tripcount of 16, and an average tripcount of 14:

set_directive_loop_tripcount -min 12 -max 16 -avg 14 foo/loop_1
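
A compilable variant of the loop from the description, with the same tripcount values written as a pragma, might look like the sketch below. The argument types, the loop body, and the pragma spelling are assumptions, since the original snippet elides them.

```cpp
// num_samples comes from outside the function, so the loop bound is not
// known at compile time and the tripcount is supplied for reporting only.
int foo(const int *data, int num_samples) {
  int result = 0;
loop_1:
  for (int i = 0; i < num_samples; i++) {
#pragma HLS loop_tripcount min=12 max=16 avg=14
    result += data[i];
  }
  return result;
}
```

The function still runs for any num_samples in C simulation; the tripcount values only shape the latency figures shown in the reports.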

set_directive_occurrence

Description

When pipelining functions or loops, the OCCURRENCE directive specifies that the code in a location is executed at a lower rate than the surrounding function or loop. This allows the code that is executed at the lower rate to be pipelined at a slower rate, and potentially shared within the top-level pipeline. For example:

  • A loop iterates N times.
  • Part of the loop is protected by a conditional statement and only executes M times, where N is an integer multiple of M.
  • The code protected by the conditional is said to have an occurrence of N/M.

Identifying a region with an OCCURRENCE rate allows the functions and loops in this region to be pipelined with an initiation interval that is slower than the enclosing function or loop.

Syntax

set_directive_occurrence [OPTIONS] <location>
  • <location> specifies the location with a slower rate of execution.

Options

-cycle <int>
Specifies the occurrence N/M where:
  • N is the number of times the enclosing function or loop is executed.
  • M is the number of times the conditional region is executed.
IMPORTANT: N must be an integer multiple of M.

Examples

Region Cond_Region in function foo has an occurrence of 4. It executes at a rate four times slower than the code that encompasses it.

set_directive_occurrence -cycle 4 foo/Cond_Region
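
A loop of this shape can be sketched as follows, with N = 16 iterations and a conditional region that runs M = 4 times, giving an occurrence of N/M = 4. The constants, the loop body, and the pragma spellings are assumptions chosen for illustration.

```cpp
constexpr int N = 16;  // iterations of the enclosing loop
constexpr int M = 4;   // executions of the conditional region

// Accumulates the input and emits one partial sum every N/M iterations.
// Returns how many times the conditional region was executed.
int foo(const int in[N], int out[M]) {
  int acc = 0;
  int executed = 0;
  for (int i = 0; i < N; i++) {
#pragma HLS pipeline II=1
    acc += in[i];
    if (i % (N / M) == (N / M) - 1) {
    Cond_Region: {
#pragma HLS occurrence cycle=4
      out[executed++] = acc;  // runs on iterations 3, 7, 11, and 15 only
      acc = 0;
    }
    }
  }
  return executed;
}
```

Since the region executes once for every four iterations of the loop, it can be scheduled at a quarter of the rate of the surrounding pipeline.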

set_directive_pipeline

Description

Reduces the initiation interval (II) for a function or loop by allowing the concurrent execution of operations as described in Function and Loop Pipelining. A pipelined function or loop can process new inputs every N clock cycles, where N is the initiation interval (II). An II of 1 processes a new input every clock cycle.

As a default behavior, with the PIPELINE pragma or directive Vitis HLS will generate the minimum II for the design according to the specified clock period constraint. The emphasis will be on meeting timing, rather than on achieving II unless the -II option is specified.

The default type of pipeline is defined by the config_compile -pipeline_style command, but can be overridden in the PIPELINE pragma or directive.

If Vitis HLS cannot create a design with the specified II, it:

  • Issues a warning.
  • Creates a design with the lowest possible II.

You can then analyze this design with the warning messages to determine what steps must be taken to create a design that satisfies the required initiation interval.

Syntax

set_directive_pipeline [OPTIONS] <location>

Where:

  • <location> is the location (in the format function[/label]) to be pipelined.

Options

-II <integer>
Specifies the desired initiation interval for the pipeline. Vitis HLS tries to meet this request. Based on data dependencies, the actual result might have a larger II.
-off
Turns off pipeline for a specific loop or function. This can be used when config_compile -pipeline_loops is used to globally pipeline loops.
-rewind
Note: Applicable only to a loop.
Optional keyword. Enables rewinding as described in Rewinding Pipelined Loops for Performance. This enables continuous loop pipelining with no pause between one execution of the loop ending and the next execution starting. Rewinding is effective only if there is one single loop (or a perfect loop nest) inside the top-level function. The code segment before the loop:
  • Is considered as initialization.
  • Is executed only once in the pipeline.
  • Cannot contain any conditional operations (if-else).
-style <stp | frp | flp>
Specifies the type of pipeline to use for the specified function or loop. For more information on pipeline styles refer to Flushing Pipelines. The types of pipelines include:
stp
Stall pipeline. Runs only when input data is available; otherwise it stalls. This is the default setting, and is the type of pipeline used by Vitis HLS for both loop and function pipelining. Use this when a flushable pipeline is not required, for example, when there are no performance or deadlock issues due to stalls.
flp
This option defines the pipeline as a flushable pipeline as described in Flushing Pipelines. This type of pipeline typically consumes more resources and/or can have a larger II because resources cannot be shared among pipeline iterations.
frp
Free-running, flushable pipeline. Runs even when input data is not available. Use this when you need better timing due to reduced pipeline control signal fanout, or when you need improved performance to avoid deadlocks. However, this pipeline style can consume more power as the pipeline registers are clocked even if there is no data.
IMPORTANT: This is a hint not a hard constraint. The tool checks design conditions for enabling pipelining. Some loops might not conform to a particular style and the tool reverts to the default style (stp) if necessary.

Examples

Function func is pipelined with an initiation interval of 1.

set_directive_pipeline -II 1 func
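
In source form, a function-level pipeline request can be sketched as below. The function body is hypothetical, and the pragma spelling is assumed to be the in-source equivalent of the -II option.

```cpp
// Requests that func accept a new set of inputs every clock cycle (II = 1).
int func(int a, int b, int c) {
#pragma HLS pipeline II=1
  return a * b + c;
}
```

The pragma is a request: as described above, the tool issues a warning and falls back to the lowest possible II if it cannot create a design with the specified value.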

set_directive_protocol

Description

This command specifies a region of code, a protocol region, in which no clock operations will be inserted by Vitis HLS unless explicitly specified in the code. Vitis HLS will not insert any clocks between operations in the region, including those which read from or write to function arguments. The order of reads and writes will therefore be strictly followed in the synthesized RTL.

A region of code can be created in the C/C++ code by enclosing the region in braces "{ }" and naming it. The following defines a region named io_section:
io_section:{
...
lines of code
...
}

A clock operation can be explicitly specified in C code using an ap_wait() statement, and can be specified in C++ code by using the wait() statement. The ap_wait and wait statements have no effect on the simulation of the design.

Syntax

set_directive_protocol [OPTIONS] <location>

The <location> specifies the location (in the format function[/label]) at which the protocol region is defined.

Options

-mode [floating | fixed]
  • floating: Lets code statements outside the protocol region overlap and execute in parallel with statements in the protocol region in the final RTL. The protocol region remains cycle accurate, but outside operations can occur at the same time. This is the default mode.
  • fixed: The fixed mode ensures that statements outside the protocol region do not execute in parallel with the protocol region.

Examples

The example code defines a protocol region, io_section in function foo. The following directive defines that region as a fixed mode protocol region:
set_directive_protocol -mode fixed foo/io_section

set_directive_reset

Description

Adds or removes resets for specific state variables (global or static). The reset port is used to restore the registers and block RAM, connected to the port, to an initial value any time the reset signal is applied. The presence and behavior of the RTL reset port is controlled using the config_rtl settings.

Greater control over reset is provided through the RESET pragma. If a variable is a static or global, the RESET pragma is used to explicitly add a reset, or the variable can be removed from the reset by turning off the pragma. This can be particularly useful when static or global arrays are present in the design.

Syntax

set_directive_reset [OPTIONS] <location> <variable>
  • <location> is the location (in the format function[/label]) at which the variable is defined.
  • <variable> is the variable to which the directive is applied.

Options

-off
  • If -off is specified, reset is not generated for the specified variable.

Examples

Adds reset to variable a in function foo even when the global reset setting is none or control.

set_directive_reset foo a

Removes reset from variable static int a in function foo even when the global reset setting is state or all.

set_directive_reset -off foo a
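
The kind of state variable these examples target can be sketched as below. The function body is hypothetical, and the pragma spelling is assumed to be the in-source equivalent of the first Tcl example.

```cpp
// a is a static state variable: it keeps its value between calls, so its
// initial value matters whenever the reset signal is applied.
int foo(int in) {
  static int a = 0;
#pragma HLS reset variable=a
  a += in;
  return a;
}
```

In C simulation the static variable is initialized once at program start; the reset directive governs how the corresponding register is restored to that initial value in the RTL.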

set_directive_stable

Description

The STABLE pragma is applied to arguments of a DATAFLOW or PIPELINE region and is used to indicate that an input or output of this region can be ignored when generating the synchronizations at entry and exit of the DATAFLOW region. This means that the reading processes (resp. read accesses) of that argument do not need to be part of the “first stage” of the task-level (resp. fine-grain) pipeline for inputs, and the writing process (resp. write accesses) do not need to be part of the last stage of the task-level (resp. fine-grain) pipeline for outputs.

The pragma can be specified at any point in the hierarchy, on a scalar or an array, and automatically applies to all the DATAFLOW or PIPELINE regions below that point. The effect of STABLE for an input is that a DATAFLOW or PIPELINE region can start another iteration even though the value of the previous iteration has not been read yet. For an output, this implies that a write of the next iteration can occur although the previous iteration is not done.

Syntax

set_directive_stable <location> <variable>
  • <location> is the function name or loop name where the directive is to be applied.
  • <variable> is the name of the scalar or array to be marked as stable.

Examples

In the following example, without the STABLE directive, proc1 and proc2 would be synchronized to acknowledge the reading of their inputs (including A). With the directive, A is no longer considered as an input that needs synchronization.

void dataflow_region(int A[...], int B[…] ...
    proc1(...);
    proc2(A, ...);

The directives for this example would be scripted as:

set_directive_stable dataflow_region A
set_directive_dataflow dataflow_region
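
A complete version of this region, written with pragmas, might look like the sketch below. The array size N, the extra out argument, the intermediate buffer, and the bodies of proc1 and proc2 are all assumptions, because the original example elides them; the pragma spellings are assumed to be the in-source equivalents of the two directives.

```cpp
constexpr int N = 4;

// Hypothetical first task: copies B into an intermediate buffer.
void proc1(const int in[N], int tmp[N]) {
  for (int i = 0; i < N; i++) tmp[i] = in[i] + 1;
}

// Hypothetical second task: the only reader of A.
void proc2(const int A[N], const int tmp[N], int out[N]) {
  for (int i = 0; i < N; i++) out[i] = A[i] * tmp[i];
}

void dataflow_region(int A[N], int B[N], int out[N]) {
#pragma HLS dataflow
#pragma HLS stable variable=A
  int tmp[N];
  proc1(B, tmp);
  proc2(A, tmp, out);
}
```

Marking A as stable only makes sense when the caller guarantees that A does not change while the region is running, because the read of A is no longer synchronized at the entry of the region.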

set_directive_stream

Description

By default, array variables are implemented as RAM:

  • Top-level function array parameters are implemented as a RAM interface port.
  • General arrays are implemented as RAMs for read-write access.
  • Arrays involved in sub-functions, or loop-based DATAFLOW optimizations are implemented as a RAM ping-pong buffer channel.

If the data stored in the array is consumed or produced in a sequential manner, a more efficient communication mechanism is to use streaming data, where FIFOs are used instead of RAMs. When an argument of the top-level function is specified as INTERFACE type ap_fifo, the array is automatically implemented as streaming. See Defining Interfaces for more information.

IMPORTANT: To preserve the accesses, it might be necessary to prevent compiler optimizations (in particular dead code elimination) by using the volatile qualifier as described in Type Qualifiers.

Syntax

set_directive_stream [OPTIONS] <location> <variable>
  • <location> is the location (in the format function[/label]) which contains the array variable.
  • <variable> is the array variable to be implemented as a FIFO.

Options

-depth <integer>
Note: Relevant only for array streaming in dataflow channels.
By default, the depth of the FIFO implemented in the RTL is the same size as the array specified in the C code. This option allows you to modify the size of the FIFO.

When the array is implemented in a DATAFLOW region, it is common to use the -depth option to reduce the size of the FIFO. For example, in a DATAFLOW region where all loops and functions are processing data at a rate of II = 1, there is no need for a large FIFO because data is produced and consumed in each clock cycle. In this case, the -depth option can be used to reduce the FIFO size to 2, which substantially reduces the area of the RTL design.

This same functionality is provided for all arrays in a DATAFLOW region using the config_dataflow command with the -fifo_depth option. The -depth option used with set_directive_stream overrides the default specified using config_dataflow.
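
The reasoning about small FIFO depths at II = 1 can be checked with a toy cycle-by-cycle model. This is a plain software sketch under stated assumptions, not a description of how Vitis HLS schedules or sizes FIFOs: the function name max_fifo_occupancy and the rule that an element becomes readable one cycle after it is written are both hypothetical.

```cpp
#include <algorithm>
#include <cstddef>

// Toy model (an assumption, not tool behavior): the producer pushes one
// element every producer_ii cycles, and the consumer pops one element every
// consumer_ii cycles, but only elements that were pushed in an earlier cycle.
// Both intervals must be at least 1. Returns the peak number of elements
// held in the FIFO while n elements are transferred.
std::size_t max_fifo_occupancy(std::size_t n, std::size_t producer_ii,
                               std::size_t consumer_ii) {
    std::size_t pushed = 0, popped = 0, peak = 0;
    for (std::size_t cycle = 0; popped < n; ++cycle) {
        const std::size_t visible = pushed;  // elements written before this cycle
        if (pushed < n && cycle % producer_ii == 0) ++pushed;
        peak = std::max(peak, pushed - popped);
        if (popped < visible && cycle % consumer_ii == 0) ++popped;
    }
    return peak;
}
```

In this model, matched rates keep the peak occupancy at 2 regardless of n, whereas a consumer that runs at half the producer rate lets the occupancy grow with the array size.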

-type <arg>
Specifies the mechanism used to implement the channel: FIFO, PIPO, synchronized shared (shared), or un-synchronized shared (unsync). The supported types are:
  • fifo: A FIFO buffer with the specified depth.
  • pipo: A regular Ping-Pong buffer, with depth but without a duplication of the array data. Consistency can be ensured by setting the depth small enough, which acts as the distance of synchronization between the producer and consumer.
  • shared: Specifies that an array local variable or argument in a given scope is viewed as a single shared memory, distributing the available ports to the processes that access it.
    TIP: The default depth for shared is 1.
  • unsync: Does not have any synchronization except for individual memory reads and writes. Consistency (read-write and write-write order) must be ensured by the design itself.

Examples

Specifies array A[10] in function func to be streaming and implemented as a FIFO.

set_directive_stream -type fifo func A

Array B in named loop loop_1 of function func is set to streaming with a FIFO depth of 12. In this case, place the pragma inside loop_1.

set_directive_stream -depth 12 -type fifo func/loop_1 B

Array C has streaming implemented as a PIPO.

set_directive_stream -type pipo func C
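
As a source-level counterpart to the Tcl examples above, the sketch below applies the equivalent pragma to an array that connects two processes in a DATAFLOW region. It is illustrative only: the array size N, the helper functions produce and consume, and their bodies are assumptions, and the array B is declared at function scope here rather than inside a named loop.

```cpp
#include <cstddef>

constexpr std::size_t N = 16;  // hypothetical array size

// Hypothetical producer: writes B strictly in order.
static void produce(const int in[N], int B[N]) {
    for (std::size_t i = 0; i < N; ++i) B[i] = in[i] * 2;
}

// Hypothetical consumer: reads B strictly in order.
static void consume(const int B[N], int out[N]) {
    for (std::size_t i = 0; i < N; ++i) out[i] = B[i] + 1;
}

void func(const int in[N], int out[N]) {
#pragma HLS dataflow
    int B[N];
#pragma HLS stream variable=B depth=12 type=fifo
    produce(in, B);
    consume(B, out);
}
```

Because both helpers access B sequentially, the array is a reasonable candidate for a FIFO; an array with random access would not be.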

See Also

set_directive_top

Description

Attaches a name to a function, which can then be used by the set_top command to set the named function as the top. This is typically used to synthesize member functions of a class in C++.

Note: Specify the directive in an active solution. Use the set_top command with the new name.

Syntax

set_directive_top [OPTIONS] <location>
  • <location> is the function to be renamed.

Options

-name <string>
Specifies the name of the function to be used by the set_top command.

Examples

Function foo_long_name is renamed to DESIGN_TOP, which is then specified as the top-level. If the pragma is placed in the code, the set_top command must still be issued, or the top-level must be specified in the GUI project settings.

set_directive_top -name DESIGN_TOP foo_long_name

The directive is then followed by the set_top DESIGN_TOP command.
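
When the pragma form is preferred, the name can be attached inside the function body. The sketch below assumes a hypothetical class Adder with a trivial member function. Only the names foo_long_name and DESIGN_TOP come from the example above, and whether a particular member function is acceptable as a top-level is not covered here.

```cpp
// Hypothetical class used only for illustration.
class Adder {
public:
    int foo_long_name(int a, int b) {
#pragma HLS top name=DESIGN_TOP
        return a + b;
    }
};
```

As with the Tcl directive, the new name still has to be selected as the top-level, for example with set_top DESIGN_TOP.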

See Also

set_directive_unroll

Description

Transforms loops by creating multiple copies of the loop body.

A loop is executed for the number of iterations specified by the loop induction variable. The number of iterations may also be impacted by logic inside the loop body (for example, break or modifications to any loop exit variable). The loop is implemented in the RTL by a block of logic representing the loop body, which is executed for the same number of iterations.

The set_directive_unroll command allows the loop to be fully unrolled, which creates as many copies of the loop body in the RTL as there are loop iterations, or partially unrolled by a factor N, which creates N copies of the loop body and adjusts the loop iteration count accordingly.

If the factor N used for partial unrolling is not an integer multiple of the original loop iteration count, the original exit condition must be checked after each unrolled fragment of the loop body.

To unroll a loop completely, the loop bounds must be known at compile time. This is not required for partial unrolling.
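
The effect of partial unrolling can be pictured by writing the transformation by hand. The following sketch is an illustration in plain C++, not the code the tool generates: the function names and the summation body are hypothetical. It shows a factor of 4 applied to a loop whose iteration count is not guaranteed to be a multiple of 4, so the exit condition is re-checked after each copy of the body.

```cpp
#include <cstddef>

// Rolled reference loop: one copy of the body, executed n times.
int sum_rolled(const int* a, std::size_t n) {
    int sum = 0;
    for (std::size_t i = 0; i < n; ++i) sum += a[i];
    return sum;
}

// Hand-written equivalent of unrolling by a factor of 4. Because n may not
// be a multiple of 4, the exit condition is checked after each copy.
int sum_unrolled_by_4(const int* a, std::size_t n) {
    int sum = 0;
    for (std::size_t i = 0; i < n; i += 4) {
        sum += a[i];
        if (i + 1 >= n) break;
        sum += a[i + 1];
        if (i + 2 >= n) break;
        sum += a[i + 2];
        if (i + 3 >= n) break;
        sum += a[i + 3];
    }
    return sum;
}
```

Both functions return the same result for any n; the unrolled version simply executes fewer, larger iterations.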

Syntax

set_directive_unroll [OPTIONS] <location>
  • <location> is the location of the loop (in the format function[/label]) to be unrolled.

Options

-factor <integer>
Specifies a non-zero integer indicating that partial unrolling is requested.

The loop body is repeated this number of times. The iteration information is adjusted accordingly.

-skip_exit_check
Effective only if a factor is specified (partial unrolling).
  • Fixed bounds

    No exit condition check is performed if the iteration count is a multiple of the factor.

    If the iteration count is not an integer multiple of the factor, the tool:
    • Prevents unrolling.
    • Issues a warning that the exit check must be performed to proceed.
  • Variable bounds

    The exit condition check is removed. You must ensure that:

    • The variable bound is an integer multiple of the factor.
    • No exit check is in fact required.
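
To make the variable-bounds case concrete, the sketch below shows by hand what a loop unrolled by a factor of 4 looks like once the intermediate exit checks are removed. It is an illustration only: the function name and the summation body are hypothetical, and the precondition in the comment is the caller's responsibility, as the list above describes.

```cpp
#include <cstddef>

// Precondition (not checked): n is an integer multiple of 4.
// With no exit check between the copies of the body, any other value of n
// would make the last iteration read past the end of the array.
int sum_unrolled_by_4_no_exit_check(const int* a, std::size_t n) {
    int sum = 0;
    for (std::size_t i = 0; i < n; i += 4) {
        sum += a[i];
        sum += a[i + 1];
        sum += a[i + 2];
        sum += a[i + 3];
    }
    return sum;
}
```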

Examples

Unrolls loop L1 in function foo. Place the pragma in the body of loop L1.

set_directive_unroll foo/L1

Specifies an unroll factor of 4 on loop L2 of function foo. Removes the exit check. Place the pragma in the body of loop L2.

set_directive_unroll -skip_exit_check -factor 4 foo/L2

Unrolls all loops inside loop L3 in function foo, but not loop L3 itself. The -region option specifies that the location is to be treated as an enclosing region and not as a loop label.

set_directive_unroll -region foo/L3
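
The pragma placement mentioned in the first two examples can be sketched as follows. The trip count N, the array arguments, and the loop bodies are assumptions chosen so that the code compiles; only the names foo, L1, and L2 come from the examples above. N is deliberately a multiple of 4 so that removing the exit check on L2 is safe.

```cpp
constexpr int N = 8;  // hypothetical trip count, a multiple of 4

void foo(const int in[N], int out[N]) {
    int tmp[N];
L1:
    for (int i = 0; i < N; ++i) {
#pragma HLS unroll
        tmp[i] = in[i] + 1;   // fully unrolled: N copies of this body
    }
L2:
    for (int i = 0; i < N; ++i) {
#pragma HLS unroll factor=4 skip_exit_check
        out[i] = tmp[i] * 2;  // partially unrolled: 4 copies per iteration
    }
}
```

The pragmas do not change the C behavior, so the function returns the same values whether or not they are honored.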

See Also