Optimization Directives
Directives, or the set_directive_* commands, can be specified as Tcl commands that are associated with a specific solution or set of solutions. This allows you to customize the synthesis results for the same source code across different solutions, preserving the original code while you engage in what-if analysis of the design.
Directives can be run in interactive mode (vitis_hls -i) or as a script using the -f option, as described in vitis_hls Command.
Pragmas are directives that you apply in the source code rather than in a Tcl script, so they change the synthesis results for all implementations of your code. There are HLS pragmas for every set_directive command, so you can choose how you want to work with your Vitis HLS project. Refer to HLS Pragmas for information on the different pragmas.
Directives and pragmas are also available through the Vitis HLS IDE for assignment to specific elements of your source code, as described in Adding Pragmas and Directives.
set_directive_aggregate
Description
Groups the data fields of a struct into a single scalar with a wider word width. If the struct contains arrays, the directive performs an operation similar to set_directive_array_reshape: it completely partitions and reshapes each array into a wide scalar and packs it with the other elements of the struct.
The bit alignment of the resulting wide word can be inferred from the declaration order of the struct elements. The first element takes the least significant sector of the word, and so forth until all fields are mapped.
Syntax
set_directive_aggregate [OPTIONS] <location> <variable>
<location> is the location (in the format function[/label]) that contains the variable to be packed.
<variable> is the struct variable to be packed.
Options
-compact [bit | byte | none | auto]
- Specifies the alignment of the aggregated struct. Alignment can be at the bit level (packed), at the byte level (padded), none, or automatically determined by the tool, which is the default behavior.
Examples
Aggregates struct pointer AB with three 8-bit fields (typedef struct {unsigned char R, G, B;} pixel) in function func into a new 24-bit pointer, aligning data at the bit level.
set_directive_aggregate func AB -compact bit
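A minimal C++ sketch of the bit-level layout described above, assuming the pixel type from the example. pack_pixel is a hypothetical software model (not a Vitis HLS API) that places R, the first declared field, in the least significant byte; func, with its hypothetical out and n parameters, shows where the equivalent AGGREGATE pragma would sit. A host C++ compiler ignores the HLS pragma, so the model can be checked without the tool.

```cpp
#include <cstdint>

// Pixel type from the example: three 8-bit fields.
typedef struct { unsigned char R, G, B; } pixel;

// Hypothetical software model of compact=bit packing: the first declared
// field (R) occupies bits 7:0, G bits 15:8, and B bits 23:16.
std::uint32_t pack_pixel(const pixel &p) {
    return static_cast<std::uint32_t>(p.R)
         | (static_cast<std::uint32_t>(p.G) << 8)
         | (static_cast<std::uint32_t>(p.B) << 16);
}

// Hypothetical function body showing the pragma form of the directive.
void func(pixel *AB, std::uint32_t *out, int n) {
#pragma HLS aggregate variable=AB compact=bit
    for (int i = 0; i < n; i++) {
        out[i] = pack_pixel(AB[i]);  // 24 significant bits per element
    }
}
```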
See Also
set_directive_alias
Description
Specifies that two or more M_AXI pointer arguments point to the same underlying buffer in memory (DDR or HBM), and indicates any aliasing between the pointers by setting the distance or offset between them.
The ALIAS pragma or directive applies to pointer arguments assigned to M_AXI interfaces. Vitis HLS considers different pointers to be independent channels and generally does not provide any dependency analysis. However, in cases where the host allocates a single buffer for multiple pointers, this relationship can be communicated through the ALIAS pragma or directive, and dependency analysis can be maintained. The ALIAS pragma enables data dependence analysis in Vitis HLS by defining the distance between pointers in the buffer.
Requirements for ALIAS:
- All ports assigned to an ALIAS pragma must be assigned to M_AXI interfaces and to different bundles, as shown in the example below.
- Each port can only be used in one ALIAS pragma or directive.
- The depth of all ports assigned to an ALIAS pragma must be the same.
- When an offset is specified, the number of ports and the number of offsets specified must be the same: one offset per port.
- The offset for the INTERFACE must be specified as slave or direct; offset=off is not supported.
Syntax
set_directive_alias [OPTIONS] <location> <ports>
<location> is the location string in the format function[/label] that the ALIAS pragma applies to.
<ports> specifies the ports to alias.
Options
-distance <integer>
- Specifies the difference between the pointer values passed to the ports in the list.
-offset <string>
- Specifies the offset of the pointer passed to each port in the ports list with respect to the origin of the array.
Example
For the following function top:
void top(int *arr0, int *arr1, int *arr2, int *arr3, ...) {
#pragma HLS interface M_AXI port=arr0 bundle=hbm0 depth=0x40000000
#pragma HLS interface M_AXI port=arr1 bundle=hbm1 depth=0x40000000
#pragma HLS interface M_AXI port=arr2 bundle=hbm2 depth=0x40000000
#pragma HLS interface M_AXI port=arr3 bundle=hbm3 depth=0x40000000
The following command specifies the distance between them:
set_directive_alias "top" arr0,arr1,arr2,arr3 -distance 10000000
Alternatively, the following command specifies the offset between the pointers to accomplish the same effect:
set_directive_alias top arr0,arr1,arr2,arr3 -offset 00000000,10000000,20000000,30000000
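The two commands describe the same layout when each port sits a fixed distance after the previous one. The sketch below makes that arithmetic explicit. It assumes, as the example values suggest, that each -offset entry equals the port's position in the list multiplied by the -distance value; offsets_from_distance is a hypothetical helper, not part of Vitis HLS.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: the offsets implied by a single -distance value for
// num_ports aliased ports. Port k is assumed to start k * distance after port 0.
std::vector<std::int64_t> offsets_from_distance(int num_ports, std::int64_t distance) {
    std::vector<std::int64_t> offsets;
    for (int k = 0; k < num_ports; k++) {
        offsets.push_back(k * distance);
    }
    return offsets;
}
```

For the four ports above, offsets_from_distance(4, 10000000) yields the list passed to -offset in the second command.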
See Also
set_directive_allocation
Description
Specifies instance restrictions for resource allocation.
The ALLOCATION pragma or directive can limit the number of RTL instances and hardware resources used to implement specific functions, loops, or operations. For example, if the C/C++ source has four instances of a function foo_sub, the set_directive_allocation command can ensure that there is only one instance of foo_sub in the final RTL. All four instances are then implemented using the same RTL block. This reduces the resources used by the function, but negatively impacts performance because those resources are shared.
The operations in the C/C++ code, such as additions, multiplications, array reads, and writes, can also be limited by the set_directive_allocation command.
Syntax
set_directive_allocation [OPTIONS] <location> <instances>
<location> is the location string in the format function[/label].
<instances> is a function or operator. The function can be any function in the original C/C++ code that has not been inlined, either by the set_directive_inline command or automatically by Vitis HLS. For a complete list of operations that can be limited using the ALLOCATION pragma, refer to the config_op command.
Options
-limit <integer>
- Sets a maximum limit on the number of instances (of the type defined by the -type option) to be used in the RTL design.
-type [function|operation]
- The instance type can be function (default) or operation.
Examples
Given a design foo_top with multiple instances of function foo, limits the number of instances of foo in the RTL to 2.
set_directive_allocation -limit 2 -type function foo_top foo
Limits the number of multipliers used in the implementation of My_func to 1. This limit does not apply to any multipliers that might reside in sub-functions of My_func. To limit the multipliers used in the implementation of any sub-functions, specify an allocation directive on the sub-functions or inline the sub-function into function My_func.
set_directive_allocation -limit 1 -type operation My_func mul
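A sketch of the first example in source-code form, assuming a hypothetical foo that is kept out of line and called four times from foo_top. The ALLOCATION pragma is written in the form used by older Vitis HLS pragma documentation (instances=, limit=, then the instance type); the keyword order may differ in your release, so check the HLS Pragmas reference. A host compiler ignores both pragmas, so the function still runs as ordinary C++.

```cpp
// Hypothetical sub-function; inlining is disabled so it remains a
// separate function that ALLOCATION can limit.
int foo(int a, int b) {
#pragma HLS inline off
    return a * b + 1;
}

// Four calls to foo; the pragma asks the tool to implement them with at
// most two RTL instances of foo, sharing instances between calls.
int foo_top(int a, int b, int c, int d) {
#pragma HLS allocation instances=foo limit=2 function
    int r0 = foo(a, b);
    int r1 = foo(c, d);
    int r2 = foo(a, c);
    int r3 = foo(b, d);
    return r0 + r1 + r2 + r3;
}
```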
See Also
set_directive_array_partition
Description
IMPORTANT: Array_Partition and Array_Reshape pragmas and directives are not supported for M_AXI interfaces on the top-level function. Instead, you can use the hls::vector data types as described in Vector Data Types.
Partitions an array into smaller arrays or individual elements.
- Results in RTL with multiple small memories or multiple registers instead of one large memory.
- Effectively increases the amount of read and write ports for the storage.
- Potentially improves the throughput of the design.
- Requires more memory instances or registers.
Syntax
set_directive_array_partition [OPTIONS] <location> <array>
<location> is the location (in the format function[/label]) that contains the array variable.
<array> is the array variable to be partitioned.
Options
-dim <integer>
- Note: Relevant for multi-dimensional arrays only. Specifies which dimension of the array is to be partitioned.
- If a value of 0 is used, all dimensions are partitioned with the specified options.
- Any other value partitions only that dimension. For example, if a value of 1 is used, only the first dimension is partitioned.
-factor <integer>
- Note: Relevant for type block or cyclic partitioning only. Specifies the number of smaller arrays that are to be created.
-type (block|cyclic|complete)
- block partitioning creates smaller arrays from consecutive blocks of the original array. This effectively splits the array into N equal blocks, where N is the integer defined by the -factor option.
- cyclic partitioning creates smaller arrays by interleaving elements from the original array. For example, if -factor 3 is used:
  - Element 0 is assigned to the first new array.
  - Element 1 is assigned to the second new array.
  - Element 2 is assigned to the third new array.
  - Element 3 is assigned to the first new array again.
- complete partitioning decomposes the array into individual elements. For a one-dimensional array, this corresponds to resolving a memory into individual registers. For multi-dimensional arrays, specify the partitioning of each dimension, or use -dim 0 to partition all dimensions.
- The default is complete.
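The cyclic assignment described above reduces to simple index arithmetic. This is a software model for reasoning about which sub-array holds a given element, not a Vitis HLS API; cyclic_location is a hypothetical name.

```cpp
#include <utility>

// Hypothetical model of cyclic partitioning: element `index` of the original
// array goes to sub-array (index % factor), at position (index / factor)
// within that sub-array. Returns {sub_array, position}.
std::pair<int, int> cyclic_location(int index, int factor) {
    return {index % factor, index / factor};
}
```

With factor 3, elements 0, 1, and 2 land at position 0 of sub-arrays 0, 1, and 2, and element 3 wraps back to sub-array 0 at position 1, matching the list above.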
Example 1
Partitions array AB[13] in function func into four arrays. Because four is not an integer factor of 13:
- Three arrays have three elements.
- One array has four elements (AB[9:12]).
set_directive_array_partition -type block -factor 4 func AB
Partitions array AB[6][4] in function func into two arrays, each of dimension [6][2].
set_directive_array_partition -type block -factor 2 -dim 2 func AB
Partitions all dimensions of AB[4][10][6] in function func into individual elements.
set_directive_array_partition -type complete -dim 0 func AB
Example 2
Partitioned arrays can be addressed in your code by the new structure of the array, as shown in the following code example:
struct SS
{
int x[N];
int y[N];
};
int top(SS *a, int b[4][6], SS &c) {...}
set_directive_array_partition top b -type complete -dim 1
set_directive_interface -mode ap_memory top b[0]
set_directive_interface -mode ap_memory top b[1]
set_directive_interface -mode ap_memory top b[2]
set_directive_interface -mode ap_memory top b[3]
See Also
set_directive_array_reshape
Description
IMPORTANT: Array_Partition and Array_Reshape pragmas and directives are not supported for M_AXI interfaces on the top-level function. Instead, you can use the hls::vector data types as described in Vector Data Types.
Combines array partitioning with vertical array mapping to create a single new array with fewer elements but wider words.
The set_directive_array_reshape command has the following features:
- Splits the array into multiple arrays (like set_directive_array_partition).
- Automatically recombines the arrays vertically to create a new array with wider words.
Syntax
set_directive_array_reshape [OPTIONS] <location> <array>
<location> is the location (in the format function[/label]) that contains the array variable.
<array> is the array variable to be reshaped.
Options
-dim <integer>
- Note: Relevant for multi-dimensional arrays only. Specifies which dimension of the array is to be reshaped.
- If the value is set to 0, all dimensions are reshaped with the specified options.
- Any other value reshapes only that dimension. The default is 1.
-factor <integer>
- Note: Relevant for type block or cyclic reshaping only. Specifies the number of temporary smaller arrays to be created.
-object
- Note: Relevant for container arrays only. Applies reshape on the objects within the container. If the option is specified, all dimensions of the objects are reshaped, but all dimensions of the container are kept.
-type (block|cyclic|complete)
- block reshaping creates smaller arrays from consecutive blocks of the original array. This effectively splits the array into N equal blocks, where N is the integer defined by the -factor option, and then combines the N blocks into a single array with word-width*N.
- cyclic reshaping creates smaller arrays by interleaving elements from the original array. For example, if -factor 3 is used, element 0 is assigned to the first new array, element 1 to the second new array, element 2 to the third new array, and then element 3 to the first new array again. The final array is a vertical concatenation (word concatenation, to create longer words) of the new arrays into a single array.
- complete reshaping decomposes the array into temporary individual elements and then recombines them into an array with a wider word. For a one-dimensional array, this is equivalent to creating a very wide register (if the original array was N elements of M bits, the result is a register with N*M bits).
- The default is complete.
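The shape of a reshaped dimension can be worked out with a few lines of arithmetic. This sketch, using the hypothetical names reshape_factor and reshape_complete, assumes that block and cyclic reshaping with -factor F turn a dimension of N elements into ceil(N/F) words that are each F elements wide; that assumption agrees with the examples that follow.

```cpp
// Hypothetical description of one reshaped dimension.
struct ReshapedShape {
    int depth;      // number of words after reshaping
    int word_bits;  // width of each word in bits
};

// Block or cyclic reshaping with -factor F: ceil(N / F) words, F times wider.
ReshapedShape reshape_factor(int n, int elem_bits, int factor) {
    return {(n + factor - 1) / factor, elem_bits * factor};
}

// Complete reshaping: all N elements packed into one N * M-bit word.
ReshapedShape reshape_complete(int n, int elem_bits) {
    return {1, n * elem_bits};
}
```

For instance, 17 8-bit elements with factor 4 give five 32-bit words, and the 4*2*2 elements of an 8-bit array reshaped completely give a single 128-bit word.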
Example 1
Reshapes 8-bit array AB[17] in function func into a new 32-bit array with five elements. Because four is not an integer factor of 17:
- The five 32-bit words provide 20 byte slots for only 17 elements.
- The remaining three byte slots in the reshaped array are unused.
set_directive_array_reshape -type block -factor 4 func AB
Reshapes array AB[6][4] in function func into a new array of dimension [6][2], in which dimension 2 is twice the width.
set_directive_array_reshape -type block -factor 2 -dim 2 func AB
Reshapes 8-bit array AB[4][2][2] in function func into a new single-element array (a register), 4*2*2*8 (= 128) bits wide.
set_directive_array_reshape -type complete -dim 0 func AB
Example 2
Reshaped arrays can be addressed in your code by the new structure of the array, as shown in the following code example:
struct SS
{
int x[N];
int y[N];
};
int top(SS *a, int b[4][6], SS &c) {...}
set_directive_array_reshape top b -type complete -dim 0
set_directive_interface -mode ap_memory top b[0]
See Also
set_directive_bind_op
Description
Vitis HLS implements the operations in the code using specific implementations. The set_directive_bind_op command specifies that, for a specified variable, an operation (mul, add, sub) should be mapped to a specific device resource for implementation (impl) in the RTL. If this command is not specified, Vitis HLS automatically determines the resource to use.
For example, to indicate that a specific multiplier operation (mul) is implemented in the device fabric rather than a DSP, you can use the set_directive_bind_op command.
You can also specify the latency of the operation using the -latency option. To use the -latency option, the operation must have an available multi-stage implementation. The HLS tool provides a multi-stage implementation for all basic arithmetic operations (add, subtract, multiply, and divide) and all floating-point operations.
Syntax
set_directive_bind_op [OPTIONS] <location> <variable>
<location> is the location (in the format function[/label]) that contains the variable.
<variable> is the variable to be assigned. The variable in this case is the one that is assigned the result of the operation that is the target of this directive.
Options
-op <value>
- Defines the operation to bind to a specific implementation resource. Supported functional operations include: mul, add, sub
-impl <value>
- Defines the implementation to use for the specified operation.
-latency <int>
- Defines the default latency for the implementation of the operation. The valid latency varies according to the specified op and impl. The default is -1, which lets Vitis HLS choose the latency.

Latency ranges for the add, mul, and sub operations:

Operation | Implementation | Min Latency | Max Latency |
---|---|---|---|
add | fabric | 0 | 4 |
add | dsp | 0 | 4 |
mul | fabric | 0 | 4 |
mul | dsp | 0 | 4 |
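sub | fabric | 0 | 4 |
sub | dsp | 0 | 0 |

Latency ranges for floating-point operations (f = single precision, d = double precision, h = half precision):

Operation | Implementation | Min Latency | Max Latency |
---|---|---|---|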
fadd | fabric | 0 | 13 |
fadd | fulldsp | 0 | 12 |
fadd | primitivedsp | 0 | 3 |
fsub | fabric | 0 | 13 |
fsub | fulldsp | 0 | 12 |
fsub | primitivedsp | 0 | 3 |
fdiv | fabric | 0 | 29 |
fexp | fabric | 0 | 24 |
fexp | meddsp | 0 | 21 |
fexp | fulldsp | 0 | 30 |
flog | fabric | 0 | 24 |
flog | meddsp | 0 | 23 |
flog | fulldsp | 0 | 29 |
fmul | fabric | 0 | 9 |
fmul | meddsp | 0 | 9 |
fmul | fulldsp | 0 | 9 |
fmul | maxdsp | 0 | 7 |
fmul | primitivedsp | 0 | 4 |
fsqrt | fabric | 0 | 29 |
frsqrt | fabric | 0 | 38 |
frsqrt | fulldsp | 0 | 33 |
frecip | fabric | 0 | 37 |
frecip | fulldsp | 0 | 30 |
dadd | fabric | 0 | 13 |
dadd | fulldsp | 0 | 15 |
dsub | fabric | 0 | 13 |
dsub | fulldsp | 0 | 15 |
ddiv | fabric | 0 | 58 |
dexp | fabric | 0 | 40 |
dexp | meddsp | 0 | 45 |
dexp | fulldsp | 0 | 57 |
dlog | fabric | 0 | 38 |
dlog | meddsp | 0 | 49 |
dlog | fulldsp | 0 | 65 |
dmul | fabric | 0 | 10 |
dmul | meddsp | 0 | 13 |
dmul | fulldsp | 0 | 13 |
dmul | maxdsp | 0 | 14 |
dsqrt | fabric | 0 | 58 |
drsqrt | fulldsp | 0 | 111 |
drecip | fulldsp | 0 | 36 |
hadd | fabric | 0 | 9 |
hadd | meddsp | 0 | 12 |
hadd | fulldsp | 0 | 12 |
hsub | fabric | 0 | 9 |
hsub | meddsp | 0 | 12 |
hsub | fulldsp | 0 | 12 |
hdiv | fabric | 0 | 16 |
hmul | fabric | 0 | 7 |
hmul | fulldsp | 0 | 7 |
hmul | maxdsp | 0 | 9 |
hsqrt | fabric | 0 | 16 |
Examples
In the following example, a two-stage pipelined multiplier using fabric logic is specified to implement the multiplication for variable <c> of the function foo.
int foo (int a, int b) {
int c, d;
c = a*b;
d = a*c;
return d;
}
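The set_directive command is as follows:
set_directive_bind_op -op mul -impl fabric -latency 2 "foo" c
No implementation is specified for the multiplication assigned to <d>, so Vitis HLS determines the resource to use for it.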
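The same binding can be expressed in the source with the BIND_OP pragma. A minimal sketch, assuming the pragma arguments variable, op, impl, and latency mirror the directive options; only the multiplication assigned to c is constrained, and a host compiler simply ignores the pragma.

```cpp
// The example function with the pragma form of the directive placed in it.
int foo(int a, int b) {
    int c, d;
#pragma HLS bind_op variable=c op=mul impl=fabric latency=2
    c = a * b;  // two-stage fabric multiplier requested
    d = a * c;  // implementation left to the tool
    return d;
}
```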
See Also
set_directive_bind_storage
Description
The set_directive_bind_storage command assigns a variable (array or function argument) in the code to a specific memory type (type) in the RTL. If the command is not specified, the Vitis HLS tool determines the memory type to assign. The HLS tool implements the memory using the specified implementation (impl) in the hardware.
For example, you can use the set_directive_bind_storage command to specify which type of memory and which implementation to use for an array variable. This also allows you to control whether the array is implemented as a single-port or a dual-port RAM. This usage is important for arrays on the top-level function interface, because the memory type associated with the array determines the number and type of ports needed in the RTL, as discussed in Arrays on the Interface.
You can use the -latency option to specify the latency of the implementation. For block RAMs on the interface, the -latency option allows you to model off-chip, non-standard SRAMs at the interface, for example supporting an SRAM with a latency of 2 or 3. For internal operations, the -latency option allows the operation to be implemented using more pipelined stages. These additional pipeline stages can help resolve timing issues during RTL synthesis.
To use the -latency option, the operation must have an available multi-stage implementation. The HLS tool provides a multi-stage implementation for all block RAMs.
For best results, Xilinx recommends that you use -std=c99 for C and -fno-builtin for C and C++. To specify the C compile options, such as -std=c99, use the Tcl command add_files with the -cflags option. Alternatively, select the Edit CFLAGs button in the Project Settings dialog box as described in Creating a New Vitis HLS Project.
Syntax
set_directive_bind_storage [OPTIONS] <location> <variable>
<location> is the location (in the format function[/label]) that contains the variable.
<variable> is the variable to be assigned.
Options
-type
- Defines the type of memory to bind to the specified variable.
-impl <value>
- Defines the implementation for the specified memory type. Supported implementations include: bram, bram_ecc, lutram, uram, uram_ecc, srl, memory, and auto, as described below.
-latency <int>
- Defines the default latency for the binding of the storage type to the implementation. The valid latency varies according to the specified type and impl. The default is -1, which lets Vitis HLS choose the latency.

Latency ranges for each storage type and implementation:

Type | Implementation | Min Latency | Max Latency |
---|---|---|---|
FIFO | BRAM | 0 | 0 |
FIFO | LUTRAM | 0 | 0 |
FIFO | MEMORY | 0 | 0 |
FIFO | SRL | 0 | 0 |
FIFO | URAM | 0 | 0 |
RAM_1P | AUTO | 1 | 3 |
RAM_1P | BRAM | 1 | 3 |
RAM_1P | LUTRAM | 1 | 3 |
RAM_1P | URAM | 1 | 3 |
RAM_1WNR | AUTO | 1 | 3 |
RAM_1WNR | BRAM | 1 | 3 |
RAM_1WNR | LUTRAM | 1 | 3 |
RAM_1WNR | URAM | 1 | 3 |
RAM_2P | AUTO | 1 | 3 |
RAM_2P | BRAM | 1 | 3 |
RAM_2P | LUTRAM | 1 | 3 |
RAM_2P | URAM | 1 | 3 |
RAM_S2P | BRAM | 1 | 3 |
RAM_S2P | BRAM_ECC | 1 | 3 |
RAM_S2P | LUTRAM | 1 | 3 |
RAM_S2P | URAM | 1 | 3 |
RAM_S2P | URAM_ECC | 1 | 3 |
RAM_T2P | BRAM | 1 | 3 |
RAM_T2P | URAM | 1 | 3 |
ROM_1P | AUTO | 1 | 3 |
ROM_1P | BRAM | 1 | 3 |
ROM_1P | LUTRAM | 1 | 3 |
ROM_2P | AUTO | 1 | 3 |
ROM_2P | BRAM | 1 | 3 |
ROM_2P | LUTRAM | 1 | 3 |
ROM_NP | BRAM | 1 | 3 |
ROM_NP | LUTRAM | 1 | 3 |
Examples
In the following example, the coeffs[128] variable is an argument to the top-level function func_top. The directive specifies that coeffs uses a single-port RAM implemented on a BRAM core from the library.
set_directive_bind_storage -type RAM_1P -impl bram "func_top" coeffs
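A sketch of the equivalent BIND_STORAGE pragma, assuming a hypothetical func_top body that sums the first n coefficients; the type and impl arguments mirror the -type and -impl options, and a host compiler ignores the pragma.

```cpp
// Hypothetical top-level function; coeffs is bound to a single-port RAM
// built from block RAM.
int func_top(int coeffs[128], int n) {
#pragma HLS bind_storage variable=coeffs type=RAM_1P impl=bram
    int acc = 0;
    for (int i = 0; i < n && i < 128; i++) {
        acc += coeffs[i];  // one read per iteration suits a single port
    }
    return acc;
}
```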
The ports created in the RTL to access the values of coeffs are defined in the RAM_1P core.
See Also
set_directive_dataflow
Description
Specifies that dataflow optimization be performed on the functions or loops as described in Exploiting Task Level Parallelism: Dataflow Optimization, improving the concurrency of the RTL implementation.
All operations are performed sequentially in a C/C++ description. In the absence of any directives that limit resources (such as set_directive_allocation), Vitis HLS seeks to minimize latency and improve concurrency. Data dependencies can limit this. For example, functions or loops that access arrays must finish all read/write accesses to the arrays before they complete. This prevents the next function or loop that consumes the data from starting operation.
It is possible for the operations in a function or loop to start operation before the previous function or loop completes all its operations. When the DATAFLOW optimization is specified, the HLS tool analyzes the dataflow between sequential functions or loops and creates channels (based on ping-pong RAMs or FIFOs) that allow consumer functions or loops to start operation before the producer functions or loops have completed. This allows functions or loops to operate in parallel, which decreases latency and improves the throughput of the RTL.
The config_dataflow command specifies the default memory channel and FIFO depth used in DATAFLOW optimization.
If no initiation interval (number of cycles between the start of one function or loop and the next) is specified, Vitis HLS attempts to minimize the initiation interval and start operation as soon as data is available.
For the DATAFLOW optimization to work, the data must flow through the design from one task to the next. The following coding styles prevent the HLS tool from performing the DATAFLOW optimization. Refer to Dataflow Optimization Limitations for additional details.
- Single-producer-consumer violations
- Feedback between tasks
- Conditional execution of tasks
- Loops with multiple exit conditions
Finally, the DATAFLOW optimization has no hierarchical implementation. If a sub-function or loop contains additional tasks that might benefit from the DATAFLOW optimization, you must apply the optimization to that loop or sub-function, or inline the sub-function.
Syntax
set_directive_dataflow <location> -disable_start_propagation
<location> is the location (in the format function[/label]) at which dataflow optimization is to be performed.
-disable_start_propagation disables the creation of a start FIFO used to propagate a start token to an internal process. Such FIFOs can sometimes be a performance bottleneck.
Examples
Specifies dataflow optimization within function foo.
set_directive_dataflow foo
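A minimal sketch of a region that meets the coding-style requirements listed above: one producer and one consumer per channel, no feedback, and no conditional execution. The functions produce and consume and the array sizes are hypothetical, and whether the tool builds a ping-pong RAM or a FIFO channel for tmp is left to the tool.

```cpp
// Hypothetical producer: writes every element of tmp exactly once, in order.
static void produce(const int in[8], int tmp[8]) {
    for (int i = 0; i < 8; i++) tmp[i] = in[i] * 2;
}

// Hypothetical consumer: reads every element of tmp exactly once, in order.
static void consume(const int tmp[8], int out[8]) {
    for (int i = 0; i < 8; i++) out[i] = tmp[i] + 1;
}

// Pragma form of "set_directive_dataflow foo": with DATAFLOW, consume can
// start before produce has finished writing all of tmp.
void foo(const int in[8], int out[8]) {
#pragma HLS dataflow
    int tmp[8];
    produce(in, tmp);
    consume(tmp, out);
}
```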
See Also
set_directive_dependence
Description
Vitis HLS detects dependencies within loops: dependencies within the same iteration of a loop are loop-independent dependencies, and dependencies between different iterations of a loop are loop-carried dependencies.
These dependencies affect when operations can be scheduled, especially during function and loop pipelining.
- Loop-independent dependence
- The same element is accessed in a single loop iteration.
for (i=0;i<N;i++) { A[i]=x; y=A[i]; }
- Loop-carried dependence
- The same element is accessed from a different loop iteration.
for (i=1;i<N;i++) { A[i]=A[i-1]*2; }
Under certain circumstances, such as variable-dependent array indexing or when an external requirement needs to be enforced (for example, two inputs are never the same index), the dependence analysis might be too conservative and fail to filter out false dependencies. The set_directive_dependence command allows you to explicitly define the dependencies and eliminate a false dependence, as described in Managing Pipeline Dependencies.
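A sketch of the "two inputs are never the same index" case, using a hypothetical scatter_add function. Because the tool generally cannot prove that idx[] holds distinct values, it may assume a loop-carried dependence on A; the pragma states that no such inter-iteration dependence exists. The key=value spelling (type=, dependent=) follows recent Vitis HLS releases and may differ in yours, and the pragma is only correct if the caller really guarantees distinct indices.

```cpp
// Hypothetical scatter-accumulate. Assumes the caller guarantees that
// idx[0..15] are distinct and within 0..63.
void scatter_add(int A[64], const int idx[16], const int val[16]) {
    for (int i = 0; i < 16; i++) {
#pragma HLS pipeline II=1
#pragma HLS dependence variable=A type=inter dependent=false
        A[idx[i]] += val[i];  // never touches an element written by another iteration
    }
}
```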
Syntax
set_directive_dependence -dependent <arg> [OPTIONS] <location>
-dependent (true | false)
- This argument should be specified to indicate whether a dependence is true and needs to be enforced, or is false and should be removed. When it is not specified, the tool returns a warning that the value was not specified and assumes a value of false.
<location>
- The location in the code, specified as function[/label], where the dependence is defined.
Options
-class (array | pointer)
- Specifies a class of variables in which the dependence needs clarification. This is mutually exclusive with the -variable option.
-dependent (true | false)
- Specifies whether a dependence needs to be enforced (true) or removed (false).
-direction (RAW | WAR | WAW)
- Note: Relevant only for loop-carried dependencies. Specifies the direction for a dependence:
  - RAW (Read-After-Write - true dependence): The read instruction uses a value written by the write instruction.
  - WAR (Write-After-Read - anti dependence): The read instruction gets a value that is overwritten by the write instruction.
  - WAW (Write-After-Write - output dependence): Two write instructions write to the same location, in a certain order.
-distance <integer>
- Note: Relevant only for loop-carried dependencies where -dependent is set to true. Specifies the inter-iteration distance for array access.
-type (intra | inter)
- Specifies whether the dependence is:
  - Within the same loop iteration (intra), or
  - Between different loop iterations (inter) (default).
-variable <variable>
- Defines a specific variable to apply the dependence directive. Mutually exclusive with the -class option.
IMPORTANT: You cannot specify a dependence for function arguments that are bundled with other arguments in an m_axi interface. This is the default configuration for m_axi interfaces on the function. You also cannot specify a dependence for an element of a struct, unless the struct has been disaggregated.
Examples
Removes the dependence on Var1 within the same iteration of loop_1 in function func.
set_directive_dependence -variable Var1 -type intra \
-dependent false func/loop_1
Defines the dependence on all arrays in loop_2 of function func, informing Vitis HLS that all reads must happen after writes in the same loop iteration.
set_directive_dependence -class array -type intra \
-dependent true -direction RAW func/loop_2
See Also
set_directive_disaggregate
Description
The set_directive_disaggregate command lets you deconstruct a struct variable into its individual elements. The number and type of elements created are determined by the contents of the struct itself.
Syntax
set_directive_disaggregate <location> <variable>
- <location> is the location (in the format function[/label]) where the variable to disaggregate is found.
- <variable> specifies the struct variable name.
Options
This command has no options.
Example 1
The following example disaggregates the struct variable a in function top:
set_directive_disaggregate top a
Example 2
Disaggregated structs can be addressed in your code using standard C/C++ coding style, as shown below. Notice the different methods for accessing the pointer element (a) versus the reference element (c):
struct SS
{
int x[N];
int y[N];
};
int top(SS *a, int b[4][6], SS &c) {...}
set_directive_disaggregate top a
set_directive_interface -mode s_axilite top a->x
set_directive_interface -mode s_axilite top a->y
set_directive_disaggregate top c
set_directive_interface -mode ap_memory top c.x
set_directive_interface -mode ap_memory top c.y
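A sketch of a body for top that reads a through the pointer (->) and writes c through the reference (.), with the DISAGGREGATE pragma as the source-code counterpart of the directives above. N = 4 and the loop body are assumptions for illustration only.

```cpp
constexpr int N = 4;  // assumed array size for illustration

struct SS {
    int x[N];
    int y[N];
};

// Hypothetical body: members of the pointer argument use ->, members of
// the reference argument use '.'.
int top(SS *a, int b[4][6], SS &c) {
#pragma HLS disaggregate variable=a
#pragma HLS disaggregate variable=c
    int sum = 0;
    for (int i = 0; i < N; i++) {
        sum += a->x[i] + a->y[i] + b[0][i];
        c.x[i] = a->x[i];
        c.y[i] = a->y[i];
    }
    return sum;
}
```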
See Also
set_directive_expression_balance
Description
Sometimes C/C++ code is written with a sequence of operations, resulting in a long chain of operations in RTL. With a small clock period, this can increase the latency in the design. By default, the Vitis HLS tool rearranges the operations using associative and commutative properties. As described in Optimizing Logic Expressions, this rearrangement creates a balanced tree that can shorten the chain, potentially reducing latency in the design at the cost of extra hardware.
Expression balancing rearranges operators to construct a balanced tree and reduce latency.
- For integer operations, expression balancing is on by default but may be disabled.
- For floating-point operations, expression balancing is off by default but may be enabled.
The set_directive_expression_balance command allows this expression balancing to be turned off, or on, within a specified scope.
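A small C++ illustration of why balancing is treated differently for the two kinds of operations. Integer addition is associative, so the chained and balanced forms agree for values that do not overflow; floating-point addition is not associative, so reordering can change the result. The function names are hypothetical, and chain_sum carries the pragma form of -off.

```cpp
// Chained form as written: ((a + b) + c) + d. With balancing turned off,
// the tool keeps this chain.
int chain_sum(int a, int b, int c, int d) {
#pragma HLS expression_balance off
    return ((a + b) + c) + d;
}

// Balanced tree the tool may build for integers: (a + b) + (c + d).
int balanced_sum(int a, int b, int c, int d) {
    return (a + b) + (c + d);
}

// Floating-point reordering can change the result.
float left_assoc(float x, float y, float z) { return (x + y) + z; }
float right_assoc(float x, float y, float z) { return x + (y + z); }
```

With x = 1e20f, y = -1e20f, and z = 1.0f, left_assoc returns 1.0f while right_assoc returns 0.0f, because adding 1.0f to -1e20f is lost to rounding.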
Syntax
set_directive_expression_balance [OPTIONS] <location>
<location> is the location (in the format function[/label]) where expression balancing should be disabled or enabled.
Options
-off
- Turns off expression balancing at the specified location.
Examples
Disables expression balancing within function My_Func.
set_directive_expression_balance -off My_Func
Explicitly enables expression balancing in function My_Func2.
set_directive_expression_balance My_Func2
See Also
set_directive_function_instantiate
Description
By default:
- Functions remain as separate hierarchy blocks in the RTL.
- All instances of a function at the same level of hierarchy use the same RTL implementation (block).
The set_directive_function_instantiate command is used to create a unique RTL implementation for each instance of a function, allowing each instance to be optimized.
By default, the following code results in a single RTL implementation of function foo_sub for all three instances.
char foo_sub(char inval, char incr)
{
return inval + incr;
}
void foo(char inval1, char inval2, char inval3,
char *outval1, char *outval2, char * outval3)
{
*outval1 = foo_sub(inval1, 1);
*outval2 = foo_sub(inval2, 2);
*outval3 = foo_sub(inval3, 3);
}
Using the directive as shown in the example section below results in three versions of function foo_sub, each independently optimized for variable incr.
Syntax
set_directive_function_instantiate <location> <variable>
<location> is the location (in the format function[/label]) where the instances of a function are to be made unique.
<variable> specifies which function argument is to be specified as constant.
Options
This command has no options.
Examples
For the example code shown above, the following Tcl command (or pragma placed in function foo_sub) allows each instance of function foo_sub to be independently optimized with respect to input incr.
set_directive_function_instantiate foo_sub incr
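The pragma form mentioned above, placed in foo_sub and applied to the example code, looks like the following sketch. A host compiler ignores the pragma, so the three calls still behave as ordinary C++; in synthesis each call site can get its own implementation specialized for its constant incr.

```cpp
char foo_sub(char inval, char incr) {
#pragma HLS function_instantiate variable=incr
    return inval + incr;
}

void foo(char inval1, char inval2, char inval3,
         char *outval1, char *outval2, char *outval3) {
    *outval1 = foo_sub(inval1, 1);  // instance specialized for incr == 1
    *outval2 = foo_sub(inval2, 2);  // instance specialized for incr == 2
    *outval3 = foo_sub(inval3, 3);  // instance specialized for incr == 3
}
```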
See Also
set_directive_inline
Description
Removes a function as a separate entity in the RTL hierarchy. After inlining, the function is dissolved into the calling function, and no longer appears as a separate level of hierarchy.
In some cases, inlining a function allows operations within the function to be shared and optimized more effectively with the calling function. However, an inlined function cannot be shared or reused, so if the parent function calls the inlined function multiple times, this can increase the area and resource utilization.
By default, inlining is only performed on the next level of function hierarchy.
Syntax
set_directive_inline [OPTIONS] <location>
<location> is the location (in the format function[/label]) where inlining is to be performed.
Options
-off
- By default, Vitis HLS performs inlining of smaller functions in the code. Using the -off option disables inlining for the specified function.
-recursive
- By default, only one level of function inlining is performed, and the functions within the specified function are not inlined. The -recursive option inlines all functions recursively within the specified function hierarchy.
Examples
The following example inlines function func_sub1, but no sub-functions called by func_sub1.
set_directive_inline func_sub1
The following example inlines function func_sub1, recursively down the hierarchy, except function func_sub2:
set_directive_inline -recursive func_sub1
set_directive_inline -off func_sub2
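The same intent expressed with pragmas: INLINE recursive goes in the body of func_sub1, and INLINE off goes in the body of func_sub2 to exclude it. This is a minimal sketch; func_sub3 and all function bodies are hypothetical, added only to show a callee that does get inlined.

```cpp
#include <cassert>

// Kept as a separate RTL block despite the recursive inline in its caller.
int func_sub2(int x) {
#pragma HLS INLINE off
    return x * 3;
}

// Hypothetical helper: dissolved into func_sub1 by the recursive inline.
int func_sub3(int x) {
    return x + 1;
}

// Inlined into its caller, together with its callees (except func_sub2).
int func_sub1(int x) {
#pragma HLS INLINE recursive
    return func_sub2(x) + func_sub3(x);
}
```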
set_directive_interface
Description
Specifies how RTL ports are created from the function description during interface synthesis. For more information, see Defining Interfaces. The ports in the RTL implementation are derived from:
- Any function-level protocol that is specified.
- Function arguments and return.
- Global variables (accessed by the top-level function and defined outside its scope).
Function-level handshakes:
- Control when the function starts operation.
- Indicate when function operation:
- Ends
- Is idle
- Is ready for new inputs
The implementation of a function-level protocol:
- Is controlled by modes ap_ctrl_none, ap_ctrl_hs, or ap_ctrl_chain.
- Requires only the top-level function name.
Each function argument can be specified to have its own I/O protocol (such as valid handshake or acknowledge handshake).
If a global variable is accessed, but all read and write operations are local to the design, the resource is created in the design and no I/O port is needed in the RTL. If, however, the global variable is expected to be an external source or destination, specify its interface in a similar manner as standard function arguments. See the examples below.
Syntax
set_directive_interface [OPTIONS] <location> <port>
<location> is the location (in the format function[/label]) where the function interface or registered output is to be specified.
<port> is the parameter (function argument or global variable) for which the interface has to be synthesized. This is not required when modes ap_ctrl_none or ap_ctrl_hs are used.
Options
-bundle <string>
- By default, the HLS tool groups or bundles function arguments with compatible options into interface ports in the RTL code. All AXI4-Lite (s_axilite) interfaces are bundled into a single AXI4-Lite port whenever possible. Similarly, all function arguments specified as an AXI4 (m_axi) interface are bundled into a single AXI4 port by default.
-clock <string>
- By default, the AXI4-Lite interface clock is the same clock as the system clock. This option is used to specify a separate clock for an AXI4-Lite interface. If the -bundle option is used to group multiple top-level function arguments into a single AXI4-Lite interface, the clock option need only be specified on one of the bundle members.
-depth <int>
- Specifies the maximum number of samples for the test bench to process. This setting indicates the maximum size of the FIFO needed in the verification adapter that Vitis HLS creates for RTL co-simulation. This option is required for pointer interfaces using ap_fifo mode.
-latency <value>
- This option can be used on ap_memory and M_AXI interfaces.
- In an ap_memory interface, the interface option specifies the read latency of the RAM resource driving the interface. By default, a read operation of 1 clock cycle is used. This option allows an external RAM with more than 1 clock cycle of read latency to be modeled.
- In an M_AXI interface, this option specifies the expected latency of the AXI4 interface, allowing the design to initiate a bus request <value> number of cycles (latency) before the read or write is expected. If this figure is too low, the design will be ready too soon and may stall waiting for the bus. If this figure is too high, bus access may be idle waiting on the design to start the access.
-max_read_burst_length <int>
- For use with the M_AXI interface, this option specifies the maximum number of data values read during a burst transfer.
-max_widen_bitwidth <int>
- Specifies the maximum bit width available for the interface when automatically widening the interface.
-max_write_burst_length <int>
- For use with the M_AXI interface, this option specifies the maximum number of data values written during a burst transfer.
-mode (ap_none|ap_vld|ap_ack|ap_hs|ap_ovld|ap_fifo|ap_memory|bram|axis|s_axilite|m_axi|ap_ctrl_none|ap_ctrl_hs|ap_ctrl_chain|ap_stable)
- Following is a summary of how Vitis HLS implements the -mode options.
- ap_none: No protocol. The interface is a data port.
- ap_vld: Implements the data port with an associated valid port to indicate when the data is valid for reading or writing.
- ap_ack: Implements the data port with an associated acknowledge port to acknowledge that the data was read or written.
- ap_hs: Implements the data port with associated valid and acknowledge ports to provide a two-way handshake to indicate when the data is valid for reading and writing and to acknowledge that the data was read or written.
- ap_ovld: Implements the output data port with an associated valid port to indicate when the data is valid for reading or writing.
Note: Vitis HLS implements the input argument or the input half of any read/write arguments with mode ap_none.
- ap_fifo: Implements the port with a standard FIFO interface using data input and output ports with associated active-Low FIFO empty and full ports.
Note: You can only use this interface on read arguments or write arguments. The ap_fifo mode does not support bidirectional read/write arguments.
- ap_memory: Implements array arguments as a standard RAM interface. If you use the RTL design in Vivado IP integrator, the memory interface appears as discrete ports.
- bram: Implements array arguments as a standard RAM interface. If you use the RTL design in Vivado IP integrator, the memory interface appears as a single port.
- axis: Implements all ports as an AXI4-Stream interface.
- s_axilite: Implements all ports as an AXI4-Lite interface. Vitis HLS produces an associated set of C driver files during the Export RTL process.
- m_axi: Implements all ports as an AXI4 interface. You can use the config_interface command to specify either 32-bit (default) or 64-bit address ports and to control any address offset.
- ap_ctrl_none: No block-level I/O protocol.
Note: Using the ap_ctrl_none mode might prevent the design from being verified using the C/C++/RTL co-simulation feature.
- ap_ctrl_hs: Implements a set of block-level control ports to start the design operation and to indicate when the design is idle, done, and ready for new input data.
Note: The ap_ctrl_hs mode is the default block-level I/O protocol.
- ap_ctrl_chain: Implements a set of block-level control ports to start the design operation, continue operation, and indicate when the design is idle, done, and ready for new input data.
- ap_stable: No protocol. The interface is a data port. Vitis HLS assumes the data port is always stable after reset, which allows internal optimizations to remove unnecessary registers.
-name <string>
- Specifies a name for the port which will be used in the generated RTL.
-num_read_outstanding <int>
- For use with the M_AXI interface, this option specifies how many read requests can be made to the AXI4 bus, without a response, before the design stalls. This implies internal storage in the design, and a FIFO of size:
num_read_outstanding*max_read_burst_length*word_size
-num_write_outstanding <int>
- For use with the M_AXI interface, this option specifies how many write requests can be made to the AXI4 bus, without a response, before the design stalls. This implies internal storage in the design, and a FIFO of size:
num_write_outstanding*max_write_burst_length*word_size
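As a worked example of the sizing formula above: with 16 outstanding reads, a maximum burst of 16 beats, and a 32-bit word, the buffer is 16 × 16 × 4 = 1024 bytes. The helper below is a hypothetical sketch that assumes word_size is expressed in bytes; the option values are illustrative only.

```cpp
#include <cassert>
#include <cstddef>

// Storage implied by outstanding m_axi requests:
// outstanding requests * maximum burst length (beats) * word size.
// Assumption: word_size_bytes is the width of one data beat in bytes.
constexpr std::size_t outstanding_buffer_bytes(std::size_t num_outstanding,
                                               std::size_t max_burst_length,
                                               std::size_t word_size_bytes) {
    return num_outstanding * max_burst_length * word_size_bytes;
}

// 16 outstanding requests, bursts of up to 16 beats, 32-bit (4-byte) words.
static_assert(outstanding_buffer_bytes(16, 16, 4) == 1024, "16 * 16 * 4 bytes");
```

The same arithmetic applies to the write side, using num_write_outstanding and max_write_burst_length.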
-offset <string>
- Controls the address offset in AXI4-Lite (s_axilite) and AXI4 memory mapped (m_axi) interfaces for the specified port.
- In an s_axilite interface, <string> specifies the address in the register map.
- In an m_axi interface, this option overrides the global option specified by the config_interface -m_axi_offset option, and <string> is specified as:
  - off: Do not generate an offset port.
  - direct: Generate a scalar input offset port.
  - slave: Generate an offset port and automatically map it to an AXI4-Lite slave interface. This is the default offset.
-register
- Registers the signal and any associated protocol signals
and instructs the signals to persist until at least the last cycle of the
function execution. This option applies to the following scalar interfaces
for the top-level function:
- s_axilite
- ap_fifo
- ap_none
- ap_stable
- ap_hs
- ap_ack
- ap_vld
- ap_ovld
-register_mode (both|forward|reverse|off)
- This option applies to AXI4-Stream interfaces, and specifies if registers are placed on the forward path (TDATA and TVALID), the reverse path (TREADY), on both paths (TDATA, TVALID, and TREADY), or if none of the port signals are to be registered (off). The default is both. AXI4-Stream side-channel signals are considered to be data signals and are registered whenever the TDATA is registered.
-storage_impl=<impl>
- For use with s_axilite only. This option defines a storage implementation to assign to the interface.
-storage_type=<type>
- For use with ap_memory and bram interfaces only. This option defines a storage type (for example, RAM_T2P) to assign to the variable.
Examples
Turns off function-level handshakes for function func.
set_directive_interface -mode ap_ctrl_none func
Argument InData in function func is specified to have an ap_vld interface, and the input is registered.
set_directive_interface -mode ap_vld -register func InData
Exposes global variable lookup_table used in function func as a port on the RTL design, with an ap_memory interface.
set_directive_interface -mode ap_memory func lookup_table
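For comparison, the three examples can also be written as INTERFACE pragmas inside func. This sketch assumes the mode=<mode> port=<name> pragma form; the extra index and OutData arguments, the table size, and the function body are hypothetical. The block-level protocol is attached to port=return.

```cpp
#include <cassert>

// Hypothetical global table exposed as an ap_memory port on the RTL design.
int lookup_table[16];

void func(int InData, int index, int *OutData) {
// No block-level handshake (start/idle/done/ready).
#pragma HLS INTERFACE mode=ap_ctrl_none port=return
// InData gets a valid port and is registered.
#pragma HLS INTERFACE mode=ap_vld port=InData register
// The global array becomes an external RAM-style port.
#pragma HLS INTERFACE mode=ap_memory port=lookup_table
    // Mask keeps the index inside the 16-entry table.
    *OutData = InData + lookup_table[index & 15];
}
```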
set_directive_latency
Description
Specifies a maximum or minimum latency value, or both, on a function, loop, or region.
Vitis HLS always aims for minimum latency. The behavior of the tool when minimum and maximum latency values are specified is as follows:
- Latency is less than the minimum: If Vitis HLS can achieve less than the minimum specified latency, it extends the latency to the specified value, potentially enabling increased sharing.
- Latency is greater than the minimum: The constraint is satisfied. No further optimizations are performed.
- Latency is less than the maximum: The constraint is satisfied. No further optimizations are performed.
- Latency is greater than the maximum: If Vitis HLS cannot schedule within the maximum limit, it increases effort to achieve the specified constraint. If it still fails to meet the maximum latency, it issues a warning. Vitis HLS then produces a design with the smallest achievable latency.
Syntax
set_directive_latency [OPTIONS] <location>
<location> is the location (function, loop, or region), in the format function[/label], to be constrained.
Options
-max <integer>
- Specifies the maximum latency.
-min <integer>
- Specifies the minimum latency.
Examples
Function foo is specified to have a minimum latency of 4 and a maximum latency of 8.
set_directive_latency -min 4 -max 8 foo
In function foo, loop loop_row is specified to have a maximum latency of 12. Place the pragma in the loop body.
set_directive_latency -max 12 foo/loop_row
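The two examples as LATENCY pragmas: the function-level constraint goes at the top of foo, and the loop-level constraint goes in the body of loop_row. A minimal sketch; the array argument, its dimensions, and the summation body are hypothetical.

```cpp
#include <cassert>

int foo(int data[4][8]) {
// Function latency: at least 4, at most 8 cycles.
#pragma HLS LATENCY min=4 max=8
    int total = 0;
loop_row:
    for (int r = 0; r < 4; r++) {
// Loop latency: at most 12 cycles.
#pragma HLS LATENCY max=12
        for (int c = 0; c < 8; c++) {
            total += data[r][c];
        }
    }
    return total;
}
```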
set_directive_loop_flatten
Description
Flattens nested loops into a single loop hierarchy.
In the RTL implementation, it costs a clock cycle to move between loops in the loop hierarchy. Flattening nested loops allows them to be optimized as a single loop. This saves clock cycles, potentially allowing for greater optimization of the loop body logic.
Loop nests fall into three categories. Perfect and semi-perfect loop nests can be flattened:
- Perfect loop nests
  - Only the innermost loop has loop body content.
  - There is no logic specified between the loop statements.
  - All loop bounds are constant.
- Semi-perfect loop nests
  - Only the innermost loop has loop body content.
  - There is no logic specified between the loop statements.
  - The outermost loop bound can be a variable.
- Imperfect loop nests
  When the inner loop has variable bounds (or the loop body is not exclusively inside the inner loop), try to restructure the code, or unroll the loops in the loop body to create a perfect loop nest.
Syntax
set_directive_loop_flatten [OPTIONS] <location>
<location> is the location (inner-most loop), in the format function[/label].
Options
-off
- Prevents loop flattening from taking place. This can be used to prevent some loops from being flattened while all others in the specified location are flattened.
IMPORTANT: The presence of the LOOP_FLATTEN pragma or directive enables the optimization. The addition of -off disables it.
Examples
Flattens loop_1
in function
foo
and all (perfect or semi-perfect)
loops above it in the loop hierarchy, into a single loop. Place the pragma in the
body of loop_1
.
set_directive_loop_flatten foo/loop_1
#pragma HLS loop_flatten
Prevents loop flattening in loop_2
of function foo
.
Place the pragma in the body of loop_2
.
set_directive_loop_flatten -off foo/loop_2
#pragma HLS loop_flatten off
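The first example in context: a perfect loop nest (constant bounds, body only in the innermost loop, no logic between the loop statements) with the pragma in the body of loop_1. A minimal sketch; the outer label loop_outer, the array shapes, and the body are hypothetical.

```cpp
#include <cassert>

void foo(int in[4][5], int out[4][5]) {
loop_outer:
    for (int i = 0; i < 4; i++) {
loop_1:
        for (int j = 0; j < 5; j++) {
// Flatten loop_1 with the (perfect) loops above it into one 20-iteration loop.
#pragma HLS LOOP_FLATTEN
            out[i][j] = in[i][j] * 2;
        }
    }
}
```

Flattening removes the cycle spent moving between the outer and inner loop on every outer iteration.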
set_directive_loop_merge
Description
Merges all loops into a single loop. Merging loops:
- Reduces the number of clock cycles required in the RTL to transition between the loop-body implementations.
- Allows the loops to be implemented in parallel (if possible).
The rules for loop merging are:
- If the loop bounds are variables, they must have the same value (number of iterations).
- If loops bounds are constants, the maximum constant value is used as the bound of the merged loop.
- Loops with both variable bound and constant bound cannot be merged.
- The code between loops to be merged cannot have side effects. Multiple executions of this code should generate the same results: a=b is allowed, but a=a+1 is not.
- Loops cannot be merged when they contain FIFO reads. Merging changes the order of the reads. Reads from a FIFO or FIFO interface must always be in sequence.
Syntax
set_directive_loop_merge [OPTIONS] <location>
<location> is the location (in the format function[/label]) at which the loops reside.
Options
-force
- Forces loops to be merged even when Vitis HLS issues a warning. You must ensure that the merged loop will function correctly.
Examples
Merges all consecutive loops in function foo
into a single loop.
set_directive_loop_merge foo
All loops inside loop_2
of function
foo
(but not loop_2
itself) are merged by using the -force
option.
set_directive_loop_merge -force foo/loop_2
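A sketch of the first example in pragma form: two consecutive loops with the same constant bound, and a LOOP_MERGE pragma in the enclosing function scope. No code sits between the loops, so the side-effect rule is trivially met. The labels loop_a and loop_b, the arrays, and the bodies are hypothetical.

```cpp
#include <cassert>

void foo(const int a[8], int b[8], int c[8]) {
// Merge the consecutive loops in this scope into a single loop.
#pragma HLS LOOP_MERGE
loop_a:
    for (int i = 0; i < 8; i++) b[i] = a[i] + 1;
loop_b:
    for (int i = 0; i < 8; i++) c[i] = a[i] * 2;
}
```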
set_directive_loop_tripcount
Description
The loop tripcount is the total number of iterations performed by a loop. Vitis HLS reports the total latency of each loop (the number of cycles to execute all iterations of the loop). This loop latency is therefore a function of the tripcount (number of loop iterations).
The tripcount can be a constant value. It might depend on the value
of variables used in the loop expression (for example, x<y
) or control statements used inside the loop.
Vitis HLS cannot determine the tripcount in some cases. These cases include, for example, those in which the variables used to determine the tripcount are:
- Input arguments, or
- Variables calculated by dynamic operation
In the following example, the maximum iteration of the for-loop is
determined by the value of input num_samples
. The
value of num_samples
is not defined in the C
function, but comes into the function from the outside.
void foo (int num_samples, ...) {
int i;
...
loop_1: for(i=0;i< num_samples;i++) {
...
result = a + b;
}
}
In cases where the loop latency is unknown or cannot be calculated,
set_directive_loop_tripcount
allows you to
specify minimum, maximum, and average iterations for a loop. This lets the tool
analyze how the loop latency contributes to the total design latency in the reports
and helps you determine appropriate optimizations for the design.
Syntax
set_directive_loop_tripcount [OPTIONS] <location>
<location> is the location of the loop (in the format function[/label]) at which the tripcount is specified.
Options
-avg <integer>
- Specifies the average number of iterations.
-max <integer>
- Specifies the maximum number of iterations.
-min <integer>
- Specifies the minimum number of iterations.
Examples
loop_1 in function foo is specified to have a minimum tripcount of 12, a maximum tripcount of 16, and an average tripcount of 14:
set_directive_loop_tripcount -min 12 -max 16 -avg 14 foo/loop_1
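The num_samples loop from the description, made self-contained, with the pragma form of the example placed in the loop body. This sketch assumes fixed-size arrays of 16 elements (hypothetical) and checks that bound in software; LOOP_TRIPCOUNT only feeds the latency estimates in the reports and does not change the synthesized loop.

```cpp
#include <cassert>

int foo(int num_samples, const int a[16], const int b[16]) {
    // The tool cannot derive the tripcount: num_samples comes from outside.
    assert(num_samples >= 0 && num_samples <= 16);
    int result = 0;
loop_1:
    for (int i = 0; i < num_samples; i++) {
#pragma HLS LOOP_TRIPCOUNT min=12 max=16 avg=14
        result += a[i] + b[i];
    }
    return result;
}
```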
set_directive_occurrence
Description
When pipelining functions or loops, the OCCURRENCE directive specifies that the code in a location is executed at a lower rate than the surrounding function or loop. This allows the code that is executed at the lower rate to be pipelined at a slower rate, and potentially shared within the top-level pipeline. For example:
- A loop iterates N times.
- Part of the loop is protected by a conditional statement and only executes M times, where N is an integer multiple of M.
- The code protected by the conditional is said to have an occurrence of N/M.
Identifying a region with an OCCURRENCE rate allows the functions and loops in this region to be pipelined with an initiation interval that is slower than the enclosing function or loop.
Syntax
set_directive_occurrence [OPTIONS] <location>
<location>
specifies the location with a slower rate of execution.
Options
-cycle <int>
- Specifies the occurrence N/M where:
- N is the number of times the enclosing function or loop is executed.
- M is the number of times the conditional region is executed.
IMPORTANT: N must be an integer multiple of M.
Examples
Region Cond_Region
in function
foo
has an occurrence of 4. It executes at a
rate four times slower than the code that encompasses it.
set_directive_occurrence -cycle 4 foo/Cond_Region
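A sketch of the N/M relationship: a pipelined loop runs N = 16 times, and the labeled region Cond_Region runs only when i % 4 == 3, that is M = 4 times, so its occurrence is 16/4 = 4. The loop bound, the condition, and the accumulation are hypothetical; only the pragma placement mirrors the example.

```cpp
#include <cassert>

int foo(const int in[16]) {
    int acc = 0, slow = 0;
loop_main:
    for (int i = 0; i < 16; i++) {
#pragma HLS PIPELINE II=1
        acc += in[i];
        if (i % 4 == 3) {
        Cond_Region: {
// This region executes at one quarter of the loop rate.
#pragma HLS OCCURRENCE cycle=4
                slow += acc;
            }
        }
    }
    return slow;
}
```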
set_directive_pipeline
Description
Reduces the initiation interval (II) for a function or loop by
allowing the concurrent execution of operations as described in Function and Loop Pipelining. A pipelined function or loop can
process new inputs every N clock cycles, where N is the initiation interval (II
). An II of 1 processes a new input every clock
cycle.
By default, with the PIPELINE pragma or directive, Vitis HLS generates the minimum II for the design according to the specified clock period constraint. Unless the -II option is specified, the emphasis is on meeting timing rather than on achieving a particular II.
The default type of pipeline is defined by the config_compile -pipeline_style
command, but can be
overridden in the PIPELINE pragma or directive.
If Vitis HLS cannot create a design with the specified II, it:
- Issues a warning.
- Creates a design with the lowest possible II.
You can then analyze this design with the warning messages to determine what steps must be taken to create a design that satisfies the required initiation interval.
Syntax
set_directive_pipeline [OPTIONS] <location>
Where:
<location> is the location (in the format function[/label]) to be pipelined.
Options
-II <integer>
- Specifies the desired initiation interval for the pipeline. Vitis HLS tries to meet this request. Based on data dependencies, the actual result might have a larger II.
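-off
- Turns off pipeline for a specific loop or function. This can be used when config_compile -pipeline_loops is used to globally pipeline loops.
-rewind
- Enables rewinding, or continuous loop pipelining with no pause between one loop iteration ending and the next iteration starting.
Note: Applicable only to a loop.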
-style <stp | frp | flp>
- Specifies the type of pipeline to use for the specified
function or loop. For more information on pipeline styles refer to Flushing Pipelines. The types of pipelines include:
stp
- Stall pipeline. Runs only when input data is available; otherwise, it stalls. This is the default setting, and is the type of pipeline used by Vitis HLS for both loop and function pipelining. Use this when a flushable pipeline is not required, for example, when there are no performance or deadlock issues due to stalls.
flp
- This option defines the pipeline as a flushable pipeline as described in Flushing Pipelines. This type of pipeline typically consumes more resources and/or can have a larger II because resources cannot be shared among pipeline iterations.
frp
- Free-running, flushable pipeline. Runs even when input data is not available. Use this when you need better timing due to reduced pipeline control signal fanout, or when you need improved performance to avoid deadlocks. However, this pipeline style can consume more power as the pipeline registers are clocked even if there is no data.
IMPORTANT: This is a hint not a hard constraint. The tool checks design conditions for enabling pipelining. Some loops might not conform to a particular style and the tool reverts to the default style (stp
) if necessary.
Examples
Function func is pipelined with an initiation interval of 1.
set_directive_pipeline -II 1 func
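The pragma equivalent places PIPELINE II=1 at the top of the function body; the same pragma in a loop body pipelines that loop instead. A minimal sketch; the function bodies, the scale function, the loop label loop_scale, and the array size are hypothetical.

```cpp
#include <cassert>

// Function pipelining: a new call can start every clock cycle.
int func(int x) {
#pragma HLS PIPELINE II=1
    return x * x + 1;
}

// Loop pipelining: a new iteration can start every clock cycle.
void scale(const int in[32], int out[32], int k) {
loop_scale:
    for (int i = 0; i < 32; i++) {
#pragma HLS PIPELINE II=1
        out[i] = in[i] * k;
    }
}
```

If data dependencies prevent II=1, the tool reports a warning and uses the lowest II it can achieve, as described above.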
set_directive_protocol
Description
This command specifies a region of code, a protocol region, in which no clock operations will be inserted by Vitis HLS unless explicitly specified in the code. Vitis HLS will not insert any clocks between operations in the region, including those which read from or write to function arguments. The order of reads and writes will therefore be strictly followed in the synthesized RTL.
A protocol region is created by enclosing a section of code in braces and labeling it, as in this region named io_section:
io_section:{
...
lines of code
...
}
A clock operation can be explicitly specified in C code using an ap_wait()
statement, and can be specified in C++ code
by using the wait()
statement. The ap_wait
and wait
statements have no effect on the simulation of the design.
Syntax
set_directive_protocol [OPTIONS] <location>
The <location> specifies the location (in the format function[/label]) at which the protocol region is defined.
Options
-mode [floating | fixed]
- floating: Lets code statements outside the protocol region overlap and execute in parallel with statements in the protocol region in the final RTL. The protocol region remains cycle accurate, but outside operations can occur at the same time. This is the default mode.
- fixed: Ensures that statements outside the protocol region do not execute in parallel with the protocol region.
Examples
The following directive defines the region io_section in function foo as a fixed-mode protocol region:
set_directive_protocol -mode fixed foo/io_section
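A sketch of the same region in source form. It assumes the positional "fixed" spelling of the PROTOCOL pragma, which may differ between tool versions, and omits ap_wait() so that it builds with a standard compiler. The argument names and the two writes are hypothetical.

```cpp
#include <cassert>

void foo(int *out_a, int *out_b, int in) {
io_section: {
// No clocks are inserted between the statements of this region.
#pragma HLS PROTOCOL fixed
        *out_a = in;      // first write
        *out_b = in + 1;  // second write, kept after the first in the RTL
    }
}
```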
set_directive_reset
Description
Adds or removes resets for specific state variables (global or static). The reset
port is used to restore the registers and block RAM, connected to the port, to an
initial value any time the reset signal is applied. The presence and behavior of the
RTL reset port is controlled using the config_rtl
settings.
Greater control over reset is provided through the RESET pragma. If a variable is static or global, the RESET pragma is used to explicitly add a reset, or the variable can be removed from the reset by turning off the pragma. This can be particularly useful when static or global arrays are present in the design.
Syntax
set_directive_reset [OPTIONS] <location> <variable>
<location> is the location (in the format function[/label]) at which the variable is defined.
<variable> is the variable to which the directive is applied.
Options
-off
- If -off is specified, reset is not generated for the specified variable.
Examples
Adds reset to variable a
in function
foo
even when the global reset setting is
none
or control
.
set_directive_reset foo a
Removes reset from variable static int
a
in function foo
even when
the global reset setting is state
or
all
.
set_directive_reset -off foo a
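Both cases in pragma form, inside one function: a reset is added for one static variable and removed from another. This is a minimal sketch; the variable names count and history, the array size, and the function body are hypothetical.

```cpp
#include <cassert>

int foo(int in) {
    // Reset added regardless of the global config_rtl reset setting.
    static int count = 0;
#pragma HLS RESET variable=count
    // Reset removed, for example to avoid resetting a whole array.
    static int history[4] = {0, 0, 0, 0};
#pragma HLS RESET variable=history off
    history[count & 3] = in;
    count++;
    return count;
}
```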
set_directive_stable
Description
The STABLE pragma is applied to arguments of a DATAFLOW or PIPELINE region and is used to indicate that an input or output of this region can be ignored when generating the synchronizations at entry and exit of the DATAFLOW region. For inputs, this means that the processes that read the argument (in the task-level pipeline of a DATAFLOW region), or the read accesses to it (in the fine-grain pipeline of a PIPELINE region), do not need to be part of the first stage of the pipeline. Likewise for outputs, the writing processes or write accesses do not need to be part of the last stage of the pipeline.
The pragma can be specified at any point in the hierarchy, on a scalar or an array, and automatically applies to all the DATAFLOW or PIPELINE regions below that point. The effect of STABLE for an input is that a DATAFLOW or PIPELINE region can start another iteration even though the value of the previous iteration has not been read yet. For an output, this implies that a write of the next iteration can occur although the previous iteration is not done.
Syntax
set_directive_stable <location> <variable>
<location> is the function name or loop name where the directive is to be constrained.
<variable> is the name of the variable (scalar or array) to be constrained.
Examples
In the following example, without the STABLE directive, proc1
and proc2
would
be synchronized to acknowledge the reading of their inputs (including A
). With the directive, A
is no longer considered as an input that needs synchronization.
void dataflow_region(int A[...], int B[...], ...) {
proc1(...);
proc2(A, ...);
}
The directives for this example would be scripted as:
set_directive_stable dataflow_region A
set_directive_dataflow dataflow_region
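A self-contained version of the dataflow_region example with both pragmas. The bodies of proc1 and proc2, the array sizes, and the intermediate array tmp are hypothetical; only A is marked STABLE, so it is excluded from the entry synchronization of the region.

```cpp
#include <cassert>

// Hypothetical producer: does not touch A.
static void proc1(const int B[8], int tmp[8]) {
    for (int i = 0; i < 8; i++) tmp[i] = B[i] + 1;
}

// Hypothetical consumer: reads the stable input A.
static void proc2(const int A[8], const int tmp[8], int C[8]) {
    for (int i = 0; i < 8; i++) C[i] = A[i] * tmp[i];
}

void dataflow_region(int A[8], int B[8], int C[8]) {
#pragma HLS STABLE variable=A
#pragma HLS DATAFLOW
    int tmp[8];
    proc1(B, tmp);
    proc2(A, tmp, C);
}
```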
set_directive_stream
Description
By default, array variables are implemented as RAM:
- Top-level function array parameters are implemented as a RAM interface port.
- General arrays are implemented as RAMs for read-write access.
- Arrays involved in sub-functions, or loop-based DATAFLOW optimizations are implemented as a RAM ping-pong buffer channel.
If the data stored in the array is consumed or produced in a sequential manner, a
more efficient communication mechanism is to use streaming data, where FIFOs are
used instead of RAMs. When an argument of the top-level function is specified as
INTERFACE type ap_fifo
, the array is automatically
implemented as streaming. See Defining Interfaces for
more information.
To preserve the sequence of accesses, it might be necessary to use the volatile qualifier as described in Type Qualifiers.
Syntax
set_directive_stream [OPTIONS] <location> <variable>
<location> is the location (in the format function[/label]) which contains the array variable.
<variable> is the array variable to be implemented as a FIFO.
Options
-depth <integer>
- Note: Relevant only for array streaming in dataflow channels.
By default, the depth of the FIFO implemented in the RTL is the same size as the array specified in the C code. This option allows you to modify the size of the FIFO.
When the array is implemented in a DATAFLOW region, it is common to use the -depth option to reduce the size of the FIFO. For example, in a DATAFLOW region where all loops and functions are processing data at a rate of II = 1, there is no need for a large FIFO because data is produced and consumed in each clock cycle. In this case, the -depth option may be used to reduce the FIFO size to 2 to substantially reduce the area of the RTL design.
This same functionality is provided for all arrays in a DATAFLOW region using the config_dataflow command with the -depth option. The -depth option used with set_directive_stream overrides the default specified using config_dataflow.
-type <arg>
- Specifies the channel type: FIFO, PIPO, synchronized shared (shared), or un-synchronized shared (unsync). The supported types include:
- fifo: A FIFO buffer with the specified depth.
- pipo: A regular Ping-Pong buffer, with depth but without a duplication of the array data. Consistency can be ensured by setting the depth small enough, which acts as the distance of synchronization between the producer and consumer.
- shared: Specifies that an array local variable or argument in a given scope is viewed as a single shared memory, distributing the available ports to the processes that access it.
TIP: The default depth for shared is 1.
- unsync: Does not have any synchronization except for individual memory reads and writes. Consistency (read-write and write-write order) must be ensured by the design itself.
Examples
Specifies array A[10]
in function
func
to be streaming and implemented as a
FIFO.
set_directive_stream -type fifo func A
Array B in named loop loop_1
of
function func
is set to streaming with a FIFO depth
of 12. In this case, place the pragma inside loop_1
.
set_directive_stream -depth 12 -type fifo func/loop_1 B
Array C has streaming implemented as a PIPO.
set_directive_stream -type pipo func C
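A sketch of the DATAFLOW case discussed under -depth: the intermediate array buf is written and read in order by two stages, so it can be implemented as a shallow FIFO instead of a full-size buffer. The function names produce and consume, the array size, and the processing are hypothetical.

```cpp
#include <cassert>

// Writes buf sequentially, one element per iteration.
static void produce(const int in[16], int buf[16]) {
    for (int i = 0; i < 16; i++) buf[i] = in[i] * 2;
}

// Reads buf in the same order it was written.
static void consume(const int buf[16], int out[16]) {
    for (int i = 0; i < 16; i++) out[i] = buf[i] + 1;
}

void func(const int in[16], int out[16]) {
#pragma HLS DATAFLOW
    int buf[16];
// Sequential access allows a 2-deep FIFO instead of a 16-entry buffer.
#pragma HLS STREAM variable=buf type=fifo depth=2
    produce(in, buf);
    consume(buf, out);
}
```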
set_directive_top
Description
Attaches a name to a function, which can then be used by the set_top
command to set the named function as the top.
This is typically used to synthesize member functions of a class in C++.
After specifying the directive, use the set_top command with the new name.
Syntax
set_directive_top [OPTIONS] <location>
<location>
is the function to be renamed.
Options
-name <string>
- Specifies the name of the function to be used by the
set_top
command.
Examples
Function foo_long_name is renamed to DESIGN_TOP, which is then specified as the top-level. If the pragma is placed in the code, the set_top command must still be issued, or the top-level must be specified in the GUI project settings.
set_directive_top -name DESIGN_TOP foo_long_name
Followed by the set_top DESIGN_TOP
command.
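The pragma form of the example, placed in the function body. It only attaches the name DESIGN_TOP; set_top DESIGN_TOP (or the GUI top-level setting) is still required. The function body is hypothetical.

```cpp
#include <cassert>

int foo_long_name(int a, int b) {
// Makes this function selectable as top-level under the name DESIGN_TOP.
#pragma HLS TOP name=DESIGN_TOP
    return a * b + 1;
}
```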
set_directive_unroll
Description
Transforms loops by creating multiple copies of the loop body.
A loop is executed for the number of iterations specified by the loop induction variable. The number of iterations may also be impacted by logic inside the loop body (for example, break or modifications to any loop exit variable). The loop is implemented in the RTL by a block of logic representing the loop body, which is executed for the same number of iterations.
The set_directive_unroll command allows the loop to be fully unrolled, creating as many copies of the loop body in the RTL as there are loop iterations, or partially unrolled by a factor N, creating N copies of the loop body and adjusting the loop iteration count accordingly.
If the original loop iteration count is not an integer multiple of the factor N used for partial unrolling, the original exit condition must be checked after each unrolled fragment of the loop body.
To unroll a loop completely, the loop bounds must be known at compile time. This is not required for partial unrolling.
Syntax
set_directive_unroll [OPTIONS] <location>
<location> is the location of the loop (in the format function[/label]) to be unrolled.
Options
-factor <integer>
- Specifies a non-zero integer indicating that partial unrolling is requested. The loop body is repeated this number of times, and the iteration information is adjusted accordingly.
-skip_exit_check
- Effective only if a factor is specified (partial unrolling).
- Fixed bounds: No exit condition check is performed if the iteration count is a multiple of the factor. If the iteration count is not an integer multiple of the factor, the tool:
  - Prevents unrolling.
  - Issues a warning that the exit check must be performed to proceed.
- Variable bounds: The exit condition check is removed. You must ensure that:
  - The variable bound is an integer multiple of the factor.
  - No exit check is in fact required.
Examples
Unrolls loop L1
in function
foo
. Place the pragma in the body of loop L1
.
set_directive_unroll foo/L1
Specifies an unroll factor of 4 on loop L2
of function foo
. Removes
the exit check. Place the pragma in the body of loop L2
.
set_directive_unroll -skip_exit_check -factor 4 foo/L2
Unrolls all loops inside loop L3 in function foo, but not loop L3 itself. The -region option specifies that the location be considered an enclosing region and not a loop label.
set_directive_unroll -region foo/L3
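The first two examples in pragma form. A full unroll needs a compile-time bound, while a partial unroll with skip_exit_check on a variable bound is only correct if the bound is a multiple of the factor; the assert records that precondition in the software model. The function names sum_L2 and scale_L1, the array sizes, and the loop bodies are hypothetical.

```cpp
#include <cassert>

// Partial unroll by 4 without exit checks between the unrolled copies.
int sum_L2(const int data[64], int n) {
    assert(n % 4 == 0 && n <= 64);  // required when the exit check is skipped
    int sum = 0;
L2:
    for (int i = 0; i < n; i++) {
#pragma HLS UNROLL factor=4 skip_exit_check
        sum += data[i];
    }
    return sum;
}

// Full unroll: the bound 8 is known at compile time.
void scale_L1(int v[8]) {
L1:
    for (int i = 0; i < 8; i++) {
#pragma HLS UNROLL
        v[i] *= 3;
    }
}
```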