Synthesizing the Code

To synthesize the active solution of the project, select the Run C Synthesis command in the Flow Navigator, or select the command on the toolbar menu.

Note: When your project has multiple solutions as described in Creating Additional Solutions, you can Run C Synthesis on the active solution, all solutions, or selected solutions using the Solution > Run C Synthesis from the main menu.

The C/C++ source code is synthesized into an RTL implementation. During the synthesis process messages are transcripted to the console window, and to the vitis_hls.log file.

INFO: [HLS 200-1470] Pipelining result : Target II = 1, Final II = 4, Depth = 6.
INFO: [SCHED 204-11] Finished scheduling.
INFO: [HLS 200-111]  Elapsed time: 19.38 seconds; current allocated memory: 397.747 MB.
INFO: [BIND 205-100] Starting micro-architecture generation ...
INFO: [BIND 205-101] Performing variable lifetime analysis.
INFO: [BIND 205-101] Exploring resource sharing.
INFO: [BIND 205-101] Binding ...
INFO: [BIND 205-100] Finished micro-architecture generation.
INFO: [HLS 200-111]  Elapsed time: 0.57 seconds; current allocated memory: 400.218 MB.
INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [HLS 200-10] -- Generating RTL for module 'dct' 

Within the Vitis HLS IDE, some messages contain links to additional information. The links are highlighted in blue underlined text, and open help messages, source code files, or documents with additional information in some cases. Clicking the messages provides more details on why the message was issued and possible resolutions.

When synthesis completes, the Simplified Synthesis report for the top-level function opens automatically in the information pane as shown in the following figure.

Figure 1: Synthesis Summary Report

You can quickly review the performance metrics displayed in the Simplified Synthesis report to determine if the design meets your requirements. The synthesis report contains information on the following performance metrics:

Issue Type
Shows any issues with the results.
Latency
Number of clock cycles required for the function to compute all output values.
Initiation interval (II)
Number of clock cycles before the function can accept new input data.
Loop iteration latency
Number of clock cycles it takes to complete one iteration of the loop.
Loop iteration interval
Number of clock cycles before the next iteration of the loop starts to process data.
Loop latency
Number of cycles to execute all iterations of the loop.
Resource Utilization
Amount of hardware resources required to implement the design based on the resources available in the FPGA, including look-up tables (LUT), registers, block RAMs, and DSP blocks.

If you specified the Run C Synthesis command on multiple solutions, the Console view reports the synthesis transcript for each of the solutions as they are synthesized. After synthesis has completed, instead of the Simplified Synthesis report, Vitis HLS displays a Report Comparison to compare the synthesis results for all of the synthesized solutions. A portion of this report is shown below.

Figure 2: Report Comparison

Synthesis Summary

When synthesis completes, Vitis HLS generates a Synthesis Summary report for the top-level function that opens automatically in the information pane.

The specific sections of the Synthesis Summary are detailed below.

TIP: Clicking the header line for any of the sections causes the branch to collapse or expand in the report window.

General Information

Provides information on when the report was generated, the version of the software used, the project name, the solution name and target flow, and the technology details.

Figure 3: Synthesis Summary Report

Timing Estimate

Displays a quick estimate of the timing specified by the solution, as explained in Specifying the Clock Frequency. This includes the Target clock period specified, and the period of Uncertainty. The clock period minus the uncertainty results in the Estimated clock period.

TIP: These values are only estimates provided by the user in the solution settings. More accurate estimates can be reported by selecting the Run RTL Synthesis command or Run RTL Place and Route from the Flow Navigator, as explained in Exporting the RTL Design.

Performance & Resource Estimates

The Performance Estimate columns report the latency and initiation interval for the top-level function and any sub-blocks instantiated in the top-level. Each sub-function called at this level in the C/C++ source is an instance in the generated RTL block, unless the sub-function was in-lined into the top-level function using the INLINE pragma or directive, or automatically in-lined.

The Slack column displays any timing issues in the implementation.

The Latency column displays the number of cycles it takes to produce the output, and is also displayed in time (ns). The Initiation Interval is the number of clock cycles before new inputs can be applied. In the absence of any PIPELINE directives, the latency is one cycle less than the initiation interval (the next input is read after the final output is written).

TIP: When latency is displayed as a "?" it means that Vitis HLS cannot determine the number of loop iterations. If the latency or throughput of the design is dependent on a loop with a variable index, Vitis HLS reports the latency of the loop as being unknown. In this case, use the LOOP_TRIPCOUNT pragma or directive to manually specify the number of loop iterations. The LOOP_TRIPCOUNT value is only used to ensure the generated reports show meaningful ranges for latency and interval and does not impact the results of synthesis.

The Iteration Latency is the latency of a single iteration for a loop. The Trip Count column displays the number of iterations a specific loop makes in the implemented hardware. This reflects any unrolling of the loop in hardware.

The Resource Estimate columns of the report indicates the estimated resources needed to implement the software function in the RTL code. Estimates of the BRAM, DSP, FFs, and LUTs are provided.

HW Interfaces

The HW Interfaces section of the synthesis report provides tables for the different hardware interfaces generated during synthesis. The type of hardware interfaces generated by the tool depends on the flow target specified by the solution, as well as any INTERFACE pragmas or directives applied to the code. In the following image, the solution targets the Vitis Kernel flow, and therefore generates AXI interfaces as required.

Figure 4: HW Interfaces

The following should be observed when reviewing these tables:

  • Separate tables are provided for the different interfaces.
  • Columns are provided to display different properties of the interface. For the M_AXI interface, these include the Data Width and Max Widen Bitwidth columns which indicate whether Automatic Port Width Resizing has occurred, and to what extent. In the example above, you can see that the port was widened to 512 bits from the 16 bits specified in the software.
  • The Latency column displays the latency of the interface:
    • In an ap_memory interface, the column displays the read latency of the RAM resource driving the interface.
    • For an m_axi interface, the column displays the expected latency of the AXI4 interface, allowing the design to initiate a bus request a number of cycles (latency) before the read or write is expected.
  • The Bundle column displays any specified bundle names from the INTERFACE pragma or directive.
  • Additional columns display burst and read and write properties of the M_AXI interface as described in set_directive_interface.

SW I/O Information

Highlights how the function arguments from the C/C++ source is associated with the port names in the generated RTL code. Additional details of the software and hardware ports are provided as shown below. Notice that the SW argument is expanded into multiple HW interfaces. For example, the input argument is related to three HW interfaces, the m_axi for data, and the s_axi_lite for required control signals.

Figure 5: SW I/O Information

M_AXI Burst Information

In the M_AXI Burst Information section the Burst Summary table reports the successful burst transfers, with a link to the associated source code. The reported burst length refers to either max_read_burst_length or max_write_burst_length and represents the number of data values read/written during a burst transfer. For example, in a case where the input type is integer (32 bits), and HLS auto-widens the interface to 512 bits, each burst transfers 1024 integers. Because the widened interface can carry 16 integers at a time, the result is 64 beat bursts. The Burst Missed table reports why a particular burst transfer was missed with a link to Guidance messages related to the burst failures to help with resolution.

Figure 6: M_AXI Burst Information

Bind Op and Bind Storage Reports

The Bind Op and Bind Storage reports are added to the Synthesis Summary report. Both reports can help you understand choices made by Vitis HLS when it maps operations to resources. The tool will map operations to the right resources with the right latency. You can influence this process by using the BIND_OP pragma or directive, and requesting a particular resource mapping and latency. The Bind Op report will show which of the mappings were automatically done versus those enforced by the use of a pragma. Similarly, the Bind Storage report shows the mappings of arrays to memory resources on the platform like BRAM/LUTRAM/URAM.

The Bind Op Report displays the implementation details of the kernel or IP. The hierarchy of the top-level function is displayed and variables are listed with any HLS pragmas or directives applied, the operation defined, the implementation used by the HLS tool, and any applied latency.

This report is useful for examining the programmable logic implementation details specified by the RTL design.

Figure 7: Synthesis Summary

As shown above, the Bind OP report highlights certain important characteristics in your design. Currently, it calls out the number of DSPs used in the design and shows in a hierarchy where these DSPs are used in the design. The table also highlights whether the particular resource allocation was done because of a user-specified pragma and if so, a "yes" entry will be present in the Pragma column. If no entry exists in the Pragma column, it means that the resource was auto inferred by the tool. The table also shows the RTL names of the resources allocated for each module in the user's design and you can hierarchy descend down the hierarchy to see the various resources.

It does not show all the inferred resources but instead shows resources of interest such as arithmetic, floating-point, and DSPs. The particular implementation choice of fabric (implemented using LUTs) or DSP is also shown. Finally, the latency of the resource is also shown. This is helpful in understand and increasing the latency of resources if needed to add pipeline stages to the design. This is extremely useful when attempting to break a long combinational path when trying to solve timing issues during implementation.

Each resource allocation is correlated to the source code line where the corresponding op was inferred from and the user can right-click on the resource and select the "Goto Source" option to see this correlation. Finally, the second table below the Bind Op report illustrates any global config settings that can also alter the resource allocation algorithm used by the tool. In the above example, the implementation choice for a dadd (double precision floating point addition) operation has been fixed to a fulldsp implementation. Similarly, the latency of a ddiv operation has been fixed to 2.

Similar to the BIND_OP pragma, the BIND_STORAGE pragma can be used to select a particular memory type (such as single port or dual port) and/or a particular memory implementation (such as BRAM/LUTRAM/URAM/SRL, etc.) and a latency value. The Bind Storage report highlights the storage mappings used in the design. Currently, it calls out the number of BRAMs and URAMs used in the design. The table also highlights whether the particular storage resource allocation was done because of a user-specified pragma and if so, a "yes" entry will be present in the Pragma column. If no entry exists in the Pragma column, then this means that the storage resource was auto inferred by the tool. The particular storage type, as well as the implementation choice, are also shown along with the variable name and latency.

Using this information, you can review the storage resource allocation in the design and make design choices by altering the eventual storage implementation depending upon availability. Finally, a second table below the Bind Storage report will be shown if there are any global config settings that can also alter the storage resource allocation algorithm used by the tool.

Output of C Synthesis

When synthesis completes, the syn folder is created inside the solution folder. This folder contains the following elements:

  • The verilog and vhdl folders contain the output RTL files.
    • The top-level file has the same name as the top-level function for synthesis.
    • There is one RTL file created for each sub-function that has not been inlined into a higher level function.
    • There could be additional RTL files to implement sub-blocks of the RTL hierarchy, such as block RAM, and pipelined multipliers.
  • The report folder contains a report file for the top-level function and one for every sub-function that has not been in-lined into a higher level function by Vitis HLS. The report for the top-level function provides details on the entire design.
IMPORTANT: You should not use the RTL files generated in the syn/verilog or syn/vhdl folder for synthesis in the Vivado tool. You should instead use the packaged output files for use with the Vitis application acceleration development flow, or the Vivado Design Suite as described in Exporting the RTL Design. In cases where Vitis HLS uses Xilinx IP in the generated RTL code, such as with floating point designs, the verilog and vhdl folders contain a script to create that IP during RTL synthesis by the Xilinx tools. If you use the files in the syn/verilog or syn/vhdl folder directly for RTL synthesis, you must also correctly use any script files present in those folders. If the packaged output is used, this process is performed automatically by the Xilinx tools.

Improving Synthesis Runtime and Capacity

Vitis HLS schedules operations hierarchically. The operations within a loop are scheduled, then the loop, the sub-functions and operations with a function are scheduled. Runtime for Vitis HLS increases when:

  • There are more objects to schedule.
  • There is more freedom and more possibilities to explore.

Vitis HLS schedules objects. Whether the object is a floating-point multiply operation or a single register, it is still an object to be scheduled. The floating-point multiply may take multiple cycles to complete and use many resources to implement but at the level of scheduling it is still one object.

Unrolling loops and partitioning arrays creates more objects to schedule and potentially increases the runtime. Inlining functions creates more objects to schedule at this level of hierarchy and also increases runtime. These optimizations may be required to meet performance but be very careful about simply partitioning all arrays, unrolling all loops and inlining all functions: you can expect a runtime increase. Use the optimization strategies provided earlier and judiciously apply these optimizations.

If the loops must be unrolled, or if the use of the PIPELINE directive in the hierarchy above has automatically unrolled the loops, consider capturing the loop body as a separate function. This will capture all the logic into one function instead of creating multiple copies of the logic when the loop is unrolled: one set of objects in a defined hierarchy will be scheduled faster. Remember to pipeline this function if the unrolled loop is used in pipelined region.

The degrees of freedom in the code can also impact runtime. Consider Vitis HLS to be an expert designer who by default is given the task of finding the design with the highest throughput, lowest latency and minimum area. The more constrained Vitis HLS is, the fewer options it has to explore and the faster it will run. Consider using latency constraints over scopes within the code: loops, functions or regions. Setting a LATENCY directive with the same minimum and maximum values reduces the possible optimization searches within that scope.