C/RTL Co-Simulation in Vitis HLS
If you added a C test bench to the project for simulation purposes, you can also use it for C/RTL co-simulation to verify that the RTL is functionally identical to the C source code. Run C/RTL co-simulation to verify the RTL results of synthesis. The Co-simulation dialog box allows you to select which type of RTL output to use for verification (Verilog or VHDL) and which HDL simulator to use for the simulation.
The dialog box features the following settings:
- Simulator
- Choose from one of the supported HDL simulators in the Vivado Design Suite. Vivado simulator is the default simulator.
- Language
- Specify the use of Verilog or VHDL as the output language for simulation.
- Setup Only
- Create the required simulation files, but do not run the simulation. The simulation executable can be run from a command shell at a later time.
- Optimizing Compile
- Enable optimization to improve the runtime performance, if possible, at the expense of compilation time.
- Input Arguments
- Specify any command-line arguments to the C test bench.
- Dump Trace
- Specifies the level of trace file output written to the sim/verilog or sim/vhdl directory of the current solution when the simulation executes. Options include:
- all
- Output all port and signal waveform data to the trace file.
- port
- Output waveform trace data for the top-level ports only.
- none
- Do not output trace data.
- Random Stall
- Applies a randomized stall for each data transmission.
- Compiled Library Location
- Specifies the directory for the compiled simulation library to use with third-party simulators.
- Extra Options for DATAFLOW
- Wave Debug
- Enables waveform visualization of all processes in the RTL simulation. This option is only supported when using the Vivado logic simulator. Enabling it launches the simulator GUI so you can examine dataflow activity in the waveforms generated by simulation. Refer to the Vivado Design Suite User Guide: Logic Simulation (UG900) for more information on that tool.
- Disable Deadlock Detection
- Disables deadlock detection and prevents the Cosim Deadlock Viewer from opening during co-simulation.
- Channel (PIPO/FIFO) Profiling
- Enables capturing profile data for display in the Dataflow Viewer.
- Dynamic Deadlock Prevention
- Prevent deadlocks by enabling automatic FIFO channel size tuning for dataflow profiling during co-simulation.
After the C/RTL co-simulation completes, the console displays the following messages to confirm the verification was successful:
INFO: [Common 17-206] Exiting xsim ...
INFO: [COSIM 212-316] Starting C post checking ...
...
Test passed !
INFO: [COSIM 212-1000] *** C/RTL co-simulation finished: PASS ***
Finished C/RTL cosimulation.
Any printf commands in the C test bench are also echoed to the console during simulation.
As described in Writing a Test Bench, the test bench verifies output from the top-level function for synthesis, and returns zero to the main() function of the test bench if the output is correct. Vitis HLS uses the same return value for both C simulation and C/RTL co-simulation to determine if the results are correct. If the C test bench returns a non-zero value, Vitis HLS reports that the simulation failed.
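The return-value convention can be illustrated with a minimal self-checking test bench sketch. The function `scale_by_two`, the array size, and the helper names are hypothetical stand-ins, not part of any Vitis HLS example design. The body is written as `run_test_bench()` so that it can be called from `main()`, which would simply return its result.

```cpp
#include <cassert>
#include <cstdio>

constexpr int kSize = 8;  // hypothetical data set size

// Hypothetical top-level function for synthesis (stand-in for a real DUT).
void scale_by_two(const int in[kSize], int out[kSize]) {
    for (int i = 0; i < kSize; ++i) {
        out[i] = in[i] * 2;
    }
}

// Counts elements that differ between the DUT output and the golden data.
int count_mismatches(const int* actual, const int* expected, int n) {
    assert(n >= 0);
    int errors = 0;
    for (int i = 0; i < n; ++i) {
        if (actual[i] != expected[i]) {
            std::printf("Mismatch at %d: got %d, expected %d\n",
                        i, actual[i], expected[i]);
            ++errors;
        }
    }
    return errors;
}

// Body of a self-checking test bench. A real test bench would return this
// value from main(): 0 means pass, any non-zero value means fail.
int run_test_bench() {
    int in[kSize];
    int out[kSize];
    int expected[kSize];
    for (int i = 0; i < kSize; ++i) {
        in[i] = i;
        expected[i] = i + i;  // golden result computed independently of the DUT
    }

    scale_by_two(in, out);  // call the DUT once (one transaction)

    const int errors = count_mismatches(out, expected, kSize);
    std::puts(errors == 0 ? "Test passed !" : "Test failed !");
    return errors == 0 ? 0 : 1;
}
```

In an actual project, `main()` would contain only `return run_test_bench();`, so the same pass or fail status is reported in both C simulation and C/RTL co-simulation.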
The Vitis HLS GUI automatically switches to the Analysis perspective after simulation and opens the Cosimulation Report showing the pass or fail status and the measured statistics on latency and II. Any additional reports that are generated, such as the Dataflow report, are also opened in the Analysis perspective.
The Cosimulation Report displays the full design hierarchy, and if Channel (PIPO/FIFO) Profiling is enabled, you will be able to see details of the dataflow regions as well.
Output of C/RTL Co-Simulation
A sim folder is created inside the solution folder. This folder contains the following elements:
- The sim/report folder contains the report and log file for each type of RTL simulated.
- A verification folder named sim/verilog or vhdl is created for each RTL language that is verified.
  - The RTL files used for simulation are stored in the verilog or vhdl folder.
  - The RTL simulation is executed in the verification folder.
  - Any outputs, such as trace files and waveform files, are written to the verilog or vhdl folder.
- Additional folders sim/autowrap, tv, wrap, and wrap_pc are work folders used by Vitis HLS. There are no user files in these folders.
Automatically Verifying the RTL
C/RTL co-simulation uses a C test bench, running the main() function, to automatically verify the RTL design running in behavioral simulation. The C/RTL verification process consists of three phases:
- The C simulation is executed and the inputs to the top-level function, or the Design-Under-Test (DUT), are saved as “input vectors.”
- The “input vectors” are used in an RTL simulation using the RTL created by Vitis HLS in Vivado simulator, or a supported third-party HDL simulator. The outputs from the RTL, or results of simulation, are saved as “output vectors.”
- The “output vectors” from the RTL simulation are returned to the main() function of the C test bench to verify the results are correct. The C test bench performs verification of the results, in some cases by comparing to known good results.
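The three phases can be pictured with a purely conceptual C++ sketch. Everything here is an illustrative assumption: `dut_c` stands in for the C function, `dut_rtl_model` stands in for the HDL simulator running the generated RTL (in the real flow this is not a C++ function), and the vector formats do not reflect the files Vitis HLS actually writes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical DUT: the C function that would be synthesized.
int dut_c(int x) { return 3 * x + 1; }

// Hypothetical stand-in for the RTL simulation of the same function.
int dut_rtl_model(int x) { return (x << 1) + x + 1; }

// Phase 1: run the test bench stimulus and record each DUT input.
std::vector<int> capture_input_vectors() {
    std::vector<int> input_vectors;
    for (int x = 0; x < 16; ++x) {
        input_vectors.push_back(x);
    }
    return input_vectors;
}

// Phase 2: replay the recorded inputs through the RTL and record its outputs.
std::vector<int> simulate_rtl(const std::vector<int>& input_vectors) {
    std::vector<int> output_vectors;
    output_vectors.reserve(input_vectors.size());
    for (int x : input_vectors) {
        output_vectors.push_back(dut_rtl_model(x));
    }
    return output_vectors;
}

// Phase 3: hand the RTL outputs back to the test bench checks.
// Returns 0 when every output matches the known good result, 1 otherwise.
int check_output_vectors(const std::vector<int>& input_vectors,
                         const std::vector<int>& output_vectors) {
    assert(input_vectors.size() == output_vectors.size());
    for (std::size_t i = 0; i < input_vectors.size(); ++i) {
        if (output_vectors[i] != dut_c(input_vectors[i])) {
            return 1;
        }
    }
    return 0;
}
```

The point of the sketch is only the data flow: inputs are captured once, replayed through a second implementation, and the results are judged by the same checking code that the C simulation uses.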
The following messages are output by Vitis HLS as verification progresses:
While running C simulation:
INFO: [COSIM 212-14] Instrumenting C test bench ...
Build using ".../bin/g++"
Compiling dct_test.cpp_pre.cpp.tb.cpp
Compiling dct_inline.cpp_pre.cpp.tb.cpp
Compiling apatb_dct.cpp
Generating cosim.tv.exe
INFO: [COSIM 212-302] Starting C TB testing ...
Test passed !
At this stage, because the C simulation was executed, any messages written by the C test bench will be output to the Console window and log file.
While running RTL simulation:
INFO: [COSIM 212-333] Generating C post check test bench ...
INFO: [COSIM 212-12] Generating RTL test bench ...
INFO: [COSIM 212-1] *** C/RTL co-simulation file generation completed. ***
INFO: [COSIM 212-323] Starting verilog/vhdl simulation.
INFO: [COSIM 212-15] Starting XSIM ...
At this stage, any messages from the RTL simulation are output to the Console window or log file.
While checking results back in the C test bench:
INFO: [COSIM 212-316] Starting C post checking ...
Test passed !
INFO: [COSIM 212-1000] *** C/RTL co-simulation finished: PASS ***
The following are requirements of C/RTL co-simulation:
- The test bench must be self-checking as described in Writing a Test Bench, and must return a value of 0 if the test passes or a non-zero value if the test fails.
- Any third-party simulators must be available in the search path to be launched by Vitis HLS.
- Interface Synthesis Requirements must be met.
- Any arrays or structs on the design interface cannot use the optimization directives listed in Unsupported Optimizations for Co-Simulation.
- IP simulation libraries must be compiled for use with third-party simulators as described in Simulating IP Cores.
Interface Synthesis Requirements
To use the C/RTL co-simulation feature to verify the RTL design, at least one of the following conditions must be true:
- Top-level function must be synthesized using an ap_ctrl_chain or ap_ctrl_hs block-level protocol
- Design must be purely combinational
- Top-level function must have an initiation interval of 1
- Interfaces must be all arrays that are streaming and implemented with axis or ap_hs interface modes
Note: The hls::stream variables are automatically implemented as ap_fifo interfaces.
If none of these conditions is met, C/RTL co-simulation halts with the following message:
@E [SIM-345] Cosim only supports the following 'ap_ctrl_none' designs: (1)
combinational designs; (2) pipelined design with task interval of 1; (3) designs with
array streaming or hls_stream ports.
@E [SIM-4] *** C/RTL co-simulation finished: FAIL ***
If the design uses the ap_ctrl_none block-level protocol and contains any hls::stream variables that employ non-blocking behavior, C/RTL co-simulation is not guaranteed to complete.
If any top-level function argument is specified as an AXI4-Lite interface, the function return must also be specified as an AXI4-Lite interface.
Verification of DATAFLOW and DEPENDENCE
C/RTL co-simulation automatically verifies aspects of the DATAFLOW and DEPENDENCE directives.
If the DATAFLOW directive is used to pipeline tasks, it inserts channels between the tasks to facilitate the flow of data between them. It is typical for the channels to be implemented with FIFOs and the FIFO depth specified using the STREAM directive, or the config_dataflow command. If a FIFO depth is too small, the RTL simulation can stall. For example, if a FIFO is specified with a depth of 2 but the producer task writes three values before any data values are read by the consumer task, the FIFO blocks the producer. In some conditions this can cause the entire design to stall as described in Cosim Deadlock Viewer.
In this case, C/RTL co-simulation issues a message as shown below, indicating the channel in the DATAFLOW region is causing the RTL simulation to stall.
//////////////////////////////////////////////////////////////////////////////
// ERROR!!! DEADLOCK DETECTED at 1292000 ns! SIMULATION WILL BE STOPPED! //
//////////////////////////////////////////////////////////////////////////////
/////////////////////////
// Dependence cycle 1:
// (1): Process: hls_fft_1kxburst.fft_rank_rad2_nr_man_9_U0
// Channel: hls_fft_1kxburst.stage_chan_in1_0_V_s_U, FULL
// Channel: hls_fft_1kxburst.stage_chan_in1_1_V_s_U, FULL
// Channel: hls_fft_1kxburst.stage_chan_in1_0_V_1_U, FULL
// Channel: hls_fft_1kxburst.stage_chan_in1_1_V_1_U, FULL
// (2): Process: hls_fft_1kxburst.fft_rank_rad2_nr_man_6_U0
// Channel: hls_fft_1kxburst.stage_chan_in1_2_V_s_U, EMPTY
// Channel: hls_fft_1kxburst.stage_chan_in1_2_V_1_U, EMPTY
/////////////////////////////////
// Total 1 cycles detected!
/////////////////////////////////////////////////////////////
If co-simulation is attempted from the Vitis HLS IDE and the simulation results in a deadlock, the Vitis HLS IDE will automatically launch the Dataflow Viewer and show the processes involved in the deadlock (displayed in red). It will also show which channels are full (in red) versus empty (in white). In this case, review the implementation of the channels between the tasks and ensure any FIFOs are large enough to hold the data being generated.
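The depth-of-2 scenario described earlier in this section can be reproduced with a simplified software model of a fixed-depth FIFO. This is an illustrative sketch only: `BoundedFifo` and `writes_accepted_before_read` are hypothetical names, and the model reports a refused write instead of stalling the way the RTL channel would.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Simplified software model of a fixed-depth FIFO channel (not the RTL).
class BoundedFifo {
public:
    explicit BoundedFifo(std::size_t depth) : depth_(depth) {
        assert(depth > 0);
    }

    bool full() const { return data_.size() == depth_; }
    bool empty() const { return data_.empty(); }

    // Returns false (instead of blocking) when the FIFO is full.
    bool try_write(int value) {
        if (full()) {
            return false;
        }
        data_.push_back(value);
        return true;
    }

    // Returns false (instead of blocking) when the FIFO is empty.
    bool try_read(int& value) {
        if (empty()) {
            return false;
        }
        value = data_.front();
        data_.pop_front();
        return true;
    }

private:
    std::size_t depth_;
    std::deque<int> data_;
};

// Counts how many of `n` back-to-back writes succeed before any read occurs.
std::size_t writes_accepted_before_read(std::size_t depth, std::size_t n) {
    BoundedFifo fifo(depth);
    std::size_t accepted = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (!fifo.try_write(static_cast<int>(i))) {
            break;  // in RTL, the producer would stall at this point
        }
        ++accepted;
    }
    return accepted;
}
```

With a depth of 2 and three writes, only two writes are accepted, which corresponds to the producer being blocked on the third value. Raising the depth to 3 lets all three writes complete.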
In a similar manner, the RTL test bench is also configured to automatically check the validity of false dependencies specified using the DEPENDENCE directive. A warning message during co-simulation indicates the dependency is not false, and the corresponding directive must be removed to achieve a functionally valid design.
The -disable_deadlock_detection option of the cosim_design command disables these checks.
Unsupported Optimizations for Co-Simulation
For Vivado IP mode, automatic RTL verification does not support cases where multiple transformations are performed on arrays on the interface, or arrays within structs.
For automatic verification to be performed, arrays on the function interface, or arrays inside structs on the function interface, can use any one of the following optimizations, but not two or more:
- Vertical mapping on arrays of the same size
- Reshape
- Partition, for dimension 1 of the array
Automatic RTL verification does not support any of the following optimizations used on a top-level function interface:
- Horizontal mapping.
- Vertical mapping of arrays of different sizes.
- Conditional access on the AXI4-Stream with register slice enabled.
- Mapping arrays to streams.
Simulating IP Cores
When the design is implemented with floating-point cores, bit-accurate models of the floating-point cores must be made available to the RTL simulator. This is automatically accomplished if the RTL simulation is performed using the Vivado logic simulator. However, for supported third-party HDL simulators, the Xilinx floating-point library must be pre-compiled and added to the simulator libraries.
For example, to compile the Xilinx floating-point library in Verilog for use with the VCS simulator, open the Vivado IDE and enter the following command in the Tcl Console window:
compile_simlib -simulator vcs_mx -family all -language verilog
This creates the floating-point library in the current directory for VCS. See the Vivado Tcl Console window for the directory name. In this example, it is ./rev3_1.
You must refer to this library from within the Vitis HLS IDE by specifying the Compiled Library Location field in the Co-simulation dialog box as described in C/RTL Co-Simulation in Vitis HLS, or by running C/RTL co-simulation using the following command:
cosim_design -tool vcs -compiled_library_dir <path_to_library>/rev3_1
Analyzing RTL Simulations
When the C/RTL co-simulation completes, the simulation report opens and shows the measured latency and II. These results may differ from values reported after HLS synthesis, which are based on the absolute shortest and longest paths through the design. The results provided after C/RTL co-simulation show the actual values of latency and II for the given simulation data set (and may change if different input stimuli are used).
In non-pipelined designs, C/RTL co-simulation measures latency between ap_start and ap_done signals. The II is 1 more than the latency, because the design reads new inputs 1 cycle after all operations are complete. The design only starts the next transaction after the current transaction is complete.
In pipelined designs, the design might read new inputs before the first transaction completes, and there might be multiple ap_start and ap_ready signals before a transaction completes. In this case, C/RTL co-simulation measures the latency as the number of cycles between data input values and data output values. The II is the number of cycles between ap_ready signals, which the design uses to request new inputs.
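The arithmetic behind these two measurements can be sketched as follows. The function names and the idea of passing in cycle numbers are assumptions made for illustration; the exact clock edge on which the tool samples each signal is not specified here, so treat the values as relative cycle counts.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Latency of one transaction in a non-pipelined design: the number of
// cycles between the ap_start and ap_done events.
long latency_cycles(long ap_start_cycle, long ap_done_cycle) {
    assert(ap_done_cycle >= ap_start_cycle);
    return ap_done_cycle - ap_start_cycle;
}

// In a non-pipelined design the II is one more than the latency, because
// new inputs are read one cycle after all operations are complete.
long non_pipelined_ii(long latency) {
    return latency + 1;
}

// In a pipelined design the II is the spacing between consecutive ap_ready
// events. Returns one interval per pair of neighboring events.
std::vector<long> pipelined_ii(const std::vector<long>& ap_ready_cycles) {
    std::vector<long> intervals;
    for (std::size_t i = 1; i < ap_ready_cycles.size(); ++i) {
        assert(ap_ready_cycles[i] >= ap_ready_cycles[i - 1]);
        intervals.push_back(ap_ready_cycles[i] - ap_ready_cycles[i - 1]);
    }
    return intervals;
}
```

For example, a non-pipelined transaction that starts at cycle 10 and finishes at cycle 30 has a latency of 20 and an II of 21, while a pipelined design asserting ap_ready at cycles 5, 9, and 13 has an II of 4 for both intervals.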
Viewing Simulation Waveforms
To view waveform data during RTL co-simulation, you must enable the following in the Co-simulation Dialog box:
- Select Vivado XSIM as the RTL simulator.
- Enable Dump Trace with either the port or all options.
The Vivado simulator GUI opens and displays all the processes in the RTL design. Visualizing the active processes within the HLS design allows detailed profiling of process activity and duration within each activation of the top module. The visualization helps you to analyze individual process performance, as well as the overall concurrent execution of independent processes. Processes dominating the overall execution have the highest potential to improve performance, provided process execution time can be reduced.
This visualization is divided into two sections:
- HLS process summary contains a hierarchical representation of the activity report for all processes.
- DUT name
- <name>
- Function
- <function name>
- Dataflow analysis provides detailed activity information about the tasks inside the dataflow region.
- DUT name
- <name>
- Function
- <function name>
- Dataflow/Pipeline Activity
- Shows the number of parallel executions of the function when implemented as a dataflow process.
- Active Iterations
- Shows the currently active iterations of the dataflow. The number of rows is dynamically incremented to accommodate for the visualization of any concurrent execution.
- StallNoContinue
- A stall signal that tells if there were any output stalls experienced by the dataflow processes (the function is done, but it has not received a continue from the adjacent dataflow process).
- RTL Signals
- The underlying RTL control signals that interpret the transaction view of the dataflow process.
After C/RTL co-simulation completes, you can reopen the RTL waveforms in the Vivado IDE by clicking the Open Wave Viewer toolbar button or by selecting the corresponding menu command.
Cosim Deadlock Viewer
A deadlock is a situation in which processes inside a DATAFLOW region share the same channels, effectively preventing each other from writing to or reading from them, resulting in both processes getting stuck. This scenario is common when there are either FIFOs or a mix of PIPOs and FIFOs as channels inside the DATAFLOW region.
The deadlock viewer visualizes this deadlock scenario on the static dataflow viewer. It highlights the problematic processes and channels. The viewer also provides a cross-probing capability to link between the problematic dataflow channels and the associated source code. You can use this information to resolve the issue with less time and effort. The viewer opens automatically only after co-simulation detects the deadlock and the co-simulation run has finished.
A small example is shown below. The dataflow region consists of two processes that communicate through a PIPO (data_array) and a FIFO (data_channel). The first loop in proc_1 writes 10 data items to data_channel before writing anything to data_array. Because the FIFO depth is insufficient, this loop does not complete, which blocks the rest of the process. proc_2 then blocks because it cannot read data from data_array (it is empty), and therefore cannot remove data from data_channel. This creates a deadlock that requires increasing the size of data_channel to at least 10.
void example(hls::stream<data_t>& A, hls::stream<data_t>& B){
#pragma HLS dataflow
    ..
    ..
    hls::stream<int> data_channel;   // FIFO channel between proc_1 and proc_2
    int data_array[10];              // PIPO channel between proc_1 and proc_2
#pragma HLS STREAM variable=data_channel depth=8 dim=1
    proc_1(A, data_channel, data_array);
    proc_2(B, data_channel, data_array);
}
void proc_1(hls::stream<data_t>& A, hls::stream<int>& data_channel, int data_array[10]){
    ..
    // First loop: writes 10 items to a FIFO that holds only 8.
    for(i = 0; i < 10; i++){
        tmp = A.read();
        tmp.data = tmp.data.to_int();
        data_channel.write(tmp.data);
    }
    // Second loop: fills the array only after the first loop completes.
    for(i = 0; i < 10; i++){
        data_array[i] = i + tmp.data.to_int();
    }
}
void proc_2(hls::stream<data_t>& B, hls::stream<int>& data_channel, int data_array[10]){
    int i;
    ..
    ..
    for(i = 0; i < 10; i++){
        if (i == 0){
            // Needs data_array, which proc_1 has not produced yet.
            tmp.data = data_channel.read() + data_array[5];
        }
        else {
            tmp.data = data_channel.read();
        }
        B.write(tmp);
    }
}
///////////////////////////////////////////////////////////////////////////////////
// Inter-Transaction Progress: Completed Transaction / Total Transaction
// Intra-Transaction Progress: Measured Latency / Latency Estimation * 100%
//
// RTL Simulation : "Inter-Transaction Progress" ["Intra-Transaction Progress"] @ "Simulation Time"
////////////////////////////////////////////////////////////////////////////////////
// RTL Simulation : 0 / 1 [0.00%] @ "105000"
//////////////////////////////////////////////////////////////////////////////
// ERROR!!! DEADLOCK DETECTED at 132000 ns! SIMULATION WILL BE STOPPED! //
//////////////////////////////////////////////////////////////////////////////
/////////////////////////
// Dependence cycle 1:
// (1): Process: example_example.proc_1_U0
// Channel: example_example.data_channel1_U, FULL
// (2): Process: example_example.proc_2_U0
// Channel: example_example.data_array_U, EMPTY
////////////////////////////////////////////////////////////////////////
// Totally 1 cycles detected!
////////////////////////////////////////////////////////////////////////
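The reason a depth of at least 10 is needed can be checked with a small software model of the two processes. This is a sketch under simplifying assumptions, not a reproduction of the RTL: the function `deadlocks` is hypothetical, each loop iteration stands for one step in which either process may advance, and the PIPO is modeled as a flag that is set only when the producer has finished.

```cpp
#include <cassert>

// Returns true if the two-process model deadlocks for the given FIFO depth.
// `items` is the number of values the producer writes before it finishes.
bool deadlocks(int fifo_depth, int items) {
    assert(fifo_depth > 0 && items > 0);
    int in_fifo = 0;           // current FIFO occupancy
    int written = 0;           // values the producer has written so far
    int read = 0;              // values the consumer has read so far
    bool array_ready = false;  // PIPO is handed over once the producer is done

    while (read < items) {
        bool progress = false;

        // Producer: writes every value to the FIFO before it can finish.
        if (written < items) {
            if (in_fifo < fifo_depth) {
                ++in_fifo;
                ++written;
                progress = true;
            }
        } else if (!array_ready) {
            array_ready = true;  // producer done, array released to consumer
            progress = true;
        }

        // Consumer: cannot start draining the FIFO until the array arrives.
        if (array_ready && in_fifo > 0) {
            --in_fifo;
            ++read;
            progress = true;
        }

        if (!progress) {
            return true;  // producer sees FULL, consumer sees EMPTY
        }
    }
    return false;
}
```

In this model `deadlocks(8, 10)` and `deadlocks(9, 10)` both report a deadlock, while `deadlocks(10, 10)` completes, which matches the requirement that the FIFO hold all 10 values written before the consumer can begin reading.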
Debugging C/RTL Co-Simulation
When C/RTL co-simulation completes, Vitis HLS typically indicates that the simulations passed and the functionality of the RTL design matches the initial C code. When the C/RTL co-simulation fails, Vitis HLS issues the following message:
@E [SIM-4] *** C/RTL co-simulation finished: FAIL ***
Following are the primary reasons for a C/RTL co-simulation failure:
- Incorrect environment setup
- Unsupported or incorrectly applied optimization directives
- Issues with the C test bench or the C source code
To debug a C/RTL co-simulation failure, run the checks described in the following sections. If you are unable to resolve the C/RTL co-simulation failure, see Xilinx Support for support resources, such as answers, documentation, downloads, and forums.
Setting Up the Environment
Check the environment setup as shown in the following table.
Questions | Actions to Take |
---|---|
Are you using a third-party simulator? | Ensure the path to the simulator executable is specified in the system search path. When using the Vivado simulator, you do not need to specify a search path. Ensure that you have compiled the simulation libraries as discussed in Simulating IP Cores. |
Are you running Linux? | Ensure that your setup files (for example, .cshrc or .bashrc) do not have a change directory command. When C/RTL co-simulation starts, it spawns a new shell process. If there is a cd command in your setup files, it causes the shell to run in a different location and eventually C/RTL co-simulation fails. |
Optimization Directives
Check the optimization directives as shown in the following table.
Questions | Actions to Take |
---|---|
Are you using the DEPENDENCE directive? | Remove the DEPENDENCE directives from the design to see if C/RTL co-simulation passes. If co-simulation passes, it likely indicates that the TRUE or FALSE setting for the DEPENDENCE directive is incorrect as discussed in Verification of DATAFLOW and DEPENDENCE. |
Does the design use volatile pointers on the top-level interface? | Ensure the DEPTH option is specified on the INTERFACE directive. |
Are you using FIFOs with the DATAFLOW optimization? | Check to see if C/RTL co-simulation passes with the standard ping-pong buffers. Check to see if C/RTL co-simulation passes without specifying the size for the FIFO channels. This ensures that the channel defaults to the size of the array in the C code. Reduce the size of the FIFO channels until C/RTL co-simulation stalls. Stalling indicates a channel size that is too small. Review your design to determine the optimal size for the FIFOs. You can use the STREAM directive to specify the size of individual FIFOs. |
Are you using supported interfaces? | Ensure you are using supported interface modes. For details, see Interface Synthesis Requirements. |
Are you applying multiple optimization directives to arrays on the interface? | Ensure you are using optimizations that are designed to work together. For details, see Unsupported Optimizations for Co-Simulation. |
Are you using arrays on the interface that are mapped to streams? | To use interface-level streaming (the top-level function of the DUT), use hls::stream. |
C Test Bench and C Source Code
Check the C test bench and C source code as shown in the following table.
Questions | Actions to Take |
---|---|
Does the C test bench check the results and return the value 0 (zero) if the results are correct? | Ensure the C test bench returns the value 0 for C/RTL co-simulation. Even if the results are correct, the C/RTL co-simulation feature reports a failure if the C test bench fails to return the value 0. |
Is the C test bench creating input data based on a random number? | Change the test bench to use a fixed seed for any random number generation. If the seed for random number generation is based on a variable, such as a time-based seed, the data used for simulation is different each time the test bench is executed, and the results can vary. |
Are you using pointers on the top-level interface that are accessed multiple times? | Use a volatile pointer for any pointer that is accessed multiple times within a single transaction (one execution of the C function). If you do not use a volatile pointer, everything except the first read and last write is optimized out to adhere to the C standard. |
Does the C code contain undefined values or perform out-of-bounds array accesses? | Confirm all arrays are correctly sized to match all accesses. Loop bounds that exceed the size of the array are a common source of issues (for example, N accesses for an array sized at N-1). Confirm that the results of the C simulation are as expected and that output values were not assigned random data values. Consider using an industry-standard memory-checking tool to look for these issues. It is possible for a C function to execute and complete even if some variables are undefined or are out-of-bounds. In the C simulation, undefined values are assigned a random number. In the RTL simulation, undefined values are assigned an unknown or X value. |
Are you using floating-point math operations in the design? | Check that the C test bench results are within an acceptable error range instead of performing an exact comparison. For some of the floating point math operations, the RTL implementation is not identical to the C. For details, see Verification and Math Functions. Ensure that the RTL simulation models for the floating-point cores are provided to the third-party simulator. For details, see Simulating IP Cores. |
Are you using Xilinx IP blocks and a third-party simulator? | Ensure that the path to the Xilinx IP simulation models is provided to the third-party simulator. |
Are you using the hls::stream construct in a design that changes the data rate (for example, decimation or interpolation)? | Analyze the design and use the STREAM directive to increase the size of the FIFOs used to implement the hls::stream channels. |
Are you using very large data sets in the simulation? | The C/RTL co-simulation feature verifies all transactions at one time. If the top-level function is called multiple times (for example, to simulate multiple frames of video), the data for the entire simulation input and output is stored on disk. Depending on the machine setup and OS, this might cause performance or execution issues. |
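Two of the test bench recommendations in the table above, using a fixed seed for random stimulus and comparing floating-point results within an error range, can be sketched as small helpers. The names `make_stimulus` and `nearly_equal`, the seed value, and the default tolerances are all illustrative assumptions; choose tolerances that suit the accuracy of the math functions in your design.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Generates repeatable stimulus: the same seed yields the same sequence on
// every run of the same build, so C simulation and C/RTL co-simulation
// see identical input data.
std::vector<float> make_stimulus(std::size_t n, unsigned seed = 12345u) {
    std::mt19937 rng(seed);  // fixed seed, never time-based
    std::uniform_real_distribution<float> dist(-1.0f, 1.0f);
    std::vector<float> values;
    values.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        values.push_back(dist(rng));
    }
    return values;
}

// Returns true when two results agree within a relative tolerance, with an
// absolute floor so that values near zero can still compare equal.
bool nearly_equal(float a, float b,
                  float rel_tol = 1e-5f, float abs_tol = 1e-6f) {
    assert(rel_tol >= 0.0f && abs_tol >= 0.0f);
    const float diff = std::fabs(a - b);
    const float scale = std::max(std::fabs(a), std::fabs(b));
    return diff <= std::max(abs_tol, rel_tol * scale);
}
```

A test bench would call `make_stimulus` to build its inputs and use `nearly_equal` in place of `==` when checking floating-point outputs, returning a non-zero value from `main()` if any comparison fails.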