HLS IP Libraries

Vitis™ HLS provides C++ libraries to implement a number of Xilinx® IP blocks. These libraries allow the following Xilinx IP blocks to be inferred directly from the C++ source code, ensuring a high-quality implementation in the FPGA.

Table 1. HLS IP Libraries
Library Header File Description
hls_fft.h Allows the Xilinx LogiCORE IP FFT to be simulated in C and implemented using the Xilinx LogiCORE block.
hls_fir.h Allows the Xilinx LogiCORE IP FIR to be simulated in C and implemented using the Xilinx LogiCORE block.
hls_dds.h Allows the Xilinx LogiCORE IP DDS to be simulated in C and implemented using the Xilinx LogiCORE block.
ap_shift_reg.h Provides a C++ class to implement a shift register which is implemented directly using a Xilinx SRL primitive.

FFT IP Library

The Xilinx FFT IP block can be called within a C++ design using the library hls_fft.h. This section explains how the FFT can be configured in your C++ code.

Note: Xilinx highly recommends that you review the Fast Fourier Transform LogiCORE IP Product Guide (PG109) for information on how to implement and use the features of the IP.

To use the FFT in your C++ code:

  1. Include the hls_fft.h library in the code
  2. Set the default parameters using the predefined struct hls::ip_fft::params_t
  3. Define the runtime configuration
  4. Call the FFT function
  5. Optionally, check the runtime status

The following code examples provide a summary of how each of these steps is performed. Each step is discussed in more detail below.

First, include the FFT library in the source code. This header file resides in the include directory in the Vitis HLS installation area which is automatically searched when Vitis HLS executes.

#include "hls_fft.h"

Define the static parameters of the FFT. These include items such as the input width, the number of channels, and the type of architecture, which do not change dynamically. The FFT library includes a parameterization struct hls::ip_fft::params_t, which can be used to initialize all static parameters with default values.

In this example, the default values for output ordering and the widths of the configuration and status ports are overridden using a user-defined struct param1 based on the predefined struct.

struct param1 : hls::ip_fft::params_t {
    static const unsigned ordering_opt = hls::ip_fft::natural_order;
    static const unsigned config_width = FFT_CONFIG_WIDTH;
    static const unsigned status_width = FFT_STATUS_WIDTH;
};

Define types and variables for both the runtime configuration and runtime status. These values can be dynamic and are therefore defined as variables in the C code which can change and are accessed through APIs.

typedef hls::ip_fft::config_t<param1> config_t;
typedef hls::ip_fft::status_t<param1> status_t;
config_t fft_config1;
status_t fft_status1;

Next, set the runtime configuration. This example sets the direction of the FFT (Forward or Inverse) based on the value of variable “direction” and also sets the value of the scaling schedule.

fft_config1.setDir(direction);
fft_config1.setSch(0x2AB);

Call the FFT function using the HLS namespace with the defined static configuration (param1 in this example). The function parameters are, in order, input data, output data, output status and input configuration.

hls::fft<param1> (xn1, xk1, &fft_status1, &fft_config1);

Finally, check the output status. This example checks the overflow flag and stores the results in variable “ovflo”.

*ovflo = fft_status1.getOvflo();

Design examples using the FFT C library are provided in the Vitis HLS examples and can be accessed using menu option Help > Welcome > Open Example Project > Design Examples > FFT.

FFT Static Parameters

The static parameters of the FFT define how the FFT is configured and specify the fixed parameters, such as the size of the FFT, whether the size can be changed dynamically, and whether the implementation is pipelined or radix_4_burst_io.

The hls_fft.h header file defines a struct hls::ip_fft::params_t which can be used to set default values for the static parameters. If the default values are to be used, the parameterization struct can be used directly with the FFT function.

hls::fft<hls::ip_fft::params_t >  
    (xn1, xk1, &fft_status1, &fft_config1);

A more typical use is to change some of the parameters to non-default values. This is performed by creating a new user-defined parameterization struct based on the default parameterization struct and changing some of the default values.

In the following example, a new user struct my_fft_config is defined with a new value for the output ordering (changed to natural_order). All other static parameters to the FFT use the default values.

struct my_fft_config : hls::ip_fft::params_t {
    static const unsigned ordering_opt = hls::ip_fft::natural_order;
};

hls::fft<my_fft_config >  
     (xn1, xk1, &fft_status1, &fft_config1);
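
The override mechanism is ordinary C++ member hiding: the derived struct redeclares one static constant, and a template reads whichever definition is visible through the type it is given. The following sketch shows the pattern in isolation. It uses a hypothetical stand-in namespace demo with only two parameters; it is not the content of hls_fft.h, and the helper templates are illustrative only.

```cpp
// Hypothetical stand-in for the library's default parameter struct.
// The real hls::ip_fft::params_t has many more members.
namespace demo {
enum ordering { bit_reversed_order = 0, natural_order = 1 };

struct params_t {
    static const unsigned max_nfft = 10;
    static const unsigned ordering_opt = bit_reversed_order;
};

// Templates that read their configuration from the struct they are given,
// in the same spirit as hls::fft<CONFIG>.
template <typename CONFIG>
unsigned transform_length() {
    return 1u << CONFIG::max_nfft;
}

template <typename CONFIG>
bool uses_natural_order() {
    return CONFIG::ordering_opt == static_cast<unsigned>(natural_order);
}
} // namespace demo

// Override one value; every other member is inherited unchanged.
struct my_fft_config : demo::params_t {
    static const unsigned ordering_opt = demo::natural_order;
};

static_assert(my_fft_config::max_nfft == 10, "max_nfft is inherited from the defaults");
```

With this arrangement, demo::uses_natural_order<my_fft_config>() is true while the same query on demo::params_t is false, and both report the same transform length.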

The values used for the parameterization struct hls::ip_fft::params_t are explained in FFT Struct Parameters. The default values for the parameters and a list of possible values are provided in FFT Struct Parameter Values.

Note: Xilinx highly recommends that you review the Fast Fourier Transform LogiCORE IP Product Guide (PG109) for details on the parameters and the implication for their settings.

FFT Struct Parameters

Table 2. FFT Struct Parameters
Parameter Description
input_width Data input port width.
output_width Data output port width.
status_width Output status port width.
config_width Input configuration port width.
max_nfft The size of the FFT data set is specified as 1 << max_nfft.
has_nfft Determines if the size of the FFT can be runtime configurable.
channels Number of channels.
arch_opt The implementation architecture.
phase_factor_width Configure the internal phase factor precision.
ordering_opt The output ordering mode.
ovflo Enable overflow mode.
scaling_opt Define the scaling options.
rounding_opt Define the rounding modes.
mem_data Specify using block or distributed RAM for data memory.
mem_phase_factors Specify using block or distributed RAM for phase factors memory.
mem_reorder Specify using block or distributed RAM for output reorder memory.
stages_block_ram Defines the number of block RAM stages used in the implementation.
mem_hybrid When block RAMs are specified for data, phase factor, or reorder buffer, mem_hybrid specifies whether or not to use a hybrid of block and distributed RAMs to reduce block RAM count in certain configurations.
complex_mult_type Defines the types of multiplier to use for complex multiplications.
butterfly_type Defines the implementation used for the FFT butterfly.

When specifying parameter values which are not integer or boolean, the HLS FFT namespace should be used.

For example, the possible values for parameter butterfly_type in the following table are use_luts and use_xtremedsp_slices. The values used in the C program should be butterfly_type = hls::ip_fft::use_luts and butterfly_type = hls::ip_fft::use_xtremedsp_slices.

FFT Struct Parameter Values

The following table covers all features and functionality of the FFT IP. Features and functionality not described in this table are not supported in the Vitis HLS implementation.

Table 3. FFT Struct Parameter Values
Parameter C Type Default Value Valid Values
input_width unsigned 16 8-34
output_width unsigned 16 input_width to (input_width + max_nfft + 1)
status_width unsigned 8 Depends on FFT configuration
config_width unsigned 16 Depends on FFT configuration
max_nfft unsigned 10 3-16
has_nfft bool false True, False
channels unsigned 1 1-12
arch_opt unsigned pipelined_streaming_io automatically_select, pipelined_streaming_io, radix_4_burst_io, radix_2_burst_io, radix_2_lite_burst_io
phase_factor_width unsigned 16 8-34
ordering_opt unsigned bit_reversed_order bit_reversed_order, natural_order
ovflo bool true false, true
scaling_opt unsigned scaled scaled, unscaled, block_floating_point
rounding_opt unsigned truncation truncation, convergent_rounding
mem_data unsigned block_ram block_ram, distributed_ram
mem_phase_factors unsigned block_ram block_ram, distributed_ram
mem_reorder unsigned block_ram block_ram, distributed_ram
stages_block_ram unsigned (max_nfft < 10) ? 0 : (max_nfft - 9) 0-11
mem_hybrid bool false false, true
complex_mult_type unsigned use_mults_resources use_luts, use_mults_resources, use_mults_performance
butterfly_type unsigned use_luts use_luts, use_xtremedsp_slices
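
Two rows of the table are expressed as formulas rather than fixed ranges: the default for stages_block_ram and the valid range for output_width. As a minimal sketch, they can be written as compile-time helpers so that a configuration can be checked with static_assert. The helper names are illustrative and are not part of hls_fft.h.

```cpp
// Default value of stages_block_ram as given in the table.
constexpr unsigned default_stages_block_ram(unsigned max_nfft) {
    return (max_nfft < 10) ? 0u : (max_nfft - 9u);
}

// Valid range of output_width as given in the table:
// input_width to (input_width + max_nfft + 1).
constexpr bool output_width_is_valid(unsigned input_width,
                                     unsigned output_width,
                                     unsigned max_nfft) {
    return output_width >= input_width &&
           output_width <= input_width + max_nfft + 1u;
}

static_assert(default_stages_block_ram(10) == 1, "default for max_nfft = 10");
static_assert(output_width_is_valid(16, 27, 10), "16 + 10 + 1 = 27 is the upper bound");
static_assert(!output_width_is_valid(16, 28, 10), "28 exceeds the upper bound");
```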

FFT Runtime Configuration and Status

The FFT supports runtime configuration and runtime status monitoring through the configuration and status ports. These ports are defined as arguments to the FFT function, shown here as variables fft_status1 and fft_config1:

hls::fft<param1> (xn1, xk1, &fft_status1, &fft_config1);

The runtime configuration and status can be accessed using the predefined structs from the FFT C library:

  • hls::ip_fft::config_t<param1>
  • hls::ip_fft::status_t<param1>
Note: In both cases, the struct requires the name of the static parameterization struct, shown in these examples as param1. Refer to the previous section for details on defining the static parameterization struct.

The runtime configuration struct allows the following actions to be performed in the C code:

  • Set the FFT length, if runtime configuration is enabled
  • Set the FFT direction as forward or inverse
  • Set the scaling schedule

The FFT length can be set as follows:

typedef hls::ip_fft::config_t<param1> config_t;
config_t fft_config1;
// Set FFT length to 512 => log2(512) => 9
fft_config1.setNfft(9);

IMPORTANT: The length specified during runtime cannot exceed the size defined by max_nfft in the static configuration.
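
Because setNfft takes the base-2 logarithm of the transform length rather than the length itself, it can help to derive the argument from the length and check it against max_nfft before use. The following is a minimal sketch of that arithmetic; the helper names are illustrative and are not library calls.

```cpp
// Returns log2(length) for a power-of-two length, which is the value
// passed to setNfft() in the example above.
constexpr unsigned nfft_from_length(unsigned length) {
    return (length <= 1u) ? 0u : 1u + nfft_from_length(length >> 1);
}

constexpr bool is_power_of_two(unsigned length) {
    return length != 0u && (length & (length - 1u)) == 0u;
}

// True when a runtime length is a power of two and does not exceed
// the size implied by the static max_nfft setting (1 << max_nfft).
constexpr bool length_is_allowed(unsigned length, unsigned max_nfft) {
    return is_power_of_two(length) && nfft_from_length(length) <= max_nfft;
}

static_assert(nfft_from_length(512) == 9, "512 = 1 << 9");
```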

The FFT direction can be set as follows:

typedef hls::ip_fft::config_t<param1> config_t;
config_t fft_config1;
// Forward FFT
fft_config1.setDir(1);
// Inverse FFT
fft_config1.setDir(0);

The FFT scaling schedule can be set as follows:

typedef hls::ip_fft::config_t<param1> config_t;
config_t fft_config1;
fft_config1.setSch(0x2AB);
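
The value 0x2AB is easier to read once it is split into its per-stage fields. The sketch below assumes the layout described in PG109 for the Radix-4 and Pipelined Streaming I/O architectures: two bits per field, with the first stage in the least significant bits. Treat that layout as an assumption to confirm against PG109 for the architecture in use. The helper names are illustrative.

```cpp
#include <array>
#include <cstddef>

// Splits a scaling schedule into 2-bit fields, least significant field first.
// Assumption: two bits per field, as PG109 describes for some architectures.
template <std::size_t STAGES>
std::array<unsigned, STAGES> decode_schedule(unsigned schedule) {
    std::array<unsigned, STAGES> shifts{};
    for (std::size_t s = 0; s < STAGES; ++s) {
        shifts[s] = (schedule >> (2 * s)) & 0x3u;
    }
    return shifts;
}

// Total number of bits the data is shifted right across all fields.
template <std::size_t STAGES>
unsigned total_shift(unsigned schedule) {
    unsigned total = 0;
    for (unsigned shift : decode_schedule<STAGES>(schedule)) {
        total += shift;
    }
    return total;
}
```

Decoding 0x2AB into five fields gives shifts of 3, 2, 2, 2, and 2, for a total of 11 bits.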

The output status port can be accessed using the pre-defined struct to determine:

  • If any overflow occurred during the FFT
  • The value of the block exponent

The FFT overflow flag can be checked as follows:

typedef hls::ip_fft::status_t<param1> status_t;
status_t fft_status1;
// Check the overflow flag
bool ovflo = fft_status1.getOvflo();

IMPORTANT: After each transaction completes, check the overflow status to confirm the correct operation of the FFT.

And the block exponent value can be obtained using:

typedef hls::ip_fft::status_t<param1> status_t;
status_t fft_status1;
// Obtain the block exponent
unsigned int blk_exp = fft_status1.getBlkExp();
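
The block exponent is meaningful when scaling_opt is set to block_floating_point. As a hedged illustration of how it might be applied in a test bench, the sketch below assumes the PG109 convention that the block exponent is the number of bits the data was shifted right during the transform, so each output sample is multiplied by 2^blk_exp to restore its magnitude. The function name and the use of std::vector are illustrative only.

```cpp
#include <cmath>
#include <complex>
#include <vector>

// Restores the magnitude of block-floating-point output samples.
// Assumption: blk_exp is the number of right shifts applied by the core.
std::vector<std::complex<double>> apply_block_exponent(
        const std::vector<std::complex<double>>& samples, unsigned blk_exp) {
    std::vector<std::complex<double>> scaled;
    scaled.reserve(samples.size());
    for (const std::complex<double>& s : samples) {
        // ldexp(x, e) computes x * 2^e exactly for values in range.
        scaled.emplace_back(std::ldexp(s.real(), static_cast<int>(blk_exp)),
                            std::ldexp(s.imag(), static_cast<int>(blk_exp)));
    }
    return scaled;
}
```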

Using the FFT Function

The FFT function is defined in the HLS namespace and can be called as follows:

hls::fft<STATIC_PARAM> (
INPUT_DATA_ARRAY,
OUTPUT_DATA_ARRAY, 
OUTPUT_STATUS, 
INPUT_RUN_TIME_CONFIGURATION);

The STATIC_PARAM is the static parameterization struct that defines the static parameters for the FFT.

Both the input and output data are supplied to the function as arrays (INPUT_DATA_ARRAY and OUTPUT_DATA_ARRAY). In the final implementation, the ports on the FFT RTL block will be implemented as AXI4-Stream ports. Xilinx recommends always using the FFT function in a region using dataflow optimization (set_directive_dataflow), because this ensures the arrays are implemented as streaming arrays. An alternative is to specify both arrays as streaming using the set_directive_stream command.

IMPORTANT: The FFT cannot be used in a region which is pipelined. If high-performance operation is required, pipeline the loops or functions before and after the FFT then use dataflow optimization on all loops and functions in the region.
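
The structure recommended above can be sketched as three functions in a dataflow region: a pipelined stage that fills the FFT input array, the FFT call itself, and a pipelined stage that drains the output array. In the sketch below, transform_stub is a hypothetical stand-in that only copies data so the example compiles without the HLS headers; in a real design the hls::fft call would take its place. The pragma forms are the source-code equivalents of the set_directive commands, and the array size is illustrative.

```cpp
#include <complex>

typedef std::complex<float> cmpx_t;
const unsigned N_SAMPLES = 8;  // illustrative size

// Stage before the FFT: pipelined copy into the FFT input array.
static void read_input(const cmpx_t in[N_SAMPLES], cmpx_t xn[N_SAMPLES]) {
    for (unsigned i = 0; i < N_SAMPLES; ++i) {
#pragma HLS pipeline II=1
        xn[i] = in[i];
    }
}

// Hypothetical stand-in: a real design would call
// hls::fft<param1>(xn, xk, &fft_status1, &fft_config1) here.
static void transform_stub(const cmpx_t xn[N_SAMPLES], cmpx_t xk[N_SAMPLES]) {
    for (unsigned i = 0; i < N_SAMPLES; ++i) {
        xk[i] = xn[i];
    }
}

// Stage after the FFT: pipelined copy out of the FFT output array.
static void write_output(const cmpx_t xk[N_SAMPLES], cmpx_t out[N_SAMPLES]) {
    for (unsigned i = 0; i < N_SAMPLES; ++i) {
#pragma HLS pipeline II=1
        out[i] = xk[i];
    }
}

// The region itself is not pipelined; it uses dataflow instead.
void fft_region(const cmpx_t in[N_SAMPLES], cmpx_t out[N_SAMPLES]) {
#pragma HLS dataflow
    cmpx_t xn[N_SAMPLES];
    cmpx_t xk[N_SAMPLES];
    read_input(in, xn);
    transform_stub(xn, xk);
    write_output(xk, out);
}
```

A standard C++ compiler ignores the HLS pragmas, so the same source can be used for C simulation.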

The data types for the arrays can be float or ap_fixed.

#include <complex>
typedef float data_t;
std::complex<data_t> xn[FFT_LENGTH];
std::complex<data_t> xk[FFT_LENGTH];

To use fixed-point data types, the Vitis HLS arbitrary precision type ap_fixed should be used.

#include "ap_fixed.h"
typedef ap_fixed<FFT_INPUT_WIDTH,1> data_in_t;
typedef ap_fixed<FFT_OUTPUT_WIDTH,FFT_OUTPUT_WIDTH-FFT_INPUT_WIDTH+1> data_out_t;
#include <complex>
typedef hls::x_complex<data_in_t> cmpxData;
typedef hls::x_complex<data_out_t> cmpxDataOut;

In both cases, the FFT must be parameterized with data widths that match the data types used. For floating-point data, the data widths are always 32 bits, and any other specified size is considered invalid.

IMPORTANT: The input and output width of the FFT can be configured to any arbitrary value within the supported range. The variables which connect to the input and output parameters must be defined in increments of 8-bit. For example, if the output width is configured as 33-bit, the output variable must be defined as a 40-bit variable.
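
The 8-bit increment rule is a round-up to the next multiple of 8. A minimal sketch of the arithmetic follows; the helper name is illustrative and is not part of the library.

```cpp
// Rounds a configured port width up to the next multiple of 8 bits, which is
// the width required for the variable connected to that port.
constexpr unsigned variable_width_for(unsigned port_width) {
    return ((port_width + 7u) / 8u) * 8u;
}

static_assert(variable_width_for(33) == 40, "a 33-bit port needs a 40-bit variable");
static_assert(variable_width_for(16) == 16, "16 is already a multiple of 8");
```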

The multichannel functionality of the FFT can be used by using two-dimensional arrays for the input and output data. In this case, the array data should be configured with the first dimension representing each channel and the second dimension representing the FFT data.

typedef float data_t;
static std::complex<data_t> xn[CHANNEL][FFT_LENGTH];
static std::complex<data_t> xk[CHANNEL][FFT_LENGTH];

The FFT core consumes and produces data as interleaved channels (for example, ch0-data0, ch1-data0, ch2-data0, etc., ch0-data1, ch1-data1, ch2-data1, etc.). Therefore, to stream the input or output arrays of the FFT using the same sequential order that the data was read or written, you must fill or empty the two-dimensional arrays for multiple channels by iterating through the channel index first, as shown in the following example:

cmpxData   in_fft[FFT_CHANNELS][FFT_LENGTH];
cmpxData  out_fft[FFT_CHANNELS][FFT_LENGTH];

// Write to FFT Input Array
for (unsigned i = 0; i < FFT_LENGTH; i++) {
    for (unsigned j = 0; j < FFT_CHANNELS; ++j) {
        in_fft[j][i] = in.read().data;
    }
}

// Read from FFT Output Array
for (unsigned i = 0; i < FFT_LENGTH; i++) {
    for (unsigned j = 0; j < FFT_CHANNELS; ++j) {
        out.data = out_fft[j][i];
    }
}
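
The effect of the loop order can be checked in plain C++ without the HLS types. In this sketch, std::vector stands in for both the stream and the two-dimensional array, and the function name is illustrative. The sample index is the outer loop and the channel index is the inner loop, matching the interleaved order described above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fills a [channel][sample] array from a stream that arrives in the
// interleaved order ch0-data0, ch1-data0, ..., ch0-data1, ch1-data1, ...
std::vector<std::vector<int>> deinterleave(const std::vector<int>& stream,
                                           std::size_t channels) {
    assert(channels > 0);
    const std::size_t length = stream.size() / channels;
    std::vector<std::vector<int>> data(channels, std::vector<int>(length));
    std::size_t next = 0;
    for (std::size_t i = 0; i < length; ++i) {        // sample index (outer)
        for (std::size_t j = 0; j < channels; ++j) {  // channel index (inner)
            data[j][i] = stream[next++];
        }
    }
    return data;
}
```

For a two-channel stream 0, 100, 1, 101, 2, 102, channel 0 receives 0, 1, 2 and channel 1 receives 100, 101, 102.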

Design examples using the FFT C library are provided in the Vitis HLS examples and can be accessed using menu option Help > Welcome > Open Example Project > Design Examples > FFT.

FIR Filter IP Library

The Xilinx FIR IP block can be called within a C++ design using the library hls_fir.h. This section explains how the FIR can be configured in your C++ code.

Note: Xilinx highly recommends that you review the FIR Compiler LogiCORE IP Product Guide (PG149) for information on how to implement and use the features of the IP.

To use the FIR in your C++ code:

  1. Include the hls_fir.h library in the code.
  2. Set the static parameters using the predefined struct hls::ip_fir::params_t.
  3. Call the FIR function.
  4. Optionally, define a runtime input configuration to modify some parameters dynamically.

The following code examples provide a summary of how each of these steps is performed. Each step is discussed in more detail below.

First, include the FIR library in the source code. This header file resides in the include directory in the Vitis HLS installation area. This directory is automatically searched when Vitis HLS executes. There is no need to specify the path to this directory if compiling inside Vitis HLS.

#include "hls_fir.h"

Define the static parameters of the FIR. These include static attributes such as the input width, the coefficients, and the filter rate (single, decimation, hilbert). The FIR library includes a parameterization struct hls::ip_fir::params_t which can be used to initialize all static parameters with default values.

In this example, the coefficients are defined as residing in array coeff_vec and the default values for the number of coefficients, the input width and the quantization mode are overridden using a user-defined struct myconfig based on the predefined struct.

struct myconfig : hls::ip_fir::params_t {
    static const double coeff_vec[sg_fir_srrc_coeffs_len];
    static const unsigned num_coeffs = sg_fir_srrc_coeffs_len;
    static const unsigned input_width = INPUT_WIDTH; 
    static const unsigned quantization = hls::ip_fir::quantize_only;
};

Create an instance of the FIR function using the HLS namespace with the defined static parameters (myconfig in this example) and then call the function with the run method to execute the function. The function arguments are, in order, input data and output data.

static hls::FIR<myconfig> fir1;
fir1.run(fir_in, fir_out);

Optionally, a runtime input configuration can be used. In some modes of the FIR, the data on this input determines how the coefficients are used during interleaved channels or when coefficient reloading is required. This configuration can be dynamic and is therefore defined as a variable. For a complete description of which modes require this input configuration, refer to the FIR Compiler LogiCORE IP Product Guide (PG149).

When the runtime input configuration is used, the FIR function is called with three arguments: input data, output data and input configuration.

// Define the configuration type
typedef ap_uint<8> config_t;
// Define the configuration variable
config_t fir_config = 8;
// Use the configuration in the FIR
static hls::FIR<myconfig> fir1;
fir1.run(fir_in, fir_out, &fir_config);

Design examples using the FIR C library are provided in the Vitis HLS examples and can be accessed using menu option Help > Welcome > Open Example Project > Design Examples > FIR.

FIR Static Parameters

The static parameters of the FIR define how the FIR IP is parameterized and specify non-dynamic items such as the input and output widths, the number of fractional bits, the coefficient values, and the interpolation and decimation rates. Most of these parameters have default values; there are no default values for the coefficients.

The hls_fir.h header file defines a struct hls::ip_fir::params_t that can be used to set the default values for most of the static parameters.

IMPORTANT: There are no defaults defined for the coefficients. Therefore, Xilinx does not recommend using the pre-defined struct to directly initialize the FIR. A new user defined struct which specifies the coefficients should always be used to perform the static parameterization.

In this example, a new user struct myconfig is defined with a new value for the coefficients. The coefficients are specified as residing in array coeff_vec. All other parameters to the FIR use the default values.

struct myconfig : hls::ip_fir::params_t {
    static const double coeff_vec[sg_fir_srrc_coeffs_len];
};
static hls::FIR<myconfig> fir1;
fir1.run(fir_in, fir_out);

FIR Struct Parameters describes the parameters used for the parameterization struct hls::ip_fir::params_t. FIR Struct Parameter Values provides the default values for the parameters and a list of possible values.

Note: Xilinx highly recommends that you refer to the FIR Compiler LogiCORE IP Product Guide (PG149) for details on the parameters and the implication for their settings.

FIR Struct Parameters

Table 4. FIR Struct Parameters
Parameter Description
input_width Data input port width
input_fractional_bits Number of fractional bits on the input port
output_width Data output port width
output_fractional_bits Number of fractional bits on the output port
coeff_width Bit-width of the coefficients
coeff_fractional_bits Number of fractional bits in the coefficients
num_coeffs Number of coefficients
coeff_sets Number of coefficient sets
input_length Number of samples in the input data
output_length Number of samples in the output data
num_channels Specify the number of channels of data to process
total_num_coeff Total number of coefficients
coeff_vec[total_num_coeff] The coefficient array
filter_type The type of implementation used for the filter
rate_change Specifies integer or fractional rate changes
interp_rate The interpolation rate
decim_rate The decimation rate
zero_pack_factor Number of zero coefficients used in interpolation
rate_specification Specify the rate as frequency or period
hardware_oversampling_rate Specify the rate of over-sampling
sample_period The hardware oversample period
sample_frequency The hardware oversample frequency
quantization The quantization method to be used
best_precision Enable or disable the best precision
coeff_structure The type of coefficient structure to be used
output_rounding_mode Type of rounding used on the output
filter_arch Selects a systolic or transposed architecture
optimization_goal Specify a speed or area goal for optimization
inter_column_pipe_length The pipeline length required between DSP columns
column_config Specifies the number of DSP module columns
config_method Specifies how the DSP module columns are configured
coeff_padding Number of zero padding added to the front of the filter

When specifying parameter values that are not integer or boolean, the HLS FIR namespace should be used.

For example the possible values for rate_change are shown in the following table to be integer and fixed_fractional. The values used in the C program should be rate_change = hls::ip_fir::integer and rate_change = hls::ip_fir::fixed_fractional.

FIR Struct Parameter Values

The following table covers all features and functionality of the FIR IP. Features and functionality not described in this table are not supported in the Vitis HLS implementation.

Table 5. FIR Struct Parameter Values
Parameter C Type Default Value Valid Values
input_width unsigned 16 No limitation
input_fractional_bits unsigned 0 Limited by size of input_width
output_width unsigned 24 No limitation
output_fractional_bits unsigned 0 Limited by size of output_width
coeff_width unsigned 16 No limitation
coeff_fractional_bits unsigned 0 Limited by size of coeff_width
num_coeffs unsigned 21 Full
coeff_sets unsigned 1 1-1024
input_length unsigned 21 No limitation
output_length unsigned 21 No limitation
num_channels unsigned 1 1-1024
total_num_coeff unsigned 21 num_coeffs * coeff_sets
coeff_vec[total_num_coeff] double array None Not applicable
filter_type unsigned single_rate single_rate, interpolation, decimation, hilbert_filter, interpolated
rate_change unsigned integer integer, fixed_fractional
interp_rate unsigned 1 1-1024
decim_rate unsigned 1 1-1024
zero_pack_factor unsigned 1 1-8
rate_specification unsigned period frequency, period
hardware_oversampling_rate unsigned 1 No Limitation
sample_period unsigned 1 No Limitation
sample_frequency double 0.001 No Limitation
quantization unsigned integer_coefficients integer_coefficients, quantize_only, maximize_dynamic_range
best_precision bool false false, true

coeff_structure unsigned non_symmetric inferred, non_symmetric, symmetric, negative_symmetric, half_band, hilbert
output_rounding_mode unsigned full_precision full_precision, truncate_lsbs, non_symmetric_rounding_down, non_symmetric_rounding_up, symmetric_rounding_to_zero, symmetric_rounding_to_infinity, convergent_rounding_to_even, convergent_rounding_to_odd
filter_arch unsigned systolic_multiply_accumulate systolic_multiply_accumulate, transpose_multiply_accumulate
optimization_goal unsigned area area, speed
inter_column_pipe_length unsigned 4 1-16
column_config unsigned 1 Limited by number of DSP macrocells used
config_method unsigned single single, by_channel
coeff_padding bool false false, true
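
The table lists total_num_coeff as num_coeffs * coeff_sets. One C++ detail is worth keeping in mind when overriding either factor in a derived struct: a static constant inherited from a base struct keeps the value that was computed from the base's own members. The sketch below illustrates this with a hypothetical stand-in demo_fir::params_t, which is assumed to define total_num_coeff as the table describes; the real header may differ, so treat this as a consistency check to adapt rather than a statement about the library.

```cpp
// Hypothetical stand-in for the library defaults (not the real hls_fir.h).
namespace demo_fir {
struct params_t {
    static const unsigned num_coeffs = 21;
    static const unsigned coeff_sets = 1;
    static const unsigned total_num_coeff = num_coeffs * coeff_sets;
};
} // namespace demo_fir

// Overrides both factors but inherits total_num_coeff, which stays 21 * 1.
struct mismatched : demo_fir::params_t {
    static const unsigned num_coeffs = 8;
    static const unsigned coeff_sets = 2;
};

// Redeclares total_num_coeff so that it follows the overridden factors.
struct two_sets : demo_fir::params_t {
    static const unsigned num_coeffs = 8;
    static const unsigned coeff_sets = 2;
    static const unsigned total_num_coeff = num_coeffs * coeff_sets;
};

// Compile-time check that a configuration is self-consistent.
template <typename CONFIG>
constexpr bool coeff_count_is_consistent() {
    return CONFIG::total_num_coeff == CONFIG::num_coeffs * CONFIG::coeff_sets;
}

static_assert(!coeff_count_is_consistent<mismatched>(), "inherited total is still 21");
static_assert(coeff_count_is_consistent<two_sets>(), "redeclared total is 16");
```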

Using the FIR Function

The FIR function is defined in the HLS namespace and can be called as follows:

// Create an instance of the FIR 
static hls::FIR<STATIC_PARAM> fir1;
// Execute the FIR instance fir1
fir1.run(INPUT_DATA_ARRAY, OUTPUT_DATA_ARRAY);

The STATIC_PARAM is the static parameterization struct that defines most static parameters for the FIR.

Both the input and output data are supplied to the function as arrays (INPUT_DATA_ARRAY and OUTPUT_DATA_ARRAY). In the final implementation, these ports on the FIR IP will be implemented as AXI4-Stream ports. Xilinx recommends always using the FIR function in a region using the dataflow optimization (set_directive_dataflow), because this ensures the arrays are implemented as streaming arrays. An alternative is to specify both arrays as streaming using the set_directive_stream command.

IMPORTANT: The FIR cannot be used in a region which is pipelined. If high-performance operation is required, pipeline the loops or functions before and after the FIR then use dataflow optimization on all loops and functions in the region.

The multichannel functionality of the FIR is supported through interleaving the data in a single input and single output array.

  • The size of the input array should be large enough to accommodate all samples: num_channels * input_length.
  • The output array size should be specified to contain all output samples: num_channels * output_length.

The following code example demonstrates, for two channels, how the data is interleaved. In this example, the top-level function has two channels of input data (din_i, din_q) and two channels of output data (dout_i, dout_q). Two functions, at the front-end (fe) and back-end (be) are used to correctly order the data in the FIR input array and extract it from the FIR output array.

void dummy_fe(din_t din_i[LENGTH], din_t din_q[LENGTH], din_t out[FIR_LENGTH]) {
    for (unsigned i = 0; i < LENGTH; ++i) {
        out[2*i] = din_i[i];
        out[2*i + 1] = din_q[i];
    }
}
void dummy_be(dout_t in[FIR_LENGTH], dout_t dout_i[LENGTH], dout_t dout_q[LENGTH]) {   
    for(unsigned i = 0; i < LENGTH; ++i) {
        dout_i[i] = in[2*i];
        dout_q[i] = in[2*i+1];
    }
}
void fir_top(din_t din_i[LENGTH], din_t din_q[LENGTH],
             dout_t dout_i[LENGTH], dout_t dout_q[LENGTH]) {   

    din_t fir_in[FIR_LENGTH];
    dout_t fir_out[FIR_LENGTH];
    static hls::FIR<myconfig> fir1;

    dummy_fe(din_i, din_q, fir_in);
    fir1.run(fir_in, fir_out);
    dummy_be(fir_out, dout_i, dout_q);
}

Optional FIR Runtime Configuration

In some modes of operation, the FIR requires an additional input to configure how the coefficients are used. For a complete description of which modes require this input configuration, refer to the FIR Compiler LogiCORE IP Product Guide (PG149).

This input configuration can be performed in the C code using a standard ap_int.h 8-bit data type. In this example, the header file fir_top.h specifies the use of the FIR and ap_fixed libraries, defines a number of the design parameter values and then defines some fixed-point types based on these:

#include "ap_fixed.h"
#include "hls_fir.h"

const unsigned FIR_LENGTH   = 21;
const unsigned INPUT_WIDTH = 16;
const unsigned INPUT_FRACTIONAL_BITS = 0;
const unsigned OUTPUT_WIDTH = 24;
const unsigned OUTPUT_FRACTIONAL_BITS = 0;
const unsigned COEFF_WIDTH = 16;
const unsigned COEFF_FRACTIONAL_BITS = 0;
const unsigned COEFF_NUM = 7;
const unsigned COEFF_SETS = 3;
const unsigned INPUT_LENGTH = FIR_LENGTH;
const unsigned OUTPUT_LENGTH = FIR_LENGTH;
const unsigned CHAN_NUM = 1;
typedef ap_fixed<INPUT_WIDTH, INPUT_WIDTH - INPUT_FRACTIONAL_BITS> s_data_t;
typedef ap_fixed<OUTPUT_WIDTH, OUTPUT_WIDTH - OUTPUT_FRACTIONAL_BITS> m_data_t;
typedef ap_uint<8> config_t;
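
The typedefs above derive the number of integer bits as the total width minus the number of fractional bits. To see what range such a type covers, the following sketch computes the minimum, maximum, and step of a signed fixed-point format in plain C++. It does not use ap_fixed itself; the struct and function names are illustrative, and the formulas are those of a generic two's-complement fixed-point format in which the integer bits include the sign bit.

```cpp
#include <cmath>

struct fixed_range {
    double min;
    double max;
    double step;
};

// Range of a signed fixed-point format with `width` total bits, of which
// `frac_bits` are fractional (width - frac_bits integer bits, sign included).
fixed_range signed_fixed_range(unsigned width, unsigned frac_bits) {
    const int int_bits = static_cast<int>(width) - static_cast<int>(frac_bits);
    const double step = std::ldexp(1.0, -static_cast<int>(frac_bits));
    const double min = -std::ldexp(1.0, int_bits - 1);
    const double max = std::ldexp(1.0, int_bits - 1) - step;
    return {min, max, step};
}
```

With INPUT_WIDTH = 16 and INPUT_FRACTIONAL_BITS = 0, this gives a range of -32768 to 32767 in steps of 1.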

In the top-level code, the information in the header file is included, the static parameterization struct is created using the same constant values used to specify the bit-widths, ensuring the C code and FIR configuration match, and the coefficients are specified. At the top-level, an input configuration, defined in the header file as 8-bit data, is passed into the FIR.

#include "fir_top.h"

struct param1 : hls::ip_fir::params_t {
    static const double coeff_vec[total_num_coeff];
    static const unsigned input_length = INPUT_LENGTH;
    static const unsigned output_length = OUTPUT_LENGTH;
    static const unsigned num_coeffs = COEFF_NUM;
    static const unsigned coeff_sets = COEFF_SETS;
};
const double param1::coeff_vec[total_num_coeff] = 
    {6,0,-4,-3,5,6,-6,-13,7,44,64,44,7,-13,-6,6,5,-3,-4,0,6};

void dummy_fe(s_data_t in[INPUT_LENGTH], s_data_t out[INPUT_LENGTH], 
                config_t* config_in, config_t* config_out)
{
    *config_out = *config_in;
    for(unsigned i = 0; i < INPUT_LENGTH; ++i)
        out[i] = in[i];
}

void dummy_be(m_data_t in[OUTPUT_LENGTH], m_data_t out[OUTPUT_LENGTH])
{
    for(unsigned i = 0; i < OUTPUT_LENGTH; ++i)
        out[i] = in[i];
}

// DUT
void fir_top(s_data_t in[INPUT_LENGTH],
             m_data_t out[OUTPUT_LENGTH],
             config_t* config)
{

    s_data_t fir_in[INPUT_LENGTH];
    m_data_t fir_out[OUTPUT_LENGTH];
    config_t fir_config;
    // Create struct for config
    static hls::FIR<param1> fir1;
    
    //==================================================
    // Dataflow process
    dummy_fe(in, fir_in, config, &fir_config);
    fir1.run(fir_in, fir_out, &fir_config);
    dummy_be(fir_out, out);
    //==================================================
}

Design examples using the FIR C library are provided in the Vitis HLS examples and can be accessed using menu option Help > Welcome > Open Example Project > Design Examples > FIR.

DDS IP Library

You can use the Xilinx Direct Digital Synthesizer (DDS) IP block within a C++ design using the hls_dds.h library. This section explains how to configure DDS IP in your C++ code.

Note: Xilinx highly recommends that you review the DDS Compiler LogiCORE IP Product Guide (PG141) for information on how to implement and use the features of the IP.
IMPORTANT: The C IP implementation of the DDS IP core supports the fixed mode for the Phase_Increment and Phase_Offset parameters and supports the none mode for Phase_Offset, but it does not support programmable and streaming modes for these parameters.

To use the DDS in the C++ code:

  1. Include the hls_dds.h library in the code.
  2. Set the default parameters using the pre-defined struct hls::ip_dds::params_t.
  3. Call the DDS function.

First, include the DDS library in the source code. This header file resides in the include directory in the Vitis HLS installation area, which is automatically searched when Vitis HLS executes.

#include "hls_dds.h"

Define the static parameters of the DDS. For example, define the phase width, clock rate, and phase increment and phase offset values. The DDS C library includes a parameterization struct hls::ip_dds::params_t, which is used to initialize all static parameters with default values. By redefining any of the values in this struct, you can customize the implementation.

The following example shows how to override the default values for the phase width, clock rate, phase increment, and phase offset using a user-defined struct param1, which is based on the existing predefined struct hls::ip_dds::params_t:

struct param1 : hls::ip_dds::params_t {
    static const unsigned Phase_Width = PHASEWIDTH;
    static const double DDS_Clock_Rate = 25.0;
    static const double PINC[16];
    static const double POFF[16];
};

Create an instance of the DDS function using the HLS namespace with the defined static parameters (param1 in this example). Then, call the function with the run method to execute the function. The function arguments are, in order, data and phase:

static hls::DDS<param1> dds1;
dds1.run(data_channel, phase_channel);

To access design examples that use the DDS C library, select Help > Welcome > Open Example Project > Design Examples > DDS.

DDS Static Parameters

The static parameters of the DDS define how to configure the DDS, such as the clock rate, phase interval, and modes. The hls_dds.h header file defines an hls::ip_dds::params_t struct, which sets the default values for the static parameters. To use the default values, you can use the parameterization struct directly with the DDS function.

static hls::DDS< hls::ip_dds::params_t > dds1;
dds1.run(data_channel, phase_channel);

The following table describes the parameters for the hls::ip_dds::params_t parameterization struct.

Note: Xilinx highly recommends that you review the DDS Compiler LogiCORE IP Product Guide (PG141) for details on the parameters and values.

Table 6. DDS Struct Parameters
Parameter Description
DDS_Clock_Rate Specifies the clock rate for the DDS output.
Channels Specifies the number of channels. The DDS and phase generator can support up to 16 channels. The channels are time-multiplexed, which reduces the effective clock frequency per channel.
Mode_of_Operation Specifies one of the following operation modes:

Standard mode for use when the accumulated phase can be truncated before it is used to access the SIN/COS LUT.

Rasterized mode for use when the desired frequencies and system clock are related by a rational fraction.

Modulus Describes the relationship between the system clock frequency and the desired frequencies.

Use this parameter in rasterized mode only.

Spurious_Free_Dynamic_Range Specifies the targeted purity of the tone produced by the DDS.
Frequency_Resolution Specifies the minimum frequency resolution in Hz and determines the Phase Width used by the phase accumulator, including associated phase increment (PINC) and phase offset (POFF) values.
Noise_Shaping Controls whether to use phase truncation, dithering, or Taylor series correction.
Phase_Width Sets the width of the following:
  • PHASE_OUT field within m_axis_phase_tdata
  • Phase field within s_axis_phase_tdata when the DDS is configured to be a SIN/COS LUT only
  • Phase accumulator
  • Associated phase increment and offset registers
  • Phase field in s_axis_config_tdata

For rasterized mode, the phase width is fixed as the number of bits required to describe the valid input range [0, Modulus-1], that is, log2(Modulus) rounded up.

Output_Width Sets the width of SINE and COSINE fields within m_axis_data_tdata. The SFDR provided by this parameter depends on the selected Noise Shaping option.
Phase_Increment Selects the phase increment value.
Phase_Offset Selects the phase offset value.
Output_Selection Sets the output selection to SINE, COSINE, or both in the m_axis_data_tdata bus.
Negative_Sine Negates the SINE field at runtime.
Negative_Cosine Negates the COSINE field at runtime.
Amplitude_Mode Sets the amplitude to full range or unit circle.
Memory_Type Controls the implementation of the SIN/COS LUT.
Optimization_Goal Controls whether the implementation decisions target highest speed or lowest resource.
DSP48_Use Controls the implementation of the phase accumulator and addition stages for phase offset, dither noise addition, or both.
Latency_Configuration Sets the latency of the core to the optimum value based upon the Optimization Goal.
Latency Specifies the manual latency value.
Output_Form Sets the output form to two’s complement or to sign and magnitude. In general, the output of SINE and COSINE is in two’s complement form. However, when quadrant symmetry is used, the output form can be changed to sign and magnitude.
PINC[XIP_DDS_CHANNELS_MAX] Sets the values for the phase increment for each output channel.
POFF[XIP_DDS_CHANNELS_MAX] Sets the values for the phase offset for each output channel.
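
For the rasterized-mode Phase_Width entry in the table above, the number of bits required to describe the range [0, Modulus-1] can be worked out with a short loop. This is a minimal sketch: bits_for_modulus is a hypothetical helper, not part of hls_dds.h, and the width that the IP actually selects should be confirmed against PG141.

```cpp
#include <cassert>

// Number of bits needed to represent every value in [0, modulus - 1].
// Hypothetical helper for illustration; not part of hls_dds.h.
inline unsigned bits_for_modulus(unsigned modulus) {
    assert(modulus >= 1);
    unsigned max_value = modulus - 1;  // largest value that must be encoded
    unsigned bits = 0;
    while (max_value != 0) {           // count the significant bits
        ++bits;
        max_value >>= 1;
    }
    return bits;
}
```

For example, bits_for_modulus(200) returns 8, because the largest value, 199, needs eight bits.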

DDS Struct Parameter Values

The following table shows the possible values for the hls::ip_dds::params_t parameterization struct parameters.

Table 7. DDS Struct Parameter Values
Parameter C Type Default Value Valid Values
DDS_Clock_Rate double 20.0 Any double value
Channels unsigned 1 1 to 16
Mode_of_Operation unsigned XIP_DDS_MOO_CONVENTIONAL XIP_DDS_MOO_CONVENTIONAL truncates the accumulated phase.

XIP_DDS_MOO_RASTERIZED selects rasterized mode.

Modulus unsigned 200 129 to 256
Spurious_Free_Dynamic_Range double 20.0 18.0 to 150.0
Frequency_Resolution double 10.0 0.000000001 to 125000000
Noise_Shaping unsigned XIP_DDS_NS_NONE XIP_DDS_NS_NONE produces phase truncation DDS.

XIP_DDS_NS_DITHER uses phase dither to improve SFDR at the expense of increased noise floor.

XIP_DDS_NS_TAYLOR interpolates sine/cosine values using the otherwise discarded bits from phase truncation.

XIP_DDS_NS_AUTO automatically determines noise-shaping.

Phase_Width unsigned 16 Must be an integer multiple of 8
Output_Width unsigned 16 Must be an integer multiple of 8
Phase_Increment unsigned XIP_DDS_PINCPOFF_FIXED XIP_DDS_PINCPOFF_FIXED fixes PINC at generation time, and PINC cannot be changed at runtime.

This is the only value supported.

Phase_Offset unsigned XIP_DDS_PINCPOFF_NONE XIP_DDS_PINCPOFF_NONE does not generate phase offset.

XIP_DDS_PINCPOFF_FIXED fixes POFF at generation time, and POFF cannot be changed at runtime.

Output_Selection unsigned XIP_DDS_OUT_SIN_AND_COS XIP_DDS_OUT_SIN_ONLY produces sine output only.

XIP_DDS_OUT_COS_ONLY produces cosine output only.

XIP_DDS_OUT_SIN_AND_COS produces both sine and cosine output.

Negative_Sine unsigned XIP_DDS_ABSENT XIP_DDS_ABSENT produces standard sine wave.

XIP_DDS_PRESENT negates sine wave.

Negative_Cosine bool XIP_DDS_ABSENT XIP_DDS_ABSENT produces standard cosine wave.

XIP_DDS_PRESENT negates cosine wave.

Amplitude_Mode unsigned XIP_DDS_FULL_RANGE XIP_DDS_FULL_RANGE normalizes amplitude to the output width with the binary point in the first place. For example, an 8-bit output has a binary amplitude of 100000000 - 10 giving values between 01111110 and 11111110, which corresponds to just less than 1 and just more than -1, respectively.

XIP_DDS_UNIT_CIRCLE normalizes amplitude to half full range, that is, values range from 01000 .. (+0.5) to 110000 .. (-0.5).

Memory_Type unsigned XIP_DDS_MEM_AUTO XIP_DDS_MEM_AUTO selects distributed ROM for small cases where the table can be contained in a single layer of memory and selects block ROM for larger cases.

XIP_DDS_MEM_BLOCK always uses block RAM.

XIP_DDS_MEM_DIST always uses distributed RAM.

Optimization_Goal unsigned XIP_DDS_OPTGOAL_AUTO XIP_DDS_OPTGOAL_AUTO automatically selects the optimization goal.

XIP_DDS_OPTGOAL_AREA optimizes for area.

XIP_DDS_OPTGOAL_SPEED optimizes for performance.

DSP48_Use unsigned XIP_DDS_DSP_MIN XIP_DDS_DSP_MIN implements the phase accumulator and the stages for phase offset, dither noise addition, or both in FPGA logic.

XIP_DDS_DSP_MAX implements the phase accumulator and the phase offset, dither noise addition, or both using DSP slices. In the single-channel case, the DSP slice can also provide the register that stores the programmable phase increment, phase offset, or both, and thereby save further fabric resources.

Latency_Configuration unsigned XIP_DDS_LATENCY_AUTO XIP_DDS_LATENCY_AUTO automatically determines the latency.

XIP_DDS_LATENCY_MANUAL manually specifies the latency using the Latency option.

Latency unsigned 5 Any value
Output_Form unsigned XIP_DDS_OUTPUT_TWOS XIP_DDS_OUTPUT_TWOS outputs two's complement.

XIP_DDS_OUTPUT_SIGN_MAG outputs sign and magnitude.

PINC[XIP_DDS_CHANNELS_MAX] unsigned array {0} Any value for the phase increment for each channel
POFF[XIP_DDS_CHANNELS_MAX] unsigned array {0} Any value for the phase offset for each channel
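
The Frequency_Resolution and Phase_Width entries are linked by the usual phase-accumulator relationship: with a B-bit accumulator clocked at f_clk and shared by C time-multiplexed channels, the frequency step per channel is f_clk / (C * 2^B). The sketch below applies that general relationship. It is not taken from hls_dds.h. Both helper names are hypothetical, all frequencies are assumed to be in Hz, and whether the PINC entries of the struct are expected in this integer form should be confirmed against PG141 and the header.

```cpp
#include <cassert>
#include <cmath>

// Smallest accumulator width B for which clock_hz / (channels * 2^B)
// is no larger than the requested resolution. Hypothetical helper.
inline unsigned phase_width_for_resolution(double clock_hz, unsigned channels,
                                           double resolution_hz) {
    assert(clock_hz > 0.0 && channels >= 1 && resolution_hz > 0.0);
    double step_hz = clock_hz / channels;  // step with a 0-bit accumulator
    unsigned bits = 0;
    while (step_hz > resolution_hz) {      // each extra bit halves the step
        step_hz /= 2.0;
        ++bits;
    }
    return bits;
}

// Integer phase increment that produces out_hz on one channel, rounded to
// the nearest step: out_hz = clock_hz * increment / (channels * 2^B).
inline unsigned long long phase_increment(double out_hz, double clock_hz,
                                          unsigned channels,
                                          unsigned phase_width) {
    assert(clock_hz > 0.0 && channels >= 1 && phase_width < 63);
    const double full_scale = std::ldexp(1.0, static_cast<int>(phase_width));
    return static_cast<unsigned long long>(
        std::llround(out_hz * channels * full_scale / clock_hz));
}
```

Under these assumptions, a single channel clocked at 100 MHz with a requested resolution of 0.4 Hz needs a 28-bit accumulator, and a 25 MHz output with a 16-bit accumulator corresponds to an increment of 16384, which is one quarter of 2^16.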

SRL IP Library

C code is typically written to satisfy several requirements at once: reuse, readability, and performance. Existing C code is therefore unlikely to have been written with the goal of producing the best possible hardware after high-level synthesis.

Just as coding practices exist to support reuse, readability, and performance, certain coding techniques and predefined constructs help ensure that synthesis produces more optimal hardware, or model the hardware more closely in C so that the algorithm is easier to validate.

Mapping Directly into SRL Resources

Many C algorithms sequentially shift data through arrays. They add a new value to the start of the array, shift the existing data through the array, and drop the oldest data value. This operation is implemented in hardware as a shift register.

The most common way to implement a shift register from C into hardware is to completely partition the array into individual elements, and allow the data dependencies between the elements in the RTL to imply a shift register.
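
The array-based idiom described above can be sketched in plain C++ as follows. The function name delay_line and the depth of 4 are assumptions made for illustration. In a Vitis HLS design the array would typically also be completely partitioned, as noted in the comment.

```cpp
#include <cassert>

const int DEPTH = 4;  // illustrative depth

// Pushes `in` into a DEPTH-element delay line and returns the value that
// drops off the far end. Hypothetical example function.
inline int delay_line(int in) {
    static int regs[DEPTH] = {0};
    // In Vitis HLS the array would typically be completely partitioned,
    // for example with: #pragma HLS ARRAY_PARTITION variable=regs complete
    int oldest = regs[DEPTH - 1];           // value about to be dropped
    for (int i = DEPTH - 1; i > 0; --i) {   // shift existing data along
        regs[i] = regs[i - 1];
    }
    regs[0] = in;                           // new value enters at the start
    return oldest;
}
```

Calling delay_line with the values 1, 2, 3, 4, 5 in turn returns 0 for the first four calls, while the line fills, and then 1 on the fifth call.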

Logic synthesis typically implements the RTL shift register into a Xilinx SRL resource, which efficiently implements shift registers. The issue is that sometimes logic synthesis does not implement the RTL shift register using an SRL component:

  • When data is accessed in the middle of the shift register, logic synthesis cannot directly infer an SRL.
  • Sometimes, even when the SRL is ideal, logic synthesis may implement the shift register in flip-flops due to other factors. (Logic synthesis is also a complex process.)

Vitis HLS provides a C++ class (ap_shift_reg) to ensure that the shift register defined in the C code is always implemented using an SRL resource. The ap_shift_reg class has two methods to perform the various read and write accesses supported by an SRL component.

Read from the Shifter

The read method allows a specified location to be read from the shift register.

The ap_shift_reg.h header file that defines the ap_shift_reg class is also included with Vitis HLS as a standalone package. You have the right to use it in your own source code. The package xilinx_hls_lib_<release_number>.tgz is located in the include directory in the Vitis HLS installation area.

// Include the Class
#include "ap_shift_reg.h"

// Define a variable of type ap_shift_reg<type, depth>
// - Sreg must use the static qualifier
// - Sreg will hold integer data types
// - Sreg will hold 4 data values
static ap_shift_reg<int, 4> Sreg;
int var1;

// Read location 2 of Sreg into var1
var1 = Sreg.read(2);

Read, Write, and Shift Data

A shift method allows a read, write, and shift operation to be performed.

// Include the Class
#include "ap_shift_reg.h"

// Define a variable of type ap_shift_reg<type, depth>
// - Sreg must use the static qualifier
// - Sreg will hold integer data types
// - Sreg will hold 4 data values
static ap_shift_reg<int, 4> Sreg;
int var1, In1;

// Read location 3 of Sreg into var1
// THEN shift all values up one and load In1 into location 0
var1 = Sreg.shift(In1,3);

Read, Write, and Enable-Shift

The shift method also supports an enable input, allowing the shift process to be controlled and enabled by a variable.

// Include the Class
#include "ap_shift_reg.h"

// Define a variable of type ap_shift_reg<type, depth>
// - Sreg must use the static qualifier
// - Sreg will hold integer data types
// - Sreg will hold 4 data values
static ap_shift_reg<int, 4> Sreg;
int var1, In1;
bool En;

// Read location 3 of Sreg into var1
// THEN if En=1 
// Shift all values up one and load In1 into location 0
var1 = Sreg.shift(In1,3,En);

When using the ap_shift_reg class, Vitis HLS creates a unique RTL component for each shifter. When logic synthesis is performed, this component is synthesized into an SRL resource.
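
To make the read, shift, and enable-shift behavior described above easier to check in software, the following is a minimal behavioral model written in plain C++. The class shift_reg_model is hypothetical and is not the ap_shift_reg implementation. It only mirrors the semantics stated in the comments of the examples above: read the addressed location first, then shift and load location 0 when enabled.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical software model of the behavior described for ap_shift_reg.
// The real class is provided by ap_shift_reg.h.
template <typename T, std::size_t DEPTH>
class shift_reg_model {
public:
    shift_reg_model() : data_() {}  // all locations start at zero

    // Return the value stored at `addr` without changing the contents.
    T read(std::size_t addr) const {
        assert(addr < DEPTH);
        return data_[addr];
    }

    // Read `addr` first; then, if `enable` is set, move every value up one
    // location and store `in` at location 0.
    T shift(T in, std::size_t addr, bool enable) {
        T out = read(addr);
        if (enable) {
            for (std::size_t i = DEPTH - 1; i > 0; --i) {
                data_[i] = data_[i - 1];
            }
            data_[0] = in;
        }
        return out;
    }

private:
    T data_[DEPTH];
};
```

With a shift_reg_model<int, 4>, shifting in 10 and then 20 leaves 20 at location 0 and 10 at location 1. A further call with the enable argument set to false returns the addressed value and leaves the contents unchanged.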