HLS IP Libraries - betway必威登陆

Table 1. HLS IP Libraries
Library Header File	Description
hls_fft.h	Allows the Xilinx LogiCORE IP FFT to be simulated in C and implemented using the Xilinx LogiCORE block.
hls_fir.h	Allows the Xilinx LogiCORE IP FIR to be simulated in C and implemented using the Xilinx LogiCORE block.
hls_dds.h	Allows the Xilinx LogiCORE IP DDS to be simulated in C and implemented using the Xilinx LogiCORE block.
ap_shift_reg.h	Provides a C++ class to implement a shift register which is implemented directly using a Xilinx SRL primitive.

FFT IP Library

The Xilinx FFT IP block can be called within a C++ design using the library hls_fft.h. This section explains how the FFT can be configured in your C++ code.

Note: Xilinx highly recommends that you review the Fast Fourier Transform LogiCORE IP Product Guide (PG109) for information on how to implement and use the features of the IP.

To use the FFT in your C++ code:

Include the hls_fft.h library in the code
Set the default parameters using the predefined struct hls::ip_fft::params_t
Define the runtime configuration
Call the FFT function
Optionally, check the runtime status

The following code examples provide a summary of how each of these steps is performed. Each step is discussed in more detail below.

First, include the FFT library in the source code. This header file resides in the include directory in the Vitis HLS installation area which is automatically searched when Vitis HLS executes.

#include "hls_fft.h"

Define the static parameters of the FFT. This includes such things as input width, number of channels, type of architecture. which do not change dynamically. The FFT library includes a parameterization struct hls::ip_fft::params_t, which can be used to initialize all static parameters with default values.

In this example, the default values for output ordering and the widths of the configuration and status ports are over-ridden using a user-defined struct param1 based on the predefined struct.

struct param1 : hls::ip_fft::params_t {
    static const unsigned ordering_opt = hls::ip_fft::natural_order;
    static const unsigned config_width = FFT_CONFIG_WIDTH;
    static const unsigned status_width = FFT_STATUS_WIDTH;
};

Define types and variables for both the runtime configuration and runtime status. These values can be dynamic and are therefore defined as variables in the C code which can change and are accessed through APIs.

typedef hls::ip_fft::config_t<param1> config_t;
typedef hls::ip_fft::status_t<param1> status_t;
config_t fft_config1;
status_t fft_status1;

Next, set the runtime configuration. This example sets the direction of the FFT (Forward or Inverse) based on the value of variable “direction” and also set the value of the scaling schedule.

fft_config1.setDir(direction);
fft_config1.setSch(0x2AB);

Call the FFT function using the HLS namespace with the defined static configuration (param1 in this example). The function parameters are, in order, input data, output data, output status and input configuration.

hls::fft<param1> (xn1, xk1, &fft_status1, &fft_config1);

Finally, check the output status. This example checks the overflow flag and stores the results in variable “ovflo”.

    *ovflo = fft_status1->getOvflo();

Design examples using the FFT C library are provided in the Vitis HLS examples and can be accessed using menu option Help > Welcome > Open Example Project > Design Examples > FFT.

FFT Static Parameters

The static parameters of the FFT define how the FFT is configured and specifies the fixed parameters such as the size of the FFT, whether the size can be changed dynamically, whether the implementation is pipelined or radix_4_burst_io.

The hls_fft.h header file defines a struct hls::ip_fft::params_t which can be used to set default values for the static parameters. If the default values are to be used, the parameterization struct can be used directly with the FFT function.

hls::fft<hls::ip_fft::params_t >  
    (xn1, xk1, &fft_status1, &fft_config1);

A more typical use is to change some of the parameters to non-default values. This is performed by creating a new user-defined parameterization struct based on the default parameterization struct and changing some of the default values.

In the following example, a new user struct my_fft_config is defined with a new value for the output ordering (changed to natural_order). All other static parameters to the FFT use the default values.

struct my_fft_config : hls::ip_fft::params_t {
    static const unsigned ordering_opt = hls::ip_fft::natural_order;
};

hls::fft<my_fft_config >  
     (xn1, xk1, &fft_status1, &fft_config1);

The values used for the parameterization struct hls::ip_fft::params_t are explained in FFT Struct Parameters. The default values for the parameters and a list of possible values are provided in FFT Struct Parameter Values.

Note: Xilinx highly recommends that you review the Fast Fourier Transform LogiCORE IP Product Guide (PG109) for details on the parameters and the implication for their settings.

FFT Struct Parameters

Table 2. FFT Struct Parameters
Parameter	Description
input_width	Data input port width.
output_width	Data output port width.
status_width	Output status port width.
config_width	Input configuration port width.
max_nfft	The size of the FFT data set is specified as 1 << max_nfft.
has_nfft	Determines if the size of the FFT can be runtime configurable.
channels	Number of channels.
arch_opt	The implementation architecture.
phase_factor_width	Configure the internal phase factor precision.
ordering_opt	The output ordering mode.
ovflo	Enable overflow mode.
scaling_opt	Define the scaling options.
rounding_opt	Define the rounding modes.
mem_data	Specify using block or distributed RAM for data memory.
mem_phase_factors	Specify using block or distributed RAM for phase factors memory.
mem_reorder	Specify using block or distributed RAM for output reorder memory.
stages_block_ram	Defines the number of block RAM stages used in the implementation.
mem_hybrid	When block RAMs are specified for data, phase factor, or reorder buffer, mem_hybrid specifies where or not to use a hybrid of block and distributed RAMs to reduce block RAM count in certain configurations.
complex_mult_type	Defines the types of multiplier to use for complex multiplications.
butterfly_type	Defines the implementation used for the FFT butterfly.

When specifying parameter values which are not integer or boolean, the HLS FFT namespace should be used.

For example, the possible values for parameter butterfly_type in the following table are use_luts and use_xtremedsp_slices. The values used in the C program should be butterfly_type = hls::ip_fft::use_luts and butterfly_type = hls::ip_fft::use_xtremedsp_slices.

FFT Struct Parameter Values

The following table covers all features and functionality of the FFT IP. Features and functionality not described in this table are not supported in the Vitis HLS implementation.

Table 3. FFT Struct Parameter Values
Parameter	C Type	Default Value	Valid Values
input_width	unsigned	16	8-34
output_width	unsigned	16	input_width to (input_width + max_nfft + 1)
status_width	unsigned	8	Depends on FFT configuration
config_width	unsigned	16	Depends on FFT configuration
max_nfft	unsigned	10	3-16
has_nfft	bool	false	True, False
channels	unsigned	1	1-12
arch_opt	unsigned	pipelined_streaming_io	automatically_select pipelined_streaming_io radix_4_burst_io radix_2_burst_io radix_2_lite_burst_io
phase_factor_width	unsigned	16	8-34
ordering_opt	unsigned	bit_reversed_order	bit_reversed_order natural_order
ovflo	bool	true	false true
scaling_opt	unsigned	scaled	scaled unscaled block_floating_point
rounding_opt	unsigned	truncation	truncation convergent_rounding
mem_data	unsigned	block_ram	block_ram distributed_ram
mem_phase_factors	unsigned	block_ram	block_ram distributed_ram
mem_reorder	unsigned	block_ram	block_ram distributed_ram
stages_block_ram	unsigned	(max_nfft < 10) ? 0 : (max_nfft - 9)	0-11
mem_hybrid	bool	false	false true
complex_mult_type	unsigned	use_mults_resources	use_luts use_mults_resources use_mults_performance
butterfly_type	unsigned	use_luts	use_luts use_xtremedsp_slices

FFT Runtime Configuration and Status

The FFT supports runtime configuration and runtime status monitoring through the configuration and status ports. These ports are defined as arguments to the FFT function, shown here as variables fft_status1 and fft_config1:

hls::fft<param1> (xn1, xk1, &fft_status1, &fft_config1);

The runtime configuration and status can be accessed using the predefined structs from the FFT C library:

hls::ip_fft::config_t<param1>
hls::ip_fft::status_t<param1>

Note: In both cases, the struct requires the name of the static parameterization struct, shown in these examples as param1. Refer to the previous section for details on defining the static parameterization struct.

The runtime configuration struct allows the following actions to be performed in the C code:

Set the FFT length, if runtime configuration is enabled
Set the FFT direction as forward or inverse
Set the scaling schedule

The FFT length can be set as follows:

typedef hls::ip_fft::config_t<param1> config_t;
config_t fft_config1;
// Set FFT length to 512 => log2(512) =>9
fft_config1-> setNfft(9);

IMPORTANT: The length specified during runtime cannot exceed the size defined by max_nfft in the static configuration.

The FFT direction can be set as follows:

typedef hls::ip_fft::config_t<param1> config_t;
config_t fft_config1;
// Forward FFT
fft_config1->setDir(1);
// Inverse FFT 
fft_config1->setDir(0);

The FFT scaling schedule can be set as follows:

typedef hls::ip_fft::config_t<param1> config_t;
config_t fft_config1;
fft_config1->setSch(0x2AB);

The output status port can be accessed using the pre-defined struct to determine:

If any overflow occurred during the FFT
The value of the block exponent

The FFT overflow mode can be checked as follows:

typedef hls::ip_fft::status_t<param1> status_t;
status_t fft_status1;
// Check the overflow flag
bool *ovflo = fft_status1->getOvflo();

IMPORTANT: After each transaction completes, check the overflow status to confirm the correct operation of the FFT.

And the block exponent value can be obtained using:

typedef hls::ip_fft::status_t<param1> status_t;
status_t fft_status1;
// Obtain the block exponent
unsigned int *blk_exp = fft_status1-> getBlkExp();

Using the FFT Function

The FFT function is defined in the HLS namespace and can be called as follows:

hls::fft<STATIC_PARAM> (
INPUT_DATA_ARRAY,
OUTPUT_DATA_ARRAY, 
OUTPUT_STATUS, 
INPUT_RUN_TIME_CONFIGURATION);

The STATIC_PARAM is the static parameterization struct that defines the static parameters for the FFT.

Both the input and output data are supplied to the function as arrays (INPUT_DATA_ARRAY and OUTPUT_DATA_ARRAY). In the final implementation, the ports on the FFT RTL block will be implemented as AXI4-Stream ports. Xilinx recommends always using the FFT function in a region using dataflow optimization (set_directive_dataflow), because this ensures the arrays are implemented as streaming arrays. An alternative is to specify both arrays as streaming using the set_directive_stream command.

IMPORTANT: The FFT cannot be used in a region which is pipelined. If high-performance operation is required, pipeline the loops or functions before and after the FFT then use dataflow optimization on all loops and functions in the region.

The data types for the arrays can be float or ap_fixed.

typedef float data_t;
complex<data_t> xn[FFT_LENGTH];
complex<data_t> xk[FFT_LENGTH];

To use fixed-point data types, the Vitis HLS arbitrary precision type ap_fixed should be used.

#include "ap_fixed.h"
typedef ap_fixed<FFT_INPUT_WIDTH,1> data_in_t;
typedef ap_fixed<FFT_OUTPUT_WIDTH,FFT_OUTPUT_WIDTH-FFT_INPUT_WIDTH+1> data_out_t;
#include <complex>
typedef hls::x_complex<data_in_t> cmpxData;
typedef hls::x_complex<data_out_t> cmpxDataOut;

In both cases, the FFT should be parameterized with the same correct data sizes. In the case of floating point data, the data widths will always be 32-bit and any other specified size will be considered invalid.

IMPORTANT: The input and output width of the FFT can be configured to any arbitrary value within the supported range. The variables which connect to the input and output parameters must be defined in increments of 8-bit. For example, if the output width is configured as 33-bit, the output variable must be defined as a 40-bit variable.

The multichannel functionality of the FFT can be used by using two-dimensional arrays for the input and output data. In this case, the array data should be configured with the first dimension representing each channel and the second dimension representing the FFT data.

typedef float data_t;
static complex<data_t> xn[CHANNEL][FFT_LENGTH];
static complex<data_t> xk[CHANELL][FFT_LENGTH];

The FFT core consumes and produces data as interleaved channels (for example, ch0-data0, ch1-data0, ch2-data0, etc, ch0-data1, ch1-data1, ch2-data2, etc.). Therefore, to stream the input or output arrays of the FFT using the same sequential order that the data was read or written, you must fill or empty the two-dimensional arrays for multiple channels by iterating through the channel index first, as shown in the following example:

cmpxData   in_fft[FFT_CHANNELS][FFT_LENGTH];
cmpxData  out_fft[FFT_CHANNELS][FFT_LENGTH];
 
// Write to FFT Input Array
for (unsigned i = 0; i < FFT_LENGTH; i++) {
 for (unsigned j = 0; j < FFT_CHANNELS; ++j) {
 in_fft[j][i] = in.read().data;
 }
}
   
// Read from FFT Output Array
for (unsigned i = 0; i < FFT_LENGTH; i++) {
 for (unsigned j = 0; j < FFT_CHANNELS; ++j) {
 out.data = out_fft[j][i];
 
 }
}

Design examples using the FFT C library are provided in the Vitis HLS examples and can be accessed using menu option Help > Welcome > Open Example Project > Design Examples > FFT.

FIR Filter IP Library

The Xilinx FIR IP block can be called within a C++ design using the library hls_fir.h. This section explains how the FIR can be configured in your C++ code.

Note: Xilinx highly recommends that you review the FIR Compiler LogiCORE IP Product Guide (PG149) for information on how to implement and use the features of the IP.

To use the FIR in your C++ code:

Include the hls_fir.h library in the code.
Set the static parameters using the predefined struct hls::ip_fir::params_t.
Call the FIR function.
Optionally, define a runtime input configuration to modify some parameters dynamically.

The following code examples provide a summary of how each of these steps is performed. Each step is discussed in more detail below.

First, include the FIR library in the source code. This header file resides in the include directory in the Vitis HLS installation area. This directory is automatically searched when Vitis HLS executes. There is no need to specify the path to this directory if compiling inside Vitis HLS.

#include "hls_fir.h"

Define the static parameters of the FIR. This includes such static attributes such as the input width, the coefficients, the filter rate (single, decimation, hilbert). The FIR library includes a parameterization struct hls::ip_fir::params_t which can be used to initialize all static parameters with default values.

In this example, the coefficients are defined as residing in array coeff_vec and the default values for the number of coefficients, the input width and the quantization mode are over-ridden using a user a user-defined struct myconfig based on the predefined struct.

struct myconfig : hls::ip_fir::params_t {
static const double coeff_vec[sg_fir_srrc_coeffs_len];
    static const unsigned num_coeffs = sg_fir_srrc_coeffs_len;
    static const unsigned input_width = INPUT_WIDTH; 
    static const unsigned quantization = hls::ip_fir::quantize_only;
};

Create an instance of the FIR function using the HLS namespace with the defined static parameters (myconfig in this example) and then call the function with the run method to execute the function. The function arguments are, in order, input data and output data.

static hls::FIR<param1> fir1;
fir1.run(fir_in, fir_out);

Optionally, a runtime input configuration can be used. In some modes of the FIR, the data on this input determines how the coefficients are used during interleaved channels or when coefficient reloading is required. This configuration can be dynamic and is therefore defined as a variable. For a complete description of which modes require this input configuration, refer to the FIR Compiler LogiCORE IP Product Guide (PG149).

When the runtime input configuration is used, the FIR function is called with three arguments: input data, output data and input configuration.

// Define the configuration type
typedef ap_uint<8> config_t;
// Define the configuration variable
config_t fir_config = 8;
// Use the configuration in the FFT
static hls::FIR<param1> fir1;
fir1.run(fir_in, fir_out, &fir_config);

Design examples using the FIR C library are provided in the Vitis HLS examples and can be accessed using menu option Help > Welcome > Open Example Project > Design Examples > FIR.

FIR Static Parameters

The static parameters of the FIR define how the FIR IP is parameterized and specifies non-dynamic items such as the input and output widths, the number of fractional bits, the coefficient values, the interpolation and decimation rates. Most of these configurations have default values: there are no default values for the coefficients.

The hls_fir.h header file defines a struct hls::ip_fir::params_t that can be used to set the default values for most of the static parameters.

IMPORTANT: There are no defaults defined for the coefficients. Therefore, Xilinx does not recommend using the pre-defined struct to directly initialize the FIR. A new user defined struct which specifies the coefficients should always be used to perform the static parameterization.

In this example, a new user struct my_config is defined and with a new value for the coefficients. The coefficients are specified as residing in array coeff_vec. All other parameters to the FIR use the default values.

struct myconfig : hls::ip_fir::params_t {
    static const double coeff_vec[sg_fir_srrc_coeffs_len];
};
static hls::FIR<myconfig> fir1;
fir1.run(fir_in, fir_out);

FIR Static Parameters describes the parameters used for the parametrization struct hls::ip_fir::params_t. FIR Struct Parameter Values provides the default values for the parameters and a list of possible values.

Note: Xilinx highly recommends that you refer to the FIR Compiler LogiCORE IP Product Guide (PG149) for details on the parameters and the implication for their settings.

FIR Struct Parameters

Table 4. FIR Struct Parameters
Parameter	Description
input_width	Data input port width
input_fractional_bits	Number of fractional bits on the input port
output_width	Data output port width
output_fractional_bits	Number of fractional bits on the output port
coeff_width	Bit-width of the coefficients
coeff_fractional_bits	Number of fractional bits in the coefficients
num_coeffs	Number of coefficients
coeff_sets	Number of coefficient sets
input_length	Number of samples in the input data
output_length	Number of samples in the output data
num_channels	Specify the number of channels of data to process
total_num_coeff	Total number of coefficients
coeff_vec[total_num_coeff]	The coefficient array
filter_type	The type implementation used for the filter
rate_change	Specifies integer or fractional rate changes
interp_rate	The interpolation rate
decim_rate	The decimation rate
zero_pack_factor	Number of zero coefficients used in interpolation
rate_specification	Specify the rate as frequency or period
hardware_oversampling_rate	Specify the rate of over-sampling
sample_period	The hardware oversample period
sample_frequency	The hardware oversample frequency
quantization	The quantization method to be used
best_precision	Enable or disable the best precision
coeff_structure	The type of coefficient structure to be used
output_rounding_mode	Type of rounding used on the output
filter_arch	Selects a systolic or transposed architecture
optimization_goal	Specify a speed or area goal for optimization
inter_column_pipe_length	The pipeline length required between DSP columns
column_config	Specifies the number of DSP module columns
config_method	Specifies how the DSP module columns are configured
coeff_padding	Number of zero padding added to the front of the filter

When specifying parameter values that are not integer or boolean, the HLS FIR namespace should be used.

For example the possible values for rate_change are shown in the following table to be integer and fixed_fractional. The values used in the C program should be rate_change = hls::ip_fir::integer and rate_change = hls::ip_fir::fixed_fractional.

FIR Struct Parameter Values

The following table covers all features and functionality of the FIR IP. Features and functionality not described in this table are not supported in the Vitis HLS implementation.

Table 5. FIR Struct Parameter Values
Parameter	C Type	Default Value	Valid Values
input_width	unsigned	16	No limitation
input_fractional_bits	unsigned	0	Limited by size of input_width
output_width	unsigned	24	No limitation
output_fractional_bits	unsigned	0	Limited by size of output_width
coeff_width	unsigned	16	No limitation
coeff_fractional_bits	unsigned	0	Limited by size of coeff_width
num_coeffs	bool	21	Full
coeff_sets	unsigned	1	1-1024
input_length	unsigned	21	No limitation
output_length	unsigned	21	No limitation
num_channels	unsigned	1	1-1024
total_num_coeff	unsigned	21	num_coeffs * coeff_sets
coeff_vec[total_num_coeff]	double array	None	Not applicable
filter_type	unsigned	single_rate	single_rate, interpolation, decimation, hilbert_filter, interpolated
rate_change	unsigned	integer	integer, fixed_fractional
interp_rate	unsigned	1	1-1024
decim_rate	unsigned	1	1-1024
zero_pack_factor	unsigned	1	1-8
rate_specification	unsigned	period	frequency, period
hardware_oversampling_rate	unsigned	1	No Limitation
sample_period	bool	1	No Limitation
sample_frequency	unsigned	0.001	No Limitation
quantization	unsigned	integer_coefficients	integer_coefficients, quantize_only, maximize_dynamic_range
best_precision	unsigned	false	false true
coeff_structure	unsigned	non_symmetric	inferred, non_symmetric, symmetric, negative_symmetric, half_band, hilbert
output_rounding_mode	unsigned	full_precision	full_precision, truncate_lsbs, non_symmetric_rounding_down, non_symmetric_rounding_up, symmetric_rounding_to_zero, symmetric_rounding_to_infinity, convergent_rounding_to_even, convergent_rounding_to_odd
filter_arch	unsigned	systolic_multiply_accumulate	systolic_multiply_accumulate, transpose_multiply_accumulate
optimization_goal	unsigned	area	area, speed
inter_column_pipe_length	unsigned	4	1-16
column_config	unsigned	1	Limited by number of DSP macrocells used
config_method	unsigned	single	single, by_channel
coeff_padding	bool	false	false true

Using the FIR Function

The FIR function is defined in the HLS namespace and can be called as follows:

// Create an instance of the FIR 
static hls::FIR<STATIC_PARAM> fir1;
// Execute the FIR instance fir1
fir1.run(INPUT_DATA_ARRAY, OUTPUT_DATA_ARRAY);

The STATIC_PARAM is the static parameterization struct that defines most static parameters for the FIR.

Both the input and output data are supplied to the function as arrays (INPUT_DATA_ARRAY and OUTPUT_DATA_ARRAY). In the final implementation, these ports on the FIR IP will be implemented as AXI4-Stream ports. Xilinx recommends always using the FIR function in a region using the dataflow optimization (set_directive_dataflow), because this ensures the arrays are implemented as streaming arrays. An alternative is to specify both arrays as streaming using the set_directive_stream command.

IMPORTANT: The FIR cannot be used in a region which is pipelined. If high-performance operation is required, pipeline the loops or functions before and after the FIR then use dataflow optimization on all loops and functions in the region.

The multichannel functionality of the FIR is supported through interleaving the data in a single input and single output array.

The size of the input array should be large enough to accommodate all samples: num_channels * input_length.
The output array size should be specified to contain all output samples: num_channels * output_length.

The following code example demonstrates, for two channels, how the data is interleaved. In this example, the top-level function has two channels of input data (din_i, din_q) and two channels of output data (dout_i, dout_q). Two functions, at the front-end (fe) and back-end (be) are used to correctly order the data in the FIR input array and extract it from the FIR output array.

void dummy_fe(din_t din_i[LENGTH], din_t din_q[LENGTH], din_t out[FIR_LENGTH]) {
    for (unsigned i = 0; i < LENGTH; ++i) {
        out[2*i] = din_i[i];
        out[2*i + 1] = din_q[i];
    }
}
void dummy_be(dout_t in[FIR_LENGTH], dout_t dout_i[LENGTH], dout_t dout_q[LENGTH]) {   
    for(unsigned i = 0; i < LENGTH; ++i) {
        dout_i[i] = in[2*i];
        dout_q[i] = in[2*i+1];
    }
}
void fir_top(din_t din_i[LENGTH], din_t din_q[LENGTH],
             dout_t dout_i[LENGTH], dout_t dout_q[LENGTH]) {   

 din_t fir_in[FIR_LENGTH];
    dout_t fir_out[FIR_LENGTH];
    static hls::FIR<myconfig> fir1;

    dummy_fe(din_i, din_q, fir_in);
    fir1.run(fir_in, fir_out);
    dummy_be(fir_out, dout_i, dout_q);
}

Optional FIR Runtime Configuration

In some modes of operation, the FIR requires an additional input to configure how the coefficients are used. For a complete description of which modes require this input configuration, refer to the FIR Compiler LogiCORE IP Product Guide (PG149).

This input configuration can be performed in the C code using a standard ap_int.h 8-bit data type. In this example, the header file fir_top.h specifies the use of the FIR and ap_fixed libraries, defines a number of the design parameter values and then defines some fixed-point types based on these:

#include "ap_fixed.h"
#include "hls_fir.h"

const unsigned FIR_LENGTH   = 21;
const unsigned INPUT_WIDTH = 16;
const unsigned INPUT_FRACTIONAL_BITS = 0;
const unsigned OUTPUT_WIDTH = 24;
const unsigned OUTPUT_FRACTIONAL_BITS = 0;
const unsigned COEFF_WIDTH = 16;
const unsigned COEFF_FRACTIONAL_BITS = 0;
const unsigned COEFF_NUM = 7;
const unsigned COEFF_SETS = 3;
const unsigned INPUT_LENGTH = FIR_LENGTH;
const unsigned OUTPUT_LENGTH = FIR_LENGTH;
const unsigned CHAN_NUM = 1;
typedef ap_fixed<INPUT_WIDTH, INPUT_WIDTH - INPUT_FRACTIONAL_BITS> s_data_t;
typedef ap_fixed<OUTPUT_WIDTH, OUTPUT_WIDTH - OUTPUT_FRACTIONAL_BITS> m_data_t;
typedef ap_uint<8> config_t;

In the top-level code, the information in the header file is included, the static parameterization struct is created using the same constant values used to specify the bit-widths, ensuring the C code and FIR configuration match, and the coefficients are specified. At the top-level, an input configuration, defined in the header file as 8-bit data, is passed into the FIR.

#include "fir_top.h"

struct param1 : hls::ip_fir::params_t {
    static const double coeff_vec[total_num_coeff];
    static const unsigned input_length = INPUT_LENGTH;
    static const unsigned output_length = OUTPUT_LENGTH;
    static const unsigned num_coeffs = COEFF_NUM;
    static const unsigned coeff_sets = COEFF_SETS;
};
const double param1::coeff_vec[total_num_coeff] = 
    {6,0,-4,-3,5,6,-6,-13,7,44,64,44,7,-13,-6,6,5,-3,-4,0,6};

void dummy_fe(s_data_t in[INPUT_LENGTH], s_data_t out[INPUT_LENGTH], 
                config_t* config_in, config_t* config_out)
{
    *config_out = *config_in;
    for(unsigned i = 0; i < INPUT_LENGTH; ++i)
        out[i] = in[i];
}

void dummy_be(m_data_t in[OUTPUT_LENGTH], m_data_t out[OUTPUT_LENGTH])
{
    for(unsigned i = 0; i < OUTPUT_LENGTH; ++i)
        out[i] = in[i];
}

// DUT
void fir_top(s_data_t in[INPUT_LENGTH],
             m_data_t out[OUTPUT_LENGTH],
             config_t* config)
{

    s_data_t fir_in[INPUT_LENGTH];
    m_data_t fir_out[OUTPUT_LENGTH];
    config_t fir_config;
    // Create struct for config
    static hls::FIR<param1> fir1;
    
    //==================================================
// Dataflow process
    dummy_fe(in, fir_in, config, &fir_config);
    fir1.run(fir_in, fir_out, &fir_config);
    dummy_be(fir_out, out);
    //==================================================
}

Design examples using the FIR C library are provided in the Vitis HLS examples and can be accessed using menu option Help > Welcome > Open Example Project > Design Examples > FIR.

DDS IP Library

You can use the Xilinx Direct Digital Synthesizer (DDS) IP block within a C++ design using the hls_dds.h library. This section explains how to configure DDS IP in your C++ code.

Note: Xilinx highly recommends that you review the DDS Compiler LogiCORE IP Product Guide (PG141) for information on how to implement and use the features of the IP.

IMPORTANT: The C IP implementation of the DDS IP core supports the fixed mode for the Phase_Increment and Phase_Offset parameters and supports the none mode for Phase_Offset, but it does not support programmable and streaming modes for these parameters.

To use the DDS in the C++ code:

Include the hls_dds.h library in the code.
Set the default parameters using the pre-defined struct hls::ip_dds::params_t.
Call the DDS function.

First, include the DDS library in the source code. This header file resides in the include directory in the Vitis HLS installation area, which is automatically searched when Vitis HLS executes.

#include "hls_dds.h"

Define the static parameters of the DDS. For example, define the phase width, clock rate, and phase and increment offsets. The DDS C library includes a parameterization struct hls::ip_dds::params_t, which is used to initialize all static parameters with default values. By redefining any of the values in this struct, you can customize the implementation.

The following example shows how to override the default values for the phase width, clock rate, phase offset, and the number of channels using a user-defined struct param1, which is based on the existing predefined struct hls::ip_dds::params_t:

struct param1 : hls::ip_dds::params_t {
 static const unsigned Phase_Width = PHASEWIDTH;
 static const double   DDS_Clock_Rate = 25.0;
 static const double PINC[16];
 static const double POFF[16];
};

Create an instance of the DDS function using the HLS namespace with the defined static parameters (for example, param1). Then, call the function with the run method to execute the function. Following are the data and phase function arguments shown in order:

static hls::DDS<config1> dds1;
dds1.run(data_channel, phase_channel);

To access design examples that use the DDS C library, select Help > Welcome > Open Example Project > Design Examples > DDS.

DDS Static Parameters

The static parameters of the DDS define how to configure the DDS, such as the clock rate, phase interval, and modes. The hls_dds.h header file defines an hls::ip_dds::params_t struct, which sets the default values for the static parameters. To use the default values, you can use the parameterization struct directly with the DDS function.

static hls::DDS< hls::ip_dds::params_t > dds1;
dds1.run(data_channel, phase_channel);

The following table describes the parameters for the hls::ip_dds::params_t parameterization struct.

Note: Xilinx highly recommends that you review the DDS Compiler LogiCORE IP Product Guide (PG141) for details on the parameters and values.

Table 6. DDS Struct Parameters
Parameter	Description
`DDS_Clock_Rate`	Specifies the clock rate for the DDS output.
`Channels`	Specifies the number of channels. The DDS and phase generator can support up to 16 channels. The channels are time-multiplexed, which reduces the effective clock frequency per channel.
`Mode_of_Operation`	Specifies one of the following operation modes: Standard mode for use when the accumulated phase can be truncated before it is used to access the SIN/COS LUT. Rasterized mode for use when the desired frequencies and system clock are related by a rational fraction.
`Modulus`	Describes the relationship between the system clock frequency and the desired frequencies. Use this parameter in rasterized mode only.
`Spurious_Free_Dynamic_Range`	Specifies the targeted purity of the tone produced by the DDS.
`Frequency_Resolution`	Specifies the minimum frequency resolution in Hz and determines the Phase Width used by the phase accumulator, including associated phase increment (PINC) and phase offset (POFF) values.
`Noise_Shaping`	Controls whether to use phase truncation, dithering, or Taylor series correction.
`Phase_Width`	Sets the width of the following: PHASE_OUT field within `m_axis_phase_tdata` Phase field within `s_axis_phase_tdata` when the DDS is configured to be a SIN/COS LUT only Phase accumulator Associated phase increment and offset registers Phase field in `s_axis_config_tdata` For rasterized mode, the phase width is fixed as the number of bits required to describe the valid input range `[0, Modulus-1]`, that is, `log2 (Modulus-1)` rounded up.
`Output_Width`	Sets the width of SINE and COSINE fields within `m_axis_data_tdata`. The SFDR provided by this parameter depends on the selected Noise Shaping option.
`Phase_Increment`	Selects the phase increment value.
`Phase_Offset`	Selects the phase offset value.
`Output_Selection`	Sets the output selection to SINE, COSINE, or both in the `m_axis_data_tdata` bus.
`Negative_Sine`	Negates the SINE field at runtime.
`Negative_Cosine`	Negates the COSINE field at runtime.
`Amplitude_Mode`	Sets the amplitude to full range or unit circle.
`Memory_Type`	Controls the implementation of the SIN/COS LUT.
`Optimization_Goal`	Controls whether the implementation decisions target highest speed or lowest resource.
`DSP48_Use`	Controls the implementation of the phase accumulator and addition stages for phase offset, dither noise addition, or both.
`Latency_Configuration`	Sets the latency of the core to the optimum value based upon the Optimization Goal.
`Latency`	Specifies the manual latency value.
`Output_Form`	Sets the output form to two’s complement or to sign and magnitude. In general, the output of SINE and COSINE is in two’s complement form. However, when quadrant symmetry is used, the output form can be changed to sign and magnitude.
`PINC[XIP_DDS_CHANNELS_MAX]`	Sets the values for the phase increment for each output channel.
`POFF[XIP_DDS_CHANNELS_MAX]`	Sets the values for the phase offset for each output channel.

DDS Struct Parameter Values

The following table shows the possible values for the hls::ip_dds::params_t parameterization struct parameters.

Table 7. DDS Struct Parameter Values
Parameter	C Type	Default Value	Valid Values
DDS_Clock_Rate	double	20.0	Any double value
Channels	unsigned	1	1 to 16
Mode_of_Operation	unsigned	XIP_DDS_MOO_CONVENTIONAL	XIP_DDS_MOO_CONVENTIONAL truncates the accumulated phase. XIP_DDS_MOO_RASTERIZED selects rasterized mode.
Modulus	unsigned	200	129 to 256
Spurious_Free_Dynamic_Range	double	20.0	18.0 to 150.0
Frequency_Resolution	double	10.0	0.000000001 to 125000000
Noise_Shaping	unsigned	XIP_DDS_NS_NONE	XIP_DDS_NS_NONE produces phase truncation DDS. XIP_DDS_NS_DITHER uses phase dither to improve SFDR at the expense of increased noise floor. XIP_DDS_NS_TAYLOR interpolates sine/cosine values using the otherwise discarded bits from phase truncation XIP_DDS_NS_AUTO automatically determines noise-shaping.
Phase_Width	unsigned	16	Must be an integer multiple of 8
Output_Width	unsigned	16	Must be an integer multiple of 8
Phase_Increment	unsigned	XIP_DDS_PINCPOFF_FIXED	XIP_DDS_PINCPOFF_FIXED fixes PINC at generation time, and PINC cannot be changed at runtime. This is the only value supported.
Phase_Offset	unsigned	XIP_DDS_PINCPOFF_NONE	XIP_DDS_PINCPOFF_NONE does not generate phase offset. XIP_DDS_PINCPOFF_FIXED fixes POFF at generation time, and POFF cannot be changed at runtime.
Output_Selection	unsigned	XIP_DDS_OUT_SIN_AND_COS	XIP_DDS_OUT_SIN_ONLY produces sine output only. XIP_DDS_OUT_COS_ONLY produces cosine output only. XIP_DDS_OUT_SIN_AND_COS produces both sin and cosine output.
Negative_Sine	unsigned	XIP_DDS_ABSENT	XIP_DDS_ABSENT produces standard sine wave. XIP_DDS_PRESENT negates sine wave.
Negative_Cosine	bool	XIP_DDS_ABSENT	XIP_DDS_ABSENT produces standard sine wave. XIP_DDS_PRESENT negates sine wave.
Amplitude_Mode	unsigned	XIP_DDS_FULL_RANGE	XIP_DDS_FULL_RANGE normalizes amplitude to the output width with the binary point in the first place. For example, an 8-bit output has a binary amplitude of 100000000 - 10 giving values between 01111110 and 11111110, which corresponds to just less than 1 and just more than -1, respectively. XIP_DDS_UNIT_CIRCLE normalizes amplitude to half full range, that is, values range from 01000 .. (+0.5). to 110000 .. (-0.5).
Memory_Type	unsigned	XIP_DDS_MEM_AUTO	XIP_DDS_MEM_AUTO selects distributed ROM for small cases where the table can be contained in a single layer of memory and selects block ROM for larger cases. XIP_DDS_MEM_BLOCK always uses block RAM. XIP_DDS_MEM_DIST always uses distributed RAM.
Optimization_Goal	unsigned	XIP_DDS_OPTGOAL_AUTO	XIP_DDS_OPTGOAL_AUTO automatically selects the optimization goal. XIP_DDS_OPTGOAL_AREA optimizes for area. XIP_DDS_OPTGOAL_SPEED optimizes for performance.
DSP48_Use	unsigned	XIP_DDS_DSP_MIN	XIP_DDS_DSP_MIN implements the phase accumulator and the stages for phase offset, dither noise addition, or both in FPGA logic. XIP_DDS_DSP_MAX implements the phase accumulator and the phase offset, dither noise addition, or both using DSP slices. In the case of single channel, the DSP slice can also provide the register to store programmable phase increment, phase offset, or both and thereby, save further fabric resources.
Latency_Configuration	unsigned	XIP_DDS_LATENCY_AUTO	XIP_DDS_LATENCY_AUTO automatically determines he latency. XIP_DDS_LATENCY_MANUAL manually specifies the latency using the Latency option.
Latency	unsigned	5	Any value
Output_Form	unsigned	XIP_DDS_OUTPUT_TWOS	XIP_DDS_OUTPUT_TWOS outputs two's complement. XIP_DDS_OUTPUT_SIGN_MAG outputs signed magnitude.
PINC[XIP_DDS_CHANNELS_MAX]	unsigned array	{0}	Any value for the phase increment for each channel
POFF[XIP_DDS_CHANNELS_MAX]	unsigned array	{0}	Any value for the phase offset for each channel

SRL IP Library

C code is written to satisfy several different requirements: reuse, readability, and performance. Until now, it is unlikely that the C code was written to result in the most ideal hardware after high-level synthesis.

Like the requirements for reuse, readability, and performance, certain coding techniques or pre-defined constructs can ensure that the synthesis output results in more optimal hardware or to better model hardware in C for easier validation of the algorithm.

Mapping Directly into SRL Resources

Many C algorithms sequentially shift data through arrays. They add a new value to the start of the array, shift the existing data through array, and drop the oldest data value. This operation is implemented in hardware as a shift register.

This most common way to implement a shift register from C into hardware is to completely partition the array into individual elements, and allow the data dependencies between the elements in the RTL to imply a shift register.

Logic synthesis typically implements the RTL shift register into a Xilinx SRL resource, which efficiently implements shift registers. The issue is that sometimes logic synthesis does not implement the RTL shift register using an SRL component:

When data is accessed in the middle of the shift register, logic synthesis cannot directly infer an SRL.
Sometimes, even when the SRL is ideal, logic synthesis may implement the shift-resister in flip-flops, due to other factors. (Logic synthesis is also a complex process).

Vitis HLS provides a C++ class (ap_shift_reg) to ensure that the shift register defined in the C code is always implemented using an SRL resource. The ap_shift_reg class has two methods to perform the various read and write accesses supported by an SRL component.

Read from the Shifter

The read method allows a specified location to be read from the shifter register.

The ap_shift_reg.h header file that defines the ap_shift_reg class is also included with Vitis HLS as a standalone package. You have the right to use it in your own source code. The package xilinx_hls_lib_<release_number>.tgz is located in the include directory in the Vitis HLS installation area.

// Include the Class
#include "ap_shift_reg.h"

// Define a variable of type ap_shift_reg<type, depth>
// - Sreg must use the static qualifier
// - Sreg will hold integer data types
// - Sreg will hold 4 data values
static ap_shift_reg<int, 4> Sreg;
int var1;

// Read location 2 of Sreg into var1
var1 = Sreg.read(2);

Read, Write, and Shift Data

A shift method allows a read, write, and shift operation to be performed.

// Include the Class
#include "ap_shift_reg.h"

// Define a variable of type ap_shift_reg<type, depth>
// - Sreg must use the static qualifier
// - Sreg will hold integer data types
// - Sreg will hold 4 data values
static ap_shift_reg<int, 4> Sreg;
int var1;

// Read location 3 of Sreg into var1
// THEN shift all values up one and load In1 into location 0
var1 = Sreg.shift(In1,3);

Read, Write, and Enable-Shift

The shift method also supports an enabled input, allowing the shift process to be controlled and enabled by a variable.

// Include the Class
#include "ap_shift_reg.h"

// Define a variable of type ap_shift_reg<type, depth>
// - Sreg must use the static qualifier
// - Sreg will hold integer data types
// - Sreg will hold 4 data values
static ap_shift_reg<int, 4> Sreg;
int var1, In1;
bool En;

// Read location 3 of Sreg into var1
// THEN if En=1 
// Shift all values up one and load In1 into location 0
var1 = Sreg.shift(In1,3,En);

When using the ap_shift_reg class, Vitis HLS creates a unique RTL component for each shifter. When logic synthesis is performed, this component is synthesized into an SRL resource.