HLS IP Libraries
Vitis™ HLS provides C++ libraries to implement a number of Xilinx® IP blocks. These libraries allow the following Xilinx IP blocks to be inferred directly from the C++ source code, ensuring a high-quality implementation in the FPGA.
Library Header File | Description |
---|---|
hls_fft.h | Allows the Xilinx LogiCORE IP FFT to be simulated in C and implemented using the Xilinx LogiCORE block. |
hls_fir.h | Allows the Xilinx LogiCORE IP FIR to be simulated in C and implemented using the Xilinx LogiCORE block. |
hls_dds.h | Allows the Xilinx LogiCORE IP DDS to be simulated in C and implemented using the Xilinx LogiCORE block. |
ap_shift_reg.h | Provides a C++ class that implements a shift register directly with a Xilinx SRL primitive. |
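The following is a minimal software model of a fixed-depth shift register with addressable taps, similar to what an SRL primitive provides. It is a sketch only. The class name ShiftRegModel and its shift()/read() members are hypothetical and are not the ap_shift_reg API, so consult ap_shift_reg.h for the real interface.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical software model of a fixed-depth shift register.
// This is NOT the ap_shift_reg API; it only illustrates the data movement.
template <typename T, std::size_t Depth>
class ShiftRegModel {
  static_assert(Depth > 0, "a shift register needs at least one stage");

 public:
  // Push a new value in at tap 0 and return the oldest value,
  // which falls off the end (tap Depth - 1).
  T shift(const T& in) {
    T out = regs_[Depth - 1];
    for (std::size_t i = Depth - 1; i > 0; --i) {
      regs_[i] = regs_[i - 1];
    }
    regs_[0] = in;
    return out;
  }

  // Read any tap without shifting; tap 0 holds the most recent value.
  T read(std::size_t tap) const {
    assert(tap < Depth);
    return regs_[tap];
  }

 private:
  std::array<T, Depth> regs_{};  // value-initialized: zero for arithmetic T
};
```

For example, after 1, 2, and 3 are shifted into a ShiftRegModel<int, 4>, read(0) returns 3 and read(2) returns 1.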
FFT IP Library
The Xilinx FFT IP block can be called within a C++ design using the library hls_fft.h. This section explains how the FFT can be configured in your C++ code.
To use the FFT in your C++ code:
- Include the hls_fft.h library in the code
- Set the default parameters using the predefined struct hls::ip_fft::params_t
- Define the runtime configuration
- Call the FFT function
- Optionally, check the runtime status
The following code examples provide a summary of how each of these steps is performed. Each step is discussed in more detail below.
First, include the FFT library in the source code. This header file resides in the include directory in the Vitis HLS installation area, which is automatically searched when Vitis HLS executes.
#include "hls_fft.h"
Define the static parameters of the FFT. These are parameters that do not change dynamically, such as the input width, the number of channels, and the type of architecture. The FFT library includes a parameterization struct hls::ip_fft::params_t, which can be used to initialize all static parameters with default values.
In this example, the default values for the output ordering and the widths of the configuration and status ports are overridden using a user-defined struct param1 based on the predefined struct.
struct param1 : hls::ip_fft::params_t {
static const unsigned ordering_opt = hls::ip_fft::natural_order;
static const unsigned config_width = FFT_CONFIG_WIDTH;
static const unsigned status_width = FFT_STATUS_WIDTH;
};
Define types and variables for both the runtime configuration and the runtime status. These values can change dynamically, so they are defined as variables in the C code and accessed through member functions such as setDir() and getOvflo().
typedef hls::ip_fft::config_t<param1> config_t;
typedef hls::ip_fft::status_t<param1> status_t;
config_t fft_config1;
status_t fft_status1;
Next, set the runtime configuration. This example sets the direction of the FFT (forward or inverse) based on the value of the variable direction, and also sets the value of the scaling schedule.
fft_config1.setDir(direction);
fft_config1.setSch(0x2AB);
Call the FFT function using the HLS namespace with the defined static configuration (param1 in this example). The function parameters are, in order, input data, output data, output status, and input configuration.
hls::fft<param1> (xn1, xk1, &fft_status1, &fft_config1);
Finally, check the output status. This example checks the overflow flag and stores the result through the pointer ovflo.
*ovflo = fft_status1.getOvflo();
Design examples using the FFT C library are provided in the Vitis HLS examples and can be accessed using menu option .
FFT Static Parameters
The static parameters of the FFT define how the FFT is configured and specify fixed parameters such as the size of the FFT, whether the size can be changed dynamically, and whether the implementation is pipelined or radix_4_burst_io.
The hls_fft.h header file defines a struct hls::ip_fft::params_t which can be used to set default values for the static parameters. If the default values are to be used, the parameterization struct can be used directly with the FFT function.
hls::fft<hls::ip_fft::params_t>(xn1, xk1, &fft_status1, &fft_config1);
A more typical use is to change some of the parameters to non-default values. This is performed by creating a new user-defined parameterization struct based on the default parameterization struct and changing some of the default values.
In the following example, a new user struct my_fft_config is defined with a new value for the output ordering (changed to natural_order). All other static parameters to the FFT use the default values.
struct my_fft_config : hls::ip_fft::params_t {
static const unsigned ordering_opt = hls::ip_fft::natural_order;
};
hls::fft<my_fft_config>(xn1, xk1, &fft_status1, &fft_config1);
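The override works through ordinary C++ name hiding. A static member redeclared in the derived struct hides the base member of the same name, and every member that is not redeclared is inherited unchanged. The self-contained sketch below shows this with hypothetical stand-ins: demo::default_params plays hls::ip_fft::params_t, demo::my_params plays my_fft_config, and the enum values are made up. The real names and values come from hls_fft.h.

```cpp
// Hypothetical stand-ins; the real definitions live in hls_fft.h.
namespace demo {
const unsigned bit_reversed_order = 0;
const unsigned natural_order = 1;

// Plays the role of hls::ip_fft::params_t: every parameter has a default.
struct default_params {
  static const unsigned max_nfft = 10;
  static const unsigned ordering_opt = bit_reversed_order;
};

// Plays the role of my_fft_config: only ordering_opt is redeclared.
struct my_params : default_params {
  static const unsigned ordering_opt = natural_order;  // hides the base member
};

// Template code that reads P::member sees the derived value where one
// exists and the inherited default otherwise.
template <typename P>
unsigned ordering() { return P::ordering_opt; }

template <typename P>
unsigned fft_size() { return 1u << P::max_nfft; }
}  // namespace demo

static_assert(demo::my_params::ordering_opt == demo::natural_order, "overridden");
static_assert(demo::my_params::max_nfft == 10, "inherited default");
```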
The values used for the parameterization struct hls::ip_fft::params_t are explained in FFT Struct Parameters. The default values for the parameters and a list of possible values are provided in FFT Struct Parameter Values.
FFT Struct Parameters
Parameter | Description |
---|---|
input_width | Data input port width. |
output_width | Data output port width. |
status_width | Output status port width. |
config_width | Input configuration port width. |
max_nfft | The size of the FFT data set is specified as 1 << max_nfft. |
has_nfft | Determines if the size of the FFT can be runtime configurable. |
channels | Number of channels. |
arch_opt | The implementation architecture. |
phase_factor_width | Configure the internal phase factor precision. |
ordering_opt | The output ordering mode. |
ovflo | Enable overflow mode. |
scaling_opt | Define the scaling options. |
rounding_opt | Define the rounding modes. |
mem_data | Specify using block or distributed RAM for data memory. |
mem_phase_factors | Specify using block or distributed RAM for phase factors memory. |
mem_reorder | Specify using block or distributed RAM for output reorder memory. |
stages_block_ram | Defines the number of block RAM stages used in the implementation. |
mem_hybrid | When block RAMs are specified for data, phase factor, or reorder buffer, mem_hybrid specifies whether or not to use a hybrid of block and distributed RAMs to reduce block RAM count in certain configurations. |
complex_mult_type | Defines the types of multiplier to use for complex multiplications. |
butterfly_type | Defines the implementation used for the FFT butterfly. |
When specifying parameter values that are not integer or boolean, use the HLS FFT namespace.
For example, the possible values for parameter butterfly_type in the following table are use_luts and use_xtremedsp_slices. The values used in the C program should be butterfly_type = hls::ip_fft::use_luts and butterfly_type = hls::ip_fft::use_xtremedsp_slices.
FFT Struct Parameter Values
The following table covers all features and functionality of the FFT IP. Features and functionality not described in this table are not supported in the Vitis HLS implementation.
Parameter | C Type | Default Value | Valid Values |
---|---|---|---|
input_width | unsigned | 16 | 8-34 |
output_width | unsigned | 16 | input_width to (input_width + max_nfft + 1) |
status_width | unsigned | 8 | Depends on FFT configuration |
config_width | unsigned | 16 | Depends on FFT configuration |
max_nfft | unsigned | 10 | 3-16 |
has_nfft | bool | false | false true |
channels | unsigned | 1 | 1-12 |
arch_opt | unsigned | pipelined_streaming_io | automatically_select pipelined_streaming_io radix_4_burst_io radix_2_burst_io radix_2_lite_burst_io |
phase_factor_width | unsigned | 16 | 8-34 |
ordering_opt | unsigned | bit_reversed_order | bit_reversed_order natural_order |
ovflo | bool | true | false true |
scaling_opt | unsigned | scaled | scaled unscaled block_floating_point |
rounding_opt | unsigned | truncation | truncation convergent_rounding |
mem_data | unsigned | block_ram | block_ram distributed_ram |
mem_phase_factors | unsigned | block_ram | block_ram distributed_ram |
mem_reorder | unsigned | block_ram | block_ram distributed_ram |
stages_block_ram | unsigned | (max_nfft < 10) ? 0 : (max_nfft - 9) | 0-11 |
mem_hybrid | bool | false | false true |
complex_mult_type | unsigned | use_mults_resources | use_luts use_mults_resources use_mults_performance |
butterfly_type | unsigned | use_luts | use_luts use_xtremedsp_slices |
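Two entries in the table above are formulas rather than fixed values. The sketch below restates them as constexpr helpers so the arithmetic can be checked for a given configuration. The function names are hypothetical and are not part of hls_fft.h.

```cpp
// Default for stages_block_ram, as given in the table above.
constexpr unsigned default_stages_block_ram(unsigned max_nfft) {
  return (max_nfft < 10) ? 0 : (max_nfft - 9);
}

// Upper end of the valid output_width range: input_width + max_nfft + 1.
constexpr unsigned max_output_width(unsigned input_width, unsigned max_nfft) {
  return input_width + max_nfft + 1;
}

// With the defaults (input_width = 16, max_nfft = 10):
static_assert(default_stages_block_ram(10) == 1, "one block RAM stage");
static_assert(max_output_width(16, 10) == 27, "output_width can range from 16 to 27");
```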
FFT Runtime Configuration and Status
The FFT supports runtime configuration and runtime status monitoring through the configuration and status ports. These ports are defined as arguments to the FFT function, shown here as variables fft_status1 and fft_config1:
hls::fft<param1> (xn1, xk1, &fft_status1, &fft_config1);
The runtime configuration and status can be accessed using the predefined structs from the FFT C library:
- hls::ip_fft::config_t<param1>
- hls::ip_fft::status_t<param1>
The runtime configuration struct allows the following actions to be performed in the C code:
- Set the FFT length, if runtime configuration is enabled
- Set the FFT direction as forward or inverse
- Set the scaling schedule
The FFT length can be set as follows:
typedef hls::ip_fft::config_t<param1> config_t;
config_t fft_config1;
// Set FFT length to 512: log2(512) = 9
fft_config1.setNfft(9);
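Because setNfft() takes the base-2 logarithm of the length rather than the length itself, a small helper can do the conversion and reject lengths that are not powers of two or that exceed 1 << max_nfft. The function nfft_code below is a hypothetical sketch and is not part of hls_fft.h.

```cpp
// Hypothetical helper: convert a transform length into the value passed to
// setNfft(). Returns -1 if the length is not a power of two or is larger
// than the static maximum (1 << max_nfft).
int nfft_code(unsigned length, unsigned max_nfft) {
  if (length == 0 || (length & (length - 1)) != 0) {
    return -1;  // not a power of two
  }
  int log2_len = 0;
  while ((1u << log2_len) < length) {
    ++log2_len;
  }
  if (static_cast<unsigned>(log2_len) > max_nfft) {
    return -1;  // exceeds the size fixed by the static configuration
  }
  return log2_len;
}
```

For example, nfft_code(512, 10) returns 9, which is the value used in the setNfft(9) call, while nfft_code(2048, 10) returns -1.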
The length specified at runtime cannot exceed the size defined by max_nfft in the static configuration.
The FFT direction can be set as follows:
typedef hls::ip_fft::config_t<param1> config_t;
config_t fft_config1;
// Forward FFT
fft_config1.setDir(1);
// Inverse FFT
fft_config1.setDir(0);
The FFT scaling schedule can be set as follows:
typedef hls::ip_fft::config_t<param1> config_t;
config_t fft_config1;
fft_config1.setSch(0x2AB);
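A scaling schedule packs one shift amount per stage into a single word. The sketch below unpacks it. It assumes two bits per radix-4 stage, with the first stage in the least-significant bits, which is how the FFT LogiCORE IP Product Guide (PG109) describes the schedule for the pipelined streaming and radix-4 burst architectures. Other architectures may group stages differently, so check PG109 for your configuration. The function names are hypothetical.

```cpp
#include <limits>
#include <vector>

// Hypothetical helper: split a scaling schedule into per-stage right shifts.
// Assumes 2 bits per stage: stage 0 in bits [1:0], stage 1 in bits [3:2], ...
std::vector<unsigned> decode_scaling_schedule(unsigned sch, unsigned num_stages) {
  const unsigned bits = static_cast<unsigned>(std::numeric_limits<unsigned>::digits);
  std::vector<unsigned> shifts;
  for (unsigned s = 0; s < num_stages && 2 * s < bits; ++s) {
    shifts.push_back((sch >> (2 * s)) & 0x3u);
  }
  return shifts;
}

// Total number of bits the data is shifted right across all stages.
unsigned total_shift(const std::vector<unsigned>& shifts) {
  unsigned total = 0;
  for (unsigned v : shifts) {
    total += v;
  }
  return total;
}
```

Under these assumptions, a 1024-point transform (max_nfft = 10) has ceil(10 / 2) = 5 stages. For that case, decode_scaling_schedule(0x2AB, 5) yields {3, 2, 2, 2, 2}: a shift of 3 in the first stage and 2 in each later stage, 11 bits in total.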
The output status port can be accessed using the pre-defined struct to determine:
- If any overflow occurred during the FFT
- The value of the block exponent
The FFT overflow flag can be checked as follows:
typedef hls::ip_fft::status_t<param1> status_t;
status_t fft_status1;
// Check the overflow flag
bool ovflo = fft_status1.getOvflo();
And the block exponent value can be obtained using:
typedef hls::ip_fft::status_t<param1> status_t;
status_t fft_status1;
// Obtain the block exponent
unsigned int blk_exp = fft_status1.getBlkExp();
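The block exponent is meaningful only when scaling_opt is block_floating_point. In that mode, the block exponent reports how many bits the output was scaled down. An output sample can therefore be returned to its unscaled magnitude by multiplying it by 2^blk_exp. The sketch below does this with std::ldexp. The helper name is hypothetical, and treating the fixed-point output as a double is a simplification for illustration.

```cpp
#include <cmath>

// Hypothetical helper: undo block floating-point scaling.
// std::ldexp(x, e) returns x * 2^e.
double unscale_bfp(double fft_output, unsigned blk_exp) {
  return std::ldexp(fft_output, static_cast<int>(blk_exp));
}
```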
Using the FFT Function
The FFT function is defined in the HLS namespace and can be called as follows:
hls::fft<STATIC_PARAM> (
INPUT_DATA_ARRAY,
OUTPUT_DATA_ARRAY,
OUTPUT_STATUS,
INPUT_RUN_TIME_CONFIGURATION);
The STATIC_PARAM is the static parameterization struct that defines the static parameters for the FFT.
Both the input and output data are supplied to the function as arrays (INPUT_DATA_ARRAY and OUTPUT_DATA_ARRAY). In the final implementation, the ports on the FFT RTL block will be implemented as AXI4-Stream ports. Xilinx recommends always using the FFT function in a region using dataflow optimization (set_directive_dataflow), because this ensures the arrays are implemented as streaming arrays. An alternative is to specify both arrays as streaming using the set_directive_stream command.
The data types for the arrays can be float or ap_fixed.
#include <complex>
typedef float data_t;
std::complex<data_t> xn[FFT_LENGTH];
std::complex<data_t> xk[FFT_LENGTH];
To use fixed-point data types, the Vitis HLS arbitrary precision type ap_fixed should be used.
#include "ap_fixed.h"
typedef ap_fixed<FFT_INPUT_WIDTH,1> data_in_t;
typedef ap_fixed<FFT_OUTPUT_WIDTH,FFT_OUTPUT_WIDTH-FFT_INPUT_WIDTH+1> data_out_t;
#include <complex>
typedef hls::x_complex<data_in_t> cmpxData;
typedef hls::x_complex<data_out_t> cmpxDataOut;
In both cases, the FFT static parameters must match the data sizes used in the C code. For floating-point data, the data widths are always 32 bits, and any other specified size is considered invalid.
The multichannel functionality of the FFT can be used by using two-dimensional arrays for the input and output data. In this case, the array data should be configured with the first dimension representing each channel and the second dimension representing the FFT data.
typedef float data_t;
static std::complex<data_t> xn[CHANNEL][FFT_LENGTH];
static std::complex<data_t> xk[CHANNEL][FFT_LENGTH];
The FFT core consumes and produces data as interleaved channels (for example, ch0-data0, ch1-data0, ch2-data0, and so on, then ch0-data1, ch1-data1, ch2-data1, and so on). To stream the input or output arrays of the FFT in the same sequential order in which the data was read or written, you must therefore fill or empty the two-dimensional arrays with the channel index as the inner (fastest-varying) loop, as shown in the following example:
cmpxData in_fft[FFT_CHANNELS][FFT_LENGTH];
cmpxData out_fft[FFT_CHANNELS][FFT_LENGTH];
// Write to FFT Input Array
for (unsigned i = 0; i < FFT_LENGTH; i++) {
for (unsigned j = 0; j < FFT_CHANNELS; ++j) {
in_fft[j][i] = in.read().data;
}
}
// Read from FFT Output Array
for (unsigned i = 0; i < FFT_LENGTH; i++) {
for (unsigned j = 0; j < FFT_CHANNELS; ++j) {
out.data = out_fft[j][i];
}
}
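The loop order above is what produces the interleaved stream: the outer loop walks the sample index and the inner loop walks the channel index. The plain C++ sketch below performs the same reordering on std::vector data so that it can be checked with a host compiler. The names interleave and deinterleave are hypothetical, and all channels are assumed to have the same length.

```cpp
#include <cstddef>
#include <vector>

// Flatten per-channel data[ch][i] into the core's interleaved order:
// ch0-data0, ch1-data0, ..., ch0-data1, ch1-data1, ...
template <typename T>
std::vector<T> interleave(const std::vector<std::vector<T>>& data) {
  std::vector<T> stream;
  if (data.empty()) return stream;
  const std::size_t length = data[0].size();
  for (std::size_t i = 0; i < length; ++i) {            // sample index (outer)
    for (std::size_t ch = 0; ch < data.size(); ++ch) {  // channel index (inner)
      stream.push_back(data[ch][i]);
    }
  }
  return stream;
}

// Inverse: rebuild per-channel arrays from an interleaved stream.
template <typename T>
std::vector<std::vector<T>> deinterleave(const std::vector<T>& stream,
                                         std::size_t channels) {
  std::vector<std::vector<T>> data(channels);
  if (channels == 0) return data;
  for (std::size_t k = 0; k < stream.size(); ++k) {
    data[k % channels].push_back(stream[k]);
  }
  return data;
}
```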
Design examples using the FFT C library are provided in the Vitis HLS examples and can be accessed using menu option .
FIR Filter IP Library
The Xilinx FIR IP block can be called within a C++ design using the library hls_fir.h. This section explains how the FIR can be configured in your C++ code.
To use the FIR in your C++ code:
- Include the hls_fir.h library in the code.
- Set the static parameters using the predefined struct hls::ip_fir::params_t.
- Call the FIR function.
- Optionally, define a runtime input configuration to modify some parameters dynamically.
The following code examples provide a summary of how each of these steps is performed. Each step is discussed in more detail below.
First, include the FIR library in the source code. This header file resides in the include directory in the Vitis HLS installation area. This directory is automatically searched when Vitis HLS executes. There is no need to specify the path to this directory if compiling inside Vitis HLS.
#include "hls_fir.h"
Define the static parameters of the FIR. These include static attributes such as the input width, the coefficients, and the filter rate (single, decimation, hilbert). The FIR library includes a parameterization struct hls::ip_fir::params_t which can be used to initialize all static parameters with default values.
In this example, the coefficients are defined as residing in array coeff_vec. The default values for the number of coefficients, the input width, and the quantization mode are overridden using a user-defined struct myconfig based on the predefined struct.
struct myconfig : hls::ip_fir::params_t {
static const double coeff_vec[sg_fir_srrc_coeffs_len];
static const unsigned num_coeffs = sg_fir_srrc_coeffs_len;
static const unsigned input_width = INPUT_WIDTH;
static const unsigned quantization = hls::ip_fir::quantize_only;
};
Create an instance of the FIR function using the HLS namespace with the defined static parameters (myconfig in this example). Then call the function with the run method to execute it. The function arguments are, in order, input data and output data.
static hls::FIR<myconfig> fir1;
fir1.run(fir_in, fir_out);
Optionally, a runtime input configuration can be used. In some modes of the FIR, the data on this input determines how the coefficients are used with interleaved channels or when coefficient reloading is required. This configuration can be dynamic and is therefore defined as a variable. For a complete description of which modes require this input configuration, refer to the FIR Compiler LogiCORE IP Product Guide (PG149).
When the runtime input configuration is used, the FIR function is called with three arguments: input data, output data and input configuration.
// Define the configuration type
typedef ap_uint<8> config_t;
// Define the configuration variable
config_t fir_config = 8;
// Use the configuration in the FIR
static hls::FIR<param1> fir1;
fir1.run(fir_in, fir_out, &fir_config);
Design examples using the FIR C library are provided in the Vitis HLS examples and can be accessed using menu option .
FIR Static Parameters
The static parameters of the FIR define how the FIR IP is parameterized and specify non-dynamic items such as the input and output widths, the number of fractional bits, the coefficient values, and the interpolation and decimation rates. Most of these parameters have default values; the coefficients have no default values.
The hls_fir.h header file defines a struct hls::ip_fir::params_t that can be used to set the default values for most of the static parameters.
In this example, a new user struct myconfig is defined with a new value for the coefficients. The coefficients are specified as residing in array coeff_vec. All other parameters to the FIR use the default values.
struct myconfig : hls::ip_fir::params_t {
static const double coeff_vec[sg_fir_srrc_coeffs_len];
};
static hls::FIR<myconfig> fir1;
fir1.run(fir_in, fir_out);
FIR Struct Parameters describes the parameters used for the parameterization struct hls::ip_fir::params_t. FIR Struct Parameter Values provides the default values for the parameters and a list of possible values.
FIR Struct Parameters
Parameter | Description |
---|---|
input_width | Data input port width |
input_fractional_bits | Number of fractional bits on the input port |
output_width | Data output port width |
output_fractional_bits | Number of fractional bits on the output port |
coeff_width | Bit-width of the coefficients |
coeff_fractional_bits | Number of fractional bits in the coefficients |
num_coeffs | Number of coefficients |
coeff_sets | Number of coefficient sets |
input_length | Number of samples in the input data |
output_length | Number of samples in the output data |
num_channels | Specify the number of channels of data to process |
total_num_coeff | Total number of coefficients |
coeff_vec[total_num_coeff] | The coefficient array |
filter_type | The type of implementation used for the filter |
rate_change | Specifies integer or fractional rate changes |
interp_rate | The interpolation rate |
decim_rate | The decimation rate |
zero_pack_factor | Number of zero coefficients used in interpolation |
rate_specification | Specify the rate as frequency or period |
hardware_oversampling_rate | Specify the rate of over-sampling |
sample_period | The hardware oversample period |
sample_frequency | The hardware oversample frequency |
quantization | The quantization method to be used |
best_precision | Enable or disable the best precision |
coeff_structure | The type of coefficient structure to be used |
output_rounding_mode | Type of rounding used on the output |
filter_arch | Selects a systolic or transposed architecture |
optimization_goal | Specify a speed or area goal for optimization |
inter_column_pipe_length | The pipeline length required between DSP columns |
column_config | Specifies the number of DSP module columns |
config_method | Specifies how the DSP module columns are configured |
coeff_padding | Number of zero coefficients padded to the front of the filter |
When specifying parameter values that are not integer or boolean, the HLS FIR namespace should be used.
For example, the possible values for rate_change are shown in the following table to be integer and fixed_fractional. The values used in the C program should be rate_change = hls::ip_fir::integer and rate_change = hls::ip_fir::fixed_fractional.
FIR Struct Parameter Values
The following table covers all features and functionality of the FIR IP. Features and functionality not described in this table are not supported in the Vitis HLS implementation.
Parameter | C Type | Default Value | Valid Values |
---|---|---|---|
input_width | unsigned | 16 | No limitation |
input_fractional_bits | unsigned | 0 | Limited by size of input_width |
output_width | unsigned | 24 | No limitation |
output_fractional_bits | unsigned | 0 | Limited by size of output_width |
coeff_width | unsigned | 16 | No limitation |
coeff_fractional_bits | unsigned | 0 | Limited by size of coeff_width |
num_coeffs | unsigned | 21 | Full |
coeff_sets | unsigned | 1 | 1-1024 |
input_length | unsigned | 21 | No limitation |
output_length | unsigned | 21 | No limitation |
num_channels | unsigned | 1 | 1-1024 |
total_num_coeff | unsigned | 21 | num_coeffs * coeff_sets |
coeff_vec[total_num_coeff] | double array | None | Not applicable |
filter_type | unsigned | single_rate | single_rate, interpolation, decimation, hilbert_filter, interpolated |
rate_change | unsigned | integer | integer, fixed_fractional |
interp_rate | unsigned | 1 | 1-1024 |
decim_rate | unsigned | 1 | 1-1024 |
zero_pack_factor | unsigned | 1 | 1-8 |
rate_specification | unsigned | period | frequency, period |
hardware_oversampling_rate | unsigned | 1 | No Limitation |
sample_period | double | 1 | No Limitation |
sample_frequency | double | 0.001 | No Limitation |
quantization | unsigned | integer_coefficients | integer_coefficients, quantize_only, maximize_dynamic_range |
best_precision | unsigned | false | false true |
coeff_structure | unsigned | non_symmetric | inferred, non_symmetric, symmetric, negative_symmetric, half_band, hilbert |
output_rounding_mode | unsigned | full_precision | full_precision, truncate_lsbs, non_symmetric_rounding_down, non_symmetric_rounding_up, symmetric_rounding_to_zero, symmetric_rounding_to_infinity, convergent_rounding_to_even, convergent_rounding_to_odd |
filter_arch | unsigned | systolic_multiply_accumulate | systolic_multiply_accumulate, transpose_multiply_accumulate |
optimization_goal | unsigned | area | area, speed |
inter_column_pipe_length | unsigned | 4 | 1-16 |
column_config | unsigned | 1 | Limited by number of DSP macrocells used |
config_method | unsigned | single | single, by_channel |
coeff_padding | bool | false | false true |
Using the FIR Function
The FIR function is defined in the HLS namespace and can be called as follows:
// Create an instance of the FIR
static hls::FIR<STATIC_PARAM> fir1;
// Execute the FIR instance fir1
fir1.run(INPUT_DATA_ARRAY, OUTPUT_DATA_ARRAY);
The STATIC_PARAM is the static parameterization struct that defines most static parameters for the FIR.
Both the input and output data are supplied to the function as arrays (INPUT_DATA_ARRAY and OUTPUT_DATA_ARRAY). In the final implementation, these ports on the FIR IP will be implemented as AXI4-Stream ports. Xilinx recommends always using the FIR function in a region using the dataflow optimization (set_directive_dataflow), because this ensures the arrays are implemented as streaming arrays. An alternative is to specify both arrays as streaming using the set_directive_stream command.
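When you verify a single-rate FIR in a C testbench, it can help to compare against a straightforward double-precision model of the filter equation y[n] = sum over k of h[k] * x[n - k]. The sketch below is such a model, assuming zero initial state. The name fir_reference is hypothetical. The model ignores the IP's quantization, output rounding mode, and latency, so small differences against fixed-point outputs are expected.

```cpp
#include <cstddef>
#include <vector>

// Direct-form, single-rate FIR reference model with zero initial state:
//   y[n] = sum_k h[k] * x[n - k]
// Output length equals input length.
std::vector<double> fir_reference(const std::vector<double>& h,
                                  const std::vector<double>& x) {
  std::vector<double> y(x.size(), 0.0);
  for (std::size_t n = 0; n < x.size(); ++n) {
    for (std::size_t k = 0; k < h.size() && k <= n; ++k) {
      y[n] += h[k] * x[n - k];  // only past and current inputs contribute
    }
  }
  return y;
}
```

Feeding an impulse (1, 0, 0, ...) returns the coefficients themselves, which is a quick sanity check on coeff_vec.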
The multichannel functionality of the FIR is supported through interleaving the data in a single input and single output array.
- The size of the input array should be large enough to accommodate all samples: num_channels * input_length.
- The output array size should be specified to contain all output samples: num_channels * output_length.
The following code example demonstrates, for two channels, how the data is interleaved. In this example, the top-level function has two channels of input data (din_i, din_q) and two channels of output data (dout_i, dout_q). Two functions, at the front end (fe) and back end (be), are used to correctly order the data in the FIR input array and extract it from the FIR output array.
void dummy_fe(din_t din_i[LENGTH], din_t din_q[LENGTH], din_t out[FIR_LENGTH]) {
for (unsigned i = 0; i < LENGTH; ++i) {
out[2*i] = din_i[i];
out[2*i + 1] = din_q[i];
}
}
void dummy_be(dout_t in[FIR_LENGTH], dout_t dout_i[LENGTH], dout_t dout_q[LENGTH]) {
for(unsigned i = 0; i < LENGTH; ++i) {
dout_i[i] = in[2*i];
dout_q[i] = in[2*i+1];
}
}
void fir_top(din_t din_i[LENGTH], din_t din_q[LENGTH],
dout_t dout_i[LENGTH], dout_t dout_q[LENGTH]) {
din_t fir_in[FIR_LENGTH];
dout_t fir_out[FIR_LENGTH];
static hls::FIR<myconfig> fir1;
dummy_fe(din_i, din_q, fir_in);
fir1.run(fir_in, fir_out);
dummy_be(fir_out, dout_i, dout_q);
}
Optional FIR Runtime Configuration
In some modes of operation, the FIR requires an additional input to configure how the coefficients are used. For a complete description of which modes require this input configuration, refer to the FIR Compiler LogiCORE IP Product Guide (PG149).
This input configuration can be performed in the C code using a standard ap_int.h 8-bit data type. In this example, the header file fir_top.h specifies the use of the FIR and ap_fixed libraries, defines a number of the design parameter values, and then defines some fixed-point types based on these:
#include "ap_fixed.h"
#include "hls_fir.h"
const unsigned FIR_LENGTH = 21;
const unsigned INPUT_WIDTH = 16;
const unsigned INPUT_FRACTIONAL_BITS = 0;
const unsigned OUTPUT_WIDTH = 24;
const unsigned OUTPUT_FRACTIONAL_BITS = 0;
const unsigned COEFF_WIDTH = 16;
const unsigned COEFF_FRACTIONAL_BITS = 0;
const unsigned COEFF_NUM = 7;
const unsigned COEFF_SETS = 3;
const unsigned INPUT_LENGTH = FIR_LENGTH;
const unsigned OUTPUT_LENGTH = FIR_LENGTH;
const unsigned CHAN_NUM = 1;
typedef ap_fixed<INPUT_WIDTH, INPUT_WIDTH - INPUT_FRACTIONAL_BITS> s_data_t;
typedef ap_fixed<OUTPUT_WIDTH, OUTPUT_WIDTH - OUTPUT_FRACTIONAL_BITS> m_data_t;
typedef ap_uint<8> config_t;
In the top-level code, the header file is included. The static parameterization struct is created from the same constants defined in the header, which ensures that the C code and the FIR configuration match, and the coefficients are specified. At the top level, an input configuration, defined in the header file as 8-bit data, is passed into the FIR.
#include "fir_top.h"
struct param1 : hls::ip_fir::params_t {
static const double coeff_vec[total_num_coeff];
static const unsigned input_length = INPUT_LENGTH;
static const unsigned output_length = OUTPUT_LENGTH;
static const unsigned num_coeffs = COEFF_NUM;
static const unsigned coeff_sets = COEFF_SETS;
};
const double param1::coeff_vec[total_num_coeff] =
{6,0,-4,-3,5,6,-6,-13,7,44,64,44,7,-13,-6,6,5,-3,-4,0,6};
void dummy_fe(s_data_t in[INPUT_LENGTH], s_data_t out[INPUT_LENGTH],
config_t* config_in, config_t* config_out)
{
*config_out = *config_in;
for(unsigned i = 0; i < INPUT_LENGTH; ++i)
out[i] = in[i];
}
void dummy_be(m_data_t in[OUTPUT_LENGTH], m_data_t out[OUTPUT_LENGTH])
{
for(unsigned i = 0; i < OUTPUT_LENGTH; ++i)
out[i] = in[i];
}
// DUT
void fir_top(s_data_t in[INPUT_LENGTH],
m_data_t out[OUTPUT_LENGTH],
config_t* config)
{
s_data_t fir_in[INPUT_LENGTH];
m_data_t fir_out[OUTPUT_LENGTH];
config_t fir_config;
// Create struct for config
static hls::FIR<param1> fir1;
//==================================================
// Dataflow process
dummy_fe(in, fir_in, config, &fir_config);
fir1.run(fir_in, fir_out, &fir_config);
dummy_be(fir_out, out);
//==================================================
}
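In this example, COEFF_NUM * COEFF_SETS = 7 * 3 = 21, which equals the default total_num_coeff, so coeff_vec holds all 21 values. The sketch below treats those values as three consecutive sets of seven and extracts one set. This layout is an assumption for illustration; PG149 defines how the IP orders coefficient sets and how the configuration input selects one. The helper coeff_set is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: return set `set_index` from a flat array of
// num_sets * num_coeffs values, assuming the sets are stored back to back.
std::vector<double> coeff_set(const double* coeff_vec, std::size_t num_coeffs,
                              std::size_t num_sets, std::size_t set_index) {
  assert(set_index < num_sets);
  const double* first = coeff_vec + set_index * num_coeffs;
  return std::vector<double>(first, first + num_coeffs);
}
```

With the coefficients above, set 1 would be {-13, 7, 44, 64, 44, 7, -13}.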
Design examples using the FIR C library are provided in the Vitis HLS examples and can be accessed using menu option .
DDS IP Library
You can use the Xilinx Direct Digital Synthesizer (DDS) IP block within a C++ design using the hls_dds.h library. This section explains how to configure DDS IP in your C++ code.
The hls_dds.h library supports only fixed mode for Phase_Increment and none and fixed modes for Phase_Offset. It does not support programmable and streaming modes for these parameters.
To use the DDS in the C++ code:
- Include the hls_dds.h library in the code.
- Set the default parameters using the predefined struct hls::ip_dds::params_t.
- Call the DDS function.
First, include the DDS library in the source code. This header file resides in the include directory in the Vitis HLS installation area, which is automatically searched when Vitis HLS executes.
#include "hls_dds.h"
Define the static parameters of the DDS, for example, the phase width, clock rate, phase increment, and phase offset. The DDS C library includes a parameterization struct hls::ip_dds::params_t, which is used to initialize all static parameters with default values. By redefining any of the values in this struct, you can customize the implementation.
The following example shows how to override the default values for the phase width, clock rate, phase increment (PINC), and phase offset (POFF) using a user-defined struct param1, which is based on the existing predefined struct hls::ip_dds::params_t:
struct param1 : hls::ip_dds::params_t {
static const unsigned Phase_Width = PHASEWIDTH;
static const double DDS_Clock_Rate = 25.0;
static const double PINC[16];
static const double POFF[16];
};
Create an instance of the DDS function using the HLS namespace with the defined static parameters (for example, param1). Then, call the function with the run method to execute it. The function arguments are, in order, the data channel and the phase channel:
static hls::DDS<param1> dds1;
dds1.run(data_channel, phase_channel);
To access design examples that use the DDS C library, select .
DDS Static Parameters
The static parameters of the DDS define how to configure the DDS, such as the clock rate, phase interval, and modes. The hls_dds.h header file defines an hls::ip_dds::params_t struct, which sets the default values for the static parameters. To use the default values, you can use the parameterization struct directly with the DDS function.
static hls::DDS< hls::ip_dds::params_t > dds1;
dds1.run(data_channel, phase_channel);
The following table describes the parameters for the hls::ip_dds::params_t parameterization struct.
Parameter | Description |
---|---|
DDS_Clock_Rate | Specifies the clock rate for the DDS output. |
Channels | Specifies the number of channels. The DDS and phase generator can support up to 16 channels. The channels are time-multiplexed, which reduces the effective clock frequency per channel. |
Mode_of_Operation | Specifies one of the following operation modes: Standard mode, for use when the accumulated phase can be truncated before it is used to access the SIN/COS LUT. Rasterized mode, for use when the desired frequencies and system clock are related by a rational fraction. |
Modulus | Describes the relationship between the system clock frequency and the desired frequencies. Use this parameter in rasterized mode only. |
Spurious_Free_Dynamic_Range | Specifies the targeted purity of the tone produced by the DDS. |
Frequency_Resolution | Specifies the minimum frequency resolution in Hz and determines the Phase Width used by the phase accumulator, including associated phase increment (PINC) and phase offset (POFF) values. |
Noise_Shaping | Controls whether to use phase truncation, dithering, or Taylor series correction. |
Phase_Width | Sets the width of the phase accumulator and its associated phase fields. For rasterized mode, the phase width is fixed as the number of bits required to describe the valid input range. |
Output_Width | Sets the width of SINE and COSINE fields within m_axis_data_tdata. The SFDR provided by this parameter depends on the selected Noise Shaping option. |
Phase_Increment | Selects the phase increment value. |
Phase_Offset | Selects the phase offset value. |
Output_Selection | Sets the output selection to SINE, COSINE, or both in the m_axis_data_tdata bus. |
Negative_Sine | Negates the SINE field at runtime. |
Negative_Cosine | Negates the COSINE field at runtime. |
Amplitude_Mode | Sets the amplitude to full range or unit circle. |
Memory_Type | Controls the implementation of the SIN/COS LUT. |
Optimization_Goal | Controls whether the implementation decisions target highest speed or lowest resource. |
DSP48_Use | Controls the implementation of the phase accumulator and addition stages for phase offset, dither noise addition, or both. |
Latency_Configuration | Sets the latency of the core to the optimum value based upon the Optimization Goal. |
Latency | Specifies the manual latency value. |
Output_Form | Sets the output form to two’s complement or to sign and magnitude. In general, the output of SINE and COSINE is in two’s complement form. However, when quadrant symmetry is used, the output form can be changed to sign and magnitude. |
PINC[XIP_DDS_CHANNELS_MAX] | Sets the values for the phase increment for each output channel. |
POFF[XIP_DDS_CHANNELS_MAX] | Sets the values for the phase offset for each output channel. |
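For a conventional (non-rasterized) DDS, the standard phase-accumulator relations link several of these parameters. A Phase_Width of B bits gives a frequency resolution of f_clk / 2^B, and an output frequency f_out requires a phase increment of about f_out * 2^B / f_clk. The sketch below evaluates these relations in Hz. The function names are hypothetical, and the sketch does not reproduce the units in which hls_dds.h expects DDS_Clock_Rate or PINC.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Frequency resolution of a phase accumulator of the given width: f_clk / 2^B.
double resolution_hz(double f_clk_hz, unsigned phase_width) {
  return f_clk_hz / std::ldexp(1.0, static_cast<int>(phase_width));
}

// Smallest phase width whose resolution meets the requested target.
unsigned phase_width_for(double f_clk_hz, double target_resolution_hz) {
  assert(target_resolution_hz > 0.0);  // otherwise the loop never ends
  unsigned b = 1;
  while (resolution_hz(f_clk_hz, b) > target_resolution_hz) {
    ++b;
  }
  return b;
}

// Phase increment for a desired output frequency: round(f_out * 2^B / f_clk).
std::uint64_t pinc_for(double f_out_hz, double f_clk_hz, unsigned phase_width) {
  const double scale = std::ldexp(1.0, static_cast<int>(phase_width));
  return static_cast<std::uint64_t>(std::llround(f_out_hz * scale / f_clk_hz));
}
```

For example, with a 20 MHz clock and a 10 Hz resolution target, phase_width_for returns 21. The DDS Struct Parameter Values table requires Phase_Width to be a multiple of 8, so the next valid setting would be 24.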
DDS Struct Parameter Values
The following table shows the possible values for the hls::ip_dds::params_t parameterization struct parameters.
Parameter | C Type | Default Value | Valid Values |
---|---|---|---|
DDS_Clock_Rate | double | 20.0 | Any double value |
Channels | unsigned | 1 | 1 to 16 |
Mode_of_Operation | unsigned | XIP_DDS_MOO_CONVENTIONAL | XIP_DDS_MOO_CONVENTIONAL truncates the accumulated phase. XIP_DDS_MOO_RASTERIZED selects rasterized mode. |
Modulus | unsigned | 200 | 129 to 256 |
Spurious_Free_Dynamic_Range | double | 20.0 | 18.0 to 150.0 |
Frequency_Resolution | double | 10.0 | 0.000000001 to 125000000 |
Noise_Shaping | unsigned | XIP_DDS_NS_NONE | XIP_DDS_NS_NONE produces a phase truncation DDS. XIP_DDS_NS_DITHER uses phase dither to improve SFDR at the expense of an increased noise floor. XIP_DDS_NS_TAYLOR interpolates sine/cosine values using the otherwise discarded bits from phase truncation. XIP_DDS_NS_AUTO automatically determines noise shaping. |
Phase_Width | unsigned | 16 | Must be an integer multiple of 8 |
Output_Width | unsigned | 16 | Must be an integer multiple of 8 |
Phase_Increment | unsigned | XIP_DDS_PINCPOFF_FIXED | XIP_DDS_PINCPOFF_FIXED fixes PINC at generation time, and PINC cannot be changed at runtime. This is the only value supported. |
Phase_Offset | unsigned | XIP_DDS_PINCPOFF_NONE | XIP_DDS_PINCPOFF_NONE does not generate phase offset. XIP_DDS_PINCPOFF_FIXED fixes POFF at generation time, and POFF cannot be changed at runtime. |
Output_Selection | unsigned | XIP_DDS_OUT_SIN_AND_COS | XIP_DDS_OUT_SIN_ONLY produces sine output only. XIP_DDS_OUT_COS_ONLY produces cosine output only. XIP_DDS_OUT_SIN_AND_COS produces both sine and cosine output. |
Negative_Sine | unsigned | XIP_DDS_ABSENT | XIP_DDS_ABSENT produces a standard sine wave. XIP_DDS_PRESENT negates the sine wave. |
Negative_Cosine | bool | XIP_DDS_ABSENT | XIP_DDS_ABSENT produces a standard cosine wave. XIP_DDS_PRESENT negates the cosine wave. |
Amplitude_Mode | unsigned | XIP_DDS_FULL_RANGE | XIP_DDS_FULL_RANGE normalizes amplitude to the output width with the binary point after the most significant bit. For example, an 8-bit output has a binary amplitude of 100000000 - 10, giving values between 01111110 and 10000010 (two's complement), which correspond to just less than 1 and just more than -1, respectively. XIP_DDS_UNIT_CIRCLE normalizes amplitude to half full range, that is, values range from 01000... (+0.5) to 11000... (-0.5). |
Memory_Type | unsigned | XIP_DDS_MEM_AUTO | XIP_DDS_MEM_AUTO selects distributed ROM for small cases where the table can be contained in a single layer of memory and selects block ROM for larger cases. XIP_DDS_MEM_BLOCK always uses block RAM. XIP_DDS_MEM_DIST always uses distributed RAM. |
Optimization_Goal | unsigned | XIP_DDS_OPTGOAL_AUTO | XIP_DDS_OPTGOAL_AUTO automatically selects the optimization goal. XIP_DDS_OPTGOAL_AREA optimizes for area. XIP_DDS_OPTGOAL_SPEED optimizes for performance. |
DSP48_Use | unsigned | XIP_DDS_DSP_MIN | XIP_DDS_DSP_MIN implements the phase accumulator and the stages for phase offset, dither noise addition, or both in FPGA logic. XIP_DDS_DSP_MAX implements the phase accumulator and the phase offset, dither noise addition, or both using DSP slices. In the case of a single channel, the DSP slice can also provide the register to store the programmable phase increment, phase offset, or both, thereby saving further fabric resources. |
Latency_Configuration | unsigned | XIP_DDS_LATENCY_AUTO | XIP_DDS_LATENCY_AUTO automatically determines the latency. XIP_DDS_LATENCY_MANUAL manually specifies the latency using the Latency option. |
Latency | unsigned | 5 | Any value |
Output_Form | unsigned | XIP_DDS_OUTPUT_TWOS | XIP_DDS_OUTPUT_TWOS outputs two's complement. XIP_DDS_OUTPUT_SIGN_MAG outputs sign and magnitude. |
PINC[XIP_DDS_CHANNELS_MAX] | unsigned array | {0} | Any value for the phase increment for each channel |
POFF[XIP_DDS_CHANNELS_MAX] | unsigned array | {0} | Any value for the phase offset for each channel |
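The Amplitude_Mode and Output_Form values can be checked by decoding an output word by hand. The sketch below assumes an 8-bit output with the binary point after the most significant bit (1 sign bit, 7 fraction bits), as in the Amplitude_Mode example; the function names are hypothetical and are not part of hls_dds.h.

```cpp
#include <cstdint>

// Hypothetical helpers: interpret an 8-bit DDS output word whose binary point
// sits after the most significant bit (1 sign bit, 7 fraction bits).

// Output_Form = XIP_DDS_OUTPUT_TWOS (two's complement).
double decode_twos(std::uint8_t word) {
  // Reinterpret as a signed 8-bit value, then scale by 2^-7.
  return static_cast<std::int8_t>(word) / 128.0;
}

// Output_Form = XIP_DDS_OUTPUT_SIGN_MAG (sign and magnitude).
double decode_sign_mag(std::uint8_t word) {
  const double magnitude = (word & 0x7F) / 128.0;  // low 7 bits: magnitude
  return (word & 0x80) ? -magnitude : magnitude;   // MSB: sign
}
```

With these helpers, 01111110 (0x7E) decodes to 0.984375 in either form. Just more than -1 is 10000010 (0x82) in two's complement and 11111110 (0xFE) in sign-and-magnitude form, and the XIP_DDS_UNIT_CIRCLE limits 01000000 (0x40) and 11000000 (0xC0) decode to +0.5 and -0.5 in two's complement.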
SRL IP Library
C code is typically written to satisfy several different requirements: reuse, readability, and performance. Until now, C code has rarely been written with the goal of producing the best possible hardware after high-level synthesis.
Just as code is written for reuse, readability, and performance, certain coding techniques or predefined constructs can ensure that synthesis produces more optimal hardware, or can better model hardware in C for easier validation of the algorithm.
Mapping Directly into SRL Resources
Many C algorithms sequentially shift data through arrays. They add a new value to the start of the array, shift the existing data through the array, and drop the oldest data value. This operation is implemented in hardware as a shift register.
The most common way to implement a shift register from C in hardware is to completely partition the array into individual elements and allow the data dependencies between the elements in the RTL to imply a shift register.
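A minimal sketch of the array-based pattern just described, assuming a hypothetical function shift_array and a depth of 4: each call loads a new value at index 0, moves the existing values up one index, and returns the value that drops off the end. The ARRAY_PARTITION pragma requests the complete partitioning mentioned above; standard C++ compilers ignore it (possibly with a warning), and the exact pragma options accepted can vary between Vitis HLS releases.

```cpp
#include <cstddef>

constexpr std::size_t kDepth = 4;  // hypothetical shift-register depth

// Load din at index 0, shift older values up one index, and return the oldest
// value, which is dropped from the array.
int shift_array(int din) {
  static int sreg[kDepth] = {0};  // static: contents persist between calls
#pragma HLS ARRAY_PARTITION variable=sreg complete
  const int oldest = sreg[kDepth - 1];
  for (std::size_t i = kDepth - 1; i > 0; --i) {
    sreg[i] = sreg[i - 1];  // move each value one position toward the end
  }
  sreg[0] = din;  // the newest value always enters at index 0
  return oldest;
}
```

After four calls the array is full, so the fifth call returns the first value that was loaded.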
Logic synthesis typically implements the RTL shift register into a Xilinx SRL resource, which efficiently implements shift registers. The issue is that sometimes logic synthesis does not implement the RTL shift register using an SRL component:
- When data is accessed in the middle of the shift register, logic synthesis cannot directly infer an SRL.
- Sometimes, even when an SRL is ideal, logic synthesis may implement the shift register in flip-flops due to other factors. (Logic synthesis is also a complex process.)
Vitis HLS provides a C++ class (ap_shift_reg) to ensure that the shift register defined in the C code is always implemented using an SRL resource. The ap_shift_reg class has two methods to perform the various read and write accesses supported by an SRL component.
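To make the read-then-shift ordering of these methods concrete, here is a plain C++ software model of the behavior described in the following subsections and their code comments: read a location, then shift all values up one and load the new value into location 0. The class name ShiftRegModel and its signature are hypothetical; this is not the contents of ap_shift_reg.h and is not intended for synthesis, where the ap_shift_reg class itself should be used.

```cpp
#include <cstddef>

// Hypothetical software model of the read/shift behavior described for
// ap_shift_reg. It is NOT the ap_shift_reg.h implementation.
template <typename T, std::size_t Depth>
class ShiftRegModel {
 public:
  // Return the value stored at location addr (0 = most recently loaded).
  T read(std::size_t addr) const { return data_[addr]; }

  // Read location addr, THEN (if enable is true) shift every value up one
  // location and load din into location 0. Returns the value read before
  // any shift takes place.
  T shift(const T& din, std::size_t addr = Depth - 1, bool enable = true) {
    const T out = data_[addr];
    if (enable) {
      for (std::size_t i = Depth - 1; i > 0; --i) data_[i] = data_[i - 1];
      data_[0] = din;
    }
    return out;
  }

 private:
  T data_[Depth] = {};  // value-initialized (zero for int)
};
```

For example, with ShiftRegModel<int, 4> s, calling s.shift(10, 3) returns 0 and leaves 10 in location 0; a call with enable set to false still returns the addressed value but leaves the contents unchanged.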
Read from the Shifter
The read method allows a specified location in the shift register to be read.
The ap_shift_reg.h header file that defines the ap_shift_reg class is also included with Vitis HLS as a standalone package. You have the right to use it in your own source code. The package xilinx_hls_lib_<release_number>.tgz is located in the include directory in the Vitis HLS installation area.
// Include the Class
#include "ap_shift_reg.h"
// Define a variable of type ap_shift_reg<type, depth>
// - Sreg must use the static qualifier
// - Sreg will hold integer data types
// - Sreg will hold 4 data values
static ap_shift_reg<int, 4> Sreg;
int var1;
// Read location 2 of Sreg into var1
var1 = Sreg.read(2);
Read, Write, and Shift Data
The shift method allows a read, write, and shift operation to be performed.
// Include the Class
#include "ap_shift_reg.h"
// Define a variable of type ap_shift_reg<type, depth>
// - Sreg must use the static qualifier
// - Sreg will hold integer data types
// - Sreg will hold 4 data values
static ap_shift_reg<int, 4> Sreg;
int var1, In1;
// Read location 3 of Sreg into var1
// THEN shift all values up one and load In1 into location 0
var1 = Sreg.shift(In1,3);
Read, Write, and Enable-Shift
The shift method also supports an enable input, allowing the shift process to be controlled and enabled by a variable.
// Include the Class
#include "ap_shift_reg.h"
// Define a variable of type ap_shift_reg<type, depth>
// - Sreg must use the static qualifier
// - Sreg will hold integer data types
// - Sreg will hold 4 data values
static ap_shift_reg<int, 4> Sreg;
int var1, In1;
bool En;
// Read location 3 of Sreg into var1
// THEN if En=1
// Shift all values up one and load In1 into location 0
var1 = Sreg.shift(In1,3,En);
When using the ap_shift_reg class, Vitis HLS creates a unique RTL component for each shifter. When logic synthesis is performed, this component is synthesized into an SRL resource.