What's New
Vitis Unified Software Platform
- 500+ FPGA-accelerated functions spread across 11 open-source Vitis libraries
- New Vitis HLS compiler for custom C/C++ kernel design with familiar programming constructs
- Improved RTL kernel integration within Vitis applications
- Higher-level Xilinx Runtime (XRT) Library APIs for easier communication with deployed Kernels
- Better visibility into kernel and system performance and actionable insights for improving performance
- Enhancements for easier custom Vitis target platform creation for embedded platforms
- Open-source Xilinx FPGA Resource Manager (XRM) for server-based computing orchestration
Vitis Quantitative Finance Library
- 5 new Level-3 (L3) Host-callable APIs with Python bindings (pybind11)
- 11 new Level-2 (L2) Kernels
- 25 new Level-1 (L1) Primitives
Vitis BLAS Library
- Level-3 API Enhancements
- The following enhancements were made.
- Python Host API added for General Matrix Multiply (GeMM).
- GeMM floating point benchmark results available for Alveo U250 Data Center accelerator cards.
- New application example added using Python APIs in MLP Keras framework.
- 3x-12x speed-up for four-layer fully connected MLP inference compared to CPU on AWS C5n.4xLarge node.
- Performance Improvements
- The following performance improvements can be observed.
- 3x-4x speed-up for GeMM API compared to 2019.2.
- For matrix size < 1024, 2x-50x speed-up compared to MKL GEMM API for float type.
Vitis Solver Library
- New Level-1 Function
- Added sqrt function.
Vitis DSP Library
- New Performance Benchmarks
- The following performance benchmarks were added.
- Level-2 Kernels added for benchmarking FFT APIs.
- Enables rapid prototyping and performance evaluations.
- Level-1 Function Examples
- New 2D FFT fixed-point and floating-point examples.
Vitis Vision Library
- Letter Box (Level-3)
- Image scaling algorithm, while preserving aspect ratio.
- ISP Pipeline example design (Vitis_Libraries/vision/L3/examples/)
- Demonstrates end-to-end streaming camera processing pipeline.
- Uses Vitis vision functions, such as Channel Gain, Demosaic, Auto White Balance, Gamma Correction, and Bad Pixel Correction.
- Auto white balance algorithm
- Enhanced
- Vitis HLS compatible
- Easier to design custom kernels.
- The data member of the xf::cv::Mat class, which was previously a pointer, is now an hls::stream type.
- The read, write, read_float, and write_float member functions that facilitate data access of xf::cv::Mat have been updated accordingly.
- Enhanced Array2xfMat and xfMat2Array utility functions.
- The L1 host functions targeting HLS flow are updated to have pointers at the interface instead of xf::cv::Mat, similar to L2 functions. All testbench and config files have been updated accordingly.
- Library infrastructure enhancements
- The following infrastructure enhancements were made.
- L2/L3 Makefiles use smaller images for faster verification using software and hardware emulation.
- All JSON files have been updated to support automatic creation of projects in the Vitis IDE.
- Makefiles and JSON files are moved out of the build folder in the examples directory with the host source files.
- Emulation flow for embedded devices has been updated in all the Makefiles to use a Perl-based script. Added the corresponding script in the ext/make_utility folder.
- The data folders inside individual L1 examples have been removed. All input arguments are now provided in the top-level data folder in the Makefiles and JSON files.
- 2020.1 code base is not backward-compatible
- All functions in the library must be built with 2020.1 Vitis/Vivado® tools only. None of the functions in this release can be used with any of the previous versions of Vitis or Vivado.
Vitis Database Library
- Compound sort API (compoundSort)
- Previously three sort algorithm modules were provided. This new API combines insertSort and mergeSort to provide a more scalable solution for on-chip sorting. When working with 32-bit integer keys, the URAM resources on one SLR allow the design to scale to 2M entries.
- Better HBM bandwidth usage in hash-join (hashJoinV3)
- In the 2019.2 Alveo U280 shell, ECC was enabled. Therefore, a sub-ECC size write to HBM becomes a read-modify-write and wastes some bandwidth. The hashJoinV3 primitive in this release has been modified to use a 256-bit port to avoid this problem.
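The compound sort idea described above (short runs sorted by insertion, then combined by merging) can be illustrated with a small software model. This is only a sketch of the general technique, not the library's compoundSort implementation; the function names and the run_length parameter are hypothetical.

```python
from heapq import merge


def insertion_sort(run):
    """Sort a short run in place with insertion sort and return it."""
    for i in range(1, len(run)):
        key = run[i]
        j = i - 1
        # Shift larger elements one slot to the right.
        while j >= 0 and run[j] > key:
            run[j + 1] = run[j]
            j -= 1
        run[j + 1] = key
    return run


def compound_sort(keys, run_length=4):
    """Insertion-sort fixed-size runs, then merge runs pairwise until one remains."""
    runs = [insertion_sort(list(keys[i:i + run_length]))
            for i in range(0, len(keys), run_length)]
    while len(runs) > 1:
        merged = []
        for i in range(0, len(runs), 2):
            # A trailing unpaired run passes through merge() unchanged.
            merged.append(list(merge(*runs[i:i + 2])))
        runs = merged
    return runs[0] if runs else []


sorted_keys = compound_sort([9, 3, 7, 1, 8, 2, 6, 0, 5, 4])
```

Insertion sort is cheap for short runs, while merging scales to longer inputs, which is the usual motivation for combining the two.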
Vitis Utility Library
- Read-only cache
- This API stores history data recently loaded from DDR/HBM in the on-chip memory (URAM). It reduces DDR/HBM access when the memory is accessed randomly.
- Better HBM bandwidth usage in hash-join (hashJoinV3)
- In 2019.2 Alveo U280 Data Center accelerator cards, ECC has been enabled. Therefore, a sub-ECC size write to HBM becomes a read-modify-write and wastes some bandwidth. To avoid this problem, the hashJoinV3 primitive in this release has been modified to use a 256-bit port.
- AXI Master Read without e signal
- This API provides buffered read from AXI master into stream, assuming that the receiver of the stream knows the number of elements to process.
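To make the read-only cache behavior above concrete, the following is a minimal software model of a direct-mapped cache in front of a slower memory. It is a sketch under stated assumptions: the class name, the direct-mapped organization, and the line sizes are all hypothetical and do not describe the library's actual design.

```python
class ReadOnlyCache:
    """Direct-mapped read-only cache model (hypothetical organization)."""

    def __init__(self, backing, num_lines=4, line_words=4):
        self.backing = backing          # stands in for DDR/HBM
        self.num_lines = num_lines
        self.line_words = line_words
        self.tags = [None] * num_lines  # which memory line each slot holds
        self.lines = [None] * num_lines
        self.hits = 0
        self.misses = 0

    def read(self, addr):
        line_addr = addr // self.line_words
        slot = line_addr % self.num_lines
        if self.tags[slot] != line_addr:
            # Miss: fetch the whole line from backing memory.
            start = line_addr * self.line_words
            self.lines[slot] = self.backing[start:start + self.line_words]
            self.tags[slot] = line_addr
            self.misses += 1
        else:
            self.hits += 1
        return self.lines[slot][addr % self.line_words]


memory = list(range(100, 164))  # 64 words of example data
cache = ReadOnlyCache(memory)
values = [cache.read(a) for a in (5, 6, 40, 5, 7, 41)]
```

In this access pattern only two of the six reads reach the backing memory; the rest are served from the cached lines.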
Vitis Graph Library
Performance-optimized functions to accelerate graph analytics. Example use cases include machine learning, genomics, recommendation systems, search engines, social network analysis, and traffic-based path planning.
Available as Level-2 Kernel functions.
- Centrality analysis
- PageRank algorithm
- Pathfinding
- Single Source Shortest Path algorithm
- Connectivity analysis
- Weakly connected components, strongly connected components
- Community Detection
- Label propagation and triangle count algorithms
- Search
- Breadth First Search algorithm
- Graph Format
- Degree calculation and format conversion between CSR and CSC
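As an illustration of the graph format item above, the sketch below converts a small graph from CSR (compressed sparse row) to CSC (compressed sparse column) and computes per-vertex in-degree along the way. It is a plain software reference for the data layout, not the library's Level-2 kernel; the function name and argument order are hypothetical.

```python
def csr_to_csc(num_rows, num_cols, row_ptr, col_idx, values):
    """Convert CSR arrays to CSC arrays; also return per-column counts (in-degree)."""
    # Count entries per column; for a graph this is the in-degree of each vertex.
    col_counts = [0] * num_cols
    for c in col_idx:
        col_counts[c] += 1

    # Prefix sum of the counts gives the CSC column pointers.
    col_ptr = [0] * (num_cols + 1)
    for c in range(num_cols):
        col_ptr[c + 1] = col_ptr[c] + col_counts[c]

    # Scatter each CSR entry into the next free slot of its column.
    next_slot = col_ptr[:-1].copy()
    row_idx = [0] * len(col_idx)
    csc_values = [0] * len(values)
    for r in range(num_rows):
        for k in range(row_ptr[r], row_ptr[r + 1]):
            c = col_idx[k]
            dst = next_slot[c]
            row_idx[dst] = r
            csc_values[dst] = values[k]
            next_slot[c] += 1
    return col_ptr, row_idx, csc_values, col_counts


# Edges: 0->1 (1.0), 0->2 (2.0), 1->2 (3.0), 2->0 (4.0)
col_ptr, row_idx, csc_values, in_degree = csr_to_csc(
    3, 3, [0, 2, 3, 4], [1, 2, 2, 0], [1.0, 2.0, 3.0, 4.0])
```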
Vitis Data Analytics Library
Offers performance-optimized functions to accelerate data analytics pipelines.
- Classification
- Decision Tree
- Random Forest
- Logistic Regression
- Linear Support Vector Machine
- Naive Bayes
- Regression
- Linear Least Square Regression
- LASSO Regression
- Ridge Regression
- Clustering
- K-Means
- Optimization Framework
- Stochastic Gradient Descent
- L-BFGS
- Performance Highlights
- Enhancements include:
- Training of Naive Bayes achieves 319x acceleration on Dataset "news20" with U250 against Spark MLLib with Intel® Xeon™ CPU E5-2667.
- Training of Decision Tree classification achieves 23x acceleration on Dataset "Heterogeneity Activity Recognition" with U250 against Spark MLLib with Intel Xeon CPU E5-2667.
- Training of Random Forest classification achieves 15x acceleration on Dataset "HIGGS" with U250 against Spark MLLib with Intel Xeon CPU E5-2667.
Vitis Compiler and Linker (v++)
- v++ calls Vitis HLS compiler by default
- It inherits all enhancements included in Vitis HLS.
- v++ linker enhancements
- New linker option to insert FIFOs with user specified depth on streaming connections. In addition, the v++ linker automatically adds clock domain crossing (CDC) logic and data width converter (DWC) logic where necessary when connecting streaming interfaces. This eliminates the requirement of manual instantiation.
- New v++ --package stage
- Enables generating and packaging all the components needed for booting designs on emulation and hardware platforms. This step runs at the end of the compiling and linking processes. For details about the options and usage, refer to Vitis Accelerated Software Development Flow Documentation in the Application Acceleration Development flow of the Vitis Unified Software Platform Documentation (UG1416).
- Enhancements to RTL Kernel Import
- Easily package and reuse RTL IPs within a Vitis application. All necessary files required by Vitis (kernel.xml) are automatically generated from the component.xml file. This simplifies the RTL kernel design process and makes it less error prone.
- Ease-of-use (EoU) and productivity enhancements in v++
- This includes improvements in several areas, such as better messaging, IP cache sharing, and improved support for Tcl hooks in all Vivado steps.
- Support user-specified FREQ_HZ with customizable error margin
- Specify clock frequencies with tolerances through a v++ linker option; the tool then handles clocking connectivity automatically. In this release, this is supported only for embedded platforms.
Vitis HLS
- New high-level synthesis tool engine and Vitis HLS interface for C-based kernel compilation.
- Vitis HLS brings C++14 support (constexpr, variadic templates, auto for type inference, initializer separator, string literals and user-defined literals, range based for-loops, static_assert, and so on).
- Two main behavior presets based on the desired target flow: vitis or vivado. This target flow can be selected in the "Solutions Settings" widget under "Synthesis" or via Tcl with the new flow_target option for open_solution.
- The Vitis HLS Migration Guide (UG1391) helps transition from Vivado HLS to Vitis HLS for C/C++ kernel design in the Vitis tool.
- GCC struct layout style for ports compatible with the aligned and packed attributes to ensure better compatibility with host software applications.
- New disaggregate pragma to force struct members into individual elements.
- New bind_op and bind_storage pragmas to constrain operators and RAM elements.
- Implicit interface synthesis for kernel ports: specifying interface pragmas is now optional, resulting in less verbose kernel code.
- Interface pragmas ap_memory and bram offer a new storage_type option to customize the external memory type (for example, to only use one port rather than the default of both).
- Simplified hierarchical summary report in the IDE to easily check function hierarchy, timing, latency, throughput, and usage, and to confirm that pipeline and/or dataflow pragmas were applied.
- New automatic port resizing option for C/C++ kernels. This helps kernel ports better match the platform interface width for higher throughput.
- Code examples available as part of the Vitis acceleration C++ kernel examples on GitHub. Vitis HLS comes with its own new repository of examples also on GitHub.
- Unlabeled loops get assigned a machine-generated label to more easily differentiate between all the different loops in the design.
- Improved dataflow graphical viewer.
- Redesigned C/RTL Co-simulation widget now takes Vivado XSIM as the default simulator and introduces a new channel profiling option.
- Arrays of std::complex types can now be throughput-optimized for dataflow via the no_ctor attribute to inhibit the performance-limiting initialization.
- The RTL black-box flow offers a wizard to help create the JSON configuration file.
- Stall randomization in co-simulation to validate the kernel in the presence of stalls at its interfaces.
- Function graph after C simulation helps visualize the code structure.
- Report shows the loop and function latency expressed as actual time (clock cycles of latency * period).
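The last item above expresses latency as clock cycles multiplied by the clock period. A short worked example of that arithmetic follows; the helper names and the example numbers are hypothetical and are not taken from any report.

```python
def period_ns_from_mhz(freq_mhz):
    """Clock period in nanoseconds for a clock frequency given in MHz."""
    return 1000.0 / freq_mhz


def latency_ns(latency_cycles, clock_period_ns):
    """Latency as actual time: clock cycles of latency * period."""
    return latency_cycles * clock_period_ns


# Example: a loop with 1200 cycles of latency on a 250 MHz clock.
period = period_ns_from_mhz(250)    # 4.0 ns
loop_time = latency_ns(1200, period)  # 4800.0 ns, that is 4.8 us
```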
IDE
- Vitis Library Integration
- Vitis IDE can download Vitis accelerated libraries, create library example projects, and add libraries to include paths to acceleration applications.
- Enhancements to New Project Wizard
- Improved the New Project Wizard to provide a more intuitive flow.
- Wizard provides error or warning messages to the user as they are reported.
- The Template Selection page displays many application templates on a single page.
- Support non-project mode debugging
- Applications generated by command line mode can use the Vitis IDE to run debugging with one command.
- 32-bit App Compilation
- Compiling 32-bit applications for Arm® Cortex-A53 is supported again.
- New project types and project relationship
- Enhanced options.
Emulation
- Improved debugging during emulation
- Enhancements include:
- Centralized logging of all the build-time simulation failures into v++ logs.
- Uses XSIM as a default simulator.
- More built-in checks and DRCs.
Profiling, Analysis and Visualization
- Enhanced profile summary report and design guidance report
- Profile summary report is more structured and easier to navigate. Design guidance report adds new rules and is more structured with better messaging.
- New device power section
- New section of the Profile Summary report shows board power information.
- Better visibility into system performance
- You can now do the following for better system performance visibility:
- Overlay performance data on the System Diagram for hardware emulation and hardware runs.
- View Compute Unit (CU) statistics, including number of calls, CU utilization (%), total time (ms), and average time (ms).
- View read/write data transfer rates annotated on CU ports.
- Cross-probing between reports
- In the run summary, the guidance report has two new types of links:
- Design object links (such as krnl_vadd)
- Click to select the object in the profile summary report and system diagram
- Value links (such as 6.789 KB)
- Click to open the profile summary report in context of the relevant section of the report
- Profile summary archives
- Save profile summary archives across multiple builds and runs to assess iterative performance improvements and share results between teams for analysis.
- Compare reports between application builds and runs
- Easily assess performance improvement as you iterate and implement optimizations by comparing reports between multiple application builds and runs. Supported reports include kernel estimate, system estimate, timing summary, and build logs.
- Improved and actionable guidance
- New guidance messages added for both Vitis and Vitis HLS. Guidance messages include web links to relevant resolution / troubleshooting documentation. Improved feedback categories (Throughput, Latency, Interface, Memory, Kernel).
- Cross-probing from guidance to system diagram
- Enhancement
Debug
- Continuous Timeline Trace Reads
- The following functionality was added.
- Provides improved accuracy by reading timeline trace data at regular user-specified intervals.
- Provides access to device-related profile information even on application hang/crash for better debug.
- Supports continuously offloading trace data while application is running.
- Supports both FIFO offload and DDR/HBM offload.
- Faster Execution with Low-overhead Timeline Trace
- This feature includes the following:
- Low-overhead tracing generates minimalistic trace necessary for debug and enables faster application runs.
- Produces a reduced timeline trace report with only host side data, therefore eliminating the overhead of device side profiling, which significantly reduces the impact on performance.
- Is enabled or disabled through the xrt.ini file and does not require recompiling the design.
- TLM Transaction View in Live Waveform Viewer
- Adds ability to enable the display of transaction-level details on AXI-MM interfaces in the live waveform viewer in the Vitis hardware emulation flow.
XRT
For information about Xilinx Runtime for this release, refer to the XRT Release Notes (UG1451).
XRM
Xilinx FPGA Resource Management (XRM) offers server-based compute orchestration capabilities based on the XRT API.
XRM is a set of APIs to manage compute units (subsets of xclbins) on a local server. Multiple applications can run on a pool of cards attached to a server. XRM then assigns the compute units based on demand and availability.
XRM is open source and available at https://github.com/Xilinx/XRM.
Embedded Platforms
- Intuitive Platform Naming Convention
- <Vendor>_<Board>_<Feature>_<Supported Vitis Tool Version>_<Release Version>. Example: xilinx_zcu102_base_dfx_202010_1.
- Pre-compiled common Linux components are provided
- Easier out-of-box flow. No need to install Vitis and PetaLinux in evaluation flow.
- Software package manager and package feed are provided
- You can now install common software packages on the fly. No need to compile from PetaLinux.
- Use Ext4 as default rootfs
- rootfs will not occupy DDR memory space like initramfs. Changes to the file system will be retained after reboot.
- No need for post link script
- v++ can link interrupt signals automatically and XRT can recognize these signals and control them in software.
- Easier Vitis Target Platform Export
- New wizard to package and export Xilinx Shell Archive (XSA) as Vitis Target Platform.
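The platform naming convention listed under Embedded Platforms above can be split back into its fields with a few lines of code. This is a sketch that assumes the vendor and board fields contain no underscores, while the feature field may (as in base_dfx); the PlatformName type and parse_platform_name function are hypothetical helpers, not part of any Xilinx tool.

```python
from typing import NamedTuple


class PlatformName(NamedTuple):
    vendor: str
    board: str
    feature: str
    tool_version: str
    release: str


def parse_platform_name(name):
    """Split <Vendor>_<Board>_<Feature>_<Tool Version>_<Release Version>."""
    # The last two fields are peeled off from the right, so the feature
    # field is free to contain underscores of its own.
    head, tool_version, release = name.rsplit("_", 2)
    vendor, board, feature = head.split("_", 2)
    return PlatformName(vendor, board, feature, tool_version, release)


parsed = parse_platform_name("xilinx_zcu102_base_dfx_202010_1")
```

A name with too few fields raises ValueError during unpacking, which is a reasonable failure mode for a malformed platform name.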