Overview
Core Overview
The Xilinx® Deep Learning Processing Unit (DPU) is a programmable engine optimized for convolutional neural networks. It is composed of a high-performance scheduler module, a hybrid computing array module, an instruction fetch unit module, and a global memory pool module. The DPU uses a specialized instruction set, which allows for the efficient implementation of many convolutional neural networks. Deployed convolutional neural networks include VGG, ResNet, GoogLeNet, YOLO, SSD, MobileNet, and FPN, among others.
The DPU IP can be implemented in the programmable logic (PL) of the selected Zynq® UltraScale+™ MPSoC device with direct connections to the processing system (PS). The DPU requires instructions to implement a neural network and accessible memory locations for input images as well as temporary and output data. A program running on the application processing unit (APU) is also required to service interrupts and coordinate data transfers.
The top-level block diagram of the DPU is shown in the following figure.
- APU - Application Processing Unit
- PE - Processing Engine
- DPU - Deep Learning Processing Unit
- RAM - Random Access Memory
Navigating Content by Design Process
Xilinx® documentation is organized around a set of standard design processes to help you find relevant content for your current development task. All Versal™ ACAP design process Design Hubs can be found on the Xilinx.com website. This document covers the following design processes:
- System and Solution Planning
- Identifying the components, performance, I/O, and data transfer requirements at a system level. Includes application mapping for the solution to PS, PL, and AI Engine. Topics in this document that apply to this design process include:
- Hardware, IP, and Platform Development
- Creating the PL IP blocks for the hardware platform, creating PL kernels, functional simulation, and evaluating the Vivado® timing, resource use, and power closure. Also involves developing the hardware platform for system integration. Topics in this document that apply to this design process include:
- System Integration and Validation
- Integrating and validating the system functional performance, including timing, resource use, and power closure. Topics in this document that apply to this design process include:
Hardware Architecture
The detailed hardware architecture of the DPU is shown in the following figure. After start-up, the DPU fetches instructions from the off-chip memory to control the operation of the computing engine. The instructions are generated by the Vitis™ AI compiler, where substantial optimizations are performed.
On-chip memory is used to buffer input, intermediate, and output data to achieve high throughput and efficiency. The data is reused as much as possible to reduce the external memory bandwidth. A deep pipelined design is used for the computing engine. The processing elements (PE) take full advantage of the fine-grained building blocks such as multipliers, adders, and accumulators in Xilinx devices.
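The multiply-accumulate pattern described above can be modeled in software. The sketch below is a conceptual illustration only: it mimics, in Python, the multiplier/adder/accumulator structure a processing element builds from DSP slices. The loop order and function name are illustrative assumptions, not the DPU's actual hardware schedule.

```python
def conv2d_mac(image, kernel):
    """Direct 2D convolution (valid padding) built from explicit MAC steps."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = ih - kh + 1, iw - kw + 1
    out = [[0] * ow for _ in range(oh)]
    for oy in range(oh):
        for ox in range(ow):
            acc = 0  # plays the role of a PE's accumulator register
            for ky in range(kh):
                for kx in range(kw):
                    # one multiply + one add per step, as in a DSP slice
                    acc += image[oy + ky][ox + kx] * kernel[ky][kx]
            out[oy][ox] = acc
    return out

img = [[1, 2, 3, 0],
       [4, 5, 6, 1],
       [7, 8, 9, 2],
       [0, 1, 2, 3]]
k = [[1, 0],
     [0, -1]]
print(conv2d_mac(img, k))  # → [[-4, -4, 2], [-4, -4, 4], [6, 6, 6]]
```

In hardware, the same input pixels feed many such accumulations in parallel, which is why keeping them in on-chip memory (rather than re-reading external memory for every output) saves bandwidth.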
DPU with Enhanced Usage of DSP
A DSP double data rate (DDR) technique is used to improve the performance achieved with the device. Two input clocks for the DPU are therefore needed: one for general logic and another, at twice that frequency, for the DSP slices. The difference between a DPU that does not use the DSP DDR technique and one using the enhanced DSP usage architecture is shown here.
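A back-of-the-envelope estimate shows why the doubled DSP clock matters for peak throughput. The MAC count and clock frequencies below are assumptions chosen for illustration, not guaranteed values for any particular DPU configuration.

```python
def peak_ops_per_second(macs_per_dsp_cycle, dsp_clk_hz):
    # Each MAC counts as two operations: one multiply plus one add.
    return 2 * macs_per_dsp_cycle * dsp_clk_hz

general_clk = 300e6        # general-logic clock frequency (assumed)
dsp_clk = 2 * general_clk  # DSP slices run at twice the general frequency
macs = 2048                # MACs retired per DSP clock cycle (assumed)

print(peak_ops_per_second(macs, dsp_clk) / 1e12, "TOPS")
```

Running the DSP slices at twice the general-logic frequency doubles the peak operation rate for the same number of physical DSP slices, which is the motivation for the dual-clock requirement.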
Development Tools
Two flows are supported for integrating the DPU into your project: the Vivado flow and the Vitis™ flow.
The Xilinx Vivado® Design Suite is required to integrate the DPU into your projects for the Vivado flow. Vivado Design Suite 2021.1 or later is recommended. Contact your local sales representative if the project requires an older version of Vivado.
The Vitis unified software platform 2021.1 or later is required to integrate the DPU for the Vitis flow.
Device Resources
The DPU logic resource usage is scalable across UltraScale+™ MPSoC devices. For more information on resource utilization, see DPU Configuration.
DPU Development Flow
The DPU requires a device driver which is included in the Xilinx Vitis™ AI development kit.
Free developer resources can be obtained from the Xilinx website: https://github.com/Xilinx/Vitis-AI.
The Vitis AI User Guide (UG1414) describes how to use the DPU with the Vitis AI tools. The basic development flow is shown in the following figure. First, use Vivado or Vitis to generate the bitstream. Then, download the bitstream to the target board and install the related driver. For instructions on installing the related driver and dependent libraries, see the Vitis AI User Guide (UG1414).
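Alongside the bitstream, a network must be compiled into DPU instructions with the Vitis AI compiler before it can run on the board. The command below is a sketch of that step, assuming a quantized XIR model is already available; the file names and paths are placeholders, and the full flow is documented in UG1414.

```shell
# Compile a quantized model for the DPU with the Vitis AI compiler:
#   -x  quantized input model (placeholder name)
#   -a  arch file describing the DPU configuration in your bitstream
#   -o  output directory for the compiled model
#   -n  name to give the compiled model
vai_c_xir -x quantized_model.xmodel -a arch.json -o ./compiled -n my_network
```

The resulting compiled model is what the application running on the APU loads at runtime via the DPU driver and runtime libraries.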
Example System with the DPUCZDX8G
The figure below shows an example system block diagram with the Xilinx® Zynq® UltraScale+™ MPSoC using a camera input. The DPU is integrated into the system through an AXI interconnect to perform deep learning inference tasks such as image classification, object detection, and semantic segmentation.
Vitis AI Development Kit
The Vitis AI development environment is used for AI inference on Xilinx hardware platforms. It consists of optimized IP cores, tools, libraries, models, and example designs.
As shown in the following figure, the Vitis AI development kit consists of AI Compiler, AI Quantizer, AI Optimizer, AI Profiler, AI Library, and Xilinx Runtime Library (XRT).
For more information about the Vitis AI development kit, see the Vitis AI User Guide in the Vitis AI User Documentation (UG1431).
You can download the Vitis AI development kit for free from https://github.com/Xilinx/Vitis-AI.