Libraries and Samples

Caffe Framework

The Vitis™ AI Library contains the following neural network libraries based on the Caffe framework:

TensorFlow Framework

The Vitis™ AI Library contains the following neural network libraries based on the TensorFlow framework:

PyTorch Framework

The Vitis™ AI Library supports the following types of neural network libraries based on the PyTorch framework:

The related libraries are open source and can be modified as needed. The open source code is available on GitHub.

The Vitis™ AI Library provides image test samples and video test samples for all the above networks. In addition, the kit provides the corresponding performance test programs. For video-based testing, we recommend using raw video for evaluation, because decoding by software libraries on Arm® processors can have inconsistent decoding times, which can affect the accuracy of the evaluation.

Note: For Edge devices, all the sample programs can only run on the target side, but they can be cross-compiled on the host side or compiled directly on the target side.

Model Library

After the model packet is installed on the target, all the models are stored under /usr/share/vitis_ai_library/models/. Each model is stored in a separate folder, which is composed of the following files, by default:

  • [model_name].xmodel
  • [model_name].prototxt
Note: The elf model is not supported by the Vitis AI library in VAI 1.3 and later releases.

Take the "inception_v1" model as an example. inception_v1.xmodel is the model data. inception_v1.prototxt is the parameter of the model.

Note: The name of the model directory must be the same as the model name.

Model Type

Classification

The Classification library is used to classify images. Such neural networks are trained on the ImageNet dataset for ILSVRC and can identify objects from its 1,000 classes. The Vitis AI Library integrates networks including, but not limited to, ResNet18, ResNet50, Inception_v1, Inception_v2, Inception_v3, Inception_v4, Vgg, mobilenet_v1, mobilenet_v2, and Squeezenet into Xilinx libraries. The input is a picture with an object and the output is the top-K most probable categories.

Figure 1: Classification Example

The following table lists the classification models supported by the Vitis AI library.

Table 1. Classification Models
No Model Name Framework
1 inception_resnet_v2_tf TensorFlow
2 inception_v1_tf
3 inception_v3_tf
4 inception_v4_2016_09_09_tf
5 mobilenet_v1_0_25_128_tf
6 mobilenet_v1_0_5_160_tf
7 mobilenet_v1_1_0_224_tf
8 mobilenet_v2_1_0_224_tf
9 mobilenet_v2_1_4_224_tf
10 resnet_v1_101_tf
11 resnet_v1_152_tf
12 resnet_v1_50_tf
13 vgg_16_tf
14 vgg_19_tf
15 mobilenet_edge_1_0_tf
16 mobilenet_edge_0_75_tf
17 inception_v2_tf
18 MLPerf_resnet50_v1.5_tf
19 resnet50_tf2
20 mobilenet_1_0_224_tf2
21 inception_v3_tf2
22 resnet_v2_50_tf
23 resnet_v2_101_tf
24 resnet_v2_152_tf
25 efficientnet-b0_tf2
26 efficientNet-edgetpu-S_tf
27 efficientNet-edgetpu-M_tf
28 efficientNet-edgetpu-L_tf
29 resnet50 Caffe
30 resnet18
31 inception_v1
32 inception_v2
33 inception_v3
34 inception_v4
35 mobilenet_v2
36 squeezenet
37 resnet50_pt PyTorch
38 squeezenet_pt
39 inception_v3_pt
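
As a rough illustration of how a classification model is invoked from C++, the following sketch follows the pattern of the library's classification samples. The model name, image path, and output handling are placeholders; check the result fields against the headers shipped with your release.

  // Minimal classification sketch (not the shipped sample itself).
  #include <iostream>
  #include <opencv2/opencv.hpp>
  #include <vitis/ai/classification.hpp>

  int main() {
    cv::Mat image = cv::imread("sample_classification.jpg");  // placeholder image
    if (image.empty()) return -1;
    auto model = vitis::ai::Classification::create("resnet50");  // any model from Table 1
    auto result = model->run(image);  // DPU inference plus pre/post-processing
    for (const auto& s : result.scores) {  // top-K most probable categories
      std::cout << "index: " << s.index << "  score: " << s.score << std::endl;
    }
    return 0;
  }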

Face Detection

The Face Detection library uses the DenseBox neural network to detect human faces. The input is a picture with the faces you want to detect and the output is a vector of the result structure containing the information of each detection box. The following image shows the result of face detection.

Figure 2: Face Detection Example

The following table lists the face detection models supported by the AI Library.

Table 2. Face Detection Models
No Model Name Framework
1 densebox_320_320 Caffe
2 densebox_640_360
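
A minimal usage sketch, modeled on the face detection sample, is shown below. The detection boxes are returned as values relative to the image size; the model name and file names are placeholders.

  #include <opencv2/opencv.hpp>
  #include <vitis/ai/facedetect.hpp>

  int main() {
    cv::Mat image = cv::imread("sample_facedetect.jpg");  // placeholder image
    auto model = vitis::ai::FaceDetect::create("densebox_640_360");
    auto result = model->run(image);
    // Box coordinates are relative (0..1); scale them back to pixels before drawing.
    for (const auto& r : result.rects) {
      cv::rectangle(image,
                    cv::Rect(r.x * image.cols, r.y * image.rows,
                             r.width * image.cols, r.height * image.rows),
                    cv::Scalar(0, 255, 0), 2);
    }
    cv::imwrite("sample_facedetect_result.jpg", image);
    return 0;
  }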

Face Landmark Detection

The Face Landmark network is used to detect five key points on a human face: the left eye, the right eye, the nose, the left corner of the lips, and the right corner of the lips. This network is used to correct the face orientation before face feature extraction: if a face is not directly facing the camera (for example, tilted 20 degrees to the left or right), it is adjusted to face the camera directly. The input image should be a face that has been detected by the face detection network. The output of the network is the five key points, which are normalized. The following image shows the result of face landmark detection.

Figure 3: Face Landmark Detection Example

The following table lists the face landmark models supported by the AI Library.

Table 3. Face Landmark Models
No Model Name Framework
1 face_landmark Caffe
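
The sketch below shows how the normalized key points can be scaled back to pixel coordinates. It assumes the FaceLandmark class and the points field as used in the library's sample code; verify the names against the shipped headers.

  #include <opencv2/opencv.hpp>
  #include <vitis/ai/facelandmark.hpp>

  int main() {
    // The input is expected to be a face cropped out by the face detection network.
    cv::Mat face = cv::imread("cropped_face.jpg");  // placeholder image
    auto model = vitis::ai::FaceLandmark::create("face_landmark");
    auto result = model->run(face);
    // The five key points are normalized; scale them to pixel coordinates.
    for (const auto& p : result.points) {
      cv::circle(face, cv::Point(p.first * face.cols, p.second * face.rows), 3,
                 cv::Scalar(0, 0, 255), -1);
    }
    cv::imwrite("cropped_face_landmark.jpg", face);
    return 0;
  }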

SSD Detection

The SSD Detection library is commonly used with the SSD neural network. SSD is a neural network that is used to detect objects. The input is a picture with some objects you want to detect. The output is a vector of the result structure containing the information of each detection box. The following image shows the result of SSD detection.

Figure 4: SSD Detection Example

The following table lists the SSD detection models supported by the Vitis AI Library.

Table 4. SSD Models
No Model Name Framework
1 ssd_mobilenet_v1_coco_tf TensorFlow
2 ssd_mobilenet_v2_coco_tf
3 ssd_resnet_50_fpn_coco_tf
4 mlperf_ssd_resnet34_tf
5 ssdlite_mobilenet_v2_coco_tf
6 ssd_inception_v2_coco_tf
7 ssd_pedestrian_pruned_0_97 Caffe
8 ssd_traffic_pruned_0_9
9 ssd_adas_pruned_0_95
10 ssd_mobilenet_v2
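
The calling pattern mirrors the other detection libraries; the sketch below is modeled on the SSD sample, with a placeholder model name and image.

  #include <iostream>
  #include <opencv2/opencv.hpp>
  #include <vitis/ai/ssd.hpp>

  int main() {
    cv::Mat image = cv::imread("sample_ssd.jpg");  // placeholder image
    auto model = vitis::ai::SSD::create("ssd_adas_pruned_0_95");
    auto result = model->run(image);
    // Each detection box carries a label, a confidence score, and a relative box.
    for (const auto& box : result.bboxes) {
      std::cout << "label: " << box.label << "  score: " << box.score << std::endl;
      cv::rectangle(image,
                    cv::Rect(box.x * image.cols, box.y * image.rows,
                             box.width * image.cols, box.height * image.rows),
                    cv::Scalar(255, 0, 0), 2);
    }
    cv::imwrite("sample_ssd_result.jpg", image);
    return 0;
  }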

Pose Detection

The Pose Detection library is used to detect the posture of the human body. It includes a neural network that can identify 14 key points on the human body. The input is a picture of a person that has been detected by a pedestrian detection neural network (you can use the SSD detection library for this). The output is a structure containing the coordinates of each key point. The following image shows the result of pose detection.

Figure 5: Pose Detection Example

The following table lists the pose detection models supported by the Vitis AI Library.

Table 5. Pose Detection Models
No Model Name Framework
1 sp_net Caffe
Note: If the input image is arbitrary and you do not know the exact location of the person, perform SSD detection first; see the test_jpeg_posedetect_with_ssd.cpp file. The input for test_jpeg_posedetect_with_ssd can be any image, with or without a person in it. If there is a person in the image, SSD first detects the person and then sends the position of the person as the input for pose detection. If SSD does not detect any person in the image, pose detection does not run. Because test_jpeg_posedetect only performs pose detection, its input image must contain at least one person; if you input an image without a person to test_jpeg_posedetect, it throws an error. See the test_jpeg_posedetect.cpp file.
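
A simplified sketch of the SSD-then-pose-detection flow described in the note is shown below. It is not the shipped sample; the pose14pt result field is taken from the sample sources and should be verified against your release.

  #include <opencv2/opencv.hpp>
  #include <vitis/ai/posedetect.hpp>
  #include <vitis/ai/ssd.hpp>

  int main() {
    cv::Mat image = cv::imread("sample_posedetect.jpg");  // placeholder image
    auto ssd = vitis::ai::SSD::create("ssd_pedestrian_pruned_0_97");
    auto pose = vitis::ai::PoseDetect::create("sp_net");
    // Detect pedestrians first, then run pose detection on each person crop.
    auto persons = ssd->run(image);
    for (const auto& box : persons.bboxes) {
      cv::Rect roi(box.x * image.cols, box.y * image.rows,
                   box.width * image.cols, box.height * image.rows);
      roi &= cv::Rect(0, 0, image.cols, image.rows);  // clip to the image
      if (roi.area() == 0) continue;
      auto result = pose->run(image(roi).clone());
      // result.pose14pt holds the 14 body key points for this person.
    }
    return 0;
  }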

Semantic Segmentation

Semantic segmentation assigns a semantic category to each pixel in the input image, that is, it identifies pixels as part of an object, say, a car, a road, a tree, a horse, etc. Libsegmentation is a segmentation library which can be used in ADAS applications. It offers simple interfaces for a developer to deploy segmentation tasks on a Xilinx® FPGA.

The following is an example of semantic segmentation, where "blue gray" denotes the sky, "green" denotes trees, "red" denotes people, "dark blue" denotes cars, "plum" denotes the road, and "gray" denotes structures.

Figure 6: Semantic Segmentation Example

The following table lists the semantic segmentation models supported by the Vitis AI library.

Table 6. Semantic Segmentation Models
No Model Name Framework
1 fpn Caffe
2 FPN-resnet18_Endov
3 semantic_seg_citys_tf2 TensorFlow
4 mobilenet_v2_cityscapes_tf
5 SemanticFPN_cityscapes_pt PyTorch
6 ENet_cityscapes_pt
7 unet_chaos-CT_pt
8 SemanticFPN_Mobilenetv2_pt
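
A minimal sketch of running segmentation from C++ is shown below. The run_8UC1/run_8UC3 entry points and the segmentation result field follow the library's segmentation sample; treat the exact names as assumptions to verify against your headers.

  #include <opencv2/opencv.hpp>
  #include <vitis/ai/segmentation.hpp>

  int main() {
    cv::Mat image = cv::imread("sample_segmentation.jpg");  // placeholder image
    auto model = vitis::ai::Segmentation::create("fpn");
    // run_8UC1 returns a single-channel map in which each pixel is a class ID;
    // run_8UC3 returns a color-coded visualization instead.
    auto result = model->run_8UC1(image);
    cv::Mat class_map = result.segmentation * 20;  // scale class IDs so they are visible
    cv::imwrite("sample_segmentation_result.png", class_map);
    return 0;
  }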

Road Line Detection

The Road Line Detection library is used to draw lane lines in ADAS applications. Each lane line is assigned a number that represents its category, and the line itself is represented by a vector<Point>. In the test code, a color map is used so that different types of lane lines are drawn in different colors. The points are stored in a vector container, and the OpenCV polygon interface cv::polylines() is used to draw the lane lines. The following image shows the result of road line detection.

Figure 7: Road Line Detection Example
The following table lists the road line detection models supported by the Vitis AI Library.
Table 7. Road Line Detection Models
No Model Name Framework
1 vpgnet_pruned_0_99 Caffe
Note: The input image size is fixed at 480x640; images of other sizes need to be resized.
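
To illustrate the drawing step only, the sketch below draws one lane line with cv::polylines using placeholder points and a hypothetical type-to-color map; the actual road line result structure is defined by the library headers.

  #include <opencv2/opencv.hpp>
  #include <vector>

  int main() {
    cv::Mat image = cv::imread("sample_roadline.jpg");  // placeholder image
    // One detected lane line as a vector of points (placeholder values) plus a
    // per-line type; the test code maps each lane-line type to a color.
    std::vector<std::vector<cv::Point>> lanes = {{{100, 470}, {180, 360}, {250, 260}}};
    int type = 2;  // hypothetical lane-line category
    std::vector<cv::Scalar> colors = {{0, 0, 255}, {0, 255, 0}, {255, 0, 0}};
    cv::polylines(image, lanes, false, colors[type % colors.size()], 3);
    cv::imwrite("sample_roadline_result.jpg", image);
    return 0;
  }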

YOLOv3 Detection

YOLO is a neural network that is used to detect objects; the current version is v3. The input is a picture with one or more objects and the output is a vector of the result structure containing the detected information. The following image shows the result of YOLOv3 detection.

Figure 8: YOLOv3 Detection Example

The following table lists the YOLOv3 detection models supported by the Vitis AI library.

Table 8. YOLOv3 Detection Models
No Model Name Framework
1 yolov3_voc_tf TensorFlow
2 yolov3_adas_pruned_0_9 Caffe
3 yolov3_voc
4 yolov3_bdd
5 tiny_yolov3_vmss
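
The corresponding C++ calling pattern, modeled on the test_jpeg_yolov3 sample, is sketched below with placeholder names; the coordinates in each box are relative to the image size.

  #include <iostream>
  #include <opencv2/opencv.hpp>
  #include <vitis/ai/yolov3.hpp>

  int main() {
    cv::Mat image = cv::imread("sample_yolov3.jpg");  // placeholder image
    auto model = vitis::ai::YOLOv3::create("yolov3_voc");
    auto result = model->run(image);
    for (const auto& box : result.bboxes) {
      // label is the class index; x/y/width/height are relative to the image size.
      std::cout << "label: " << box.label << "  score: " << box.score
                << "  box: " << box.x << " " << box.y << " "
                << box.width << " " << box.height << std::endl;
    }
    return 0;
  }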

YOLOv4 Detection

YOLOv4 performs the same task as YOLOv3 and is an upgraded version of it. The following table lists the YOLOv4 detection models supported by the Vitis AI Library.
Table 9. YOLOv4 Detection Models
No Model Name Framework
1 yolov4_leaky_spp_m Caffe
2 yolov4_leaky_spp_m_pruned_0_36

YOLOv2 Detection

YOLOv2 performs the same task as YOLOv3; YOLOv3 is an upgraded version of YOLOv2. The following table lists the YOLOv2 detection models supported by the Vitis AI Library.
Table 10. YOLOv2 Detection Models
No Model Name Framework
1 yolov2_voc Caffe
2 yolov2_voc_pruned_0_66
3 yolov2_voc_pruned_0_71
4 yolov2_voc_pruned_0_77

Openpose Detection

The Openpose Detection library is used to detect the posture of the human body. The posture is represented by an array of 14 key points, as shown below:
 0: head, 1: neck, 2: L_shoulder, 3:L_elbow, 4: L_wrist, 5: R_shoulder,
 6: R_elbow, 7: R_wrist, 8: L_hip, 9: L_knee, 10: L_ankle, 11: R_hip,
 12: R_knee, 13: R_ankle

The input of the network is 368x368. The following image shows the result of openpose detection.

Note: Use a square picture for input. If you need to detect pictures of other size ratios, use a network with the same input size ratio.
Figure 9: Openpose Detection Example

The following table lists the Openpose detection models supported by the Vitis AI Library.

Table 11. Openpose Detection Models
No Model Name Framework
1 openpose_pruned_0_3 Caffe

RefineDet Detection

RefineDet is a neural network that is used to detect human bodies. The input is a picture with some individuals that you would like to detect. The output is a vector of the result structure that contains each box's information. The following image shows the result of RefineDet detection:

Figure 10: RefineDet Detection Example

The following table lists the RefineDet detection models supported by the Vitis AI Library.

Table 12. RefineDet Detection Models
No Model Name Framework
1 refinedet_pruned_0_8 Caffe
2 refinedet_pruned_0_92
3 refinedet_pruned_0_96
4 refinedet_baseline
5 refinedet_VOC_tf TensorFlow

ReID Detection

The task of person re-identification (ReID) is to identify a person of interest at any time or place. This is done by extracting image features and comparing them: images of the same person should have similar features and a small feature distance, while images of different persons should have a large feature distance. Given a query image and a set of candidate images, the candidate with the smallest feature distance to the query is identified as the same person. The following table lists the ReID detection models supported by the Vitis AI Library.

Table 13. ReID Detection Models
Number Model Name Framework
1 reid Caffe
2 personreid-res18_pt PyTorch
3 personreid-res50_pt
4 facereid-large_pt
5 facereid-small_pt
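
A minimal sketch of the feature comparison is shown below; the feat field name follows the library's ReID demo code and should be verified against your release. The two person crops are placeholders.

  #include <iostream>
  #include <opencv2/opencv.hpp>
  #include <vitis/ai/reid.hpp>

  int main() {
    cv::Mat img_a = cv::imread("person_a.jpg");  // placeholder person crops
    cv::Mat img_b = cv::imread("person_b.jpg");
    auto model = vitis::ai::Reid::create("reid");
    cv::Mat feat_a = model->run(img_a).feat;
    cv::Mat feat_b = model->run(img_b).feat;
    // Smaller cosine distance means the two crops are more likely the same person.
    double similarity = feat_a.dot(feat_b) / (cv::norm(feat_a) * cv::norm(feat_b));
    std::cout << "cosine distance: " << 1.0 - similarity << std::endl;
    return 0;
  }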

Multi-task

The multi-task library is appropriate for a model that has multiple sub-tasks. The Multi-task model in the Vitis AI Library has two sub-tasks: semantic segmentation and SSD detection. The following table lists the multi-task models supported by the Vitis AI Library.

Table 14. Multi-task Models
Number Model Name Framework
1 multi_task Caffe
2 MT-resnet18_mixed_pt PyTorch

Face Recognition

The face feature models are used for face recognition. They extract the features of a person's face, and the output of each model is a 512-dimensional feature vector. If you have two different images and want to know whether they show the same person, use these models to extract features from both images, and then use the calculation and mapping functions to obtain the similarity of the two images.

Figure 11: Face Recognition Example

The following table lists the face recognition models supported by the Vitis AI Library.

Table 15. Face Recognition Models
No Model Name Framework
1 facerec_resnet20 Caffe
2 facerec_resnet64
3 facerec-resnet20_mixed_pt PyTorch
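
To illustrate the comparison step only, the sketch below computes the cosine similarity of two 512-dimensional feature vectors produced by the face recognition models; the library also ships its own comparison and score-mapping helpers, which are not shown here.

  #include <array>
  #include <cmath>
  #include <cstddef>

  // Cosine similarity of two 512-dimensional face feature vectors. How the raw
  // similarity is mapped to a final matching score is model-specific.
  float cosine_similarity(const std::array<float, 512>& a,
                          const std::array<float, 512>& b) {
    float dot = 0.f, norm_a = 0.f, norm_b = 0.f;
    for (std::size_t i = 0; i < a.size(); ++i) {
      dot += a[i] * b[i];
      norm_a += a[i] * a[i];
      norm_b += b[i] * b[i];
    }
    return dot / (std::sqrt(norm_a) * std::sqrt(norm_b) + 1e-12f);
  }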

Plate Detection

The Plate Detection library uses the DenseBox neural network to detect license plates. The input is a picture of the vehicle that has been detected by SSD, and the output is a structure containing the plate location information. The following image shows the result of plate detection.

Figure 12: Plate Detection Example

The following table lists the plate detection models supported by the Vitis AI Library.

Table 16. Plate Detection Models
No Model Name Framework
1 plate_detect Caffe

Plate Recognition

The Plate Recognition library uses a classification network to recognize license plate numbers (Chinese license plates only). The input is a picture of the license plate that has been detected by the plate detection network. The output is a structure containing the license plate number information. The following image shows the result of plate recognition.

Figure 13: Plate Recognition Example

The following table lists the plate recognition models supported by the Vitis AI Library.

Table 17. Plate Recognition Models
No Model Name Framework
1 plate_num Caffe

Medical Segmentation

Endoscopy is a common clinical procedure for the early detection of cancers in hollow organs, such as nasopharyngeal cancer, esophageal adenocarcinoma, gastric cancer, colorectal cancer, and bladder cancer. Accurate and temporally consistent localization and segmentation of diseased regions of interest enable precise quantification and mapping of lesions from clinical endoscopy videos, which is critical for monitoring and surgical planning.

The medical segmentation model is used to classify diseased regions of interest in the input image. The regions can be classified into several categories, including BE, cancer, HGD, polyp, and suspicious.

Libmedicalsegmentation is a segmentation library which can be used in segmentation of multi-class diseases in endoscopy. It offers simple interfaces for developers to deploy segmentation tasks on Xilinx FPGAs. The following is an example of medical segmentation, where the goal is to mark the diseased region.

Figure 14: Marking the Diseased Region

The following is an example of semantic segmentation, where the goal is to predict class labels for each pixel in the image.

Figure 15: Medical Segmentation Example

The following table lists the medical segmentation models supported by the Vitis AI Library.

Table 18. Medical Segmentation Models
No Model Name Framework
1 FPN_Res18_Medical_segmentation Caffe

Medical Detection

The RefineDet model is based on VGG16. It is used for medical detection and can detect five types of diseases, namely BE, cancer, HGD, polyp, and suspicious, from an input endoscopy image such as those in the Endoscopy Disease Detection and Segmentation database (EDD2020).

Figure 16: Medical Detection Example

The following table lists the medical detection models supported by the Vitis AI Library.

Table 19. Medical Detection Models
No Model Name Framework
1 RefineDet-Medical_EDD_tf TensorFlow

Medical Cell Segmentation

The nucleus is an organelle present within all eukaryotic cells, including human cells. Aberrant nuclear shape can be used to identify cancer cells, for example, in pap smear tests for the diagnosis of cervical cancer. Medical cell segmentation models offer nuclear segmentation in digital microscopic tissue images, which enables the extraction of high-quality features for nuclear morphometric and other analyses in computational pathology. The following images show the results of cell segmentation.

Figure 17: Medical Cell Segmentation Examples

The following table lists the Medical Cell Segmentation models supported by the Vitis AI Library.

Table 20. Medical Cell Segmentation Models
No Model Name Framework
1 medical_seg_cell_tf2 TensorFlow

Retinaface

The retinaface network is used to detect human faces and face landmarks. The input is a picture with the faces you would like to detect, and the output contains the face positions, scores, and face landmarks.

Figure 18: Retinaface Detection Example

The following table lists the retinaface detection models supported by the Vitis AI Library.

Table 21. Retinaface Detection Models
No Model Name Framework
1 retinaface Caffe

Face Quality

The Face Quality library uses the face quality network to compute the quality score of a face. If a face is clear and frontal, the score is high; a blurry face or a profile face receives a low score. The score ranges from 0 to 1. The library also provides the face landmark positions. The input is a face that has been detected by the face detection network, and the output contains the quality score and five landmark key points.

Figure 19: Face Quality Example

The following table lists the face quality models supported by the Vitis AI Library.

Table 22. Face Quality Models List
No Model Name Framework
1 face-quality Caffe
2 face-quality_pt PyTorch

Hourglass Pose Detection

The Hourglass library is used to detect the posture of the human body. The posture is represented by an array of 16 joint points, arranged in the following order:
0 - r ankle, 1 - r knee, 2 - r hip, 3 - l hip, 4 - l knee, 5 - l ankle,
6 - pelvis, 7 - thorax, 8 - upper neck, 9 - head top, 10 - r wrist,
11 - r elbow, 12 - r shoulder, 13 - l shoulder, 14 - l elbow, 15 - l wrist

This network can detect the posture of only one person in the input image. The input of the network is 256x256. The following image shows the result of hourglass detection.

Note: Use a square picture for input. If you need to detect pictures of other size ratios, use a network with the same input size ratio.

The following table lists the hourglass models supported by the Vitis AI library.

Table 23. Hourglass Models
No Model Name Framework
1 hourglass-pe_mpii Caffe

Pointpillars

Object detection in point clouds is an important aspect of many robotics applications such as autonomous driving. The pointpillars model is a novel deep network and encoder that can be trained end-to-end on LiDAR point clouds. It offers the best architecture for 3D object detection from LiDAR. The following image shows the result of a pointpillar test.

Figure 20: Pointpillars Test Example

The following table lists the pointpillars models supported by the Vitis AI Library.

Table 24. Pointpillar Models
No Model Name Framework
1 pointpillars_kitti_12000_0_pt PyTorch
2 pointpillars_kitti_12000_1_pt PyTorch

3D Segmentation

The 3D segmentation library can support the SalsaNext model, which is used for the uncertainty-aware semantic segmentation of a full 3D LiDAR point cloud in real-time. SalsaNext is the next version of SalsaNet which has an encoder-decoder architecture, where the encoder unit has a set of ResNet blocks and the decoder unit combines upsampled features from the residual blocks.

The following table lists the 3D segmentation models supported by the Vitis AI library.

Table 25. 3D Segmentation Models
No Model Name Framework
1 salsanext_pt PyTorch
2 salsanext_v2_pt PyTorch

Covid19 Segmentation

The Covid19 segmentation library can support the COVID-Net model which is a deep convolutional neural network design tailored for the detection of COVID-19 cases from chest X-ray (CXR) images.

The following table lists the Covid19 segmentation models supported by the Vitis AI Library.

Table 26. Covid19 Segmentation Models
No Model Name Framework
1 FPN-resnet18_covid19-seg_pt PyTorch

Bayesian Crowd Counting

Bayesian Crowd Counting is a neural network that is used for crowd counting. The input is a picture of a crowd whose size you would like to estimate. The output is the estimated crowd count, together with the density map of the input image. The following image shows the result of a Bayesian Crowd Counting test.

Figure 21: Bayesian Crowd Counting Test Example




The following table lists the BCC models supported by the Vitis AI library.

Table 27. BCC Models
No Model Name Framework
1 bcc_pt PyTorch

Production Recognition

The PMG model can be used for fine-grained product recognition, for example, on the RP2K dataset. The model is ResNet18-based and the detailed model structure is shown in the figure below. On the RP2K dataset, this model achieves 96.4% top-1 float accuracy with 13.82M parameters and 2.28G FLOPs. The final deployment and quantization top-1 accuracies are 96.19% and 96.18%, respectively.

Figure 22: Production Recognition Example

The following table lists the PMG models supported by the Vitis AI library.

Table 28. PMG Models
No Model Name Framework
1 pmg_pt PyTorch

SA-Gate Segmentation

SA-Gate is a neural network that is used for indoor segmentation. The input is a pair of images: an RGB image and an HHA map generated from the depth map. The output is a heat map in which each pixel is assigned a semantic category, such as chair, bed, and other common indoor objects.

The following image shows the result of SA-Gate segmentation.

Figure 23: SA-Gate Segmentation Test Example




The following table lists the SA-Gate models supported by the Vitis AI library.

Table 29. SA-Gate Segmentation Models
No Model Name Framework
1 SA_gate_pt PyTorch

RCAN Super Resolution

The RCAN model is a super-resolution network: it reconstructs the corresponding high-resolution image from a low-resolution input, enlarging the width and height of the original image by a factor of two. It has important application value in fields such as surveillance equipment, satellite imagery, and medical imaging. The following images show the result of RCAN; the image is still clear after zooming in.

Figure 24: RCAN Super Resolution Example


The following table lists the RCAN super resolution models supported by the Vitis AI Library.

Table 30. RCAN Super Resolution Models
No Model Name Framework
1 rcan_pruned_tf TensorFlow

PointPainting

For AD/ADAS systems, sensor-fusion algorithms play a significant role in providing high-quality perception and increasing the safety level for driving. PointPainting provides a sensor-fusion framework that takes advantage of 2D semantic segmentation and 3D object detection models. Specifically, a network is first applied to the camera images for semantic segmentation. Based on the semantic information and the calibration information (of the camera and LiDAR), the LiDAR point clouds are projected onto the images and fused with the semantic information to obtain the painted point clouds. Finally, the painted point clouds are consumed by the 3D object detector to achieve better perception.

Figure 25: PointPainting Example

The following table lists the PointPainting models supported by the Vitis AI library.

Table 31. PointPainting Models
No Model Name Framework
1 pointpainting_nuscenes_40000_64_0_pt PyTorch
2 pointpainting_nuscenes_40000_64_1_pt PyTorch
3 semanticfpn_nuimage_576_320_pt PyTorch

Pointpillars_nuscenes

PointPillars is an efficient network for real-time 3D object detection on point clouds. Trained on the nuScenes dataset, this model gives 3D bounding boxes and speed predictions for ten classes (including several kinds of vehicles, pedestrians, barriers, and traffic cones) in the surround-view range. With multi-sweep point clouds as input, PointPillars can achieve higher accuracy of 3D object detection and speed estimation at the cost of increased complexity in the pre-processing part.

Figure 26: Pointpillars_nuscenes Example

The following table lists the Pointpillars_nuscenes models supported by the Vitis AI library.

Table 32. Pointpillars_nuscenes Models
No Model Name Framework
1 pointpillars_nuscenes_40000_64_0_pt PyTorch
2 pointpillars_nuscenes_40000_64_1_pt PyTorch

Multi-task V3

Multi-task V3 aims to perform different tasks in autonomous driving scenarios simultaneously while achieving good performance and efficiency. The tasks include object detection, segmentation, lane detection, drivable area segmentation, and depth estimation, which are important components of the autonomous driving perception module.

Figure 27: Multi-task V3 Example

The following table lists the multi-task v3 models supported by the Vitis AI library.

Table 33. Multi-task V3 Models
No Model Name Framework
1 multi_task_v3_pt PyTorch

Centerpoint

4D radar is a high-resolution, long-range radar sensor that detects not only the distance, relative speed, and azimuth of objects, but also their height above the road level. Unlike LiDAR, it works well in all weather conditions, including fog and heavy rain. A state-of-the-art anchor-free 3D object detector, CenterPoint, is used. It is trained on the 4D radar data of the open Astyx dataset. Because the annotated samples are limited and the 4D radar point clouds are sparse, the 3D bounding box predictions are naturally limited in quality. It is observed that although vehicles near the ego car can be correctly detected, there are still some false positive predictions, and some objects at longer distances are missed. 4D radar object detection fused with camera images could boost the performance by a large margin.

Centerpoint model is used for 4D radar detection and the following figure shows the result of Centerpoint model.

Figure 28: Centerpoint Example


The following table lists the Centerpoint models supported by the Vitis AI library.

Table 34. Centerpoint Models
No Model Name Framework
1 centerpoint_0_pt PyTorch
2 centerpoint_1_pt PyTorch

Depth Estimation

FADNet is a model used for depth estimation. It is a fast and accurate network for disparity estimation. It has three main features:

  1. It exploits efficient 2D-based correlation layers with stacked blocks to preserve fast computation.
  2. It combines the residual structures to make the deeper model easier to learn.
  3. It contains multi-scale predictions so as to exploit a multi-scale weight scheduling training technique to improve the accuracy.

The following images show the result of depth estimation. The first image is the left camera image input, the second image is the right camera image input and the third image is the running result of the FADNet model.

Figure 29: FADNet Depth Estimation Example






The following table lists the depth estimation models supported by the Vitis AI library.

Table 35. Depth Estimation Models
No Model Name Framework
1 FADNet_0_pt PyTorch
2 FADNet_1_pt PyTorch
3 FADNet_2_pt PyTorch

Model Samples

Currently, there are 37 model samples that are located in ~/Vitis-AI/demo/Vitis-AI-Library/samples. Each sample has the following four kinds of test samples:

  • test_jpeg_[model type]
  • test_video_[model type]
  • test_performance_[model type]
  • test_accuracy_[model type]

Take YOLOv3 as an example.

  1. Before you run the YOLOv3 detection example, you can choose one of the following yolov3 models to run:
    1. yolov3_bdd
    2. yolov3_voc
    3. yolov3_voc_tf
  2. Ensure that the following test programs exist:
    1. test_jpeg_yolov3
    2. test_video_yolov3
    3. test_performance_yolov3
    4. test_accuracy_yolov3_bdd
    5. test_accuracy_yolov3_adas_pruned_0_9
    6. test_accuracy_yolov3_voc
    7. test_accuracy_yolov3_voc_tf

    If the executable program does not exist, you have to cross-compile it on the host and then copy it to the target.

  3. To test the image data, execute the following command:
    #./test_jpeg_yolov3 yolov3_bdd sample_yolov3.jpg

    The result is printed on the terminal. Also, you can view the output image: sample_yolov3_result.jpg.

  4. To test the video data, execute the following command:
    #./test_video_yolov3 yolov3_bdd video_input.mp4 -t 8
  5. To test the model performance, execute the following command:
    #./test_performance_yolov3 yolov3_bdd test_performance_yolov3.list -t 8
    The result is printed on the terminal.
  6. To test the model accuracy, prepare your own image dataset, image list file and the ground truth of the images. Then execute the following command:
    #./test_accuracy_yolov3_bdd [image_list_file] [output_file]

After the output_file is generated, a script is needed to automatically compare the results against the ground truth. Finally, the accuracy result can be obtained.