TensorFlow 2.x Version (vai_q_tensorflow2)
Installing vai_q_tensorflow2
You can install vai_q_tensorflow2 in the following two ways:
Install Using Docker Container
Vitis AI provides a Docker container for the quantization tools, including vai_q_tensorflow2. After starting the container, activate the Conda environment vitis-ai-tensorflow2:
conda activate vitis-ai-tensorflow2
If there is a patch package, install the vitis-ai-tensorflow2 patch package inside the Docker container.
# [optional]
$ sudo env CONDA_PREFIX=/opt/vitis_ai/conda/envs/vitis-ai-tensorflow2/ PATH=/opt/vitis_ai/conda/bin:$PATH conda install patch_package.tar.bz2
Install from Source Code with the Wheel Package
vai_q_tensorflow2 is a fork of the TensorFlow Model Optimization Toolkit. It is open source in Vitis_AI_Quantizer. To build and install vai_q_tensorflow2, run the following commands:
$ sh build.sh
$ pip install pkgs/*.whl
Install from Source Code with the Conda Package
# CPU-only version
$ conda build vai_q_tensorflow2_cpu_feedstock --output-folder ./conda_pkg/
# GPU version
$ conda build vai_q_tensorflow2_gpu_feedstock --output-folder ./conda_pkg/
# Install conda package on your machine
$ conda install --use-local ./conda_pkg/linux-64/*.tar.bz2
Running vai_q_tensorflow2
The TensorFlow2 quantizer supports two different approaches to quantize a deep learning model:
- Post-training quantization (PTQ)
- PTQ is a technique to convert a pre-trained float model into a quantized model with little degradation in model accuracy. A representative dataset is needed to run a few batches of inference on the float model to obtain the distributions of the activations. This is also called quantize calibration.
- Quantization aware training (QAT)
- QAT models the quantization errors in both the forward and backward passes during model quantization. For QAT, starting from a floating-point pre-trained model with good accuracy is recommended over starting from scratch.
Preparing the Float Model and Calibration Set
Before running vai_q_tensorflow2, prepare the float model and calibration set, including the files listed in the following table.
No. | Name | Description |
---|---|---|
1 | float model | Floating-point TensorFlow 2 models, either in h5 format or saved model format. |
2 | calibration dataset | A subset of the training dataset or validation dataset to represent the input data distribution, usually 100 to 1000 images are enough. |
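The calibration set does not need labels; it only needs to cover the input data distribution. As a minimal sketch (the directory path, image size, and preprocessing below are placeholder assumptions; match them to the float model's own input pipeline), a calibration dataset can be built with tf.data:
import tensorflow as tf

# Placeholder folder of calibration images; 100 to 1000 samples are usually enough.
calib_files = tf.data.Dataset.list_files('calib_images/*.jpg')

def preprocess(path):
    image = tf.io.read_file(path)
    image = tf.image.decode_jpeg(image, channels=3)
    image = tf.image.resize(image, [224, 224])   # assumed input size
    return image / 255.0                         # assumed normalization

calib_dataset = calib_files.map(preprocess).batch(10)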
Quantizing Using the vai_q_tensorflow2 API
The following code shows how to do post-training quantization with the vai_q_tensorflow2 API. You can find a full example here.
float_model = tf.keras.models.load_model('float_model.h5')
from tensorflow_model_optimization.quantization.keras import vitis_quantize
quantizer = vitis_quantize.VitisQuantizer(float_model)
quantized_model = quantizer.quantize_model(calib_dataset=calib_dataset, calib_steps=100, calib_batch_size=10)
- calib_dataset
- A representative dataset used for calibration. You can use all or part of the eval_dataset, the train_dataset, or other datasets (see the sketch after this list).
- calib_steps
- The total number of steps for calibration. It has a default value of None. If "calib_dataset" is a tf.data dataset, generator, or keras.utils.Sequence instance and calib_steps is None, calibration runs until the dataset is exhausted. This argument is not supported with array inputs.
- calib_batch_size
- The number of samples per batch for calibration. If "calib_dataset" is in the form of a dataset, generator, or keras.utils.Sequence instance, the batch size is controlled by the dataset itself. If "calib_dataset" is a numpy.array object, the default batch size is 32.
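For instance, a short sketch of the two common ways to supply the calibration set (eval_dataset and x_eval are placeholder names for an existing tf.data pipeline and a NumPy array, respectively):
# Reuse part of an existing tf.data evaluation pipeline as the calibration set.
calib_dataset = eval_dataset.take(100)
quantized_model = quantizer.quantize_model(calib_dataset=calib_dataset)

# Or pass a NumPy array directly; calib_batch_size then defaults to 32.
quantized_model = quantizer.quantize_model(calib_dataset=x_eval[:1000], calib_batch_size=32)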
vai_q_tensorflow2 Fast Finetuning
Generally, there is a small accuracy loss after quantization, but for some networks, such as MobileNets, the accuracy loss can be large. Fast finetuning uses the AdaQuant algorithm to adjust the weights and quantize parameters layer by layer with the unlabeled calibration dataset, which can improve accuracy for some models. It takes longer than normal PTQ but is still much shorter than QAT because the calib_dataset is much smaller than the training dataset. Fast finetuning is disabled by default; turn it on if you encounter accuracy issues. A recommended workflow is to first try PTQ without fast finetuning and then try quantization with fast finetuning if the accuracy is not acceptable. QAT is another way to improve accuracy, but it takes more time and requires the training dataset. You can activate fast finetuning by setting include_fast_ft=True during post-training quantization.
quantized_model = quantizer.quantize_model(calib_dataset=calib_dataset, calib_steps=None, calib_batch_size=None, include_fast_ft=True, fast_ft_epochs=10)
Here, include_fast_ft indicates whether to do fast finetuning, and fast_ft_epochs indicates the number of finetuning epochs for each layer.
Saving the Quantized Model
The quantized model is a standard tf.keras model object. You can save it by running the following command:
quantized_model.save('quantized_model.h5')
The generated quantized_model.h5 file can be fed to the vai_c_tensorflow compiler and then deployed on the DPU.
(Optional) Evaluating the Quantized Model
If you have scripts to evaluate float models, like the models in Xilinx Model Zoo, you can replace the float model file with the quantized model for evaluation. To support the customized quantize layers, the vitis_quantize module should be imported, for example:
from tensorflow_model_optimization.quantization.keras import vitis_quantize
quantized_model = tf.keras.models.load_model('quantized_model.h5')
After that, evaluate the quantized model just as the float model, for example:
quantized_model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(),
metrics= keras.metrics.SparseTopKCategoricalAccuracy())
quantized_model.evaluate(eval_dataset)
(Optional) Dumping the Simulation Results
You can use the VitisQuantizer.dump_model API of vai_q_tensorflow2 to dump the simulation results with the quantized model.
from tensorflow_model_optimization.quantization.keras import vitis_quantize
quantized_model = keras.models.load_model('./quantized_model.h5')
vitis_quantize.VitisQuantizer.dump_model(quantized_model,
    dump_dataset,
    output_dir='./dump_results')
Dump results are generated in ${dump_output_dir} after the command has successfully executed. Results for weights and activation of each layer are saved separately in the folder. For each quantized layer, results are saved in *.bin and *.txt formats. If the output of the layer is not quantized (such as for the softmax layer), the float activation results are saved in the *_float.bin and *_float.txt files. The / symbol is replaced by _ for simplicity. Examples for dumping results are shown in the following table.
Batch No. | Quantized | Layer Name | Saved Files (Weights) | Saved Files (Biases) | Saved Files (Activation) |
---|---|---|---|---|---|
1 | Yes | resnet_v1_50/conv1 | {output_dir}/dump_results_weights/quant_resnet_v1_50_conv1_kernel.bin {output_dir}/dump_results_weights/quant_resnet_v1_50_conv1_kernel.txt | {output_dir}/dump_results_weights/quant_resnet_v1_50_conv1_bias.bin {output_dir}/dump_results_weights/quant_resnet_v1_50_conv1_bias.txt | {output_dir}/dump_results_0/quant_resnet_v1_50_conv1.bin {output_dir}/dump_results_0/quant_resnet_v1_50_conv1.txt |
2 | No | resnet_v1_50/softmax | N/A | N/A | {output_dir}/dump_results_0/quant_resnet_v1_50_softmax_float.bin {output_dir}/dump_results_0/quant_resnet_v1_50_softmax_float.txt |
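As a hedged sketch for inspecting the dumped files listed above (it assumes the *.bin files hold the raw 8-bit quantized values; cross-check against the matching *.txt files):
import numpy as np

# Path taken from the table above with output_dir='./dump_results'.
# The int8 dtype is an assumption for 8-bit quantized weights.
weights = np.fromfile(
    './dump_results/dump_results_weights/quant_resnet_v1_50_conv1_kernel.bin',
    dtype=np.int8)
print(weights.size, weights.min(), weights.max())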
vai_q_tensorflow2 Quantization Aware Training
QAT is similar to the float model training/finetuning except that vai_q_tensorflow2 rewrites the float graph to convert it to a quantized model before the training starts. The typical workflow is as follows. You can find a complete example here.
1. Prepare the float model, dataset, and training scripts.
Before QAT, prepare the following files:
Table 3. Input Files for vai_q_tensorflow2 QAT
No. | Name | Description |
---|---|---|
1 | Float model | Floating-point model files to start from. Can be omitted if training from scratch. |
2 | Dataset | The training dataset with labels. |
3 | Training scripts | The Python scripts to run float training/finetuning of the model. |
2. (Optional) Evaluate the float model.
Evaluate the float model before QAT to check the correctness of the scripts and dataset. The accuracy and loss values of the float checkpoint can also serve as a baseline for QAT.
3. Modify the training scripts and run QAT.
Use the vai_q_tensorflow2 API, VitisQuantizer.get_qat_model, to convert the model to a quantized model and then proceed to training/finetuning with it. The following is an example:
model = tf.keras.models.load_model('float_model.h5')

# Call the vai_q_tensorflow2 API to create the quantize training model
from tensorflow_model_optimization.quantization.keras import vitis_quantize
quantizer = vitis_quantize.VitisQuantizer(model, quantize_strategy='8bit_tqt')
qat_model = quantizer.get_qat_model(
    init_quant=True,  # Initial PTQ gives a better starting state for the quantizers, especially for the 8bit_tqt strategy. Must be used together with calib_dataset.
    calib_dataset=calib_dataset)

# Then run the training process with this qat_model to get the quantize finetuned model.
# Compile the model
qat_model.compile(
    optimizer=RMSprop(learning_rate=lr_schedule),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    metrics=keras.metrics.SparseTopKCategoricalAccuracy())

# Start the training/finetuning
qat_model.fit(train_dataset)
Note: Vitis AI 1.4 supports the 8bit_tqt strategy. It uses trained thresholds in the quantizers and may give better results for QAT. By default, the Straight-Through Estimator is used. The 8bit_tqt strategy should only be used in QAT, together with init_quant=True, to get the best performance. Initialization with PTQ generates a better initial state for the quantizer parameters, especially for 8bit_tqt; otherwise, the training may not converge.
4. Save the model.
Call qat_model.save() to save the trained model, or use callbacks in qat_model.fit() to save the model periodically. For example:
# save the model manually
qat_model.save('trained_model.h5')

# save the model periodically during fit using callbacks
qat_model.fit(
    train_dataset,
    callbacks=[
        keras.callbacks.ModelCheckpoint(
            filepath='./quantize_train/',
            save_best_only=True,
            monitor="sparse_categorical_accuracy",
            verbose=1,
        )])
5. Convert to a deployable quantized model.
The trained/finetuned model must be modified to meet the compiler requirements. For example, if "train_with_bn" is set to TRUE, the bn layers and the dropout layers are not folded during training and must be folded before deployment. Some of the quantizer parameters may also vary during training and exceed the ranges permitted by the compiler; these must be corrected before deployment.
The get_deploy_model() function performs these conversions and generates a deployable model, as shown in the following example:
quantized_model = quantizer.get_deploy_model(qat_model)
quantized_model.save('quantized_model.h5')
6. (Optional) Evaluate the quantized model.
Call model.evaluate() on the eval_dataset to evaluate the quantized model, just like evaluating the float model.
from tensorflow_model_optimization.quantization.keras import vitis_quantize
quantized_model = tf.keras.models.load_model('quantized_model.h5')
quantized_model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    metrics=keras.metrics.SparseTopKCategoricalAccuracy())
quantized_model.evaluate(eval_dataset)
Note: Train or finetune the float model to good accuracy before proceeding to QAT.
Quantizing with Custom Layers
vai_q_tensorflow2 provides interfaces to load the custom layers that are available in some models. For example:
class MyCustomLayer(keras.layers.Layer):
    def __init__(self, units=32, **kwargs):
        super(MyCustomLayer, self).__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        self.w = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer="random_normal",
            trainable=True,
            name='w')
        self.b = self.add_weight(
            shape=(self.units,), initializer="zeros", trainable=True, name='b')

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b

    def get_config(self):
        base_config = super(MyCustomLayer, self).get_config()
        config = {"units": self.units}
        return dict(list(base_config.items()) + list(config.items()))
# Here is a float model with custom layer "MyCustomLayer", use custom_objects argument in tf.keras.models.load_model to load it.
float_model = tf.keras.models.load_model('float_model.h5', custom_objects={'MyCustomLayer': MyCustomLayer})
Here, the float model contains a custom layer named "MyCustomLayer". To load it into memory, use the custom_objects argument of the tf.keras.models.load_model API. Similarly, the VitisQuantizer class provides the custom_objects argument to handle custom layers. The following code is an example.
from tensorflow_model_optimization.quantization.keras import vitis_quantize
# Register the custom layer to VitisQuantizer by custom_objects argument.
quantizer = vitis_quantize.VitisQuantizer(float_model, custom_objects={'MyCustomLayer': MyCustomLayer})
quantized_model = quantizer.quantize_model(calib_dataset=calib_dataset, calib_steps=100, calib_batch_size=10)
You can find a complete example here.
With the default quantize strategy, custom layers are not quantized and remain in float during quantization because they are not in the list of supported APIs of vai_q_tensorflow2. An interface named 'custom_quantize_strategy' is provided for advanced users to build custom quantize strategies and run quantize experiments. The custom quantize strategy is a Dict object containing the quantize strategy items, or a JSON file of such a Dict.
The default quantize strategy provides an example of the format, and the custom quantize strategy follows the same format. Items in the custom quantize strategy override the matching items in the default strategy, while new items are added to the quantize strategy.
With this feature, you can quantize the 'MyCustomLayer'
layer from the previous example:
# Define quantizer with custom quantize strategy, which quantizes w,b and outputs 0 of MyCustomLayer objects.
my_quantize_strategy = {
    "quantize_registry_config": {
        "layer_quantize_config": [{
            "layer_type": "__main__.MyCustomLayer",
            "quantizable_weights": ["w", "b"],
            "weight_quantizers": [
                {"quantizer_type": "LastValueQuantPosQuantizer", "quantizer_params": {"bit_width": 8, "method": 1, "round_mode": 0}},
                {"quantizer_type": "LastValueQuantPosQuantizer", "quantizer_params": {"bit_width": 8, "method": 1, "round_mode": 0}}
            ],
            "quantizable_outputs": ["0"],
            "output_quantizers": [
                {"quantizer_type": "LastValueQuantPosQuantizer", "quantizer_params": {"bit_width": 8, "method": 1, "round_mode": 1}}
            ]
        }]
    }
}
quantizer = vitis_quantize.VitisQuantizer(float_model, custom_objects={'MyCustomLayer': MyCustomLayer}, custom_quantize_strategy=my_quantize_strategy)
# The rest of the quantization process is the same as before; here we run normal PTQ as an example
quantized_model = quantizer.quantize_model(calib_dataset=calib_dataset, calib_steps=100, calib_batch_size=10)
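The same strategy can also be kept in a standalone JSON file and passed by its path, because custom_quantize_strategy also accepts the file path of a strategy JSON file (a sketch; the file name is arbitrary):
import json

# Write the strategy Dict defined above to a JSON file.
with open('my_quantize_strategy.json', 'w') as f:
    json.dump(my_quantize_strategy, f, indent=2)

# Pass the JSON file path instead of the Dict object.
quantizer = vitis_quantize.VitisQuantizer(
    float_model,
    custom_objects={'MyCustomLayer': MyCustomLayer},
    custom_quantize_strategy='my_quantize_strategy.json')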
vai_q_tensorflow2 Supported Operations and APIs
The following table lists the supported operations and APIs for vai_q_tensorflow2.
Layer Types | Supported Layers | Description |
---|---|---|
Core | tf.keras.layers.InputLayer | |
Core | tf.keras.layers.Dense | |
Core | tf.keras.layers.Activation | If 'activation' is 'relu' or 'linear', the layer is quantized. If 'activation' is 'sigmoid' or 'swish', it is converted to hard-sigmoid or hard-swish and then quantized. Otherwise, the layer is not quantized. |
Convolution | tf.keras.layers.Conv2D | |
Convolution | tf.keras.layers.DepthwiseConv2D | |
Convolution | tf.keras.layers.Conv2DTranspose | |
Pooling | tf.keras.layers.AveragePooling2D | |
Pooling | tf.keras.layers.MaxPooling2D | |
Pooling | tf.keras.layers.GlobalAveragePooling2D | |
Normalization | tf.keras.layers.BatchNormalization | By default, BatchNormalization layers are fused with the previous convolution layers. If they cannot be fused, they are converted to depthwise convolutions. In QAT mode, BatchNormalization layers are pseudo-fused if train_with_bn is set to TRUE; they are fused when the get_deploy_model function is called. |
Regularization | tf.keras.layers.Dropout | By default, the dropout layers are removed. In QAT mode, dropout layers are retained if remove_dropout is set to FALSE; they are removed when the get_deploy_model function is called. |
Reshaping | tf.keras.layers.Reshape | |
Reshaping | tf.keras.layers.Flatten | |
Reshaping | tf.keras.layers.UpSampling2D | |
Reshaping | tf.keras.layers.ZeroPadding2D | |
Merging | tf.keras.layers.Concatenate | |
Merging | tf.keras.layers.Add | |
Merging | tf.keras.layers.Multiply | |
Activation | tf.keras.layers.ReLU | |
Activation | tf.keras.layers.Softmax | The input for the Softmax layer is quantized. It can run on the standalone Softmax IP for acceleration. |
Activation | tf.keras.layers.LeakyReLU | Only alpha=0.1 is supported on the DPU. For other alpha values, the layer is not quantized and is mapped to the CPU. |
vai_q_tensorflow2 Usage
vitis_quantize.VitisQuantizer
The construction function of the class VitisQuantizer.
vitis_quantize.VitisQuantizer(
float_model,
quantize_strategy='8bit',
custom_quantize_strategy=None,
custom_objects={})
Arguments
- float_model
- A tf.keras.Model object, the float model to be quantized.
- quantize_strategy
- A string object specifying the quantize strategy type. Available values are 8bit and 8bit_tqt. 8bit is the default strategy, which uses the Straight-Through Estimator. 8bit_tqt is a new strategy introduced in Vitis AI 1.4 that uses trained thresholds in the quantizers and may give better results for QAT. Note: The 8bit_tqt strategy should only be used in QAT, together with init_quant=True, to get the best performance.
- custom_quantize_strategy
- A string object, the file path of custom quantize strategy JSON file.
- custom_objects
- A Dict object, mapping names (strings) to custom classes or functions.
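For example, a minimal sketch of constructing a quantizer with the 8bit_tqt strategy and a custom layer (float_model and MyCustomLayer are assumed to be defined as in the earlier examples):
from tensorflow_model_optimization.quantization.keras import vitis_quantize

quantizer = vitis_quantize.VitisQuantizer(
    float_model,
    quantize_strategy='8bit_tqt',
    custom_objects={'MyCustomLayer': MyCustomLayer})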
vitis_quantize.VitisQuantizer.quantize_model
This function performs the post-training quantization (PTQ) of the float model, including model optimization, weights quantization, and activation quantize calibration.
vitis_quantize.VitisQuantizer.quantize_model(
calib_dataset=None,
calib_batch_size=None,
calib_steps=None,
verbose=0,
fold_conv_bn=True,
fold_bn=True,
replace_sigmoid=True,
replace_relu6=True,
include_cle=True,
cle_steps=10,
forced_cle=False,
include_fast_ft=False,
fast_ft_epochs=10)
Arguments
- calib_dataset
- A tf.data.Dataset, keras.utils.Sequence, or numpy array object, the representative dataset for calibration. You can use all or part of the eval_dataset, the train_dataset, or other datasets as the calib_dataset.
- calib_steps
- An int object, the total number of steps for calibration. It has a default value of None. If "calib_dataset" is a tf.data dataset, generator, or keras.utils.Sequence instance and calib_steps is None, calibration runs until the dataset is exhausted. This argument is not supported with array inputs.
- calib_batch_size
- An int object, the number of samples per batch for calibration. If "calib_dataset" is in the form of a dataset, generator, or keras.utils.Sequence instance, the batch size is controlled by the dataset itself. If "calib_dataset" is a numpy.array object, the default batch size is 32.
- fold_conv_bn
- A bool object, whether to fold the batch norm layers into the previous Conv2D/DepthwiseConv2D/TransposeConv2D/Dense layers.
- fold_bn
- A bool object, whether to convert the standalone batch norm layers into DepthwiseConv2D layers.
- replace_sigmoid
- A bool object, whether to replace Activation(activation='sigmoid') layers with hard sigmoid layers and quantize them. If not, the sigmoid layers are left unquantized and are scheduled on the CPU.
- replace_relu6
- A bool object, whether to replace ReLU6 layers with ReLU layers.
- include_cle
- A bool object, whether to do Cross-Layer Equalization before quantization.
- cle_steps
- An int object, the iteration steps for Cross-Layer Equalization.
- forced_cle
- A bool object, whether to do forced Cross-Layer Equalization for ReLU6 layers.
- include_fast_ft
- A bool object, whether to do fast finetuning. Fast finetuning adjusts the weights layer by layer with the calibration dataset and may get better accuracy for some models. It is disabled by default. It takes longer than normal PTQ but is still much shorter than QAT because the calib_dataset is much smaller than the training dataset. Turn it on if you encounter accuracy issues.
- fast_ft_epochs
- An int object, the number of fast finetuning epochs for each layer.
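For example, a sketch of a PTQ call that enables Cross-Layer Equalization and fast finetuning, using only the arguments listed above (quantizer and calib_dataset are assumed to exist):
quantized_model = quantizer.quantize_model(
    calib_dataset=calib_dataset,
    calib_steps=100,
    calib_batch_size=10,
    include_cle=True,
    cle_steps=10,
    include_fast_ft=True,
    fast_ft_epochs=10)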
vitis_quantize.VitisQuantizer.dump_model
This function dumps the simulation results of the quantized model, including weights and activation results.
vitis_quantize.VitisQuantizer.dump_model(
model,
dataset=None,
output_dir='./dump_results',
dump_float=False,
weights_only=False)
Arguments
- model
- A tf.keras.Model object, the quantized model to dump.
- dataset
- A tf.data.Dataset, keras.utils.Sequence, or numpy array object, the dataset used for dumping. It is not needed if weights_only is set to True.
- output_dir
- A string object, the directory to save the dump results.
- weights_only
- A bool object; set to True to dump only the weights, or set to False to also dump the activation results.
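For example, a short sketch that dumps only the weights; no dataset is needed when weights_only is set to True:
vitis_quantize.VitisQuantizer.dump_model(
    quantized_model,
    weights_only=True,
    output_dir='./dump_results')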
vitis_quantize.VitisQuantizer.get_qat_model
This function converts the float model and returns a model prepared for QAT.
vitis_quantize.VitisQuantizer.get_qat_model(
init_quant=False,
calib_dataset=None,
calib_batch_size=None,
calib_steps=None,
train_with_bn=False,
freeze_bn_delay=-1,
replace_sigmoid=True,
replace_relu6=True,
include_cle=True,
cle_steps=10,
forced_cle=False)
Arguments
- init_quant
- A bool object, whether to run an initial PTQ quantization before QAT. Running an initial PTQ quantization yields an improved initial state for the quantizer parameters, especially for the 8bit_tqt strategy. Otherwise, the training may not converge.
- calib_dataset
- A tf.data.Dataset, keras.utils.Sequence, or numpy array object, the representative dataset for calibration. It must be set when "init_quant" is set to True. You can use all or part of the eval_dataset, the train_dataset, or other datasets as the calib_dataset.
- calib_steps
- An int object, the total number of steps for the initial PTQ. It has a default value of None. If "calib_dataset" is a tf.data dataset, generator, or keras.utils.Sequence instance and calib_steps is None, calibration runs until the dataset is exhausted. This argument is not supported with array inputs.
- calib_batch_size
- An int object, the number of samples per batch for the initial PTQ. If "calib_dataset" is in the form of a dataset, generator, or keras.utils.Sequence instance, the batch size is controlled by the dataset itself. If "calib_dataset" is a numpy.array object, the default batch size is 32.
- train_with_bn
- A bool object, whether to keep the bn layers during QAT.
- freeze_bn_delay
- An int object, the number of training steps before freezing the bn parameters. The default value is -1, which means the bn parameters are never frozen.
- replace_sigmoid
- A bool object, whether to replace Activation(activation='sigmoid') layers with hard sigmoid layers and quantize them. If not, the sigmoid layers are left unquantized and are scheduled on the CPU.
- replace_relu6
- A bool object, whether to replace ReLU6 layers with ReLU layers.
- include_cle
- A bool object, whether to do Cross-Layer Equalization before quantization.
- cle_steps
- An int object, the iteration steps for Cross-Layer Equalization.
- forced_cle
- A bool object, whether to do forced Cross-Layer Equalization for ReLU6 layers.
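For example, a sketch that runs an initial PTQ and keeps the bn layers during training (the freeze_bn_delay value is an arbitrary placeholder; float_model and calib_dataset are assumed to exist):
quantizer = vitis_quantize.VitisQuantizer(float_model, quantize_strategy='8bit_tqt')
qat_model = quantizer.get_qat_model(
    init_quant=True,
    calib_dataset=calib_dataset,
    train_with_bn=True,
    freeze_bn_delay=1000)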
vitis_quantize.VitisQuantizer.get_deploy_model
This function converts the QAT model and generates the deployable model. The results can be fed into the vai_c_tensorflow compiler.
vitis_quantize.VitisQuantizer.get_deploy_model(model)
Arguments
- model
- A tf.keras.Model object, the QAT model to deploy.
Examples
Quantize
from tensorflow_model_optimization.quantization.keras import vitis_quantize
quantizer = vitis_quantize.VitisQuantizer(model)
quantized_model = quantizer.quantize_model(calib_dataset=calib_dataset)
Evaluate the Quantized Model
quantized_model.compile(loss=your_loss, metrics=your_metrics)
quantized_model.evaluate(eval_dataset)
Load the Quantized Model
from tensorflow_model_optimization.quantization.keras import vitis_quantize
with vitis_quantize.quantize_scope():
model = keras.models.load_model('./quantized_model.h5')
Dump the Quantized Model
from tensorflow_model_optimization.quantization.keras import vitis_quantize
with vitis_quantize.quantize_scope():
quantized_model = keras.models.load_model('./quantized_model.h5')
vitis_quantize.VitisQuantizer.dump_model(quantized_model, dump_dataset)