TensorFlow Version - vai_p_tensorflow

Exporting an Inference Graph

TensorFlow Model

First, build a TensorFlow graph for training and evaluation. Each part must be written in a separate script. If you have already trained a baseline model and still have the training code, you only need to prepare the evaluation code.

The evaluation script must contain a function named model_fn that creates all the needed nodes from input to output. The function should return either a dictionary that maps the names of output nodes to their operations or a tf.estimator.Estimator instance. For example, if your network is an image classifier, the returned dictionary usually includes operations to calculate top-1 and top-5 accuracy, as shown in the following snippet:

def model_fn():
  # graph definition codes here
  # ……
  return {
      'top-1': slim.metrics.streaming_accuracy(predictions, labels),
      'top-5': slim.metrics.streaming_recall_at_k(logits, org_labels, 5)
  }

Alternatively, if you use the TensorFlow Estimator API to train and evaluate your network, model_fn must return an instance of tf.estimator.Estimator. You must also provide a function named eval_input_fn, which the Estimator uses to get the evaluation data.

def cnn_model_fn(features, labels, mode):
  # codes for building graph here
  # …
  eval_metric_ops = {
      "accuracy": tf.metrics.accuracy(
          labels=labels, predictions=predictions["classes"])}
  return tf.estimator.EstimatorSpec(
      mode=mode, loss=loss, eval_metric_ops=eval_metric_ops)

def model_fn():
  return tf.estimator.Estimator(
      model_fn=cnn_model_fn, model_dir="./models/train/")

mnist = tf.contrib.learn.datasets.load_dataset("mnist")
train_data = mnist.train.images # Returns np.array
train_labels = np.asarray(mnist.train.labels, dtype=np.int32)
eval_data = mnist.test.images # Returns np.array
eval_labels = np.asarray(mnist.test.labels, dtype=np.int32)

def eval_input_fn():
  return tf.estimator.inputs.numpy_input_fn(
      x={"x": eval_data},
      y=eval_labels,
      num_epochs=1,
      shuffle=False)

The evaluation code is used to export an inference GraphDef file and to evaluate network performance during pruning.

To export a GraphDef proto file, use the following code:

import tensorflow as tf
from google.protobuf import text_format
from tensorflow.python.platform import gfile

with tf.Graph().as_default() as graph:
    # your graph definition here
    # ……
    graph_def = graph.as_graph_def()
    with gfile.GFile('inference_graph.pbtxt', 'w') as f:
        f.write(text_format.MessageToString(graph_def))
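
The exported file is plain text, so node names can be inspected without loading TensorFlow, which is handy when filling in flags that take node names later in this flow. The following is a minimal sketch and not part of vai_p_tensorflow: list_nodes and NODE_PATTERN are hypothetical names, the sample graph is made up, and the regular expression assumes the layout that protobuf's text_format normally writes (two-space indentation, with name followed by op).

```python
import re

# Matches the first two fields of each top-level node in a text-format GraphDef.
# Assumption: two-space indentation and name-then-op field order.
NODE_PATTERN = re.compile(
    r'^node \{\n  name: "([^"]+)"\n  op: "([^"]+)"', re.MULTILINE)


def list_nodes(pbtxt_text, op_type=None):
    """Return node names, optionally restricted to one op type such as 'Conv2D'."""
    return [name for name, op in NODE_PATTERN.findall(pbtxt_text)
            if op_type is None or op == op_type]


# A tiny made-up graph used only to demonstrate the helper.
sample = '''node {
  name: "input"
  op: "Placeholder"
}
node {
  name: "conv1/Conv2D"
  op: "Conv2D"
  input: "input"
}
'''

print(list_nodes(sample))
print(list_nodes(sample, op_type="Conv2D"))
```

For a real model, read the file contents with open('inference_graph.pbtxt').read() and pass the text to list_nodes.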

Keras Model

For a Keras model, there is no explicit graph definition. You must first get a GraphDef object and then export it. The following example uses the predefined tf.keras ResNet50 model:

import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.python.framework import graph_util

tf.keras.backend.set_learning_phase(0)
model = tf.keras.applications.ResNet50(weights=None,
    include_top=True,
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000)
model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy())
graph_def = K.get_session().graph.as_graph_def()

# "probs/Softmax": Output node of ResNet50 graph.
graph_def = graph_util.extract_sub_graph(graph_def, ["probs/Softmax"])
tf.train.write_graph(graph_def,
    "./",
    "inference_graph.pbtxt",
    as_text=True)

Preparing a Baseline Model

TensorFlow Model

TensorFlow saves variables in binary checkpoint files that map variable names to tensor values. vai_p_tensorflow takes a checkpoint file as input to load trained weights. The tf.train.Saver class provides methods to specify the paths of the checkpoint files to write to or read from.

The following snippet calls the tf.train.Saver.save method to save variables to checkpoint files:

with tf.Session() as sess:
  # your graph building codes here
  # ……
  # Create the saver after all variables have been defined.
  saver = tf.train.Saver()
  sess.run(train_op)

  # Save the variables to disk.
  save_path = saver.save(sess, "/tmp/model.ckpt")
  print("Model saved in path: %s" % save_path)

The saved checkpoint files look like this:

model.ckpt.data-00000-of-00001
model.ckpt.index
model.ckpt.meta
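
The path passed to the tool is the common prefix of these files (model.ckpt here), not any single file. As a rough sketch, the files behind a prefix can be listed and sanity-checked with the standard library alone. The helper names checkpoint_files and looks_complete are hypothetical, and the completeness rule (one index file plus at least one data shard) is an assumption rather than something the tool documents.

```python
import glob
import os
import tempfile


def checkpoint_files(prefix):
    """Return the files that share a checkpoint prefix such as '/tmp/model.ckpt'."""
    return sorted(glob.glob(glob.escape(prefix) + ".*"))


def looks_complete(prefix):
    """Heuristic check: an index file and at least one data shard exist."""
    files = checkpoint_files(prefix)
    has_index = (prefix + ".index") in files
    has_data = any(f.startswith(prefix + ".data-") for f in files)
    return has_index and has_data


# Demonstration with empty placeholder files in a temporary directory.
demo_dir = tempfile.mkdtemp()
demo_prefix = os.path.join(demo_dir, "model.ckpt")
for suffix in (".data-00000-of-00001", ".index", ".meta"):
    open(demo_prefix + suffix, "w").close()

print([os.path.basename(f) for f in checkpoint_files(demo_prefix)])
print(looks_complete(demo_prefix))
```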

Keras Model

tf.keras allows model weights to be saved in two formats: HDF5 and TensorFlow format. Currently, the tool supports only the TensorFlow format. If the model weights have been saved in HDF5, you must convert them to TensorFlow format.

import tensorflow as tf
tf.keras.backend.set_learning_phase(0)
model = tf.keras.applications.ResNet50(weights="imagenet",
    include_top=True,
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000)
model.save_weights("model.ckpt", save_format='tf')

The converted checkpoint files look like this:

model.ckpt.data-00000-of-00002
model.ckpt.data-00001-of-00002
model.ckpt.index
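
If you are unsure which format an existing weights file uses, one quick check is to look for the HDF5 file signature. This is a minimal sketch: is_hdf5 is a hypothetical helper name, and it assumes the signature sits at the very start of the file, which is the common case but not the only layout the HDF5 format permits.

```python
import tempfile

# The 8-byte signature that begins an HDF5 file.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def is_hdf5(path):
    """Return True if the file starts with the HDF5 signature."""
    with open(path, "rb") as f:
        return f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


# Demonstration with two small placeholder files (not real weights).
with tempfile.NamedTemporaryFile(suffix=".h5", delete=False) as f:
    f.write(HDF5_SIGNATURE + b"\x00" * 8)
    fake_h5 = f.name
with tempfile.NamedTemporaryFile(suffix=".index", delete=False) as f:
    f.write(b"not an hdf5 file")
    fake_index = f.name

print(is_hdf5(fake_h5), is_hdf5(fake_index))
```

A file that passes this check needs to be loaded into the model and saved again with save_format='tf' before it can be used with the tool.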

Performing Model Analysis

Before pruning a model, you need to analyze it. The main purpose of this analysis is to find a suitable pruning strategy for the model.

To run model analysis, you need to provide a Python script containing the functions that evaluate model performance. Assuming that your script is eval_model.py, you must provide the required functions in one of three ways:

A function named model_fn() that returns a Python dict of metric ops:

def model_fn():
  tf.logging.set_verbosity(tf.logging.INFO)
  img, labels = get_one_shot_test_data(TEST_BATCH)

  logits = net_fn(img, is_training=False)
  predictions = tf.argmax(logits, 1)
  labels = tf.argmax(labels, 1)
  eval_metric_ops = {
      'accuracy': tf.metrics.accuracy(labels, predictions),
      'recall_5': tf.metrics.recall_at_k(labels, logits, 5)
  }
  return eval_metric_ops

A function named model_fn() that returns an instance of tf.estimator.Estimator and a function named eval_input_fn() that feeds test data to the estimator:

def model_fn():
  return tf.estimator.Estimator(
      model_fn=cnn_model_fn, model_dir="./models/train/")

def eval_input_fn():
  return tf.estimator.inputs.numpy_input_fn(
      x={"x": eval_data},
      y=eval_labels,
      num_epochs=1,
      shuffle=False)

A function named evaluate() that takes a single argument, the checkpoint path, and returns the metric score:

def evaluate(checkpoint_path):
  with tf.Graph().as_default():
    net = ConvNet(False)
    net.build(test_only=True)
    score = net.evaluate(checkpoint_path)
    return score

If you are using the tf.keras API, this is the recommended way:

import tensorflow as tf

def evaluate(checkpoint_path):
  net = tf.keras.applications.ResNet50(weights=None,
      include_top=True,
      input_tensor=None,
      input_shape=None,
      pooling=None,
      classes=1000)
  net.load_weights(checkpoint_path)
  metric_top_5 = tf.keras.metrics.SparseTopKCategoricalAccuracy()
  accuracy = tf.keras.metrics.SparseCategoricalAccuracy()
  loss = tf.keras.losses.SparseCategoricalCrossentropy()
  # The model must be compiled with the loss and metrics before evaluation.
  net.compile(loss=loss, metrics=[accuracy, metric_top_5])

  # eval_data: validation dataset. Refer to the 'tf.keras.Model.evaluate' method to generate your validation dataset.
  # EVAL_NUM: the number of samples in the validation dataset
  res = net.evaluate(eval_data,
      steps=EVAL_NUM // batch_size,
      workers=16,
      verbose=1)
  # res holds the loss followed by the metrics; the last entry is the top-5 metric.
  eval_metric_ops = {'Recall_5': res[-1]}
  return eval_metric_ops
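
Before running the tool, it can help to confirm that the script defines one of the entry points described above. The following is a rough sketch using only the standard library. The function detect_eval_entry_point and its return labels are hypothetical, and the check is a heuristic: it looks at top-level function names only, so it cannot tell what model_fn actually returns.

```python
import ast


def detect_eval_entry_point(source):
    """Guess which evaluation convention a script follows from its function names."""
    tree = ast.parse(source)
    # Only top-level function definitions are considered.
    names = {node.name for node in tree.body
             if isinstance(node, ast.FunctionDef)}
    if "model_fn" in names and "eval_input_fn" in names:
        return "estimator"
    if "model_fn" in names:
        return "metric_ops"
    if "evaluate" in names:
        return "evaluate"
    return None


# Made-up script bodies used only for demonstration.
dict_style = "def model_fn():\n  return {}\n"
estimator_style = "def model_fn():\n  pass\n\ndef eval_input_fn():\n  pass\n"
evaluate_style = "def evaluate(checkpoint_path):\n  return 0.0\n"

for script in (dict_style, estimator_style, evaluate_style):
    print(detect_eval_entry_point(script))
```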

Assuming you write the script in the first way, the following command shows how to call vai_p_tensorflow to perform model analysis.

vai_p_tensorflow \
  --action=ana \
  --input_graph=inference_graph.pbtxt \
  --input_ckpt=model.ckpt \
  --eval_fn_path=eval_model.py \
  --target="recall_5" \
  --max_num_batches=500 \
  --workspace=/tmp \
  --exclude="conv node names that are excluded from pruning" \
  --output_nodes="output node names of the network"

Following are the arguments in this command. See vai_p_tensorflow Usage for a full list of options.

--action: The action to perform.
--input_graph: A GraphDef proto file that represents the inference graph of the network.
--input_ckpt: The path to a checkpoint to use for pruning.
--eval_fn_path: The path to a Python script defining an evaluation graph.
--target: The target score that evaluates the performance of the network. If there is more than one score in the network, you should choose the one that is most important.
--max_num_batches: The number of batches to run in the evaluation phase. This parameter affects the time taken to analyze the model: the larger the value, the longer the analysis takes and the more accurate it is. The maximum value of this parameter is the size of the validation set divided by batch_size, in which case all the data in the validation set is used for evaluation.
--workspace: Directory for saving output files.
--exclude: Convolution nodes excluded from pruning.
--output_nodes: Output nodes of the inference graph.
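
The command above can also be assembled from Python, for example when the analysis is launched from a script. This is a minimal sketch under stated assumptions: build_ana_command and max_batches are hypothetical helper names, only the flags listed above are used, the node name and dataset size are made up, and integer division is assumed for the batch count, so a final partial batch is not counted.

```python
import shlex


def max_batches(num_examples, batch_size):
    """Largest useful --max_num_batches: validation set size divided by batch size.

    Assumption: integer division, so a final partial batch is dropped.
    """
    return num_examples // batch_size


def build_ana_command(input_graph, input_ckpt, eval_fn_path, target,
                      workspace, output_nodes, exclude=None,
                      max_num_batches=None):
    """Return the argument list for the analysis step."""
    cmd = [
        "vai_p_tensorflow",
        "--action=ana",
        "--input_graph=" + input_graph,
        "--input_ckpt=" + input_ckpt,
        "--eval_fn_path=" + eval_fn_path,
        "--target=" + target,
        "--workspace=" + workspace,
        "--output_nodes=" + output_nodes,
    ]
    # Optional flags are added only when a value is supplied.
    if exclude is not None:
        cmd.append("--exclude=" + exclude)
    if max_num_batches is not None:
        cmd.append("--max_num_batches=%d" % max_num_batches)
    return cmd


# Hypothetical values: 50,000 validation images, batch size 100.
cmd = build_ana_command(
    input_graph="inference_graph.pbtxt",
    input_ckpt="model.ckpt",
    eval_fn_path="eval_model.py",
    target="recall_5",
    workspace="/tmp",
    output_nodes="logits",  # made-up node name
    max_num_batches=max_batches(50000, 100),
)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

On a machine where the tool is installed, the list can be passed to subprocess.run(cmd, check=True).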

Starting Pruning Loop

Once the ana command has finished, you can start pruning the model. The prune command is very similar to the ana command and requires the same configuration file:

vai_p_tensorflow \
  --action=prune \
  --input_graph=inference_graph.pbtxt \
  --input_ckpt=model.ckpt \
  --output_graph=sparse_graph.pbtxt \
  --output_ckpt=sparse.ckpt \
  --workspace=/home/deephi/tf_models/research/slim \
  --sparsity=0.1 \
  --exclude="conv node names that are excluded from pruning" \
  --output_nodes="output node names of the network"

There is one new argument in this command:

--sparsity: The sparsity of the network after pruning. It is a value between 0 and 1. The larger the value, the sparser the model is after pruning.

When the prune command finishes, vai_p_tensorflow outputs the FLOPs of the network before and after pruning.

Finetuning the Pruned Model

The performance of the pruned model declines to some degree, and you need to fine-tune it to recover its performance. Fine-tuning a pruned model is basically the same as training a model from scratch, except that the hyper-parameters, such as the initial learning rate and the learning rate decay type, are different.

When pruning and fine-tuning are done, one iteration of pruning is complete. In general, to achieve a higher pruning rate without significant loss of performance, the model needs to be pruned several times. After every "prune-finetune" iteration, make two changes to the command before you run the next pruning:

Set the --input_ckpt flag to a checkpoint file generated in the previous fine-tuning process.
Increase the value of the --sparsity flag to prune more in the next iteration.
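
The two changes can be planned ahead of time. The sketch below only produces the sequence of --input_ckpt and --sparsity values for each iteration; it does not run the tool or any training. The names sparsity_schedule and plan_iterations, the step size, and the fine-tuned checkpoint path pattern are all hypothetical, since the real checkpoint names depend on your own fine-tuning script.

```python
def sparsity_schedule(first, step, last):
    """Return increasing sparsity targets from `first` up to and including `last`."""
    values = []
    k = 0
    while True:
        # Round to avoid floating-point drift such as 0.30000000000000004.
        value = round(first + k * step, 6)
        if value > last:
            break
        values.append(value)
        k += 1
    return values


def plan_iterations(baseline_ckpt, sparsities,
                    finetuned_pattern="finetune_{i}/model.ckpt"):
    """Pair each sparsity target with the checkpoint that iteration starts from."""
    plan = []
    input_ckpt = baseline_ckpt
    for i, sparsity in enumerate(sparsities, start=1):
        plan.append({"iteration": i,
                     "input_ckpt": input_ckpt,
                     "sparsity": sparsity})
        # The next iteration starts from this iteration's fine-tuned checkpoint.
        input_ckpt = finetuned_pattern.format(i=i)
    return plan


plan = plan_iterations("model.ckpt", sparsity_schedule(0.1, 0.1, 0.3))
for item in plan:
    print("--input_ckpt=%s --sparsity=%s" % (item["input_ckpt"], item["sparsity"]))
```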

Generating Dense Checkpoints

After a few iterations of pruning, you get a model that is smaller than its original size. To get a final model, perform a transformation of the model.

vai_p_tensorflow \
  --action=transform \
  --input_ckpt=model.ckpt-10000 \
  --output_ckpt=dense.ckpt

Transformation is only required after all iterations of pruning are completed. Do not run the transform command between each iteration of pruning.

Freezing the Graph

You now have a GraphDef file containing the architecture of the pruned model and a checkpoint file containing the trained weights. For prediction or quantization, merge these two files into a single pb file.

Freeze the graph using the following command:

freeze_graph \
    --input_graph=sparse_graph.pbtxt \
    --input_checkpoint=dense.ckpt \
    --input_binary=false \
    --output_graph=frozen.pb \
    --output_node_names="vgg_16/fc8/squeezed"

After completing all the previous steps, you should have the final output of pruning, frozen.pb. This file can be used for prediction or quantization. To get the FLOPs of the frozen graph, run the following command:

vai_p_tensorflow --action=flops --input_graph=frozen.pb --input_nodes=input --input_node_shapes=1,224,224,3 --output_nodes=vgg_16/fc8/squeezed

vai_p_tensorflow Usage

The following arguments are available when running vai_p_tensorflow:

Table 1. vai_p_tensorflow Arguments
Argument	Type	Action	Default	Description
action	string	-	""	Which action to run. Valid actions include 'ana', 'prune', 'transform', and 'flops'.
workspace	string	[‘ana’, ‘prune’]	""	Directory for saving output files.
input_graph	string	[‘ana’, ‘prune’, ‘flops’]	""	Path of a GraphDef protobuf file that defines the network’s architecture.
input_ckpt	string	[‘ana’, ‘prune’, ‘transform’]	""	Path of a checkpoint file. It is the prefix of filenames created for the checkpoint.
eval_fn_path	string	[‘ana’]	""	A Python file path used for model evaluation.
target	string	[‘ana’]	""	The output node name that indicates the performance of the model.
max_num_batches	int	[‘ana’]	None	Maximum number of batches to evaluate. By default, use all.
output_graph	string	[‘prune’]	""	Path of a GraphDef protobuf file for saving the pruned network.
output_ckpt	string	[‘prune’, ‘transform’]	""	Path of a checkpoint file for saving weights.
gpu	string	[‘ana’]	""	GPU device IDs to use separated by ‘,’.
sparsity	float	[‘prune’]	None	The desired sparsity of network after pruning.
exclude	repeated	[‘ana’, ‘prune’]	None	Convolution nodes excluded from pruning.
input_nodes	repeated	[‘flops’]	None	Input nodes of the inference graph.
input_node_shapes	repeated	[‘flops’]	None	Shape of input nodes.
output_nodes	repeated	[‘ana’, ‘prune’, ‘flops’]	None	Output nodes of the inference graph.
channel_batch	int	[‘prune’]	2	The number of output channels is a multiple of this value after pruning.
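
The Action column of Table 1 can be turned into a quick pre-flight check for wrapper scripts. This is a sketch only: the mapping below is transcribed from the table, unsupported_arguments is a hypothetical helper name, and whether the tool itself rejects or silently ignores an argument given with the wrong action is not stated here.

```python
# Argument name -> actions it applies to, transcribed from Table 1.
ARGUMENT_ACTIONS = {
    "workspace": {"ana", "prune"},
    "input_graph": {"ana", "prune", "flops"},
    "input_ckpt": {"ana", "prune", "transform"},
    "eval_fn_path": {"ana"},
    "target": {"ana"},
    "max_num_batches": {"ana"},
    "output_graph": {"prune"},
    "output_ckpt": {"prune", "transform"},
    "gpu": {"ana"},
    "sparsity": {"prune"},
    "exclude": {"ana", "prune"},
    "input_nodes": {"flops"},
    "input_node_shapes": {"flops"},
    "output_nodes": {"ana", "prune", "flops"},
    "channel_batch": {"prune"},
}


def unsupported_arguments(action, arguments):
    """Return the names in `arguments` that Table 1 does not list for `action`.

    Names that do not appear in the table at all are also returned.
    """
    return sorted(name for name in arguments
                  if action not in ARGUMENT_ACTIONS.get(name, set()))


print(unsupported_arguments("transform", ["input_ckpt", "output_ckpt", "sparsity"]))
print(unsupported_arguments("prune", ["input_graph", "input_ckpt", "sparsity"]))
```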