Pruning

Pruning Overview

Most neural networks are over-parameterized, containing significant redundancy beyond what is needed to achieve a given accuracy. “Pruning” is the process of eliminating redundant weights while keeping the accuracy loss as low as possible.

Figure 1: Pruning Methods

The simplest form of pruning is called “fine-grained pruning”. It removes individual weights and results in sparse weight matrices. The VAI pruner employs “coarse-grained pruning”, which eliminates neurons that do not contribute significantly to the network’s accuracy. For convolutional layers, coarse-grained pruning removes entire 3D kernels, so it is also called channel pruning.
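To make the difference between the two methods concrete, here is a minimal sketch on a toy convolutional layer. It assumes each 3D kernel is flattened into a plain list and that filters are ranked by their L1 norm, which is a common channel-ranking heuristic and not necessarily the criterion the VAI pruner uses. The function names are hypothetical.

```python
# Sketch: fine-grained vs. coarse-grained (channel) pruning on a toy layer.
# Assumptions: weights are nested lists shaped [out_channels][flattened 3D kernel],
# and filters are ranked by L1 norm (an illustrative heuristic, not VAI's API).

def fine_grained_prune(filters, threshold):
    """Zero individual weights below `threshold`; the shape is unchanged, so
    the result is a sparse weight matrix."""
    return [[w if abs(w) >= threshold else 0.0 for w in f] for f in filters]

def channel_prune(filters, keep):
    """Keep the `keep` filters with the largest L1 norm. Each dropped filter is
    an entire 3D kernel, so the layer loses whole output channels."""
    norms = [sum(abs(w) for w in f) for f in filters]
    ranked = sorted(range(len(filters)), key=lambda i: norms[i], reverse=True)
    kept = sorted(ranked[:keep])  # preserve the original channel order
    return kept, [filters[i] for i in kept]

# Four filters, each a flattened 3D kernel of four weights.
conv = [
    [0.9, -0.8, 0.7, 0.6],
    [0.01, 0.02, -0.01, 0.0],
    [0.5, 0.05, -0.4, 0.3],
    [-0.02, 0.03, 0.01, -0.01],
]

sparse = fine_grained_prune(conv, threshold=0.1)
kept, dense = channel_prune(conv, keep=2)
print(sparse[2])         # [0.5, 0.0, -0.4, 0.3]
print(kept, len(dense))  # [0, 2] 2
```

Fine-grained pruning leaves a layer of the same size full of zeros, which needs sparse-aware hardware to speed up; channel pruning yields a genuinely smaller layer.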

Pruning always reduces the accuracy of the original model. Retraining (fine-tuning) adjusts the remaining weights to recover accuracy.

Iterative Pruning

The VAI pruner is designed to reduce the number of model parameters while minimizing accuracy loss. It does this in an iterative process, as shown in the following figure. Pruning causes accuracy loss, and retraining (fine-tuning) recovers accuracy; pruning followed by fine-tuning forms one iteration. In the first iteration, the input model is the baseline model, which is pruned and then fine-tuned. In each subsequent iteration, the fine-tuned model from the previous iteration is pruned again. This process is usually repeated several times until the desired sparse model is obtained. A model cannot be pruned to its final size in a single step: once too many parameters are removed at once, the model's performance drops sharply and is difficult to restore.

IMPORTANT: The reduction parameter is gradually increased in every iteration to help recover accuracy more effectively during the fine-tuning stage.

By following this iterative process, higher pruning rates can be achieved without significant loss of model performance.

Figure 2: Iterative Process of Pruning
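The iterative process and the gradually increasing reduction parameter can be sketched as a small loop. This is a minimal illustration, assuming the reduction parameter is a cumulative fraction removed relative to the baseline and that it rises evenly each iteration; `prune` and `finetune` are hypothetical placeholders for whatever routines your framework provides, not VAI pruner calls.

```python
# Sketch: iterative pruning with a gradually increasing reduction ratio.
# Assumptions: `prune(model, ratio)` and `finetune(model)` are hypothetical
# callables, and each ratio is cumulative relative to the baseline model.

def reduction_schedule(target, iterations):
    """Ratios rising evenly from target/iterations up to target."""
    if not 0.0 < target < 1.0 or iterations < 1:
        raise ValueError("need 0 < target < 1 and iterations >= 1")
    step = target / iterations
    return [round(step * (i + 1), 4) for i in range(iterations)]

def iterative_prune(baseline, target, iterations, prune, finetune):
    model = baseline  # the first iteration starts from the baseline model
    for ratio in reduction_schedule(target, iterations):
        model = prune(model, ratio)  # accuracy drops here
        model = finetune(model)      # retraining recovers accuracy; the next
                                     # iteration prunes this fine-tuned model
    return model

# Toy stand-ins that only record the ratio applied in each iteration.
history = []

def toy_prune(model, ratio):
    history.append(ratio)
    return {"ratio": ratio}

def toy_finetune(model):
    return model

final = iterative_prune({"ratio": 0.0}, 0.6, 3, toy_prune, toy_finetune)
print(history)  # [0.2, 0.4, 0.6]
```

An even (linear) schedule is only one possible choice; the key property is that each iteration removes a modest additional amount, so fine-tuning starts from a model that is still close to recoverable.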

The VAI pruner performs four primary tasks:

  1. Analysis (ana): Perform a sensitivity analysis on the model to determine the optimal pruning strategy.
  2. Pruning (prune): Reduce the number of computations in the input model.
  3. Fine-tuning (finetune): Retrain the pruned model to recover accuracy.
  4. Transformation (transform): Convert the pruned sparse model into a dense model with fewer weights.
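The analysis task can be pictured as a per-layer sensitivity sweep. The sketch below is an illustrative assumption, not the VAI pruner's actual algorithm: a hypothetical `evaluate(layer, ratio)` prunes one layer at a time and returns validation accuracy, and each layer then gets the largest ratio whose accuracy drop stays within a tolerance.

```python
# Sketch of a per-layer sensitivity analysis. Assumptions: `evaluate` is a
# hypothetical callable that prunes only `layer` at `ratio` and returns
# validation accuracy; the tolerance rule is illustrative only.

def analyze(layers, ratios, evaluate, baseline_acc):
    """Return {layer: {ratio: accuracy drop}}."""
    return {
        layer: {r: baseline_acc - evaluate(layer, r) for r in ratios}
        for layer in layers
    }

def pick_ratios(sensitivity, max_drop):
    """Per layer, choose the largest ratio whose drop is within max_drop
    (0.0 if no ratio qualifies)."""
    chosen = {}
    for layer, drops in sensitivity.items():
        ok = [r for r, d in drops.items() if d <= max_drop]
        chosen[layer] = max(ok, default=0.0)
    return chosen

# Toy accuracy model: "conv1" is sensitive to pruning, "conv2" is robust.
def toy_evaluate(layer, ratio):
    slope = 0.4 if layer == "conv1" else 0.05
    return 0.90 - slope * ratio

sens = analyze(["conv1", "conv2"], [0.1, 0.3, 0.5], toy_evaluate, baseline_acc=0.90)
print(pick_ratios(sens, max_drop=0.05))  # {'conv1': 0.1, 'conv2': 0.5}
```

Sensitive layers end up pruned lightly and robust layers heavily, which is why the analysis benefits from as much validation data as possible.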

In general, follow these steps to prune a model. The steps are also shown in Figure 3.

  1. Analyze the original baseline model.
  2. Prune the input model.
  3. Finetune the pruned model.
  4. Repeat steps 2 and 3 several times.
  5. Transform the pruned sparse model into a final dense model.

Figure 3: Pruning Workflow
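Step 5 can be illustrated with two fully connected layers standing in for convolutions. This minimal sketch assumes the pruned sparse model marks each removed channel by zeroing its whole filter, that the layers have no bias, and that the activation maps 0 to 0 (ReLU here). Under those assumptions, dropping the zero filters and the matching input columns of the next layer produces smaller dense layers with identical outputs. The helper names are hypothetical.

```python
# Sketch of the transform step for channel pruning. Assumptions: removed
# channels are all-zero filters, layers have no bias, and ReLU(0) == 0.
# Layers are nested lists weights[out][in]; this is not the VAI pruner's API.

def to_dense(layer1, layer2):
    """Drop all-zero output channels of layer1 and the matching input
    columns of layer2."""
    keep = [i for i, row in enumerate(layer1) if any(w != 0.0 for w in row)]
    dense1 = [layer1[i] for i in keep]
    dense2 = [[row[i] for i in keep] for row in layer2]
    return dense1, dense2

def forward(layer1, layer2, x):
    # Two bias-free linear layers with ReLU in between.
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in layer1]
    return [sum(w * hi for w, hi in zip(row, h)) for row in layer2]

layer1 = [[1.0, 2.0], [0.0, 0.0], [0.5, -1.0]]  # channel 1 was pruned
layer2 = [[1.0, 3.0, 2.0], [-1.0, 4.0, 1.0]]
d1, d2 = to_dense(layer1, layer2)
x = [2.0, 0.5]
print(forward(layer1, layer2, x))  # [4.0, -2.5]
print(forward(d1, d2, x))          # [4.0, -2.5]
```

The dense model stores and computes only the surviving channels, which is where the size and speed savings of channel pruning actually appear.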

Guidelines for Better Pruning Results

The following suggestions help achieve better pruning results: a higher pruning rate with a smaller accuracy loss.

  1. Use as much data as possible for model analysis. Ideally, use the entire validation dataset, although this is time-consuming. You can use part of the validation set instead, but make sure it contains at least half of the data.
  2. During fine-tuning, experiment with a few hyperparameters, including the initial learning rate and the learning rate decay policy. Use the best result as the input to the next round of pruning.
  3. The data used in fine-tuning should be the same as the data used to train the baseline.
  4. If accuracy does not recover sufficiently after several fine-tuning experiments, reduce the pruning rate and re-run pruning and fine-tuning.
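Guidelines 2 and 4 can be combined into a small search loop. This is a sketch under stated assumptions: `finetune_and_eval(ratio, lr, decay)` is a hypothetical callable that prunes at `ratio`, fine-tunes with the given initial learning rate and decay policy, and returns validation accuracy; the back-off factor of 0.8 and the attempt limit are illustrative values, not recommendations from the VAI pruner.

```python
# Sketch of guidelines 2 and 4. Assumptions: `finetune_and_eval` is a
# hypothetical callable returning validation accuracy; `backoff` and
# `attempts` are illustrative defaults.
import itertools

def best_finetune(ratio, lrs, decays, finetune_and_eval):
    """Try every (learning rate, decay policy) pair; return the best
    (accuracy, lr, decay) tuple."""
    results = [
        (finetune_and_eval(ratio, lr, decay), lr, decay)
        for lr, decay in itertools.product(lrs, decays)
    ]
    return max(results)

def prune_with_backoff(ratio, min_acc, lrs, decays, finetune_and_eval,
                       backoff=0.8, attempts=5):
    """If no fine-tuning setting reaches min_acc, lower the pruning ratio
    and re-run pruning and fine-tuning."""
    for _ in range(attempts):
        acc, lr, decay = best_finetune(ratio, lrs, decays, finetune_and_eval)
        if acc >= min_acc:
            return ratio, lr, decay, acc
        ratio *= backoff
    return None  # no acceptable setting found within the attempt limit

# Toy accuracy model: heavier pruning hurts, one setting fine-tunes best.
def toy(ratio, lr, decay):
    bonus = 0.01 if (lr == 0.01 and decay == "cosine") else 0.0
    return 0.92 - 0.2 * ratio + bonus

result = prune_with_backoff(0.5, 0.84, [0.1, 0.01], ["step", "cosine"], toy)
print(result[:3])  # (0.4, 0.01, 'cosine')
```

Here the first ratio (0.5) cannot reach the accuracy target with any setting, so the loop backs off to 0.4 and keeps the best fine-tuning configuration found there.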