Model compression tool Distiller: INT8 quantization

1. Introduction to Distiller

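Distiller is an open-source Python package for neural network compression research. Network compression can reduce the memory footprint of a neural network, speed up inference and save energy. Distiller provides a PyTorch environment for prototyping and analyzing compression algorithms.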

Main features:

A framework for integrating pruning, regularization and quantization algorithms.
A set of tools for analyzing and evaluating compression performance.
Example implementations of state-of-the-art compression algorithms.

2. Installing Distiller

GitHub: https://github.com/NervanaSystems/distiller . It supports PyTorch 1.x (tested successfully with PyTorch 1.1 and Python 3.6).

User manual: https://nervanasystems.github.io/distiller/ . See the GitHub page for detailed installation instructions.

3. Using Distiller's quantization module

The quantization module and its example code are located in distiller/quantization and examples/classifier_compression.

The main features of the quantization module are:

Automatic mechanism to transform existing models to quantized versions, with customizable bit-width configuration for different layers. No need to re-write the model for different quantization methods.
Common bit-widths can be used for different models (1 to 8 bits, 16 bits, and FP16).
Post-training quantization of trained full-precision models, dynamic and static (statistics-based)
An already trained PyTorch model file can be quantized directly, often with little accuracy loss; both static and dynamic quantization are supported.
Support for quantization-aware training in the loop
The model can also be quantized during training.
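Distiller's post-training quantizer lives in range_linear.py, i.e. it is based on range-based linear quantization. The sketch below illustrates the general idea under simplifying assumptions: symmetric quantization with a single per-tensor scale, a configurable bit-width, and the difference between a static range (measured once on calibration data) and a dynamic range (recomputed per tensor). The function names are hypothetical and this is not Distiller's code.

```python
# Minimal sketch of symmetric, range-based linear quantization with a single
# per-tensor scale. Illustrative only: not Distiller's implementation, and
# the function names are hypothetical.

def symmetric_scale(max_abs, num_bits):
    """Scale that maps [-max_abs, max_abs] onto the signed integer grid."""
    q_max = 2 ** (num_bits - 1) - 1  # 127 for 8 bits, 7 for 4 bits
    return q_max / max_abs if max_abs > 0 else 1.0


def quantize(values, scale, num_bits):
    """Scale, round to the nearest integer and clamp to [-q_max, q_max]."""
    q_max = 2 ** (num_bits - 1) - 1
    return [max(-q_max, min(q_max, round(v * scale))) for v in values]


def dequantize(q_values, scale):
    """Map integers back to approximate floating-point values."""
    return [q / scale for q in q_values]


# Static (statistics-based): the range is measured once on calibration
# data and then kept fixed at inference time.
calibration_batch = [0.5, -1.2, 2.0, 0.1]
static_scale = symmetric_scale(max(abs(v) for v in calibration_batch), 8)

# Dynamic: the range is recomputed from each tensor at inference time.
new_batch = [3.0, -0.4, 1.0]
dynamic_scale = symmetric_scale(max(abs(v) for v in new_batch), 8)

# 3.0 lies outside the calibrated range [-2.0, 2.0], so the static scale clamps it to 127.
print(quantize(new_batch, static_scale, 8))
# The dynamic scale covers the whole batch, so nothing is clamped.
print(quantize(new_batch, dynamic_scale, 8))
```

The trade-off the sketch shows: static ranges avoid computing statistics at inference time but can clip values outside the calibrated range, while dynamic ranges adapt to each tensor at some runtime cost.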

The following is a brief guide to using the quantization module, first on Distiller's bundled models and then on your own model.

(1) Testing Distiller's bundled quantization examples

Distiller ships with several examples, such as ResNet20 + CIFAR-10 and ResNet50 + ImageNet.

The following test uses ResNet20 + CIFAR-10:

Locate the file compress_classifier.py (in examples/classifier_compression).

Open a terminal in that directory and run:

$ python3 compress_classifier.py -a resnet20_cifar ../../../data.cifar10 --resume ../ssl/checkpoints/checkpoint_trained_dense.pth.tar --quantize-eval --evaluate

Parameters: -a is the model name (here a model bundled with the tool; others include resnet32_cifar, resnet44_cifar, resnet56_cifar and so on. The CIFAR model code is in distiller/models/cifar10/resnet_cifar.py).

../../../data.cifar10 is the path to the dataset.

--resume is the path to the saved model (usually the FP32 model to be quantized). This option has since been renamed to --resume-from.

--quantize-eval quantizes the model before evaluation and saves the quantized model (it only takes effect when --evaluate is also set).

--evaluate evaluates the model on the test set.

For the other parameters, see distiller/apputils/image_classifier.py, distiller/quantization/range_linear.py and the parameter documentation on GitHub.

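During the run, the model is quantized and then evaluated on the test set, and the top-1 accuracy, top-5 accuracy and loss are printed. When the run finishes, the quantized model is saved under logs.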
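Top-1 and top-5 accuracy count a sample as correct when its true label is among the 1 or 5 highest-scoring classes. A minimal plain-Python sketch of the metric, using a hypothetical helper name and made-up scores (not Distiller's own evaluation code):

```python
def topk_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores in this row.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Hypothetical scores for 3 samples over 3 classes, and their true labels.
scores = [[0.1, 0.7, 0.2],
          [0.5, 0.3, 0.2],
          [0.2, 0.3, 0.5]]
labels = [1, 1, 0]

print(topk_accuracy(scores, labels, 1))  # top-1
print(topk_accuracy(scores, labels, 2))  # top-2; top-5 is the same with k=5
```

Comparing these numbers for the FP32 and the quantized model is how the accuracy cost of quantization is judged.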

(2) Quantizing your own model with Distiller (post-training)

Note: Distiller places requirements on how the model code is written; if they are not followed, the model may fail to quantize. See https://github.com/NervanaSystems/distiller/issues/316 for details.

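The code below quantizes your own model. The key step is the call to prepare_model, which takes a dummy input with the model's input shape; options such as the bit-width and the quantization mode are passed to the PostTrainLinearQuantizer constructor.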

The code:

import torch
import distiller.apputils as apputils
from distiller.quantization.range_linear import PostTrainLinearQuantizer
from your_model import your_model  # your own model definition (placeholder)

model_path = 'path/to/your_model.pth'  # your own FP32 weights (placeholder)
model = your_model()
# load_state_dict expects a state dict, not a file path
model.load_state_dict(torch.load(model_path))
model = model.cuda()
model.eval()

# Constructor arguments such as bits_activations, bits_parameters and mode
# control the bit-widths and the quantization mode (defaults: 8-bit, symmetric)
quantizer = PostTrainLinearQuantizer(model)
# Dummy input with your model's input shape, e.g. [1, 3, 32, 32]
quantizer.prepare_model(torch.rand(1, 3, 32, 32).cuda())
apputils.save_checkpoint(0, 'my_model', model, optimizer=None,
                         name='quantized', dir='quantization_model')

(3) Quantizing your own model with Distiller (quantization-aware training)

To be written.

4. Common questions

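Why doesn't the model get smaller after quantization, and why does inference even get slower?
See the quantization section of https://github.com/NervanaSystems/distiller/wiki/Frequently-Asked-Questions-(FAQ)
See also https://github.com/NervanaSystems/distiller/issues/159
Which operation types does Distiller's quantization support?
The current version supports conv (conv2d, conv3d), ReLU and FC layers.
Running example 3(1) fails with "Process finished with exit code 132 (interrupted by signal 4: SIGILL)". Why?
Cause: the model is placed on the CPU. In distiller/apputils/image_classifier.py, find the evaluate_model function and change model.cpu() to model.cuda().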
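On the first question above: as I understand the Distiller FAQ, its quantization is simulated, so the quantized values are still held in FP32 tensors. The plain-Python sketch below, with made-up example weights, shows why that leaves the storage size unchanged compared with true INT8 storage. It only illustrates storage cost and is not how Distiller stores checkpoints.

```python
from array import array

# Made-up FP32 weights for illustration.
weights = [0.12, -0.53, 0.98, -0.07]
scale = 127 / max(abs(w) for w in weights)  # symmetric 8-bit scale
q = [round(w * scale) for w in weights]     # integers in [-127, 127]

# Simulated ("fake") quantization: values lie on the INT8 grid but are
# stored back as 32-bit floats, so each element still takes 4 bytes.
fake_quant = array('f', [v / scale for v in q])

# True INT8 storage: 1 byte per element.
real_int8 = array('b', q)

print(fake_quant.itemsize * len(fake_quant))  # bytes used by simulated quantization
print(real_int8.itemsize * len(real_int8))    # bytes used by real INT8 storage
```

Getting the 4x size reduction (and integer-kernel speedups) requires exporting the model to a runtime that actually executes INT8 operations.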