Nets compression and speedup (presentation)


Slide 1
Deep neural networks compression

Alexander Chigorin
Head of research projects
VisionLabs

a.chigorin@visionlabs.ru


Slide 2
Deep neural networks compression. Motivation
Neural net architectures and size








Too much for some types of devices (mobile phones, embedded).

Some models can be compressed up to 50x without loss of accuracy.

Slide 3
Deep neural networks compression. Overview

Methods to review:

Learning both Weights and Connections for Efficient Neural Networks

Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights


Slide 4
Network Pruning


Slide 5
Network Pruning. Generic idea


Slide 6
Network Pruning. More details
Initial weights
Prune weights with small absolute values
Retrain the remaining weights
Repeat until accuracy is acceptable
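
The pruning loop above can be sketched in a few lines of NumPy (a minimal illustration; the threshold and toy weights are made up, and in the papers the mask is reapplied after every retraining update):

```python
import numpy as np

def prune(weights, threshold):
    """Zero out weights whose absolute value is below the threshold.
    The returned mask marks the surviving weights, so a retraining
    step can keep the pruned positions frozen at zero."""
    mask = np.abs(weights) > threshold
    return weights * mask, mask

# Toy weights (illustrative values only)
w = np.array([0.8, -0.05, 0.3, 0.01, -0.6])
pruned, mask = prune(w, threshold=0.1)
# pruned is [0.8, 0.0, 0.3, 0.0, -0.6]; 2 of 5 weights removed
```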

Slide 7
Network Pruning. Results
9-13x overall compression


Slide 8
Network Pruning. Results
~60% weight sparsity in conv layers
~96% weight sparsity in fc layers

Slide 9
Deep Compression
ICLR 2016 Best Paper


Slide 10
Deep Compression. Overview
Algorithm:

Iterative weights pruning

Weights quantization

Huffman encoding


Slide 11
Deep Compression. Weights pruning
Already discussed


Slide 12
Deep Compression. Weights quantization
Initial weights → cluster weights → centroids → fine-tuned centroids (retrain with weight sharing) → final weights.
Write the indexes to disk: each index can be compressed to 2 bits.
Write the codebook to disk: only 1/4 of the original weight storage remains.

~4x reduction
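
The clustering step can be sketched with a tiny Lloyd's k-means over the flattened weights (an illustration, not the paper's code; the `np.linspace` start mimics the linear centroid initialization used in Deep Compression, and the toy weights are made up):

```python
import numpy as np

def quantize_weights(weights, n_clusters=3, n_iters=20):
    """Cluster weights with plain Lloyd's k-means and replace each
    weight by the index of its nearest centroid (the codebook)."""
    flat = weights.ravel()
    # Linear initialization over the weight range
    centroids = np.linspace(flat.min(), flat.max(), n_clusters)
    for _ in range(n_iters):
        # Assign every weight to its nearest centroid
        idx = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
        # Move each centroid to the mean of its assigned weights
        for k in range(n_clusters):
            if np.any(idx == k):
                centroids[k] = flat[idx == k].mean()
    return idx.reshape(weights.shape), centroids

# Toy weights (illustrative values only)
w = np.array([0.9, -0.7, 0.85, 0.1, -0.65, 0.05])
idx, codebook = quantize_weights(w, n_clusters=3)
# Each weight is now a small integer index into `codebook`;
# with 4 clusters an index would need only 2 bits instead of 32.
```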


Slide 13
Deep Compression. Huffman coding
Huffman coding is lossless compression: its output is a variable-length code table for encoding source symbols.
Frequent symbols are encoded with fewer bits.
Distribution of the weight indexes: some indexes are much more frequent than others!
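
A minimal Huffman table builder over a stream of weight indexes (a sketch; the index counts below are made up, and real implementations also store the tree compactly):

```python
import heapq
from collections import Counter

def huffman_table(symbols):
    """Build a Huffman code table; frequent symbols get shorter codes."""
    freq = Counter(symbols)
    # Heap entries: (frequency, tie-breaker, {symbol: code-so-far})
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        # Prefix the codes of the two merged subtrees with 0 and 1
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

# Skewed stream of quantization indexes (made-up counts):
indexes = [0] * 8 + [1] * 3 + [2] * 2 + [3] * 1
table = huffman_table(indexes)
# The dominant index 0 gets a 1-bit code; rare indexes get 3 bits,
# so the stream costs 23 bits instead of 28 with fixed 2-bit codes.
```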


Slide 14
Deep Compression. Results
35-49x overall compression


Slide 15
Deep Compression. Results
~13x reduction (pruning)
~31x reduction (quantization)
~49x reduction (Huffman coding)


Slide 16
Incremental Network Quantization


Slide 17
Incremental Network Quantization. Idea
Idea:
quantize weights incrementally (as we do during pruning)
quantize weights to powers of 2
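
Rounding a weight to the nearest power of 2 can be sketched as follows (an illustration only: the exponent range and the zero cutoff here are assumptions, and INQ additionally quantizes just one partition of the weights per step before retraining the rest):

```python
import numpy as np

def quantize_pow2(w, exp_min=-3, exp_max=1):
    """Round each weight to the nearest signed power of 2 (or zero),
    keeping the exponent within [exp_min, exp_max]."""
    out = np.zeros_like(w)
    nz = w != 0
    exp = np.clip(np.round(np.log2(np.abs(w[nz]))), exp_min, exp_max)
    q = np.sign(w[nz]) * 2.0 ** exp
    # Weights well below the smallest representable magnitude become 0
    q[np.abs(w[nz]) < 2.0 ** exp_min / 2] = 0.0
    out[nz] = q
    return out

# Toy weights (illustrative values only)
w = np.array([0.9, -0.3, 0.06, 1.7, 0.0])
q = quantize_pow2(w)
# q is [1.0, -0.25, 0.0, 2.0, 0.0]
```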

Slide 18
Incremental Network Quantization. Overview
Initial weights
Partitioning
Power-of-2 quantization
Retraining
Repeat until everything is quantized


Slide 20
Incremental Network Quantization. Overview
Initial weights
Partitioning
Power-of-2 quantization
Write to disk.
Set of power-of-2 exponents: {-3, -2, -1, 0, 1}; can be represented with 3 bits.

~10x reduction (3 bits instead of 32)
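
The ~10x figure above is simple arithmetic (assuming each quantized weight really is packed into 3 bits as the slide states; per-layer codebook and sign-handling overhead are ignored here):

```python
bits_fp32 = 32   # a float32 weight
codes = 5 + 1    # five exponents {-3, -2, -1, 0, 1} plus a code for zero
bits_quant = 3   # 2**3 = 8 >= 6, so 3 bits are enough per weight
ratio = bits_fp32 / bits_quant
# ratio is about 10.7, i.e. the slide's "~10x reduction"
```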


Slide 21
Incremental Network Quantization. Results
~7x reduction, and accuracy even increased (!)


Slide 22
Incremental Network Quantization. Results
No big drop in accuracy even with 3 bits for ResNet-18

Slide 23
Incremental Network Quantization. Results
~53x reduction if combined with pruning (better than Deep Compression)

Slide 24
Future: native hardware support


Slide 25
Future: native hardware support

~92 tera 8-bit ops/sec


Slide 26
Alexander Chigorin
Head of research projects
VisionLabs

a.chigorin@visionlabs.ru

