GTC 2021: Systematic Neural Network Quantization
Amir Gholaminejad

Published on Apr 26, 2021

An important next milestone in machine learning is to bring intelligence to the edge without relying on the computational power of the cloud. This could lead to more reliable, lower-latency, and privacy-preserving AI for a wide range of applications. However, state-of-the-art NN models often require prohibitive amounts of compute, memory, and energy for edge deployment. To address these challenges, I will present our latest work on hardware-aware quantization, which achieves an optimal trade-off between accuracy, latency, and model size. In particular, I will discuss HAWQ-V3, a new second-order quantization method in which the entire inference can be performed with integer-only arithmetic and without any floating-point operations.
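To make the integer-only idea concrete, here is a minimal sketch (not the HAWQ-V3 implementation) of the dyadic rescaling trick the paper's title refers to: a real rescaling factor s is approximated as b / 2^c with integers b and c, so requantization reduces to an integer multiply plus a bit shift. The helper names (dyadic_approx, requantize) and the scale value are illustrative assumptions, not from the talk.

```python
import numpy as np

def dyadic_approx(s: float, c: int = 16) -> tuple[int, int]:
    """Approximate a real scale s as b / 2^c (b, c integers), so that
    multiplying by s becomes an integer multiply plus a right shift."""
    b = int(round(s * (1 << c)))
    return b, c

def requantize(acc: np.ndarray, s: float) -> np.ndarray:
    """Rescale int32 accumulators down to int8 using only integer ops."""
    b, c = dyadic_approx(s)
    # (acc * b) >> c approximates acc * s with no floating-point arithmetic
    out = (acc.astype(np.int64) * b) >> c
    return np.clip(out, -128, 127).astype(np.int8)

# Toy usage: int8 activations and weights, int32 accumulation, dyadic rescale.
rng = np.random.default_rng(0)
x = rng.integers(-128, 128, size=(1, 64), dtype=np.int8)
w = rng.integers(-128, 128, size=(64, 64), dtype=np.int8)
acc = x.astype(np.int32) @ w.astype(np.int32)  # integer-only matmul
y = requantize(acc, s=7e-4)                    # integer-only rescale
print(y.dtype, y[:, :4])
```

Because b / 2^c can be made arbitrarily close to s by increasing c, this keeps the entire inference path in integer arithmetic, which is the property the abstract highlights.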

Related papers are:
- A survey of quantization methods for efficient neural network inference. arXiv preprint arXiv:2103.13630.

- HAWQ-V3: Dyadic neural network quantization. ICML, 2021.

- HAWQ-V2: Hessian aware trace-weighted quantization of neural networks. NeurIPS, 2020.

- HAWQ: Hessian AWare quantization of neural networks with mixed-precision. ICCV, 2019.

- I-BERT: Integer-only BERT quantization. ICML, 2021.

- Q-BERT: Hessian based ultra low precision quantization of BERT. AAAI, 2020.
