INT8 quantization gives me better accuracy than FP16! [D]
Hi everyone,
I’m working on a deep learning model and I noticed something strange.
When I compare different precisions: FP32 (baseline), FP16, and INT8 (post-training quantization), I'm getting better inference accuracy with INT8 than with FP16, which I didn't expect.
I thought FP16 should be closer to FP32 and therefore more accurate than INT8, but in my case INT8 is actually performing better.
Has anyone seen this before? What could explain INT8 outperforming FP16 in inference?
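For reference, this is roughly the kind of accuracy comparison I mean (a minimal sketch, not my exact eval code; the model paths, test data, and the float16 input handling are placeholders/assumptions):

```python
import numpy as np
import onnxruntime as ort

# Placeholder paths: one exported ONNX file per precision
MODELS = {
    "fp32": "model_fp32.onnx",
    "fp16": "model_fp16.onnx",
    "int8": "model_int8.onnx",
}

def accuracy(model_path, inputs, labels):
    """Top-1 accuracy of one ONNX model on a fixed test set."""
    sess = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    input_name = sess.get_inputs()[0].name
    correct = 0
    for x, y in zip(inputs, labels):
        # A fully converted FP16 model expects float16 inputs;
        # adjust if the converter kept the IO types as float32.
        x = x.astype(np.float16) if "fp16" in model_path else x.astype(np.float32)
        logits = sess.run(None, {input_name: x[None, ...]})[0]
        correct += int(np.argmax(logits) == y)
    return correct / len(labels)

# inputs: list of numpy arrays, labels: list of ints (placeholders for my test set)
# for name, path in MODELS.items():
#     print(name, accuracy(path, inputs, labels))
```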
Setup details:
Model exported to ONNX
FP16 obtained by direct conversion of the FP32 model; INT8 via post-training quantization (rough sketch of both below)
No major architecture changes
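The conversion path looks roughly like this sketch (paths are placeholders; I'm assuming onnxconverter-common for the FP16 cast and ONNX Runtime's dynamic post-training quantization for INT8, so static PTQ with a calibration set would differ):

```python
import onnx
from onnxconverter_common import float16
from onnxruntime.quantization import quantize_dynamic, QuantType

# FP32 baseline exported from the training framework (placeholder path)
fp32_path = "model_fp32.onnx"

# FP16: straight cast of the FP32 graph to float16
fp16_model = float16.convert_float_to_float16(onnx.load(fp32_path))
onnx.save(fp16_model, "model_fp16.onnx")

# INT8: post-training quantization; dynamic (weight-only) shown here for simplicity
quantize_dynamic(fp32_path, "model_int8.onnx", weight_type=QuantType.QInt8)
```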
Tagged with
#INT8
#FP16
#FP32
#quantization
#deep learning
#inference accuracy
#post-training quantization
#ONNX