ironbound | 10 months ago | on: Lossless LLM compression for efficient GPU inferen...
The DeepSeek-V3 paper details a quantisation method that applies scaling after the matmul but before accumulation to improve precision. This differs from a normal GEMM, where these operations are left until the end. You can read more in section 3.3 of the paper below.
https://arxiv.org/html/2412.19437v2#S3
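To make the distinction concrete, here is a minimal sketch of the per-block idea in plain Python. It is an illustration only, not the paper's actual FP8 kernel: the block size, the symmetric fake-quantiser, and the function names are all my assumptions; the point is that each block's dequantisation scales are applied before the partial result is added into a high-precision accumulator, rather than scaling once at the very end.

```python
def quantize(block, levels=127):
    """Hypothetical symmetric fake-quantiser: returns integer codes and a scale."""
    m = max(abs(x) for x in block) or 1.0
    scale = m / levels
    return [round(x / scale) for x in block], scale

def scaled_dot(a, b, block=4):
    """Dot product with per-block quantisation.

    Each block pair is multiplied in low precision, then dequantised
    (scaled) and promoted into a high-precision accumulator per block,
    instead of deferring all scaling to a single step at the end.
    """
    acc = 0.0  # high-precision accumulator (stands in for FP32 registers)
    for i in range(0, len(a), block):
        qa, sa = quantize(a[i:i + block])
        qb, sb = quantize(b[i:i + block])
        partial = sum(x * y for x, y in zip(qa, qb))  # low-precision tile matmul
        acc += partial * sa * sb  # scale applied before accumulation
    return acc
```

Because the scales are applied per block, a block with small values keeps its own fine-grained scale instead of being flattened by one coarse scale shared across the whole row, which is the precision benefit the paper describes.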