Hacker News | skidrow's submissions
1. Creating custom kernels for the AMD MI300 (huggingface.co)
2 points by skidrow 38 days ago
2. Implementing a Fast Tensor Core Matmul on the Ada Architecture (spatters.ca)
4 points by skidrow 38 days ago
3. Matrix Core Programming on AMD GPUs (salykova.github.io)
116 points by skidrow 38 days ago | 5 comments
4. Implementing a Fast Tensor Core Matmul on the Ada Architecture (spatters.ca)
3 points by skidrow 40 days ago
5. Matrix Core Programming on AMD GPUs (salykova.github.io)
2 points by skidrow 40 days ago
6. Creating custom kernels for the AMD MI300 (huggingface.co)
1 point by skidrow 40 days ago
7. Implementing a Fast Tensor Core Matmul on the Ada Architecture (spatters.ca)
2 points by skidrow 41 days ago
8. Matrix Core Programming on AMD CDNA3 and CDNA4 Architecture (salykova.github.io)
24 points by skidrow 41 days ago | 3 comments
9. Creating custom kernels for the AMD MI300 (huggingface.co)
2 points by skidrow 41 days ago
10. Implementing a Fast Tensor Core Matmul on the Ada Architecture (spatters.ca)
2 points by skidrow 42 days ago
11. Advanced Matrix Multiplication Optimization on Multi-Core Processors (2024) (salykova.github.io)
85 points by skidrow 42 days ago | 3 comments
12. Creating custom kernels for the AMD MI300 (huggingface.co)
2 points by skidrow 42 days ago
13. Introduction to Matrix Core Programming on AMD CDNA3 and CDNA4 Architecture (salykova.github.io)
2 points by skidrow 42 days ago
14. Creating custom kernels for the AMD MI300 (huggingface.co)
2 points by skidrow 3 months ago
15. Implementing a Fast Tensor Core Matmul on the Ada Architecture (spatters.ca)
2 points by skidrow 3 months ago
16. Creating custom kernels for the AMD MI300 (huggingface.co)
1 point by skidrow 3 months ago
17. Implementing a Fast Tensor Core Matmul on the Ada Architecture (spatters.ca)
4 points by skidrow 3 months ago
18. Implementing a Fast Tensor Core Matmul on the Ada Architecture (spatters.ca)
2 points by skidrow 3 months ago | 1 comment
19. Compiler Explorer: An Essential Kernel Playground for CUDA Developers (nvidia.com)
2 points by skidrow 3 months ago
20. Creating custom kernels for the AMD MI300 (huggingface.co)
1 point by skidrow 3 months ago
21. DeepSeek-R1 and FP8 Mixed-Precision Training (colfax-intl.com)
2 points by skidrow 6 months ago
22. How to Write a Fast Matrix Multiplication from Scratch with Tensor Cores (2024) (alexarmbr.github.io)
147 points by skidrow 6 months ago | 17 comments
23. DeepSeek-R1 and FP8 Mixed-Precision Training (colfax-intl.com)
2 points by skidrow 6 months ago
24. Implementing a Fast Tensor Core Matmul on the Ada Architecture (spatters.ca)
1 point by skidrow 6 months ago
25. How to Write a Fast Matrix Multiplication from Scratch with Tensor Cores (alexarmbr.github.io)
2 points by skidrow 6 months ago
26. Understanding Peak, Max-Achievable and Delivered FLOPs (amd.com)
1 point by skidrow 7 months ago
27. DeepSeek-R1 and FP8 Mixed-Precision Training (colfax-intl.com)
1 point by skidrow 7 months ago
28. Outperforming cuBLAS on H100: A Worklog (cudaforfun.substack.com)
3 points by skidrow 7 months ago
29. Optimizing Matrix Multiplication on RDNA3 (seb-v.github.io)
118 points by skidrow 7 months ago | 26 comments
30. Outperforming cuBLAS on H100: A Worklog (cudaforfun.substack.com)
1 point by skidrow 7 months ago