Hacker News
skidrow's submissions
1. Creating custom kernels for the AMD MI300 (huggingface.co)
2 points by skidrow 38 days ago
2. Implementing a Fast Tensor Core Matmul on the Ada Architecture (spatters.ca)
4 points by skidrow 38 days ago
3. Matrix Core Programming on AMD GPUs (salykova.github.io)
116 points by skidrow 38 days ago | 5 comments
4. Implementing a Fast Tensor Core Matmul on the Ada Architecture (spatters.ca)
3 points by skidrow 40 days ago
5. Matrix Core Programming on AMD GPUs (salykova.github.io)
2 points by skidrow 40 days ago
6. Creating custom kernels for the AMD MI300 (huggingface.co)
1 point by skidrow 40 days ago
7. Implementing a Fast Tensor Core Matmul on the Ada Architecture (spatters.ca)
2 points by skidrow 41 days ago
8. Matrix Core Programming on AMD CDNA3 and CDNA4 Architecture (salykova.github.io)
24 points by skidrow 41 days ago | 3 comments
9. Creating custom kernels for the AMD MI300 (huggingface.co)
2 points by skidrow 41 days ago
10. Implementing a Fast Tensor Core Matmul on the Ada Architecture (spatters.ca)
2 points by skidrow 42 days ago
11. Advanced Matrix Multiplication Optimization on Multi-Core Processors (2024) (salykova.github.io)
85 points by skidrow 42 days ago | 3 comments
12. Creating custom kernels for the AMD MI300 (huggingface.co)
2 points by skidrow 42 days ago
13. Introduction to Matrix Core Programming on AMD CDNA3 and CDNA4 Architecture (salykova.github.io)
2 points by skidrow 42 days ago
14. Creating custom kernels for the AMD MI300 (huggingface.co)
2 points by skidrow 3 months ago
15. Implementing a Fast Tensor Core Matmul on the Ada Architecture (spatters.ca)
2 points by skidrow 3 months ago
16. Creating custom kernels for the AMD MI300 (huggingface.co)
1 point by skidrow 3 months ago
17. Implementing a Fast Tensor Core Matmul on the Ada Architecture (spatters.ca)
4 points by skidrow 3 months ago
18. Implementing a Fast Tensor Core Matmul on the Ada Architecture (spatters.ca)
2 points by skidrow 3 months ago | 1 comment
19. Compiler Explorer: An Essential Kernel Playground for CUDA Developers (nvidia.com)
2 points by skidrow 3 months ago
20. Creating custom kernels for the AMD MI300 (huggingface.co)
1 point by skidrow 3 months ago
21. DeepSeek-R1 and FP8 Mixed-Precision Training (colfax-intl.com)
2 points by skidrow 6 months ago
22. How to Write a Fast Matrix Multiplication from Scratch with Tensor Cores (2024) (alexarmbr.github.io)
147 points by skidrow 6 months ago | 17 comments
23. DeepSeek-R1 and FP8 Mixed-Precision Training (colfax-intl.com)
2 points by skidrow 6 months ago
24. Implementing a Fast Tensor Core Matmul on the Ada Architecture (spatters.ca)
1 point by skidrow 6 months ago
25. How to Write a Fast Matrix Multiplication from Scratch with Tensor Cores (alexarmbr.github.io)
2 points by skidrow 6 months ago
26. Understanding Peak, Max-Achievable and Delivered FLOPs (amd.com)
1 point by skidrow 7 months ago
27. DeepSeek-R1 and FP8 Mixed-Precision Training (colfax-intl.com)
1 point by skidrow 7 months ago
28. Outperforming cuBLAS on H100: A Worklog (cudaforfun.substack.com)
3 points by skidrow 7 months ago
29. Optimizing Matrix Multiplication on RDNA3 (seb-v.github.io)
118 points by skidrow 7 months ago | 26 comments
30. Outperforming cuBLAS on H100: A Worklog (cudaforfun.substack.com)
1 point by skidrow 7 months ago