Similar fun to the time I discovered one could use IFB to set qdiscs on incoming traffic (why one would do that is left as an exercise to the reader, but my journey included the 'plug' qdisc and TCP checkpoint/restore). The Linux kernel has so many building blocks...
From their publication history, they seem to want to use all the HPC niceties, so as to run on most/any available HPC installation.
Nowadays that means mostly CUDA on NVIDIA and HIP on AMD on the device side. Curious how the SPIR-V support is on NVIDIA GPUs, including Nsight tooling and the maturity/performance of the available libraries (if only the CUB-style primitives for collective operations).
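To be concrete about the "CUB-style primitives": a minimal sketch of the kind of device-wide collective I mean, using CUB's real two-phase cub::DeviceReduce::Sum call (sizes and values are just placeholders). It's this sort of primitive I'd want a SPIR-V toolchain to have an equivalent of.

    // Minimal sketch of a device-wide collective with CUB; the two-phase
    // cub::DeviceReduce::Sum call is the real API, sizes/values are placeholders.
    #include <cstdio>
    #include <vector>
    #include <cub/cub.cuh>

    int main() {
        const int n = 1 << 20;
        std::vector<float> h_in(n, 1.0f);

        float *d_in, *d_out;
        cudaMalloc(&d_in, n * sizeof(float));
        cudaMalloc(&d_out, sizeof(float));
        cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

        // First call only queries how much temporary storage the reduction needs.
        void *d_temp = nullptr;
        size_t temp_bytes = 0;
        cub::DeviceReduce::Sum(d_temp, temp_bytes, d_in, d_out, n);
        cudaMalloc(&d_temp, temp_bytes);
        // Second call performs the actual device-wide sum.
        cub::DeviceReduce::Sum(d_temp, temp_bytes, d_in, d_out, n);

        float h_out = 0.0f;
        cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
        printf("sum = %f (expected %d)\n", h_out, n);
        cudaFree(d_temp); cudaFree(d_in); cudaFree(d_out);
    }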
The work done/supervised by Kristopher Micinski on using HPC hardware (not only GPUs but clusters) for formal methods is really encouraging. I hope we reach a breakthrough in affinity between COTS compute hardware and all kinds of formal methods, the way GPUs found theirs with deep learning and the subsequent large models.
One possible answer to "what do we do with all the P100s, V100s and A100s when they're decommissioned after their AI heyday?" (apart from "run small(er) models").
I feel I'm still doing it the old 2010 way, with my hand-crafted DPDK-and-pipelines-and-lockless-queues-and-homemade-task-graph-scheduler. Any modern reference (apart from "use Seastar"? ... which is fair if it fits your needs)?
Not since the Ozaki scheme appeared. Getting good high-precision performance out of low-precision tensor units has unlocked some very interesting uses for GPUs with weak native FP64.
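A toy CPU-side sketch of the core trick (not the actual Ozaki implementation, which slices into more pieces and feeds the partial products to batched tensor-core GEMMs; here a plain dot product stands in, and fp32-inputs-with-fp64-accumulation stands in for fp16-inputs-with-fp32-accumulation):

    // Toy sketch of the splitting idea (not the real Ozaki scheme): values only ever
    // enter the multiplies in low precision, the products land in a wider accumulator
    // (the fp16-in / fp32-accumulate tensor-core pattern), and the slices are
    // recombined at full precision afterwards.
    #include <cmath>
    #include <cstdio>
    #include <random>
    #include <vector>

    // Split a double into a float "head" and a float "tail" (residual of the head).
    static void split(double x, float &hi, float &lo) {
        hi = static_cast<float>(x);
        lo = static_cast<float>(x - static_cast<double>(hi));
    }

    int main() {
        const int n = 1 << 16;
        std::mt19937_64 rng(42);
        std::uniform_real_distribution<double> dist(0.0, 1.0);
        std::vector<double> a(n), b(n);
        for (int i = 0; i < n; ++i) { a[i] = dist(rng); b[i] = dist(rng); }

        double ref = 0.0, naive32 = 0.0, recombined = 0.0;
        for (int i = 0; i < n; ++i) {
            ref += a[i] * b[i];  // full fp64 reference

            // "Just cast everything to fp32": loses about half the mantissa per product.
            naive32 += static_cast<double>(static_cast<float>(a[i]) * static_cast<float>(b[i]));

            // Split, take the four cross products with fp32 inputs and fp64
            // accumulation, recombine. Each slice product is (nearly) exact.
            float ahi, alo, bhi, blo;
            split(a[i], ahi, alo);
            split(b[i], bhi, blo);
            recombined += (double)ahi * bhi + (double)ahi * blo
                        + (double)alo * bhi + (double)alo * blo;
        }
        printf("fp32-products rel. error : %.3e\n", std::fabs(naive32 - ref) / ref);
        printf("split/recombined         : %.3e\n", std::fabs(recombined - ref) / ref);
    }

The multiplies only ever see low-precision inputs, but because each slice product lands (almost) exactly in the wider accumulator, recombining the slices recovers most of the fp64 accuracy.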
I think the remark is more that Tensor Cores (Matrix Cores in AMD lingo) are distributed per SM (rather than sitting off to the side on an interconnect, individually programmable), so on the same SM you have your classical warps (CUDA cores) AND the tensor units, and switching between one and the other can be confusing.
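A minimal sketch of that mix, assuming the standard WMMA API and an sm_70+ GPU (nothing from the article): the same warp does ordinary FP32 work on the CUDA cores, then cooperatively issues a 16x16x16 tensor-core MMA.

    // Minimal sketch, needs sm_70+ and nvcc: one warp, ordinary FP32 math on the CUDA
    // cores, then a cooperative 16x16x16 tensor-core MMA through the WMMA API.
    #include <cstdio>
    #include <cuda_fp16.h>
    #include <mma.h>
    using namespace nvcuda;

    __global__ void mixed_units(const half *a, const half *b, float *c, float scale) {
        // "Classical" per-thread work: plain FP32 math, executed on the SM's CUDA cores.
        float bias = scale * threadIdx.x + 1.0f;

        // Tensor-core work: the whole warp cooperatively issues one 16x16x16 MMA.
        // In Nsight Compute / the SASS this is where the HMMA / mma.sync instructions show up.
        wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> fa;
        wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> fb;
        wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc;
        wmma::fill_fragment(acc, 0.0f);
        wmma::load_matrix_sync(fa, a, 16);
        wmma::load_matrix_sync(fb, b, 16);
        wmma::mma_sync(acc, fa, fb, acc);

        // Back on the CUDA cores: post-process the accumulator fragment element-wise.
        for (int i = 0; i < acc.num_elements; ++i)
            acc.x[i] = acc.x[i] * scale + bias;

        wmma::store_matrix_sync(c, acc, 16, wmma::mem_row_major);
    }

    int main() {
        half *a, *b; float *c;
        cudaMallocManaged(&a, 256 * sizeof(half));
        cudaMallocManaged(&b, 256 * sizeof(half));
        cudaMallocManaged(&c, 256 * sizeof(float));
        for (int i = 0; i < 256; ++i) { a[i] = __float2half(1.0f); b[i] = __float2half(0.5f); }
        mixed_units<<<1, 32>>>(a, b, c, 2.0f);  // a single warp is enough for one WMMA tile
        cudaDeviceSynchronize();
        printf("c[0] = %f\n", c[0]);
        cudaFree(a); cudaFree(b); cudaFree(c);
    }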
My mental model of SMs has always been "assume AVX-512 is the default ISA" plus "tensor cores are another layer alongside it" (kind of like AMX), and you have this heterogeneous "thing" to program. Don't know if it helps. The CUDA programming model hides a lot, and looking at the PTX code in Nsight Compute is most enlightening.
Can confirm that most of the time, when reproducing/implementing a paper or trying to extend it to another field, researchers are pretty OK (some very enthusiastic) with chatting over email about it. As long as you've actually read the paper(s) or the code (if any), and there's no free work expected of them...
I sometimes get unpublished artefacts (matlab/python/fortran code, data samples) just by... asking nicely, showing interest. And I'm not even in academia or a lab.