
SPIR-V for OpenCL and for Vulkan are substantially different, with the translation between the two being quite non-trivial.

(note that rusticl + zink does deal with this _partially_ nowadays)

+ Vulkan memory management doesn't expose unified address space primitives



Why would you want OpenCL? Pretty sure D3D11 compute shaders are going to be adequate for a Torch backend, and they even work on Linux with Wine: https://github.com/Const-me/Whisper/issues/42 Native Vulkan compute shaders would be even better.

Why would you want a unified address space? At least in my experience, it’s often too slow to be useful. DMA transfers (CopyResource in D3D11, the copy command queue in D3D12, a transfer queue in VK) are implemented by dedicated hardware inside GPUs, and are way more efficient.
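
To illustrate, here’s a rough CUDA sketch (the closest analogue I can give of a dedicated transfer queue; the stream and variable names are just illustrative): the upload is issued on its own stream, so the GPU’s DMA/copy engine handles it while the compute queues stay free, same idea as a D3D12 copy queue or a Vulkan transfer queue.

    #include <cuda_runtime.h>

    int main() {
        const size_t n = 1 << 20;

        // Pinned host memory, so the copy engine can DMA straight from it.
        float* host = nullptr;
        cudaMallocHost(&host, n * sizeof(float));
        for (size_t i = 0; i < n; ++i) host[i] = 1.0f;

        float* device = nullptr;
        cudaMalloc(&device, n * sizeof(float));

        // A dedicated stream for transfers; the async copy runs on the
        // GPU's DMA engine and can overlap with kernels on other streams.
        cudaStream_t copyStream;
        cudaStreamCreate(&copyStream);
        cudaMemcpyAsync(device, host, n * sizeof(float),
                        cudaMemcpyHostToDevice, copyStream);
        cudaStreamSynchronize(copyStream);

        cudaFree(device);
        cudaFreeHost(host);
        cudaStreamDestroy(copyStream);
        return 0;
    }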


> Why would you want OpenCL?

OpenCL is stricter with the results of floating point operations, and makes different assumptions with respect to memory aliasing. Whether or not this is important in the AI domain, I don't know.

> Why would you want a unified address space?

A unified address space doesn't always imply that the memory can be accessed from anywhere (although that might also be supported with some memory allocation mechanisms), and you still may have to copy between host and device memory. But it makes it much easier to have pointers in your GPU kernels, instead of having to deal with objects like OpenCL buffers.
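
Rough sketch of what that looks like in CUDA (the kernel and struct names are made up): the data still gets uploaded with an explicit copy, but the device address is an ordinary 64-bit pointer you can stash inside a struct and dereference in the kernel, which a cl_mem handle can't do.

    #include <cuda_runtime.h>
    #include <cstdio>

    // The pointer lives inside an ordinary struct and the kernel just
    // dereferences it. With OpenCL buffers the same thing needs a
    // separate cl_mem argument per buffer, because a buffer handle is
    // not an address.
    struct Node {
        float* values;   // device address, stored by value
        int    count;
    };

    __global__ void sum(Node node, float* out) {
        float acc = 0.0f;
        for (int i = 0; i < node.count; ++i)
            acc += node.values[i];
        *out = acc;
    }

    int main() {
        const int n = 4;
        float host[n] = {0.0f, 1.0f, 2.0f, 3.0f};

        // Device memory still has to be filled with an explicit copy...
        float* values = nullptr;
        cudaMalloc(&values, n * sizeof(float));
        cudaMemcpy(values, host, n * sizeof(float), cudaMemcpyHostToDevice);

        float* out = nullptr;
        cudaMallocManaged(&out, sizeof(float));

        // ...but the address itself is a plain 64-bit pointer that can be
        // passed around, stored in structs, and dereferenced in kernels.
        Node node{values, n};
        sum<<<1, 1>>>(node, out);
        cudaDeviceSynchronize();
        printf("%f\n", *out);   // 6.0

        cudaFree(values);
        cudaFree(out);
        return 0;
    }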


> Why would you want a unified address space?

Mac APU I guess. Or Jetson/Tegra kind of things.


My laptop has a single GPU inside a Ryzen 5 5600U, i.e. unified memory, and all consoles also have unified memory. These devices are fine with the traditional GPU programming model, where shaders only have access to well-shaped pieces of memory accessible through resource views or UAVs.

CPU-style memory access in GPU kernels is technically possible (CUDA did it), but it's unfortunately rather hard. The feature requires hardware support inside GPUs (pointers are needed, they need to be 64-bit, and 64-bit integer arithmetic instructions are needed too), and therefore it's not going to work on many current GPUs. It also becomes harder to compile and optimize GPU-running code. On devices without physically unified memory, the performance of these kernels is going to suck.
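
For reference, this is roughly the CUDA feature in question (a minimal sketch, names illustrative): one allocation, one pointer, dereferenced from both CPU and GPU code. That's what needs 64-bit pointers and page migration underneath, and on discrete GPUs those migrations over the bus are what kills performance.

    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void scale(float* data, int n, float factor) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= factor;   // raw 64-bit pointer, CPU-style access
    }

    int main() {
        const int n = 1024;
        float* data = nullptr;

        // One allocation visible to both CPU and GPU through the same pointer.
        cudaMallocManaged(&data, n * sizeof(float));
        for (int i = 0; i < n; ++i) data[i] = 1.0f;     // touched by the CPU

        scale<<<(n + 255) / 256, 256>>>(data, n, 2.0f); // touched by the GPU
        cudaDeviceSynchronize();

        printf("%f\n", data[0]);  // 2.0; on discrete GPUs the pages migrated
        cudaFree(data);
        return 0;
    }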

Luckily, none of that is needed to implement D3D11 or Vulkan backend for PyTorch. PyTorch is not a general purpose GPGPU runtime, it’s merely a high-level library which manipulates tensors and implements a few BLAS routines operating on them. It’s easy to allocate a GPU resource for each tensor being manipulated.
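
Something along these lines (a hypothetical sketch, not PyTorch's actual backend code): each tensor just owns one opaque device allocation plus its shape, which is all a D3D11/Vulkan-style backend needs — no unified address space involved.

    #include <cuda_runtime.h>
    #include <cstdint>

    // Hypothetical: one opaque device buffer per tensor, plus shape metadata.
    struct GpuTensor {
        float*  data     = nullptr;
        int64_t sizes[4] = {0, 0, 0, 0};
        int     ndim     = 0;

        static GpuTensor empty(const int64_t* shape, int ndim) {
            GpuTensor t;
            t.ndim = ndim;
            int64_t count = 1;
            for (int i = 0; i < ndim; ++i) {
                t.sizes[i] = shape[i];
                count *= shape[i];
            }
            cudaMalloc(&t.data, count * sizeof(float));
            return t;
        }

        void release() { cudaFree(data); data = nullptr; }
    };

    int main() {
        const int64_t shape[2] = {64, 128};
        GpuTensor t = GpuTensor::empty(shape, 2);
        // ... launch kernels that read/write t.data ...
        t.release();
        return 0;
    }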


A Vulkan backend for PyTorch exists. It's mostly tested on Android, but it's there. The PyTorch maintainers, though, are reluctant to advertise that as "support", because complete, reliable support for the zillion 'operators' PyTorch includes is quite a different challenge.



