This is cool, but following some of the links it seems like there are a lot of immature parts of the ecosystem and things will not "just work". See for example this issue, which I found via the blog post:
https://github.com/odsl-team/julia-ml-from-scratch/issues/2
Summarizing, they benchmark some machine learning code that uses KernelAbstractions.jl on different platforms and find:
* AMD GPU is slower than CPU
* Intel GPU doesn't finish / seems to leak memory
* Apple GPU doesn't finish / seems to leak memory
It would also be interesting to compare these benchmarks against hand-written CUDA kernels (both in Julia and C++) to quantify the overhead of the KernelAbstractions layer.
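For context on what the abstraction layer actually buys you: with KernelAbstractions.jl you write the kernel once and pick the backend at runtime. A minimal sketch (the kernel name and sizes here are illustrative, not taken from the benchmark repo):

```julia
using KernelAbstractions

# Trivial elementwise kernel: y .= a .* x .+ y
@kernel function axpy_kernel!(y, a, @Const(x))
    i = @index(Global)
    @inbounds y[i] = a * x[i] + y[i]
end

# The same kernel runs on any backend: swap CPU() for CUDABackend(),
# ROCBackend(), oneAPIBackend(), or MetalBackend() to target a GPU --
# which is exactly the portability the linked issue is stress-testing.
backend = CPU()
x = rand(Float32, 1_000)
y = rand(Float32, 1_000)
axpy_kernel!(backend)(y, 2.0f0, x; ndrange = length(y))
KernelAbstractions.synchronize(backend)
```

The comparison I'm suggesting would pit something like this against the equivalent raw CUDA.jl `@cuda` kernel and a C++ CUDA kernel, so the cost of the extra abstraction shows up directly in the timings.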