This is coming! I, along with others at OctoML and in the TVM community, am actively working on multi-GPU support in the compiler and runtime. Here are some of the merged and active PRs on the multi-GPU (multi-device) roadmap:
- Support in TVM’s graph IR (Relax): https://github.com/apache/tvm/pull/15447
- Support in TVM’s loop IR (TensorIR): https://github.com/apache/tvm/pull/14862
- Distributed dialect of TVM’s graph IR for multi-node (GSPMD-style): https://github.com/apache/tvm/pull/15289
The first target will be LLMs on multiple NVIDIA GPUs, but as with the rest of the MLC-LLM effort, the approach will generalize to other hardware, including AMD's wonderful hardware.