I find your post intriguing. What would you say are the major janks with torch.compile, and what issues are addressed by TVM/AITemplate but not by torch.compile?
EDIT: If I understand correctly, these libraries target deployment performance, while torch.compile is also/mostly for training performance?
- The gain for Stable Diffusion is modest (15%-25% last I checked?)
- Torch 2.0 only supports static input shapes. In actual usage (changing resolutions, batch sizes, and prompt lengths), this means frequent, lengthy recompiles (see the first sketch after this list).
- Eventually, these recompiles hit the compilation cache limit, at which point torch.compile stops compiling that code path and falls back to eager mode.
- Some common augmentations (like TomeSD) break compilation, force recompiles, make compilation take forever, or kill the performance gains.
- There are other miscellaneous bugs, like compilation freezing the Python thread and causing networking timeouts in web UIs, or errors with embeddings.
- TVM and AITemplate deliver much larger performance gains: ~2x or more for AIT; I'm not sure of an exact number for TVM.
- AIT supported dynamic input shapes before torch.compile did, and requires no recompilation after the initial compile. Also, weights (models and LoRAs) can be swapped out without a recompile.
- TVM supports very performant Vulkan inference, which would massively expand hardware compatibility (rough sketch after this list).
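
For the recompile issues above, here is a minimal sketch of the knobs involved, assuming PyTorch 2.x on a CUDA machine; the model and shapes are placeholders, not anything from a specific SD UI. Opting into dynamic shapes and raising the Dynamo cache limit can reduce (but not eliminate) recompiles:

```python
import torch
import torch._dynamo as dynamo

# Placeholder model, standing in for e.g. a diffusion UNet.
model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.GELU()).cuda()

# By default torch.compile specializes on input shapes, so each new
# resolution/batch size triggers a recompile. dynamic=True asks Dynamo
# to trace with symbolic shapes instead (best effort in 2.0).
compiled = torch.compile(model, dynamic=True)

# Each specialization is cached per code object; once the limit is hit,
# Dynamo stops compiling that frame and falls back to eager.
dynamo.config.cache_size_limit = 64  # raise from the (small) default

for batch in (1, 2, 4):
    x = torch.randn(batch, 64, device="cuda")
    compiled(x)  # ideally reuses one dynamic-shape compilation
```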
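
And on the TVM/Vulkan point, a rough sketch of how that path usually looks (traced PyTorch module -> Relay -> Vulkan target), assuming a TVM build with Vulkan enabled; the network and input names are placeholders:

```python
import torch
import tvm
from tvm import relay
from tvm.contrib import graph_executor

# Placeholder network standing in for a diffusion UNet.
model = torch.nn.Sequential(
    torch.nn.Conv2d(4, 8, 3, padding=1), torch.nn.SiLU()
).eval()
example = torch.randn(1, 4, 64, 64)

# Trace to TorchScript, then import into TVM's Relay IR.
scripted = torch.jit.trace(model, example)
mod, params = relay.frontend.from_pytorch(scripted, [("input0", example.shape)])

# Compile for the Vulkan target.
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="vulkan", params=params)

dev = tvm.vulkan(0)
rt = graph_executor.GraphModule(lib["default"](dev))
rt.set_input("input0", tvm.nd.array(example.numpy()))
rt.run()
out = rt.get_output(0).numpy()
```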
Note that the popular SD web UIs don't support any of this, with two exceptions I know of: VoltaML (with WIP AIT support) and the Windows DirectML fork of A1111 (which uses optimized ONNX models, I think). There is about 0% chance of ML compilation support in A1111, and the HF diffusers UIs are less bleeding-edge and less focused on performance/compatibility.
And yes, the default Triton-based (Inductor) backend of torch.compile is aimed mainly at training. There is an alternative backend (Hidet) that explicitly targets inference, but it does not work with Stable Diffusion yet; a sketch of switching backends is below.
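
For reference, switching backends is basically a one-liner once the Hidet package is installed (importing it registers the backend with Dynamo). This is only a sketch with a toy model, and as noted it currently falls over on SD models:

```python
import torch
import hidet  # pip install hidet; registers the "hidet" dynamo backend

# Toy placeholder model; SD UNets currently don't compile with this backend.
model = torch.nn.Linear(128, 128).cuda().eval()

# Inference-oriented compilation via Hidet instead of the default
# Inductor/Triton backend.
compiled = torch.compile(model, backend="hidet")

with torch.inference_mode():
    y = compiled(torch.randn(8, 128, device="cuda"))
```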