I find your post intriguing. What would you say are the major janks with torch.compile, and what issues are addressed by TVM/AITemplate but not by torch.compile?
EDIT: If I understand correctly, these libraries target deployment performance, while torch.compile is also/mostly for training performance?
- The gain for Stable Diffusion is modest (15%-25% last I checked?)
- Torch 2.0 only supports static input shapes. In actual usage (changing resolutions, batch sizes, and prompt lengths), this means frequent, lengthy recompiles (see the first sketch after this list).
- Eventually, these recompiles hit the compilation cache limit, at which point torch.compile stops compiling that code path and falls back to eager mode.
- Some common augmentations (like TomeSD) break compilation, force recompiles, make compilation take forever, or kill the performance gains.
- There are other miscellaneous bugs, like compilation freezing the Python thread and causing networking timeouts in web UIs, or errors with embeddings.
- TVM and AITemplate deliver much larger performance gains: ~2x or more for AIT; I'm not sure of an exact number for TVM.
- AIT supported dynamic input shapes before torch.compile did, and requires no recompilation after the initial compile. Also, weights (models and LoRAs) can be swapped out without a recompile.
- TVM supports very performant Vulkan inference, which would massively expand hardware compatibility (rough sketch after this list).
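
For the recompile issues above, here is a minimal sketch of the knobs involved, assuming PyTorch 2.x on a CUDA machine; the model and shapes are placeholders, not anything from a specific SD UI. Opting into dynamic shapes and raising the Dynamo cache limit can reduce (but not eliminate) recompiles:

```python
import torch
import torch._dynamo as dynamo

# Placeholder model, standing in for e.g. a diffusion UNet.
model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.GELU()).cuda()

# By default torch.compile specializes on input shapes, so each new
# resolution/batch size triggers a recompile. dynamic=True asks Dynamo
# to trace with symbolic shapes instead (best effort in 2.0).
compiled = torch.compile(model, dynamic=True)

# Each specialization is cached per code object; once the limit is hit,
# Dynamo stops compiling that frame and falls back to eager.
dynamo.config.cache_size_limit = 64  # raise from the (small) default

for batch in (1, 2, 4):
    x = torch.randn(batch, 64, device="cuda")
    compiled(x)  # ideally reuses one dynamic-shape compilation
```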
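
And on the TVM/Vulkan point, a rough sketch of how that path usually looks (traced PyTorch module -> Relay -> Vulkan target), assuming a TVM build with Vulkan enabled; the network and input names are placeholders:

```python
import torch
import tvm
from tvm import relay
from tvm.contrib import graph_executor

# Placeholder network standing in for a diffusion UNet.
model = torch.nn.Sequential(
    torch.nn.Conv2d(4, 8, 3, padding=1), torch.nn.SiLU()
).eval()
example = torch.randn(1, 4, 64, 64)

# Trace to TorchScript, then import into TVM's Relay IR.
scripted = torch.jit.trace(model, example)
mod, params = relay.frontend.from_pytorch(scripted, [("input0", example.shape)])

# Compile for the Vulkan target.
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="vulkan", params=params)

dev = tvm.vulkan(0)
rt = graph_executor.GraphModule(lib["default"](dev))
rt.set_input("input0", tvm.nd.array(example.numpy()))
rt.run()
out = rt.get_output(0).numpy()
```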
Note that the popular SD web UIs don't support any of this, with two exceptions I know of: VoltaML (with WIP AIT support) and the Windows DirectML fork of A1111 (which uses optimized ONNX models, I think). There is about 0% chance of ML compilation support in A1111, and the HF diffusers UIs are less bleeding-edge and less focused on performance/compatibility.
And yes, the default Triton-based (Inductor) backend of torch.compile is aimed mainly at training. There is an alternative backend (Hidet) that explicitly targets inference, but it does not work with Stable Diffusion yet; a sketch of switching backends is below.
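
For reference, switching backends is basically a one-liner once the Hidet package is installed (importing it registers the backend with Dynamo). This is only a sketch with a toy model, and as noted it currently falls over on SD models:

```python
import torch
import hidet  # pip install hidet; registers the "hidet" dynamo backend

# Toy placeholder model; SD UNets currently don't compile with this backend.
model = torch.nn.Linear(128, 128).cuda().eval()

# Inference-oriented compilation via Hidet instead of the default
# Inductor/Triton backend.
compiled = torch.compile(model, backend="hidet")

with torch.inference_mode():
    y = compiled(torch.randn(8, 128, device="cuda"))
```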