
I find your post intriguing. What would you say are the major janks with torch.compile, and what issues are addressed by TVM/AITemplate but not by torch.compile?

EDIT: If I understand correctly these libraries target deployment performance, while torch.compile is also/mostly for training performance?



- The gain in Stable Diffusion is modest (15%-25%, last I checked?)

- Torch 2.0 only supports static inputs. In actual usage scenarios, this means frequent lengthy recompiles.

- Eventually, these recompiles will overload the compilation cache and torch.compile will stop functioning.

- Some common augmentations (like TomeSD) break compilation, force recompiles, make compilation take forever, or kill the performance gains.

- There are other miscellaneous bugs, like compilation freezing the Python thread and causing networking timeouts in web UIs, or errors with embeddings.

- Dynamic input in Torch 2.1 nightly fixes many of these issues, but it only started (maybe) working about a week ago. See https://github.com/pytorch/pytorch/issues/101228#issuecommen...

- TVM and AITemplate have massive performance gains. ~2x or more for AIT, not sure about an exact number for TVM.

- AIT supported dynamic input before torch.compile did, and requires no recompilation after the initial compile. Also, weights (models and LORAs) can be swapped out without a recompile.

- TVM supports very performant Vulkan inference, which would massively expand hardware compatibility.
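The static-shape recompile and cache-overflow behavior described in the torch.compile bullets above can be sketched with a toy, pure-Python model of a shape-keyed compile cache. To be clear, this is an illustration of the mechanism, not PyTorch's actual implementation; the class, the shape guard, and the cache limit are all invented for the example:

```python
# Toy model (NOT PyTorch internals): a compiler that specializes on input
# shape recompiles for every new shape, and gives up once its cache fills.
CACHE_SIZE_LIMIT = 4  # hypothetical cap, analogous to a guard-cache limit


class StaticShapeCompiler:
    def __init__(self):
        self.cache = {}          # input shape -> "compiled" artifact
        self.compile_count = 0   # each miss is a lengthy recompile
        self.fell_back = False   # once overloaded, stop compiling entirely

    def run(self, fn, x):
        shape = (len(x),)        # stand-in for a tensor shape guard
        if self.fell_back:
            return fn(x)         # permanent eager fallback
        if shape not in self.cache:
            if len(self.cache) >= CACHE_SIZE_LIMIT:
                self.fell_back = True   # cache overloaded -> give up
                return fn(x)
            self.compile_count += 1     # "recompile" for this new shape
            self.cache[shape] = fn      # pretend this is compiled code
        return self.cache[shape](x)


compiler = StaticShapeCompiler()
double = lambda xs: [2 * v for v in xs]

# Five distinct input lengths -> four compiles, then fallback to eager.
for n in range(1, 6):
    compiler.run(double, list(range(n)))

print(compiler.compile_count)  # 4
print(compiler.fell_back)      # True
```

In real PyTorch 2.x the roughly analogous knob is `torch._dynamo.config.cache_size_limit`; once a frame exceeds it, torch.compile stops recompiling and falls back, which is the "stop functioning" failure mode above.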

Note that the popular SD Web UIs don't support any of this, with two exceptions I know of: VoltaML (with WIP AIT support) and the Windows DirectML fork of A1111 (which uses optimized ONNX models, I think). There is about 0% chance of ML compilation support in A1111, and the HF diffusers UIs are less bleeding edge and performance/compatibility focused.

And yes, torch.compile's default Triton backend is aimed at training. There is an alternative backend (Hidet) that explicitly targets inference, but it does not work with Stable Diffusion yet.


Thanks for the info. I didn't know about the TomeSD stuff, really interesting. Why do you think that AITemplate is so much faster?



