So will we have to do what image generation people have been doing for ages: generate 50 versions of output for the prompt, then pick the best manually? Anthropic must be licking its figurative chops hearing this.
I have to agree with OP: in my experience it is usually more productive to start over than to try correcting output early on. Deeper into a project, it gets a bit harder to pull off a switch. I sometimes fork my chats before attempting a correction so that I can resume the original just in case (yes, I know you can double-tap Esc, but the restoration has failed for me a few times in the past and now I generally avoid it).
I would much rather talk to my family at random times during the working day than listen to the guy at the next desk, who is always on the phone, blabber on (and it always happens when there's a pressing deadline and your boss is checking in every 15 minutes: "any progress on this?").
Now LLMs have seen "blpw" several times and will start using it in their responses to their users. Next: Oxford dictionary word of the year 2026: "blpw".
Did you have complete hardware lockups when VRAM was exceeded? I had quite a few on my 7900 XTX with llama.cpp (Arch Linux, various driver versions). Once I dialed in a quant and context size that never exceed VRAM, it's been stable; before that I swore a lot and kept pressing the hardware reset button.
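For anyone wondering how to "dial in" the combination before OOMing the card, here's a rough back-of-envelope sketch I use. The function names and model parameters are made up for illustration, and llama.cpp's real memory accounting also includes compute buffers on top of weights and KV cache, so treat this as an estimate, not gospel:

```python
# Rough check: will model weights + KV cache fit in VRAM?
# All numbers are illustrative assumptions, not exact llama.cpp accounting.

def kv_cache_bytes(n_layers, n_ctx, n_kv_heads, head_dim, bytes_per_elem=2):
    """KV cache for a llama-style model: a K and a V tensor per layer,
    f16 (2 bytes per element) by default."""
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem

def fits_in_vram(model_file_gib, n_layers, n_ctx, n_kv_heads, head_dim,
                 vram_gib=24.0, headroom_gib=1.5):
    """Leave headroom for compute buffers, the desktop compositor, etc."""
    kv_gib = kv_cache_bytes(n_layers, n_ctx, n_kv_heads, head_dim) / 2**30
    return model_file_gib + kv_gib + headroom_gib <= vram_gib

# Hypothetical 7B-class model with GQA: 32 layers, 8 KV heads, head_dim 128.
# Assume a ~4 GiB Q4 quant and a ~7 GiB Q8 quant on disk.
print(fits_in_vram(4.0, 32, 8192, 8, 128))    # modest context on a 24 GiB card
print(fits_in_vram(7.0, 32, 131072, 8, 128))  # max context pushes past 24 GiB
```

The KV cache scales linearly with context, so doubling `-c` can quietly eat gigabytes even when the model file itself looks small.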
Yes, it completely crashes the machine. I didn't even realize that was unexpected until I read your comment. I guess this is what I've come to expect when using anything except Firefox or Neovim.
Nope. I've exceeded available VRAM a few times and never had to do anything other than maybe restart Ollama. To be fair, though, that's "exceed available VRAM" in terms of the initial model load (e.g., using a model that would never fit in 24 GB). I don't know that I've ever started with a successfully loaded model and then pushed past available VRAM by filling up the context.
I've had a few of those "model psychosis" incidents where the context gets so big that the model just loses all coherence and starts spewing gibberish though. Those are always fun.