I'm looking at the problem more like code:

https://bbycroft.net/llm

My immediate thought: when the model responds "Oh, I'm thinking about X", that X isn't coming from the input, it's coming from attention. My read is that this experiment is simply injecting that token's representation right after the input step, into the attention activations, but who knows how they select which weights.
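
Roughly what I imagine that injection looking like, as a minimal sketch only (not Anthropic's actual setup): add a "concept" direction to one block's hidden states with a forward hook. The model choice, layer index, concept_vector, and scale here are all placeholders.

    # Sketch: inject a concept vector into the residual stream at one layer.
    # Assumes a HuggingFace GPT-2; layer_idx, concept_vector, and scale are
    # made up for illustration.
    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tok = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

    layer_idx = 6                                        # hypothetical block to inject into
    concept_vector = torch.randn(model.config.n_embd)    # stand-in for a real concept direction
    scale = 4.0                                          # hypothetical injection strength

    def inject(module, inputs, output):
        # output[0] is the block's hidden states: (batch, seq, d_model)
        hidden = output[0] + scale * concept_vector.to(output[0].dtype)
        return (hidden,) + output[1:]

    handle = model.transformer.h[layer_idx].register_forward_hook(inject)

    ids = tok("What are you thinking about?", return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=30, do_sample=False)
    print(tok.decode(out[0]))

    handle.remove()

The open question from the paper is exactly the part this sketch hand-waves: how they pick the layer(s) and direction to perturb.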


