
Yeah, the previous comment is almost completely wrong. But on the last point: it didn't get here by hard-coding, and the reinforcement learning part at the end shows, I think, that it definitely can "learn". One thing I would still point to is that it has a weakened sense of "self", which is in line with it distinguishing things learned in training vs. in deployment (though maybe that's just intentional, I don't know).


Yes, I agree - I really meant that it doesn't directly learn from your interaction (i.e. it forgets you spoke as soon as you close the session, and only learns indirectly through the next training run - your words do not affect its internal representation in real time the way they would a human's). Its learning also appears to require far more data than a human's; people can do 'more with less'.

You are right that it clearly can learn, though - the learnings are just baked in with each training cycle.


I think the Microsoft Tay incident is reason enough that OpenAI knows it would be a bad idea to structure things in such a way that the internet can wreck it in a matter of hours, but I certainly think they could make it learn in real time if they wanted to. They are at least running RLHF with the current ChatGPT, but in theory they could do something like give each user their own LoRA and use that user's conversations to fine-tune a version specific to them.

I suspect we'll start seeing ideas akin to that tried in the open source community.
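
To make that concrete, here's a rough sketch of the per-user LoRA idea using Hugging Face's transformers and peft libraries - not anything OpenAI has described, just an illustration. The base model ("gpt2"), the sample conversation log, and the save path are all placeholder assumptions:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    base = AutoModelForCausalLM.from_pretrained("gpt2")   # placeholder base model
    tok = AutoTokenizer.from_pretrained("gpt2")
    tok.pad_token = tok.eos_token

    # Wrap the frozen base model with a small low-rank adapter; only the
    # adapter weights (a fraction of a percent of the total) are trainable.
    lora = LoraConfig(task_type="CAUSAL_LM", r=8, lora_alpha=16,
                      target_modules=["c_attn"], lora_dropout=0.05)
    model = get_peft_model(base, lora)

    # Hypothetical conversation log for one user; in the scheme above this
    # would accumulate across that user's sessions.
    user_log = "User: call me Sam and keep answers short.\nAssistant: Got it, Sam."
    batch = tok(user_log, return_tensors="pt")
    batch["labels"] = batch["input_ids"].clone()

    # One fine-tuning step on the user's own data: the base weights stay
    # put, only the LoRA matrices move.
    opt = torch.optim.AdamW((p for p in model.parameters() if p.requires_grad), lr=1e-4)
    loss = model(**batch).loss
    loss.backward()
    opt.step()

    # Persist just this user's adapter, not the full model.
    model.save_pretrained("adapters/user_1234")   # hypothetical per-user path

The appeal is that each adapter is only a few megabytes, so in principle you could store one per user and swap the right one in at inference time while the multi-gigabyte base model stays shared and frozen.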



