
I have a question because I do not understand how the models work: Are they able to create code themselves, or does code ALWAYS come from a specific source?

I assume that if I ask for a complex sequence of RxJS operators, the model infers the code from lots of examples and docs. But if I ask for something really specific, it might come straight from a Stack Overflow answer or a GitHub repo. The ambiguity about the sourcing is the main thing that makes me itchy about “AI”.
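
(A minimal sketch of the kind of operator chain I mean, a debounced typeahead; searchApi here is a hypothetical placeholder for a backend call, not taken from any particular source.)

    import { fromEvent, of, Observable } from 'rxjs';
    import { debounceTime, distinctUntilChanged, map, switchMap, catchError } from 'rxjs/operators';

    // Placeholder for whatever search backend you'd actually call.
    declare function searchApi(query: string): Observable<string[]>;

    const input = document.querySelector('input')!;
    const results$ = fromEvent(input, 'input').pipe(
      map(e => (e.target as HTMLInputElement).value.trim()),
      debounceTime(300),        // wait for typing to pause
      distinctUntilChanged(),   // skip repeated queries
      switchMap(query =>        // cancel the previous in-flight search
        searchApi(query).pipe(catchError(() => of([])))
      )
    );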



What you'll see in tools that have any exposure to enterprise requirements is an option to say "don't regurgitate your training data". Basically, if the model generates something too similar to any of its training documents, the suggestion is thrown away before you see it.

In GitHub Copilot the option is labeled "Suggestions matching public code". They can offer to block such suggestions because they control both the input dataset and the model at inference time. If you download an open-source model, I don't think you can do this out of the box; you'd need access to that input dataset to be able to do the filtering.
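
(Roughly, the idea behind such a filter can be as simple as checking the suggestion's token n-grams against an index built from the corpus. This is a sketch of the concept only, with an arbitrary window size; the real implementations aren't public, and this isn't how Copilot actually does it.)

    // Reject a suggestion if any long-enough window of tokens also
    // appears verbatim in the indexed corpus. Building corpusNgrams
    // requires the corpus itself, which is why you can't bolt this
    // onto a downloaded open-source model.
    function tokenize(code: string): string[] {
      return code.split(/\s+/).filter(t => t.length > 0);
    }

    function ngrams(tokens: string[], n: number): string[] {
      const out: string[] = [];
      for (let i = 0; i + n <= tokens.length; i++) {
        out.push(tokens.slice(i, i + n).join(' '));
      }
      return out;
    }

    // n = 20 tokens is an arbitrary illustrative threshold.
    function isTooSimilar(suggestion: string, corpusNgrams: Set<string>, n = 20): boolean {
      return ngrams(tokenize(suggestion), n).some(g => corpusNgrams.has(g));
    }

    // If isTooSimilar(candidate, corpusNgrams) is true, the
    // candidate is dropped before it ever reaches the editor.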


Occasionally I find GPT-4 will blur a response, indicating it's reproduced from a specific source, and will ask me to rephrase my request/question.

So at least OpenAI has some safeguard in place against doing that. I have no clue how that behavior is determined, or whether other providers do something similar.



