
I have a question because I do not understand how the models work: Are they able to create code themselves, or does code ALWAYS come from a specific source?

I assume that if I ask for a complex sequence of RxJS operators, the model infers the code from lots of examples and docs. But if I ask for something really specific, it might come straight from a Stack Overflow answer or a GitHub repo. The ambiguity about the sourcing is the main thing that makes me itchy about “AI”.
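
(A minimal sketch of the kind of operator chain I mean, a debounced typeahead; searchApi here is a hypothetical placeholder for a backend call, not taken from any particular source.)

    import { fromEvent, of, Observable } from 'rxjs';
    import { debounceTime, distinctUntilChanged, map, switchMap, catchError } from 'rxjs/operators';

    // Placeholder for whatever search backend you'd actually call.
    declare function searchApi(query: string): Observable<string[]>;

    const input = document.querySelector('input')!;
    const results$ = fromEvent(input, 'input').pipe(
      map(e => (e.target as HTMLInputElement).value.trim()),
      debounceTime(300),        // wait for typing to pause
      distinctUntilChanged(),   // skip repeated queries
      switchMap(query =>        // cancel the previous in-flight search
        searchApi(query).pipe(catchError(() => of([])))
      )
    );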



What you'll see in tools that have any exposure to enterprise requirements is an option to say "don't regurgitate your training data". Basically, if the model generates something too similar to any of its training documents, the suggestion is thrown away before you see it.

In GitHub Copilot the option is labeled "Suggestions matching public code". They can offer to block such suggestions because they control both the input dataset and the model at inference time. If you download an open-source model, I don't think you can do this out of the box; you'd need access to that input dataset to be able to do the filtering.
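
(Roughly, the idea behind such a filter can be as simple as checking the suggestion's token n-grams against an index built from the corpus. This is a sketch of the concept only, with an arbitrary window size; the real implementations aren't public, and this isn't how Copilot actually does it.)

    // Reject a suggestion if any long-enough window of tokens also
    // appears verbatim in the indexed corpus. Building corpusNgrams
    // requires the corpus itself, which is why you can't bolt this
    // onto a downloaded open-source model.
    function tokenize(code: string): string[] {
      return code.split(/\s+/).filter(t => t.length > 0);
    }

    function ngrams(tokens: string[], n: number): string[] {
      const out: string[] = [];
      for (let i = 0; i + n <= tokens.length; i++) {
        out.push(tokens.slice(i, i + n).join(' '));
      }
      return out;
    }

    // n = 20 tokens is an arbitrary illustrative threshold.
    function isTooSimilar(suggestion: string, corpusNgrams: Set<string>, n = 20): boolean {
      return ngrams(tokenize(suggestion), n).some(g => corpusNgrams.has(g));
    }

    // If isTooSimilar(candidate, corpusNgrams) is true, the
    // candidate is dropped before it ever reaches the editor.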


Occasionally I find GPT-4 will blur a response, indicating it's reproduced from a specific source, and will ask me to rephrase my request/question.

So at least OpenAI has some safeguard in place against doing that. I have no clue how that behavior is determined, or whether other providers do something similar.



