And it will be even more expensive to train it again on larger amounts of data, with a model that has 10 times more parameters.
Only Big Tech giants like Microsoft and Google can afford to foot the bill and throw millions into training LLMs. Meanwhile, we celebrate and hype ChatGPT while LLMs get bigger and significantly more expensive to train, even as they get confused, hallucinate over silly inputs, and confidently generate bullshit.
That can't be a good thing. OpenAI's ClosedAI model needs to be disrupted, like how Stable Diffusion challenged DALL-E 2 with an open source AI model.
I disagree. I run a small tech company with a group that's been experimenting with Stable Diffusion, and we noticed that an extreme version of the Pareto principle applies here as well: you can get ~90% of the benefits for something like 5% of the cost. That's combined with the fact that computing power is continuously getting cheaper.
Based on that group's success, they've recently proposed a mini project inspired by GPT that I am considering funding; the data it's trained on is all publicly available for free, and most of it comes from Common Crawl. I suspect that it will also yield similar results: you can tailor your own version of GPT and get reasonably good models for a fraction of the price as well. We're nowhere close to the scale of the Big Tech giants, but I've noticed over the better part of 15 years that small companies can derive a great deal of the benefits that larger companies have, for a fraction of the cost, if they play it smart and keep things tight.
This is happening already. The trick is to run a search against an existing search engine, then paste the search results into the language model's prompt and ask it to answer questions based only on what you provide it.
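The pattern described above can be sketched in a few lines. This is a minimal, self-contained illustration: `search_web` and `complete` are hypothetical stand-ins for a real search API and a real LLM endpoint, and only the prompt-stuffing structure is the point.

```python
def search_web(query: str) -> list[str]:
    # Hypothetical stand-in for a call to a real search engine API.
    return [
        "LLM stands for large language model.",
        "GPT-3 was released by OpenAI in 2020.",
    ]

def complete(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM completion request.
    return "GPT-3, a large language model, was released in 2020."

def answer_with_search(question: str) -> str:
    # Run the search, paste the results into the prompt, then ask the
    # model to answer based only on what was provided.
    snippets = search_web(question)
    context = "\n".join(f"- {s}" for s in snippets)
    prompt = (
        "Answer the question using only the search results below.\n"
        f"Search results:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return complete(prompt)

print(answer_with_search("When was GPT-3 released?"))
```

With real implementations swapped in for the two stubs, this is essentially the orchestration layer discussed in the replies below.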
A small difference between the pattern you describe and the one in the inquiry is where responsibility lies for retrieving and incorporating the augmentation. You describe a pattern in which an orchestration layer sits in front of the model, performs the retrieval, and then decides how to feed that information to the model. The inquiry asks whether the AI/model itself can perform the retrieval and incorporation.
It's a small difference, perhaps, but one with some significance, since retrieval and incorporation occurring outside the model come with a different set of trade-offs. I'm not specifically aware of any work where model architectures are being extended to perform this function directly, but I am keen to learn of such efforts.
Yes, check out LangChain [0]. It enables you to wire together LLMs with other knowledge sources or even other LLMs. For example, you can use it to hook GPT-3 up to WolframAlpha. I’m sure you could pretty easily add a way for it to communicate with a human expert, too.
If an expert writes a long text and you add "In summary: " at the end, the model will complete it with something approximating the truth (depending on the size of the model, its training, etc.).
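The trick above is just prompt construction. A toy sketch, where the expert text and the model call are placeholders and only the appended suffix matters:

```python
def build_summary_prompt(expert_text: str) -> str:
    # Appending "In summary:" nudges a completion model to condense the
    # preceding text rather than continue it in some other direction.
    return expert_text.rstrip() + "\n\nIn summary:"

prompt = build_summary_prompt(
    "Transformers process all tokens in a sequence in parallel, "
    "using attention to weigh relationships between them. ..."
)
print(prompt)
```

The model's completion of this prompt is then the summary, with the usual caveats about errors and omissions.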
Humans do a similar thing. We have a model of the discussed subject in our heads and we can summarize it, but we will forget some parts, make errors, etc. GPT is very similar.
It is! You can specify in its prompt that it should "request additional info via search query, using the following syntax: [[search terms here]], before coming to a final conclusion". Then you integrate it with a traditional knowledge-base text lookup and run it again with that information concatenated.
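That loop can be sketched as follows. Assumptions are labeled in the comments: `fake_model` and `lookup` are hypothetical stubs for an LLM call and a knowledge-base lookup; the regex-and-concatenate loop is the actual technique being described.

```python
import re

SEARCH = re.compile(r"\[\[(.+?)\]\]")

def lookup(terms: str) -> str:
    # Hypothetical stand-in for a traditional knowledge-base lookup.
    kb = {"boiling point of water": "Water boils at 100 C at sea level."}
    return kb.get(terms, "No results.")

def fake_model(prompt: str) -> str:
    # Hypothetical stand-in for an LLM. It asks for a search first, and
    # gives a final answer once the lookup result appears in the prompt.
    if "Water boils" in prompt:
        return "Final conclusion: water boils at 100 C at sea level."
    return "I need more info: [[boiling point of water]]"

def run(question: str, max_rounds: int = 3) -> str:
    prompt = (
        "Request additional info via search query, using the following "
        "syntax: [[search terms here]], before coming to a final "
        f"conclusion.\n\nQuestion: {question}"
    )
    for _ in range(max_rounds):
        reply = fake_model(prompt)
        match = SEARCH.search(reply)
        if not match:
            return reply  # No search requested: treat as the conclusion.
        # Concatenate the lookup result and run the model again.
        prompt += "\n" + lookup(match.group(1))
    return reply

print(run("At what temperature does water boil?"))
```

The `max_rounds` cap is there because a real model may keep asking for more searches indefinitely.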
Stable Diffusion could do it because the task turned out to be amenable to reasonably small models. But there's no evidence of that being the case with GPT.
That said, the other organizations that can afford to foot the bill for it are governments. This is hardly ideal, since such models will also come with plenty of strings attached - indeed, probably more than the private ones - but at least those policies are somewhat checked by democratic mechanisms.
Long-term I think the demand for more AI compute power will lead to much more investment in GPU design and manufacture, driving the prices down. Since the underlying tech itself is well-understood, I fully expect to see the day when one can train and run a customized GPT-3 instance for one's private use, although the major players will likely be far ahead by then.