Yeah, twice in one day (and 3+ times in a month) is a bit much, and this post adds little new except some oddities like singling out Kubernetes in particular as 'the technology that helped train GPT-3'.
I run GPT-J on a Titan RTX and am writing a novel with it. Generating about 20k tokens, or two pages of content, takes a few minutes. I'd say the output is comparable in quality to other language models.
Note that refinement or transfer learning doesn't apply here; it's more like using a zero-shot classifier. In other words, you have to craft the input like a query to Siri or Wolfram Alpha, but expect text back instead.
It runs on about 15 GB of VRAM, and https://www.eleuther.ai/ released it under the Apache license.
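For anyone who wants to try a similar setup, here's a minimal sketch of loading GPT-J-6B from the HuggingFace Hub in half precision via transformers. The model ID, prompt, and sampling settings are illustrative assumptions, and the released weights can also be run with EleutherAI's own Mesh-Transformer-JAX code:

```python
# Minimal sketch: GPT-J-6B in half precision (roughly 12-13 GB of weights,
# which lines up with the ~15 GB VRAM figure mentioned above).
# Assumes a transformers version that includes the GPT-J model class.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-j-6B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda")

prompt = "The old lighthouse keeper climbed the stairs one last time,"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
    max_new_tokens=200,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```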
It sounds like we're using very similar setups for writing. :)
I've primarily been using GPT-3 (and burning through millions of tokens), so I've been experimenting with GPT-J more lately. I've found it makes significantly more basic logic errors (e.g. mixing up pronouns, "forgetting" characters, introducing new characters), which makes me lean more towards constantly regenerating text versus revising it. I feel like I'm boxed into babysitting ~100-200 tokens at a time instead of generating significantly more at once like I do with GPT-3.
I also built a quick tool that lets me adjust how much context I'm using in the prompt and generate 2-3 side-by-side completions to pick and choose from (just to speed up the flow of `click best suggestion` --> `keep generating from there`), but I haven't integrated GPT-J yet since it just feels... lower quality (it feels similar to ~Babbage, IMO).
But being comparatively* free, I'm still excited about GPT-J. Do you have any tips or processes you've found to make it spit out higher-quality text? 20k tokens at a time is quite a lot -- do you also have problems with winding paths / staying on a general "plot"?
Would love to hear any suggestions you have, because I'd sure love to move off GPT-3 to something comparable in quality!
I didn't experiment enough to pin down the difference between Babbage and GPT-J, but that sounds similar to what I experienced. I'm not sure what your novel is going to look like, but given the limitations of GPT-3 / GPT-J, etc., I plan on mine being something like a stream of consciousness.
I thought about babysitting smaller batches of tokens, but I was getting okay quality with a few thousand; when I looked into it, though, there was usually some weird shift somewhere in the middle.
I know that there is a GPT-J setting (and I believe GPT-3 has this too) where you can see the best of X tries. Clicking the best suggestion sounds like a good idea for a tool, but it might be better to recursively store all outputs. I'm interested in the tool you're using; are you able to share it?
The writing process I'm using comes from a YouTube reviewer who took a dataset of light novels, fine-tuned GPT-2 on it to generate light novel titles, and then basically cherry-picked them / did analysis / had his friends do a Turing test. I'd assume GPT-3 / GPT-J could do the same thing zero-shot with a 10-20 token limit.
(EDIT: GPT-3 with Davinci and the two-sentence horror story prompt would let you generate interesting story titles, and from there you can use GPT-3 / GPT-J to do the rest.)
(also if anyone is interested in the novel I am writing email me at zitterbewegung at gmail dot com).
Best of N is super useful. With GPT-3 (and probably GPT-J) you can actually pass an n parameter to match best_of and get back all of the N texts generated, which is how my little tool works (displaying them all to pick from, since the "best" one by GPT-3's standards isn't always the "best" one from my author POV). It's also nice not to feel like I'm throwing away tokens, since best_of=3 uses 3x the tokens whether you look at all the generations or not!
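For reference, here's roughly what that looks like with the older openai Python client's Completions endpoint; the engine name, prompt, and sampling settings are my own illustrative assumptions rather than the exact call the tool makes:

```python
# Rough sketch: request n candidates and best_of generations from the
# legacy Completions API; with n == best_of you get every candidate back
# instead of only the highest-scoring one.
import openai

openai.api_key = "YOUR_API_KEY"

story_so_far = "The detective stepped into the rain-soaked alley and"

response = openai.Completion.create(
    engine="davinci",
    prompt=story_so_far,
    max_tokens=150,
    temperature=0.8,
    n=3,        # return all three candidates to pick from by hand
    best_of=3,  # generate three server-side; billed for all three either way
)

for i, choice in enumerate(response["choices"], start=1):
    print(f"--- candidate {i} ---")
    print(choice["text"].strip())
```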
Thanks for the details on the Light Novels process. I'll have to look into it and see if something like that works for me!
The tool I wrote is at https://young-savannah-97958.herokuapp.com/ (source code at https://github.com/indentlabs/gpt-3-writer), but only works with GPT-3 right now. You just stick in your API key (which doesn't get stored/saved) and customize completion settings if you want, then trigger completions while writing with the button in the bottom-right. It's very rough around the edges since I just built it for my own use, have a ton on the to-do list, and haven't publicized it, but it could be helpful. :)
What I do is take the first title and then make derivative titles based on it. Like "write a story about cookies", then "write a story about chocolate chip cookies".
However, the fact that it is trained with 6B parameters compared to GPT-3's 175B indicates that open-source GPT still has a lot of catching up to do. Godspeed and full speed!!!
1. Just wait a few years for better hardware. Not even joking on that one. The computational requirements of most modern GAN and transformer-based approaches, like the StyleGAN image generation I was playing around with recently, would have been daunting even on the largest supercomputers just 10 or 15 years ago, and before that it was science fiction.
2. Find more shortcuts. We all knew neural nets could learn any function all along, in principle. If you have infinite time and space, some of the simplest multilayer feedforward networks can learn anything learnable. I don't think anyone really expected to find such relatively efficient algorithms for that, like we have in the last few years, though.
I'm only running one Titan RTX, which is slower than a 3090. You can parallelize the system by buying multiple graphics cards, or by increasing the amount of VRAM available on the endpoint and buying an Ampere card.
Transparent and competent automated content moderation, maybe, easily available for anyone to run their own communities by their own standards. Once matured, you can easily envision people sharing policies and templates, or providing moderation as a service, for any sort of social text interaction.
I was just thinking about this today. How many (and which?) ML books would one need to read to be able to throw comments into a model like this and have it decide if they're acceptable vs unacceptable? How practical would it be to build a frequently re-training model that could serve as autopilot for a small forum or subreddit moderation? And how much would that serve to reinforce existing filter bubbles and further divide the internet?
0 books. The HuggingFace-hosted GPT-Neo 125M model is capable of doing basic question/answer analysis to a sufficient level for most moderation. No finetuning needed if you have it work out zero-shot yes/no answers to things like "is this text bullying, excessively negative, political, sexist...", etc., then iterate over cases and log the results. Modify your prompts to handle edge cases and it would probably be capable of replacing 80% of human moderation.
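To make that concrete, here's a rough sketch of the zero-shot yes/no idea with the small EleutherAI/gpt-neo-125M checkpoint via transformers; the prompt wording and the crude "starts with yes" parsing are my own assumptions, not a tested moderation pipeline:

```python
# Rough sketch of zero-shot yes/no moderation with a small GPT-Neo model.
from transformers import pipeline

generator = pipeline("text-generation", model="EleutherAI/gpt-neo-125M")

QUESTIONS = [
    "Is this comment bullying?",
    "Is this comment excessively negative?",
    "Is this comment sexist?",
]

def moderate(comment: str) -> dict:
    """Ask the model each moderation question and record a yes/no flag."""
    flags = {}
    for question in QUESTIONS:
        prompt = f'Comment: "{comment}"\n{question} Answer yes or no:'
        full = generator(prompt, max_new_tokens=3, do_sample=False)[0]["generated_text"]
        answer = full[len(prompt):].strip().lower()
        flags[question] = answer.startswith("yes")
    return flags

print(moderate("You're an idiot and nobody wants you here."))
```

In practice you'd still want to log the raw completions and spot-check them against edge cases, as described above.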
There are several easily found resources like this, and previous HN threads contain papers, documentation, and guide links spanning the spectrum from beginner to PhD researcher. You can get to a high level of proficiency without any books, but you'll need to learn the jargon for searches. It's a niche enough subject that results don't get too muddied on almost any search engine -- good luck!
How much resources would it take to train something like this? Let's say in RTX-3090 months, or in $ running on AWS (or whoever offers TPUs)?
In the link it says training FLOPs: 1.5e22, and an RTX 3090 peaks at 285 TFLOP/s for tensor operations. If you can actually calculate it like that, a single 3090 would take roughly 20 months, i.e. about 20 GPU-months at perfect utilization.
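Spelled out, the napkin math (which assumes perfect utilization that real training never reaches):

```python
# Back-of-the-envelope estimate from the figures above.
train_flops = 1.5e22        # reported GPT-J training compute
gpu_flops_per_sec = 285e12  # RTX 3090 peak tensor throughput

seconds = train_flops / gpu_flops_per_sec
months = seconds / (60 * 60 * 24 * 30)
print(f"~{months:.0f} GPU-months on a single 3090")  # ~20
```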
I had a History teacher who told me democracy often doesn't really mean anything. I've been thinking about it ever since. "Free GPT-3" would probably be better.
I guess my off the cuff, unexamined criteria for democratizing is whether it’s a gating/enabling function for the occurrence of one party’s objectives as directly decided by the wider community.
So something like GoFundMe or crowdfunding, sure. Although doesn’t necessarily need to be monetary.
That's pretty much exactly what "democratizing" means, at least etymologically. It brings power (kratia) to the people (demos).
"Democracy" refers to a political system, but "democratizing" has pretty much always meant to open things up more widely. Example:
"The State wishes to democratize instruction by its "French instruction" and the standard must inevitably be the lowering of the very standard it sets up." -- 1894
And yeah -- guns do democratize violence. That's precisely why they were invented. A Google search for "guns democratize violence" turns up several hits.