[dupe] GPT-J: GPT-3 Democratized (p3r.one)
118 points by appleskimer on July 4, 2021 | 39 comments



Yeah, twice in one day (and 3+ times in a month) is a bit much, and this post adds little that's new except some oddities, like singling out Kubernetes in particular as 'the technology that helped train GPT-3'.


As the article notes, "There’s quite a similarity between cloud native and Artificial Intelligence."


Is it a dupe though if it's a blog post, as opposed to the earlier Github submission?


Debatable, but then the title should be better.


I run GPT-J on a Titan RTX, which I'm using to write a novel. Generating about 20k tokens (two pages or so) of content takes a few minutes. I would say the output is comparable in quality to other language models.

Note that refinement (fine-tuning) or transfer learning doesn't really apply anymore; it's more like using a zero-shot classifier. In other words, you have to craft the input like a query to Siri or Wolfram Alpha, but expect text back instead.

It runs in about 15 GB of VRAM, and https://www.eleuther.ai/ released it under the Apache license.

I use this FastAPI endpoint written by kinoc, https://gist.github.com/kinoc/f3225092092e07b843e3a2798f7b39... , which is released under the MIT license.
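
If you'd rather not use the endpoint, a minimal sketch of loading GPT-J through the Hugging Face transformers library looks roughly like this. The checkpoint name (EleutherAI/gpt-j-6B), the float16 loading, and the prompt/sampling settings are my assumptions here, not what the gist does:

    # Minimal sketch: load GPT-J 6B in float16 on a single GPU and sample.
    # Checkpoint name, dtype, prompt, and settings are illustrative.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
    model = AutoModelForCausalLM.from_pretrained(
        "EleutherAI/gpt-j-6B", torch_dtype=torch.float16
    ).to("cuda")

    prompt = "The old lighthouse keeper climbed the stairs one last time."
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    output = model.generate(
        **inputs, do_sample=True, temperature=0.8, max_new_tokens=200
    )
    print(tokenizer.decode(output[0], skip_special_tokens=True))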


It sounds like we're using very similar setups for writing. :)

I've primarily been using GPT-3 (and burning through millions of tokens), so I've been experimenting with GPT-J more lately. I've found it makes significantly more basic logic errors (e.g. mixing up pronouns, "forgetting" characters, introducing new characters), which pushes me towards constantly regenerating text rather than revising it. I feel boxed into babysitting ~100-200 tokens at a time instead of generating significantly more at once like I do with GPT-3.

I also built a quick tool that lets me adjust how much context I'm using in the prompt and generate 2-3 side-by-side completions to pick and choose from (just to speed up the `click best suggestion` --> `keep generating from there` loop), but I haven't integrated GPT-J yet since it just feels... lower quality (similar to ~Babbage, IMO).

But being comparatively free, I'm still excited about GPT-J. Do you have any tips or processes you've found to make it spit out higher-quality text? 20k tokens at a time is quite a lot -- do you also have problems with winding paths / staying on a general "plot"?

Would love to hear any suggestions you have, because I'd sure love to move off GPT-3 to something comparable in quality!


I didn't experiment enough to pin down the difference between Babbage and GPT-J, but that sounds similar to what I experienced. I'm not sure what your novel is going to look like, but given the limitations of GPT-3 / GPT-J etc., I plan on mine being something like a stream of consciousness.

I thought about babysitting smaller batches of tokens, but I was getting okay quality with a few thousand at a time; when I looked into it, though, there was usually some weird change somewhere in the middle.

I know there is a GPT-J setting (and I believe GPT-3 has one too) where you can see the best of X tries. Clicking the best suggestion sounds like a good idea for a tool, but it might be better to just store all of the outputs recursively. I'm interested in the tool you are using; are you able to share it?

The writing process I'm using comes from a YouTube reviewer who took a dataset of light novels, fine-tuned GPT-2 on it to generate light novel titles, and then basically cherry-picked them / did some analysis / had his friends do a Turing test. I assume GPT-3 / GPT-J could do the same thing zero-shot with a 10-20 token limit.

(EDIT: GPT-3 with Davinci and a Two-Sentence Horror Story prompt would let you generate interesting story titles, and from there you can use GPT-3 / GPT-J to do the rest.)

(Also, if anyone is interested in the novel I'm writing, email me at zitterbewegung at gmail dot com.)


Thanks for the answer!

Best of N is super useful. With GPT-3 (and probably GPT-J) you can actually pass an n parameter to match best_of and get back all N of the generated texts, which is how my little tool works (it displays them all to pick from, since the "best" one by GPT-3's standards isn't always the "best" one from my author POV). It's also nice not to feel like I'm throwing tokens away, since best_of=3 uses 3x the tokens whether you look at all the generations or not!
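
If it helps anyone, here's a rough sketch of that n / best_of combination with the 2021-era OpenAI Python client. The engine, prompt, and sampling settings are just placeholders, not what my tool actually uses:

    # Sketch: request several completions in one call with the legacy
    # OpenAI Completion API. Engine, prompt, and settings are illustrative.
    import openai

    openai.api_key = "sk-..."  # your API key

    response = openai.Completion.create(
        engine="davinci",
        prompt="Write a two-sentence horror story about a lighthouse:\n",
        max_tokens=150,
        temperature=0.8,
        n=3,        # return all three candidates...
        best_of=3,  # ...rather than only the highest-scoring one
    )

    for i, choice in enumerate(response.choices):
        print(f"--- completion {i + 1} ---")
        print(choice.text)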

Thanks for the details on the Light Novels process. I'll have to look into it and see if something like that works for me!

The tool I wrote is at https://young-savannah-97958.herokuapp.com/ (source code at https://github.com/indentlabs/gpt-3-writer), but it only works with GPT-3 right now. You just stick in your API key (which doesn't get stored or saved), customize completion settings if you want, then trigger completions while writing with the button in the bottom-right. It's very rough around the edges since I built it for my own use, have a ton on the to-do list, and haven't publicized it, but it could be helpful. :)


What I do is take the first title and then make derivative titles based on it, like "write a story about cookies" followed by "write a story about chocolate chip cookies".


Can I run it on an RTX-3090? Where would I find information on how to do it?

EDIT: Reply below pointed out that the gist linked above specifically mentions 3090 at the top.


The link mentions Titan and 3090 specifically so I'd guess yes.


> I am writing a novel with it.

Could you say more?


Spam the Amazon Kindle store with GPT-generated $0.99 novels.


The novel will be the stream of consciousness of a person, written to imitate a light novel.


However, the fact that it has 6B parameters compared to GPT-3's 175B indicates that open-source GPT still has a lot of catching up to do. Godspeed and full speed!!!


That seems ridiculously slow. How is machine learning supposed to scale like this?


I mean, there are two ways.

1. Just wait a few years for better hardware. Not even joking on that one. The computational requirements of most modern GAN and transformer-based approaches, like the StyleGAN image generation I was playing around with recently, would have been daunting even on the largest supercomputers just 10 or 15 years ago and before that it was science fiction.

2. Find more shortcuts. We all knew neural nets could learn any function all along, in principle. If you have infinite time and space, some of the simplest multilayer feedforward networks can learn anything learnable. I don't think anyone really expected to find such relatively efficient algorithms for that, like we have in the last few years, though.


I'm only running one Titan RTX, which is slower than a 3090. You can parallelize the system by buying multiple graphics cards, or you can increase the amount of VRAM used by the endpoint by buying an Ampere card.


Ironic that we have to create open source versions of things from “OpenAI”…


There is open and there is open. OpenAI is more like the latter, similar to OpenVMS.

EDIT: Or OpenWindows desktop environment.


So not really open, and just bullshit marketing.


ClosedAI doesn't really have the same ring to it...


Pretty much, it is like "OpenBionics" where everything is closed sourced.


open as in "open your wallet"


Haha


I was told that GPT-3 was too much power for mere mortals (without a paid subscription!) so what terror is this going to bring upon us?


Transparent and competent automated content moderation, maybe, easily available to anyone who wants to run their own community by their own standards. Once the tech matures, you can easily envision people sharing policies and templates, or providing moderation as a service, for any sort of social text interaction.


I was just thinking about this today. How many (and which?) ML books would one need to read to be able to throw comments into a model like this and have it decide if they're acceptable vs unacceptable? How practical would it be to build a frequently re-training model that could serve as autopilot for a small forum or subreddit moderation? And how much would that serve to reinforce existing filter bubbles and further divide the internet?


Zero books. The Hugging Face-hosted GPT-Neo 125M model can handle basic question/answer analysis well enough for most moderation. No fine-tuning is needed if you have it produce zero-shot yes/no answers to questions like "is this text bullying, excessively negative, political, sexist...", then iterate over cases and log the results. Modify your prompts to handle edge cases and it could probably replace 80% of human moderation.
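
A minimal sketch of that zero-shot yes/no pattern with the transformers library; the model name, prompt template, and answer parsing are my assumptions, and real moderation would need much more careful prompt design and evaluation:

    # Sketch: zero-shot yes/no moderation checks with GPT-Neo 125M.
    # Model name, prompt template, and answer parsing are illustrative.
    from transformers import pipeline

    generator = pipeline("text-generation", model="EleutherAI/gpt-neo-125M")

    def is_flagged(comment: str, question: str) -> bool:
        prompt = f'Comment: "{comment}"\nQuestion: {question}\nAnswer (yes or no):'
        out = generator(prompt, max_new_tokens=3, do_sample=False)[0]["generated_text"]
        answer = out[len(prompt):].strip().lower()  # keep only the generated part
        return answer.startswith("yes")

    print(is_flagged("You are an idiot and nobody likes you.",
                     "Is this text bullying?"))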


Hmm, I'll still need to scrounge up enough context to understand most of that, but thanks.


https://jalammar.github.io/illustrated-gpt2/

There are several easily found resources like this, and previous hn threads contain papers, documentation, and guide links spanning the spectrum from beginner to PhD researcher. You can get to a high level of proficiency without any books, but you'll need to learn the jargon for searches. It's a niche enough subject that results don't get too muddied on almost any search engine - good luck!


How many resources would it take to train something like this? Say, in RTX 3090-months, or in dollars running on AWS (or whoever offers TPUs)?

The link says training took 1.5e22 FLOPs, and an RTX 3090 has a peak of about 285 TFLOPS for tensor operations. If you could actually calculate it like that, a single 3090 would take roughly 20 months (or 1000 GPUs about 15 hours, assuming perfect scaling).

https://www.google.com/search?q=1.5e22+%2F+%281000+*+285e9+%...
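
Spelled out (taking the 285 TFLOPS peak at face value and ignoring utilization and scaling losses):

    # Back-of-the-envelope: 1.5e22 FLOPs at an RTX 3090's ~285 TFLOPS peak.
    total_flops = 1.5e22
    flops_per_second = 285e12                   # per GPU, theoretical tensor peak

    seconds_one_gpu = total_flops / flops_per_second
    print(seconds_one_gpu / (3600 * 24 * 30))   # ~20 GPU-months on one card
    print(seconds_one_gpu / 1000 / 3600)        # ~15 hours on 1000 GPUs, perfect scaling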

Working with grid computing at CERN, I've had access to some pretty big computing resources myself, but the scale of this is mind-boggling...


Love this!

Could anyone recommend a tutorial on how to run this? I need to generate answers to 1000 questions for my app. I've been waiting months for a GPT-3 invite.


If you don't need to host it, you could use it online

https://6b.eleuther.ai/

https://bellard.org/textsynth/


Does anyone else take issue with the increasingly commonplace use of the word "democratize" as applied to technology?

For every valid case, I see others that make my head hurt. Is it just a buzzword for telling a story with emotional appeal to users and investors?

Making something available to someone who didn’t have it earlier isn’t democratizing. And ignoring future considerations is just lazy.

If guns were invented today, they'd probably be touted as democratizing violence.


I had a History teacher who told me democracy often doesn't really mean anything. I've been thinking about it ever since. "Free GPT-3" would probably be better.


Interesting.

I guess my off-the-cuff, unexamined criterion for "democratizing" is whether it gates/enables one party's objectives as directly decided by the wider community.

So something like GoFundMe or crowdfunding, sure. Although doesn’t necessarily need to be monetary.

But I’d be interested in hearing counteropinions.


That's pretty much exactly what "democratizing" means, at least etymologically. It brings power (kratia) to the people (demos).

"Democracy" refers to a political system, but "democratizing" has pretty much always meant to open things up more widely. Example:

"The State wishes to democratize instruction by its "French instruction" and the standard must inevitably be the lowering of the very standard it sets up." -- 1894

https://www.google.com/books/edition/Education_from_a_Nation...

And yeah -- guns do democratize violence. That's precisely why they were invented. A Google search for "guns democratize violence" turns up several hits.



