Does anyone know an efficient way to "embed" models like this? I'm currently working on a Tamagotchi-style RPI toy and I use GPT-2 to generate answers in the chat. I wrote a simple API that returns responses from a server. If I could embed my model, it would save me having to run a server.
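For context, the "simple API" in question could be sketched with nothing but the standard library. This is a hypothetical reconstruction, not the poster's actual code: the route, port, JSON schema, and the `generate()` stub (which would call GPT-2 / aitextgen on the server) are all assumptions.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def generate(prompt):
    """Stand-in for the server-side GPT-2 call (e.g. via aitextgen)."""
    return "placeholder reply to: " + prompt

class ChatHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON body ({"prompt": ...}) sent by the toy.
        length = int(self.headers.get("Content-Length", 0))
        prompt = json.loads(self.rfile.read(length))["prompt"]
        body = json.dumps({"text": generate(prompt)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

# On the server, something like:
# HTTPServer(("0.0.0.0", 8000), ChatHandler).serve_forever()
```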
The hard part of embedding is that the smallest 124M GPT-2 model itself is huge at 500MB, which would be unreasonable for performance/storage on the user end (and quantization/tracing can't save that much space).
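A quick back-of-envelope check of those numbers (the 4-bytes-per-float32-weight and 1-byte-per-int8-weight figures are standard, but treating the whole checkpoint as weights is a simplifying assumption):

```python
# Size arithmetic for the 124M-parameter GPT-2 checkpoint.
params = 124_000_000
fp32_mb = params * 4 / 1e6   # 4 bytes per float32 weight
int8_mb = params * 1 / 1e6   # 1 byte per weight after int8 quantization

print(round(fp32_mb))  # 496 -> roughly the ~500MB checkpoint
print(round(int8_mb))  # 124 -> even a best-case 4x quantization leaves >100MB
```

Which is the point above: quantization helps, but the result is still far too large to ship comfortably on the user end.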
That's why I'm looking into smaller models, which has been difficult, but releasing aitextgen was a necessary first step.
The size of the model you need to get good enough generation with something like GPT-2 is going to be pretty impractical on a raspberry pi.
You might be able to fit a 3-layer distilled GPT-2 in RAM (not quite sure what the latest RPIs have in terms of RAM, 4GB?), but the latency is going to be pretty horrible (multiple seconds).
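A rough sanity check on that "multiple seconds" claim. Every figure here is an assumption (the ~2 FLOPs/parameter/token rule of thumb, an optimistic sustained throughput for a Pi 4, and a short reply length), so treat it as a sketch, not a benchmark:

```python
# Hedged latency arithmetic for full-size GPT-2 on a Raspberry Pi.
params = 124_000_000
flops_per_token = 2 * params   # ~2 FLOPs per parameter per generated token

pi_gflops = 5e9                # optimistic sustained rate for a Pi 4 (assumed)
tokens = 40                    # a short chat reply (assumed)

seconds = tokens * flops_per_token / pi_gflops
print(round(seconds, 1))  # ~2.0s even under these optimistic assumptions
```

Real inference would be slower still (memory bandwidth, Python overhead), and a 3-layer distilled model would cut this roughly proportionally, so the estimate is consistent with "multiple seconds" per reply.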
Why not put it on a server and just use an API to communicate and get the results? Then the embedded code that interfaces with the API is much smaller, and the server can be as big as you need.
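The client side of that setup really is tiny. A minimal sketch, assuming a hypothetical endpoint and `{"prompt": ...}` → `{"text": ...}` JSON schema (none of this is the poster's actual API):

```python
import json
import urllib.request

# Hypothetical endpoint; the hostname, port, and path are assumptions.
API_URL = "http://example.local:8000/generate"

def build_request(prompt, url=API_URL):
    """Build a POST request carrying the chat prompt as JSON."""
    payload = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )

def ask_server(prompt, url=API_URL, timeout=10):
    """Send the prompt to the server and return the generated reply text."""
    with urllib.request.urlopen(build_request(prompt, url), timeout=timeout) as resp:
        return json.loads(resp.read())["text"]
```

This is all the toy itself would need to ship: no model weights, no torch, just the standard library.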