Hacker News

Does anyone know an efficient way to "embed" models like this? I'm currently working on a Tamagotchi-style RPi toy, and I use GPT-2 to generate chat responses. I wrote a simple API that returns responses from a server. If I could embed the model, it would save me from having to run a server.
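For reference, the client side of that setup can stay tiny. A minimal sketch, assuming a hypothetical `/generate` endpoint that takes a JSON prompt (the URL and payload shape here are illustrative, not from the post):

```python
import json
import urllib.request

SERVER_URL = "http://example.local:8000/generate"  # hypothetical endpoint

def build_request(prompt, max_length=50):
    """Package the chat prompt as the JSON payload the server expects."""
    return {"prompt": prompt, "max_length": max_length}

def ask_server(prompt):
    """POST the prompt and return the generated text from the response."""
    payload = json.dumps(build_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        SERVER_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["text"]

if __name__ == "__main__":
    print(ask_server("How are you feeling today?"))
```

The whole client fits in the standard library, so the on-device footprint is just this file plus Python itself; all the model weight stays server-side.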


The hard part of embedding is that even the smallest 124M-parameter GPT-2 model is huge at ~500MB, which would be unreasonable for performance/storage on the user's end (and quantization/tracing can't save that much space).

That's why I'm looking into smaller models, which has been difficult, but releasing aitextgen was a necessary first step.
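A back-of-envelope check on those numbers (the parameter count is from the GPT-2 release; the rest is simple arithmetic):

```python
# GPT-2 "small" has ~124M parameters, stored as 32-bit floats on disk.
params = 124_000_000
fp32_mb = params * 4 / 1e6   # 4 bytes per float32 weight
int8_mb = params * 1 / 1e6   # 1 byte per weight after int8 quantization

print(fp32_mb)  # ~496 MB, matching the ~500MB figure above
print(int8_mb)  # ~124 MB after an aggressive 4x quantization
```

Even a 4x saving from int8 quantization leaves ~124MB of weights, which is why a smaller architecture, not just compression, is what's really needed here.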


The size of the model you need to get good-enough generation with something like GPT-2 is going to be pretty impractical on a Raspberry Pi. You might be able to fit a 3-layer distilled GPT-2 in RAM (not quite sure what the latest RPis have in terms of RAM, 4GB?), but the latency is going to be pretty horrible (multiple seconds).


Why not put it on a server and just use an API to communicate and get the results? Then the embedded code that interfaces with the API would be much smaller, and the server can be as big as you need.
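A minimal sketch of that split, using only the standard library, with a stub in place of the real GPT-2 call (the `generate_reply` hook and the route are placeholders, not the OP's actual API):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def generate_reply(prompt):
    """Placeholder for the real model call, e.g. aitextgen/GPT-2 inference."""
    return "stub reply to: " + prompt

class GenerateHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON body: {"prompt": "...", ...}
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"text": generate_reply(payload["prompt"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # The heavy model lives here; the toy only sends small JSON requests.
    HTTPServer(("0.0.0.0", 8000), GenerateHandler).serve_forever()
```

For anything beyond a hobby project you'd reach for a real framework, but the point stands: the device-side dependency is just an HTTP client.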


What do you mean by embed the model?



