AirLLM enables 8GB MacBook to run 70B LLM (github.com/lyogavin)
79 points by SlavikCA on Dec 28, 2023 | 13 comments


The acknowledgements section in the README links to the notebook where OP sourced the techniques:

https://www.kaggle.com/code/simjeg/platypus2-70b-with-wikipe...

The notebook might be an easier read than the repo, but I haven't read either yet.

EDIT: It's very slow, according to the comments in this thread by people who tried it: https://github.com/oobabooga/text-generation-webui/issues/47...
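
For anyone curious about the core trick, here is a rough sketch of the layer-by-layer idea in PyTorch. This is not AirLLM's actual API; build_empty_layer and the per-layer weight files are hypothetical, just to illustrate keeping only one transformer layer in RAM at a time:

    import torch

    def layered_forward(hidden, layer_files, build_empty_layer):
        # hidden: activations [batch, seq, dim] coming out of the embedding layer
        # layer_files: one weight file per transformer layer, stored on disk
        # build_empty_layer: hypothetical helper returning an uninitialized layer module
        for path in layer_files:
            layer = build_empty_layer()
            layer.load_state_dict(torch.load(path, map_location="cpu"))
            with torch.no_grad():
                hidden = layer(hidden)
            del layer  # free this layer's weights before loading the next one
        return hidden

Peak memory is then roughly one layer plus activations (a 70B model at fp16 is ~140 GB spread over ~80 layers, so under 2 GB per layer), at the cost of rereading all the weights from disk on every forward pass, which is presumably why people report it being so slow.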


Could 2024 become a crisis for commercial AI?

1. We're only barely getting started with free MoE models, and Mistral has already impressed.

2. Cloud AI is a poor fit for corporate use, at least in the EU, due to GDPR, the NIS directive, and more. You really don't want the data processing to take place outside the EU.

3. There are indications of diminishing returns in LLM performance: Google's shot at it a year later, despite massive resources in terms of both experts and data, still doesn't have Gemini Pro clearly surpassing GPT-3.5, and Ultra probably won't clearly surpass GPT-4. Meanwhile, competition like Mistral is closing in.

4. The NY Times lawsuit seems like a harbinger for what is to become a theme for AI companies in 2024. Open collaborations are harder to target as legal entities and there is not nearly as much money to gain if you win.

All this points, to me, toward a) a convergence of performance that will b) benefit open models.

Interesting times anyway, especially as we are STILL only getting started.


Am iterating on a Linux distribution that boots to an LLM and has technical and scientific documentation and source code repos on disk. I treat Linux like a Python REPL: rather than building the code, the AI is trained to use Linux system calls to replicate the state defined in the code.

So far it has inferred how to syscall its way to a working Hyprland without any bells and whistles. Still working on the bells and whistles, and on apps like Firefox and Godot, and normalizing the source code base along the way to eliminate redundancy.

For whatever reason my EE brain sees all the semantic gibberish of programming as … well, gibberish. Electrons and matter interactions are good enough for me, and I’d like some freedom from the contrived semantics that often come along.


I feel like AI was a crisis for commercial AI.


Yeah it's hard to predict where the market will go.

It's possible that those forces are enough, but LLM adoption at major institutions is slow. Everyone is interested in using ChatGPT, but there isn't a clear best use case yet, or an established paradigm for how it should be used.


Is there a list of models (and instructions) that I can play around with, on my laptop, without sending data to an external API?


Instructions:

1. Install llama.cpp from https://github.com/ggerganov/llama.cpp. Alternatively, install https://github.com/oobabooga/text-generation-webui

2. Go to https://huggingface.co/TheBloke and search for GGUF. Download the model file and put it in the same directory. Then find the "example llama.cpp command line" and run it without the "-ngl 35" switch.

However, at this point, if your laptop has at least 32 GB of RAM, there is no point in trying anything except Mixtral 8x7B and its fine-tunes. It is fast (4 tokens per second on an 8-core Ryzen without any GPU acceleration, which would not work on integrated Ryzen APUs anyway because they don't have dedicated VRAM) and provides answers only slightly worse than ChatGPT 3.5. Its main deficiency is a tendency to forget the initial instruction: for example, when asked to explain a particular Samba configuration file, it started OK but then went on to mention directives that were not in the file under discussion.
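
If you'd rather drive llama.cpp from Python than from its command line, here is a minimal sketch using the llama-cpp-python bindings (pip install llama-cpp-python); the model filename below is just an illustrative placeholder for whatever GGUF file you downloaded in step 2:

    from llama_cpp import Llama

    # The model path is a placeholder; point it at the GGUF file you downloaded.
    llm = Llama(
        model_path="./mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf",
        n_ctx=2048,      # context window
        n_gpu_layers=0,  # CPU only; the equivalent of dropping the "-ngl 35" switch
    )

    out = llm("Q: What is a GGUF file?\nA:", max_tokens=128)
    print(out["choices"][0]["text"])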


Thank you. One more question: can this be used at work, internally? I work for a non-profit; not sure if that makes a difference.


From the work side, nobody can answer you apart from your work legal dept / manager.

From the licence side... read the licence and find out.


Whoa, I don't understand enough to figure out if this is real and scalable or not, but if this is true it's a HUGE step forward. Can't wait to try and run a 70b LLM on my 32GB RAM desktop w/ Windows.


This has Mixtral support! Can't wait to see the next wave of local MoE models. Perhaps cheap, fast, and local GPT-4 performance is not too far off.


At what SSD wear rate?


I did not dig too deep into the technicalities of it, but is there anything that would stop OpenAI from also implementing something like this?

Presumably any advances the open source community makes towards running on cheap hardware will also massively benefit the big guys.



