1. We're only barely getting started with free MoE models, and Mistral has already impressed.
2. Cloud AI is a poor fit for corporate use, at least in the EU, due to GDPR, the NIS directive, and more. You really don't want the data processing to take place outside the EU.
3. There are indications of diminishing returns in LLM performance: Google's attempt a year later, despite massive resources in terms of both experts and data, still hasn't produced a Gemini Pro that clearly surpasses GPT-3.5, and Ultra probably doesn't surpass GPT-4. Meanwhile, competition like Mistral is closing in.
4. The NY Times lawsuit seems like a harbinger of what will become a theme for AI companies in 2024. Open collaborations are harder to target as legal entities, and there is not nearly as much money to be gained if you win.
All this points, to me, toward a) a convergence of performance that will b) be to the benefit of open models.
Interesting times anyway, especially as we are STILL only getting started.
Am iterating on a Linux distribution that boots to an LLM and has technical and scientific documentation and source code repos on disk. I treat Linux like a Python REPL: rather than building the code, the AI is trained to use Linux system calls to replicate the state defined in the code.
So far it has inferred how to syscall its way to a working Hyprland without any bells and whistles. Still working on the bells and whistles and apps like Firefox and Godot, and normalizing the source code base along the way to eliminate redundancy.
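To make that concrete, here's a toy sketch (entirely my own illustration, not this project's actual mechanism) of what "replicating state via system calls" can look like: instead of running code that creates a file, an agent issues the equivalent syscalls directly.

    # Toy illustration (hypothetical): the "code" defines the state
    # "a file hello.txt containing 'hi'". Rather than executing that code,
    # issue the equivalent Linux system calls directly. Python's os.open /
    # os.write / os.close are thin wrappers over openat(2)/write(2)/close(2).
    import os

    fd = os.open("hello.txt", os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o644)
    os.write(fd, b"hi\n")
    os.close(fd)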
For whatever reason my EE brain sees all the semantic gibberish of programming as … well, gibberish. Electrons and matter interactions are good enough for me, and I’d like some freedom from the contrived semantics that often come along.
Yeah it's hard to predict where the market will go.
It's possible that those forces are enough, but LLM adoption at major institutions is slow. Everyone is interested in using ChatGPT, but there isn't a clear best use case yet, or an established paradigm for how it should be used.
However, at this point, if your laptop has at least 32 GB of RAM, there is no point in trying anything except Mixtral 8x7B and its fine-tunes. It is fast (4 tokens per second on an 8-core Ryzen without any GPU acceleration, which would not work on integrated Ryzen APUs anyway because they lack dedicated VRAM) and provides answers only slightly worse than ChatGPT 3.5. Its main deficiency is a tendency to forget the initial instruction: for example, when asked to explain a particular Samba configuration file, it started off fine but then went on to mention directives that were not in the file under discussion.
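For anyone wanting to reproduce that CPU-only setup, here's a minimal sketch with llama-cpp-python; the GGUF file name and Q4_K_M quantization below are my assumptions (any ~4-bit Mixtral 8x7B GGUF that fits in 32 GB should behave similarly):

    # Minimal CPU-only Mixtral sketch using llama-cpp-python.
    # The model file name/quantization are assumptions, not a specific
    # recommendation; pick any ~4-bit Mixtral 8x7B GGUF that fits in RAM.
    from llama_cpp import Llama

    llm = Llama(
        model_path="mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf",  # hypothetical local file
        n_ctx=4096,    # context window
        n_threads=8,   # match your physical cores, e.g. an 8-core Ryzen
    )

    out = llm(
        "[INST] Explain the smb.conf directive 'guest ok = yes'. [/INST]",
        max_tokens=256,
    )
    print(out["choices"][0]["text"])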
Whoa, I don't understand enough to figure out whether this is real and scalable, but if it is, it's a HUGE step forward. Can't wait to try to run a 70B LLM on my 32 GB RAM desktop with Windows.
https://www.kaggle.com/code/simjeg/platypus2-70b-with-wikipe...
The notebook might be an easier read than the repo, but I haven't read either yet.
EDIT: It's very slow, according to the comments in this thread by people who tried it: https://github.com/oobabooga/text-generation-webui/issues/47...
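From skimming the comments, my understanding (unverified; again, I haven't read the notebook) is that the trick streams one layer's weights at a time from disk, which would also explain the slowness: every token pays the cost of reloading every layer. A toy sketch of that pattern:

    # Toy sketch of layer-by-layer inference: my guess at the general trick,
    # not the notebook's actual code. Each layer's weights live on disk and
    # are loaded, applied, then freed, so peak RAM stays at one layer's worth.
    import os, tempfile
    import torch
    import torch.nn as nn

    tmp = tempfile.mkdtemp()
    dim, n_layers = 64, 8  # toy sizes; a 70B model has ~80 much larger layers

    # "Shard" the model: save each layer's weights as a separate file.
    for i in range(n_layers):
        torch.save(nn.Linear(dim, dim).state_dict(),
                   os.path.join(tmp, f"layer_{i}.pt"))

    def forward(x):
        for i in range(n_layers):
            layer = nn.Linear(dim, dim)
            layer.load_state_dict(torch.load(os.path.join(tmp, f"layer_{i}.pt")))
            with torch.no_grad():
                x = torch.relu(layer(x))
            del layer  # free this layer's weights before loading the next
        return x

    print(forward(torch.randn(1, dim)).shape)  # torch.Size([1, 64])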