> This will reduce token size, performance & operational costs.
How? The models aren't trained on compressed text tokens, nor could they be, if I understand it correctly. The models would have to decompress the prompt back into raw text before running it through the model.
That is what I am looking for: a) LLMs trained on compressed text tokens and b) using compressed prompts. I don't know how, but that is what I was hoping for.
The whole point of embeddings and tokens is that they already are a compressed version of text: a lower-dimensional representation. How low you can go depends on the performance you need; fewer dimensions = more lossy (usually). https://huggingface.co/spaces/mteb/leaderboard
You can train your own with very, very compressed embeddings; you could even go down to each token = just 2 floats. It will train, but it will be terrible, because at that size it can essentially only capture distance.
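To make that concrete, here's a minimal sketch (PyTorch assumed; the vocabulary size is made up) of what a 2-floats-per-token embedding looks like next to a more typical one:

```python
import torch
import torch.nn as nn

vocab_size = 1000  # toy vocabulary

# Extreme compression: each token is represented by just 2 floats.
# This will still train, but 2 dimensions can barely encode anything
# beyond rough "distance" between tokens.
tiny_emb = nn.Embedding(vocab_size, embedding_dim=2)

# A more typical dimensionality for comparison.
usual_emb = nn.Embedding(vocab_size, embedding_dim=768)

tokens = torch.tensor([1, 42, 7])
print(tiny_emb(tokens).shape)   # torch.Size([3, 2])   -- extremely lossy
print(usual_emb(tokens).shape)  # torch.Size([3, 768])
```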
Prompting a good LLM to summarize the context is, funnily enough, probably the best way of actually "compressing" context.
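Something like this sketch, assuming the OpenAI Python client (the model name and word budget are placeholders, not a recommendation):

```python
from openai import OpenAI

client = OpenAI()

def compress_context(context: str, max_words: int = 150) -> str:
    """'Compress' a long context by replacing it with a short summary
    before feeding it back into the next prompt."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any capable model works
        messages=[
            {
                "role": "system",
                "content": f"Summarize the following text in at most {max_words} "
                           "words, preserving every fact needed to continue "
                           "the conversation.",
            },
            {"role": "user", "content": context},
        ],
    )
    return resp.choices[0].message.content
```

It's lossy, like any compression of this kind, but the model doing the compressing gets to decide what's worth keeping.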