Most LLMs lose the ability to track facts reliably beyond roughly 20k words of content; the best manage maybe 40k words.
Look for "needle" benchmark results, as in needle-in-a-haystack tests.
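The idea behind those tests is simple: bury one specific fact in a pile of unrelated filler and ask the model to retrieve it. Here's a minimal sketch of how you'd build one yourself; query_model() is a hypothetical stand-in for whatever API or local runner you use.

```python
import random

def make_haystack(filler_sentences, needle, total_sentences=2000, depth=0.5):
    """Bury one 'needle' fact at a given relative depth in filler text."""
    haystack = [random.choice(filler_sentences) for _ in range(total_sentences)]
    haystack.insert(int(depth * total_sentences), needle)
    return " ".join(haystack)

filler = [
    "The weather report mentioned light rain in the afternoon.",
    "The committee postponed the vote until next quarter.",
]
needle = "The secret passphrase is 'blue-harbor-42'."
prompt = make_haystack(filler, needle) + "\nWhat is the secret passphrase?"

# answer = query_model(prompt)  # hypothetical call to your model of choice
# print("recalled" if "blue-harbor-42" in answer else "missed")
```

Sweep the depth and the haystack length and you get the usual recall-vs-context heatmaps.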
Not to mention the memory requirements of a huge context like 128k or 1M tokens; only someone with an enterprise server at home could run that locally.
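A back-of-the-envelope KV-cache calculation shows why. Assuming a Llama-2-70B-like layout (80 layers, 8 KV heads via GQA, head_dim 128, fp16) purely as an illustration, not a measurement of any specific model:

```python
# Rough KV-cache sizing under the assumed 70B-class dimensions above.
def kv_cache_gib(context_tokens, n_layers=80, n_kv_heads=8, head_dim=128, bytes_per_val=2):
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_val  # 2x for keys and values
    return context_tokens * per_token / 2**30

for ctx in (4_096, 131_072, 1_048_576):
    print(f"{ctx:>9} tokens -> ~{kv_cache_gib(ctx):.0f} GiB of KV cache")
# roughly 1 GiB at 4k, 40 GiB at 128k, 320 GiB at 1M -- on top of ~140 GB of fp16 weights.
```

Quantization and smaller models shrink those numbers, but the cache still scales linearly with context, so million-token windows stay out of reach for typical home hardware.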