
Is anyone using (local) LLMs to search for relevant materials directly, by scanning over a corpus, rather than relying on vector search?


Generally this fails.

Most LLMs lose the ability to track facts beyond about 20k words of content; the best can manage maybe 40k words.

Look for "needle" benchmark tests, as in needle-in-haystack.

Not to mention the memory requirements of a huge context like 128k or 1M tokens. Only people with enterprise servers at home could run that locally.


Good take on that. I still think a q8 32B model with a 200k context would fit into the 48 GB of VRAM on one of those modded RTX 4090s.
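
Rough math on that, assuming a Qwen2.5-32B-like layout (64 layers, GQA with 8 KV heads, head dim 128; these are assumptions, other 32B models differ). Whether it fits hinges almost entirely on how the KV cache is quantized:

    # Back-of-the-envelope VRAM estimate; architecture numbers are assumptions, not measured.
    params = 32e9
    weights_gb = params * 1 / 1e9               # q8 ~1 byte per parameter -> ~32 GB

    layers, kv_heads, head_dim, ctx = 64, 8, 128, 200_000
    for name, bytes_per_elem in [("fp16 KV", 2), ("q8 KV", 1), ("q4 KV", 0.5)]:
        kv_gb = 2 * layers * kv_heads * head_dim * ctx * bytes_per_elem / 1e9
        print(f"{name}: KV ~{kv_gb:.0f} GB, total ~{weights_gb + kv_gb:.0f} GB")

    # fp16 KV: ~52 GB -> total ~84 GB (does not fit in 48 GB)
    # q8  KV: ~26 GB -> total ~58 GB
    # q4  KV: ~13 GB -> total ~45 GB (tight, before activations and overhead)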


What about scanning over chunks of data and collecting matches iteratively? That's what I meant, rather than loading everything up to the full context limit.
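
Something like this sketch, assuming a local OpenAI-compatible server; the endpoint, model name, chunk size, and prompt are all placeholders:

    import requests

    def ask_local_llm(prompt: str) -> str:
        # Assumes a local OpenAI-compatible server (llama.cpp / Ollama) on port 8080.
        r = requests.post(
            "http://localhost:8080/v1/chat/completions",
            json={"model": "local-model",
                  "messages": [{"role": "user", "content": prompt}],
                  "temperature": 0},
            timeout=300,
        )
        return r.json()["choices"][0]["message"]["content"]

    def scan_corpus(docs, question, chunk_chars=8000):
        """Scan every document chunk by chunk and collect relevant passages."""
        matches = []
        for doc in docs:
            for i in range(0, len(doc), chunk_chars):
                chunk = doc[i:i + chunk_chars]
                verdict = ask_local_llm(
                    f"Question: {question}\n\nText:\n{chunk}\n\n"
                    "If the text contains information relevant to the question, "
                    "quote the relevant sentences; otherwise answer NONE."
                )
                if verdict.strip().upper() != "NONE":
                    matches.append(verdict)
        return matches

Each chunk stays far below the model's effective context, so the needle-in-haystack degradation mentioned above is mostly avoided, at the cost of one LLM call per chunk, which is slow compared to a vector index.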


Very good answer. It is very hard with small LLMs.



