Most LLMs lose the ability to track facts beyond roughly 20k words of context; the best manage maybe 40k.
Look for "needle-in-a-haystack" benchmark results.
Not to mention the memory cost of a huge context like 128k or 1M tokens. Only people with enterprise servers at home could run that locally.
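To put a rough number on that memory cost: the KV cache alone grows linearly with context length. Here's a back-of-the-envelope sketch, assuming a Llama-3-70B-style configuration (80 layers, 8 KV heads via GQA, head dim 128, fp16) — the specific numbers are an assumption, swap in your own model's config.

```python
def kv_cache_bytes(seq_len: int,
                   n_layers: int = 80,     # assumed Llama-3-70B-style config
                   n_kv_heads: int = 8,    # grouped-query attention
                   head_dim: int = 128,
                   dtype_bytes: int = 2) -> int:  # fp16
    """Bytes needed to hold the K and V tensors for every layer at seq_len tokens."""
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes * seq_len

for ctx in (8_192, 131_072, 1_048_576):
    gib = kv_cache_bytes(ctx) / 2**30
    print(f"{ctx:>9,} tokens -> {gib:,.1f} GiB of KV cache")
```

Under these assumptions, a 128k context needs about 40 GiB just for the cache (on top of the model weights themselves), and 1M tokens needs roughly 320 GiB — hence the "enterprise servers at home" problem.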