Hacker News | faxmeyourcode's comments

Wow, I didn't realize some RFID could reach 15 feet out - that's good to know. I naively thought you essentially had to be touching the surface of the tag.


Especially for super constrained applications. I don't care if the language model that I use for my extremely specific business domain can solve PhD math or remember the works of Shakespeare. I'd trade all of that for pure task specific accuracy.


Can you share more details about your use case? The good applications of fine tuning are usually pretty niche, which tends to make people feel like others might not be interested in hearing the details.

As a result it's really hard to read about real-world use cases online. I think a lot of people would love to hear more details - at least I know I would!


If you treat LLMs as generic transformers, you can fine tune with a ton of examples of input output pairs. For messy input data with lots of examples already built, this is ideal.

At my day job we have experimented with fine tuned transformers for our receipt processing workflow. We take images of receipts, run them through OCR (this step might not even be necessary, but we do it at scale already anyways), and then take the OCR output text blobs and "transform" them into structured receipts with retailer, details like zip code, transaction timestamps, line items, sales taxes, sales, etc.

I trained a small LLM (mistral-7b) via SFT with 1000 (maybe 10,000? I don't remember) examples from receipts in our database from 2019. When I tested the model on receipts from 2020 it hit something like 98% accuracy.

The key that made this work so well is that we had a ton of data (potentially billions of example input/output pairs) and we could easily evaluate the correctness by unpacking the json output and comparing with our source tables.
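A minimal sketch of that kind of evaluation loop, with hypothetical field names (not the actual schema): unpack the model's JSON output and compare it field-by-field against the source-of-truth row.

```python
import json

def evaluate(model_output: str, source_row: dict) -> bool:
    """Compare a model's structured-receipt JSON against the source-of-truth row."""
    try:
        parsed = json.loads(model_output)
    except json.JSONDecodeError:
        return False  # malformed JSON counts as a miss
    # Only compare fields the source table actually has; extra keys are ignored.
    return all(parsed.get(k) == v for k, v in source_row.items())

# Hypothetical example pair:
truth = {"retailer": "ACME", "zip": "90210", "total": "12.34"}
output = '{"retailer": "ACME", "zip": "90210", "total": "12.34", "line_items": []}'
print(evaluate(output, truth))  # True
```

Because correctness is just a JSON comparison, you can score millions of examples cheaply, which is what makes eval at this scale tractable.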

Note that this isn't running in production, it was an experiment. There are edge cases I didn't consider, and there's a lot more to it in terms of accurately evaling, when to re-train, dealing with net new receipt types, retailers, new languages (we're doing global expansion RN so it's top of mind), general diversity of edge cases in your training data, etc.


Labeling or categorization tasks like this are the bread and butter of small fine tuned models. Especially if you need outputs in a specific json format or whatever.

I did an experiment where I did very simple SFT on Mistral 7b and it was extremely good at converting receipt images into structured json outputs and I only used 1,000 examples. The difficulty is trying to get a diverse enough set of examples, evaling, etc.
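For concreteness, an SFT training record for a task like this is often stored as prompt/completion pairs in JSONL; the record below is purely illustrative (made-up receipt, made-up field names), not the actual data.

```python
import json

# One hypothetical SFT training record: OCR text blob in, structured receipt out.
example = {
    "prompt": "ACME STORE\n123 MAIN ST 90210\n2019-06-01 14:32\nMILK 2.49\nTOTAL 2.49",
    "completion": json.dumps({
        "retailer": "ACME STORE",
        "zip": "90210",
        "timestamp": "2019-06-01T14:32:00",
        "line_items": [{"name": "MILK", "price": 2.49}],
        "total": 2.49,
    }),
}

# SFT datasets are commonly serialized one record per line (JSONL):
line = json.dumps(example)
print(json.loads(line)["prompt"].splitlines()[0])  # ACME STORE
```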

If you have great data with simple input output pairs, you should really give it a shot.


> Filip also told me that he asked Claude to continue on the even case after the odd case had been resolved. “But there after a while it seemed to get stuck. In the end, it was not even able to write and run explore programs correctly anymore, very weird. So I stopped the search.”

Interesting snippet towards the end. I wonder if they were using claude.ai or claude code. Sounds like they ran out of context and entered the "dumb zone."


What would be super cool is if this dumb zone could be quantified and surfaced to the user. I've noticed that Copilot now has a little circle graph that indicates context use percentage and changes color as it fills up. I'll bet these are very naive metrics of used tokens vs. context availability. I wonder if metadata could be streamed along with the tokens to show that you've entered the dumb zone.
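The naive version of such an indicator is just a ratio with color thresholds; the cutoffs below are made up for illustration, not anything Copilot actually uses.

```python
def context_status(used_tokens: int, context_window: int) -> str:
    """Naive context-pressure indicator: used tokens vs. window size.
    Thresholds are invented for illustration."""
    pct = used_tokens / context_window
    if pct < 0.5:
        return "green"
    if pct < 0.8:
        return "yellow"
    return "red"

print(context_status(30_000, 200_000))   # green
print(context_status(170_000, 200_000))  # red
```

A real "dumb zone" signal would presumably need more than a token count, e.g. some model-side measure of retrieval quality over the long context.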


In another part he says Filip restarted Claude many times, so it seems they were aware of context pollution and ways to avoid it (also why they kept telling Claude to write everything to a file). It could just be that Claude was caught between a rock and a hard place: disappointing the user vs. solving a problem it couldn't solve.


Then it needs to do context compaction, otherwise the results become garbage.


They mentioned a plan document.


What is the dumb zone?


When LLMs start compacting, they summarize the conversation up to that point using various techniques. Overall, a lot of the finer points of the work go missing and can only be retrieved if the LLM is told to search for them explicitly in old logs.

Once you compact, you've thrown away a lot of relevant tokens from your problem solving and they do become significantly dumber as a result. If I see a compaction coming soon I ask it to write a letter to its future self, and then start a new session by having it read the letter.

There are some days where I let the same session compact 4-5 times and just use the letter to future self method to keep it going with enough context because resetting context also resets my brain :)

If you're ever curious, in Claude you can read the new initial prompt after compaction and see how severely the context gets cut down. It's very informative about what it forgets and deems unimportant.

For example, I have some internal CLIs that are horribly documented, so Claude has to try a few flags a few times to figure out specifics, and those corrections always get thrown away; it has to relearn them the next time it wants to use the CLI. If you notice things like that happening constantly, my move is to codify them into my CLAUDE.md, or lately I've been making a small script or MCP server to run very specific flags of stuff.
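That last move can be as small as a wrapper that pins the hard-won flags. A sketch, where "reportgen" and its flags are entirely made up stand-ins for an internal CLI:

```python
import subprocess

# Hypothetical internal CLI ("reportgen" is invented). The flags that took
# several failed attempts to discover are pinned here once, so they never
# have to be relearned after compaction.
def report_cmd(env: str) -> list[str]:
    return ["reportgen", "--env", env, "--format", "json", "--no-color"]

def run_report(env: str) -> str:
    result = subprocess.run(report_cmd(env), capture_output=True, text=True, check=True)
    return result.stdout

print(report_cmd("staging"))
```

The agent then calls the wrapper instead of the raw CLI, and the flag knowledge lives in code rather than in a context window that will eventually be thrown away.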


Shouldn't compaction be exactly that letter to its future self?


Look at the compaction prompt yourself. In my opinion it's way too short. (I'm running Opus 4.5 most of the time at work.)

From what my colleague explained to me (I haven't 100% verified it myself), the beginning and end of the window contribute most to the compaction summary, so a lot of the finer details and debugging that would slow down the next session get dropped.


What prompt do you use for the letter-to-self? I've been trying that technique myself to manually reset context without losing the important parts (e.g. when it has barked up the wrong tree and I'm sensing that misstep might influence its current generation in a pathological way), but I've not had much success.


It tends to be pretty manual. I mention the goal of the next session, the current stage of progress, the tests for the next steps, and any skills I want it to load next time.

Having a specific goal seems to make a big difference vs. asking it to summarize the session.


If the session was something where it struggled and had to do multiple attempts I have it write about 'gotchas' or anything it had to attempt multiple times.

The letters are usually more detailed than what I see in the compacted prompt.


So you use the letter to itself in addition to the compacted context? I'm curious what you ask it to include in the letter and how it differs from a custom instruction passed to /compact.


> I ask it to write a letter to its future self, and then start a new session by having it read the letter

Is that not one of the primary techniques for compaction?


You should do your own experiment: when you see compaction about to start, use the end of your window to have it write a letter first, then let the session compact and compare. I was surprised by how small the compacted message is.

When I tell it to write a letter to itself, I usually phrase it like:

"Write a letter to yourself. Make notes of any gotchas or quirks that you learned and make sure to note them down."

It does get those into the letter, but if you check the compaction, a lot of it is gone.


I think the point is that you have a better idea of what you want it to remember and even a small hint can have big impact.

Just saying "write up what you know", with no other clues, should not perform any better than generic compaction.


Yea, I agree. The dataset is < 100MB, so DuckDB can very easily handle this on an old MacBook Air.

https://duckdb.org/2025/05/19/the-lost-decade-of-small-data


Neel, this is really cool. How long have you been working on this, and where did you guys get inspiration from? Did you work on vlms earlier or something like that? Just curious.

Also, thanks for choosing a technical blog post for presenting this information.


Thanks! Got a lot of inspiration from VPT (https://arxiv.org/abs/2206.11795), it's a great paper, would recommend a read.

We all have various backgrounds; me in particular, I did a lot of materials science x AI research and fundamental architecture research before.


I love the idea behind MyVisualRoutine as a father with a disabled kiddo, thanks for sharing.

The app is beautiful - much better than I could build - what tech is it using if you don't mind me asking? Is it flutter, react native, something else? Just want to get better at mobile dev.


Thank you so much! Then you know my pain. All apps I found were either shit, real shit, or didn't solve my personal need. Hopefully this might help others the way it did for our family.

The app is built using Flutter because I'm going to release it on Android too very soon, and I couldn't be bothered creating it twice. Initially the idea was more of a game-like app, and then it made sense to use Flutter. Now, though, it's not really game-like and could've just as well been native (apart from me not bothering to do it twice). If I were doing it again from scratch, for this app, I would still have chosen Flutter.


Love this. An example of complete and total dominion over the machine. Great quote here too lol

> Prometheus stole fire from the gods and gave it to man. For this he was chained to a rock and tortured for eternity.


The next step would be to embed a JavaScript engine in coreboot I guess


Talking about quotes, I also absolutely loved this note at the end of the readme:

> If this makes you grin, you are probably holding the torch.


Everybody is different, I simply cannot stand the sight of chatgpt styled writing. Give me paragraphs.


I've lobbied to replace our internal tool with a Django admin panel. I prototyped it and showed that it would reduce our code by >15k lines.

Any internal webapps I need to build like this will 100% be set up with Django in the future because of this. I don't need it to be pretty; I just want the UI, database migrations, users, roles, groups, etc. for free.
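For flavor, registering a model in the Django admin is about this much code; the app, model, and field names below are hypothetical:

```python
# admin.py in a hypothetical "tickets" app: this is all it takes to get a
# searchable, filterable CRUD UI with permissions for free.
from django.contrib import admin
from .models import Ticket  # hypothetical model

@admin.register(Ticket)
class TicketAdmin(admin.ModelAdmin):
    list_display = ("id", "title", "status", "created_at")
    list_filter = ("status",)
    search_fields = ("title",)
```

Everything else (users, groups, change history, migrations) comes from Django itself rather than hand-rolled code, which is where the >15k-line savings comes from.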


It's so easy to add bits to the admin as you need them.

I wish building the frontend of a Django app were as easy as the admin (though Wagtail, combined with HTMX, can get close).

