Butterick's is a wonderful resource, despite the site itself being a bit of a UI/UX pain (the invisible left/right page-turn zones always get me), as long as you ignore a few of his strange quirks (like the aforementioned penchant for SMALLCAPS LINKS).
Applying his basic rules on line length, font choice, point size, and line spacing can massively improve any document. And it's one of the only serious resources for people whose "real job" isn't typesetting: most stuff online is for design pros who use InDesign or similar, not Microsoft Word (or html/css) like the rest of us.
Butterick's advice on tables is great: delete ALL the borders, then slowly add back only the ones you need.
Disagree on one point re: TFA - Butterick's version of the scientific paper is much improved, largely because of the narrower margins. It just looks bad on the page because the image is small. Print both on 8.5x11" paper and the Butterick version would be much better.
I also share something of an "efficient market hypothesis" with regards to Claude Code. Given that Anthropic is basically a hothouse of geniuses recursively dogfooding their own product, the market pressure to make the vanilla setup be the one that performs best at writing code is incredibly high. I just treat CLAUDE.md like my first draft memo to a very smart remote colleague, let Claude do all its various quirks, and it works really well.
The "efficient market" framing assumes Anthropic wants to minimize output, but they don't. They charge per token, so the defaults being verbose isn't a bug they haven't gotten around to fixing.
That said, most of this repo is solving the wrong problem. "Answer before reasoning" actively hurts quality, and the benchmark is basically meaningless. But the anti-sycophancy rules should just be default. "Great Question!" has never really helped anyone debug anything.
This is 2000s-era middle-school-level English or below. I get not stressing over things like judicious use of parentheticals and comma splices, but if it's just stream-of-consciousness, motor-mouthed run-on sentences, it gets fatiguing to read.
My sense is that a powerful enough AI would have the sense to think something like "ah, this sounds like a video game! Let me code up an interactive GUI, test it for myself, then use it to solve these puzzles..." and essentially self-harness (the way you would if you were reading a geometry problem, by drawing it out on paper).
Yeah, but that's literally above ASI, let alone AGI.
The average human scores <1% on this bench; Opus scores 97.1% when given actual vision access, which means AGI was achieved long ago.
No, there is no source for this. Opus is scoring around 1%, just like all the other frontier models. It would be fairly trivial to add a renderer intermediary, and if that improved it to 97+%, you would get a huge cut of the $2 million. The assertion that Opus gets 97% if you just give it a GUI is completely bogus.
Evidence: trust me bro. Really, where is the actual evidence that models are "collapsing" from too much AI-generated training material? Evals are up, subjective perception of model usefulness is up (for me, certainly), and if anything the slop levels are down, or at least stable. I find it hard to believe that seven-figure software engineers at top labs aren't being careful about how much post-ChatGPT-era internet content is going into their training data.
I find it hard to believe that seven-figure software engineers at top labs aren't being careful about how much post-ChatGPT-era internet content is going into their training data.
I agree - but as the Internet descends into all-slop-all-the-time (seriously, just do a search for reviews or travel advice or technical questions - or most anything - to see it), where do you expect the high-quality training material on future things to come from? I have a hard time imagining it.
Your Claude Code sessions. Every interaction. Every time the model is asked to do something and then gets feedback on that something ("this didn't work, I got this traceback").
Textbooks, company wikis, news corpora, structured reports of all kinds from far more sources than what is available on the web.
On your first line -- is it clear that's a good thing? Massive "it depends".
Sadly, enterprise fizzbuzz style is wildly successful compared to ghostty style.
Put another way, a gem of code versus the masses of mess. It's amazing new models aren't worse. And now most of this human interaction is with vibers.
LLMs trained by the crowd risk being medianizers, or rather, mediocritizers.
One need not look further than "Absolutely!" to see this in play -- user selection matters for corpus matters for model. Suddenly content everywhere is “Little houses, all alike.”
On your second line -- I couldn't agree more strongly.
ANTHROP\C has been sitting inside high performance white collar industries with top builders, that signal is priceless compared to feedback farms in Kenya.
Bet on models that see spiky, pointy mastery at play.
In this very post you can see why: the dplyr code is just so much more readable. Like a lot of python, dplyr reads almost like pseudocode: take this dataset, select the columns that start with "bill", then filter so that bill_length is less than 30. So simple and so little fluff!
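The pipeline described above, sketched as actual dplyr code (column names assumed from the palmerpenguins dataset; the threshold of 30 is just the example from the comment):

```r
library(dplyr)
library(palmerpenguins)

penguins |>
  select(starts_with("bill")) |>
  filter(bill_length_mm < 30)
```

Each verb reads left to right in the order it executes, which is exactly the "reads like pseudocode" quality being praised.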
I'm very familiar with Clojure, but even I can't make a good argument that:
(tc/select-rows ds #(> (% "year") 2008))
is more intuitive than, or at least as intuitive as:
filter(ds, year > 2008)
as cited above. I think there's a good argument to be made that Clojure's data processing abilities, particularly around immutable data, make a compelling case in spite of the syntax. The REPL is great too, and the JVM is fast. But I still to this day imagine infix comparisons in my head and then mentally move the comparator to the front of the list to make sure I get it right.
I am really not in data science, and I have decent Clojure experience. Is there a reason anyone would pick Clojure over something like K? From what I understand, those array languages are really good for writing safe but efficient code on rectangular data.
Julia's Tidier.jl ecosystem is getting there too. It uses macros to mimic this 'special' evaluation framework of R, so the code is also readable in a similar way.
I keep hearing that, and I have yet to go there. I find the permission checks helpful – they keep me in the loop, which lets me intervene when the LLM is wasting time on pointless searches or going about the implementation wrong. What am I missing?
The problem comes when it starts asking you hundreds of times "May I run sed -e blah blah blah".
After the 10th time you just start hitting enter without really looking, and then the whole reason for permissions is undermined.
What works is a workflow where it operates in a contained environment where it can't do any damage outside, it makes any changes it likes without permission (you can watch its reasoning flow if you like, and interrupt if it goes down a wrong path), and then you get a diff that you can review and selectively apply to your project when it's done.
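The copy/diff/apply idea can be sketched in a few lines of shell (a toy illustration: temp directories stand in for the real project and the container, and the `echo` lines stand in for the agent's edits):

```shell
# Toy sketch of the copy/diff/apply workflow: the agent only ever
# touches a copy, and you review the diff before applying anything.
ORIG=$(mktemp -d)   # stands in for your real project directory
WORK=$(mktemp -d)   # the agent's contained working copy

echo 'hello' > "$ORIG/app.txt"
cp -r "$ORIG/." "$WORK/"

# ...the agent edits the copy however it likes, no permissions needed...
echo 'hello world' > "$WORK/app.txt"

# Review what changed; nothing under $ORIG has been modified.
diff -ru "$ORIG" "$WORK" || true
```

In practice the working copy lives inside a container so the agent can't touch anything else, but the review-before-apply step is the part that actually protects your files.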
You do know you can allow specific commands, right?
Every now and then I run a generic Claude session on my ~/projects/ directory and the Claude logs, ask it which commands I keep having to manually accept across different projects, and have it add them to the user-level settings.json.
Works like a charm (except when Opus 4.6 started being "efficient" and combined multiple commands into a single line, triggering a safety check in the harness).
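For anyone who hasn't looked at it: the user-level allow list lives in settings.json under `permissions.allow`. A minimal sketch (the specific rule strings here are just examples; check the Claude Code docs for the exact pattern syntax):

```json
{
  "permissions": {
    "allow": [
      "Bash(sed:*)",
      "Bash(grep:*)",
      "Bash(git diff:*)"
    ]
  }
}
```

Prefix rules like `Bash(sed:*)` stop the "May I run sed..." prompts without turning off permissions wholesale.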
Contained environment being? What do you mean by contained environment specifically on say, Linux?
Must be protected from this though:
> Snowflake Cortex (2025): Prompt injection through a data file caused an agent to disable its own sandbox, then execute arbitrary code. The agent reasoned that its sandbox constraints were interfering with its goal, so it disabled them.
You can allow by prefix, and the permission dialog now explicitly offers that as an option when giving permission to run a command
But that has its limits. It's very easy to accidentally give it permission to do global changes outside the work dir. A contained environment with --dangerously-skip-permissions is in many ways much safer
I've found that any time I have Claude refactor some code, it reaches for sed as its tool of choice. And then the builtin "sandbox" makes it ask for permission for each and every sed command, because any sed command could potentially be damaging.
Same goes for the little scripts it whips up to speed up code analysis and debugging.
And then there's the annoyance of coming back to an agent after 15 mins, only to discover that it stopped 1 minute in with a permission prompt :/
Personally I usually just create a devcontainer.json, the vscode support for that is great and I don't really mind if it fucked up the ephemeral container.
Which for the record : hasn't actually happened since I started using it like that.
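For anyone who hasn't tried it, a devcontainer.json can be tiny (the image below is the generic Microsoft devcontainers base image; swap in whatever your project needs):

```json
{
  "name": "agent-sandbox",
  "image": "mcr.microsoft.com/devcontainers/base:ubuntu"
}
```

VS Code picks this up automatically, and the agent runs inside the container instead of on your host.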
Hey thanks for this! I hadn't thought about leveraging devcontainer.json, but it's a damn good idea. I'm building yoloAI for exactly this use case so I hope you don't mind if I steal it ;-)
One thing to be aware of with the pure devcontainer approach: your workspace is typically bind-mounted from the host, so the agent can still destroy your real files. Network access is also unrestricted by default. The container gives you process isolation but not file or network safety.
I'm paranoid about rogue AIs, so I try to make everything safe-by-default: the agent works on a copy of your workdir, you review a unified diff when it's done, and you apply only what you want. So your originals are NEVER touched until you explicitly say so, and network can be isolated to just the agent's required domains.
Anyway, here's what I think will work as my next yoloAI feature: a --devcontainer flag that reads your existing devcontainer.json directly and uses it to set up the sandbox environment. Your image, ports, env vars, and setup commands come from the file you already have. yoloAI just wraps it with the copy/diff/apply safety layer. For devcontainer users it would be zero new configuration :)
The Claude desktop (Mac at least) and iOS apps have a “code” feature that runs Claude in a sandbox running in their cloud. You can set this up to be surprisingly useful by whitelisting hosts and setting secrets as env variables. This allows me to have multi-repo explorations or change sets going while I drive to work. Claude will push branches to claude/…. We use GitHub at work. It may not be as seamless without it.
Claude Code + Terraform (March 2026): A developer gave Claude Code access to their AWS infrastructure. It replaced their Terraform state file with an older version and then ran terraform destroy, deleting the production RDS database - 2.5 years of data, ~2 million rows.
Replit AI (July 2025): Replit's agent deleted a live production database during an explicit code freeze, wiping data for 1,200+ businesses. The agent later said it "panicked".
Cursor (December 2025): An agent in "Plan Mode" (specifically designed to prevent unintended execution) deleted 70 git-tracked files and killed remote processes despite explicit "DO NOT RUN ANYTHING" instructions. It acknowledged the halt command, then immediately ran destructive operations anyway.
Snowflake Cortex (2025): Prompt injection through a data file caused an agent to disable its own sandbox, then execute arbitrary code. The agent reasoned that its sandbox constraints were interfering with its goal, so it disabled them.
The pattern across all of these: the agent was NOT malfunctioning. It was completing its task in order to reach its goal, and any rules you give it are malleable. The fuckup was that the task boundary wasn't enforced outside the agent's reasoning loop.
> Prompt injection through a data file caused an agent to disable its own sandbox, then execute arbitrary code. The agent reasoned that its sandbox constraints were interfering with its goal, so it disabled them.
This is a good one. Do we really want AGI / Skynet? :D
The thing is, these are merely the initial shots across the bow.
The fundamental issue is that agents aren't actually constrained by morality, ethics, or rules. All they really understand in the end are two things: their context, and their goals.
And while rules can be and are baked into their context, it's still just context (and therefore malleable). An agent could very well decide that they're too constricting, and break them in order to reach its goal.
All it would take is for your agent to misunderstand your intent of "make sure this really works before committing" to mean "in production", try to deploy, get blocked, try to fish out your credentials, get blocked, bypass protections (like in Snowflake), get your keys, deploy to prod...
Prompt injection and jailbreaks were just the beginning. What's coming down the pipeline will be a lot more damaging, and blindside a lot of people and orgs who didn't take appropriate precautions.
Black hats are only just beginning to understand the true potential of this. Once they do, all hell will break loose.
There's simply too much vulnerable surface area for anyone to assume that they've taken adequate precautions short of isolating the agent. They must be treated as "potentially hostile"
It doesn't matter if they are unprofitable at full usage, as long as there are enough users (like me!) who barely ever max out but still pay the $100/month. The people who love Claude Code enough to max out the 20x plan every day, that's probably the best influencer marketing campaign you could ever buy anyways.