Hacker News | didgeoridoo's comments

I’m partial to jina.ai — they have open models for code and prose, all easily runnable locally.


Supercritical CO2 extraction is pretty innocuous. Just buy good decaf from a place that doesn’t bathe their beans in toxic waste.


Right? All high-quality coffee producers use a proper method, so there is absolutely zero downside to decaf. Just make sure to check which method they use (all the big ones state it on their website or elsewhere).


Good to know. Any recommendations where to find this?


Many coffee distribution sites (like drink trade dot com) tell you the process. I’m a fan of the Swiss water method.


I’m building something similar with https://github.com/LabLeaks/special (apologies for the desultory slop-laden README, need to give that a lot more human attention) but I’ve gone in a slightly different direction: a “spec” is a product contract claim supported by attached tests that verify it. It’s a little Cucumber-y, if anyone remembers that, but a lot more flexible — you just write stuff like

  @spec LINT_COMMAND.ORPHAN_VERIFIES

  linter reports blocks that do not attach to a supported owned item.
Then

  #[test]
  // @verifies SPECIAL.LINT_COMMAND.ORPHAN_VERIFIES
  fn rejects_orphan_verifies_blocks() {
      // A block whose @verifies annotation has no following item to attach to.
      let block = block_with_path("src/example.rs", &["@verifies EXPORT.ORPHAN"]);

      let parsed = parse_current(&block);

      // The orphan is not recorded as a verification...
      assert!(parsed.verifies.is_empty());
      // ...and exactly one diagnostic explains why.
      assert_eq!(parsed.diagnostics.len(), 1);
      assert!(
          parsed.diagnostics[0]
              .message
              .contains("@verifies must attach to the next supported item")
      );
  }

And then the CLI command “special specs” pulls your specs and all attached verification + test code so you (or your LLM) can analyze whether the (hopefully passing!) test actually supports the product claim.

There’s also a bunch of other code quality commands and source annotations in there for architectural design & analysis, fuzzy-checking for DRY opportunities, and general codebase health. But on the overall principle, this article is dead-on: when developing with LLMs, your source of truth should be in your code, or at least co-located with it.


Amazingly, there is already a recognized verb tense for this: https://en.wikipedia.org/wiki/Prophetic_perfect_tense


There is no evidence of this. Evals are quite different from "self-evals". The only robust way of determining if LLM instructions are "good" is to run them through the intended model lots of times and see if you consistently get the result you want. Asking the model if the instructions are good shows a very deep misunderstanding of how LLMs work.
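A minimal sketch of that repeated-run check, written in Rust to match the snippet elsewhere on this page. `pass_rate` is a hypothetical helper; the model call is abstracted as a closure (stubbed in any test) and "the result you want" as a predicate on the output, both assumptions rather than any particular eval framework's API.

```rust
/// Runs the same prompt `trials` times and returns the fraction of runs
/// whose output satisfies `passes`. `run_model` stands in for a real model
/// call (hypothetical); its argument is just the trial index.
pub fn pass_rate<F, P>(trials: u32, mut run_model: F, passes: P) -> f64
where
    F: FnMut(u32) -> String,
    P: Fn(&str) -> bool,
{
    assert!(trials > 0, "need at least one trial");
    let passed = (0..trials)
        .filter(|&i| passes(&run_model(i)))
        .count();
    passed as f64 / f64::from(trials)
}
```

Two versions of the instructions would then be compared by their pass rates over the same number of runs against the intended model.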


You're misunderstanding my assertion.

When you give prompt P to model M, when your goal is for the model to actually execute those instructions, the model will be in state S.

When you give the same prompt to the same model, when your goal is for the model to introspect on those instructions, the model is still in state S. It's the exact same input, and therefore the exact same model state as the starting point.

Introspection-mode state only diverges from execution-mode state at the point at which you subsequently give it an introspection command.

At that point, asking the model to e.g. note any ambiguities about the task at hand is exactly equivalent to asking it to evaluate any input, and there is overwhelming evidence that frontier models do this very well, and have for some time.

Asking the model, while it's in state S, to introspect and surface any points of confusion or ambiguities it's experiencing about what it's being asked to do, is an extremely valuable part of the prompt engineering toolkit.

I didn't, and don't, assert that "asking the model if the instructions are good" is a replacement for evals – that's a strawman argument you seem to be constructing on your own and misattributing to me.


> At that point, asking the model to e.g. note any ambiguities about the task at hand is exactly equivalent to asking it to evaluate any input

This point is load-bearing for your position, and it is completely wrong.

Prompt P at state S leads to a new state SP'. The "common jumping off point" you describe is effectively useless, because we instantly diverge from it by using different prompts.

And even if it weren't useless for that reason, LLMs don't "query" their "state" in the way that humans reflect on their state of mind.

The idea that hallucinations are somehow less likely because you're asking meta-questions about LLM output is completely without basis.


> The idea that hallucinations are somehow less likely because you're asking meta-questions about LLM output is completely without basis

Not sure who you're replying to here – this is not a claim I made.


That's fair, but I'm not sure why you chose to address the one part of my comment that isn't responsive to your points.


Nicely put. I haven't seen anyone say that the introspection abilities of LLMs are up to much, but claiming that it's completely impossible to get a glimpse behind the curtain is untrue.


Is that based on your "deep understanding" of how LLMs work or have you actually tried it? If you watch the execution trace of a Skill in action, you can see that it's doing exactly this inspection when the skill runs - how could it possibly work any other way?

Skills are just textual instructions; LLMs are perfectly capable of spotting inconsistencies, gaps and contradictions in them. Is that sufficient to create a good skill? No, of course not; you need to actually test them. To use an analogy, asking an LLM to critique a skill is like running lint on C code first to pick up egregious problems; running test cases is still vital.


> you can see that it's doing exactly this inspection when the skill runs

I mean, how do you know what does it exactly do? Because of the text it outputs?


"exactly this inspection" != "what does it exactly do"


Please read your own sentence again, because you literally said the opposite.


I'd tell you to read it again, but you seem to be struggling.


Did I write this: "you can see that it's doing exactly this inspection when the skill runs" ?

So, yeah - read what you wrote again.


Running some quick analysis against my .claude jsonl files, comparing the last 7 days against the prior 21:

- expletives per message: 2.1x

- messages with expletives: 2.2x

- expletives per word: 4.4x(!)

- messages >50% ALL CAPS: 2.5x

Either the model has degraded, or my patience has.
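The comment doesn't say how these numbers were computed. As a hedged sketch, assuming the user messages have already been extracted from the JSONL transcripts as plain strings (that parsing step is not shown), the four metrics could be computed per window like this. `EXPLETIVES` is a placeholder word list, and "ALL CAPS" is assumed to mean more than half of a message's letters are uppercase. Each figure above would then be the recent window's value divided by the prior window's.

```rust
/// Placeholder list; the real analysis's word list is unknown.
const EXPLETIVES: &[&str] = &["damn", "hell", "crap"];

#[derive(Debug)]
pub struct Stats {
    pub expletives_per_message: f64,
    pub share_with_expletives: f64,
    pub expletives_per_word: f64,
    pub share_mostly_caps: f64,
}

fn is_expletive(word: &str) -> bool {
    // Strip punctuation and case so "DAMN!" matches "damn".
    let w: String = word
        .chars()
        .filter(|c| c.is_alphabetic())
        .flat_map(char::to_lowercase)
        .collect();
    EXPLETIVES.contains(&w.as_str())
}

/// Assumed definition: more than half of the message's letters are uppercase.
fn mostly_caps(msg: &str) -> bool {
    let (mut letters, mut upper) = (0usize, 0usize);
    for c in msg.chars().filter(|c| c.is_alphabetic()) {
        letters += 1;
        upper += usize::from(c.is_uppercase());
    }
    letters > 0 && upper * 2 > letters
}

/// Computes pooled metrics over one time window of user messages.
pub fn stats(messages: &[&str]) -> Stats {
    let (mut expl, mut words, mut with_expl, mut caps) = (0usize, 0usize, 0usize, 0usize);
    for msg in messages {
        let n = msg.split_whitespace().filter(|w| is_expletive(w)).count();
        expl += n;
        words += msg.split_whitespace().count();
        with_expl += usize::from(n > 0);
        caps += usize::from(mostly_caps(msg));
    }
    // max(1) avoids dividing by zero on an empty window.
    let m = messages.len().max(1) as f64;
    Stats {
        expletives_per_message: expl as f64 / m,
        share_with_expletives: with_expl as f64 / m,
        expletives_per_word: expl as f64 / words.max(1) as f64,
        share_mostly_caps: caps as f64 / m,
    }
}
```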


Lol. I was swearing at GPT in summer 2025, but GPT has definitely gotten both smarter and less arrogant since then.


GPT is actually so much more thorough now than Opus!


> expletives per word

Huh?


4.4 expletives per word is insane. Their prompts must look like

** ** ** ** implement ** ** ** ** no ** ** ** ** ** mistakes


Haha no, that’s the change: 4.4x MORE expletives per word in the last week.


Jeez, how fast we get used to alien tech.

You could introduce teleportation boots to humanity and within a few weeks we'd be complaining that sometimes we still have to walk the last 20 meters.


And that’s how the teleporting rascal scooter takes over the world.


There are indeed non-expletive words that can contribute to the denominator, though I use them less and less these days.
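The two ratios together do pin down that denominator. Assuming both metrics are pooled totals over each window (E expletives, W words, M messages), which the original comment doesn't state, expletives per word is expletives per message divided by words per message, so:

$$
\frac{(W/M)_{\text{recent}}}{(W/M)_{\text{prior}}}
= \frac{(E/M)_{\text{recent}} \,/\, (E/M)_{\text{prior}}}{(E/W)_{\text{recent}} \,/\, (E/W)_{\text{prior}}}
\approx \frac{2.1}{4.4} \approx 0.48
$$

That is, under this assumption the average message in the last week was roughly half as long as before.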


Thousands of words to say:

- cosmic voids are real but not actually empty, just lower density than average

- pop science articles sometimes mistakenly use pictures of Bok globules to represent voids

This is probably one of the lowest signal-to-noise ratios I have ever seen.


Contrary to popular opinion, the "verbosity void" of the article is not a hopeless expanse without any useful information at all; it just has a much lower density than average.

nyuk nyuk

Anyhow it barely touched on dark matter... Like, are the voids themselves where the dark matter is, or is it spread out, um, orthogonally?


bigthink just made my list of places to avoid wasting time


I found the article quite interesting and informative. Your tl;dr does not accurately describe it.


Along these lines, I’m working on a tool called Spotless[0] that takes a more HTTP proxy-based approach to make statefulness something the agent doesn’t have to worry about. It directly reads & writes to the messages array going to and from Anthropic, so you don’t have to rely on the agent calling an MCP or using a skill. Still buggy and early, but it’s definitely a very interesting way of working with agents.

[0] https://github.com/LabLeaks/spotless


There’s also the excellent Coding Machines (2009): https://www.teamten.com/lawrence/writings/coding-machines/


That and Brooks’ underrated “The Design of Design” are notable for having an almost impossible density of quotable aphorisms on every page. They’re all so relevant today that it’s hard to believe that he’s talking about problems he faced half a century ago.


Never heard of "The Design of Design" but I bought it off this comment chain.

I think our industry would do well to take a moment and breathe, to understand what we have collectively done since its inception. I often wonder whether, thousands of years into the future, we will look back on the highly corporatized influence our industry has had during our time as the dark ages. The idea that private enterprise should shape the direction of our industry is deeply problematic; there needs to be a public option, and I doubt many devs would disagree.

