
arcfour · 2026-04-07T06:17:41 1775542661

> I'm not even sure we're any closer to AGI than we were before LLMs.

I mean this is very obviously untrue. It'd be like saying we aren't any closer to space flight after watching a demonstration of the Wright Flyer. Before 2022-2023 AI could barely write coherent paragraphs; now it can one-shot an entire letter or program or blog post (even if it's full of LLM tropes).

Just because something is overhyped doesn't mean you have to be dismissive of it.

davebren · 2026-04-07T12:34:56 1775565296

Point is that LLMs could be a local minimum we are now economically stuck in until the hype wears off.

jcgrillo · 2026-04-07T14:04:56 1775570696

Or we could be stuck here for decades pending a breakthrough nobody alive today can even conceive of, or we could be compute limited by a half dozen orders of magnitude. Or it could happen next week. That's the nature of breakthroughs--you just can't have any idea when or how (or if) they'll happen.

jcgrillo · 2026-04-07T14:15:13 1775571313

In hindsight there's an obvious evolutionary pathway from the Wright Flyer to Gemini/Apollo/Soyuz... but at the time in 1903 there absolutely was not, and anyone telling you so would be a crank of the highest degree. So it may turn out that LLMs have some place on the evolutionary path to AGI, or it could turn out they're a dead end like Cayley's ornithopters. Show me AGI first, then we can discuss whether LLMs had something to do with it.

arcfour · 2026-04-07T16:26:14 1775579174

In order to get to space, you must first be capable of flight through the atmosphere. That should be apparent to anyone even then because the atmosphere is in between space and the ground.

Regardless of whether spaceflight is still 1000 or 100 or 50 years away, you are still closer than you were before you demonstrated the ability to fly.

arcfour · 2026-04-01T20:44:55 1775076295

Workerd (the platform for Workers) is open source though? You could run your own? And people do run their own, at least according to Cloudflare.

nulltrace · 2026-04-02T00:06:08 1775088368

Open source runtime, not the orchestration layer on top.

arcfour · 2026-04-02T01:15:34 1775092534

Are you saying that by providing a globally available service, they're engaging in vendor lock-in, because otherwise you'd have to build your own globally available and distributed service...? ...yeah?

But if you wanted to run Workerd on EC2 or Google Cloud or whatever, you could...so not really sure how that applies here.

anon7000 · 2026-04-02T03:34:02 1775100842

You cannot say you’re the spiritual successor to WordPress, if your software doesn’t support running plugins out of the box on arbitrary installs. WordPress is very easy to host and scale, you only need a basic server and a CDN. A spiritual successor would follow that and also have all the new shiny stuff.

arcfour · 2026-03-31T13:47:57 1774964877

You're perfectly free to scrape the web yourself and train your own model. You're not free to let Anthropic do that work for you, because they don't want you to, because it cost them a lot of time and money and secret sauce presumably filtering it for quality and other stuff.

Stole? Courts have ruled it's transformative, and it very obviously is.

AI doomerism is exhausting, and I don't even use AI that much, it's just annoying to see people who want to find any reason they can to moan.

petcat · 2026-03-31T13:58:53 1774965533

> Stole? Courts have ruled it's transformative, and it very obviously is.

The courts have ruled that AI outputs are not copyrightable. The courts have also ruled that scraping by itself is not illegal, only maybe against a Terms of Service. Therefore, Anthropic, OpenAI, Google, etc. have no legal claim to any proprietary protections of their model outputs.

So we have two things that are true:

1) Anthropic (certainly) violated numerous TOS by scraping all of the internet, not just public content.

2) Scraping Anthropic's model outputs is no different than what Anthropic already did. Only a TOS violation.

dpark · 2026-03-31T15:41:45 1774971705

> 2) Scraping Anthropic's model outputs is no different than what Anthropic already did. Only a TOS violation.

Regardless of whether LLM training amounts to theft, thieves are still allowed to put locks on their own doors.

gruez · 2026-03-31T15:32:17 1774971137

>The courts have ruled that AI outputs are not copyrightable.

"not copyrightable" doesn't imply they can't frustrate attempts to scrape data.

petcat · 2026-03-31T15:44:53 1774971893

Nobody is saying they can't try to stop you themselves. That's where the Terms of Service violation part comes in. They can cancel your account, block your IP, etc. They just can't legally stop you by, for instance, compelling a judge to order you to stop.

dpark · 2026-03-31T15:56:10 1774972570

> They just can't legally stop you by, for instance, compelling a judge to order you to stop.

They probably can, actually. TOS are legally binding.

More likely they would block you rather than pursuing legal avenues but they certainly could.

petcat · 2026-03-31T16:16:01 1774973761

The Supreme Court already ruled on this. Scraping public data, or data that you are authorized to access, is not a violation of the Computer Fraud and Abuse Act.

Now, if you try to get around attempts to block your access, then yes you could be in legal trouble. But that's not what is happening here. These are people/companies that have Claude accounts in good standing and are authorized by Anthropic to access the data.

Nobody is saying that Anthropic can't just block them though, and they are certainly trying.

dpark · 2026-03-31T16:30:40 1774974640

I didn’t say anything about the computer fraud and abuse act. TOS are legally binding contracts in their own right if implemented correctly.

alpha_squared · 2026-03-31T15:02:24 1774969344

> You're perfectly free to scrape the web yourself and train your own model.

Actually, not anymore as a result of OpenAI and Anthropic's scraping. For example, Reddit came down hard on access to their APIs as a response to ChatGPT's release and the news that LLMs were built atop scraping the open web. Most of the web today is not as open as before as a result of scraping for LLM data. So, no, no one is perfectly free to scrape the web anymore because open access is dying.

two_tasty · 2026-03-31T14:54:54 1774968894

"...free to scrape the web yourself and train your own model."

Yes, rich and poor are equally forbidden from sleeping under bridges.

kspacewalk2 · 2026-03-31T15:00:29 1774969229

Meaning what? The poor get to sleep in the guest room of the rich guy's house because muh inequality?

Anthropic paid a lot of money for a moat and wants to guard it. It is not wrong, in any sense of the word, for them to do so.

salawat · 2026-03-31T15:42:08 1774971728

Rich people aren't going to find themselves needing to sleep under a bridge, so the law really only exists as a constraint on the poor. Duh. The flex that "well a rich guy couldn't do it either" is A) at best a myopic misunderstanding perpetuated by out of touch people and B) hopelessly naive, because any punishment for the rich guy actually sleeping under a bridge is so laughably small it may as well not even exist. Hence, the whole bit of "a legal system to keep these accountable, but not for me".

kspacewalk2 · 2026-03-31T16:30:06 1774974606

Okay, you explained what Anatole France meant, which is probably helpful for those few who didn't get it from the quote itself. Perhaps now you can explain what on earth this has to do with Anthropic not wanting to let other for-profit businesses mooch off its investment of time, brainpower and money?

dpark · 2026-03-31T15:48:20 1774972100

You explained what “rich and poor are equally forbidden from sleeping under bridges” means, but not what this has to do with the statement that one is free to do their own scraping and training, which I’m pretty sure is what kspacewalk was asking.

jtbayly · 2026-03-31T13:55:56 1774965356

Wut? They did exactly the same thing!

Try this: If you want to train a model, you’re free to write your own books and websites to feed into it. You’re not free to let others do that work for you because they don’t want you to, because it cost them a lot of time and money and secret sauce presumably filtering it for quality and other stuff.

arcfour · 2026-03-31T14:10:15 1774966215

[flagged]

buzzerbetrayed · 2026-03-31T14:24:40 1774967080

[flagged]

jollymonATX · 2026-03-31T15:45:03 1774971903

Yeah these folks' skin is often very thin. One poke too hard and it's "whatever" and them scuttling off. Really hope there is a day they introspect.

arcfour · 2026-03-31T17:53:59 1774979639

I introspect all the time. I just disagree with you so I have thin skin? Lol.

I think it's transformative. I also think that it's a net positive for society. I lastly think that using freely available, public information is totally fair game. Piracy not so much, but it's water under the bridge.

I hope you introspect some day, too, and realize it's acceptable for people to have different views than you. That's why I don't care; you aren't going to change my mind and I can't change yours either, so it's moot and I don't care to argue about it further.

jollymonATX · 2026-03-31T19:21:20 1774984880

You had appeared to scuttle off but alas I was wrong (and sorry to imply you are a crab of some sort) however your comment followup on not changing minds might be a tad shell-ish. I'm open minded actually on the issue and these are major issues of our time. I'm personally impacted by this and it does make me wonder "will I write X thing again" and it is a very hard question to answer frankly. When you see your works presented in summary on search and a major decline in traffic you really do think about that. It impacts my ability to make money as I once did prior to 2024 (when it really hit) without doubt. Edit/spelling

airstrike · 2026-03-31T14:13:38 1774966418

Guess who else spent a lot of time and money and secret sauce?

Do you hear the words coming out of your mouth?

nunez · 2026-03-31T14:57:52 1774969072

Lol; like heck we are. Try scraping the NYTimes at LLM scale. You can time how quickly you’ll get 420’ed or, at worst, hit with a C&D.

nunez · 2026-03-31T17:31:30 1774978290

(429'ed, I meant)

unethical_ban · 2026-03-31T14:16:38 1774966598

Let's talk ethics, not law. Why is it okay for these companies to pirate books and scrape the entire web and offer synthesized summaries of all of it, lowering traffic and revenue for countless websites and professions of experts, but it is not okay for others to try to do the same to an AI model?

Is the work of others less valid than the work of a model?

sfn42 · 2026-03-31T14:49:19 1774968559

I don't think anyone's saying it's not okay - I think the point is that Anthropic has every right to create safeguards against it if they want to - just like the people publishing other information are free to do the same.

And everyone is free to consume all the free information.

gruez · 2026-03-31T15:48:07 1774972087

>Why is it okay for these companies to pirate books

Courts have ruled it's not, and I don't think anyone is arguing it's okay.

>but it is not okay for others to try to do the same to an AI model?

The steelman version is that it's okay to do it once you acquired the data somehow, but that doesn't mean anthropic can't set up roadblocks to frustrate you.

p1esk · 2026-03-31T14:38:41 1774967921

I don’t see why it’s not ok to do that to an AI model. Or are you asking why they don’t want you to do it?

andersonpico · 2026-03-31T15:45:39 1774971939

Your selective respect for work is a glaring double standard. The effort to produce the original content they scraped is orders of magnitude bigger than what it took to train the model, so if this wasn't enough to protect the authors from Anthropic it shouldn't be enough to protect Anthropic from people distilling their models.

Your legal argument is all over the place as well. What is more relevant here: what the courts ruled or what you consider obvious? How is distillation less transformative than scraping? How does courts ruling that scraping to train models is legal relate to distillation?

Nobody is scoring you on neutrality points for not using AI much, and calling this doomerism is just a thought-terminating cliche that refuses to engage with the comment you're replying to.

In fact, your comment is not engaging with anything at all, you're vaguely gesturing towards potential arguments without making them. If you find discussing this exhausting then don't, but also don't flood the comments with low effort whining.

hax0ron3 · 2026-03-31T21:01:56 1774990916

It is transformative, but if I make a bunch of requests to their API and use the responses to distill my own model, that is also transformative.

loremium · 2026-03-31T16:14:31 1774973671

reminds me of `don't look up` a bit. there clearly is an imbalance in regards to licenses with model providers, not even talking about knowledge extraction (yes younger people don't learn properly now, older generations forget) shortly before the rug-pull happens in form of accessibility to not rich people

arcfour · 2026-03-31T13:42:51 1774964571

It is exceedingly obvious that the goal here is to catch at least 75-80% of negative sentiment and not to be exhaustive and pedantic and think of every possible way someone could express themselves.

arcfour · 2026-03-31T05:03:36 1774933416

It prompts for transitive dependencies, too. I have never had workerd as a direct dependency of any project of mine but I get prompted to approve its postinstall script whenever I install cloudflare's wrangler package (since workerd needs to download the appropriate Workers runtime for your platform).

arcfour · 2026-03-31T05:00:27 1774933227

The prompt would be to approve the new malicious package (plain-crypto-js)'s scripts, too, which could tip users off that something was fishy. If they were used to approving one for axios and the attackers had just overwritten axios's own instead of making a new package, it would probably catch people out.

arcfour · 2026-03-31T04:57:34 1774933054

PNPM makes you approve postinstall scripts instead of running them by default, which helps a lot. Whenever I see a prompt to run a postinstall script, unless I know the package normally has one & what it does, I go look it up before approving it.

(Of course I could still get bitten if one of the packages I trust has its postinstall script replaced.)

erikerikson · 2026-03-31T14:25:58 1774967158

How does this stance work with your CICD?

jadar · 2026-03-31T14:33:36 1774967616

I suppose you would have to commit your node_modules, or otherwise cache your setup so that all prerequisite modules are built and ready to install without running post-install scripts?
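
One way the "approve before running" stance could carry over to CI is to fail the build whenever an installed package declares an install-time script and is not on a reviewed allowlist. The Python sketch below is only an illustration under stated assumptions: it does not use pnpm's own approval mechanism, the function names and the allowlist are hypothetical, and it only inspects the `preinstall`, `install`, and `postinstall` entries in each `package.json`.

```python
import json
from pathlib import Path

# Install-time lifecycle hooks to look for in each manifest.
LIFECYCLE_HOOKS = ("preinstall", "install", "postinstall")


def packages_with_install_scripts(node_modules: Path) -> dict[str, list[str]]:
    """Map package name -> install-time hooks declared in its package.json."""
    found: dict[str, list[str]] = {}
    for manifest in node_modules.rglob("package.json"):
        try:
            data = json.loads(manifest.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # skip unreadable or malformed manifests
        if not isinstance(data, dict):
            continue
        scripts = data.get("scripts")
        if not isinstance(scripts, dict):
            continue
        hooks = [hook for hook in LIFECYCLE_HOOKS if hook in scripts]
        name = data.get("name")
        if isinstance(name, str) and hooks:
            found[name] = hooks
    return found


def unapproved(node_modules: Path, allowlist: set[str]) -> dict[str, list[str]]:
    """Return packages that declare install hooks but are not on the allowlist."""
    declared = packages_with_install_scripts(node_modules)
    return {name: hooks for name, hooks in declared.items() if name not in allowlist}
```

A CI step could call `unapproved(Path("node_modules"), {"workerd"})` after an install that skipped scripts and exit non-zero if the result is non-empty. As noted upthread, this would not catch an already-trusted package whose script has been replaced.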

arcfour · 2026-03-30T01:31:10 1774834270

They do, but the fact that they have to do this means there are fewer bots because it's less economical to go to such lengths, compared to something much less complex (which is orders of magnitude cheaper).

arcfour · 2026-03-30T01:28:19 1774834099

> They exist only if the request passed through Cloudflare's network. A bot making direct requests to the origin server or running behind a non-Cloudflare proxy will produce missing or inconsistent values.

...I don't think that's possible even if you are a bot? I would be very surprised if OAI had their origin exposed to the internet. What is a "non-Cloudflare proxy"? Is this AI slop?

It's likely just looking at the CF properties as part of a bot scoring metric (e.g. many users from this ASN or that geoip to this specific city exhibit abusive patterns).
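
To make that guess concrete, here is a rough Python sketch of what scoring by network and location could look like. Everything in it is an assumption for illustration: the `RequestMeta` fields, the function names, and the idea of using a plain historical abuse rate per (ASN, city) pair are hypothetical, not anything Cloudflare or OpenAI has documented.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestMeta:
    """Hypothetical subset of per-request network metadata."""
    asn: int
    city: str


def abuse_rates(history: list[tuple[RequestMeta, bool]]) -> dict[tuple[int, str], float]:
    """Fraction of past requests flagged abusive, keyed by (ASN, city)."""
    totals: Counter = Counter()
    abusive: Counter = Counter()
    for meta, was_abusive in history:
        key = (meta.asn, meta.city)
        totals[key] += 1
        if was_abusive:
            abusive[key] += 1
    return {key: abusive[key] / totals[key] for key in totals}


def score(meta: RequestMeta, rates: dict[tuple[int, str], float], default: float = 0.0) -> float:
    """Look up the historical abuse rate for this request's (ASN, city) pair."""
    return rates.get((meta.asn, meta.city), default)
```

In a real system this would presumably be one signal among many rather than a verdict on its own.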

arcfour · 2026-03-26T04:05:42 1774497942

I'm glad I wasn't alone in finding it ridiculous/annoying. The version in the post isn't even a joke anymore...