> I'd argue back that LLMs likely have a better understanding of a11y convention...

aspenmartin · 2026-05-29T12:21:38 1780057298

> No, other people did. They wrote about it, and LLM can sometimes use that. Once they no longer write about it, what then?

It can read the code? Historical discussions around it? Commit histories?

> But even then, people aren't entitled to the knowledge "created" by doing the work. If attribution and compensation were tackled in earnest, if you could only train on the materials of the people you pay to produce those materials, it might be much quicker and cheaper to just learn CSS.

OSS code and people’s public writings are available to anyone all the time. Common Crawl, the open source web crawl dump, has been around for over a decade. No one had any problem with these systems being developed on them, until they finally started to become useful, so what’s the sort of legal or ethical framework you’re pointing to?

customguy · 2026-05-29T12:37:21 1780058241

> It can read the code? Historical discussions around it? Commit histories?

Assume everybody is now using LLM because they're better, and because the people who created artisanal things in their free time out of sheer generosity no longer have free time, or any food at all, or simply no longer feel generous. And the few people who are such specialists that they would be slowed down by them only do proprietary work, for lots of money.

What then? LLM learning from LLM doesn't really work, does it?

This is not intended as some kind of gotcha, to me this is a huge elephant on the couch.

> No one had any problem with these systems being developed on them, until they finally started to become useful, so what’s the sort of legal or ethical framework you’re pointing to?

That it's perfectly fine for people to say "I was fine with that, but I'm not fine with this". They can give you detailed explanations for their individual decisions, every single one of them, but there is no point in discussing them in aggregate because that aggregate is an abstraction. And explanations are optional, too: people don't have to give one, and they are free to change their minds for no reason or for bad reasons.

stellar678 · 2026-05-29T21:48:47 1780091327

Presuming that conventions, standards, best-practices will continue to evolve as people do this work - whether with or without AI - I can't see any reason why the resulting advancements wouldn't be published in a way that builds a record for the future.

My documentation tends to be more thorough and well-maintained when I'm building software with a coding agent. A million tokens of context tends to be better at that kind of thing than my own brain when I'm neck-deep in solving a difficult problem.

And presuming the best practices are adequate and don't need to evolve (which seems unlikely) - then the current public historical record is also adequate.

aspenmartin · 2026-05-29T13:00:30 1780059630

> Assume everybody is now using LLM because they're better, and because the people who created artisanal things in their free time out of sheer generosity no longer have free time, or any food at all, or simply no longer feel generous. And the few people who are such specialists that they would be slowed down by them only do proprietary work, for lots of money.

> What then? LLM learning from LLM doesn't really work, does it?

Oh, no, that’s exactly how it works, even today. RL with verification is done with synthetic data and rejection sampling. If something that needs to get done can’t be done purely with an agent, it’s done with human help; that will always be the case, it will just get rarer.
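For anyone unfamiliar with the term, here is a minimal sketch of rejection sampling with a verifier, assuming a toy task (adding two integers) where answers can be checked exactly. `toy_generate` is a hypothetical stand-in for a model that is sometimes wrong, and all names are illustrative; real pipelines use much richer verifiers such as unit tests.

```python
import random
from typing import Callable


def rejection_sample(
    prompts: list[str],
    generate: Callable[[str], str],
    verify: Callable[[str, str], bool],
    samples_per_prompt: int = 4,
) -> list[tuple[str, str]]:
    """Sample several completions per prompt and keep only the verified ones."""
    kept: list[tuple[str, str]] = []
    for prompt in prompts:
        for _ in range(samples_per_prompt):
            completion = generate(prompt)
            if verify(prompt, completion):  # reject anything the checker cannot confirm
                kept.append((prompt, completion))
    return kept


def toy_generate(prompt: str, rng: random.Random) -> str:
    """Hypothetical noisy 'model': answers "a+b" but is sometimes off by one."""
    a, b = map(int, prompt.split("+"))
    return str(a + b + rng.choice([-1, 0, 0, 1]))


def toy_verify(prompt: str, completion: str) -> bool:
    """Exact checker: the completion must equal the true sum."""
    a, b = map(int, prompt.split("+"))
    return int(completion) == a + b


if __name__ == "__main__":
    rng = random.Random(0)
    data = rejection_sample(["2+3", "10+7"], lambda p: toy_generate(p, rng), toy_verify)
    print(f"kept {len(data)} of 8 samples:", data)
```

The surviving pairs are what would go into a synthetic training set; everything the verifier rejects is discarded.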

> That it's perfectly fine for people to say "I was fine with that, but I'm not fine with this".

Agree with you there, but there’s a theme or insinuation (not saying you’re saying this) that these companies “stole work” (which definitely a lot of copyright violations sure), but it’s just unclear to me what principles or legal frameworks these companies or institutions should have used to develop the technology. I’m not even sure I mean to imply it’s not unethical; I’m more looking for the steelman of that argument. But of course people are entitled to their value systems and judgements, and to point out real harm.

customguy · 2026-05-29T13:08:55 1780060135

> there’s a theme or insinuation (not saying you’re saying this) that these companies “stole work” (which definitely a lot of copyright violations sure), but it’s just unclear to me what principles or legal frameworks these companies or institutions should have used to develop the technology.

Oh, I'm absolutely one of the people saying that a lot of companies stole a lot of work, and that it would be better to dissolve them and make all their assets public domain, than to stand for it.

The legal and moral framework is to ask for permission and accept "no". The same framework they use against you in an instant, with an army of lawyers, when you do to them what they did to everybody.

None of this, technically, requires slurping up everything and ignoring consent; that just made it quicker and cheaper, which is why they did it. Meanwhile, I'm sure other labs made progress in the same direction at a much slower pace, in a defensible manner, and they should get to keep the fruits of that.

8note · 2026-05-29T16:08:34 1780070914

still though, what's causing the old code to disappear? the old discussions to disappear?

they've already been added to all the big labs' datasets, it's not like it's going anywhere.

but even more so, accessibility tools exist because people need them, and people will make it known when the accessibility is broken.

the screen reader is still gonna expose an api or have observable outputs.

There are very real forcing functions that will keep producing useful content about what people want and need from accessibility tools, and how to interact with existing tools.

you're still building for people, and as time goes on the harness operator will probably be the actual user and will push the LLM to adjust the code to be great for them

visarga · 2026-05-30T08:18:18 1780129098

> What then? LLM learning from LLM doesn't really work, does it?

It does work. It's called RLVR, reinforcement learning with verifiable rewards, and for code it is based on testing by execution. It's become a major area of improvement in the last year. But you are also forgetting the amount of steering and problem solving going into coding agents today, and the huge logs they create, which can feed back into training. We automated Stack Overflow: LLMs learn from usage and self-play.
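A minimal sketch of what a verifiable reward for code can look like, assuming a toy setup where each candidate program defines a function named `solve` (a hypothetical convention) and is scored against a handful of test cases. The group-normalized advantage at the end is one common choice (GRPO-style), not necessarily what any particular lab uses, and real systems run candidates in a sandbox rather than with a bare `exec`.

```python
import statistics


def execution_reward(candidate_src: str, tests: list[tuple[tuple, object]], fn_name: str = "solve") -> float:
    """Binary verifiable reward: 1.0 if the candidate passes every test, else 0.0."""
    namespace: dict = {}
    try:
        exec(candidate_src, namespace)  # WARNING: untrusted code; real pipelines sandbox this
        fn = namespace[fn_name]  # KeyError if the candidate never defined the function
        for args, expected in tests:
            if fn(*args) != expected:
                return 0.0
    except Exception:  # syntax errors, crashes and missing functions all earn zero reward
        return 0.0
    return 1.0


def group_advantages(rewards: list[float]) -> list[float]:
    """Score each sample relative to its group: (reward - mean) / std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:  # all samples equally good or bad: no learning signal
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]


if __name__ == "__main__":
    tests = [((1, 2), 3), ((0, 0), 0), ((-1, 5), 4)]
    candidates = [
        "def solve(a, b):\n    return a + b\n",  # correct
        "def solve(a, b):\n    return a - b\n",  # wrong answer
        "def solve(a, b):\n    return a + c\n",  # crashes with NameError
    ]
    rewards = [execution_reward(src, tests) for src in candidates]
    print(rewards, group_advantages(rewards))
```

The point of the design is that no human labels anything: the tests decide which of the model's own samples get reinforced and which get pushed down.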

oblio · 2026-05-29T12:37:13 1780058233

> It can read the code? Historical discussions around it? Commit histories?

And if everyone bunkers up and all that open content dries up starting in 2026, let's say, what happens?

kristianc · 2026-05-29T12:51:24 1780059084

It won't happen, for two reasons. One is that a great deal of open-source software and hobbyist knowledge sharing has never been driven by financial reward anyway, and people will continue to do it anyway. Finer-grained opt-out controls would be great (the equivalent of a search engine 'nofollow', which will hopefully come with time).

Many kinds of technology faced this kind of tragedy-of-the-commons argument in the past, and it has never borne out. Printing presses copied manuscripts, search engines copied and indexed web pages, open-source software was incorporated into commercial products, and Wikipedia repackaged knowledge produced elsewhere.

In almost all cases the total amount of creation increases because the technology lowered costs, expanded audiences, or created new forms of value. New 'View Source' material gets created faster than people pull back.

bayarearefugee · 2026-05-29T16:50:02 1780073402

> great deal of open-source software and hobbyist knowledge sharing has never been driven by financial reward anyway and people will continue to do it anyway.

A lot of open-source software was supported by developers having stable well-paying jobs that didn't burn them out and afforded them enough free time to work on passion projects on the side, so that even if their company wasn't directly supporting their OSS development, there was still an indirect link.

Not only is this likely to increasingly change in the future as people need to spend more time navigating the disruption AI will have on labor, it already visibly has been changing over the past year.

One of the top posts on HN today is someone leaving open source and tech completely to work at Home Depot. While this is an extreme case, it isn't unique; I've been seeing similar things in many places since 2025.

watwut · 2026-05-29T13:33:39 1780061619

It will happen, and it has already started. It started even before LLMs, when Google began hiding smaller personal blogs in its search results. Expectation of monetary reward has nothing to do with it; discoverability does. A culture of creating content does not exist when people can't see what others created and know no one will see what they create. A lot of smaller open source was a monkey-see-monkey-do thing: we saw other open source projects and wanted something like that. Likewise with tutorials: we saw other people write cool tutorials and felt like creating our own and showing them off.

That is not the dynamic with LLMs. You see LLM output, but the original creator is hidden. And if you write your own, no one will find it. Worse, other people will react with "an LLM could have written it", so people won't bother.

> search engines copied and indexed web pages

Notably, search engines sent people toward web pages. And when search engines stopped doing that and started to copy content, those original pages started to die out.

> Printing presses copied manuscripts

Printing press made dissemination easier. It is an equivalent of early internet, not of LLM.

> open-source software was incorporated into commercial products

A commercial product using an open source library has a different user than the library it uses. And crucially, it does not hide that library from the library's users.

> Wikipedia repackaged knowledge produced elsewhere

Yes, and we collectively create fewer encyclopedias. They are no longer worth writing and checking for correctness, so we don't do that much anymore.

visarga · 2026-05-30T08:24:16 1780129456

The centralized choke point of web search is getting relaxed now. Unlike search engines and social networks, you can download an LLM and run it. A small one, but capable of using a library of search stubs to directly fetch information from hubs, feeds and other search engines. You can own the agent that handles the web search part for you.

Imagine you have a 4B model and keep an equal-sized corpus of search stubs, small MD documents linking to hubs, feeds and search engines for millions of topics. You can use the LLM to read the stub and perform the search for you, all orchestrated locally, with greater privacy and independence. You can disintermediate the search chokepoint now. You can set the criteria for what to include, exclude, how to rank and present the results.

This works because good entry points for any topic change slowly over time. The construction of search stubs is trivial with existing AI agents, and they can be shared as open source. A few GB for the model, a few for the search routing layer, and you've got a sovereign local agent.

If this holds, access control shifts from whatever Google thinks maximizes profit to whatever the community thinks has value.
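To make the stub idea concrete, here is a minimal sketch under loose assumptions: the stub format (a `# Topic` heading, a `keywords:` line, and `- ` link lines where `{q}` marks the query) is hypothetical, the example.org URLs are placeholders, and plain keyword overlap stands in for the local model that would read the stubs and decide where to search. It only builds the URLs; fetching and ranking are left out.

```python
import re
from dataclasses import dataclass, field
from urllib.parse import quote_plus


@dataclass
class SearchStub:
    topic: str
    keywords: set[str]
    links: list[str] = field(default_factory=list)  # URL templates; "{q}" is filled with the query


def parse_stub(markdown: str) -> SearchStub:
    """Parse a stub written in the hypothetical format described above."""
    topic, keywords, links = "", set(), []
    for raw in markdown.splitlines():
        line = raw.strip()
        if line.startswith("# "):
            topic = line[2:].strip()
        elif line.lower().startswith("keywords:"):
            keywords = {k.strip().lower() for k in line.split(":", 1)[1].split(",") if k.strip()}
        elif line.startswith("- "):
            links.append(line[2:].strip())
    return SearchStub(topic, keywords, links)


def route(query: str, stubs: list[SearchStub]) -> list[str]:
    """Pick the stub sharing the most keywords with the query and expand its links."""
    words = set(re.findall(r"[a-z0-9]+", query.lower()))
    best = max(stubs, key=lambda s: len(words & s.keywords), default=None)
    if best is None or not (words & best.keywords):
        return []  # no stub matches; a real agent might fall back to a general search engine
    return [link.replace("{q}", quote_plus(query)) for link in best.links]


STUBS = [
    parse_stub("# Web accessibility\nkeywords: a11y, aria, wcag, screen\n"
               "- https://example.org/a11y/\n- https://example.org/search?q={q}"),
    parse_stub("# Rust\nkeywords: rust, cargo, borrow\n- https://example.net/rust?q={q}"),
]

if __name__ == "__main__":
    for url in route("ARIA labels for screen readers", STUBS):
        print(url)
```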

watwut · 2026-05-30T09:33:18 1780133598

None of that will create a community of people sharing. You won't find what another guy wrote, and he knows that no one will see it if he writes it.

But most crucially, what you described is not an actual thing people do.

citrin_ru · 2026-05-29T23:01:34 1780095694

I personally was fine with contributing to open source without any financial reward. But I'm reluctant to release anything publicly now, because it will eventually be incorporated into the training set for a technology that will (or at least can) leave me without a job and without a chance of finding one.

justinclift · 2026-05-30T15:15:11 1780154111

> [a] great deal of open-source software and hobbyist knowledge sharing has never been driven by financial reward

It seems like a lot of the underlying sentiment which was driving that was to contribute to the world and make things better for our fellow people.

Working on OSS sure doesn't feel as rewarding in the last few years, so I've personally stepped back from 99% of what I do. So have better-known people who have worked on OSS for a long time.

customguy · 2026-05-29T13:02:47 1780059767

> In almost all cases the total amount of creation increases because the technology lowered costs

But this doesn't lower the cost of learning and writing CSS; it just scoops up some of it and offers that cheaply, and even that only because it's offered below cost. If anything, I'd say it increases the cost, because now you don't get paid to become and stay good at what an LLM is supposedly good enough at, and you have less free time to do it anyway. You may not even have a computer, because your current one broke and you can't afford a new one.

aspenmartin · 2026-05-29T13:03:45 1780059825

Well, that historical content and code still exists, right? Are you just saying “what if we’re now in a world of walled gardens, where OSS dies because people don’t want their work stolen”? In that case, these companies will still get data, and they don’t need OSS anymore. It’s already web-crawled, licensed, or commissioned; they pay people to generate novel traces when they need them, or at the very least sets of prompts and tests for verification. Then the synthetic data that passes verification gets added to the training set.

oblio · 2026-05-29T13:12:05 1780060325

This is super hilarious :-)))

Do you think it's cheap to create the orders of magnitude of content that the internet produced organically and that LLM creators are stealing? If they actually have to pay for content creation while competing with content creators on the, you know, content creation front via LLM generation, the entire business model of LLMs collapses.

You can't have the mountains of data needed for LLMs in the decades to come, if your LLMs put the writers and artists out of work.

aspenmartin · 2026-05-29T14:49:22 1780066162

It’s literally how these models are trained today. They of course use open source data but that’s no longer the most important source, it’s high quality prompts and verifiable tests and a lot of inference compute. They also have massive flywheels from users from which they can mine good data or at the very least again good prompts which can be just as important.

oblio · 2026-05-30T03:02:12 1780110132

And everything we know about these companies points to unsustainability, before we even get to very high-impact content lawsuits, which haven't been settled yet. Not to mention lots of datasets being pulled out of public view and moved to anti-LLM licenses (with explicit licensing for training).

We will see how this shakes out in the coming years, as Anthropic, OpenAI & co file for IPO or run out of private funding. Grok is already on the ropes as seen from the SpaceX IPO.

aspenmartin · 2026-05-30T12:01:54 1780142514

You think this train is going to stop because of a lawsuit? And again, even if all data were officially off limits for these companies, it wouldn't matter. They have code traces from their users, which are arguably much better; they can license code (you'd be surprised to learn that these companies are not just stealing everyone's data, they are paying for it); and they can create data by paying people to do it.

And yes, we will see how it shakes out, Anthropic or OpenAI may collapse just as netscape did, but I hope your implication is not "AI in general will be extinguished like the blockchain" or something

oblio · 2026-05-31T15:09:19 1780240159

I've read probably hundreds of historical books at this point and the only thing most historians agree on is:

Nothing was set in stone. The way many historical things happened the way they did was due to accident, sheer chance.

> And yes, we will see how it shakes out, Anthropic or OpenAI may collapse just as netscape did, but I hope your implication is not "AI in general will be extinguished like the blockchain" or something

I think the current LLM economy will collapse, leaving behind a few survivors. There will be widespread adoption of cheap OSS LLMs and of more limited, economically viable functionality provided by people with deep pockets like Google. As LLM economics start making more sense, LLMs will be everywhere, once the hardware becomes cheaper and more available.

Regarding lawsuits, do you think Disney & co will take this lying down? The freaking DMCA - an American law - is enforced <<internationally>>. It will take a long time but LLMs will be domesticated.

aspenmartin · 2026-06-01T19:39:11 1780342751

> Nothing was set in stone. The way many historical things happened the way they did was due to accident, sheer chance.

I agree with you on a technical level, and even in a non-cynical "humanity really can rally and change things that seem insurmountable" sense, but you have read way more history than I have. All I know is that there's such a frantic geopolitical aspect to this, and such a staggering amount of funding and addressable market, that I see zero path to winding anything down. Unlike blockchain, this is powered by both business _and_ government (no government would give up control of the money supply; it's shocking to me that people believe otherwise).

> I think the current LLM economy will collapse, leaving behind a few survivors. There will be widespread adoption of cheap OSS LLMs and of more limited, economically viable functionality provided by people with deep pockets like Google. As LLM economics start making more sense, LLMs will be everywhere, once the hardware becomes cheaper and more available.

Cheap OSS LLMs are used everywhere, all the time. They are great, and with subsidies from, say, China, they could even be competitive with frontier models, but that model of the world requires this mysterious OSS development running at a big, big loss. It takes almost a billion dollars to train a frontier model. For many, many cases, you do not need frontier model performance. When I do, say, video captioning, I use small OSS VL models.

Is your theory predicated on OSS models filling some sort of performance gap between them and the frontier? Or on accepting lower performance in exchange for less spend? What about LLM economics doesn't make sense to you? Like, LLMs are everywhere. If you think "oh, people will just settle for slightly less performance for cheaper", that should have already played out; we've had the same dynamics at every scale: frontier performance is expensive, but then that same performance costs roughly 10x less a year later. Yet you don't see people stopping at, like, GPT-4 instead of adopting today's frontier models.

I think you're right about the value of OSS LLMs, but I don't see what would change the calculus to make frontier models somehow less important. It's like in the '90s when we were like "1 GIGABYTE OF RAM? how will that ever be necessary!?!?" and sure, you don't need 1 GB of RAM for everything! We have embedded systems. But it's not like there isn't a booming market for >> 1 GB memory modules.

> Regarding lawsuits, do you think Disney & co will take this lying down? The freaking DMCA - an American law - is enforced <<internationally>>. It will take a long time but LLMs will be domesticated.

Not saying lawsuits will be fruitless, I'm sure they will chip something off of the industry, but by now it just won't matter. We're talking about trillions in spend, multiple countries, a government that sees this as a non-negotiable game to win from a geopolitical and military standpoint, and our government knows that they can't execute this themselves despite what I imagine is their distaste for Silicon Valley tech CEOs and their grandstanding. Maybe a lawsuit kneecaps someone, which would be huge, but that doesn't matter for AI generally. Maybe a lawsuit restricts data use, that's fine, these companies have deep pockets for licensing and commissioning datasets; they have opt-in-by-default user flywheels.

customguy · 2026-05-29T13:30:37 1780061437

That sounds like it would reduce the blazing progress of the last decades to a snail's pace, some twilight where software is just average, as it always was and always will be. The idea that people will keep doing the very thing whose opposite is now incentivized doesn't convince me, basically. If just using the LLM gets you ahead in a time of severe pressure, then most people will do that, and by the time anyone realizes they kinda need a FEW people to actually be able to reason about something from start to finish, it might be too late.

We're not such a smart species. It's not like we managed so far. We're just adding unsolved problems and distracting ourselves with even bigger ones. The world could have been fed and clothed by the mid-20th century and we could have solved climate change by the 1980s (talking out of my ass here but with confidence in my general point with that), but instead we now throw everything into the furnace in the hopes it will create a deus ex machina, like in that very bad Isaac Asimov story. I think we are absolutely capable of lobotomizing ourselves (as a species) like a toddler playing with an electrical socket shocking itself. I don't say this to be snarky; I honestly think we're that unserious and ignorant about what we do and the environment we do it in.

But I also really should look into what you answered about LLM learning from themselves, I heard it mentioned before but I still have no real clue. I will try to rectify that. I mean, I really, really want to be wrong on this, only a monster wouldn't.

watwut · 2026-05-29T14:06:29 1780063589

> by the time anyone realizes they kinda need a FEW people to actually be able to reason about something from start to finish, it might be to late.

I don't think it will be "too late" by any reasonable definition. All those things are learnable, and companies that really need to overcome it will. But they won't be open with their knowledge. Learning/training will be expensive, and once people acquire it, they won't share it the way open source and programming tech blogs did.

jonathanlydall · 2026-05-29T15:05:05 1780067105

The successful standards, platforms, libraries, tools, etc. will be the ones that LLMs can understand. Like a good GitHub readme, or website, or Discord community, I strongly feel that making sure you've (perhaps personally) written enough about your offering for AI to understand it will be an important factor in how successful it can be in markets or communities.

I wrote a similar HN comment around this yesterday, but the short version is that we found for our product that the years of investment in our Docs (which were seemingly never good enough) are now paying enormous dividends in that LLMs seem to understand our product really well. This has manifested in the LLM in our product being highly effective and a few additional clients who found us through AI chats. Turns out the problem with our Docs wasn't so much with their content, but rather that people just weren't looking at them much.

nailer · 2026-05-29T12:57:07 1780059427

> They wrote about it, and LLM can sometimes use that. Once they no longer write about it, what then?

That’s a good question but I suspect when new technologies come out, the normally indecipherable specs released by industry groups (which is why we needed blogs) will be deciphered by LLMs. Not saying this is good or bad (it’s likely both) just saying it.

sasmithjr · 2026-05-29T12:38:50 1780058330

> Once they no longer write about it, what then?

The AI will no longer be able to reproduce new a11y conventions/guidelines, but if no one is writing about it, do any new a11y conventions/guidelines even exist at that point?