jakelazaroff's comments

Sometimes! The server can also send a retry-after header to indicate when the client is allowed to request the resource again: https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/...

… which isn't part of the body of a 429…
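
Since Retry-After is a response header, a client reads it from the headers and not from the 429 body. The header allows two forms: a delay in seconds, or an HTTP date. Here is a minimal sketch that handles both. The function name `retryAfterMs` and the 1000 ms fallback in the usage comment are made up for illustration.

```javascript
// Minimal sketch: convert a Retry-After header value into a wait in milliseconds.
// The header carries either a number of seconds ("120") or an HTTP-date.
function retryAfterMs(value, now = Date.now()) {
  if (value == null) return null; // header absent
  const trimmed = String(value).trim();
  if (/^\d+$/.test(trimmed)) {
    return Number(trimmed) * 1000; // delay-seconds form
  }
  const date = Date.parse(trimmed); // HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
  if (Number.isNaN(date)) return null; // unparseable: caller falls back to its own backoff
  return Math.max(0, date - now); // a date in the past means "retry now"
}

// Example use with fetch (hypothetical URL):
// const res = await fetch("https://example.com/api");
// if (res.status === 429) {
//   const wait = retryAfterMs(res.headers.get("Retry-After")) ?? 1000;
//   await new Promise((resolve) => setTimeout(resolve, wait));
// }
```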

That's no different from any art. It's like saying that woodworkers' ethos is having low attachment to screws, or guitarists' ethos is having low attachment to picks. Code is a tool; the creative, human endeavor is making an artifact that people can perceive and interact with.

How does the slash make it clearer? It's totally inert, so if you try to do the same thing with a non-void tag the results will not be what you expect!

It indicates that the content that follows is not inside of the tag without the reader needing to remember how HTML works. Tags should have either a self-closing slash, or a closing tag.

The third way of a bare tag is where the confusion comes from.


It doesn't indicate that, though. If you write <div />, for example, the content that follows is inside of the tag. So the reader still needs to remember how HTML works, because the slash does nothing.

Unlike <img /> or <br />, <div /> is necessarily either a mistake or intentionally misleading. The unfamiliar reader should not stumble upon <div /> very often: <div /> is a bug. It's a bit like misleading indentation in C-like programming languages. Yes, it can happen, and it is a source of bugs, but if the page is well written, the regularity of closing everything, even where the closing is merely decorative as far as the spec is concerned, can help an unfamiliar reader who doesn't have all the parsing rules in mind.

Now, we can discuss whether we should optimize for the unfamiliar reader, and whether the illusion of meaning that the trailing slash creates in HTML5 can be harmful.

I would note that exactly like trailing slashes, indentation doesn't mean anything for the parser in C-like languages and can be written misleadingly, yet we do systematically use it, even when no unfamiliar reader is expected.

At this point, writing a slash or not and closing all the tags is a coding style discussion.

Now, maybe someone writing almost-XHTML (closing all tags, adding trailing slashes, quoting all attributes) should go all the way: write actual XHTML, serve it with the actual XHTML content type, and benefit from a strict parser that catches errors which could backfire and which nobody would notice with the HTML5 parser.


That surprised me, but sure enough this shows you're right:

    <div style="color: black">a <div style="color: red">b <div style="color: green" />c </div>d </div>e

produces black a, red b, green c, red d, black e


The inert self closing syntax is misleading, though, because if you use it for a non-void element then whatever follows will be contained within the tag.

e.g. how do you think a browser will interpret this markup?

    <div />
    <img />
A lot of people think it ends up like this (especially because JSX works this way):

    <div></div>
    <img>
but it's actually equal to this:

    <div>
        <img>
    </div>
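
To make the difference concrete, here is a deliberately simplified JavaScript model of the behavior described above. It is not the WHATWG tree-construction algorithm: there are no implied end tags and no foreign content, attributes are discarded, and text is trimmed. `toyParse` and `serialize` are hypothetical helpers. The only rule it models is the one under discussion. In HTML mode the trailing slash is ignored and only void elements stay empty. In XML mode, `/>` closes any element.

```javascript
// Deliberately simplified model of how "/>" is treated. NOT the real WHATWG
// tree-construction algorithm: no implied end tags, no foreign content (SVG/MathML),
// attributes are discarded and text is trimmed.
const VOID = new Set(["area", "base", "br", "col", "embed", "hr", "img",
  "input", "link", "meta", "source", "track", "wbr"]);

function toyParse(markup, mode = "html") {
  const root = { tag: "#root", children: [] };
  const stack = [root]; // currently open elements
  const tagRe = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*?(\/?)>/g;
  let last = 0;
  const addText = (text) => {
    if (text.trim()) stack.at(-1).children.push(text.trim());
  };
  for (const m of markup.matchAll(tagRe)) {
    addText(markup.slice(last, m.index));
    last = m.index + m[0].length;
    const [, closing, rawName, slash] = m;
    const name = rawName.toLowerCase();
    if (closing) {
      // End tag: pop back to the nearest open element with the same name.
      const i = stack.findLastIndex((node) => node.tag === name);
      if (i > 0) stack.length = i;
      continue;
    }
    const node = { tag: name, children: [] };
    stack.at(-1).children.push(node);
    // HTML: the trailing slash is ignored; only void elements stay empty.
    // XML/XHTML: "/>" closes any element.
    const selfClosed = mode === "xml" ? slash === "/" : VOID.has(name);
    if (!selfClosed) stack.push(node);
  }
  addText(markup.slice(last));
  return root;
}

// Serialize back to markup so the resulting tree is easy to see.
function serialize(node) {
  if (typeof node === "string") return node;
  const inner = node.children.map(serialize).join("");
  if (node.tag === "#root") return inner;
  return VOID.has(node.tag) ? `<${node.tag}>` : `<${node.tag}>${inner}</${node.tag}>`;
}

console.log(serialize(toyParse("<div />\n<img />")));        // "<div><img></div>"
console.log(serialize(toyParse("<div />\n<img />", "xml"))); // "<div></div><img>"
```

Feeding it the nested colored-div example from earlier in the thread yields `<div>a<div>b<div>c</div>d</div>e</div>`. So "c" lands inside the green div, "d" inside the red one, and "e" inside the black one.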

Ah, that's true. I think the WHATWG discouraged the syntax, so this might be why.

This is really easy to detect though, unlike arbitrary rules on what belongs on the inside of an unclosed tag.
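
As a rough illustration of how easy this is to detect, a naive lint pass can flag `/>` on any element that isn't void. This sketch uses a regex and not a real parser, and `findSelfClosedNonVoid` is a hypothetical name. Because of the regex, it will also flag legitimate self-closing tags inside inline SVG or MathML, where `/>` does mean something.

```javascript
// Naive lint check (regex, not a parser): report "/>" on elements that are
// not void, since an HTML parser treats those as plain start tags.
const VOID_ELEMENTS = new Set(["area", "base", "br", "col", "embed", "hr", "img",
  "input", "link", "meta", "source", "track", "wbr"]);

function findSelfClosedNonVoid(markup) {
  const problems = [];
  // Tag name (custom elements may contain "-"), then anything up to "/>".
  const re = /<([a-zA-Z][a-zA-Z0-9-]*)\b[^>]*\/>/g;
  for (const m of markup.matchAll(re)) {
    const tag = m[1].toLowerCase();
    if (!VOID_ELEMENTS.has(tag)) {
      const line = markup.slice(0, m.index).split("\n").length; // 1-based line number
      problems.push({ tag, line });
    }
  }
  return problems;
}

console.log(findSelfClosedNonVoid("<div />\n<img />\n<my-widget />"));
// → div on line 1 and my-widget on line 3 are reported; img is void, so it is fine
```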

You cannot nest <li>. An <li> may only be a child of <ol>, <ul> or <menu>.

https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/...


"At the edge" means "on a server located close to where you are". It's used to serve the HTML.

Looks like the only JavaScript running on the client is for installing the service worker and some Cloudflare tracking junk.


Workers doesn’t require JS to serve static content though. You upload it as a static asset and it does it for you.

Right, but any old CDN can do that. Why does it need CF workers?

> E.g. density doesn't decrease housing prices: https://www.strongtowns.org/journal/2023/4/26/upzoning-might...

The article you cited doesn't support that assertion. Its thesis is that upzoning alone — i.e. relaxing regulations such that it is legal to build higher-density housing, without further interventions — may not be sufficient to create enough vacancies to lower rents.


It says exactly what I'm saying. Density increases do NOT result in lower housing prices.

And you need a state-driven corrupt system of subsidies for socialized housing to make it "affordable". For the right kinds of people.


Did you mistakenly link the wrong article? This one definitely does not say that.

Could you quote a passage that supports your interpretation?


You are right, that particular article alone doesn't spell it out completely. But other articles from this author do: https://archive.strongtowns.org/journal/2022/6/15/a-parallel...

The cited article alone simply admits that upzoning won't result in cheaper housing, because the market is broken (and only socialized housing can fix it), but that we must do upzoning anyway.


That article also doesn't support your assertion. For example, they specifically call out parking minimums and minimum lot sizes (both density-lowering regulations) as major drivers of high housing costs.


Note that pure HTML and CSS implementations of tabs using <details> and <summary> fail to meet several important accessibility criteria [1].

While you can make something that visually appears to act as a set of tabs, building it accessibly unfortunately still requires JavaScript.

[1] https://adrianroselli.com/2019/04/details-summary-are-not-in...


This is false: the details element recently gained support for grouping via the [name] attribute. This effectively enforces tab-like semantics where only one of the grouped details elements can be open at a time.

This is quite a recent addition, and the modern web is evolving so fast that it's easy to miss :)

Yay for progress and for JavaScript free solutions!


No, it's still true. I'm aware of that hack, but unfortunately it doesn't solve the problems with pure HTML and CSS tabs.

Crucially, the `name` attribute does not semantically turn a group of <details> elements into a set of tabs. All it does is introduce the (visual) behavior where opening one <details> element closes the others.

I posted a list of accessibility issues with the <details> hack in another comment: https://news.ycombinator.com/item?id=46415271


Neither of the pure-CSS effects I mentioned uses <details>/<summary>.


Same caveat applies to the "checkbox hack" or any other pure CSS solution. You cannot create accessible versions of most complex controls like tabs without JavaScript.

(That first example could be created semantically and accessibly with <details> / <summary> though!)


what about making it work in pure HTML and CSS, and enriching it with JS to make it accessible?

rather than it not working at all with JS disabled


That's actually a common strategy called "progressive enhancement". The only thing is that your order is backwards: you should first make it accessible in pure HTML and CSS, and then use JavaScript to layer your fancy interactions on top.

So, for the tabs example, your baseline pure HTML and CSS solution might involve showing all the tab panels simultaneously, stacked vertically. Once the JavaScript loads, it would rearrange the DOM to hide all but one and add the tab-like behavior.
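
Here is a rough JavaScript sketch of that enhancement step. It assumes hypothetical markup where each panel is a `<section>` with an `id` and an `<h2>` that serves as its label. `tabState` computes the ARIA roles and states for a given selection: `role="tab"`, `aria-selected`, `aria-controls`, and a roving `tabindex`. `enhanceTabs` then applies them in the browser. Both names are made up. Without JavaScript, the sections simply stay stacked and readable. Keyboard handling is left out.

```javascript
// Compute ARIA attributes for a tab set where `selected` is the active index.
function tabState(panelIds, selected) {
  return panelIds.map((id, i) => ({
    tab: {
      id: `${id}-tab`,
      role: "tab",
      "aria-controls": id,
      "aria-selected": String(i === selected),
      // Roving tabindex: only the active tab is reachable with the Tab key.
      tabindex: i === selected ? "0" : "-1",
    },
    panel: {
      role: "tabpanel",
      "aria-labelledby": `${id}-tab`,
      hidden: i !== selected,
    },
  }));
}

// Browser-only: turn stacked <section id="..."><h2>label</h2>...</section> children
// of `container` into tabs. Without JS, the sections simply stay stacked.
function enhanceTabs(container) {
  const sections = [...container.querySelectorAll(":scope > section")];
  const tablist = document.createElement("div");
  tablist.setAttribute("role", "tablist");
  const buttons = sections.map((section) => {
    const button = document.createElement("button");
    button.type = "button";
    button.textContent = section.querySelector("h2")?.textContent ?? section.id;
    tablist.append(button);
    return button;
  });
  container.prepend(tablist);

  const select = (index) => {
    tabState(sections.map((s) => s.id), index).forEach(({ tab, panel }, i) => {
      for (const [name, value] of Object.entries(tab)) buttons[i].setAttribute(name, value);
      sections[i].setAttribute("role", panel.role);
      sections[i].setAttribute("aria-labelledby", panel["aria-labelledby"]);
      sections[i].hidden = panel.hidden;
    });
  };
  buttons.forEach((button, i) => button.addEventListener("click", () => select(i)));
  select(0);
  return select; // e.g. for wiring up keyboard handling
}
```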


By "accessible", do you mean getting focus when pressing the Tab key?


Here is a non-exhaustive list of issues you'll run into with various pure HTML and CSS implementations:

- Tabs should have an ARIA "tab" role [1], but <summary> doesn't accept roles [2].

- Focusing a tab must activate the corresponding tab panel [3], which requires JavaScript.

- Tabs should be navigable by arrow keys [4], which also requires JavaScript.

I want to be clear that I'm not trying to tear down your work. Your project looks cool and eliminating JavaScript is a noble goal. But unfortunately, as of today, it's still required to correctly build most complex controls on the web.

[1] https://developer.mozilla.org/en-US/docs/Web/Accessibility/A...

[2] https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/...

[3] https://w3c.github.io/aria/#tab

[4] https://www.w3.org/WAI/ARIA/apg/patterns/tabs/
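
For the arrow-key point, the index arithmetic itself is small. The part that needs JavaScript is moving focus. The sketch below follows the horizontal tabs pattern in the APG linked as [4]: Left and Right wrap around, and Home and End jump to the ends. The function names are hypothetical. `select` is assumed to be whatever function activates a tab.

```javascript
// Which tab should receive focus for a given key (horizontal tablist):
// arrows wrap around, Home/End jump to the first/last tab.
function nextTabIndex(current, key, count) {
  switch (key) {
    case "ArrowRight": return (current + 1) % count;
    case "ArrowLeft": return (current - 1 + count) % count;
    case "Home": return 0;
    case "End": return count - 1;
    default: return current; // not a navigation key
  }
}

// Browser-only wiring: `tabs` is an array of tab elements, `select(i)` activates tab i.
function handleTablistKeydown(event, tabs, select) {
  const current = tabs.indexOf(document.activeElement);
  if (current === -1) return;
  const next = nextTabIndex(current, event.key, tabs.length);
  if (next === current) return;
  event.preventDefault(); // keep arrow keys from scrolling the page
  tabs[next].focus();
  select(next); // automatic activation; for manual activation, select on Enter/Space instead
}
```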


> Tabs should be navigable by arrow keys [4], which also requires JavaScript.

It supports this now (with JavaScript). If not, try to refresh the page.


typo: s/with/without/


That is no longer true! You can do it in CSS with a combination of `@starting-style` and `transition-behavior: allow-discrete`. [1]

Another gotcha you'll run into is animating the height. A couple other new features (`interpolate-size: allow-keywords` and `::details-content`) will let you get around that. [2]

Modern CSS is awesome.

[1] https://developer.chrome.com/blog/entry-exit-animations

[2] https://nerdy.dev/open-and-close-transitions-for-the-details...


The major issue with this is that modern CSS is almost its own job, to the point that a place I’ve worked at used to have Interface Developers (HTML+CSS specialists). I did frontend for over a decade and eventually lost track of CSS changes; I don’t even know what’s going on there anymore.

It’s still awesome, but it’s becoming increasingly silly to ask someone to know modern HTML, CSS, JavaScript, Typescript, some build tools, a couple of frameworks, etc.

The amount of JS we ship to clients is a reflection of cost-cutting measures at your workplace, not that every FE dev shuns CSS.


When I started dabbling in web development, writing HTML and CSS was already its own job, and professional JavaScript developers basically did not exist. This was before TypeScript, before Node, before Ajax, before React or even jQuery. If anything has exploded in complexity in the intervening years, it's the JavaScript part of the equation.

I agree that it's increasingly silly to ask someone to be an expert in all of frontend. But the primary driver of that is not all the new CSS features we're getting.


Agreed. Having an "HTML + CSS" engineer on the team was largely due to the number of hacks needed to make CSS work: purposely adding invalid HTML that would only be parsed by specific browsers, IE5 vs. IE6 vs. Netscape being wildly different (Opera Mobile was out of this world different), using sprites everywhere because additional images would have a noticeable lag time (especially with JavaScript hover), clearfix divs to overcome float issues. To be clear, I'm not saying "things were harder back then" or "CSS is simple now", but things with CSS were so wild and the tooling was so bad that it was a unique skill of its own. That skill is less needed now, and the shift has been for people to focus on going deeper with JS.


<sarcasm>I'm an expert -> just write everything in static HTML, and use CSS only where the defaults fall short. After the relevant tables in the DB are updated, generate a new static document. Only if that won't fly, use JSON, but feel appropriately dirty when you do.

These are fair points. And developers themselves thought “hey, what if I also write Node?”, thus making things worse. Businesses accepted that all too willingly.

I can't see how a bunch of esoteric incantations are better than just some straight-forward easy to understand and follow JavaScript.


Because you need 20x the JS to do the same thing and it’s still not hardware accelerated. These new CSS properties are well supported and will only get better.


   <span onclick="
      banana.
      classList.
      toggle('hidden')
   ">click</span>
   <div id="banana" class="hidden">loris</div>

That's a poor example since it isn't accessible and doesn't work if a pending JS or CSS network request is preventing it from functioning.

This is the HTML version, it's not susceptible to halted execution, and it is accessible.

    <details>
      <summary>Click</summary>
      loris
    </details>

Is that "straight-forward easy to understand and follow JavaScript" the whole thing written from scratch? Or does it use libraries (that use libraries, that use libraries)?

Because I've written my share of JavaScript from scratch in my time, before npm and such. And even though my use case was limited, getting edge cases and details working (issues long solved by their HTML/CSS counterparts) required more and more JS, much of it handwritten polyfills, user-agent detection, etc.

Seriously, things like scrollbars (because the client insisted on them being consistent across user agents), dropdowns (because they had to be styled), or "visited" state on links are thousands of lines of code in pure JS. Maybe not anymore today, IDK, with current APIs like the History API or ARIA labeling. But back then, just making the dropdown work with screen readers, or making the scrollbars react well to touchpads (in the direction the user was used to on their OS), took us thousands of lines of JS, hacks, workarounds and very hard-to-follow code, because the "solutions" were spread out over the exact right combination of JS, HTML and CSS. Edit: I now recall we got the web app back with the comment "When I select "Language" and start typing "Fr" I expect French to be picked and "enter" to then put the language in French". We spent another few days on functions that collect character inputs in memory and then match them with values. All because "flags in front of the names were of crucial importance".
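
A native <select> gives you the "type Fr to pick French" behavior for free. It can be sketched like this: keystrokes that arrive within a short timeout build up a prefix, and that prefix is matched case-insensitively against the options' text labels (not the flags). `createTypeahead` and the 500 ms default are assumptions for illustration. Native implementations differ in details such as cycling through options when the same letter is typed repeatedly, which this sketch doesn't do.

```javascript
// Sketch of <select>-style typeahead. Keys typed within `timeoutMs` of each other
// extend one search prefix; a longer pause starts a new search.
function createTypeahead(labels, timeoutMs = 500) {
  const normalized = labels.map((label) => label.toLocaleLowerCase());
  let prefix = "";
  let lastKeyTime = -Infinity;
  return function onKey(char, now = Date.now()) {
    if (now - lastKeyTime > timeoutMs) prefix = ""; // pause: start over
    lastKeyTime = now;
    prefix += char.toLocaleLowerCase();
    // Index of the first option whose text starts with the prefix, or -1.
    return normalized.findIndex((label) => label.startsWith(prefix));
  };
}

const languages = ["English", "Français", "Frysk", "Suomi"];
const typeahead = createTypeahead(languages);
typeahead("F", 0);    // 1 ("Français")
typeahead("r", 100);  // 1 (prefix "fr")
typeahead("y", 200);  // 2 (prefix "fry" matches "Frysk")
typeahead("S", 2000); // 3 (timeout elapsed, new prefix "s")
```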

So, maybe this is solved in modern HTML/CSS/JS. But I highly doubt it. I think "some straight-forward ... JavaScript" is either an `import { foo } from foobar` or a pipe-dream in the area of "programmers always underestimate hours"


using flags for language is a bad pattern I wish would die. I'm not clicking on the British flag!


Certainly. But the problem here wasn't "we want flags", but that the client (via the designer) demanded something that couldn't fit in a select box and so we had to build our own.

Now, I think part of the problem is that such elements weren't designed properly when they were invented. Like many other HTML elements, they should've had some way to be styled and/or improved.

E.g. an H1 header: I can apply CSS to it and change it from the default to something matching the business style. I can add some behaviour to it, so I can bookmark its id anchor. I can add some behaviour to turn the H1-6 into a nice table of contents. Or an image can be improved with some CSS and JS to load progressively. But most form elements, and the dropdown in particular, are hard to improve.

And, yes, I am aware of the can of worms if "any element is allowed inside an <option>". Or the misuse designers will do if we can add CSS to certain <options> or their contents. Though I don't think "webdevs will abuse" was ever the reason not to hand power to them. It was mostly a disconnect between the "designers of the specs" and the "designers/builders of websites".

Because that "abuse" is never worse than what is still done en masse: we simply replace the "select" with hundreds of lines of CSS, div soup, and hundreds or thousands of lines of JS. Entire component libraries exist and are used all over the place that completely replicate the behaviour of existing (form) elements with divs/spans, CSS and JS, and despite thousands of hours of fine-tuning, they still get details wrong in the areas of a11y, mobile platforms, obscure platforms, plugins, slow connections and so on.


Luckily things are slowly changing for the better. You can actually style a <select> now! Browser support is still scant but it'll gracefully degrade to a normal looking <select>. https://developer.mozilla.org/en-US/docs/Learn_web_developme...


What about strings + flags?


The name of the language, in its own language: [Flag] Suomi (Finnish)
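
Labels like that don't have to be hand-written. The built-in `Intl.DisplayNames` API can name a language in its own language and also in the UI language. Here is a minimal sketch. `languageLabel` is a hypothetical helper, and it relies on the runtime shipping full locale data, which current browsers and standard Node builds do. A flag, if kept, would just be decoration next to this text.

```javascript
// Build "Suomi (Finnish)"-style labels from the runtime's locale data.
function languageLabel(code, uiLocale = "en") {
  // Name of the language in that language (CLDR often lowercases it, e.g. "suomi").
  const native = new Intl.DisplayNames([code], { type: "language" }).of(code);
  // Name of the language in the UI language, e.g. "Finnish".
  const translated = new Intl.DisplayNames([uiLocale], { type: "language" }).of(code);
  const capitalized = native.charAt(0).toLocaleUpperCase(code) + native.slice(1);
  return `${capitalized} (${translated})`;
}

console.log(languageLabel("fi")); // "Suomi (Finnish)"
```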


Because a team of browser engineers have already written and reviewed the code to do it for you; and (hopefully) it’ll be performant, properly tested and accessible… ;-D


JS animations run on the main thread with everything else, so if your browser is busy doing something else the animation ends up being janky. Using CSS solves that problem.


1. CSS is declarative, so it avoids subtle bugs

2. CSS integrates better with HTML, as it has selectors to automatically bind to elements (yes, there are custom elements for JS)


We've gotten so far away from semantic documents so we could build "apps".

Data used to be first class. You would deliver everything in the HTML container and the style sheets or client could do whatever it wanted/needed with that data.

Native search, native copy, no clever javascript tricks to hide or remove information from the document.

The HTML data container should be immutable.


> We've gotten so far away from semantic documents so we could build "apps".

Exactly. We're still pretending that the browser is some kind of document display application when it's an application runtime. We keep adding more HTML tags and an infinite number of CSS properties and features (that never get it right) when what we should have is a better application GUI API. Throw all the hardware acceleration and threading into that instead of @starting-style, transition-behavior: allow-discrete, interpolate-size: allow-keywords and ::details-content, and breathe some sanity into the platform.

We've effectively re-implemented that desktop/mobile GUI using a bunch of cobbled together technologies and continue to get more esoteric and complicated every year. Hell, I'm not even sold on JavaScript -- it's just as clunky and weird as everything else.

Move document rendering into high-level implementation on top of a better designed low-level API much like how PDF display in browsers is done with JavaScript.


It sounds like you want a game engine.

I want a hypertext document viewer.

> @starting-style, transition-behavior: allow-discrete, interpolate-size: allow-keywords and ::details-content

This is sane from a declarative document styling syntax.


As a developer, I want a sane platform. Sometimes I want to write documents and sometimes I want to write applications.

> This is sane from a declarative document styling syntax.

Is it? CSS intentionally avoided mixing animation with live layout resolution and now we have a "switch" to enable it. I wouldn't call that elegant.

If we could just hook into layout with code, this could have been resolved years ago instead of waiting for browser makers to invent yet another set of keywords.


In my opinion the problem is the lack of good GUI editing apps for pure HTML documents, and the lack of a standard for self-contained HTML docs (that would bundle all the resources into a single clickable file).

Word for the web basically, but with support for multimedia.

In that sense the web has failed; there is EPUB, but it's not really good.


Is CSS too hard to learn or something?


I'm not particularly familiar with modern webdev, can anyone share a minimal example?


JavaScript encumbered pages break at least once for NoScript users.


To be fair, that's on the user. It's a trade-off the user is making, knowing that there are poorly made sites out there and sites that actively depend on JavaScript to function (sometimes because JavaScript is the only way they can function, but usually because someone's never heard of progressive enhancement). In the past, turning off JavaScript was a functional way to prevent things from running and to make sites load faster; today, ads, progressive enhancement and optional functionality are hardly the only uses of JavaScript. For example, lazy loading variable-size content (via fragments or otherwise) causes scroll issues if you're trying to go for performance on a complex layout. CSS containment and content-visibility with contain-intrinsic-size help solve this, but they're pretty new.


* Only works in Chromium browsers for now.

** Works badly even in Chromium browsers (granted, I was doing weird stuff, but the animation did not work properly).

I estimate that in a few years we will have animations working properly and mostly everywhere on details elements. Not before.


You say awesome, I say layers upon layers of opaque incantations I have to remember. Thank god for LLMs.


You're doing it wrong. You don't have to remember the incantations. You just have to remember that they exist, and then google them or ask an LLM when you need them.

If you use something enough you'll remember. If you don't, you just look it up when you need it. This is basic programming, nobody remembers everything.


“Layers upon layers of opaque incantations”

You’ve described software.


And how might you do this in Javascript?


they just said "LLMS"!


My point was that js would be vastly more complicated than these html/css "incantations".


Most front end engineers could do it in JS without ever having to look something up. But the CSS to do it is still obscure to most.


so, why not answer my question and say how you'd do it in js...?


@starting-style has less than 90% browser support, making it a non-starter for the time being at least.


It'll just degrade gracefully into not animating the element's entry, so unless the animation is somehow crucial you should still be fine to use it.

If you really need to detect whether it's supported there are hacky methods: https://www.bram.us/2024/07/11/feature-detect-css-starting-s...


I agree, but must also observe that I have never met a designer who was willing to admit without a knock-down drag-out fight that any animation they put in was not somehow crucial.


I've never met a designer who wasn't completely fine with my suggestions for more pragmatic solutions. Like just styling a default scrollbar instead of implementing my own scrollbar to make it exactly like the design. Using a default drop-down menu instead of rolling my own just so I can round the corners of the selects.

The designers I've worked with are fine with these things. We have more important things to work on than small style details. We can go back and change these things later if anyone actually cares, but generally nobody ever does.


I've never met a designer who cares how it gets done, but I have a hard time believing they were OK with the corners not being rounded as per the design. They may agree on shipping without the rounded corner, as long as the ticket to round that corner is registered.

I suppose though that we have just had very different life experiences, as that is what the HN guidelines would require of us.


I think you're both right.

I have also met a lot of completely unreasonable designers that would insist on the most minimal things (even to the detriment of usability), and would act like assholes towards developers.

I have also had situations where developers would beg to work with a certain designer because their experience made development a breeze, even for complex layouts. Funny enough, the projects where this designer worked would always get done, and the visual result was always great.


The trick is they'll see it working for themselves. :)


It depends on what you're doing. It's common for clients to wonder why the design they saw had fancy animations yet they don't see them on their MacBook...


91% of usage for browsers tracked by caniuse [1].

The biggest gap is Chrome versions > 2 years old.

[1] https://caniuse.com/?search=%40starting-style


To someone who believes that AI training data is built on the theft of people's labor, your second paragraph might sound like an 1800s plantation owner saying "can you imagine trying to explain to someone 100 years from now we tried to stop slavery because of civil rights". You're not addressing their point at all, just waving it away.


I always wonder if the general sentiment toward genai would be positive if we had wealth redistribution mechanisms in place, so everyone would benefit. Obviously that's not the case, but if you consider the theoretical, do you think your view would be different?


To be honest, I'm not even sure I'm fully on board with the labor theft argument. But I certainly don't think generative AI is such an unambiguous boon for humankind that we should ignore any possible negative externalities just to advance it.


> "To someone who believes that AI training data is built on the theft of people's labor..."

i.e. people who are not hackers. Many (most?) hackers have been against the idea of copyright and intellectual property from the beginning. "Information wants to be free." after all.

Must be galling for people to find themselves on the same side as Bill Gates and his Open Letter to Hobbyists in 1976 which was also about "theft of people's labor".


[flagged]


It's not free. There is a license attached. One you are supposed to follow and not doing so is against the law.


There's a deeper discussion here about property rights, about shrinkwrap licensing, about the difference between "learning from" vs "copying", about the realpolitik of software licensing agreements, about how, if you actually wanted to protect your intellectual property (stated preference), you might be expected to make your software proprietary and not deliberately distribute instructions on how to reproduce an exact replica of it in order to benefit from the network effects of open distribution (revealed preference) - about wanting to have your cake and eat it too, but I'd be remiss to not point out that your username is not doing your credibility any favors here.


I'm not whining in this case, just pointing out "they gave it out for free" is completely false, at the very least for the GNU types. It was always meant to come with plenty of strings attached, and when those strings were dodged new strings were added (GPL3, AGPL).

If I had a photographic memory and I used it to replicate parts of GPLed software verbatim while erasing the license, I could not excuse it in court that I simply "learned from" the examples.

Some companies outright bar their employees from reading GPLed code because they see it as too high of a liability. But if a computer does it, then suddenly it is a-ok. Apparently according to the courts too.

If you're going to allow copyright laundering, at least allow it for both humans and computers. It's only fair.


> If I had a photographic memory and I used it to replicate parts of GPLed software verbatim while erasing the license, I could not excuse it in court that I simply "learned from" the examples.

Right, because you would have done more than learning, you would have then gone past learning and used that learning to reproduce the work.

It works exactly the same for a LLM. Training the model on content you have legal access to is fine. Afterwards, someone using that model to produce a replica of that content is engaged in copyright infringement.

You seem set on conflating the act of learning with the act of reproduction. You are allowed to learn from copyrighted works you have legal access to, you just aren't allowed to duplicate those works.


The problem is that it's not the user of the LLM doing the reproduction, the LLM provider is. The tokens the LLM is spitting out are coming from the LLM provider. It is the provider that is reproducing the code.

If someone hires me to write some code, and I give them GPLed code (without telling them it is GPLed), I'm the one who broke the license, not them.


> The problem is that it's not the user of the LLM doing the reproduction, the LLM provider is.

I don't think this is legally true. The law isn't fully settled here, but things seem to be moving towards the LLM user being the holder of the copyright of any work produced by that user prompting the LLM. It seems like this would also place the infringement onus on the user, not the provider.

> If someone hires me to write some code, and I give them GPLed code (without telling them it is GPLed), I'm the one who broke the license, not them.

If you produce code using a LLM, you (probably) own the copyright. If that code is already GPL'd, you would be the one engaged in infringement.


You seem set on conflating "training" an LLM with "learning" by a human.

LLMs don't "learn" but they _do_ in some cases, faithfully regurgitate what they have been trained on.

Legally, we call that "making a copy."

But don't take my word for it. There are plenty of lawsuits for you to follow on this subject.


> You seem set on conflating "training" an LLM with "learning" by a human.

"Learning" is an established word for this, happy to stick with "training" if that helps your comprehension.

> LLMs don't "learn" but they _do_ in some cases, faithfully regurgitate what they have been trained on.

> Legally, we call that "making a copy."

Yes, when you use a LLM to make a copy .. that is making a copy.

When you train a LLM... That isn't making a copy, that is training. No copy is created until output is generated that contains a copy.


Everything which is able to learn is also alive, and we don't want to start to treat digital device and software as living beings.

If we are saying that the LLM learns things and then made the copy, then the LLM committed the crime and should receive the legal punishment: be sent to jail, banned from society until it is deemed safe to return. It is not as if the installed copy were some child spawned from digital DNA, so that the parent continues to roam while the child gets sent to jail. If we are to treat it like a living being that learns things, then every copy and every version is part of the same individual, and thus the whole individual gets sent to jail. No copy is created when it is installed on a new device.


> we don't want to start to treat digital device and software as living beings.

Right, because then we have to decide at what point our use of AI becomes slavery.


[flagged]


[flagged]


> "Learning" was used by the person I responded too.

Not in the same sense.

> If you had read my comment with any care you would have realized I used the words "training" and "learning" specifically and carefully.

This is completely belied by "It works exactly the same for a LLM."

> That doesn't count as a "copy" since it isn't human-discernable.

That's not the reason it _might not_ count as a copy (the law is still not settled on this, and all the court cases have lots of caveats in the rulings), but thanks for playing.

> If you don't like being called out for lack of comprehension, then don't needlessly impose a semantic interjection

If you want to not appear mendacious, then don't claim equivalence between human learning and machine training.

> It is pretty clear this is a transformative use and so far the courts have agreed

In weak cases that didn't show exact outputs from the LLM, yes. In any case, "transformative" does not automagically transform into fair use, although it is one considered factor.

> Very mature.

Hilarious, coming from the one who wrote "if it helps your comprehension."

You must be one of those assholes who think it's OK to say mean things if you use the right words.

Bless your heart.


You both broke the site guidelines badly in this thread. Could you please review https://news.ycombinator.com/newsguidelines.html and stick to the rules? We ban accounts that won't, and I don't want to ban either of you.


> This is completely belied by "It works exactly the same for a LLM."

I specifically used the word "training" in the sentence afterwards. "It" clearly refers to the sentence prior, which explains that infringement happens when the copy is created, not when the original is memorized/learned/trained.

> If you want to not appear mendacious, then don't claim equivalence between human learning and machine training.

I never claimed that. I already clarified that with my previous comment. Instead of bothering to read and understand you have continued to call names.

> Hilarious, coming from the one who wrote "if it helps your comprehension."

You seemed confused, you still seem confused. If you think this genuine (and slightly snarky) offer to use terms that sidestep your pointless semantic nitpick is "being an asshole"... then you need to get some more real world experience.


You both broke the site guidelines badly in this thread. Could you please review https://news.ycombinator.com/newsguidelines.html and stick to the rules? We ban accounts that won't, and I don't want to ban either of you.


I'm polite in response to being repeatedly called names, and this is your response?

If you think my behavior here was truly ban-worthy, then do it, because I don't see anything in it I would change except for engaging at all.


This is the sort of thing I was referring to:

> Instead of bothering to read and understand you have continued to call names.

> You seemed confused, you still seem confused

> your pointless semantic nitpick

> you need to get some more real world experience

I wouldn't personally call that being polite, but whatever we call it, it's certainly against HN's rules, and that's what matters.

Edit: This may or may not be helpful (probably not!) but I wonder if you might be experiencing the "objects in the mirror are closer than they appear" phenomenon that shows up pretty often on the internet - that is, we tend to underestimate the provocation in our own comments, and overestimate the provocation in others' comments, which in the end produces quite a skew (https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...).


Sorry, and thanks.

I know moderation is a tough gig.


We spread free software for multiple purposes, one of them being the free software ethos. People using that for training proprietary models is antithetical to such ideas.

It's also an interesting double standard, wherein if I were to steal OpenAI's models, no AI worshippers would have any issue condemning my action, but when a large company clearly violates the license terms of free software, you give them a pass.


> I were to steal OpenAI's models, no AI worshippers would have any issue condemning my action

If GPT-5 were "open sourced", I don't think the vast majority of AI users would seriously object.


OpenAI got really pissy about DeepSeek using other LLMs to train though.

Which is funny since that's a much clearer case of "learning from" than outright compressing all open source code into a giant pile of weights by learning a low-dimensional probability distribution of token sequences.


I can't speak for anyone else, but if you were to leak weights for OpenAI's frontier models, I'd offer to hug you and donate money to you.

Information wants to be free.


> The difference is that people who write open source code or release art publicly on the internet from their comfortable air conditioned offices voluntarily chose to give away their work for free

That is not nearly the extent of AI training data (e.g. OpenAI training its image models on Studio Ghibli art). But if by "gave their work away for free" you mean "allowed others to make [proprietary] derivative works", then that is in many cases simply not true (e.g. GPL software, or artists who publish work protected by copyright).


What? Over 183K books were pirated by these big tech companies to train their models. They knew what they were doing was wrong.


Perhaps you should Google the definition of metaphor before commenting.


[flagged]


You're changing the subject. What about the actual point?


I mean, yeah, if you omit any objectionable detail and describe it in the most generic possible terms then of course the comparison sounds tasteless and offensive. Consider that collecting child pornography is also "storing the result of an HTTP GET".


[flagged]


[flagged]


What didn’t they engage with?

It’s really hard to parse this thread because you and the other gentleman keep telling anyone who engages they aren’t engaging.

You both seem worked up, perceiving others as disagreeing wholesale with the very concept that AI companies could be forced to compensate people for training data, and as morally injuring you.

Your conduct to some extent, but especially theirs, goes far beyond what I’m used to on HN. I humbly suggest you decouple yourself a bit from them; you really did go too far with the slavery bit, and it was boorish to then make the child porn analogy.


If you believe my conduct here is inappropriate, feel free to alert the mods. I think it's pretty obvious why describing someone's objections to AI training data as "storing the result of an HTTP GET" is not a good faith engagement.


[flagged]


We've banned this account. Please don't use multiple accounts in arguments on HN. It will eventually get your main account banned as well.

https://news.ycombinator.com/newsguidelines.html


The objection to CSAM is rooted in how it is (inhumanely) produced; people are not merely objecting to a GET request.


Yes, they're objecting to people training on data they don't have the right to, not just the GET request as you suggest.

If you distribute child porn, that is a crime. But if you crawl every image on the web and then train a model that can then synthesize child porn, the current legal model apparently has no concept of this and it is treated completely differently.

Generally, I am more interested in how this affects copyright. These AI companies just have free rein to convert copyrighted works into the public domain through the proxy of over-trained AI models. If you release something as GPL, they can strip the license, but the same is not true of closed-source code, which isn't trained on.


Indeed, and neither is that what people are objecting to with regard to AI training data.


That's not true, since cartoon drawings and certain manga also fall in that category. Do you have any evidence that manga is produced inhumanely?


> believes that AI training data is built on the theft of people's labor

I mean, this is an ideological point. It's not based in reason, won't be changed by reason, and is really only a signal to end the engagement with the other party. There's no way to address the point other than agreeing with them, which doesn't make for much of a debate.

> an 1800s plantation owner saying "can you imagine trying to explain to someone 100 years from now we tried to stop slavery because of civil rights"

I understand this is just an analogy, but for others: people who genuinely compare AI training data to slavery will have their opinions discarded immediately.


We have clear evidence that millions of copyrighted books have been used as training data because LLMs can reproduce sections from them verbatim (and emails from employees literally admitting to scraping the data). We have evidence of LLMs reproducing code from github that was never ever released with a license that would permit their use. We know this is illegal. What about any of this is ideological and unreasonable? It's a CRYSTAL CLEAR violation of the law and everyone just shrugs it off because technology or some shit.


You keep conflating different things.

> We have evidence of LLMs reproducing code from github that was never ever released with a license that would permit their use. We know this is illegal.

What is illegal about it? You are allowed to read and learn from publicly available unlicensed code. If you use that learning to produce a copy of those works, that is infringement.

Meta clearly engaged in copyright infringement when they torrented books that they hadn't purchased. That is infringement already, before they started training on the data. That doesn't make the training itself infringement, though.


> Meta clearly engaged in copyright infringement when they torrented books that they hadn't purchased. That is infringement already, before they started training on the data. That doesn't make the training itself infringement, though.

What kind of bullshit argument is this? Really? Works created using illegally obtained copyrighted material are themselves considered to be infringing as well. It's called derivative infringement. This is both common sense and law. Even if not, you agree that they infringed on the copyright of something close to all copyrighted works on the internet, and this sounds fine to you? The consequences and fines from that would kill any company if they actually had to face them.


> What kind of bullshit argument is this? Really? Works created using illegally obtained copyrighted material are themselves considered to be infringing as well.

That isn't true.

The copyright to derivative works is owned by the copyright holder of the original work. However, using illegally obtained copies to create a fair use transformative work does not taint your copyright of that work.

> Even if not, you agree that they infringed on copyright of something close to all copyrighted works on the internet and this sounds fine to you?

I agree that they violated copyright when they torrented books and scholarly articles. I don't think that counts as "close to all copyrighted works on the Internet".

> The consequences and fines from that would kill any company if they actually had to face them.

I don't actually agree that copyright infringement that causes no harm should be met with such steep penalties. I didn't agree when it was being done by the RIAA, and even though I don't like Facebook, I don't like it here either.


>We know this is illegal

>It's a CRYSTAL CLEAR violation of the law

in the court of reddit's public opinion, perhaps.

there is, as far as I can tell, no definite ruling about whether training is a copyright violation.

and even if there was, US law is not global law. China, notably, doesn't give a flying fuck. kill American AI companies and you will hand the market over to China. that is why "everyone just shrugs it off".


The "China will win the AI race if we in the West (America) don't" argument is an excuse created by those who started the race in Silicon Valley. It's like America saying it had to win the nuclear arms race, when physicists like Oppenheimer wanted to prevent it back in the late 1940s, once they understood the consequences.


okay, and?

what do you picture happening if Western AI companies cease to operate tomorrow and fire all their researchers and engineers?


Less slop


China is doing human gene editing and embryo cloning too, we should get right on that. They're harvesting organs from a captive population too, we should do that as well otherwise we might fall behind on transplants & all the money & science involved with that. Lots of countries have drafts and mandatory military service too. This is the zero-morality darwinian view, all is fair in competition. In this view, any stealing that China or anyone does is perfectly fine too because they too need to compete with the US.


All creative types train on other creatives' work. People don't create award-winning novels or art pieces from scratch. They steal ideas and concepts from other people's work.

The idea that they are coming up with all this stuff from scratch is public relations BS. Like Arnold Schwarzenegger never taking steroids: only believable if you know nothing about bodybuilding.


The central difference is scale.

If a person "trains" on other creatives' works, they can produce output at the rate of one person. This presents a natural ceiling for the potential impact on those creatives' works, both regarding the amount of competing works, and the number of creatives whose works are impacted (since one person can't "train" on the output of all creatives).

That's not the case with AI models. They can be infinitely replicated AND train on the output of all creatives. A comparable situation isn't one human learning from another human; it's millions of humans learning from every human. Only those humans don't even have to get paid; all their payment is funneled upwards.

It's not one artist vs. another artist, it's one artist against an army of infinitely replicable artists.


So this essentially boils down to an efficiency argument, and honestly it doesn't really address the core issue of whether it's 'stealing' or not.


What kind of creative types exist outside of living organisms? People can create award-winning novels, but a table cannot. Water cannot. A paper with some math cannot.

What is the basis that an LLM should be included as a "creative type"?


Well a creative type can be defined as an entity that takes other people's work, recombines it and then hides their sources.

LLMs seem to match.


Precisely. Nothing is truly original. To talk as though there's an abstract ownership over even an observation of a thing, one that forces people to pay rent to use it... well, artists definitely don't pay whoever invented perspective drawing, and programmers don't pay the programming language's creator. People don't pay Newton and his descendants for making something that makes use of gravity. Copyright has always been counterproductive in many ways.

To go into details, though: under copyright law there's a clause for "fair use" under a "transformative" criterion. This allows things like satire and reaction videos to exist. So long as you don't replicate 1-to-1 in product and purpose, IMO it qualifies as tasteful use.


What the fuck? People also need to pay to access that creative work if the rights owner charges for it, and they are also committing an illegal act if they don't. The LLM makers are doing this illegal act billions of times over for something approximating all creative work in existence. I'm not arguing that creatives make things in a vacuum; this is completely beside the point.


Never heard anything about what you are talking about. There isn't a charge for using tropes, plot points, character designs, etc. from other people's works if they are sufficiently changed.

If an LLM reads a free Wikipedia article on Aladdin and adds a genie to its story, what copyright law do you think has been broken?


Meta and Anthropic at least fed the entire copyrighted books into the training. Not the Wikipedia page, not a plot summary or some tropes; they fed the entire original book into training. They used at least the entirety of LibGen, which is a pirated dataset of books.


It's very much based on reason and law.

> There's no way to address the point

That's you quitting the discussion and refusing to engage, not them.

> have their opinions discarded immediately.

You dismiss people who disagree and quit twice in one comment.


> It's very much based on reason and law.

I have no interest in the rest of this argument, but I take a bit of issue with this particular point. I don't think the law is fully settled on this in any jurisdiction, and certainly not in the United States.

"Reason" is a more nebulous term; I don't think that training data is inherently "theft", any more than inspiration would be even before generative AI. There's probably not an animator alive that wasn't at least partially inspired by the works of Disney, but I don't think that implies that somehow all animations are "stolen" from Disney just because of that fact.

Where you draw the line on this is obviously subjective, and I've gone back and forth, but I find it really annoying that everyone is acting like this is so clear-cut. Evil corporations like Disney have been trying to use this logic for decades to try and abuse copyright and outlaw being inspired by anything.


It can be based on reason and law without being clear cut - that situation applies to most of reason and law.

> I don't think that training data is inherently "theft", any more than inspiration would be even before generative AI. There's probably not an animator alive that wasn't at least partially inspired by the works of Disney ...

Sure, but you can reason about it, such as by using analogies.


What makes something more or less ideological for you in this context? Is "reason" always opposed to ideology for you? What is the ideology at play here for the critics?


> I mean, this is an ideological point. It's not based in reason

You can't be serious.

