Of actual practical consequence are optional HTML5 tags [0]. They let you write HTML by hand almost as easily as Markdown. Some graybeards even say that optional HTML5 tags make Markdown entirely unnecessary.
Thus, lists don't need closing item tags:
This is correct HTML:
<ul>
<li> first
<li> second
<li> third
</ul>
Tables become uncluttered:
Raw html tables
<table>
<tr>
<td> are
<td> easier
<td> than
<tr>
<td> any
<td> markdown
<td> flavor
</table>
And you can even write the entire document in that style:
<!doctype html>
<title>Hello</title>
<p>This is a complete conforming html document.
No need for stupid html, head, body elements. They are optional!
Yeah, I was not at ease with this kind of stuff at first, but I like it more and more.
I'm hesitating between writing fully XML-compliant code, to be able to automatically spot stupid mistakes thanks to the parser, and this style. If I'm not validating the XML, I don't see much point in writing the extra tags. It's also just less data to transfer; compression is a thing, but still.
I do see one point: the version with the extra tags is closer to the actual representation of the document in the browser. You have to explain to newcomers that a full DOM with html, head and body elements will be rendered even if they are not present in the source code, with consequences for CSS and JavaScript to style and manipulate the document. The speech is just easier when the source matches the DOM.
As for the omitted end tags, it requires somewhat deeper knowledge of HTML parsing to read such a document. It clearly makes taking notes faster, but the code is perhaps less clear to someone who hasn't read up on the rules. Still a bit hesitant on this.
Nitpick: you might need <meta charset=utf-8> before the title.
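For what it's worth, a sketch of where that line would go (the title and text are placeholders). If I have the rules right, the charset declaration must appear within the first 1024 bytes, so it conventionally comes right after the doctype:

```html
<!doctype html>
<html lang=en>
<meta charset=utf-8>
<title>Hello</title>
<p>Still a complete, conforming document.
```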
If an extra tool is fine, a really common trope in the world of minimal markup languages/libraries/DSLs is a way to write low-furniture HTML/XML and generate the delivery files from it.
DMark (https://denisdefreyne.github.io/d-mark/) is the only one I've used, but IIRC there are probably ~dozens of these out there (and many eschew furniture more than DMark).
I didn't know about DMark. I actually designed a syntax that provides a way to implement custom tags, but doing that well is not easy.
While my syntax is even lighter than this, DMark's is reasonable, and it seems very easy and enjoyable to extend. And I say this as someone who doesn't know Ruby and for whom Ruby hasn't clicked (yet?). I'll definitely have to check it out, either to adopt it or to improve my own thing, ten years later.
Thanks for mentioning it, there's taste in this design. Very nice.
It is. I found it several years back but I finally built something on it recently and extending it was pretty simple (as someone who likewise doesn't know ruby, though I've written a little for Chef recipes).
I ended up writing a Python parser for it (mainly just to get access to a different CSS parser), and the author also has a Rust parser.
Yep, that would be a solution for any project which has a build pipeline (anyway).
For better or worse, when I deal with HTML directly, I usually just copy the HTML pages as they are on the server, with no mandatory JavaScript, if any at all. It's easier, and it's also "free backup on production".
I love s-expressions, but they have closing tags, so to me they are precisely not what the parent example shows. I prefer writing HTML with closing tags, because then I can use any XML parser and my editor can jump from opening to closing tag and back.
Well, sorta. XML-style syntax already has opening and closing marks for the tags; it just additionally has marks for the end of non-attribute children. S-expressions not having that is still a win.
Yes, s-expressions are a win (that's why I love them), but I don't see the XML structure that way. For me, the tags themselves are very elaborate opening and closing markers; that's simply how I perceive it. If a closing tag is missing where there should be one, the structure looks broken to me.
The example s-expression matches that structure, and I perceive it as the same, because the first element in any list has a special meaning (in Lisp itself it is the function name, while the rest are arguments).
To replicate the HTML5 structure in the top comment, without closing tags, as s-expressions, I would come up with something different:
(table
(tr)
(td) one
(td) two
(td) three
(tr)
(td) another table
(td) with more stuff)
I think I see your point, but it feels weird to see this form, to me.
To your point, I can see the opening tag as ( and the closing as ), but then that would lead to something more like:
(table
(tr
(td one
(td two
(td three
(tr
(td ...
Which really just highlights how odd it is to see the opening tags without the closing ones. Which probably goes a long way to explaining why I don't like it. :D
Probably more natural to have something like:
(html
table
tr
td "one"
td "two"
tr
td "...")
In this way, you could optionally have closing tags like /td and such. Still looks very weird to me.
Raw tables are "easier than any markdown" only if you don't mind it looking completely different.
My favorite simple tables look like:
| are | easier | than |
| any | markdown | flavor |
Granted, I don't know of any that handle colspan/rowspan well, and even fewer that support newlines in a cell. Styling can also be annoying. But simple tables are surprisingly annoying in almost any of these options.
HTML tables are easier in the sense that if the length of the cell contents changes, or you add or remove lines within a single cell, that’s easy in HTML but inconvenient in Markdown (or requires special editor support).
Fair, "easier to write." They lose heavily on the "easy to read." (And folks that recognize my table as the org-mode table know that resizing the pipes is automatic for you, such that that shortcoming is addressed. :D )
Markdown was originally only intended as an authoring format, not a viewing format (it was just intended as an easier way to author HTML), and it is still used a lot in that manner (e.g. GitHub readmes).
Ish. Per https://en.wikipedia.org/wiki/Markdown, the goal was "to write using an easy-to-read and easy-to-write plain text format, optionally convert it to structurally valid XHTML (or HTML)."
My first reaction to your comment was that it was probably missing a "/s" at the end.
Rationale: I last did HTML when Netscape Navigator/Communicator was still around. IIRC, the usual discussion among us college students was how MS IE encouraged bad practices by ignoring (many?) unclosed tags, whereas Netscape wasn't as forgiving.
I've not seen much HTML since then (I definitely plan to, now). Your comment brought back some fond memories. Thank you! And mind blown.
To be pedantic, it's the tags that are optional, not the elements. The html, head and body elements are still there even if you don't write the tags.
Quoting from the document you linked:
"Omitting an element's start tag in the situations described below does not mean the element is not present; it is implied, but it is still there. For example, an HTML document always has a root html element, even if the string <html> doesn't appear anywhere in the markup."
...and this is why I still only do XHTML strict. I mean giving up predictability and the whole world of XML tools and libraries for these "benefits"... I really don't see the point.
Giving up what now? All the tools you need for HTML5 have existed for many, many years. You'd be "giving up" your tooling in the same way you're "giving up your car" if you buy a new car. You have a new car now, what's the problem?
Sorry, but how do I use XQuery with HTML5? How do I use all my generic XML tools with HTML5? I'm sure equivalents exist, but why learn them just so I can drop a few tags? Can't even use xmlindent if I do that to prettify it. If I need to embed HTML inside an XML document it's easier if it's XHTML than HTML5. It's much more convenient for most roles if HTML fits within existing XML flows. The only benefit of HTML5 is for those manually authoring them.
You use HTML5 query selectors, not xQuery. You don't use your generic XML tools, you use generic HTML5 tools. As for "why learn them so I can drop a few tags": because you don't, you make the tooling do it for you. Because that's what tooling is for.
If you need XML: use XML tools.
If you need HTML: use HTML tools.
If you need XHTML... you're writing web content for IE, and IE is dead. XHTML died with it. Feel free to stick with what you know, but that's a pretty ridiculous argument for why you shouldn't use the thing that's used these days instead of the thing we used 15 years ago =)
(hot damn, 15 years. It's been longer than most folks realize).
I'm mostly HTML5 at this point and have been for some time. The only exception I would make is things (say, Standard Ebooks) that haven't made the jump yet so their linters freak out if you don't do things exactly as they expect.
For the record, anyone who writes `<br />` as `<br>` is...misinformed.
It's not our (SE's) choice to use XHTML in ebooks, it's part of the epub spec. So we merely expect valid XHTML per the epub spec, and valid XHTML requires self-closing tags.
If I add a tr, does it nest "than" in the first row or does it move to the second row? Surely it can't nest because each tr effectively has a /tr before it.
>Raw html tables
<table>
<tr>
<td> are
<td> easier <tr><td>but ambiguous?
<td> than
<tr>
<td> any
<td> markdown
<td> flavor
</table>
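If I'm reading the HTML5 tree-construction rules correctly, it can't nest: a <tr> start tag implicitly closes any open <td> and <tr> (and the rows are wrapped in an implied <tbody>). So the snippet above parses as if written:

```html
<table>
  <tbody>
    <tr>
      <td> are </td>
      <td> easier </td>
    </tr>
    <tr>
      <td>but ambiguous? </td>
      <td> than </td>
    </tr>
    <tr>
      <td> any </td>
      <td> markdown </td>
      <td> flavor </td>
    </tr>
  </tbody>
</table>
```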
I notice you closed the title, presumably because otherwise it's ambiguous whether the paragraph is part of the title. It's so easy to make well-formed documents by closing tags intentionally, rather than leaving the closing to be inferred from context by different rendering engines, that I can't see why you would want sloppy markup.
Implicit end tags aren’t sloppy at all. Documents using them are completely unambiguous and completely well‐formed, as has been the case since the inception of the language.
In fact, the fact that implicit end tags and implicit elements are unambiguous is why they exist as a feature at all. <td> cannot contain <tr>—the idea doesn’t make sense—so <td> followed by <tr> implicitly closes the <td>. <li> can’t contain <li>, so it implicitly closes. <p> can’t contain <p>, and so on. XML doesn’t help at all here: even if you use a strict XML parser, you can make a construct like “<p><p></p></p>”, and it’ll be ostensibly “well‐formed,” but it won’t be valid HTML.
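As a concrete example of that unambiguity, these two fragments should produce the same element structure (modulo whitespace text nodes):

```html
<!-- With implicit end tags: -->
<ul>
<li>first
<li>second
</ul>
<p>one paragraph
<p>another

<!-- Fully explicit equivalent: -->
<ul>
<li>first</li>
<li>second</li>
</ul>
<p>one paragraph</p>
<p>another</p>
```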
The XML way of “make everything verbose, even when it’s already unambiguous” was a late addition to the language. And it leads to surprises like those visible throughout this thread, such as people expecting <div /> to act like <div></div> when it really acts like <div>.
The XML way is very good for XML's purpose, which is enabling generic tooling that doesn't need to understand the semantics of the particular application language. That's great for building a foundation of tooling that supports thousands of application languages, most of which will have very few users, so it's very good if much of the tooling cost is smeared across lots of other languages.
HTML is important enough that it warrants direct investment in tooling and doesn’t need the XML approach.
> “make everything verbose, even when it’s already unambiguous”
The idea is to make the language less context-sensitive and more consistent. It's better if the meaning of `<p>` and `<p/>` doesn't change based on the parent tag. Now, obviously they failed at making the semantic changes necessary to achieve that goal, but a true XML-based language would've been far simpler for both humans and computers than what we've got now. Most of the author's points are entirely valid, but the conclusion really should've been "HTML is bad."
But the meaning does change based on the parent tag. <p> is not allowed to be nested, just like <li> and many other elements. Those are rules anyone writing markup has to remember, whether XML or not. Allowing <li> to close the previous <li> is a natural extension of that rule, and strikes a good balance between the rigor of XML and brevity of Markdown, whose context rules I find to be both less consistent and incredibly difficult to remember. (Can a list contain a list? Does ^ superscript a whole line or just a word? What elements can contain a table?)
All whitespace remains meaningless. That is, the optional closing tag has nothing to do with a newline. Rather, they are defined as optional if the next tag is one of a set, or their parent is closed.
As the HTML Validator will warn, there really should also be a declaration of the language, typically by having a `lang` attribute on the `html` element.
This is the way. I really like that I can include arbitrary HTML when I need it, without complex escaping. Anyone who builds web pages already knows the syntax.
This is a fantastic article with some facts I didn’t know. I hadn’t thought about how the self-closing tag is effectively nothing more than a stylistic choice.
That said, it failed to persuade me. I’d still favour the self-closing style in any style guide. At least now it would be a more informed (and better scoped) preference.
Consider two style guides:
- <tag> is an opening tag
- </tag> is a closing tag
- <tag /> is a self-closing tag
Note: it remains the responsibility of the author to know which tags are self-closing.
Seems more useful than:
- <tag> is an opening tag or a self-closing tag
- </tag> is a closing tag
Admittedly, it’s subjective. That’s why any consistent guide is better than none.
It's only redundant if you expect people to memorize your style guide. If the style guide is instead enforced by a lint rule, then having the /> be required means that no one has to memorize the list of self-closing tags.
The writer of the HTML will be told off if they try to use /> on a tag that won't actually self-close, and they will also be told off if they neglect /> on a tag that self-closes.
Once written, the reader knows that a tag that ends in /> closes itself, and if it ends in > then there's definitely a closing tag somewhere.
But it doesn't even "work for particular names" because the `/>` part isn't what closes the element.
By the time the parser has seen the sequence `<br` it already knows which element this is, because it can only be the BR element, and has already finalized the DOM node for it because that's the rule for the BR tag. The moment it sees the opening angled bracket and tag, that node is created in an already closed state.
So those extra two characters `/>` do, in the most literal sense possible, nothing at all. They're allowed to be there for historical reasons, but they literally do nothing: they're treated as bytes to be ignored, and the DOM parser skips over them while it looks for the next active token in order to continue building the DOM.
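In other words, assuming a conforming HTML5 parser, all three of these spellings produce the same single BR element:

```html
<br>
<br/>
<br />
```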
The parser treats `B<` and `p` as attributes of `br`, and determines that the `<br` tag ends with the `>` at the end of line 1.
If I end the `<br` with either `/>` or `>`, it parses correctly. Which means that `/>` and `>` are both considered valid endings for the `<br` tag, but you have to have one or the other and it will keep looking until it finds one.
Right, I think the person you’re replying to is partially correct in that the parser gets to “<br ” and recognizes “this is a br tag”, but the tag doesn’t end there, because tags can have arbitrary attributes, and the W3C says br may also carry any of the global attributes. So the parser indeed doesn’t just stop because it knows what tag it is; it has to keep going until it reaches a valid closing “>”.
> Note: it remains the responsibility of the author to know which tags are self-closing.
And they should be able to get almost all of the way there by remembering one very simple rule: self-closing tags insert something at that point in the document, while open/close tags describe the content they surround.
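A quick sketch of that rule of thumb (the URLs and attribute values are made up):

```html
<!-- "Insert something at this point": void elements, no end tag ever -->
<img src=logo.png alt="Site logo">
<hr>
<input type=text name=q>

<!-- "Describe the content they surround": normal elements -->
<em>some emphasized text</em>
<div>a generic container</div>
```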
<div />
Hello
…should reformat to:
<div>Hello</div>
I would argue that this should probably reformat to
<div></div>
Hello
Although this is different from how browsers would interpret it, I think it's much more likely that whoever wrote <div/> intended to have an empty div.
You have a point, and it is attractive, but I think a tool that reformats HTML should not change its actual meaning: the result should render the same way as the input unless explicitly asked otherwise. The original page could have been tested, and a difference would be surprising.
It’s because the trailing “/“ has zero semantics now, by definition. You can add or remove it from any tag, and it doesn’t change the meaning of your HTML.
<div/>
some text
is thus equivalent to
<div>
some text
which in the absence of any further tags or content is interpreted as
<div>
some text
</div>
(Note however the exception for foreign content like SVG, as noted in the article.)
A td element's end tag may be omitted if the td element is immediately followed by a td or th element, or if there is no more content in the parent element.
Edit: I don't know the "div" rules. Don't see that here.
Edit2: I'm actually curious whether what I put here is wrong.
Sorta? The thread was about how odd it is that <div> has implicit end tags?
So, makes sense that the topic seems different, but it is the explanation for why a div grows to contain everything under it. For example, this is needed to know why the div in the following text closes.
Right? I get that the original confusion is thinking you can make a self-closing div. But the fact that it expands to cover things that come after it requires you to know about the implicit closing. It's hard to understand one without understanding the other.
Edit: I suppose it would be better for me to say that none of the examples on the immediate parent post are different, because none of them close the div? It isn't that the one div expands to cover everything after it, is that nothing caused it to be closed. Indeed, if those are just snippets from a document, you can't be sure that stuff that comes after it is also not included in the div.
If I understand it correctly, the `<foo />` syntax is an XML invention; it was never added to HTML. Traditional HTML parsers see the slash as a malformed attribute and ignore it, and that behavior was codified in HTML5.
The whole "we want to look like XML, but not /be/ XML" has to be one of the most annoying aspects of it. I shouldn't be too surprised on this particular edge, but holy crap I also wouldn't be shocked to know that I did it wrong in some pages.
I would tend to agree, which is why I made this suggestion. I think it's much more likely that what was meant is to insert an empty tag, even if that's not how browsers would interpret it.
> The only way you know that <input /> is acceptable and <div /> isn't, is learning and remembering which elements self-close. It sucks, but that's the way it is.
People remember which elements self-close by repeatedly seeing them written as self-closing tags. So while it indeed makes no difference from the browser standpoint whether you're closing your self-closing tag or not, it can be valuable to beginners.
No, people remember which elements self-close by remembering which elements can’t have meaningful content. In the days before XHTML, approximately nobody was confused about the distinction of which elements are self-closing and which have content.
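If I'm remembering the current spec correctly, the full list of those void elements (no content model, no end tag) is short enough to memorize:

```html
<area> <base> <br> <col> <embed> <hr> <img>
<input> <link> <meta> <source> <track> <wbr>
```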
Have you ever had to write an HTML5 parser, or read the spec? It's hysterical. The spec formalizes the parsing of common errors. There are rules for parsing malformed comments. There are rules for handling bad syntax.
The XHTML vision was that people would write it in a XML tree editor, which takes care of the nesting and syntax. Then you can't get the syntax wrong. There are JSON tree editors like that. But everybody wanted to use text editors. Tree editors exist, but have never been popular for some reason. Org mode is probably the most popular form of tree editor.
I'm surprised there isn't a move to write HTML in JSON syntax.
Yeah, it's been a while since I was really in the markup weeds, but that surprised me also. Though now that I think about it, I have run into it: trying to create an empty div, realizing I needed to write it as <div></div>, and rolling my eyes at the "inconsistency".
I'm pretty sure it was interpreted like that, at least in IE in the XHTML days. Between the end of the XHTML era and the beginning of HTML5, they formalized the distinction between non-self-closing tags like <div> and self-closing ones like <br/>. I think that was a mistake.
You misremember—it’s the other way around. IE never supported self‐closing arbitrary elements, because IE never supported rendering an (X)HTML page using an XML parser.
Requiring the slash to self‐close, and allowing any element to be self‐closed, were features of XML that never existed in HTML. When XHTML came around, rendering it required an XML‐capable parser, signaled by the webserver serving the page with the application/xhtml+xml MIME type instead of text/html. Doing so came with a slew of other effects required by XML, like hard‐erroring when markup was not well‐formed.
This was a big change for both browsers and web designers, so the XHTML 1.0 spec featured a compatibility mode: since XHTML 1.0 was essentially a one‐to‐one representation of HTML 4 elements as XHTML elements, XHTML documents could be rendered with a plain HTML parser, so long as they followed various rules to look like an HTML page, such as self‐closing only elements that are self‐closable in plain HTML. To incentivize people to move to the new MIME type, the W3C made this compatibility allowance temporary and removed it from the next version, XHTML 1.1.
Of course, the consequence was inevitable. Effectively nobody used XHTML 1.1, and effectively nobody used application/xhtml+xml. W3C’s insistence on promoting breaking compatibility with the overwhelming majority of the web led to its loss of control of the HTML specification to WHATWG, a coalition of browser vendors. W3C’s nascent XHTML 2.0 withered and died, and WHATWG’s desired vision for HTML5 became the new standard de jure, not just de facto.
The HTML spec is bigger than the web, and it is consumed by other specifications produced by standards bodies like the W3C. So, for example, ePub expects well-formed XHTML, which includes closing tags.[1] The HTML spec to this day still has XML-compatibility sections.[2][3]
Of course, feel free to do what you want on the web.
For ebooks, we don't rely on the browser to know what to do with sloppy code, because the reader may not be a browser. It may be a Kindle or worse. So being able to strictly validate is a benefit. Also, being able to use XML-based tools for editing and creating is a bonus.
I think the HTML5 parsing rules were a mistake. They make it unnecessarily complicated to parse and emit HTML. Instead of having a generic emitter that can format any set of tags, you need to understand all of the special tags and deal with them. If it were just about accepting some invalid constructs, it wouldn't be so bad (though you would still need to deal with them when reading, because people will generate whatever browsers accept), but you even need to be aware of these special rules when emitting. There are tags where adding a self-closing slash breaks the markup, and tags that need a separate start and end tag even when there is no content. So every emitter needs to check this list of hardcoded rules.
Welcome to 1999! XHTML was created exactly for this reason, and that's even where the self-closing idea came from.
HTML5 syntax wasn't designed; it was reverse-engineered from the mess HTML had become while XHTML was failing to get proper (XML parsing mode) adoption.
The other nice thing about XHTML is that you could generate it with XSLT. So your AJAX calls (returning results as XML in those days) were easy to format for display, and easy to sort, filter, and query without making another round-trip.
The third comment on the article is a great example of how this is widely misunderstood today:
>>> Only dumb people would write <div/> and expect it to be open. Don't change the language to accommodate people who can't understand basic syntax.
>> But it would be open. Is it a good language design if "only dumb people" get it right?
> I actually didn't know that <div/> is open, and can't believe this is so. So no, HTML is not a good language if this is possible. This makes the definition of open and closed tags meaningless.
Yet another reason to dislike JSX apparently: its parser deviates from the HTML5 spec. And at this point JSX is so well established (ossified?), fixing it would break countless sites when upgrading.
I'm fine with JSX being stricter. "HTML soup" is extremely hard and slow to parse, compared to JSX.
The community gave up shipping strict XHTML to the browser because the breakage could happen at runtime, whereas JSX disappears (and shows errors) at build time, so you can't ship broken JSX.
I think it's more a reason to ignore the HTML specs. They've never really meant anything and JSX is the closest thing to an actual standard that the web world has ever seen.
I don't think so. The HTML5 parsing spec is very precise now, and browsers follow it. To my knowledge, how you parse an HTML document is now very well defined.
Plus, JSX isn't even served to browsers at all. Now, I don't agree with the parent that it's wrong for JSX to parse differently than HTML: it got rid of the issues where indentation and spacing in the code end up meaning something and cause a different rendering than a non-spaced HTML document, which is a good thing. It could also totally be fixed if needed, since you define the versions of the dependencies in charge of interpreting it. HTML is the "unfixable" one.
What do you mean?! The HTML standard is very consistent across browsers these days, while JSX is not an actual standard for how to format documents; it's just an alternate interface for writing JS expressions.
First, "standards"-compliant HTML was never that important because the browsers were all so tolerant of malformed HTML anyway, and continue to be. And these days, the standards matter even less when there's really just Webkit and Blink. Gecko is a rounding error. You can write strict HTML to your heart's delight, or just kinda half-ass it. It doesn't really matter either way, since most users will use the same renderer anyway.
Then, even if you DO adhere to strict standards, they are so low level as to be useless. HTML + CSS defines a barebones, ancient layout system for simple documents -- which most of the web isn't anymore.
Want to do a simple nav bar? There is no simple "nav bar" standard. You hack it by overloading a list and repositioning the components using CSS, then sprinkle ARIA attributes on top of these overloaded components to tell the screen reader exactly how you decided to abuse the markup. Want a responsive popup modal with a scrollable table inside? You create your own recipe of tag soup, with your own media queries and wraps and scrollbar hacks and hidden elements. And every dev ends up doing this. At the level of abstraction of an actual UI, there isn't a standard, just different hacks on top of HTML.
"Semantic" HTML and JSON-LD etc. all came and went. In the end you have every webpage reinventing the wheel to create the simplest of UIs, because there wasn't really a "standard", just super-low-level rendering primitives. There is no real system for componentization or templatization, and every server-side language invented its own.
Enter JSX, which primarily gives you a more useful system of abstraction: that of components, composition, and inheritance, rather than meaningless tags like "div" or "span". This system allowed the creation of an ecosystem (as opposed to snippets) where different packages from different teams (or companies, or OSS) can interact by passing parameters/properties back and forth, something that was pretty much impossible when you were just overlaying HTML on itself (early jQuery libs tried to do that and ran into constant problems with clashing event handlers, z-axis issues, CSS namespace conflicts, etc.).
From there you can create style and component systems (like MUI or Chakra), or roll your company's own, from which downstream app and UI designers can actually compose business applications. This was all just basic functionality offered by the likes of WinForms and Visual Studio back in the 90s or before, but which HTML never really had until JSX took over. Then from there we got an extensive ecosystem of powerful typing, transpilation (including to native and Electron), linting, etc., all built on top of simple syntax and somewhat reusable components.
There's a very simple reason React took over and became the real de facto standard: it was useful. JSX outside React also sees adoption from other frameworks, because it is useful.
Purity is meaningless and pointless. Whatever standards are set today will be reinvented and ignored and superseded by the next big company or movement. From Netscape to Microsoft to Meta, every major company overloaded HTML because it was (and remains) uselessly low level. CSS and JS themselves took years to be adopted and fought long wars against ActiveX, Flash, and even Canvas -- again because HTML was so limited.
JSX frees you from the underlying rendering concerns (which users don't really care about, and frankly most devs and managers don't either) in favor of UI components and useful business logic. Even if the resulting output can still be tag soup (if you don't use a good UI framework), at least it's consistent tag soup because of component reusability and inheritance. And in the best cases, it can generate much cleaner output than HTML because you start from strictly typed and strictly linted XML-like syntax instead of overloaded HTML tags and CSS overrides, giving you a degree of intra-codebase consistency that's very difficult to achieve, and inter-package interoperability that was straight on impossible in the past.
I'd argue that JSX and React are the best things to have happened to the Web since, oh, I don't know... the image map?
> because the browsers were all so tolerant of malformed HTML anyway, and continue to be.
You must be young and not remember. IE, Firefox, and Safari were all tolerant of malformed HTML, but in wildly different and sometimes utterly incompatible ways! HTML5 was like manna from heaven when it was released. It codified exactly how HTML should be parsed, especially in cases where the input was far from perfect. It is an error to handwave that away, and it does not validate JSX's behavior.
Speaking of which, HTML5 precedes JSX by several years. They CHOSE not to follow the HTML5 parsing standard. This means the imperfect markup you put in JSX gets interpreted differently from what is actually delivered to the browsers. Since browsers all render according to HTML5 parsing rules, JSX by definition misses the target.
To top it all off, JSX simply CAN'T fix this behavior. HTML5 parsing rules would break existing JSX-based apps.
You may love JSX, but it is unambiguously broken in this capacity. It's not about "purity". It's about consistency. And JSX fails this metric when applied to the actual web.
I do remember... I have been doing web stuff since Netscape Navigator, before CSS and JS were invented. I remember that long after HTML5, the CSS Acid tests continued to be a problem for years, not to mention JS incompatibilities despite ECMAScript. It wasn't the standards that actually made cross-browser rendering realistic; it was polyfills, CanIUse, and later the dominance of WebKit + Blink.
Even these days, compatibility is achieved through transpilation and automatic polyfills. JSX DOES fix that behavior, because it completely abstracts it away (together with webpack, esbuild, etc.) and renders it an afterthought. You can write and think in meaningful components, not HTML soup. The build step isn't just a meaningless cost; it makes a lot of the problems of the old days less relevant.
Edit: What this ecosystem allows you to do is build on top of other people's work. So one team can spend days/months just publishing polyfills and builders, another team can build components, another team builds UIs, and together they can build a cross-browser, cross-platform app -- something that is relatively doable now, but was a huge undertaking in the barebones HTML days. It's only a "de facto" standard, but it did more for app development than any HTML standards body ever did. I say that because I DO remember how businesses struggled year after year after year, not despite but because of the fake "standards" that browsers never fully followed.
I think you’re way off here. The basic consistency of parsing between browsers is absolutely the result of the HTML 5 standard. Polyfills and transpilation are rarely necessary in modern web dev.
(Transpilation can improve your dev experience, but you can easily get by in a pinch with the JS features that modern browsers support natively.)
95% of what you said has nothing to do with JSX, and most of it would apply to any good framework, so it's not even React-specific.
Why do you think that a specific syntax for writing createElement gives all these positives?
What do you think all of this renders down to? That's right, html, js and css. So if those are not properly standardized and working then your house of cards falls no matter how well you have built it.
JSX is the first major JS component ecosystem, which spawned builders like webpack and with them easy polyfills and linters. The standards are thus enforced during dev and build, then polyfilled or transpiled to whatever you need, whether that's different browsers or even native, etc. The underlying HTML no longer matters except to the lib maintainers.
JSX is not a "component ecosystem". JSX is a way to embed HTML-like syntax in JS and convert that to function calls. It does not grant any additional capabilities besides letting you write `<div>` instead of `react.createElement('div')` or `div()` or `h('div')`. Are you actually meaning react when you say JSX?
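To make that concrete, here's a sketch of the kind of function call JSX desugars to. The `h()` helper is hypothetical (not React's actual createElement) -- it just shows the shape of the idea:

```javascript
// Hypothetical hyperscript-style helper, sketching roughly what a
// JSX compiler turns <div id="app"><span>hi</span></div> into.
// Not React's real implementation -- just the shape of the idea.
function h(type, props, ...children) {
  return { type, props: props || {}, children };
}

const tree = h('div', { id: 'app' }, h('span', null, 'hi'));

console.log(tree.type);             // div
console.log(tree.children[0].type); // span
```

The angle-bracket syntax buys you nothing these plain function calls don't already give you; it's sugar, which is exactly the point.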
Builders, polyfills, linters, components, frameworks, etc. existed before JSX and React, and exist independently of them.
> The underlying HTML no longer matters except to the lib maintainers.
Would you hire a frontend dev who didn't know the difference between a <head> and a <body>? An <input> and a <select>?
Even if you would (which IMO is a very bad idea), the only reason you can build anything is that there is a well-standardized base, which consists of HTML, JS and CSS. If those were not as stable, standardized and consistent as they are, you would not be able to build rich web apps on top.
I've written HTML for at least 20 years and far prefer the JSX syntax. It's simpler to learn and remember and has clearer whitespace behavior. JSX doesn't need to conform to HTML5. Given the web's backwards-compatibility requirements, something like JSX is the most successful way to create a stricter subset.
The time to rail against self-closing tags is long past. No matter how good your arguments and opinions are, it's here to stay (with and without the '/>'). There are better and more meaningful windmills to tilt at.
If it doesn't do anything - if it doesn't matter, then why care?
I mean this quite literally. Is it worth expending any time or energy on changing habits and tooling when it doesn't matter, and will probably never matter?
> If it doesn't do anything - if it doesn't matter, then why care?
Because it's confusing.
I've been doing this a long time and didn't actually know that /> is meaningless, and it would explain some random bugs I've worked around in the past.
I've seen people "fix" stuff by adding a "/>" as that's "better".
Other than that, I don't really care because, as you mentioned, it doesn't really matter. But the needless churn is a bit annoying at times, so it's not a bad thing to get the message out that it's an XHTML thing and doesn't matter in HTML.
In which case any other windmill could only be more meaningful. :)
I do get what you mean, and self-closing tags make parsing HTML a royal pain... but it's been the status quo for decades now (self-closing tags were in HTML 1.0 IIRC). It's not like they'll change it and break >90% of the internet† just to get rid of an oddity in the HTML spec.
† I do literally mean >90% - self closing tags are everywhere. They're in generated HTML, they're in manually created HTML, they're created by templating languages, and so on.
If a good subset of developers keep the ending slash anyway, what's the point of removing it?
I used to do proper XHTML with the correct MIME type. I don't anymore, but I like the syntactic closure the trailing slash on a void element gives. (I also close my <p> tags, and all the rest.)
I really don't understand how the author thinks it is misleading.
So your argument essentially boils down to: "It is only easier to read when developers write correct HTML." That isn't as "gotcha" of a rebuttal as you think it is.
It is easier to read in the common case. That's more than enough.
PS - Also a lot of IDEs will flag <picture /> with "missing closing tag" or similar.
No, my argument is that / is completely useless and is a false signal in HTML, it really is just ignored. I never argued that it makes anything easier.
An unclosed <picture> doesn't automatically become valid, but at least there's nothing suggesting that it can be done.
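A toy sketch (not the real HTML5 tokenizer) of what "just ignored" means in practice: for ordinary elements, a trailing "/" before ">" has no effect on the start tag that gets tokenized:

```javascript
// Toy sketch, not the HTML5 tokenizer: on a normal start tag, any
// trailing "/" before ">" is simply discarded, so <picture/> and
// <picture> produce the same start-tag token.
function tagName(startTag) {
  const m = /^<([a-zA-Z][a-zA-Z0-9]*)[^>]*>$/.exec(startTag);
  return m ? m[1].toLowerCase() : null;
}

console.log(tagName('<picture/>') === tagName('<picture>')); // true
```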
I think "foreign content" is still the weirdest state in HTML5. Outside of it, it's rather clear: Depending on the content type, you're either using the XML parser or the HTML5 parser which is just its own thing and entirely divorced from XML (and SGML).
But during embedded SVG sections, you're still using the HTML5 parser, only that it temporarily pretends to read XML. I have no idea what the rules are there and what kind of subsets or supersets of XML the parser is accepting here.
Counterpoint: There are far more XML parsers and other XML tooling than HTML(5) parsers in the world. There is a reason for that, namely the HTML5 parsing algorithm.
I was always a fan of polyglot markup, a serialisation profile which is both valid HTML5 when sent as text/html and well-formed XHTML when sent as application/xhtml+xml, and of course either when parsed in the absence of a media type. That seems to me a good example of the first clause of Postel’s law.
This history is not quite correct: HTML was defined in SGML originally, and that did not support self-closing tags. Pulling in a DOCTYPE header and validating with, say, the W3C validator would flag self-closing tags.
The only thing I agree with here is that Prettier should remove the /> on tags that don't self-close, i.e. it should turn <div/> into <div>. Other than that, the use of self-closing tags is a cross between an aesthetic and a reminder of which tags self-close. I work with HTML pretty rarely, so while I can remember that <br> is self-closing, I don't necessarily remember what other tags work the same way, and the inclusion of /> in HTML I'm reading is a good reminder that it's self-closing.
<script> should probably have been superseded by <link> for external files ages ago, then you could have your self-closing tag and <script>var foo=42;</script> would still work too.
I stepped on that beartrap just last week, Chrome at least will swallow the rest of the page looking for the missing </script>.
Context: bulk export of thousands of legacy CGI HTML pages using "chrome --headless" to archive to PDF. The cause was a side effect of xmlstarlet which I used to inject links for CSS and JS rendering fix ups for PDF. The result was perfectly rendered 1-page blank PDFs :/
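For anyone wondering why the whole rest of the page vanishes: once the tokenizer enters a `<script>` element, it stops treating `<` as a tag opener and scans only for a literal `</script>`. A toy sketch (not the real parser) of that state:

```javascript
// Toy sketch of the HTML5 "script data" state, not a real parser:
// after <script>, the tokenizer looks only for </script>. If that
// end tag never appears, everything to end-of-file is script text.
function scriptText(html) {
  const lower = html.toLowerCase();
  const open = lower.indexOf('<script>');
  if (open === -1) return null;
  const start = open + '<script>'.length;
  const close = lower.indexOf('</script>', start);
  // No closing tag: the rest of the page is swallowed as script.
  return close === -1 ? html.slice(start) : html.slice(start, close);
}

console.log(scriptText('<p>a<script>var x;<p>rest of page'));
// -> var x;<p>rest of page
```

Note how the `<p>rest of page` never becomes markup -- it's all script data, which matches the blank-PDF symptom described above.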
I think this misses a big selling point of XHTML, to be honest. It's not that you can reach for an XML parser, it's that you can reach for an XML *generator".
I prefer not to write HTML, but have it generated by an eDSL. Elm has Html, Kotlin has kotlinx.html, but there are many more. Nice ones are generated from the W3C html spec (as XML), kotlinx.html does this for instance.
This way HTML is "just code", I can put break points in, it allows for some type-safety and lets me use IDE tooling to refactor/etc.
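A sketch of the idea in plain JavaScript (a hypothetical mini-eDSL, not Elm's Html or kotlinx.html): the generator, not the author, knows which elements are void, so the whole self-closing question disappears:

```javascript
// Hypothetical mini-eDSL: HTML as plain function calls. The VOID set
// is the generator's knowledge, so a human never writes (or forgets)
// a closing tag or a trailing slash.
const VOID = new Set(['area', 'br', 'hr', 'img', 'input', 'link', 'meta']);

function el(tag, attrs = {}, ...children) {
  const a = Object.entries(attrs)
    .map(([k, v]) => ` ${k}="${v}"`)
    .join('');
  return VOID.has(tag)
    ? `<${tag}${a}>`
    : `<${tag}${a}>${children.join('')}</${tag}>`;
}

console.log(el('p', {}, 'Hello', el('br'), el('img', { src: 'x.png' })));
// -> <p>Hello<br><img src="x.png"></p>
```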
I loved HAML (and RoR) when I was a dyn-lang believer. I've since learned some Haskell and now believe in strong typing. This also drove me away from HAML.
Kotlinx.html is pretty close to being a type-safe HAML.
I like the idea that '<', '/', '>' describe the structure, and the tag name describes the meaning of the element. I prefer to act as if html5 worked in this simple way.
Ugh. I no longer know how to write html, and that's fine. I let the computer write it. I use slim/haml/whatever and let the robots track the nuances of html.
It's been fixed. Send Content-Type: application/xhtml+xml and it will be parsed as an empty tag. But nobody bothers to turn that on. Without an opt-in, it's unfixable due to backward compatibility.
There are a number of problems here, so I apologize if my approach seems scattered. I'll start narrow and go broader.
The major substantive issue with this proposal is adeptly pointed out by arzke, here: https://news.ycombinator.com/item?id=36616677 Namely, using <br /> instead of <br> is a useful annotation for beginners who haven't fully internalized the rules about self-closing yet; it also externalizes bookkeeping for more experienced developers, just like any other such annotation (e.g., a C comment when you skip fclose() because you're using stdin or stdout). While such annotations have no material effect on the final product, they are indispensable for beginners and reduce cognitive load even for more seasoned devs.
As a result, there are broader implications: proposals like these make the Web a less-friendly place for beginners. I believe the "old Web" had something valuable that's been lost. The creation of things like NeoCities (and its generally positive reception on HN: https://news.ycombinator.com/item?id=33648618 ) makes me believe that view is widely shared. Proposals that make learning HTML harder, even if only slightly, run counter to these shared values. (While omitting the annotation on self-closing tags is the tiniest of tiny drops in the bucket, so is the bandwidth saved; so while my objection is vulnerable to a criticism that the danger is de minimis, so is the OP position.)
Worst, though, is the strident smugness with which OP dismisses all the criticisms of his position, in his blog comments, in the related Twitter thread, and here on HN. "There's a whole section in the article about the risk to beginners" is only slightly more tactful than RTFA, but it nevertheless assumes the other commenter hasn't read the link (in violation of HN guidelines), and refuses to engage in a discussion on the merits, instead just asking for another hit on the OP's blog post.
Speaking of trolling for blog hits, this post, like all but one of jaffathecake's past 30 submissions, is pure self-promotion, which is obvious when you actually go to jakearchibald.com and see that the Twitter handle there is jaffathecake, and the gmail address there is jaffathecake. "Please don't use HN primarily for promotion. It's ok to post your own stuff part of the time, but the primary use of the site should be for curiosity." https://news.ycombinator.com/newsguidelines.html
I vehemently disagree that HTML needs to be rid of all redundancies on the assumption that anyone reading it already has the rules about which tags are self-closing memorized and internalized. I even more strongly disagree that shameless self-promotion is within the spirit or text of HN's rules.
I disagree with your technical opinion: As an HTML beginner in 1995, I had no problems understanding the tags that contain vs those unable to contain (an image or horizontal rule won't contain a paragraph) and frankly find any extra symbols or markup properties take away legibility. In fact, I recall my biggest confusion was the introduction of div and span, because they both seemed to contain text, and both do something style-related but somehow distinct between the two that I didn't understand at the time.
Furthermore, I'm not a fan of your personal beef with Jake and think it's over the line to accuse self promotion when he's not selling a book or service or anything other than seemingly pushing an opinion.
jaffathecake has posted, on average, once per year for the past two years. Going back to his account beginning: it's five times per year.
He posts topics that are engaging (over 60 points per post for the past 30 posts; only 10 of those posts were ignored) as well as technical and usually interesting. He then has a discussion---there are certainly other blog posters who simply post and leave. I much prefer this posting of one's own work to people posting tangentially relevant (at best) articles from Politico or The Atlantic. Other people regularly link his posts, confirming that he has an audience on HN. He's not even reposting his old articles or posting three times in a week when a topic doesn't "catch" on the first two attempts, like MANY of the karma-heavy serial posters on HN.
His comment style (aka stubbornness) probably comes from his own development experience and he doesn't see the perspective of people writing simple "gg=G" auto-indent algorithms using Python in a new world of custom HTML elements (web components) or writing regex to quickly add a property like aria-level to every tag that can contain. These might be inefficient and contrived use-cases, and he'd probably have a leg to stand on that the trailing slash isn't necessary in those two cases, and I'd like to see what that response looks like, because I tend to oscillate between using an SSG (which I despise because I find myself picking apart i18n bugs across multiple files---a task I'm doing today) and writing HTML by hand (where I find myself needing to bulk-edit with vim or sed). This---the complexity of most frameworks vs keeping it all in your head while learning why "strong" is now preferred over "b" and trying to hand-write responsive CSS---is the modern reality that is making larger-scale web development more inaccessible for tech-half-savvy folks.
I fail to see that he's selling anything. I don't see ads anywhere on his blog. He even rejects recruiters and advertising offers on his front page. If this whole blog of his is just a ploy to get people onto his other (potentially monetized) social media, he is not pushy about it AT ALL.
Note that some browsers may be unable to open this page because of the use of the Brotli compression algorithm[0]! If the page looks like gobbledegook when you open it, this may be why. This locks out anybody using browser versions from before 2016, as well as, unfortunately, many popular terminal-based browsers.
I tried it again with "deflate" and "identity", and in each case got back a plaintext response with no compression.
Maybe your browser doesn't send the Accept-Encoding header at all? In that case, according to the spec, "If no Accept-Encoding header field is in the request, any content coding is considered acceptable by the user agent."[0] If your browser doesn't send Accept-Encoding but still relies on a specific encoding or no encoding, it's your browser that is broken, not the server.
I will say that it was weird because at the time, I also tested loading the main page and a couple other blog posts on the site. Only this particular one, however, was an issue, and I figured it was probably an anti-hug mechanism where the site forces Brotli compression for a resource if that resource was frequently being accessed over a certain threshold.
I honestly have no idea though. It's weird. What I do know is that it wasn't just my browser - I was having a similar issue with wget, where it got back encoded Brotli data without having asked for it. (wget does support Brotli, but doesn't send an Accept-Encoding header or, it seems, decode the output unless told to specifically.) But, again, I'm not having the issue now, and as you say, not sending an Accept-Encoding header means anything goes so it's not really the same thing as how it worked on my browser.
Hmm, I got br when I dropped the header (see below). Either way, the server's behavior matches the spec, so there's something up with the client if they're seeing nonsense.
Request:
GET https://jakearchibald.com/2023/against-self-closing-tags-in-html/
Accept: text/html
Ah, you know what, IntelliJ automatically adds an Accept-Encoding header if I leave it off, I didn't realize that until just now. Curl is a better test because it doesn't try to be clever.
There's still definitely something wrong with OP's client, because the server does exactly what it's supposed to.
Of actual practical consequence are optional html5 tags [0]. They allow you to write html by hand as easily as if you wrote markdown. Some graybeards even say that optional html5 tags make markdown entirely unnecessary.
Thus, lists don't need closing item tags:
This is correct html:
<ul>
<li> first
<li> second
<li> third
</ul>
Tables become uncluttered:
Raw html tables
<table>
<tr>
<td> are
<td> easier
<td> than
<tr>
<td> any
<td> markdown
<td> flavor
</table>
And you can even write the entire document in that style:
<!doctype html>
<title>Hello</title>
<p>This is a complete conforming html document.
No need for stupid html, header, body elements. They are optional!
[0] https://html.spec.whatwg.org/multipage/syntax.html#optional-...