I’m going to focus on one aspect of the article only.
> You might want to start with a lightweight markup language, which will ironically be geared toward generating HTML. Markdown’s strict specified variation, Commonmark, seems like a pretty decent choice.
Markdown of any form would be a catastrophically bad choice for something like this. Remember this: Markdown is all about HTML. Markdown has serious limitations and is very poorly designed in places, but it works well enough despite being so bad specifically because underneath it’s all HTML, so you can fall back to that when necessary. If you go with Markdown, you’re not actually supporting “Markdown” as your language, but HTML. If the user writes HTML in their Markdown, what are you going to do about it? You’re supposed to be able to write the HTML equivalent of any given Markdown construct and have it work, and I find that perfectly normal documents need raw HTML regularly, because Markdown is so terribly limited. So your document web is either (a) not actually Markdown and uselessly crippled (seriously, plain Markdown without HTML is extremely limiting for documents), or (b) completely back to being a subset of HTML, which is exactly what you said you didn’t want it to be (rule #1).
If you want to get away from HTML, you want something like reStructuredText, where you don’t have access to HTML (unless the platform decides to set up a directive or role for raw HTML), but have a whole lot more semantic functionality, functionality that had to be included because HTML wasn’t there to fall back on. I think AsciiDoc is like this too.
But the whole premise that “HTML must be replaced because it’s a performance or accessibility bottleneck for documents” is just not in the slightest bit true. HTML is fine. Even CSS is fine (though most pages have terribly done stylesheets that are far heavier than they should be, e.g. because they include four resets and duplicate everything as well as overriding font families five times). The problem lies almost entirely in what JavaScript makes possible: not that JavaScript is inherently a problem, but it makes problems possible. The proposed variety of document web would be unlikely to perform substantially better than the existing web with JavaScript disabled.
All up, Robert O’Callahan’s response is entirely correct. You can’t split the web like this; you will fail. Especially: if your new system is not compatible with the old (which your rule #2 precluded), it will fail unless it is vastly more compelling for everyone (rule #3), and I don’t believe there’s anywhere near enough scope for improvement for that to happen.
> Remember this: Markdown is all about HTML. Markdown has serious limitations and is very poorly designed in places, but it works well enough despite being so bad specifically because underneath it’s all HTML, so you can fall down to that when necessary.
Originally, this was the case, but I'd argue that modern MarkDown flavors have been separated from HTML. It's common to compile via other pathways than HTML now (e.g. MarkDown → LaTeX → PDF), and many implementations don't support inline HTML anymore (e.g. many MarkDown-based note-taking apps).
> I find that perfectly normal documents need to write raw HTML regularly, because Markdown is so terribly limited
Out of curiosity, what do you need it for? I write a lot of MarkDown, but haven't felt any need for writing raw HTML. Modern versions of MarkDown (e.g. the Pandoc variant) have native support for things like tables, LaTeX equations, syntax-highlighted code, even bibliography management.
These are the three most common cases that I find for needing raw HTML:
1. Adding classes to elements, for styling; admittedly this may be inapplicable to some visions of a document web, if you can’t write stylesheets.
2. Images. If you’re dealing with known images, you should always set the width and height attributes on the <img> tag, so that the page need not reflow as the image loads. Markdown’s image syntax (![alt](src)) doesn’t cover that. (Perhaps an app could load the image and fill out the width and height as part of its Markdown-to-HTML conversion, but I haven’t encountered any that do this.)
3. Tables. CommonMark doesn’t include tables, and even dialects that do support tables are consistently insufficient so that I have to write HTML. For example: I often want the first column to be a heading; but I don’t think any Markdown table syntaxes allow you to get <th> instead of <td> for the first cell of each row.
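To make the image point concrete, here is a rough sketch of what I mean by a converter filling in the dimensions itself. Everything in it is an assumption for illustration: the `render_images` function, the `KNOWN_SIZES` table and the simplified regex are mine, not part of any real Markdown implementation, and a real converter would read the dimensions from the image files rather than from a hard-coded dict.

```python
import html
import re

# Hypothetical table of known image dimensions (width, height) in pixels.
# A real tool would probably read these from the image files themselves.
KNOWN_SIZES = {"diagram.png": (640, 480)}

# Simplified pattern for ![alt](src); it ignores titles and nested brackets.
IMAGE_RE = re.compile(r"!\[(?P<alt>[^\]]*)\]\((?P<src>[^)\s]+)\)")


def render_images(markdown_text, sizes):
    """Replace Markdown image syntax with <img> tags, adding width and
    height attributes whenever the dimensions are known."""

    def replace(match):
        alt = html.escape(match.group("alt"), quote=True)
        src = html.escape(match.group("src"), quote=True)
        size = sizes.get(match.group("src"))
        if size is None:
            # Unknown image: emit the tag without dimensions.
            return f'<img src="{src}" alt="{alt}">'
        width, height = size
        return f'<img src="{src}" alt="{alt}" width="{width}" height="{height}">'

    return IMAGE_RE.sub(replace, markdown_text)


print(render_images("See ![A diagram](diagram.png) here.", KNOWN_SIZES))
```

With the dimensions in the output, the browser can reserve the space before the image loads, which is the whole reason for wanting the attributes in the first place.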
Fair enough :). For the record, Pandoc MarkDown supports all of these via its extended MarkDown syntax. For the first you can write [desc](src){.test} to get a class=test attribute on the link, for example. For the second, you can append {width=50%} to the image to set its size. For the last, tables do automatically get <th> on the first cell of each row when converted via Pandoc. This is however not standard MarkDown but Pandoc's extended version of MarkDown.