
As a youngster entering the IT professional circles, I was enamoured with SGML: creating my own DTDs for humane entry for my static site generator, editing my SGML source document with Emacs sgml-mode. I worked on TEI and DocBook documents too (and was there something related to Dewey coding system for libraries?).

However, processing fully compliant SGML, before you even introduce DSSSL into the picture, was a nightmare. The only fully compliant parser (nsgmls) was also the only open source one, and it was hard to build on contemporary systems, let alone run, so really using SGML for anything was an exercise in frustration.

As an engineering mind, I loved the fact you could create documents that are concise yet meaningful, and really express the semantics of your application as efficiently as possible. But I created my own parsers for my subset, and did not really support all of the features.

HTML was also redefined to be an SGML application with 4.0.

I originally frowned on XML as a simplification meant to make it work for computers rather than for humans, but with the XML, XSLT, XPath... specs, even that was too complex for most. I also heavily used libxml2 and libxslt to develop some open source tooling for documentation, and it was full of landmines.

All this to say that SGML has really spectacularly failed (IMO) due to sheer flexibility and complexity. And going for "semantic HTML" in lieu of SGML + DSSSL or XML + XSLT was really an attempt to find that balance of meaning and simplicity.

It's the common cycle as old as software engineering itself.





> HTML was also redefined to be an SGML application with 4.0

Nope, it was intended as SGML from the get go; cf [1].

> SGML has really spectacularly failed (IMO) due to sheer flexibility and complexity

HTML (and thus SGML) is the most used document language there ever has been, by far.

[1]: https://info.cern.ch/hypertext/WWW/MarkUp/MarkUp.html


I stand corrected: HTML was defined as an SGML application from the very first published version in 1993 (https://www.w3.org/MarkUp/draft-ietf-iiir-html-01.txt), though I know the original draft in 1990-91 was heavily SGML inspired even if it didn't really conform to the spec (nor provide a DTD). Thanks for pointing this out, it's funny how memory can play tricks on us :)

While HTML is clearly the most used document markup language there has ever been, almost nobody is using an SGML-compliant parser to parse and process it, and most are not even bothering with the DTD itself; not to mention that HTML5 does not provide a DTD and really can't even be expressed with an SGML DTD.

So while HTML used to be one of the SGML "applications" (document types, along with a formal definition), on the web it was never treated as such, but as a very specific language that is inspired by SGML and, in practice, only inspired by its own spec too (since day 1, all browsers accepted "invalid" HTML and they still do).
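
To illustrate the lenient-parsing point, here is a minimal sketch using Python's standard-library html.parser. That module is my own pick for illustration: it is neither a browser parser nor an SGML parser, and the names TagLogger and tag_events are made up for this example. It just shows a tokenizer happily reporting tags from markup that has no doctype, an unclosed <p>, misnested <b>/<i>, and an element that exists in no DTD.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Hypothetical helper: records tag events, validates nothing."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def tag_events(markup):
    """Feed markup to the lenient parser and return the tag events seen."""
    parser = TagLogger()
    parser.feed(markup)
    parser.close()
    return parser.events


# No doctype, unclosed <p>, misnested <b>/<i>, and an invented element.
sloppy = "<p>one<p>two <b><i>bold italic</b></i><blink2>hi</blink2>"
events = tag_events(sloppy)
for kind, tag in events:
    print(kind, tag)
```

No exception is raised and no DTD is consulted anywhere; the parser simply reports whatever tags it sees, which is roughly the attitude the web has always taken towards HTML.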

Ascribing the success to SGML is completely backwards, IMHO: HTML was successful despite being based on SGML, and for all intents and purposes, the majority never really cared about the relationship.



