
Does the proposed regular expression really handle embedded script content correctly? From my limited understanding of HTML, pretty much only </script> counts as closing the script contents and everything else is treated as part of the script.


Does it matter? You can write a regex with a lookahead for </script>. The point is that the problem from SO can be solved this way because of the nature of the problem, not that this particular expression is perfectly correct.
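A minimal sketch of the lookahead approach, in Python's `re` dialect. After `<script ...>`, the pattern consumes characters only while the negative lookahead `(?!</script)` does not see a closing tag, so markup such as `</p>` or `<div>` inside the script is kept as plain text. `SCRIPT_BODY_RE` and `script_bodies` are hypothetical names. This is not a full implementation of the HTML tokenizer: it ignores, for example, attribute values that contain `>`.

```python
import re

# Hypothetical pattern: after an opening <script ...> tag, consume one
# character at a time as long as the negative lookahead (?!</script) says we
# are not at the start of a closing tag. Other markup inside the script body
# ("</p>", "<div>", ...) is therefore treated as ordinary text.
SCRIPT_BODY_RE = re.compile(
    r"<script\b[^>]*>((?:(?!</script)[\s\S])*)</script\s*>",
    re.IGNORECASE,
)


def script_bodies(html: str) -> list[str]:
    """Return the raw text between each <script ...> and the next </script>."""
    return SCRIPT_BODY_RE.findall(html)


sample = '<p>hi</p><script>if (a < b) { s = "</p>"; }</script><b>x</b>'
print(script_bodies(sample))  # ['if (a < b) { s = "</p>"; }']
```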


It seems the article does defend "it's possible" for strictly regular expressions. Allowing lookahead would make it context-sensitive, so that's not in the same spirit.


The question is about XHTML though, not HTML, which has a more complex syntax.


To me it’s a bit ambiguous whether the original question is about HTML, XHTML, or both. It’s tagged with both.


The headline says XHTML. HTML does not even have the self-closing tags the question is concerned about.


The headline said that the author was specifically looking to avoid 'XHTML self-contained tags', but that doesn't mean they could assume the document they were searching was valid XHTML - just that it might contain XHTML-style 'self-closing' tags.

I've seen plenty of plain HTML files that aren't well-formed XML yet contain <br/> tags.


Yeah, the trailing slash in <br /> is legal in HTML, but it doesn't actually make the tag self-closing. For example, <b /> is still an opening tag that requires a </b>.
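A minimal sketch of that point, assuming the WHATWG HTML parsing rules: whether a start tag needs a matching end tag depends only on whether the element is void, not on the trailing slash. `needs_end_tag` and `TAG_NAME_RE` are hypothetical names for illustration, and the sketch ignores foreign content (SVG, MathML), where a trailing slash really does self-close the element.

```python
import re

# HTML void elements as listed in the WHATWG HTML Living Standard: the only
# HTML elements that never take an end tag.
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
})

# Hypothetical helper regex: grabs the tag name right after "<".
TAG_NAME_RE = re.compile(r"<([A-Za-z][A-Za-z0-9-]*)")


def needs_end_tag(start_tag: str) -> bool:
    """Return True if an HTML start tag opens an element that must be closed.

    The trailing slash is deliberately ignored: for HTML elements it has no
    effect, so only the element name decides.
    """
    match = TAG_NAME_RE.match(start_tag)
    if match is None:
        raise ValueError(f"not a start tag: {start_tag!r}")
    return match.group(1).lower() not in VOID_ELEMENTS


for tag in ["<br>", "<br />", "<b>", "<b />"]:
    print(tag, needs_end_tag(tag))
# <br> False
# <br /> False
# <b> True
# <b /> True
```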



