
Try that regex on

  < script> console.log("<script2>"); </script>
Edit 1: I'm unsure if the inner <script2> is valid (X)HTML, so it might not be an issue of being unable to parse correct (X)HTML, but rather an issue of being unable to detect invalid (X)HTML. (Can someone verify?)

Edit 2: It seems Chrome chokes on the space... does anyone know if the initial space is valid? I'm pretty sure I've seen parsers that accept it...



> Edit 2: It seems Chrome chokes on the space... does anyone know if the initial space is valid? I'm pretty sure I've seen parsers that accept it...

Most browsers can probably deal with it, but it's not valid XML/HTML. Try passing it through a validator: it'll complain about foreign characters after `<` and then complain about a trailing `</script>`, as <script> was never opened in the first place.


But it should be easy, based on this example, to include correct HTML tags in the script which the regular expression will match. Or, if you want the regular expression to recognise HTML tags in the script, you can easily obfuscate the construction of them in the script using string concatenation.
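To make that concrete, here is a rough sketch. The regex under discussion isn't shown in this thread, so `NAIVE_TAG` below is a hypothetical stand-in for a typical "match anything that looks like a tag" pattern, and `find_tags` is just an illustrative helper.

```python
import re

# Hypothetical naive tag matcher (an assumption; not the regex from the thread).
NAIVE_TAG = re.compile(r"</?[A-Za-z][^>]*>")

def find_tags(source):
    """Return every substring that the naive pattern treats as a tag."""
    return NAIVE_TAG.findall(source)

# A well-formed-looking tag inside a string literal is reported as a real tag.
literal = '<script> console.log("<br>"); </script>'

# Building the same string by concatenation hides it from the pattern,
# even though the script produces identical output at runtime.
concatenated = '<script> console.log("<" + "br>"); </script>'

print(find_tags(literal))       # ['<script>', '<br>', '</script>']
print(find_tags(concatenated))  # ['<script>', '</script>']
```

So the pattern either reports tags that are only script text, or misses "tags" the script assembles at runtime, depending on which behaviour you wanted.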


I don’t think any part of that is valid XML. There can't be a space between < and the tag name, and I believe content containing tags should be in a CDATA section.


The question is about XHTML, not HTML. HTML does not even have the self-closing tags the question is concerned about.

In XHTML, either the opening angle bracket must be escaped, or the script should be in a CDATA section.
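If anyone wants to check this themselves, a quick sketch using Python's standard `xml.etree.ElementTree` is below. It only tests XML well-formedness, not validity against the XHTML DTD, and `is_well_formed` plus the sample strings are my own illustrative names.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the text parses as well-formed XML, else False."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Unescaped "<script2>" is read as a child element that is never closed.
raw = '<script>console.log("<script2>");</script>'

# Escaping the opening angle bracket makes it plain character data.
escaped = '<script>console.log("&lt;script2>");</script>'

# A CDATA section also keeps the parser from interpreting the markup.
cdata = '<script><![CDATA[console.log("<script2>");]]></script>'

# Whitespace between "<" and the tag name is rejected outright.
leading_space = '< script>console.log("hi");</script>'

for name, text in [("raw", raw), ("escaped", escaped),
                   ("cdata", cdata), ("leading_space", leading_space)]:
    print(name, is_well_formed(text))
```

Only the escaped and CDATA variants parse; the raw version fails on the mismatched closing tag and the leading-space version fails at the first token.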



