
Apple's "textedit" can't open files with these characters in them. It reports "The document “test.txt” could not be opened. Text encoding Unicode (UTF-8) isn’t applicable."


Works fine for me. What version of Mac OS X/TextEdit are you using? Are you sure you are saving it (and opening it) as UTF-8?


Sorry, my mistake. I was saving a file from textmate with the string "𝖙𝖊𝖘𝖙" in it and then opening with "open -a textedit eg.txt".

The same experiment with cat in place of textmate works fine, so it's textmate that is buggy.

According to "od -x1", textmate is writing:

  0000000    ed  a0  b5  ed  b6  99  ed  a0  b5  ed  b6  8a  ed  a0  b5  ed
  0000020    b6  98  ed  a0  b5  ed  b6  99                                
So textedit is right to complain.
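To see why a strict reader refuses these bytes, here is a minimal Python sketch. It feeds the 24 bytes from the dump to Python's UTF-8 codec, which rejects encoded surrogates much as TextEdit did. It then shows one way to salvage the text by letting the surrogates through and re-pairing them with a UTF-16 round trip. The helper names are hypothetical, and the sketch assumes Python 3.8+ (for `bytes.hex` with a separator).

```python
# The 24 bytes from the od dump above (TextMate's output for the test string).
DUMP = bytes.fromhex(
    "ed a0 b5 ed b6 99 ed a0 b5 ed b6 8a ed a0 b5 ed"
    " b6 98 ed a0 b5 ed b6 99"
)

def is_strict_utf8(data: bytes) -> bool:
    """True if data is well-formed UTF-8; Python's codec rejects encoded surrogates."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def decode_surrogate_utf8(data: bytes) -> str:
    """Hypothetical recovery helper: accept UTF-8-encoded surrogates, then
    re-pair them via a UTF-16 round trip so each pair becomes one code point."""
    lone = data.decode("utf-8", "surrogatepass")  # str holding lone surrogates
    return lone.encode("utf-16-be", "surrogatepass").decode("utf-16-be")

print(is_strict_utf8(DUMP))           # False: a strict decoder refuses the file
text = decode_surrogate_utf8(DUMP)
print([hex(ord(c)) for c in text])    # ['0x1d599', '0x1d58a', '0x1d598', '0x1d599']
print(text.encode("utf-8").hex(" "))  # f0 9d 96 99 f0 9d 96 8a f0 9d 96 98 f0 9d 96 99
```

The well-formed encoding uses four bytes per character (16 bytes total) instead of six (24 bytes).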


Yeah, looks like Textmate is simply running the UTF-8 algorithm over UTF-16 code units, so each surrogate is being turned into its own three-byte UTF-8 sequence (which decodes to a lone surrogate, not a valid character).

It turns out that this is such a common mistake that there's even a name for this encoding, CESU-8: http://en.wikipedia.org/wiki/CESU-8
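The difference between the two encodings can be sketched in a few lines of Python. This is an illustration, not TextMate's actual code: the function names are hypothetical. The same UTF-8 bit-packing routine is applied either to each code point (correct UTF-8) or to each UTF-16 code unit (CESU-8). The test string is written with escapes for the code points recovered from the od dump.

```python
def utf8_encode_unit(cp: int) -> bytes:
    """Standard UTF-8 bit packing for one integer value (no surrogate check)."""
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])

def utf16_code_units(text: str):
    """Yield UTF-16 code units, splitting astral characters into surrogate pairs."""
    for ch in text:
        cp = ord(ch)
        if cp >= 0x10000:
            cp -= 0x10000
            yield 0xD800 | (cp >> 10)    # high (lead) surrogate
            yield 0xDC00 | (cp & 0x3FF)  # low (trail) surrogate
        else:
            yield cp

def encode_utf8(text: str) -> bytes:
    """Correct: apply the UTF-8 algorithm to each code point."""
    return b"".join(utf8_encode_unit(ord(ch)) for ch in text)

def encode_cesu8(text: str) -> bytes:
    """The mistake: apply the UTF-8 algorithm to each UTF-16 code unit (CESU-8)."""
    return b"".join(utf8_encode_unit(u) for u in utf16_code_units(text))

TEST = "\U0001D599\U0001D58A\U0001D598\U0001D599"  # the "𝖙𝖊𝖘𝖙" string from the thread
print(encode_cesu8(TEST).hex(" "))  # ed a0 b5 ed b6 99 ... (matches the od dump)
print(encode_utf8(TEST).hex(" "))   # f0 9d 96 99 ...
```

For characters in the Basic Multilingual Plane the two functions produce identical bytes. That is why this bug goes unnoticed until someone types an astral character such as these Fraktur letters.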


"Isn't applicable?" What kind of an error message is that?

