I don't like the way UTF-8 was clipped to only 1 million codepoints in 2003 to m...

kzrdude · on Jan 17, 2014

What use would there be for so much extra codepoint space?

vorg · on Jan 17, 2014

Two planes (about 131,000 codepoints) of private use aren't enough, and because the top 2 planes of Unicode are designated private use, UTF-16 gives developers the option of extending that space to 2.1 billion codepoints if they need it. I've wanted extra private-use space for generating Unihan characters by formula, in the same way the roughly 11,000 Korean Hangul syllables are generated from 24 jamo. I'm sure many other developers come across other scenarios where 131,000 isn't enough for private use.
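For readers unfamiliar with the Hangul case mentioned above, here is a minimal sketch of the standard arithmetic composition of precomposed Hangul syllables in Unicode. The function name `compose_hangul` is made up for illustration; the constants are the ones used by the Unicode Hangul syllable composition algorithm (19 leading consonants, 21 vowels, 27 trailing consonants plus "none"). It only illustrates the "generate by formula" idea and says nothing about how a Unihan scheme would work.

```python
# Base codepoint of the precomposed Hangul syllable block and its dimensions.
S_BASE = 0xAC00
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28  # T_COUNT includes "no trailing consonant"


def compose_hangul(lead: int, vowel: int, trail: int = 0) -> str:
    """Return the precomposed syllable for the given jamo indices.

    lead  : index of the leading consonant (0..18)
    vowel : index of the vowel (0..20)
    trail : index of the trailing consonant (0 means none, 1..27 otherwise)
    """
    if not (0 <= lead < L_COUNT and 0 <= vowel < V_COUNT and 0 <= trail < T_COUNT):
        raise ValueError("jamo index out of range")
    return chr(S_BASE + (lead * V_COUNT + vowel) * T_COUNT + trail)


if __name__ == "__main__":
    print(compose_hangul(0, 0))      # first syllable in the block, U+AC00
    print(compose_hangul(18, 0, 4))  # U+D55C
    print(L_COUNT * V_COUNT * T_COUNT)  # total number of syllables: 11172
```

The whole block of 11,172 syllables falls out of three small index ranges, which is the property the comment wants for a formula-generated private-use range.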

I'm simply saying that UTF-8 shouldn't be crippled in the Unicode/ISO spec to 21 bits, but should be extended to 31 bits as originally designed, because the technical reason given (i.e. that UTF-16 is only 21 bits) isn't actually true. The extra space should be assigned as more private-use characters. (Except of course the last two codepoints in each extra plane would be noncharacters as at present, and probably also the entire last 2 planes if the 2nd-tier "high surrogates" finish at the end of a plane.)
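To make the 21-bit versus 31-bit distinction concrete, here is a rough sketch of the original UTF-8 byte layout, which allowed sequences of up to 6 bytes. The function name `encode_utf8_31bit` is hypothetical, and the sketch deliberately skips checks a real encoder would need (for example, it does not reject surrogate codepoints). Values up to U+10FFFF come out the same as in today's UTF-8; anything above that uses forms the current standard no longer permits.

```python
def encode_utf8_31bit(cp: int) -> bytes:
    """Encode a codepoint using the original 1-to-6-byte UTF-8 layout.

    Illustrative only: no surrogate or noncharacter checks are performed.
    """
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("codepoint does not fit in 31 bits")
    if cp < 0x80:
        return bytes([cp])  # single-byte ASCII form

    # (sequence length, exclusive upper limit, lead-byte prefix)
    layouts = (
        (2, 0x800, 0xC0),
        (3, 0x10000, 0xE0),
        (4, 0x200000, 0xF0),
        (5, 0x4000000, 0xF8),
        (6, 0x80000000, 0xFC),
    )
    for nbytes, limit, lead in layouts:
        if cp < limit:
            break

    out = bytearray(nbytes)
    # Fill continuation bytes from the end, 6 payload bits each.
    for i in range(nbytes - 1, 0, -1):
        out[i] = 0x80 | (cp & 0x3F)
        cp >>= 6
    out[0] = lead | cp  # remaining high bits go in the lead byte
    return bytes(out)


if __name__ == "__main__":
    print(encode_utf8_31bit(0x20AC).hex())      # 3-byte form
    print(encode_utf8_31bit(0x10FFFF).hex())    # last codepoint allowed today
    print(encode_utf8_31bit(0x7FFFFFFF).hex())  # 6-byte form, top of the 31-bit range
```

The 4-byte form alone has room for 21 bits (up to 0x1FFFFF), so the U+10FFFF ceiling is a rule in the spec rather than a limit of the byte layout.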

scintill76 · on Jan 17, 2014

Part of the reason this is a problem is that someone probably said "Who could need more than 16 bits' worth of codepoints?", so I'd err on the side of extra codepoint space.