I don't think this is strictly a technical problem, at least not when it happens in international contexts (it's inexcusable for your own country's authorities to not be able to record your actual culturally-specific name).
The reality is that there is a limited character set that is actually understandable at an international level, and it's not that different from ASCII. Even with paper systems, if you go to Rome to check into a hotel and you give your name as 依诺, they will not be able to even write it down, never mind pronounce it. And even if you tell them it's Yī Nuò, they will likely ignore the accents since they won't know what those mean. Similarly, if you go to China and say your name is Sângeorz-Băi, they will not know what the diacritics mean and will not easily be able to write them down.
Throughout history, when multiple cultures interact, they have to find a common subset of their languages to communicate in, and that includes names. The situation in writing is actually much, much better, even if limited to ASCII, than it is in actual spoken language. Maybe you can write my name down perfectly (Simionescu), but I would bet you won't use the proper pronunciation unless you happen to know Romanian - you will likely use different vowels and consonants.
What's interesting is that the transliteration of symbols is not even remotely uniform in ICAO 9303. There are multiple recommended transliterations of some characters, and it definitely goes only in one direction: national script -> MRZ transliteration. It is not possible to go the other direction.
It's not intended to round-trip, it's intended to be roughly human-readable without knowledge of the original script. It's pretty close to the system the Olympics used; the Wikipedia example is Hämäläinen -> HAEMAELAEINEN, well known as a gold-medalist cross-country skier.
Newer versions of the transliteration encourage stripping diacritics, so that would be HAMALAINEN. Much more readable to native speakers, but obviously loses information.
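A minimal Python sketch of why the expansion style cannot round-trip. The `EXPANSIONS` table is an illustrative subset chosen for this example, not the full ICAO 9303 table, and `expand` is a hypothetical helper name.

```python
import unicodedata

# Illustrative subset only (an assumption for this sketch),
# not the full ICAO 9303 transliteration table.
EXPANSIONS = {"Ä": "AE", "Ö": "OE", "Ü": "UE", "Æ": "AE", "Ø": "OE"}

def expand(name: str) -> str:
    """Upper-case a name and replace listed letters with two-letter expansions."""
    # NFC first, so a letter typed as base + combining mark still matches the table.
    composed = unicodedata.normalize("NFC", name).upper()
    return "".join(EXPANSIONS.get(ch, ch) for ch in composed)

print(expand("Hämäläinen"))        # HAEMAELAEINEN
print(expand("Ä") == expand("Æ"))  # True: two different source letters collide
```

Once Ä and Æ both become AE, nothing in the output says which one the original was, so a reverse mapping can only guess.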
I wouldn't recommend it, as there are official transformations from the countries which use diacritics, and most of the time they do not strip them. It's kinda another case of people forcing stuff onto other cultures. And if you do business in some of the countries in some industries, you might even get into legal trouble if you apply that.
Take it up with the spec, then. That is the recommendation:
> Section 6 of the 9303 part 3 document specifies transliteration of letters outside the A–Z range. It recommends that diacritical marks on Latin letters A-Z are simply omitted (ç → C, ð → D, ê → E, ñ → N etc.), but it allows the following transliterations: [...]
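The "simply omitted" option from the quote can be sketched in Python with Unicode decomposition. This is an approximation under stated assumptions, not the spec's own procedure: `strip_marks` and `FALLBACK` are hypothetical names, and the one-entry fallback table exists because a letter like ð has no canonical decomposition and so is not touched by normalization alone.

```python
import unicodedata

# Letters without a canonical decomposition need an explicit table.
# This one-entry fallback (capital eth -> D) is an assumption for illustration.
FALLBACK = {"\u00D0": "D"}

def strip_marks(name: str) -> str:
    """Upper-case a name and drop combining diacritical marks."""
    # NFD splits e.g. "Ç" into "C" plus a combining cedilla.
    decomposed = unicodedata.normalize("NFD", name.upper())
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(FALLBACK.get(ch, ch) for ch in base)

print(strip_marks("çêñ"))         # CEN
print(strip_marks("ð"))           # D (via the fallback table)
print(strip_marks("Hämäläinen"))  # HAMALAINEN
```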
You said
> you might even get into legal trouble if you apply that.
We're talking about passports, so this doesn't seem relevant. For passport-related use such as travel, you use the form of the name written on the passport, exactly as-is.
It seems odd to me to arbitrarily restrict the alphabet if the only requirement is that the data has to be readable by a machine through an OCR system. They could have easily used the Latin alphabet to encode arbitrary bit strings.
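A rough Python sketch of what that could look like, assuming the MRZ character set of A-Z, 0-9 and the filler `<`. Standard Base32 already emits only A-Z and 2-7, so only its `=` padding needs replacing. The function names are hypothetical, and nothing here is part of ICAO 9303.

```python
import base64

# Characters assumed to be allowed in an MRZ: A-Z, 0-9 and the filler '<'.
MRZ_ALPHABET = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789<")

def to_mrz_safe(data: bytes) -> str:
    """Encode arbitrary bytes using only characters from MRZ_ALPHABET."""
    # Base32 output is A-Z and 2-7; swap its '=' padding for '<'.
    return base64.b32encode(data).decode("ascii").replace("=", "<")

def from_mrz_safe(text: str) -> bytes:
    """Reverse to_mrz_safe."""
    return base64.b32decode(text.replace("<", "="))

encoded = to_mrz_safe("依诺".encode("utf-8"))
print(encoded)                                 # opaque letters and digits
print(set(encoded) <= MRZ_ALPHABET)            # True
print(from_mrz_safe(encoded).decode("utf-8"))  # 依诺
```

The encoding round-trips exactly, but the result is unreadable to a person and costs 8 characters for every 5 bytes, which adds up quickly in a length-limited field.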
> alphabet if the only requirement is that the data has to be readable by a machine
it's the only requirement because
1. it's only meant for OCR
2. it's clear that it won't be used only by OCR but also by humans interacting with the system, potentially on phone calls passing this information by voice, and by anyone who can't pronounce the original spelling. To be fair, if you e.g. never sang in your life, are somewhat tone-deaf, and are now expected to pronounce an Asian name, you will probably have to spend days or more until you can do so (just as an extreme example).
If you mean ASCII, you'll also notice that it happens to correspond pretty well with the entirety of writing symbols which have broad global recognition, and this has been true for a long time. Sure, it's missing many many culturally specific things like accents and other diacritics, non-Latin writing systems etc.
But none or at least very few of the symbols missing from ASCII are actually broadly understood by people from more than a handful of countries (which can still mean a billion plus people in the case of Chinese, Arabic, and Indian scripts, of course).
Probably Arabic is the biggest counterpoint to my claim, as there is quite a large array of countries across two continents that recognize it. However, even there, there are far more people in countries which use Arabic writing that also recognize Latin letters than the other way around.
The many diacritics used by various European languages are definitely NOT something that has any wide adoption or meaning. Perhaps only the umlaut sign and the accent are even used by more than one or two European languages.
So again, my claim is that any system of writing that is intended for global international communication will have to restrict all names to the A-Z characters in ASCII with spaces as separators (and perhaps 0-9 and a few other characters that would anyway get ignored). Nothing else will work if people around the world are supposed to recognize the name in some meaningful sense. And relying entirely on automated OCR is a no-go for many use cases.
And just like people who interact with those outside their cultures have to accept that their name will be pronounced in a myriad of ways, they have no reason not to accept that it will be written in different ways as well.
> it's missing many many culturally specific things like accents and other diacritics
fun fact: some of the symbols included in ASCII were intended to be used as (non-spacing) diacritical marks, specifically the tilde/caret/backquote characters...
[too lazy to dig up a proper source at the moment but the Wikipedia ASCII article covers some of this]
I completely agree it's not purely technical nor purely social but it is a real problem. Personally, I lost out on an equity options exit event due to delays caused partly by these exact issues and visas.
In Japan, all residents (citizens and people living with a visa) have to have a katakana spelling of their name. Web forms usually ask for this (in addition to your name spelled in kanji, which of course is impossible if your name is in Latin characters, so you just hope the form accepts Latin), and it's used in many other places as well, such as for bank accounts. Sometimes places will take your Latin-character name and transliterate it themselves, and then this causes problems when their transliteration doesn't match that of other places. (Katakana has far fewer distinct sounds available than most other languages, so the transliteration always loses information, and there are usually different ways to do it.) Even worse, many forms (like web forms) have rather short character limits for the name field, so a transliterated Western name often just won't fit within the ~10 characters they allocate.
> In all times in history, when multiple cultures interact, they have to find a common subset of their languages to communicate in […].
Oh… where do we start… We do not have to go as far as finding an intersection of multiple languages. Consider English as an example. I have written up a fictitious but reasonably realistic dialogue between:
1. A layperson from a lower social class, who has not attained high educational levels and speaks English using vernacular and predominantly Germanic vocabulary of the English language.
2. A state citizen of upper-class descent who speaks English almost exclusively with Latin/French/Greek-derived vocabulary.
This layperson complains about not receiving a welfare payment from the state.
Layperson (L): Oi mate, I ain't got me geld from the state yet. That's daft, ain't it? Every man's got a right to his share, right?
Educated citizen of upper-class descent (U): Pardon my incredulity, but are you referencing the monetary allocation designated by the government for individuals of a particular socio-economic standing?
L: Eh? Oh, you mean the dole? Yeah, that. They owe me, but there's no dosh in me pocket yet.
U: If I interpret your sentiment correctly, you are perturbed due to the delayed disbursement of your financial entitlement. Have you endeavoured to communicate with the pertinent authorities?
L: Talk to who now? Oh, you mean the blokes at the town hall? Aye, but they keep spieling some rubbish. Can't make head or tail of it.
U: My advice would be to liaise with the relevant office, elucidate your predicament, and seek resolution. It is paramount to ensure you have met all requisite criteria for the stipend.
L: Right, so you're saying I should have a natter with 'em and make sure everything's shipshape? Just want what's owed to me, y'know.
U: Precisely. Engage in a dialogue with them, ascertain the cause of the discrepancy, and ensure you have fulfilled the necessary prerequisites for the allocation. You deserve your due compensation.
L: Cheers for that. It's a bit of a muddle, all this, but I reckon I'll give it another whirl.
U: I wish you fortitude in your pursuits. If there is an inherent right to such financial assistance, it is imperative you receive it posthaste.
Even though the layperson does understand responses in the fictitious dialogue, that would not be the case in real life. Both speak the same language, yet the responses are generally incomprehensible to the layperson.
In some cases I’d even suspect the incomprehension might go both ways. If the "educated" citizen is truly educated (and not just upper class), they ought to not only understand the layperson, but choose lay words in return. Sticking to their upper-class dialect would be passive-aggressive oppression born out of class contempt…
> In some cases I’d even suspect the incomprehension might go both ways.
Which is also true.
It was a thought experiment to highlight the fact that the lack of comprehension could also arise within the boundaries of a single language. In linguistics, the term for this specific phenomenon is «the social register», and there are plenty of active and thriving languages that employ social registers in daily speech. Korean, for instance, is renowned for having a highly complex system of social registers (effectively, parallel vocabularies) embedded in the spoken language. There are other languages as well.
> If the "educated" citizen is truly educated (and not just upper class), they ought to not only understand the layperson, but choose lay words in return.
And that is also true. Social registers have largely disappeared from mainland European languages, yet an English speaker's accent and choice of words can reveal sufficient details about their socio-economic background.