I don't think this is strictly a technical problem, at least not when it happens...

RcouF1uZ4gsC · on Oct 25, 2023

This is actually codified for passports

https://en.wikipedia.org/wiki/Machine-readable_passport#Name...

noodlesUK · on Oct 25, 2023

What's interesting is that the transliteration of symbols is not even remotely uniform in ICAO 9303. There are multiple recommended transliterations of some characters, and it definitely goes only in one direction: national script -> MRZ transliteration. It is not possible to go the other direction.

Take a look at the spec if you are interested: https://www.icao.int/publications/documents/9303_p3_cons_en.... The transliteration tables start on page 24 and are scattered throughout the document.

yencabulator · on Oct 25, 2023

It's not intended to round-trip; it's intended to be roughly human-readable without knowledge of the original script. It's pretty close to the system the Olympics used, with the Wikipedia example of Hämäläinen -> HAEMAELAEINEN being well known as the name of a gold-medalist cross-country skier.

Newer versions of the transliteration encourage stripping diacritics, so that would be HAMALAINEN. Much more readable to native speakers, but obviously loses information.
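
To make the two styles concrete, here is a minimal Python sketch. The names `EXPANSIONS`, `strip_diacritics`, and `expand_diacritics` are hypothetical, and the two-entry table is only an illustrative subset, not the actual ICAO 9303 tables.

```python
import unicodedata

# Hypothetical, deliberately tiny expansion table for illustration only;
# the real ICAO 9303 tables cover many more characters and alternatives.
EXPANSIONS = {"Ä": "AE", "Ö": "OE"}

def strip_diacritics(name: str) -> str:
    """Uppercase, decompose (NFD), and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", name.upper())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def expand_diacritics(name: str) -> str:
    """Apply the expansion table first, then strip any remaining marks."""
    upper = unicodedata.normalize("NFC", name.upper())
    expanded = "".join(EXPANSIONS.get(ch, ch) for ch in upper)
    return strip_diacritics(expanded)

print(expand_diacritics("Hämäläinen"))  # HAEMAELAEINEN
print(strip_diacritics("Hämäläinen"))   # HAMALAINEN
```

Both functions are one-way: nothing in HAMALAINEN says which A was originally an Ä. Letters with no canonical decomposition (ð or ø, for example) pass through `strip_diacritics` unchanged, so they would need explicit table entries.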

dathinab · on Oct 25, 2023

> encourage stripping diacritics

I wouldn't recommend it, as there are official transformations from the countries which use diacritics, and most of the time they are not to strip them. It's kind of another case of people forcing stuff onto other cultures. And if you do business in some of those countries, in some industries you might even get into legal trouble if you apply that.

yencabulator · on Oct 25, 2023

Take it up with the spec, then. That is the recommendation:

> Section 6 of the 9303 part 3 document specifies transliteration of letters outside the A–Z range. It recommends that diacritical marks on Latin letters A-Z are simply omitted (ç → C, ð → D, ê → E, ñ → N etc.), but it allows the following transliterations: [...]

You said

> you might even get into legal trouble if you apply that.

We're talking about passports, so this doesn't seem relevant. For passport-related use such as travel, you use the form of the name written on the passport, exactly as-is.

deadbeeves · on Oct 25, 2023

It seems odd to me to arbitrarily restrict the alphabet if the only requirement is that the data has to be readable by a machine through an OCR system. They could have easily used the Latin alphabet to encode arbitrary bit strings.
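
A rough sketch of what I mean, assuming Base32 as the encoding (just one possible choice; it uses only A-Z and the digits 2-7). The helper names `encode_latin` and `decode_latin` are made up for this example.

```python
import base64

def encode_latin(data: bytes) -> str:
    """Encode bytes as Base32 text (A-Z and 2-7), dropping '=' padding."""
    return base64.b32encode(data).decode("ascii").rstrip("=")

def decode_latin(text: str) -> bytes:
    """Restore the padding Base32 requires, then decode."""
    padding = "=" * (-len(text) % 8)
    return base64.b32decode(text + padding)

name = "Hämäläinen"
encoded = encode_latin(name.encode("utf-8"))
print(encoded)

# Unlike transliteration, this round-trips exactly.
assert decode_latin(encoded).decode("utf-8") == name
```

The trade-off is that the encoded string is longer than the name and means nothing to a human reader.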

dathinab · on Oct 25, 2023

> alphabet if the only requirement is that the data has to be readable by a machine

it's the only requirement because

1. it's only meant for OCR

2. it's clear that it won't be used only by OCR but also, at least, by humans interacting with the system, potentially over phone calls passing this information by voice, and by anyone who can't pronounce the original spelling. In fairness, if you e.g. have never sung in your life, are somewhat tone deaf, and are now expected to pronounce an Asian name, you would probably have to spend days or more until you can do so (just as an extreme example).

imtringued · on Oct 25, 2023

How do you pronounce 64656164626565766573?
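
For anyone who wants to decode it: a short Python sketch, assuming the digits are meant to be read as hexadecimal ASCII codes (the variable name `digits` is only for illustration).

```python
digits = "64656164626565766573"

# Interpret each pair of hex digits as one ASCII byte.
decoded = bytes.fromhex(digits).decode("ascii")
print(decoded)  # deadbeeves
```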

deadbeeves · on Oct 25, 2023

You don't? It's machine-readable.

ninkendo · on Oct 25, 2023

Thank you for providing sanity here. Not every problem comes from idiot Americans just being lazy.

dathinab · on Oct 25, 2023

except that the system they mention was (co) created by the US with the US being a major player deciding on the rules it uses...

tsimionescu · on Oct 25, 2023

If you mean ASCII, you'll also notice that it happens to correspond pretty well with the entirety of writing symbols which have broad global recognition, and this has been true for a long time. Sure, it's missing many many culturally specific things like accents and other diacritics, non-Latin writing systems etc.

But none or at least very few of the symbols missing from ASCII are actually broadly understood by people from more than a handful of countries (which can still mean a billion plus people in the case of Chinese, Arabic, and Indian scripts, of course).

Probably Arabic is the biggest counterpoint to my claim, as there is quite a large array of countries across two continents that recognize it. However, even there, there are far more people in countries which use Arabic writing that also recognize Latin letters than the other way around.

The many diacritics used by various European languages are definitely NOT something that has any wide adoption or meaning. Perhaps only the umlaut sign and the accent are even used by more than one or two European languages.

So again, my claim is that any system of writing that is intended for global international communication will have to restrict all names to the A-Z characters in ASCII with spaces as separators (and perhaps 0-9 and a few other characters that would get ignored anyway). Nothing else will work if people around the world are supposed to recognize the name in some meaningful sense. And relying entirely on automated OCR is a no-go for many use cases.

And just like people who interact with those outside their cultures have to accept that their name will be pronounced in a myriad of ways, they have no reason not to accept that it will be written in different ways as well.

kiwidrew · on Oct 25, 2023

> it's missing many many culturally specific things like accents and other diacritics

fun fact: some of the symbols included in ASCII were intended to be used as (non-spacing) diacritical marks, specifically the tilde/caret/backquote characters...

[too lazy to dig up a proper source at the moment but the Wikipedia ASCII article covers some of this]

ninkendo · on Oct 25, 2023

You’re missing the point. This is not a technical problem in the first place, as the poster I replied to did a great job explaining.

pseudonamed · on Oct 25, 2023

I completely agree it's not purely technical nor purely social but it is a real problem. Personally, I lost out on an equity options exit event due to delays caused partly by these exact issues and visas.

giamma · on Oct 25, 2023

Hong Kong natives, even those of Chinese heritage, have to have an English name.

And software in those countries always has the "English name" field.

shiroiuma · on Oct 26, 2023

In Japan, all residents (citizens and people living with a visa) have to have a katakana spelling of their name. Web forms usually ask for this (in addition to your name spelled in kanji, which of course is impossible if your name is in Latin characters, so you just hope the form accepts Latin), and it's used in many other places as well, such as for bank accounts. Sometimes places will take your Latin-character name and transliterate it themselves, and then this causes problems when their transliteration doesn't match that of other places. (Katakana has far fewer distinct sounds available than most other languages, so the transliteration always loses information, and there are usually different ways to do it.) Even worse, many forms (like web forms) have rather short character limits for the name field, so a transliterated Western name often just won't fit within the ~10 characters they allocate.

Paul-Craft · on Oct 26, 2023

Interesting. Is that some kind of leftover from British rule or something?

philistine · on Oct 26, 2023

I can't get Amazon to deliver mail to my address because of a diacritic, but it's my own fault, not Amazon's.

inkyoto · on Oct 25, 2023

> In all times in history, when multiple cultures interact, they have to find a common subset of their languages to communicate in […].

Oh… where do we start… We do not have to go as far as finding an intersection of multiple languages. Consider English as an example. I have written up a fictitious but reasonably realistic dialogue between:

1. A layperson from a lower social class, who has not attained high educational levels and speaks English using vernacular and predominantly Germanic vocabulary of the English language.

2. A state citizen of upper-class descent who speaks English almost exclusively with Latin/French/Greek-derived vocabulary.

This layperson complains about not receiving a welfare payment from the state.

Layperson (L): Oi mate, I ain't got me geld from the state yet. That's daft, ain't it? Every man's got a right to his share, right?

Educated citizen of upper-class descent (U): Pardon my incredulity, but are you referencing the monetary allocation designated by the government for individuals of a particular socio-economic standing?

L: Eh? Oh, you mean the dole? Yeah, that. They owe me, but there's no dosh in me pocket yet.

U: If I interpret your sentiment correctly, you are perturbed due to the delayed disbursement of your financial entitlement. Have you endeavoured to communicate with the pertinent authorities?

L: Talk to who now? Oh, you mean the blokes at the town hall? Aye, but they keep spieling some rubbish. Can't make head or tail of it.

U: My advice would be to liaise with the relevant office, elucidate your predicament, and seek resolution. It is paramount to ensure you have met all requisite criteria for the stipend.

L: Right, so you're saying I should have a natter with 'em and make sure everything's shipshape? Just want what's owed to me, y'know.

U: Precisely. Engage in a dialogue with them, ascertain the cause of the discrepancy, and ensure you have fulfilled the necessary prerequisites for the allocation. You deserve your due compensation.

L: Cheers for that. It's a bit of a muddle, all this, but I reckon I'll give it another whirl.

U: I wish you fortitude in your pursuits. If there is an inherent right to such financial assistance, it is imperative you receive it posthaste.

Even though the layperson does understand responses in the fictitious dialogue, that would not be the case in real life. Both speak the same language, yet the responses are generally incomprehensible to the layperson.

loup-vaillant · on Oct 25, 2023

In some cases I’d even suspect the incomprehension might go both ways. If the "educated" citizen is truly educated (and not just upper class), they ought to not only understand the layperson, but choose lay words in return. Sticking to their upper-class dialect would be passive-aggressive oppression born out of class contempt…

…which I’m sure actually happens in real life.

inkyoto · on Oct 25, 2023

> In some cases I’d even suspect the incomprehension might go both ways.

Which is also true.

It was a thought experiment to highlight the fact that the lack of comprehension could also arise within the boundaries of a single language. In linguistics, the term for this specific phenomenon is «the social register», and there are plenty of active and thriving languages that employ the social register in daily speech. Korean, for instance, is renowned for having a highly complex system of social registers (effectively, parallel vocabularies) embedded in the spoken language. There are other languages as well.

> If the "educated" citizen is truly educated (and not just upper class), they ought to not only understand the layperson, but choose lay words in return.

And that is also true. Social registers have largely disappeared from mainland European languages, yet an English speaker's accent and choice of words can still reveal a good deal about their socio-economic background.