EBCDIC Is Incompatible with GDPR

dale_glass · on Oct 25, 2023

And good thing, IMO.

You try and have a weird name in a foreign country. If you're not careful, everyone will spell it differently, and then you'll spend ages going "No, I'm sure I'm in your system. Try this other spelling".

At least back in the 90s when I often had to deal with that, you were usually talking to a real life person, sitting on the other side of the desk. You could show them your documentation.

Today since everything is online and computerized, the risk is that a computer somewhere in the chain will just go "The bits don't match", and it'll be a challenge to even reach a real person, let alone one capable of even understanding what problem you're having.

Here's a real problem this could cause: To get a visa you usually need to prove ties to your country. This normally includes a bank extract. If the name printed on your bank extract doesn't match what's printed on your documentation that means a very real risk of rejection, and not going anywhere if you can't fix it fast enough. And I can imagine other very not fun possibilities, like having some sort of KYC/AML snag where something decides you lied about your name.

dathinab · on Oct 25, 2023

> weird name

to double down on it, it's not even about a weird name per se but a lot of very normal EU names

if a lot of early IT hadn't been dominated by a US "works for us must work for everyone" approach I think we never would have ended up with such limitations common in legacy systems (there still would be limitations, pre-unicode the solution was custom code pages and similar, which all supported some subset of non us-ascii but only a subset)

luckily today unicode is the standard (though for some cultural and historic aspects it's sometimes not enough)

simiones · on Oct 25, 2023

I don't think this is strictly a technical problem, at least not when it happens in international contexts (it's inexcusable for your own country's authorities to not be able to record your actual culturally-specific name).

The reality is that there is a limited char set that is actually understandable at an international level, and it's not that different from ASCII. Even with paper systems, if you go to Rome to sign into a hotel and you give your name as 依诺, they will not be able to even write it down, never mind pronounce it. And even if you tell them it's Yī Nuò, they will likely ignore the accents since they won't know what those mean. Similarly, if you go to China and say your name is Sângeorz-Băi, they will not know what the diacritics mean and will not be easily able to write them down.

In all times in history, when multiple cultures interact, they have to find a common subset of their languages to communicate in, and that includes names. The situation in writing is actually much much better, even if limited to ASCII, than it is in actual spoken language. Maybe you can write my name down perfectly (Simionescu), but I would bet you won't use the proper pronunciation unless you happen to know Romanian - you will likely use different vowels and consonants.

RcouF1uZ4gsC · on Oct 25, 2023

This is actually codified for passports

https://en.wikipedia.org/wiki/Machine-readable_passport#Name...

noodlesUK · on Oct 25, 2023

What's interesting is that the transliteration of symbols is not even remotely uniform in ICAO 9303. There are multiple recommended transliterations of some characters, and it definitely goes only in one direction: national script -> MRZ transliteration. It is not possible to go the other direction.

Take a look at the spec if you are interested: https://www.icao.int/publications/documents/9303_p3_cons_en.... The transliteration tables start on page 24 and are scattered throughout the document.

yencabulator · on Oct 25, 2023

It's not intended to round-trip, it's intended to be roughly human-readable without knowledge of the original script. It's pretty close to the system the Olympics used, with the Wikipedia example of Hämäläinen -> HAEMAELAEINEN being well known as a gold medalist cross-country skier.

Newer versions of the transliteration encourage stripping diacritics, so that would be HAMALAINEN. Much more readable to native speakers, but obviously loses information.
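To make the two behaviours concrete, here is a rough Python sketch. The `EXPANSIONS` table and both function names are made up for illustration and cover only a handful of letters; this is not the full ICAO 9303 table, just the two styles described above applied to the same name.

```python
import unicodedata

# Illustrative subset of digraph expansions (assumption: not the full 9303 table).
EXPANSIONS = {"Ä": "AE", "Ö": "OE", "Ü": "UE", "Å": "AA"}


def mrz_expand(name: str) -> str:
    """Upper-case the name and expand selected letters into digraphs."""
    # Normalize to NFC first so "A" + combining diaeresis becomes a single "Ä".
    composed = unicodedata.normalize("NFC", name).upper()
    return "".join(EXPANSIONS.get(ch, ch) for ch in composed)


def mrz_strip(name: str) -> str:
    """Upper-case the name and simply drop the diacritical marks."""
    # NFD splits "Ä" into "A" plus a combining mark, which we then discard.
    decomposed = unicodedata.normalize("NFD", name.upper())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(mrz_expand("Hämäläinen"))  # HAEMAELAEINEN
print(mrz_strip("Hämäläinen"))   # HAMALAINEN
```

Neither output can be mapped back to the original spelling with certainty, which is the one-way property mentioned upthread.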

dathinab · on Oct 25, 2023

> encourage stripping diacritics

I wouldn't recommend it as there are official transformations from the countries which use diacritics and most times they are not to strip them. It's kinda another case of people forcing stuff onto other cultures. And if you do business in some of the countries in some industries you might even get into legal trouble if you apply that.

yencabulator · on Oct 25, 2023

Take it up with the spec, then. That is the recommendation:

> Section 6 of the 9303 part 3 document specifies transliteration of letters outside the A–Z range. It recommends that diacritical marks on Latin letters A-Z are simply omitted (ç → C, ð → D, ê → E, ñ → N etc.), but it allows the following transliterations: [...]

You said

> you might even get into legal trouble if you apply that.

We're talking about passports, this seems not relevant. For passport-related use such as travel, you use the form of the name written on the passport, exactly as-is.

deadbeeves · on Oct 25, 2023

It seems odd to me to arbitrarily restrict the alphabet if the only requirement is that the data has to be readable by a machine through an OCR system. They could have easily used the Latin alphabet to encode arbitrary bit strings.

dathinab · on Oct 25, 2023

> alphabet if the only requirement is that the data has to be readable by a machine

its the only requirement because

1. it's only meant for OCR

2. it's clear that it won't be used by only OCR but at least also humans interacting with the system, potentially phone calls passing this information by voice, and anyone who can't pronounce the original spelling. In fairness, if you e.g. didn't sing at all in your life, are somewhat tone deaf, and now are expected to pronounce an Asian name, you probably have to spend days or more until you can do so (just as an extreme example).

imtringued · on Oct 25, 2023

How do you pronounce 64656164626565766573?
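For anyone curious, that number reads as hex-encoded ASCII. A minimal sketch, assuming two hex digits per byte:

```python
digits = "64656164626565766573"

# Interpret each pair of hex digits as one ASCII byte.
decoded = bytes.fromhex(digits).decode("ascii")
print(decoded)  # deadbeeves

# The encoding round-trips, unlike the MRZ transliteration discussed above.
assert decoded.encode("ascii").hex() == digits
```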

deadbeeves · on Oct 25, 2023

You don't? It's machine-readable.

ninkendo · on Oct 25, 2023

Thank you for providing sanity here. Not every problem comes from idiot Americans just being lazy.

dathinab · on Oct 25, 2023

except that the system they mention was (co) created by the US with the US being a major player deciding on the rules it uses...

tsimionescu · on Oct 25, 2023

If you mean ASCII, you'll also notice that it happens to correspond pretty well with the entirety of writing symbols which have broad global recognition, and this has been true for a long time. Sure, it's missing many many culturally specific things like accents and other diacritics, non-latin writing systems etc.

But none or at least very few of the symbols missing from ASCII are actually broadly understood by people from more than a handful of countries (which can still mean a billion plus people in the case of Chinese, Arabic, and Indian scripts, of course).

Probably Arabic is the biggest counterpoint to my claim, as there is quite a large array of countries across two continents that recognize it. However, even there, there are far more people in countries which use Arabic writing that also recognize Latin letters than the other way around.

The many diacritics used by various European languages are definitely NOT something that has any wide adoption or meaning. Perhaps only the umlaut sign and the accent are even used by more than one or two European languages.

So again, my claim is that any system of writing that is intended for global international communication will have to restrict all names to the A-Z characters in ASCII with spaces as separators (and perhaps 0-9 and a few other characters that would anyway get ignored). Nothing else will work if people around the world are supposed to recognize the name in some meaningful sense. And relying entirely on automated OCR is a no-go for many use cases.

And just like people who interact with those outside their cultures have to accept that their name will be pronounced in a myriad of ways, they have no reason not to accept that it will be written in different ways as well.

kiwidrew · on Oct 25, 2023

> it's missing many many culturally specific things like accents and other diacritics

fun fact: some of the symbols included in ASCII were intended to be used as (non-spacing) diacritical marks, specifically the tilde/caret/backquote characters...

[too lazy to dig up a proper source at the moment but the Wikipedia ASCII article covers some of this]

ninkendo · on Oct 25, 2023

You’re missing the point. This is not a technical problem in the first place, as the poster I replied to did a great job explaining.

pseudonamed · on Oct 25, 2023

I completely agree it's not purely technical nor purely social but it is a real problem. Personally, I lost out on an equity options exit event due to delays caused partly by these exact issues and visas.

giamma · on Oct 25, 2023

Hong Kong natives, even if from a Chinese heritage, have to have an English name.

And software in those countries always has the "English name" field.

shiroiuma · on Oct 26, 2023

In Japan, all residents (citizens and people living with a visa) have to have a katakana spelling of their name. Web forms usually ask for this (in addition to your name spelled in kanji, which of course is impossible if your name is in latin characters, so you just hope the form accepts latin), and it's used in many other places as well, such as for bank accounts. Sometimes places will take your latin-character name and transliterate it themselves, and then this causes problems when their transliteration doesn't match that of other places. (katakana has far fewer distinct sounds available than most other languages, so the transliteration always loses information, and there's usually different ways to do it.) Even worse, many forms (like web forms) have rather short character limits for the name field, so with a transliterated Western name, it many times just won't fit within the ~10 characters they allocate.

Paul-Craft · on Oct 26, 2023

Interesting. Is that some kind of leftover from British rule or something?

philistine · on Oct 26, 2023

I can't get Amazon to deliver mail to my address because of a diacritic, but it's my own fault, not Amazon's.

inkyoto · on Oct 25, 2023

> In all times in history, when multiple cultures interact, they have to find a common subset of their languages to communicate in […].

Oh… where do we start… We do not have to go as far as finding an intersection of multiple languages. Consider English as an example. I have written up a fictitious but a reasonably real dialogue between:

1. A layperson from a lower socio class, who has not attained high educational levels and speaks English using vernacular and predominantly Germanic vocabulary of the English language.

2. A state citizen of the upper-class descent who speaks English almost exclusively with Latin/French/Greek-derived vocabulary.

This layperson complains about not receiving a welfare payment from the state.

Layperson (L): Oi mate, I ain't got me geld from the state yet. That's daft, ain't it? Every man's got a right to his share, right?

Educated citizen of the upper class descent (U): Pardon my incredulity, but are you referencing the monetary allocation designated by the government for individuals of a particular socio-economic standing?

L: Eh? Oh, you mean the dole? Yeah, that. They owe me, but there's no dosh in me pocket yet.

U: If I interpret your sentiment correctly, you are perturbed due to the delayed disbursement of your financial entitlement. Have you endeavoured to communicate with the pertinent authorities?

L: Talk to who now? Oh, you mean the blokes at the town hall? Aye, but they keep spieling some rubbish. Can't make head or tail of it.

U: My advice would be to liaise with the relevant office, elucidate your predicament, and seek resolution. It is paramount to ensure you have met all requisite criteria for the stipend.

L: Right, so you're saying I should have a natter with 'em and make sure everything's shipshape? Just want what's owed to me, y'know.

U: Precisely. Engage in a dialogue with them, ascertain the cause of the discrepancy, and ensure you have fulfilled the necessary prerequisites for the allocation. You deserve your due compensation.

L: Cheers for that. It's a bit of a muddle, all this, but I reckon I'll give it another whirl.

U: I wish you fortitude in your pursuits. If there is an inherent right to such financial assistance, it is imperative you receive it posthaste.

Even though the layperson does understand responses in the fictitious dialogue, that would not be the case in real life. Both speak the same language, yet the responses are generally incomprehensible to the layperson.

loup-vaillant · on Oct 25, 2023

In some cases I’d even suspect the incomprehension might go both ways. If the "educated" citizen is truly educated (and not just upper class), they ought to not only understand the lay person, but choose lay words in return. Sticking to their upper-class dialect would be passive-aggressive oppression born out of class contempt…

…which I’m sure actually happens in real life.

inkyoto · on Oct 25, 2023

> In some cases I’d even suspect the incomprehension might go both ways.

Which is also true.

It was a thought experiment to highlight the fact that the lack of comprehension could also arise within the boundaries of a single language. In linguistics, the term for this specific phenomenon is «the social register», and there are plenty of active and thriving languages that employ the social register in the daily speech. Korean, for instance, is renowned for having a highly complex system of the social registers (effectively, parallel vocabularies) embedded in the spoken language. There are other languages as well.

> If the "educated" citizen is truly educated (and not just upper class), they ought to not only understand the lay person, but choose lay words in return.

And that is also true. Social registers have largely disappeared from mainland European languages, yet an English accent and the choice of the words of an English speaker can reveal sufficient details about their socio-economic background.

JohnBooty · on Oct 25, 2023

I get that it was a USAcentric thing, and that we should always be active w.r.t. calling out ethnocentric behavior.

But it was also an "8-bit" thing and an "extremely limited computing resources" thing. EBCDIC was designed in 1963/1964.

I mean, when you've got 8 bits to represent a character, and there are more than 256 possible characters... what do you do?

A truly robust solution like Unicode would not have been feasible with the resources of the day, and even a "simple" 16-bit scheme would barely be able to contain all 50,000 Chinese characters.

The blame here lies with the Dutch bank who willingly chose an EBCDIC solution in 1995, although I'm sure they were dealing with various constraints and pressures as well.

wink · on Oct 25, 2023

I can tell you that German bank thing I signed up in 2023 and can't be older than ~5 years asked for my FULL name as written on my ID, in my case it's "First Second Last" but I go by "First Last" but in their infinite wisdom they decided to ask for the full thing (which I understand, it's bank stuff) but never think about what they call me in their stupid email. No one's ever called me "First Second" - not even my parents. And I can't even be mad, but I'm still disappointed. (Fortunately my name is ASCII and since we can do more than 8+3 I've never had problems.)

weinzierl · on Oct 25, 2023

"FULL name as written on my ID"

This is completely normal for official documents in Germany and it makes sense for us.

Technically we do not have first, second or any other numbered names. Our given names form a set in the mathematical sense and any one is equally valuable. This comes from the tradition of given names being given by godmothers and godfathers and we wouldn't want to get into the issue to ever have to value one of them over another. At least this has been the case in some parts of Germany and has influenced the official regulations for names.

Of course the names have to be put into an order on your ID and to keep things simple banks, schools, authorities, etc. ask you to use that order on their documents.

Traditionally, official documents just used the surname with "Herr" or "Frau" but nowadays they often use just the given name in first position on your ID.

I've never heard of a "First Second" case with one exception:

Given names can be connected with a dash. In this case the order is fixed and the whole unit is treated like a single name. While in principle arbitrary names can be combined there are certain very common combinations, like "Hans-Peter", "Karl-Heinz" or "Franz-Xaver". If you happen to be named "Hans Peter" (without dash) it's likely that they assume the dash and will call you "Hans Peter" or "Hans-Peter" all the time.

hunter2_ · on Oct 25, 2023

> it's likely that they assume the dash

There is a very mild version of that in the US -- lower likelihood of blind assumption, but still present: when a set of two given names starts with "Mary" or ends with "Ann/Anne". Examples include "Mary Jane", "Mary Kate", "Jo Ann", and of course "Mary Ann". Some have simply merged into single names like "Maryanne" and "Joanne" more recently. There are probably others.

wink · on Oct 27, 2023

> This is completely normal for official documents in Germany and it makes sense for us.

Of course, and I mildly apologize for my case of Whataboutism because I actually described the reverse. They're taking the rules too literally and are using the thing they need for official documents everywhere (their marketing/status emails).

I'm just kinda puzzled why they'd think it's a good user experience, especially for people who are not just not used to reading their government id name but actually uncomfortable (i.e. pending name change).

rjsw · on Oct 25, 2023

I made a similar mistake in France, the system decided to call me Firstsecond Last.

bandrami · on Oct 25, 2023

My first name starts with a W (let's say it's "Walter"). In India multiple times I have spelled it out over the phone and then received a letter addressed to "Uualter".

felixg3 · on Oct 25, 2023

Hehe, and my name contains an ß which is often confused with a B. I started to point out the machine readable area instead of just showing my passport…

squarefoot · on Oct 25, 2023

Could it have been a too literal transcription of its spelling? W is spelled "Double U".

bandrami · on Oct 26, 2023

Right, yes, that is my assumption

secondcoming · on Oct 25, 2023

A friend named 'Alice' is 'Erith' in a UK council's database.

coldpie · on Oct 25, 2023

Please keep her away from ex-soldiers with very long swords.

selimthegrim · on Oct 25, 2023

This happens to me on United all the time for some reason.

Jailbird · on Oct 25, 2023

I've seen that forever and not just on United. I have thought it's something about the underlying SABRE system that many airlines use. Maybe someone here knows more.

tivert · on Oct 25, 2023

> I've seen that forever and not just on United. I have thought it's something about the underlying SABRE system that many airlines use. Maybe someone here knows more.

I don't remember the precise details, but some airline website's password had restrictions at one point that made it super-obvious that they were internally converting alphanumeric passwords to digits based on the US telephone key mapping.

I remember thinking at the time it might ultimately have been due to SABRE (because I believe that's literally one of the oldest computer systems still in use), and screen-scraping some telephone menu system depressingly seems like something someone would do for expediency.

I wouldn't be surprised if a system like that also mangles names.

dpkirchner · on Oct 25, 2023

Fidelity does this as well:

> Usernames and passwords containing letters need to be translated to numbers to enter them in a Fidelity phone system (like FAST, or if you call a representative). Use your telephone keypad to convert the letters to numbers. There is no case sensitivity. Substitute an asterisk (*) for all special characters. https://www.fidelity.com/customer-service/need-help-logging-...
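A rough sketch of what that rule implies, assuming the standard US keypad layout. The `to_keypad` function name and the sample passwords are made up for illustration; nothing here is taken from Fidelity's or any airline's actual code.

```python
# Standard US telephone keypad letter groups.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in KEYPAD.items() for letter in letters
}


def to_keypad(text: str) -> str:
    """Map letters to keypad digits, keep digits, turn everything else into '*'."""
    out = []
    for ch in text.lower():  # no case sensitivity, per the quoted rule
        if ch in "0123456789":
            out.append(ch)
        elif ch in LETTER_TO_DIGIT:
            out.append(LETTER_TO_DIGIT[ch])
        else:
            out.append("*")  # all special characters collapse to an asterisk
    return "".join(out)


print(to_keypad("Hunter2!"))  # 4868372*
print(to_keypad("gtmtdp2#"))  # 4868372*
```

The mapping is many-to-one, so quite different passwords end up equivalent once they pass through the keypad form.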

kvmet · on Oct 25, 2023

I can tell you from personal experience that if you have four names it will turn "First Second Third Last" into "Firstsecondthird Last" (I usually fly Delta).

I asked a checkin agent to fix it but they said it will start rejecting my ID if they change it at all.

noodlesUK · on Oct 25, 2023

On British Airways (and I believe other ticket systems that use Amadeus), I often get LastnameTitleFirstnameSecondname all as one word (in caps). It certainly looks funny on the boarding pass, but I've never had any issue getting through security.

pavlov · on Oct 25, 2023

Happens to me as well. My first, second and last names are quite short, five letters each. So my guess is that it’s related to that.

Maybe there’s simply the COBOL equivalent of the following somewhere:

  if len(first) + len(second) + len(last) <= MAX_LEN_ON_TICKET:
    first_name_on_ticket = first + second

raverbashing · on Oct 25, 2023

I've asked ChatGPT for the COBOL equivalent... and it's 18 lines

pseudonamed · on Oct 25, 2023

Names with hyphens are also contracted by SABRE... Subname-Subname becomes Subnamesubname.

throwaway251023 · on Oct 25, 2023

Something kinda similar, I applied for a PH passport and they ADDED a third name to my name. Instead of my actual "first second last" on my official PH documents I'm "first-second third last". The third isn't anywhere in my name/US birth certificate or any other identifying documents and not at all what my parents named me. I only use my US passport now because it caused a bit of confusion with my US departing airline ticket first name not matching my PH passport and if it had that, then the arrival into the US would not match my US passport.

kps · on Oct 25, 2023

The IT in question started with Hollerith cards¹, processed by electromechanical equipment. These were originally numeric only — digit n represented by a hole in row n which would stop or start a counter wheel. (Punched cards were processed row by row, not column by column.) The alphabetic extension added a second hole near the top edge, handled using much more complicated and expensive equipment. EBCDIC was originally a straightforward mapping of these holes into an 8 bit space, and its arrangement makes sense seen that way.
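The arrangement is easy to see by splitting EBCDIC bytes into nibbles. A small sketch using Python's built-in `cp037` codec, which is only one EBCDIC variant. Reading the high nibble as the "zone" hole and the low nibble as the digit row is my interpretation of the layout, and `ebcdic_nibbles` is just an illustrative helper name.

```python
def ebcdic_nibbles(ch: str) -> tuple[int, int]:
    """Return (high nibble, low nibble) of a character's cp037 byte."""
    byte = ch.encode("cp037")[0]
    return byte >> 4, byte & 0x0F


for ch in "AIJRSZ09":
    high, low = ebcdic_nibbles(ch)
    print(f"{ch}: 0x{high:X}{low:X}  zone={high:X}  digit={low}")

# A-I share zone C with digits 1-9, J-R share zone D with digits 1-9,
# S-Z share zone E with digits 2-9, and 0-9 sit in zone F.
# So the alphabet is not contiguous: there is a gap between I and J.
gap = "J".encode("cp037")[0] - "I".encode("cp037")[0]
print(gap)  # 8
```

In ASCII the letters A-Z are consecutive code points, so the same subtraction would give 1.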

ASCII on the other hand derives much more from communications equipment (telegraphy) than IT gear.

¹ https://en.wikipedia.org/wiki/Punched_card#IBM_80-column_for...

michaelteter · on Oct 25, 2023

> US "works for us must work for everyone"

I think you mean “built for us, and meets our needs”. It’s not the US’s problem that other countries don’t necessarily take innovation risks, but instead buy our old stuff.

taway1237 · on Oct 25, 2023

>US "works for us must work for everyone" approach

I'm all for bashing the US, but I think this is a global thing, not unique to the US.

corbezzoli · on Oct 25, 2023

The difference is that most countries don't expect the world to bow to them (culturally, technologically, etc). While I used Chinese products, I never had to learn how to spell my name with Chinese characters.

GrygrFlzr · on Oct 25, 2023

Even right now in modern day Japan I have to canonize my name in katakana (syllabary designed for foreign/loan words), and all the systems strictly expect a singular word First Name and a singular word Family Name. If you have a middle name, it effectively gets thrown out. Multi-word first and/or last names need to be smooshed or cut down.

I have encountered even worse issues with digital forms that only accept kanji (Chinese characters) or hiragana (syllabary designed for native Japanese words), the latter of which usually does not support certain sounds that katakana supports. Ashley Tisdale, for example, is normally rendered as アシュレイ・ティスデイル (ashurei tisudeiru) - ティ is actually te with a small -i modifier, which does not usually exist with hiragana. Forcibly converted to hiragana, it turns into あしゅれい・てぃすでいる - but ぃ is not accepted by the form, even if it exists in UTF-8. Your options are either converting the ティ into ち (chi) or て (te), neither of which is ideal, and may cause mismatches to other systems that properly support the katakana version.

The problem extends further into physical paper forms, where often they provide a very limited amount of boxes for characters, because native Japanese and Chinese names can easily fit within 8 characters. Combine this with the digital systems above and you're bound to have several versions of your name floating around on official documents all mismatching each other.

Some systems that need to print onto physical cards (e.g. getting a 1/3/6 month route pass on your SUICA or PASMO contactless smart cards) are even worse and turn dakuten (diacritics for hiragana/katakana) into their own character. As an example, the character ほ (ho) can be turned into ぼ (bo) using a dakuten, or ぽ (po) using a handakuten. The system will instead render those as two separate characters: ほ゛ and ほ゜ respectively, which cuts down on the number of available characters for the already limited textbox space you're dealing with.
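The effect can be imitated with Unicode normalization. This is only a sketch of the visible behaviour, since I have no idea how those card printers are actually implemented; `split_voicing_marks` is a made-up name.

```python
import unicodedata

# Combining dakuten/handakuten -> their standalone (spacing) forms.
COMBINING_TO_SPACING = {"\u3099": "\u309B", "\u309A": "\u309C"}


def split_voicing_marks(text: str) -> str:
    """Decompose kana and turn each voicing mark into its own full character."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(COMBINING_TO_SPACING.get(ch, ch) for ch in decomposed)


name = "ティスデイル"
printed = split_voicing_marks(name)
print(len(name), len(printed))  # 6 7
```

Only デ carries a dakuten in that name, so it costs one extra box; a name with several voiced kana loses correspondingly more of the available space.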

The world is full of presumptions about names even today.

vladvasiliu · on Oct 25, 2023

> The problem extends further into physical paper forms, where often they provide a very limited amount of boxes for characters, because native Japanese and Chinese names can easily fit within 8 characters.

This happens in Europe quite often, even though many people have longer names.

CoastalCoder · on Oct 25, 2023

Interesting!

Any idea if this is why, in Japanese-dubbed anime, the voice actors seriously mangle some English words/names? E.g., they often add a vowel sound to the ends of English words that should end with a percussive syllable.

I.e., do you think it comes from those words/names being written in katakana or hiragana in the dialog scripts, and those systems just can't express the correct pronunciation of such English words/names?

jcranmer · on Oct 25, 2023

Actually, it's probably a simpler reason than that. The Japanese language is largely a CV syllable string (consisting of a consonant and vowels); consonant clusters do not exist, and the only final consonant permitted is 'n'. English, by contrast, is a much more phonotactically complex language--consonants can pretty freely appear both before and after vowels in a syllable, and English also has several consonant clusters. Imagine trying to pronounce the word "strengths" if your native language lacks consonant clusters--it's like an English person trying to pronounce the Czech phrase "Strč prst skrz krk". On top of that, Japan is not great at English proficiency (it's definitely weaker than any other rich country, see https://www.ef.com/wwen/epi/).

It's not really that the written language makes the names hard for them to pronounce, it's that the spoken language doesn't make it easy, and there's probably not enough care to try to pronounce them. Where the written language does make it hard, it's usually when people try to localize Japanese media into foreign languages, and the intended references in names are lost because of the mangling process of transcription into katakana.

hunter2_ · on Oct 25, 2023

As an English speaker who has traveled to Japan without learning much of the Japanese language, I agree generally but I also noticed that there are some cases where a vowel is written but not pronounced. For example, "gozaimasu" is mostly pronounced without the "u" (creating a counterpoint against final consonant other than "n" being forbidden) and "gozaimashita" is mostly pronounced without the second "i" (creating a counterpoint against consonant clusters such as "sht" being forbidden). It gives me the impression that these rules exist more in written Japanese than spoken Japanese, at which point it becomes less clear why adding a vowel to the end of foreign/imported words is so common. Maybe it's just my English perception that the sounds /s/ and /sh/ consist of pronouncing only a consonant, when in reality the fact that those sounds have duration (not just a moment) actually means it's more of a vowel even when totally unvoiced!

hunter2_ · on Oct 26, 2023

As I think on this further, even these voiceless /s/ and /sh/ sounds involve putting the lips into either an /u/ or an /i/ shape based on the following vowel even if that is also voiceless, creating that which is not a syllable in English, but perhaps is for this purpose in Japanese. The C-V cadence and final vowel (given lack of final -n) rules are satisfied...

anonymfus · on Oct 25, 2023

First, that is not because of writing system specifically, but because of the rhythmic structure of Japanese language, see, for example, this video:

https://www.youtube.com/watch?v=J_HLY0Rss-g

Second, in Japanese dubs these words are not usually actual English words, but Japanese words originated as borrowings from English language, so voice actors don't actually mangle them, the same way as English speaking people don't mangle the word "coffee" as they usually pronounce it, despite it being different from how Italians pronounce "caffè".

tivert · on Oct 25, 2023

> Any idea if this is why, in Japanese-dubbed anime, the voice actors seriously mangle some English words/names? E.g., they often add a vowel sound to the ends of English words that should end with a percussive syllable.

I don't know anything about anime, and little about Japanese, but I think Japanese (and Chinese) have a fairly strict consonant-vowel form for all their syllables. That makes foreign words that have runs of consonants or do not end in a vowel hard to pronounce, so speakers of those languages have a tendency to insert extra vowels to make pronunciation easier for themselves.

It's kind of like how English speakers will usually change the Pinyin "X" (as in Xi Jinping) into an English S or SH sound when they try to speak it, because the actual sound doesn't exist in English.

simiones · on Oct 25, 2023

I think it's more that Japanese speakers just don't have those types of sounds in their phonetic repertoire. Some may be able to pronounce them, but most will not (and may not even notice the difference).

Every person has a certain limited set of consonants, vowels, diphthongs, triphthongs, tones, and even syllables that they are able to recognize and reproduce. This is something you can train to recognize more, but you will probably never be able to pronounce or even distinguish the totality of all those used in all languages, even just the living languages on Earth.

Even if you did, there is an added complication that some languages actually used multiple sounds interchangeably, and explicitly distinguishing them may actually confuse you. For example, most European languages recognize various consonants as the same "R" sound, even though they are vastly different (French R is a back of the throat trill, Italian R is a trill near the palate, and English R is articulated next to the palate without any trill). If you come from a language where these are distinct sounds, you may have trouble understanding that two people who use different R sounds are pronouncing the same word.

somat · on Oct 25, 2023

There is also the R/L problem, a distinction that to me, a native English speaker, is fairly clear. However these are the same sound in Japanese. Because of this I think that it is very hard for Japanese speakers to figure out which one to use and they get switched all the time.

dontlaugh · on Oct 25, 2023

There's a finite and relatively small number of possible syllables in several Asian languages, including Japanese.

jacquesm · on Oct 25, 2023

If modern computers had been invented in China and had had a decade or two headstart on the rest of the world then you may well have had to do just that.

This was an accident of history, not some deliberate plan to get the world to bow to the English speakers. And English was already well established as a major language in trade (due to it being superficially simple to learn), next to German, French and Spanish. China was pretty isolated for a long time culturally as well as geographically and the complexity of its script is another barrier to it being accepted as a common language by the rest of the world.

One of the more interesting things along this line in recent history is that with Brexit the EU no longer has an England/Wales/Scotland and a chunk of Ireland in it, but another chunk of Ireland remains. This led the French to immediately propose that French become the official language of the EU parliament but the rest of the countries wouldn't have it, and rightly so.

https://www.independent.co.uk/news/world/europe/brexit-franc...

b3orn · on Oct 25, 2023

> This led the French to immediately propose that French become the official language of the EU parliament but the rest of the countries wouldn't have it, and rightly so.

Didn't happen, they just said they'll use French during their council presidency (not the parliament, it's not even mentioned in your article), that's all, there are no rules against that. They would've done it regardless of Brexit.

jjgreen · on Oct 25, 2023

Having French as the lingua franca, the very idea!

acheron · on Oct 25, 2023

“Lingua Franca” historically has nothing to do with French. It was a dialect used by Italian traders.

Obviously this means the EU should run things in Venetian.

jowea · on Oct 26, 2023

Nothing to do with French seems a bit strong. It's related. From Britannica:

> lingua franca, (Italian: “Frankish language”) language used as a means of communication between populations speaking vernaculars that are not mutually intelligible. The term was first used during the Middle Ages to describe a French- and Italian-based jargon, or pidgin, that was developed by Crusaders and traders in the eastern Mediterranean and characterized by the invariant forms of its nouns, verbs, and adjectives. These changes have been interpreted as simplifications of the Romance languages.

jjgreen · on Oct 25, 2023

Heh, TIL, thanks. Obliquely, I was in Venice some years ago; sitting on the steps of a church I set to rolling a cigarette. A couple of small boys stopped and stared at this activity, one pointed and said "Il fabricato fumer!", I knew exactly what he was saying (although I have no Italian). So Venetian it is.

extraduder_ire · on Oct 25, 2023

I think that french diplomat just saw their shot, and took it. I doubt they actually forgot that there's still two english-speaking countries in the EU.

bdsa · on Oct 25, 2023

Ireland and... Malta?

acqq · on Oct 25, 2023

> due to it being superficially simple to learn

I, however, don't think most of the people who started using one of the named languages instead of their mother tongue ever really selected English using that specific criterion.

smcl · on Oct 25, 2023

You're saying this like it was some deliberately hostile, colonial move to impose ASCII on the world. But I don't think it was quite like that, more that in the beginnings of computing people designed and built things for themselves. And it just happened to be that a lot of that early work happened in the anglosphere.

wink · on Oct 25, 2023

I honestly think it has more to do with culture. I've never been to the US so this might be completely wrong, but my observations from talking to people and just observing:

- if you move to the US and have a name made up of non-ASCII chars you are more likely to either drop them/substitute them with ASCII chars, or use the Anglicized version of your name if it exists, or adopt an English name. And then it's kinda easy to legally change your name. Or screw it, it's kinda easy to just show up and tell them you're Johnny Awesome and then you're Johnny Awesome.

- if you move to Germany, you can't legally change your name at all without good reason, every document ever, no matter how informal (especially at school) will probably have your full name, maybe hopefully just "First Last" and not all 7 of them, everyone of authority will refuse to call you Johnny Awesome if your name is actually Johnathan Jean-Pierre Awesome-Livingston, and so on, oh and they will also fail to not butcher your name if it's not so easy a 4y old can learn it.

We can't be the only ones leaning more towards #2. And no, I'm not making this up, my go-to example is that I've seen cases where things like officially not calling "Bill Gates" "William Gates" have met resistance. Your name is your name, and I'm still not sure how people in the spotlight are able to be called Dick, I'm not joking.

account42 · on Oct 25, 2023

Try living in an Asian country - probably you will have to choose a name in the local script which at best vaguely sounds like your given name. It's expected that if you use someone else's playground that you adapt to their rules - that goes for moving to a foreign country and to using technology primarily developed in one.

Kamq · on Oct 25, 2023

> expect the world to bow to them (culturally, technologically, etc).

I mean, I don't expect you to bow to me.

But at the same time, the software I produce at work is usually entirely consumed by americans who speak english (ok, well, there's one canadian customer that I'm aware of). Because that's who pays for it, and none of those customers is particularly looking to pay for translation.

And the software I produce during my off hours is generally meant for me and my friends to consume. I'll put that on github/gitlab/source hut and you can use it if you want, but I definitely don't have the budget for translation either.

FooBarWidget · on Oct 25, 2023

China has its own problems. There are obscure family names out there consisting of characters that aren't officially recognized, so computers can't process their actual family names. So those people pick the closest officially recognized character instead, purely for the purpose of official documents and appeasing computer systems.

snowpid · on Oct 25, 2023

As I understood family names are very old in China. So who chose the family name and "invented" a character?

FooBarWidget · on Oct 25, 2023

I think in premodern times the Chinese character set was not as centrally regulated as it is now, and therefore there should be quite many instances of independent/local character invention.

em-bee · on Oct 25, 2023

many chinese characters consist of combinations of other characters. most common is a combination of two, where one component suggests the meaning, while another hints at the pronunciation.

here is one character made up of 11 others: https://en.wikipedia.org/wiki/Biangbiang_noodles#Chinese_cha...

this shows that new characters can be created not by inventing new strokes, but by simply combining existing characters to convey a new meaning, much like we occasionally do create new words in english by combining existing ones, even though that process in english is not productive, unlike eg. german, where it is quite normal. the difference is that these new words only have one syllable.

with the digitalization the creation of new characters essentially ends. the creation of the simplified chinese character system also pushes against creating new, more complicated characters.

it is going to be interesting to see how that will affect language development. new "words" can still be created by using a sequence of characters, but that means that each character keeps their syllable sound. whereas new compound characters would have a single syllable. so if a new meaning emerges for a syllable, a new character can't be created for it. will this prevent new single-syllable words? or will it lead to multiple characters being pronounced with a single syllable?

tsimionescu · on Oct 25, 2023

Do Chinese characters always have the same pronunciation? In Japanese at least, their Kanji (which are derived from Chinese characters) are often read in entirely different ways in different contexts. For example, 二人 is read as "futari" (two people), but 二 alone is read "ni" and 人 alone is read as "hito".

FooBarWidget · on Oct 26, 2023

Mostly yes. In Mandarin, tone can be a bit different depending on context but overall pronunciation doesn't differ that much.

But a major caveat is that pronunciation can be wildly different when spoken with other dialects. Mandarin and Cantonese reading of the same text, even with same meaning, sound entirely different.

em-bee · on Oct 25, 2023

that's a good question, i know that there are many characters that share one pronunciation, but i have not come across the reverse. there are different pronunciations in different dialects/languages of course, and maybe some of those get adopted by other dialects (that would make sense for food names for example) but i didn't study chinese, so i really don't know.

dtech · on Oct 25, 2023

I mean this is far from unique to China, see e.g. the disappearance of þ (Thorn) from the English alphabet.

em-bee · on Oct 25, 2023

you haven't been in china long enough. i have had a few situations where the system was unable to write my name in latin characters. i even had to get a notarized transliteration of my name into chinese so that the resulting chinese version could be used on some official documents.

dathinab · on Oct 25, 2023

true it's not that uncommon

but it's especially stereotypical for mainly the US, China and I think Japan.

Which are all countries where, due to various reasons (size, culture/nationalism), there are a lot of people making technical decisions who: 1st only speak the country's language, 2nd have little interaction with very different cultures

(in the US it's complicated: they mix a lot of other cultures into their own, but do so in a very US-specific way, with a lot of unaware cultural appropriation (and I don't mean this in the "bad/evil" way it's often used today but in the culturally normal way). The US is also so large that there is little reason to make a trip to a country which is very different, and even when people do, it's often in a very touristy form. This leads to situations where e.g. US citizens claim they are Spanish because of some ancestors and claim to practice Spanish culture, but they are 100% clueless about actual Spain even after having traveled there twice or so. Contrast this with the EU, where e.g. spending a study semester in another country with a completely different language and culture isn't rare, and non-touristy holiday trips to other countries are common too (in some cases it's just a few hours by car). All of which makes it very easy to end up with people who just don't know better.)

liotier · on Oct 25, 2023

In the early 90's me and a couple of other French guys had taken to writing French unaccented, so that we wouldn't suffer the daily pain of character set problems - we really bashed our own heads to fit into lower ASCII !

TeMPOraL · on Oct 25, 2023

Same here in Poland. I got used to skipping the diacritics, and quickly learned to mentally decode text that was rendered with the wrong code page.

And maybe this is a form of Stockholm syndrome, but to this day, I don't really mind sticking to lower ASCII - it just makes things easier (or at least until recently it did), and I don't really care about that '³' in my surname. Sorry, I meant 'ł'.

Possibly related to that:

- I always stick to US English language when using software, even if it offers Polish, because I don't trust translations. They're usually done by people who don't have enough context and knowledge to do it right. I've been burned too many times by this. Plus, localized error messages hinder searching for solutions.

- I wouldn't mind if everyone switched to English[0] as first language and called it a day; there would be tremendous economic and social benefits from that to everyone, far outweighing the loss of a little bit of cultural variety/noise.

- I'm strongly in favor of meeting machines half-way. LLMs aside, it's trivial for people to learn a small controlled vocabulary here and there (like e.g. "OR" and "AND" and quotes in search queries), allowing to make interfaces vastly more predictable, reliable and comprehensible.

--

[0] - Or French, or Chinese, or Swahili, neither of which I know - to stave off the usual replies of the "you only want it because you already know English" kind.
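
The '³' that shows up in place of 'ł' above is what a code-page mismatch looks like in practice. A minimal Python sketch, assuming the text was stored as ISO 8859-2 and then displayed as ISO 8859-1 (the comment does not say which code pages were actually involved):

```python
# Assumption: the surname was stored as ISO 8859-2 (Latin-2) bytes
# and later displayed by software that assumed ISO 8859-1 (Latin-1).
original = "ł"

raw = original.encode("iso8859_2")   # a single byte
misread = raw.decode("iso8859_1")    # same byte, wrong code page

print(raw.hex(), misread)

# The damage is reversible as long as the bytes themselves survive.
repaired = misread.encode("iso8859_1").decode("iso8859_2")
print(repaired == original)
```

Nothing is lost at the byte level here; only the interpretation is wrong, which is why people could learn to decode it in their heads.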

liotier · on Oct 25, 2023

> I always stick to US English language when using software

Of course, the computer's native language is English - anything else would be silly. This may be a generational meme: young people have French language environments - even most of the computing professionals... But I keep my habit of mixed locales with metric measurements, English-language UI, mixed ISO 8601 and French dates. God bless UTF-8 though !

liotier · on Oct 25, 2023

Metric measurements, English-language UI, UTF-8 and ISO 8601 dates could make a nice international default standard locale.

shiroiuma · on Oct 26, 2023

I really, really hate how Amazon tries to force US customary units (inches, etc.) on me in automatic translation just because I set the language to English.

wang_li · on Oct 25, 2023

If it weren't for Honduras we could have en_HN.UTF-8.

paulmooreparks · on Oct 25, 2023

Describes my custom setup exactly. It would be nice to have it as a default.

rolisz · on Oct 25, 2023

I guess Polish people had it worse than Romanians and Hungarians. All our accents are simple to strip away and they still kinda sound like the base letter.

But I agree with you: I don't trust translations and I think the benefits from humans using a single language would be amazing.
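
A rough way to see the difference between "simple to strip" accents and the Polish case, using only Python's standard `unicodedata` module. The sample names are illustrative picks, not taken from the thread:

```python
import unicodedata

def strip_accents(text: str) -> str:
    """Decompose (NFD), then drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Illustrative names: Romanian, Hungarian, Polish.
for name in ("Ștefan", "Erdős", "Wałęsa"):
    print(name, "->", strip_accents(name))
```

The Romanian and Hungarian letters decompose into a base letter plus a combining mark, so they strip cleanly. The Polish 'ł' has no such decomposition in Unicode, so this approach leaves it untouched and it needs an explicit mapping.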

giamma · on Oct 25, 2023

Same for me. I am Italian but I always used computers with English localization because translation was often bad, especially IBM translations.

So I got used to using US keyboards also, and never using accented letters even though they are frequently used in Italian, e.g. instead of writing "Mario è alto" (Mario is tall) I would write "Mario e' alto". It helped a lot that I worked for almost two decades for companies having only non Italian clients and all communications internal and external were in English.

Now that I am working for an Italian company with Italian clients, I am slowly getting used to Italian keyboards and accented letters.

wink · on Oct 25, 2023

The funny thing to me is how these unofficial transcription rules differ from country to country. Seems most people with ö or ü from a Turkish name are happy to drop it to o and u (cf Mesut Özil) but Germans are absolutely not. (Not saying this is a rule, just what I observed.)

Attrecomet · on Oct 25, 2023

German has an official and fully information preserving transcription to basic latin (which is really what all this talk about "anglo letters" is, just the basic latin letters in common use in the early modern era, with no diacritics at all), which can be used in official documents, too. Other languages, like latinized Turkish, obviously copied the diacritics, but seem to have left out the transcription rules, most probably because they borrowed the letters long after their history was relevant.

b3orn · on Oct 25, 2023

For German these rules are official. They're used in the machine readable part of ID cards and passports. If you start at a German company and they set up an e-mail address for you they'll use these rules too. The origin of the letters ä, ö and ü in German are ae, oe and ue, people just put the e on top of the vowel and they slowly transformed into what we use today. You can still see this in some names like Goethe.

wink · on Oct 27, 2023

I know that, but it's not what I meant, but probably didn't make it clear enough.

Germans know these official rules and maybe linguists, but if you present the typical English speaker with Möller vs Moeller it's confusing. Look at the media (who could, maybe, do some research?) who write Jurgen and not Jürgen. That's my point, the official rules don't help if everyone ignores them, for whatever reason.

snowpid · on Oct 25, 2023

In certain situations, like crossword puzzles, it wasn't usual to use umlauts but instead to write oe, ue and ae. In Swiss German people don't use ß. So transcription was always a thing to know.

gray_-_wolf · on Oct 25, 2023

> Or French, or Chinese, or Swahili, neither of which I know - to stave off the usual replies of the "you only want it because you already know English" kind.

Yes and no. French sure, but in my experience of trying to learn Japanese, the difficulty is insane, much higher compared to the "western" languages.

Zanfa · on Oct 25, 2023

In early 2000s Estonia, I remember people using numbers to denote certain accented characters when texting:

õ - 6

ä - 2

ö - 8

ü - y

buchoo · on Oct 25, 2023

Arabic speakers have a comparable system:

https://en.wikipedia.org/wiki/Arabic_chat_alphabet#Compariso...

HPsquared · on Oct 25, 2023

The trouble comes up when an Arab needs to communicate with an Estonian verbally. How do they spell their names to each other?

iforgotpassword · on Oct 25, 2023

I think it could lead to an accidental Diffie-Hellman key exchange.

dathinab · on Oct 25, 2023

in German there is an official convention for the non-US-ASCII letters. AFAIK it predates computers and is rooted in Germanic dialects developing differently, and later in things like non-German-specific typewriters, printing presses, etc.

ä => ae

ö => oe

ü => ue

ß => ss

but one gotcha is that it's a fuzzy one-way trip: some words, especially city names, can have an ae, oe or ue in their correct native spelling. Worse, ss is a normal building block in German which is pronounced differently than ß, so writing ß as ss is quite confusing for anyone who doesn't happen to know that the correct spelling is with ß. To top that off, some ß spellings have officially changed to ss over time, because people were pronouncing them more like ss anyway and getting it wrong all the time. And to some degree ß is semi-officially abandoned by now.
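
The one-way nature of that mapping can be sketched in a few lines of Python. The lowercase entries are the ones listed above; the uppercase entries and the `naive_reverse` helper are assumptions added purely for illustration:

```python
# Lowercase pairs are from the list above; uppercase ones are an assumption.
TO_ASCII = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
            "Ä": "Ae", "Ö": "Oe", "Ü": "Ue"}

def transliterate(text: str) -> str:
    return "".join(TO_ASCII.get(ch, ch) for ch in text)

def naive_reverse(text: str) -> str:
    # Hypothetical helper: blindly undo the mapping, to show why that is unsafe.
    for native, ascii_form in TO_ASCII.items():
        text = text.replace(ascii_form, native)
    return text

print(transliterate("Jürgen Möller"))
# "Goethe" is natively spelled with "oe", so reversing corrupts it.
print(naive_reverse("Goethe"))
# Two different words collapse into the same ASCII form.
print(transliterate("Maße") == transliterate("Masse"))
```

Going to ASCII is mechanical; going back needs knowledge of the actual word or name, which is the gotcha described above.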

Attrecomet · on Oct 25, 2023

The letters ß and ss are pronounced exactly the same, a hard 's' sound. Their effect on the preceding sounds is slightly different, though -- 'ss' makes the vowel short, 'ß' keeps it long. In all, a rather minute difference.

mxmlnkn · on Oct 25, 2023

After learning that ß is the ligature of literally "sz", I am astounded that "ss" is used instead to replace ß.

gopher_space · on Oct 25, 2023

> if a lot of early IT wouldn't have been dominated by a US "works for us must work for everyone" approach I think we never would have ended up with such limitations common in legacy systems

Most of the early work was in English and the people buying these systems all understood English. Nobody back then had a problem with a lingua franca for aspects of tech because there were still people around who had to learn German to study science.

pif · on Oct 25, 2023

> "works for us must work for everyone"

You are wrong. It was rather a poor decision on the buyers' side: "It does NOT work for us, but who cares?".

ahoka · on Oct 25, 2023

But then ASCII can't even represent proper US English.

giamma · on Oct 25, 2023

My first name in Italian is made of two words. That is not uncommon in Italy, but I usually write it as a single word otherwise non-Italians would assume that the second part is a second name, and address me with only the first part of my name, which I find somehow annoying.

Even better, when I was a kid and I got my first official document, the national healthcare card, the software in use did not allow a space character in the first name. The operator then decided to add a hyphen to enter the two parts of my name as separate words.

Fast forward many years, and my first name shows up with the hyphen on the ID card and healthcare card, while it does not contain the hyphen on the driving license, which of course I got much later when software had improved.

Generally this is a non issue, nobody ever said that it's not me because there is a hyphen or a space in my name, however when I signed the mortgage for my house, the layer asked me to sign with the hyphen, and in the document he wrote both variants of my name with an A.K.A. clause.

Findecanor · on Oct 25, 2023

I have three official first names but the second is the spoken first name, a practice not uncommon in Sweden. I use only the spoken first name for most things, but all three are present in e.g. healthcare and social security databases, where the spoken name is supposed to be underlined. But the underlining often gets lost when transferred between systems, and some systems even strip away all but the first.

My last name is spelled with a double-s by one side of the family, but with a single s by the other. I've been refused to pick up packages at the mail-parcel centre only because the spelling on the parcel did not match my ID card, despite having a notice slip with delivery number that had been delivered correctly to my physical mail address.

denton-scratch · on Oct 25, 2023

I have four names: two middle-names (one was added in my childhood, when my grandfather died). No bank is prepared to acknowledge this - I'm only allowed a single middle-name.

Four names isn't really a lot; some German aristocrats (or their descendants) have 6 or 7 names, and it's quite customary for Arabs to list a shedload of their ancestors in their name (e.g. Ahmed bin this bin the-other bin whoever).

> the layer asked me to sign with the hyphen

s/layer/lawyer/

That's nuts. If a signature is anything, it's the way you customarily write your name. Fortunately my signature is unreadable, and nobody could tell whether I'd written a hyphen or not. I suggest acquiring worse handwriting (good handwriting is almost useless these days).

giamma · on Oct 25, 2023

Sorry for the typo.

The lawyer even insisted that the hyphen had to be clearly visible in the signature. I think that in those types of documents in Italy you are legally required to use a readable signature, because I bought/sold house a number of times as family grew, and every time different lawyers always insisted on this aspect.

CaptainZapp · on Oct 25, 2023

> Even better, when I was a kid and I got my first official document, the national healthcare card, the software in use did not allow a space character in the first name. The operator then decided to add a hyphen to enter the two parts of my name as separate words.

On virtually every airline ticket for which I provided my first and middle name it was just concatenated.

For example: I would provide

Frank John Sample as my name it appears as

FRANKJOHN SAMPLE on the ticket

Old reservation and ticketing systems, indeed.

rowyourboat · on Oct 25, 2023

Yeah, I have a two-word last name. It sometimes gets smushed together by the airlines, but they're inconsistent about it.

Lufthansa for example does this in a particularly annoying way, if I give my name as LAST NAME, it will automatically smush it to LASTNAME. However, if I then want to retrieve my boarding pass, it will only find my booking when I enter my name as LASTNAME, because the look-up does not smush things automatically.
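
The failure described here is a normalization applied when the booking is stored but not when it is looked up. A minimal sketch of that mismatch and the obvious fix; the function names, the in-memory dict and the booking reference are all hypothetical, and this says nothing about how any real airline system is built:

```python
def booking_key(surname: str) -> str:
    """Hypothetical normalization: uppercase, drop spaces and hyphens."""
    return "".join(ch for ch in surname.upper() if ch not in " -")

bookings = {}

def store(surname: str, reference: str) -> None:
    bookings[booking_key(surname)] = reference

def lookup_raw(surname: str):
    # The failure mode described above: the query is used as typed.
    return bookings.get(surname)

def lookup_normalized(surname: str):
    # Same normalization on the way in and on the way out.
    return bookings.get(booking_key(surname))

store("LAST NAME", "ABC123")
print(lookup_raw("LAST NAME"))
print(lookup_normalized("LAST NAME"))
print(lookup_normalized("Last-Name"))
```

Whatever rule is used to squash names on entry, running the search input through the same rule makes spaces, hyphens and casing stop mattering to the customer.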

selimthegrim · on Oct 25, 2023

Only United does this to me.

Yizahi · on Oct 25, 2023

In my country the official transliteration to the Latin alphabet has changed like 3 or 4 times over the past 20 years. I had to change my perfectly valid and legal driver's license recently to make it match my passport. And I have very simple first and last names; some people are still having issues with horrible mandated transliterations and mismatching documents. Fun.

giamma · on Oct 25, 2023

Out of curiosity, which country is that?

Yizahi · on Oct 25, 2023

Ukraine

giamma · on Oct 25, 2023

Now I understand, I work with some Ukrainians and they often have similar but slightly different names (Oleks, Olehk, Maria, Mariia).

hgomersall · on Oct 25, 2023

Tbf, that's probably as much the lawyer doing their job properly as it is a necessity for the application.

Ensorceled · on Oct 25, 2023

About to say, why are they complaining about the lawyer being good at their job and possibly saving them (or their heirs) massive headaches and legal fees down the road?

giamma · on Oct 25, 2023

I was not complaining at all, I am just saying that I have to sign with a hyphen because of some historical software limitation.

mijoharas · on Oct 25, 2023

On the other side of this it seems that every single airline disallows hyphens in last names and always either replaces it with a space, or removes the hyphen to have the last name all one word.

wodenokoto · on Oct 25, 2023

That’s absurd. So now every organization needs to be able to handle every script in the world?

One thing is if the database can handle Unicode, but what about employees? They now need to be able to differentiate Chinese characters as well as Sanskrit, Hangul and Thai script?

Of course not.

Airlines have pretty much paved the way for everyone having an ASCII-encodable version of their name, and this is so standardized that it’s even in your passport.

davedx · on Oct 25, 2023

Why is it absurd? Either you accept customers or users with foreign names or you don't. If you don't, fine! If you do, then you should be able to store those names correctly.

I find it laughable that airlines are given as a shining example of operational excellence.

lelanthran · on Oct 25, 2023

> Why is it absurd? Either you accept customers or users with foreign names or you don't. If you don't, fine! If you do, then you should be able to store those names correctly.

I think parent makes a point about employees. You can argue all you want that it is reasonable to accept and store any Unicode character, but it is in no way reasonable to expect the person on the other side of the glass to know every single script in the world, and every single Unicode character.

Your position is reasonable for those people entering their own name, on their own device, which is configured to their own language - a system should not fall over on that - but that is not the use-case presented by the parent.

> I find it laughable that airlines are given as a shining example of operational excellence.

They're certainly doing it better than anyone else, IME.

The use-case is "employee has to enter the name, on their device, with their keyboard". What's your better alternative, the one that makes you use words like "absurd" and "laughable"?

After all, this use-case is not going away anytime soon.

Whether we like it or not, having a name with uncommon characters is going to make your life difficult in those cases where your name has to be entered on a device and keyboard that is non-native to you.

Spooky23 · on Oct 25, 2023

From a pragmatic perspective, they do a great job.

The point of identification is to identify people to some level of trust. In the vast majority of cases, that means that I should know that you’re the same person that I did business with yesterday.

Airlines need to tie you to an official document, which is much more complex. They do a pretty decent job at it considering all of the stakeholders who make that happen.

The hand-wringing about American hegemony is a projection of some other nationalist feeling. The constraints of Hollerith cards made it difficult to accommodate different character sets and the accommodations are codified in international treaties and business process. It will improve over time, probably first in the more cosmetic CRM side.

Accepting customers and accepting character sets or glyphs are two very different things. Accommodation is a two way street - if your name on the Starbucks cup is in Arabic or Greek, the barista in the US isn't going to be able to call it out. That's not because the barista is some ignorant rube, they just don’t speak Greek.

The magic is we have a global system where many people have the ability to step in a plane and go almost anywhere and immediately conduct their business or pleasure with minimal friction. One of the friction points are issues like character sets, or poor accommodation of long names, etc.

rightbyte · on Oct 25, 2023

I can't read Greek. I think it reasonable to demand that transcribed Greek names are ok in my hypothetical restaurant, but not reasonable to demand alfa omega to be ok, for your dinner reservation. You don't have to support any script.

It is not a technical issue but a social one.

davedx · on Oct 25, 2023

Why is it reasonable to you to demand that people write their own names differently?

Also, let's not forget the context here, we're talking about a bank not a restaurant

kps · on Oct 25, 2023

> Why is it reasonable to you to demand that people write their own names differently?

Because sometimes other people have to read it. Do you actually expect your bank has someone who can read Νίκος Καζαντζάκης and 刘慈欣 and ᠲᠠᠲᠠᠲᠤᠩᠭ ᠠ and משה ברבי מימון הספרדי and ᐱᔭᐃ ᐊᕿᐊᕈᖅ? (Or even notice which one Chrome doesn't render correctly? — Edit: I probably shouldn't blame Chrome; it's fine on his Wikipedia page.)

Too · on Oct 25, 2023

Reading characters is just half the story. Good luck for the clerk behind the desk to write it on their keyboard.

wodenokoto · on Oct 25, 2023

> Either you accept customers or users with foreign names or you don't.

That's even more absurd. So now, just because I don't speak Thai, I should reject all Thai customers, since I can't read their names?

davedx · on Oct 25, 2023

Who says you need to be able to speak Thai? I'm saying you should let people store their names in your system the same way they were given them at birth.

I really don't see what's absurd about that. This website is so English language biased it's hilarious.

Btw we're talking about banks not Starbucks or mom and pop shops

lelanthran · on Oct 25, 2023

> This website is so English language biased it's hilarious.

As is most international websites. And systems. And content, in general.

It's a sad fact of life that, as we sail into a globally-connected future, the world is going to consolidate on a small number of languages, and the majority of languages are going to be left behind, discarded, and eventually die out.

People want content. They will learn whatever language gets them the most content. Right now almost all content is produced in a small handful of languages, with (in the west) English being dominant.

At this point it looks like English[1] is going to be in that small set of surviving languages.

It's inevitable. Railing against it is a pointless waste of energy.

[1] My home country has 11 languages, all official. Until widespread internet arrived it was common to find locals who could not speak English. Now, I'd be hard-pressed to find non-english speakers, even in the very outlying areas.

shiroiuma · on Oct 26, 2023

So what exactly do you expect the bank employees to do when they see a name in a script that looks like gibberish to them? They can't say the name aloud, they can't verify the customer's name matches any documents (if you don't know the language, you're not competent to verify names in the script), the name won't match any official government documents, so what's the point?

sheepshear · on Oct 26, 2023

If a Thai bank only used Thai script, I would roughly transcribe my Latin script name into Thai script. When in Rome ...

Asooka · on Oct 25, 2023

So every person needs to be able to read every single script? Cyrillic? Greek? All the variants of CJK characters' reading in people's names? Practically ASCII is enough for information interchange and I say this as someone from a country that doesn't use Latin script. By all means record "name as spelled by person in native script" as a completely freeform string, but in practise we also need "name as spelled using ASCII so everyone can read it".

gpderetta · on Oct 25, 2023

> Either you accept customers or users with foreign names or you don't. If you don't, fine!

Sure, as long as you are prepared to be sued for discrimination.

HPsquared · on Oct 25, 2023

The good thing with ASCII is that everyone, everywhere can type it in using a standard keyboard. Everyone can read, write, speak and hear the basic Latin alphabet.

Not even an English/US thing, this is the Latin script from Roman times.

marcus_holmes · on Oct 25, 2023

You do know that the "standard" keyboard you're referring to is specifically the US keyboard?

It's not even standard for English: there's a UK keyboard that is organised differently and has different characters.

HPsquared · on Oct 25, 2023

OK, we can do away with the currency symbols and brackets, and limit it to Base64. Little Bobby Tables, and perhaps a few musical artists, may need to find an alternate spelling.

latexr · on Oct 25, 2023

That seriously underestimates how common it is for European names to have diacritics.

cedilla · on Oct 25, 2023

I assure you that billions of users can't. Even if they have ASCII letters printed on their keyboard, which is not a given, they will be unable to find that 'f' anywhere, just like you won't find most lower case Cyrillic letters on a Russian keyboard.

And even if they know how to type in all 52 letters, it's absolutely not a given that they are able to transliterate from their native script to the English alphabet.

"Everyone, everywhere" does not have a "Latin" alphabet as the basis.

account42 · on Oct 25, 2023

You really underestimate how widespread use of Latin script is, especially in computers. People cope just fine with domain names (IDN is generally still rare), foreign brands, names of famous people etc. Sure, your central Chinese rice farmer living in a remote village might only be exposed to Chinese script, but once you get to someone familiar with a computer they likely will be able to cope with ASCII text just fine even if they don't understand the meaning.

b3orn · on Oct 25, 2023

Not sure if this is still the case or only some legacy stuff, but at least for a while Chinese websites used numbers instead of names; China Railway, for example, uses 12306.cn as its domain name.

Asooka · on Oct 25, 2023

Where do you find these computers with keyboards that do not have the full set of ASCII letters printed on them? ASCII and Latin script are the lingua franca, they're not hard to learn and it's not unreasonable to ask that everyone who wants to use computers learn them.

gdprrrr · on Oct 25, 2023

Most keyboards only have uppercase printed on them

msm_ · on Oct 25, 2023

That's the most Ameri-centric thing I've read all day. I assure you that:

- There is no such thing as a "standard keyboard"

- Not everyone can type Latin alphabet

- Almost nobody can read, speak, hear or write Latin, and it makes no sense to "hear latin alphabet". How do you pronounce "Bordeaux" using only your knowledge of the Latin alphabet? How do you pronounce "Queue" using only your knowledge of the Latin alphabet? You realise native speakers of various languages will pronounce words (for example "pain") and even letters (for example "w", "j", "y") very differently? Etc. My name is pure ASCII and every foreigner pronounces it very incorrectly (even though all the sounds exist in the English language already, just mapped to different letters).

- Not every country is related to the Roman culture.

HPsquared · on Oct 25, 2023

Anyone can listen and understand when a person spells "Q U E U E", even if they don't speak English, so long as they share a common (spoken) language. Not so with "列"

marcus_holmes · on Oct 25, 2023

Wait, so anyone can understand written English even if they don't speak English? So if I type in German you can understand it even if you don't speak German? Wie funktioniert das?

HPsquared · on Oct 25, 2023

People can spell out words in a language they don't understand, as long as they understand the letters. This is good for names. People can only do this if they understand the letters.

marcus_holmes · on Oct 26, 2023

How do you pronounce "Mebd"? Can you work out how to pronounce that name from how it's spelled? Or "Siobhan"?

Your statement isn't even remotely true.

shiroiuma · on Oct 26, 2023

They don't have to pronounce those: they can just spell out the letters: M-E-B-D. It's not that hard. But if the customer's name is "坂本", how exactly is the bank employee supposed to say that?

latexr · on Oct 25, 2023

> Everyone can read, write, speak and hear the basic Latin alphabet.

Let’s set aside everyone who is illiterate. There isn’t even a single way to speak the Latin alphabet, as pronunciation depends on the language. I have a hard time believing there aren’t people who can only read and write in their native non-Latin alphabet.

I don’t know what you mean when you say everyone can hear the Latin alphabet. Listening to sounds has no relation to the language spoken. I can hear Korean just fine, doesn’t mean I understand the meaning of the words or understand their alphabet.

> Not even an English/US thing

It is common for languages which use the Latin alphabet to have diacritics and characters not present in ASCII.

kps · on Oct 25, 2023

Blame Jean-Maurice-Émile Baudot for not including diacritics in his teleprinter code. Dude couldn't even send his own name.

mkup · on Oct 25, 2023

Just nitpicking: Latin script from Roman times lacked W and U, among other things.

cedilla · on Oct 25, 2023

Also lower case and most typography, including spaces.

SiempreViernes · on Oct 25, 2023

Funny! But you should put the /s in there to indicate it is satire :)

HPsquared · on Oct 25, 2023

It's funny because it's true.

Edit: what do you suggest as a universal alternative for bit-perfect verbal communication... Spelling out Base64 Unicode over the phone?

epcoa · on Oct 25, 2023

> speak and hear the basic Latin alphabet.

The Latin alphabet does not have a single pronunciation. And the vowels are ambiguous across the most common languages so you can't even make an argument that they are intelligible.

You've really lost the plot on this one.

HPsquared · on Oct 25, 2023

I'm only assuming the two people speaking on the phone share a common language, I think that's a fair assumption.

inkyoto · on Oct 25, 2023

I instantly conjure up an image of a Roman citizen, a retired soldier now, speaking Vulgar Latin, picking up the phone and dialling the Comitia Centuriata to lodge a complaint about an unpaid engagement in the Punic Wars.

The clerk answers in and speaks Classical Latin.

mytailorisrich · on Oct 25, 2023

No. This is in the EU, and specifically Belgium. The claimant's name probably uses characters that are used in French as French uses diacritics and is an official language of Belgium.

So I suspect that the ruling really is that the bank should use a system that allows to correctly write the country's official languages, which seems quite reasonable.

Now under EU laws this would probably extend to EU languages, with the likely caveat that, I think, different alphabets are expected to be transliterated into the local one. E.g. Greek names are expected to be transliterated into the Latin alphabet in Western Europe.

I am pretty sure that there is no expectation that people can use their names written in, say, Chinese characters as that is not reasonably legible.

BlackFly · on Oct 25, 2023

> Taking into account the purposes of the processing, the data subject shall have the right to have incomplete personal data completed, including by means of providing a supplementary statement.

You can just put a note on the account: "Actual name contains accent acute over second e of first name: our system is unable to render this. User is quite sensitive about this so apologize again for our inability."

Not every organization would require this, it depends on the processing. A bank would since it sends you a bunch of correspondence. A restaurant taking a reservation wouldn't if they wouldn't save and sell the information or use it for marketing.

A4ET8a8uTh0 · on Oct 25, 2023

I find it mildly fascinating, because the parent's argument boils down to: it is more work. As others have already pointed out, airlines may not be the role model for this. More to the point, it likely is better to have actual individual identifiers, assuming identification is the actual goal.

Naturally, maybe the goal is just to get this train moving somehow.

ko27 · on Oct 25, 2023

Airlines are probably the laziest of them all. You should always store the original name. If other foreign people are supposed to write or read it, then ask passengers to write in their Romanized name as well. Both should be displayed, but only the original should be used for "ID" purposes.
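A minimal sketch of that two-field idea, in Python. The class name `PassengerName` and its fields are hypothetical, not taken from any real reservation system; the only assumptions are that the romanized spelling is supplied by the passenger, must be plain ASCII, and is never used for identity checks.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class PassengerName:
    """Hypothetical record: the original name plus a passenger-supplied ASCII spelling."""

    original: str   # name exactly as the person writes it, in any script
    romanized: str  # ASCII spelling for staff who cannot read the original

    def __post_init__(self) -> None:
        # The romanized form exists so that anyone can read and type it.
        if not self.romanized.isascii():
            raise ValueError("romanized name must be plain ASCII")

    def display(self) -> str:
        # Show both forms side by side.
        return f"{self.original} ({self.romanized})"

    def same_identity(self, other: "PassengerName") -> bool:
        # Only the original is compared; NFC normalization makes composed and
        # decomposed encodings of the same characters compare equal.
        def nfc(s: str) -> str:
            return unicodedata.normalize("NFC", s)

        return nfc(self.original) == nfc(other.original)


a = PassengerName("坂本", "Sakamoto")
b = PassengerName("坂本", "SAKAMOTO")
print(a.display())
print(a.same_identity(b))
```

Here two records with different romanizations still count as the same identity, because only the original spelling takes part in the comparison.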

inkyoto · on Oct 25, 2023

It is not clear whether airlines hold travellers hostage to spelling sorrows and miseries, or the airlines are being held hostage to the spelling sorrows and miseries by travel reservation systems.

The airlines get their data feeds from the travel reservation systems – the ones we (or the travel agent) interface with via the airline website or a dedicated web portal. There are two major global ones, Amadeus and SABRE (there are other ones as well). I do not know how any of them interconnect, though.

There have been numerous attempts to modernise both (including rewrites from scratch), and all the attempts have failed so far due to the complexity of the logic: calculating the optimal travel time for connecting flights in a multi-leg trip is, like, very hard, apparently, and so is applying the correct airfare prices based on the selected itinerary and a myriad of other variables. On top of that, there are decades' worth of growth in the said logic – there was an epic story on here some years back about one such endeavour (I think it was an Amadeus rewrite).

From what I remember, the character encoding was not even considered to be a problem in that endeavour – as in «acknowledged, is trivial to solve, now let's move on onto the actual problems».

piperswe · on Oct 25, 2023

Until recently, Southwest wasn't integrated with travel reservation systems (AFAIK), but it still handled names just like everyone else.

HPsquared · on Oct 25, 2023

Maybe we need an ISO (or whatever) standard for human names, and a standard digital representation.

raverbashing · on Oct 25, 2023

> So now every organization needs to be able to handle every script in the world?

No. But it's one thing not to support Japanese, and another not to support apostrophes or accents in text written in Latin characters.

But guess what was also incompatible with this? ASCII. Hence the &eacute; things in HTML

You can fix this if you encode it
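As a rough illustration of "encode it": HTML character references such as `&eacute;` or `&#233;` let a non-ASCII character travel through an ASCII-only channel and be restored on the other side. The sketch below uses only Python's standard library; the sample name is just an example, and whether a given legacy system could store the longer escaped form is an assumption.

```python
import html

name = "Émile"

# Replace every non-ASCII character with a numeric character reference,
# so the result fits in a pure-ASCII field or transport.
ascii_bytes = name.encode("ascii", errors="xmlcharrefreplace")

# Decode the references again to recover the original text.
restored = html.unescape(ascii_bytes.decode("ascii"))

print(ascii_bytes)
print(restored)
```

The cost is that the stored form is longer than the name and unreadable to a human clerk, so this moves the problem to display time rather than removing it.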

wodenokoto · on Oct 25, 2023

So where does the buck end? There are plenty of ornaments that different languages add to Latin letters other than accents and apostrophes.

grishka · on Oct 25, 2023

I'm Russian and I have to live with three similar names: my real one written in Cyrillic, the stupid transliteration into Latin I have in my international passport (and by extension everywhere I show it abroad), and Gregory when I introduce myself to someone who doesn't speak Russian.

Will a bank in a country that doesn't use the Cyrillic alphabet agree to write my name in Cyrillic? I'm 99% sure no. So, organizations fully supporting the writing system of the country in which they operate is a reasonable expectation. But I never expect anyone abroad to agree to write my name in an alphabet they can't even read.

Tangentially, Turkey renamed itself in English into "Türkiye". This suffers from the same issue — there is no letter ü in English. No one knows how to spell that. So it's no surprise the new name didn't stick.

cameronh90 · on Oct 25, 2023

I don't really care how I'm identified, provided I know it's me.

But the security problems are real and very annoying, even with a "normal" name. For some silly historical reason, my middle name has two different spellings depending on which official document you're looking at, which causes no end of problems. Conversely, I know people with very simple, common names, the kind where two people born in the same city on the same day have the same name, that suffer a load of different but equally frustrating issues.

These problems, however, aren't just limited to computer systems [0]. Border officers, bank clerks and government officials and other general bureaucrat types are all just as bad as a strict strcmp implementation.

The issue is that many people operating in an official capacity seem to work under the assumption that a name is a globally unique identifier that has exactly one representation, but that's not how names work in any culture.

On the other hand, if not names, then what? Combined with birth dates, it's the closest thing we have to a globally recognised unique immutable identifier, and any better alternative is going to feel invasive and face a lot of opposition on privacy grounds.

[0] https://www.youtube.com/watch?v=nq-dchJPXGA
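To make the "strict strcmp" point concrete, here is a small Python sketch, assuming nothing beyond the standard library. The function names `strict_equal` and `normalized_equal` are made up for illustration. It shows two byte sequences that a human reads as the same name failing an exact comparison, and a more forgiving comparison that normalizes first.

```python
import unicodedata


def strict_equal(a: str, b: str) -> bool:
    # Byte-for-byte comparison, which is what a C strcmp on UTF-8 data amounts to.
    return a.encode("utf-8") == b.encode("utf-8")


def normalized_equal(a: str, b: str) -> bool:
    # Normalize to NFC and fold case before comparing.
    def fold(s: str) -> str:
        return unicodedata.normalize("NFC", s).casefold()

    return fold(a) == fold(b)


composed = "Jos\u00e9"     # é as a single code point
decomposed = "Jose\u0301"  # e followed by a combining acute accent

print(strict_equal(composed, decomposed))
print(normalized_equal(composed, decomposed))
```

Normalization only covers different encodings of the same spelling. It does nothing for genuinely different spellings of one name across documents, which is the harder problem described above.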

sokoloff · on Oct 25, 2023

Names are mutable though.

aleph_minus_one · on Oct 25, 2023

This depends a lot on the country.

In Germany, it is intended to be hard to change your name; in the USA, the situation is different. Also in Germany, if you change your name voluntarily (without having a very good reason (CLARIFICATION: of course marriage is a good reason)), this is considered to be a strong sign that you deeply hate your parents (and did the name change because of that). Indeed, the people I know who changed their first name to their middle name did it exactly because of this.

konha · on Oct 25, 2023

> Also in Germany, if you change your name voluntarily […], this is considered to be a strong sign that you deeply hate your parents

Huh? I’m German and I have never heard of this.

aleph_minus_one · on Oct 25, 2023

> Huh? I’m German and I have never heard of this.

I am also German. The only people (in Germany) who I know and did a (forename) name change did this because they hated their parents, since changing the name is an open signal that you deeply hate what the parents did to you your whole life (including giving you your old name).

kasey_junk · on Oct 25, 2023

Do women generally not take their husband's name when they marry?

roelschroeven · on Oct 25, 2023

Here in Flanders some do, most don't. Why would they? They are as much a person as their husband, with an identity and family that is not any less valid than their husband's.

kasey_junk · on Oct 27, 2023

My point isn’t that they should (my wife didn’t) only that it’s very common for them to do this so thinking of names as mostly immutable is incorrect even where other forms of name changes are rare.

This is in fact one of the iconic places where lack of diversity in software teams causes problems. Men may have a blind spot around name changes that women don’t, because of their lived experience.

aleph_minus_one · on Oct 25, 2023

I rather had forenames in the back of my mind when I wrote my post. But indeed, because it is hard to change your name in Germany, I know quite some women who used marriage to get rid of a surname that they did not like, because there is hardly any other socially accepted way to get rid of it.

cameronh90 · on Oct 25, 2023

They're also not globally unique and don't have a single representation.

But they're the closest thing we have that would be acceptable to most people. Better suggestion?

bandrami · on Oct 25, 2023

GUID assigned at birth and implanted in an RFID chip behind the ear.

anticensor · on Oct 26, 2023

We need a World Citizen's Number from the United Nations.

CelestialTeapot · on Oct 25, 2023

Like normal...

GoblinSlayer · on Oct 26, 2023

Passport number, phone number, tax account number.

braiamp · on Oct 25, 2023

How the state identifies you is one of those things that requires a single source of truth. The civil registry is usually considered that source of truth, in the sense that everyone who disagrees has to defer to it.

Someone · on Oct 25, 2023

> Civil registry is usually considered the source of truth

The USA doesn’t really have a civil registry. https://en.wikipedia.org/wiki/Civil_registration#United_Stat...:

“In the United States, vital records such as birth certificates, death certificates, and frequently marriage certificates are maintained by the Office of Vital Statistics or Office of Vital Records in each individual state. Other documents such as deeds, mortgage documents, name change documents, and divorce records, as well as marriage certificates for those states not centralizing these records, are maintained by the clerk of court of each individual county. However, the term 'civil registry' is not used.”

I think that means your marriage may be documented in county C1, your subsequent divorce in county C2, and your name change in county C3, none of which need be in your state of birth.

klausa · on Oct 25, 2023

This is very much not true.

Ask anyone with a Western name who moved to Japan (or a myriad of other countries, but I am in the process of doing the same right now so it hits close to home) whether the state has a single source of truth for their name.

cameronh90 · on Oct 25, 2023

My birth registration is handwritten in difficult to read calligraphy, with no unique numbers, and hasn't been digitised.

It may be the source of truth, but it's not a very good one.

fodkodrasz · on Oct 25, 2023

> weird name

Yeah, some people think about their heritage as not compatible with the brave new world, so get rid of it. I know people giving their child dedicatedly ascii-only name common in the USA to make sure their children won't have a "weird name" when going to the USA.

lelanthran · on Oct 25, 2023

> Yeah, some people think about their heritage as not compatible with the brave new world, so get rid of it. I know people giving their child dedicatedly ascii-only name common in the USA to make sure their children won't have a "weird name" when going to the USA.

And? That's what I did - a simple name that he won't have to repeat or spell out for English speakers, won't have to spell out for many non-English speakers, and won't have trouble with in non-Latin scripts.

Anything you can do to make your child's life easier trumps any value you think they might get from maintaining cultural or traditional links with a mostly dead past.

jdietrich · on Oct 25, 2023

I legally changed my name about twenty years ago, after experiencing multiple days of wasted effort dealing with systems that a) couldn't render my name correctly and b) couldn't cope with the fact that their incorrect rendering of my name might not match other legal documents that correctly render my name, or incorrectly render it in a different way.

snordgren · on Oct 25, 2023

We would have loved to name our son after my grandfather, but his name includes non-ASCII characters so we went with his middle name instead.

Y_Y · on Oct 25, 2023

Good old grampa \NUL\BEL\ACK would be rolling in his gràve if he knew that foreigners with non-ASCII names had taken over his country.

soco · on Oct 25, 2023

Oh the little Bobby Tables right https://xkcd.com/327/

fodkodrasz · on Oct 25, 2023

This is the kind of compromise I'd never take. (Yet still a million times closer to my world view than the decision I mentioned. Guess I'm stubborn in some topics.)

PrimeMcFly · on Oct 25, 2023

Why not just use a spelling without accents?