You try and have a weird name in a foreign country. If you're not careful, everyone will spell it differently, and then you'll spend ages going "No, I'm sure I'm in your system. Try this other spelling".
At least back in the 90s when I often had to deal with that, you were usually talking to a real life person, sitting on the other side of the desk. You could show them your documentation.
Today since everything is online and computerized, the risk is that a computer somewhere in the chain will just go "The bits don't match", and it'll be a challenge to even reach a real person, let alone one capable of understanding what problem you're having.
Here's a real problem this could cause: To get a visa you usually need to prove ties to your country. This normally includes a bank extract. If the name printed on your bank extract doesn't match what's printed on your documentation that means a very real risk of rejection, and not going anywhere if you can't fix it fast enough. And I can imagine other very not fun possibilities, like having some sort of KYC/AML snag where something decides you lied about your name.
to double down on it: it's not even about a weird name per se, but about a lot of very normal EU names
if a lot of early IT hadn't been dominated by a US "works for us, must work for everyone" approach, I think we never would have ended up with the limitations common in legacy systems (there still would be limitations: pre-Unicode, the solution was custom code pages and similar, which all supported some subset of non-US-ASCII characters, but only a subset)
luckily today Unicode is the standard (though for some cultural and historic aspects it's sometimes not enough)
I don't think this is strictly a technical problem, at least not when it happens in international contexts (it's inexcusable for your own country's authorities to not be able to record your actual culturally-specific name).
The reality is that there is a limited char set that is actually understandable at an international level, and it's not that different from ASCII. Even with paper systems, if you go to Rome to sign into a hotel and you give your name as 依诺, they will not be able to even write it down, never mind pronounce it. And even if you tell them it's Yī Nuò, they will likely ignore the accents since they won't know what those mean. Similarly, if you go to China and say your name is Sângeorz-Băi, they will not know what the diacritics mean and will not be easily able to write them down.
In all times in history, when multiple cultures interact, they have to find a common subset of their languages to communicate in, and that includes names. The situation in writing is actually much much better, even if limited to ASCII, than it is in actual spoken language. Maybe you can write my name down perfectly (Simionescu), but I would bet you won't use the proper pronunciation unless you happen to know Romanian - you will likely use different vowels and consonants.
What's interesting is that the transliteration of symbols is not even remotely uniform in ICAO 9303. There are multiple recommended transliterations of some characters, and it definitely goes only in one direction: national script -> MRZ transliteration. It is not possible to go the other direction.
It's not intended to round-trip; it's intended to be roughly human-readable without knowledge of the original script. It's pretty close to the system the Olympics used, with Wikipedia's example of Hämäläinen -> HAEMAELAEINEN, well known as the name of a gold medalist cross-country skier.
Newer versions of the transliteration encourage stripping diacritics, so that would be HAMALAINEN. Much more readable to native speakers, but obviously loses information.
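For anyone who wants to see the two conventions side by side, here is a minimal Python sketch. The function names and the three-entry expansion table are my own illustration, not the ICAO 9303 table, which covers many more letters and allows several alternatives for some of them. It only shows why a name stored under one convention will not compare equal to the same name stored under the other.

```python
import unicodedata

# Illustrative subset only (an assumption for this sketch), NOT the full
# ICAO 9303 table. Note that str.upper() already turns "ß" into "SS".
EXPANSIONS = {"Ä": "AE", "Ö": "OE", "Ü": "UE"}


def strip_diacritics(name: str) -> str:
    """Drop combining marks: Hämäläinen -> HAMALAINEN."""
    decomposed = unicodedata.normalize("NFD", name)
    # Letters such as "ð" or "ø" have no Unicode decomposition, so they
    # pass through unchanged here and would need an explicit table.
    kept = (ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(kept).upper()


def expand_diacritics(name: str) -> str:
    """Expand umlauts to two letters: Hämäläinen -> HAEMAELAEINEN."""
    upper = unicodedata.normalize("NFC", name).upper()
    expanded = "".join(EXPANSIONS.get(ch, ch) for ch in upper)
    # Anything not covered by the small table falls back to stripping.
    return strip_diacritics(expanded)


if __name__ == "__main__":
    name = "Hämäläinen"
    print(strip_diacritics(name))
    print(expand_diacritics(name))
    # A naive equality check between the two forms fails:
    print(strip_diacritics(name) == expand_diacritics(name))
```

Neither output can be mapped back to the original spelling with certainty, which is the one-way property mentioned above.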
I wouldn't recommend it, as there are official transformations from the countries which use diacritics, and most of the time these are not to strip them. It's kinda another case of people forcing stuff onto other cultures. And if you do business in some of those countries, in some industries you might even get into legal trouble if you apply that.
Take it up with the spec, then. That is the recommendation:
> Section 6 of the 9303 part 3 document specifies transliteration of letters outside the A–Z range. It recommends that diacritical marks on Latin letters A-Z are simply omitted (ç → C, ð → D, ê → E, ñ → N etc.), but it allows the following transliterations: [...]
You said
> you might even get into legal trouble if you apply that.
We're talking about passports, this seems not relevant. For passport-related use such as travel, you use the form of the name written on the passport, exactly as-is.
It seems odd to me to arbitrarily restrict the alphabet if the only requirement is that the data has to be readable by a machine through an OCR system. They could have easily used the Latin alphabet to encode arbitrary bit strings.
> alphabet if the only requirement is that the data has to be readable by a machine
It's not the only requirement, because
1. it's not only meant for OCR
2. it's clear that it won't be used only by OCR but also, at the least, by humans interacting with the system, potentially by phone calls passing this information by voice, and by anyone who can't pronounce the original spelling. In fairness: if you, e.g., never sang in your life, are somewhat tone deaf, and are now expected to pronounce an Asian name, you would probably have to spend days or more until you can do so (just as an extreme example).
If you mean ASCII, you'll also notice that it happens to correspond pretty well with the entirety of writing symbols which have broad global recognition, and this has been true for a long time. Sure, it's missing many many culturally specific things like accents and other diacritics, non-latin writing systems etc.
But none or at least very few of the symbols missing from ASCII are actually broadly understood by people from more than a handful of countries (which can still mean a billion plus people in the case of Chinese, Arabic, and Indian scripts, of course).
Probably Arabic is the biggest counterpoint to my claim, as there is quite a large array of countries across two continents that recognize it. However, even there, there are far more people in countries which use Arabic writing that also recognize Latin letters than the other way around.
The many diacritics used by various European languages are definitely NOT something that has any wide adoption or meaning. Perhaps only the umlaut sign and the accent are even used by more than one or two European languages.
So again, my claim is that any system of writing that is intended for global international communication will have to restrict all names to the A-Z characters in ASCII with spaces as separators (and perhaps 0-9 and a few other characters that would anyway get ignored). Nothing else will work if people around the world are supposed to recognize the name in some meaningful sense. And relying entirely on automated OCR is a no-go for many use cases.
And just like people who interact with those outside their cultures have to accept that their name will be pronounced in a myriad of ways, they have no reason not to accept that it will be written in different ways as well.
> it's missing many many culturally specific things like accents and other diacritics
fun fact: some of the symbols included in ASCII were intended to be used as (non-spacing) diacritical marks, specifically the tilde/caret/backquote characters...
[too lazy to dig up a proper source at the moment but the Wikipedia ASCII article covers some of this]
I completely agree it's not purely technical nor purely social but it is a real problem. Personally, I lost out on an equity options exit event due to delays caused partly by these exact issues and visas.
In Japan, all residents (citizens and people living with a visa) have to have a katakana spelling of their name. Web forms usually ask for this (in addition to your name spelled in kanji, which of course is impossible if your name is in latin characters, so you just hope the form accepts latin), and it's used in many other places as well, such as for bank accounts. Sometimes places will take your latin-character name and transliterate it themselves, and then this causes problems when their transliteration doesn't match that of other places. (katakana has far fewer distinct sounds available than most other languages, so the transliteration always loses information, and there's usually different ways to do it.) Even worse, many forms (like web forms) have rather short character limits for the name field, so with a transliterated Western name, it many times just won't fit within the ~10 characters they allocate.
> In all times in history, when multiple cultures interact, they have to find a common subset of their languages to communicate in […].
Oh… where do we start… We do not have to go as far as finding an intersection of multiple languages. Consider English as an example. I have written up a fictitious but a reasonably real dialogue between:
1. A layperson from a lower social class, who has not attained high educational levels and speaks English using vernacular and predominantly Germanic vocabulary of the English language.
2. A state citizen of the upper-class descent who speaks English almost exclusively with Latin/French/Greek-derived vocabulary.
This layperson complains about not receiving a welfare payment from the state.
Layperson (L): Oi mate, I ain't got me geld from the state yet. That's daft, ain't it? Every man's got a right to his share, right?
Educated citizen of the upper class descent (U): Pardon my incredulity, but are you referencing the monetary allocation designated by the government for individuals of a particular socio-economic standing?
L: Eh? Oh, you mean the dole? Yeah, that. They owe me, but there's no dosh in me pocket yet.
U: If I interpret your sentiment correctly, you are perturbed due to the delayed disbursement of your financial entitlement. Have you endeavoured to communicate with the pertinent authorities?
L: Talk to who now? Oh, you mean the blokes at the town hall? Aye, but they keep spieling some rubbish. Can't make head or tail of it.
U: My advice would be to liaise with the relevant office, elucidate your predicament, and seek resolution. It is paramount to ensure you have met all requisite criteria for the stipend.
L: Right, so you're saying I should have a natter with 'em and make sure everything's shipshape? Just want what's owed to me, y'know.
U: Precisely. Engage in a dialogue with them, ascertain the cause of the discrepancy, and ensure you have fulfilled the necessary prerequisites for the allocation. You deserve your due compensation.
L: Cheers for that. It's a bit of a muddle, all this, but I reckon I'll give it another whirl.
U: I wish you fortitude in your pursuits. If there is an inherent right to such financial assistance, it is imperative you receive it posthaste.
Even though the layperson does understand responses in the fictitious dialogue, that would not be the case in real life. Both speak the same language, yet the responses are generally incomprehensible to the layperson.
In some cases I’d even suspect the incomprehension might go both ways. If the "educated" citizen is truly educated (and not just upper class), they ought to not only understand the lay person, but choose lay words in return. Sticking to their upper-class dialect would be passive-aggressive oppression born out of class contempt…
> In some cases I’d even suspect the incomprehension might go both ways.
Which is also true.
It was a thought experiment to highlight the fact that a lack of comprehension can also arise within the boundaries of a single language. In linguistics, the term for this specific phenomenon is «social register», and there are plenty of active and thriving languages that employ social registers in daily speech. Korean, for instance, is renowned for having a highly complex system of social registers (effectively, parallel vocabularies) embedded in the spoken language. There are other languages as well.
> If the "educated" citizen is truly educated (and not just upper class), they ought to not only understand the lay person, but choose lay words in return.
And that is also true. Social registers have largely disappeared from mainland European languages, yet an English accent and the choice of the words of an English speaker can reveal sufficient details about their socio-economic background.
I get that it was a USAcentric thing, and that we should always be active w.r.t. calling out ethnocentric behavior.
But it was also an "8-bit" thing and an "extremely limited computing resources" thing. EBCDIC was designed in 1963/1964.
I mean, when you've got 8 bits to represent a character, and there are more than 256 possible characters... what do you do?
A truly robust solution like Unicode would not have been feasible with the resources of the day, and even a "simple" 16-bit scheme would barely be able to contain all 50,000 Chinese characters.
The blame here lies with the Dutch bank who willingly chose an EBCDIC solution in 1995, although I'm sure they were dealing with various constraints and pressures as well.
I can tell you that a German bank I signed up with in 2023, which can't be older than ~5 years, asked for my FULL name as written on my ID. In my case that's "First Second Last", but I go by "First Last". In their infinite wisdom they decided to ask for the full thing (which I understand, it's bank stuff) but never thought about what they call me in their stupid emails.
No one's ever called me "First Second" - not even my parents. And I can't even be mad, but I'm still disappointed. (Fortunately my name is ASCII and since we can do more than 8+3 I've never had problems.)
This is completely normal for official documents in Germany and it makes sense for us.
Technically we do not have first, second or any other numbered names. Our given names form a set in the mathematical sense and any one is equally valuable. This comes from the tradition of given names being given by godmothers and godfathers and we wouldn't want to get into the issue to ever have to value one of them over another. At least this has been the case in some parts of Germany and has influenced the official regulations for names.
Of course the names have to be put into an order on your ID and to keep things simple banks, schools, authorities, etc. ask you to use that order on their documents.
Traditionally, official documents just used the surname with "Herr" or "Frau" but nowadays they often use just the given name in first position on your ID.
I've never heard of a "First Second" case, with one exception:
Given names can be connected with a dash. In this case the order is fixed and the whole unit is treated like a single name. While in principle arbitrary names can be combined, there are certain very common combinations, like "Hans-Peter", "Karl-Heinz" or "Franz-Xaver". If you happen to be named "Hans Peter" (without dash), it's likely that they assume the dash and will call you "Hans Peter" or "Hans-Peter" all the time.
There is a very mild version of that in the US -- lower likelihood of blind assumption, but still present: when a set of two given names starts with "Mary" or ends with "Ann/Anne". Examples include "Mary Jane", "Mary Kate", "Jo Ann", and of course "Mary Ann". Some have simply merged into single names like "Maryanne" and "Joanne" more recently. There are probably others.
> This is completely normal for official documents in Germany and it makes sense for us.
Of course, and I mildly apologize for my case of Whataboutism because I actually described the reverse. They're taking the rules too literally and are using the thing they need for official documents everywhere (their marketing/status emails).
I'm just kinda puzzled why they'd think it's a good user experience, especially for people who are not just unused to reading their government ID name but actually uncomfortable with it (e.g. pending a name change).
My first name starts with a W (let's say it's "Walter"). In India multiple times I have spelled it out over the phone and then received a letter addressed to "Uualter".
Hehe, and my name contains an ß which is often confused with a B.
I started to point out the machine readable area instead of just showing my passport…
I've seen that forever and not just on United. I have thought it's something about the underlying SABRE system that many airlines use. Maybe someone here knows more.
> I've seen that forever and not just on United. I have thought it's something about the underlying SABRE system that many airlines use. Maybe someone here knows more.
I don't remember the precise details, but some airline website's password had restrictions at one point that made it super-obvious that they were internally converting alphanumeric passwords to digits based on the US telephone key mapping.
I remember thinking at the time it might ultimately have been due to SABRE (because I believe that's literally one of the oldest computer systems still in use), and screen-scraping some telephone menu system depressingly seems like something someone would do for expediency.
I wouldn't be surprised if a system like that also mangles names.
> Usernames and passwords containing letters need to be translated to numbers to enter them in a Fidelity phone system (like FAST, or if you call a representative). Use your telephone keypad to convert the letters to numbers. There is no case sensitivity. Substitute an asterisk (*) for all special characters.
https://www.fidelity.com/customer-service/need-help-logging-...
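A rough Python sketch of the conversion that quote describes, assuming the standard US keypad letter layout (ABC on 2 through WXYZ on 9). The function name `to_keypad` is made up for illustration, and this is a guess at the behaviour described in the quote, not anyone's actual implementation.

```python
# Standard US telephone keypad letter groups.
KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in KEYPAD.items() for letter in letters
}


def to_keypad(text: str) -> str:
    """Map letters to keypad digits, keep digits, turn the rest into '*'."""
    out = []
    for ch in text.upper():  # no case sensitivity, as in the quote
        if ch in "0123456789":
            out.append(ch)
        elif ch in LETTER_TO_DIGIT:
            out.append(LETTER_TO_DIGIT[ch])
        else:
            out.append("*")  # every special character collapses to '*'
    return "".join(out)


if __name__ == "__main__":
    print(to_keypad("Passw0rd!"))
    # Different passwords collapse to the same digit string:
    print(to_keypad("cat"), to_keypad("bat"), to_keypad("act"))
```

The mapping is many-to-one, so a system that accepts the digit form is effectively checking a much weaker password than the one you typed.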
I can tell you from personal experience that if you have four names it will turn "First Second Third Last" into "Firstsecondthird Last" (I usually fly Delta).
I asked a checkin agent to fix it but they said it will start rejecting my ID if they change it at all.
On British Airways (and I believe other ticket systems that use Amadeus), I often get LastnameTitleFirstnameSecondname all as one word (in caps). It certainly looks funny on the boarding pass, but I've never had any issue getting through security.
Something kinda similar: I applied for a PH passport and they ADDED a third name to my name. Instead of my actual "first second last", on my official PH documents I'm "first-second third last". The third isn't anywhere in my name, my US birth certificate, or any other identifying documents, and is not at all what my parents named me. I only use my US passport now, because it caused a bit of confusion when the first name on my US departing airline ticket did not match my PH passport; and if it had matched, then the arrival into the US would not have matched my US passport.
The IT in question started with Hollerith cards¹, processed by electromechanical equipment. These were originally numeric only — digit n represented by a hole in row n which would stop or start a counter wheel. (Punched cards were processed row by row, not column by column.) The alphabetic extension added a second hole near the top edge, handled using much more complicated and expensive equipment. EBCDIC was originally a straightforward mapping of these holes into an 8 bit space, and its arrangement makes sense seen that way.
ASCII on the other hand derives much more from communications equipment (telegraphy) than IT gear.
I think you mean “built for us, and meets our needs”. It’s not the US’s problem that other countries don’t necessarily take innovation risks, but instead buy our old stuff.
The difference is that most countries don't expect the world to bow to them (culturally, technologically, etc). While I used Chinese products, I never had to learn how to spell my name with Chinese characters.
Even right now in modern day Japan I have to canonize my name in katakana (syllabary designed for foreign/loan words), and all the systems strictly expect a singular word First Name and a singular word Family Name. If you have a middle name, it effectively gets thrown out. Multi-word first and/or last names need to be smooshed or cut down.
I have encountered even worse issues with digital forms that only accept kanji (Chinese characters) or hiragana (syllabary designed for native Japanese words), the latter of which usually does not support certain sounds that katakana supports. Ashley Tisdale, for example, is normally rendered as アシュレイ・ティスデイル (ashurei tisudeiru) - ティ is actually te with a small -i modifier, which does not usually exist with hiragana. Forcibly converted to hiragana, it turns into あしゅれい・てぃすでいる - but ぃ is not accepted by the form, even if it exists in UTF-8. Your options are either converting the ティ into ち (chi) or て (te), neither of which is ideal, and either may cause mismatches with other systems that properly support the katakana version.
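To make the forced conversion concrete, here is a small Python sketch. It relies on the fact that the katakana letters U+30A1 to U+30F6 sit exactly 0x60 above their hiragana counterparts in Unicode. The function names and the set of characters the form rejects are assumptions for illustration; real forms differ in what they refuse.

```python
KATAKANA_START = 0x30A1  # small a
KATAKANA_END = 0x30F6    # small ke
OFFSET = 0x60            # distance from a katakana letter to its hiragana twin

# Hypothetical rule: a strict form that refuses the small vowels.
SMALL_VOWELS = "ぁぃぅぇぉ"


def kata_to_hira(text: str) -> str:
    """Shift katakana letters into the hiragana block, leave the rest alone."""
    return "".join(
        chr(ord(ch) - OFFSET)
        if KATAKANA_START <= ord(ch) <= KATAKANA_END
        else ch
        for ch in text
    )


def rejected_chars(text: str, banned: str = SMALL_VOWELS) -> list[str]:
    """Return the characters the hypothetical form would refuse."""
    return [ch for ch in text if ch in banned]


if __name__ == "__main__":
    hira = kata_to_hira("アシュレイ・ティスデイル")
    print(hira)
    print(rejected_chars(hira))
```

The conversion itself is lossless; it is the form's validation rule that forces you into a different spelling.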
The problem extends further into physical paper forms, where often they provide a very limited amount of boxes for characters, because native Japanese and Chinese names can easily fit within 8 characters. Combine this with the digital systems above and you're bound to have several versions of your name floating around on official documents all mismatching each other.
Some systems that need to print onto physical cards (e.g. getting a 1/3/6 month route pass on your SUICA or PASMO contactless smart cards) are even worse and turn dakuten (diacritics for hiragana/katakana) into their own character. As an example, the character ほ (ho) can be turned into ぼ (bo) using a dakuten, or ぽ (po) using a handakuten. The system will instead render those as two separate characters: ほ゛ and ほ゜ respectively, which cuts down on the number of available characters for the already limited textbox space you're dealing with.
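The effect on the character budget can be reproduced with Unicode decomposition. This is a minimal Python sketch, assuming the card printer behaves roughly like "decompose, then print the mark as its own spacing character"; the function name is made up, and the real systems may well use a legacy half-width encoding instead.

```python
import unicodedata

COMBINING_DAKUTEN = "\u3099"
COMBINING_HANDAKUTEN = "\u309A"
SPACING_DAKUTEN = "\u309B"      # the standalone mark
SPACING_HANDAKUTEN = "\u309C"   # the standalone mark


def split_dakuten(text: str) -> str:
    """Split voiced kana into base kana plus a standalone mark."""
    # NFD turns e.g. U+307C (bo) into U+307B (ho) + U+3099.
    # It would also decompose accented Latin letters, which is fine here.
    decomposed = unicodedata.normalize("NFD", text)
    return (
        decomposed
        .replace(COMBINING_DAKUTEN, SPACING_DAKUTEN)
        .replace(COMBINING_HANDAKUTEN, SPACING_HANDAKUTEN)
    )


if __name__ == "__main__":
    for name in ("ぼ", "ぽ", "ボブ"):
        printed = split_dakuten(name)
        print(name, len(name), "->", printed, len(printed))
```

A name made mostly of voiced kana can therefore take up to twice as many boxes as its normal spelling.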
The world is full of presumptions about names even today.
> The problem extends further into physical paper forms, where often they provide a very limited amount of boxes for characters, because native Japanese and Chinese names can easily fit within 8 characters.
This happens in Europe quite often, even though many people have longer names.
Any idea if this is why, in Japanese-dubbed anime, the voice actors seriously mangle some English words/names? E.g., they often add a vowel sound to the ends of English words that should end with a percussive syllable.
I.e., do you think it comes from those words/names being written in katakana or hiragana in the dialog scripts, and those systems just can't express the correct pronunciation of such English words/names?
Actually, it's probably a simpler reason than that. The Japanese language is largely a CV syllable string (each syllable consisting of a consonant and a vowel); consonant clusters do not exist, and the only final consonant permitted is 'n'. English, by contrast, is a much more phonotactically complex language--consonants can pretty freely appear both before and after vowels in a syllable, and English also has several consonant clusters. Imagine trying to pronounce the word "strengths" if your native language lacks consonant clusters--it's like an English person trying to pronounce the Czech phrase "Strč prst skrz krk". On top of that, Japan is not great at English proficiency (it's definitely weaker than any other rich country, see https://www.ef.com/wwen/epi/).
It's not really that the written language makes the names hard for them to pronounce, it's that the spoken language doesn't make it easy, and there's probably not enough care to try to pronounce them. Where the written language does make it hard, it's usually when people try to localize Japanese media into foreign languages, and the intended references in names are lost because of the mangling process of transcription into katakana.
As an English speaker who has traveled to Japan without learning much of the Japanese language, I agree generally but I also noticed that there are some cases where a vowel is written but not pronounced. For example, "gozaimasu" is mostly pronounced without the "u" (creating a counterpoint against final consonants other than "n" being forbidden) and "gozaimashita" is mostly pronounced without the second "i" (creating a counterpoint against consonant clusters such as "sht" being forbidden). It gives me the impression that these rules exist more in written Japanese than spoken Japanese, at which point it becomes less clear why adding a vowel to the end of foreign/imported words is so common. Maybe it's just my English perception that the sounds /s/ and /sh/ consist of pronouncing only a consonant, when in reality the fact that those sounds have duration (not just a moment) actually means it's more of a vowel even when totally unvoiced!
As I think on this further, even these voiceless /s/ and /sh/ sounds involve putting the lips into either an /u/ or an /i/ shape based on the following vowel even if that is also voiceless, creating that which is not a syllable in English, but perhaps is for this purpose in Japanese. The C-V cadence and final vowel (given lack of final -n) rules are satisfied...
Second, in Japanese dubs these words are usually not actual English words but Japanese words that originated as borrowings from English, so the voice actors don't actually mangle them, in the same way that English speakers don't mangle the word "coffee" when they pronounce it as they usually do, despite it being different from how Italians pronounce "caffè".
> Any idea if this is why, in Japanese-dubbed anime, the voice actors seriously mangle some English words/names? E.g., they often add a vowel sound to the ends of English words that should end with a percussive syllable.
I don't know anything about anime, and little about Japanese, but I think Japanese (and Chinese) have a fairly strict consonant-vowel form for all their syllables. That makes foreign words that have runs of consonants or do not end in a vowel hard to pronounce, so speakers of those languages have a tendency to insert extra vowels to make pronunciation easier for themselves.
It's kind of like how English speakers will usually change the Pinyin "X" (as in Xi Jinping) into an English S or SH sound when they try to speak it, because the actual sound doesn't exist in English.
I think it's more that Japanese speakers just don't have those types of sounds in their phonetic repertoire. Some may be able to pronounce them, but most will not (and may not even notice the difference).
Every person has a certain limited set of consonants, vowels, diphthongs, triphthongs, tones, and even syllables that they are able to recognize and reproduce. This is something you can train to recognize more, but you will probably never be able to pronounce or even distinguish the totality of all those used in all languages, even just the living languages on Earth.
Even if you did, there is an added complication that some languages actually use multiple sounds interchangeably, and explicitly distinguishing them may actually confuse you. For example, most European languages recognize various consonants as the same "R" sound, even though they are vastly different (French R is a back of the throat trill, Italian R is a trill near the palate, and English R is articulated next to the palate without any trill). If you come from a language where these are distinct sounds, you may have trouble understanding that two people who use different R sounds are pronouncing the same word.
There is also the R/L problem: a pair of sounds that to me, a native English speaker, are fairly distinct. However, these are the same sound in Japanese. Because of this, I think it is very hard for Japanese speakers to figure out which one to use, and they get switched all the time.
If modern computers had been invented in China and had had a decade or two headstart on the rest of the world then you may well have had to do just that.
This was an accident of history, not some deliberate plan to get the world to bow to the English speakers. And English was already well established as a major language in trade (due to it being superficially simple to learn), next to German, French and Spanish. China was pretty isolated for a long time culturally as well as geographically and the complexity of its script is another barrier to it being accepted as a common language by the rest of the world.
One of the more interesting things along this line in recent history is that with Brexit the EU no longer has an England/Wales/Scotland and a chunk of Ireland in it, but another chunk of Ireland remains. This led the French to immediately propose that French become the official language of the EU parliament but the rest of the countries wouldn't have it, and rightly so.
> This led the French to immediately propose that French become the official language of the EU parliament but the rest of the countries wouldn't have it, and rightly so.
Didn't happen, they just said they'll use French during their council presidency (not the parliament, it's not even mentioned in your article), that's all, there are no rules against that. They would've done it regardless of Brexit.
Nothing to do with French seems a bit strong. It's related. From Britannica:
> lingua franca, (Italian: “Frankish language”) language used as a means of communication between populations speaking vernaculars that are not mutually intelligible. The term was first used during the Middle Ages to describe a French- and Italian-based jargon, or pidgin, that was developed by Crusaders and traders in the eastern Mediterranean and characterized by the invariant forms of its nouns, verbs, and adjectives. These changes have been interpreted as simplifications of the Romance languages.
Heh, TIL, thanks. Obliquely, I was in Venice some years ago; sitting on the steps of a church I set to rolling a cigarette. A couple of small boys stopped and stared at this activity, one pointed and said "Il fabricato fumer!", I knew exactly what he was saying (although I have no Italian). So Venetian it is.
I think that French diplomat just saw their shot and took it. I doubt they actually forgot that there are still two English-speaking countries in the EU.
I, however, don't think most of the people who started using one of the named languages instead of their mother tongue ever really selected English using that specific criterion.
You're saying this like it was some deliberately hostile, colonial move to impose ASCII on the world. But I don't think it was quite like that, more that in the beginnings of computing people designed and built things for themselves. And it just happened to be that a lot of that early work happened in the anglosphere.
I honestly think it has more to do with culture. I've never been to the US so this might be completely wrong, but my observations from talking to people and just observing:
- if you move to the US and have a name made up of non-ASCII chars you are more likely to either drop them/substitute them with ASCII chars, or use the Anglicized version of your name if it exists, or adopt an English name. And then it's kinda easy to legally change your name. Or screw it, it's kinda easy to just show up and tell them you're Johnny Awesome and then you're Johnny Awesome.
- if you move to Germany, you can't legally change your name at all without good reason, every document ever, no matter how informal (especially at school) will probably have your full name, maybe hopefully just "First Last" and not all 7 of them, everyone in authority will refuse to call you Johnny Awesome if your name is actually Johnathan Jean-Pierre Awesome-Livingston, and so on, oh and they will also fail to not butcher your name if it's not so easy a 4-year-old can learn it.
We can't be the only ones leaning more towards #2. And no, I'm not making this up, my go-to example is that I've seen cases where things like officially not calling "Bill Gates" "William Gates" have met resistance. Your name is your name, and I'm still not sure how people in the spotlight are able to be called Dick, I'm not joking.
Try living in an Asian country - you will probably have to choose a name in the local script which at best vaguely sounds like your given name. It's expected that if you use someone else's playground, you adapt to their rules - that goes for moving to a foreign country and for using technology primarily developed in one.
> expect the world to bow to them (culturally, technologically, etc).
I mean, I don't expect you to bow to me.
But at the same time, the software I produce at work is usually entirely consumed by americans who speak english (ok, well, there's one canadian customer that I'm aware of). Because that's who pays for it, and none of those customers is particularly looking to pay for translation.
And the software I produce during my off hours is generally meant for me and my friends to consume. I'll put that on github/gitlab/source hut and you can use it if you want, but I definitely don't have the budget for translation either.
China has its own problems. There are obscure family names out there consisting of characters that aren't officially recognized, so computers can't process their actual family names. So those people pick the closest officially recognized character instead, purely for the purpose of official documents and appeasing computer systems.
I think in premodern times the Chinese character set was not as centrally regulated as it is now, and therefore there should be quite many instances of independent/local character invention.
many chinese characters consist of combinations of other characters. most common is a combination of two, where one component suggests the meaning, while another hints at the pronunciation.
this shows that new characters can be created not by inventing new strokes, but by simply combining existing characters to convey a new meaning, much like we occasionally do create new words in english by combining existing ones, even though that process in english is not productive, unlike eg. german, where it is quite normal. the difference is that these new words only have one syllable.
with the digitalization the creation of new characters essentially ends. the creation of the simplified chinese character system also pushes against creating new, more complicated characters.
it is going to be interesting to see how that will affect language development. new "words" can still be created by using a sequence of characters, but that means that each character keeps their syllable sound. whereas new compound characters would have a single syllable. so if a new meaning emerges for a syllable, a new character can't be created for it. will this prevent new single-syllable words? or will it lead to multiple characters being pronounced with a single syllable?
Do Chinese characters always have the same pronunciation? In Japanese at least, their Kanji (which are derived from Chinese characters) are often read in entirely different ways in different contexts. For example, 二人 is read as "futari" (two people), but 二 alone is read as "ni" and 人 alone is read as "hito".
Mostly yes. In Mandarin, tone can be a bit different depending on context but overall pronunciation doesn't differ that much.
But a major caveat is that pronunciation can be wildly different when spoken with other dialects. Mandarin and Cantonese reading of the same text, even with same meaning, sound entirely different.
that's a good question, i know that there are many characters that share one pronunciation, but i have not come across the reverse. there are different pronunciations in different dialects/languages of course, and maybe some of those get adopted by other dialects (that would make sense for food names for example) but i didn't study chinese, so i really don't know.
you haven't been in china long enough. i have had a few situations where the system was unable to write my name in latin characters. i even had to get a notarized transliteration of my name into chinese so that the resulting chinese version could be used on some official documents.
but it's especially stereotypical for mainly the US, China and I think Japan.
Which are all countries where, due to various reasons (size, culture/nationalism), there are a lot of people making technical decisions who: 1st only speak the country's language, 2nd have little interaction with very different cultures
(in the US it's complicated: they mix a lot of other cultures into their own, but do so in a very US-specific way with a lot of unaware cultural appropriation (and I don't mean this in the "bad/evil" way it's often used today but in the culturally normal way). The US is also so large that there is little reason to make a trip to a country which is very different, and even if people do so, it's often in a form which is very touristy. This leads to situations where e.g. US citizens claim they are Spanish because of some ancestors and claim to practice Spanish culture, but they are completely clueless about actual Spain even after having traveled there twice or so. Contrast this with the EU, where e.g. spending one study semester in another country with a completely different language and culture isn't rare, and non-touristy holiday trips to other countries are common too (I mean in some cases it's just a few hours by car). So in the US it's very easy to end up with people who just don't know better.)
In the early 90's me and a couple of other French guys had taken to writing French unaccented, so that we wouldn't suffer the daily pain of character set problems - we really bashed our own heads to fit into lower ASCII !
Same here in Poland. I got used to skipping the diacritics, and quickly learned to mentally decode text that was rendered with the wrong code page.
And maybe this is a form of Stockholm syndrome, but to this day, I don't really mind sticking to lower ASCII - it just makes things easier (or at least until recently it did), and I don't really care about that '³' in my surname. Sorry, I meant 'ł'.
Possibly related to that:
- I always stick to US English language when using software, even if it offers Polish, because I don't trust translations. They're usually done by people who don't have enough context and knowledge to do it right. I've been burned too many times by this. Plus, localized error messages hinder searching for solutions.
- I wouldn't mind if everyone switched to English[0] as first language and called it a day; there would be tremendous economic and social benefits from that to everyone, far outweighing the loss of a little bit of cultural variety/noise.
- I'm strongly in favor of meeting machines half-way. LLMs aside, it's trivial for people to learn a small controlled vocabulary here and there (like e.g. "OR" and "AND" and quotes in search queries), allowing to make interfaces vastly more predictable, reliable and comprehensible.
--
[0] - Or French, or Chinese, or Swahili, neither of which I know - to stave off the usual replies of the "you only want it because you already know English" kind.
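For anyone wondering how 'ł' turns into '³': here is a minimal sketch of the code page mix-up, assuming the text was written as Windows-1250 (Central European) and read back as Windows-1252 (Western European). Which two code pages were actually involved in any given case is my assumption, and the helper name `misread` is made up for illustration.

```python
def misread(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one code page, then decode the same bytes with another."""
    return text.encode(written_as).decode(read_as)

# 'ł' is byte 0xB3 in cp1250; the same byte is '³' in cp1252.
garbled = misread("ł", "cp1250", "cp1252")
print(garbled)   # ³

# As long as every byte survives, the mistake can be undone by reversing it.
repaired = misread(garbled, "cp1252", "cp1250")
print(repaired)  # ł
```

Learning to "mentally decode" the wrong code page is basically doing the second call in your head.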
> I always stick to US English language when using software
Of course, the computer's native language is English - anything else would be silly. This may be a generational meme: young people have French language environments - even most of the computing professionals... But I keep my habit of mixed locales with metric measurements, English-language UI, mixed ISO 8601 and French dates. God bless UTF-8 though !
I really, really hate how Amazon tries to force US customary units (inches, etc.) on me in automatic translation just because I set the language to English.
I guess Polish people had it worse than Romanians and Hungarians. All our accents are simple to strip away and they still kinda sound like the base letter.
But I agree with you: I don't trust translations and I think the benefits from humans using a single language would be amazing.
Same for me. I am Italian but I always used computers with English localization because translation was often bad, especially IBM translations.
So I got used to using US keyboards also, and never using accented letters even though they are frequently used in Italian, e.g. instead of writing "Mario è alto" (Mario is tall) I would write "Mario e' alto". It helped a lot that I worked for almost two decades for companies having only non Italian clients and all communications internal and external were in English.
Now that I am working for an Italian company with Italian clients, I am slowly getting used to Italian keyboards and accented letters.
The funny thing to me is how these unofficial transcription rules differ from country to country. Seems most people with ö or ü from a Turkish name are happy to drop it to o and u (cf Mesut Özil) but Germans are absolutely not. (Not saying this is a rule, just what I observed.)
German has an official and fully information preserving transcription to basic latin (which is really what all this talk about "anglo letters" is, just the basic latin letters in common use in the early modern era, with no diacritics at all), which can be used in official documents, too. Other languages, like latinized Turkish, obviously copied the diacritics, but seem to have left out the transcription rules, most probably because they borrowed the letters long after their history was relevant.
For German these rules are official. They're used in the machine readable part of ID cards and passports. If you start at a German company and they set up an e-mail address for you they'll use these rules too. The origin of the letters ä, ö and ü in German are ae, oe and ue, people just put the e on top of the vowel and they slowly transformed into what we use today. You can still see this in some names like Goethe.
I know that, but it's not what I meant; I probably didn't make it clear enough.
Germans know these official rules and maybe linguists, but if you present the typical English speaker with Möller vs Moeller it's confusing. Look at the media (who could, maybe, do some research?) who write Jurgen and not Jürgen. That's my point, the official rules don't help if everyone ignores them, for whatever reason.
In certain situations, like crossword puzzles, it wasn't usual to use umlauts but instead to write oe, ue and ae. In Swiss German people don't use ß.
So transcription was always a thing to know.
> Or French, or Chinese, or Swahili, neither of which I know - to stave off the usual replies of the "you only want it because you already know English" kind.
Yes and no. French sure, but in my experience of trying to learn Japanese, the difficulty is insane, much higher compared to the "western" languages.
in German there is an official convention for the non us-ascii letters, AFAIK it predates computers and is rooted in germanic dialects developing differently and later stuff like non-german specific typewriters, printing press machines, etc.
ä => ae
ö => oe
ü => ue
ß => ss
but one gotcha is that it's a fuzzy one way trip, some words, especially city names, can have an ae, oe, ue in their correct native spelling. Worse, ss is a normal language building block in German which is pronounced differently than ß, so writing it as ss is quite confusing for anyone who doesn't happen to know that its correct spelling is with ß. To top that off, some cases of ß spelling have officially changed to ss spelling over time due to people pronouncing it more like an ss anyway and getting it wrong all the time. And to some degree ß is semi-officially abandoned by now.
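To make the "one way trip" concrete, here is a rough sketch of the mapping listed above. The table only covers the four letters mentioned plus uppercase umlauts (my addition), and `transcribe` / `naive_reverse` are hypothetical names, not any official implementation.

```python
import unicodedata

# The four rules from the list above, plus uppercase umlauts (an assumption).
GERMAN_TO_ASCII = {
    "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
}

def transcribe(text: str) -> str:
    """Replace umlauts and ß with their basic Latin digraphs."""
    # Normalise first so that 'u' + combining diaeresis is seen as 'ü'.
    text = unicodedata.normalize("NFC", text)
    return "".join(GERMAN_TO_ASCII.get(ch, ch) for ch in text)

def naive_reverse(text: str) -> str:
    """Blindly turn every digraph back into a special letter (often wrong)."""
    for letter, digraph in GERMAN_TO_ASCII.items():
        text = text.replace(digraph, letter)
    return text

print(transcribe("Möller"))     # Moeller
print(transcribe("Straße"))     # Strasse
print(naive_reverse("Goethe"))  # Göthe  (wrong, the name really has 'oe')
print(naive_reverse("Wasser"))  # Waßer  (wrong, the word really has 'ss')
```

Going forward is mechanical; going back needs a dictionary or a human who knows the word.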
The letters ß and ss are pronounced exactly the same, a hard 's' sound. Their effect on the preceding sounds is slightly different, though -- 'ss' makes the vowel short, 'ß' keeps it long. In all, a rather minute difference.
> if a lot of early IT wouldn't have been dominated by a US "works for us must work for everyone" approach I think we never would have ended up with such limitations common in legacy systems
Most of the early work was in English and the people buying these systems all understood English. Nobody back then had a problem with a lingua franca for aspects of tech because there were still people around who had to learn German to study science.
My first name in Italian is made of two words. That is not uncommon in Italy, but I usually write it as a single word otherwise non-Italians would assume that the second part is a second name, and address me with only the first part of my name, which I find somehow annoying.
Even better, when I was a kid and I got my first official document, the national healthcare card, the software in use did not allow a space character in the first name. The operator then decided to add a hyphen to enter the two parts of my name as separate words.
Fast forward many years, and my first name shows up with the hyphen on the ID card and healthcare card, while it does not contain the hyphen on the driving license, which of course I got much later when software had improved.
Generally this is a non issue, nobody ever said that it's not me because there is a hyphen or a space in my name, however when I signed the mortgage for my house, the layer asked me to sign with the hyphen, and in the document he wrote both variants of my name with an A.K.A. clause.
I have three official first names but the second is the spoken first name, a practice not uncommon in Sweden. I use only the spoken first name for most things, but all three are present in e.g. healthcare and social security databases, where the spoken name is supposed to be underlined. But the underlining often gets lost when transferred between systems, and some systems even strip away all but the first.
My last name is spelled with a double-s by one side of the family, but with a single s by the other. I've been refused to pick up packages at the mail-parcel centre only because the spelling on the parcel did not match my ID card, despite having a notice slip with delivery number that had been delivered correctly to my physical mail address.
I have four names: two middle-names (one was added in my childhood, when my grandfather died). No bank is prepared to acknowledge this - I'm only allowed a single middle-name.
Four names isn't really a lot; some German aristocrats (or their descendants) have 6 or 7 names, and it's quite customary for Arabs to list a shedload of their ancestors in their name (e.g. Ahmed bin this bin the-other bin whoever).
> the layer asked me to sign with the hyphen
s/layer/lawyer/
That's nuts. If a signature is anything, it's the way you customarily write your name. Fortunately my signature is unreadable, and nobody could tell whether I'd written a hyphen or not. I suggest acquiring worse handwriting (good handwriting is almost useless these days).
The lawyer even insisted that the hyphen had to be clearly visible in the signature. I think that in those types of documents in Italy you are legally required to use a readable signature, because I bought/sold house a number of times as family grew, and every time different lawyers always insisted on this aspect.
> Even better, when I was a kid and I got my first official document, the national healthcare card, the software in use did not allow a space character in the first name. The operator then decided to add a hyphen to enter the two parts of my name as separate words.
On virtually every airline ticket for which I provided my first and middle name it was just concatenated.
Yeah, I have a two-word last name. It sometimes gets smushed together by the airlines, but they're inconsistent about it.
Lufthansa for example does this in a particularly annoying way, if I give my name as LAST NAME, it will automatically smush it to LASTNAME. However, if I then want to retrieve my boarding pass, it will only find my booking when I enter my name as LASTNAME, because the look-up does not smush things automatically.
In my country the official transliteration to the latin alphabet has changed like 3 or 4 times over the past 20 years. I had to change my perfectly valid and legal driver's license recently to make it match my passport. And I have very simple first and last names; some people are still having issues with horrible mandated transliterations and mismatching documents. Fun.
About to say, why are they complaining about the lawyer being good at their job and possibly saving them (or their heirs) massive headaches and legal fees down the road?
On the other side of this it seems that every single airline disallows hyphens in last names and always either replaces it with a space, or removes the hyphen to have the last name all one word.
That’s absurd. So now every organization needs to be able to handle every script in the world?
One thing is if the database can handle Unicode, but what about employees? They now need to be able to differentiate Chinese characters as well as Sanskrit, Hangul and Thai script?
Of course not.
Airlines have pretty much paved the way for everyone having an ascii encodable version of the name and this is so standardized that it’s even in your passport.
Why is it absurd? Either you accept customers or users with foreign names or you don't. If you don't, fine! If you do, then you should be able to store those names correctly.
I find it laughable that airlines are given as a shining example of operational excellence.
> Why is it absurd? Either you accept customers or users with foreign names or you don't. If you don't, fine! If you do, then you should be able to store those names correctly.
I think parent makes a point about employees. You can argue all you want that it is reasonable to accept and store any unicode character, but it is in no way reasonable to expect that the person on the other side of the glass to know every single script in the world, and every single unicode character.
Your position is reasonable for those people entering their own name, on their own device, which is configured to their own language - a system should not fall over on that - but that is not the use-case presented by the parent.
> I find it laughable that airlines are given as a shining example of operational excellence.
They're certainly doing it better than anyone else, IME.
The use-case is "employee has to enter the name, on their device, with their keyboard". What's your better alternative, the one that makes you use words like "absurd" and "laughable"?
After all, this use-case is not going away anytime soon.
Whether we like it or not, having a name with uncommon characters is going to make your life difficult in those cases where your name has to be entered on a device and keyboard that is non-native to you.
From a pragmatic perspective, they do a great job.
The point of identification is to identify people to some level of trust. In the vast majority of cases, that means that I should know that you’re the same person that I did business with yesterday.
Airlines need to tie you to an official document, which is much more complex. They do a pretty decent job at it considering all of the stakeholders who make that happen.
The hand-wringing about American hegemony is a projection of some other nationalist feeling. The constraints of Hollerith cards made it difficult to accommodate different character sets and the accommodations are codified in international treaties and business process. It will improve over time, probably first in the more cosmetic CRM side.
Accepting customers and accepting character sets or glyphs are two very different things. Accommodation is a two way street - if your name on the Starbucks cup is in Arabic or Greek, the barista in the US isn't going to be able to call it out. That's not because the barista is some ignorant rube, they just don’t speak Greek.
The magic is we have a global system where many people have the ability to step on a plane and go almost anywhere and immediately conduct their business or pleasure with minimal friction. Among the friction points are issues like character sets, or poor accommodation of long names, etc.
I can't read Greek. I think it reasonable to demand that transcribed Greek names are ok in my hypothetical restaurant, but not reasonable to demand alfa omega to be ok, for your dinner reservation. You don't have to support any script.
> Why is it reasonable to you to demand that people write their own names differently?
Because sometimes other people have to read it. Do you actually expect your bank has someone who can read Νίκος Καζαντζάκης and 刘慈欣 and ᠲᠠᠲᠠᠲᠤᠩᠭ ᠠ and משה ברבי מימון הספרדי and ᐱᔭᐃ ᐊᕿᐊᕈᖅ? (Or even notice which one Chrome doesn't render correctly? — Edit: I probably shouldn't blame Chrome; it's fine on his Wikipedia page.)
Who says you need to be able to speak Thai? I'm saying you should let people store their names in your system the same way they were given them at birth.
I really don't see what's absurd about that. This website is so English language biased it's hilarious.
Btw we're talking about banks not Starbucks or mom and pop shops
> This website is so English language biased it's hilarious.
As are most international websites. And systems. And content, in general.
It's a sad fact of life that, as we sail into a globally-connected future, the world is going to consolidate on a small number of languages, and the majority of languages are going to be left behind, discarded, and eventually die out.
People want content. They will learn whatever language gets them the most content. Right now almost all content is produced in a small handful of languages, with (in the west) English being dominant.
At this point it looks like English[1] is going to be in that small set of surviving languages.
It's inevitable. Railing against it is a pointless waste of energy.
[1] My home country has 11 languages, all official. Until widespread internet arrived it was common to find locals who could not speak English. Now, I'd be hard-pressed to find non-english speakers, even in the very outlying areas.
So what exactly do you expect the bank employees to do when they see a name in a script that looks like gibberish to them? They can't say the name aloud, they can't verify the customer's name matches any documents (if you don't know the language, you're not competent to verify names in the script), the name won't match any official government documents, so what's the point?
So every person needs to be able to read every single script? Cyrillic? Greek? All the variants of CJK characters' reading in people's names? Practically ASCII is enough for information interchange and I say this as someone from a country that doesn't use Latin script. By all means record "name as spelled by person in native script" as a completely freeform string, but in practise we also need "name as spelled using ASCII so everyone can read it".
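Roughly what I have in mind, as a sketch only: two fields, one free-form and one restricted to ASCII. The class and field names are invented for illustration, and a real system would need far more (multiple romanisations, name order, and so on).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PersonName:
    native: str  # free-form, exactly as the person spells it in their own script
    latin: str   # ASCII spelling that any clerk can read, type and spell aloud

    def __post_init__(self) -> None:
        # Only the interchange form is restricted; the native form is not validated.
        if not self.latin or not self.latin.isascii():
            raise ValueError("latin must be a non-empty ASCII string")

name = PersonName(native="刘慈欣", latin="Liu Cixin")
print(name.native, "/", name.latin)
```

The native field is what you print on letters to the customer; the latin field is what the person behind the glass types into the search box.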
The good thing with ASCII is that everyone, everywhere can type it in using a standard keyboard. Everyone can read, write, speak and hear the basic Latin alphabet.
Not even an English/US thing, this is the Latin script from Roman times.
OK, we can do away with the currency symbols and brackets, and limit it to Base64. Little Bobby Tables, and perhaps a few musical artists, may need to find an alternate spelling.
I assure you that billions of users can't. Even if they have ASCII letters printed on their keyboard, which is not a given, they will be unable to find that 'f' anywhere, just like you won't find most lower case Cyrillic letters on a Russian keyboard.
And even if they know how to type in all 52 letters, it's absolutely not a given that they are able to transliterate from their native script to the English alphabet.
"Everyone, everywhere" does not have a "Latin" alphabet as the basis.
You really underestimate how widespread use of Latin script is, especially in computers. People cope just fine with domain names (IDN is generally still rare), foreign brands, names of famous people etc. Sure, your central Chinese rice farmer living in a remote village might only be exposed to Chinese script, but once you get to someone familiar with a computer they will likely be able to cope with ASCII text just fine even if they don't understand the meaning.
Not sure if this is still the case or only some legacy stuff, but at least for a while chinese websites used numbers instead of names, China Railway for example uses 12306.cn as their domain name.
Where do you find these computers with keyboards that do not have the full set of ASCII letters printed on them? ASCII and Latin script are the lingua franca, they're not hard to learn and it's not unreasonable to ask that everyone who wants to use computers learn them.
That's the most Ameri-centric thing I've read all day. I assure you that:
- There is no such thing as a "standard keyboard"
- Not everyone can type Latin alphabet
- Almost nobody can read, speak, hear or write latin, and it makes no sense to "hear latin alphabet". How do you pronounce "Bordeaux" using only your knowledge of the Latin alphabet? How do you pronounce "Queue" using only your knowledge of the Latin alphabet? You realise native speakers of various languages will pronounce words (for example "pain") and even letters (for example "w", "j", "y") very differently? Etc. My name is pure ASCII and every foreigner pronounces it very incorrectly (even though all the sounds exist in the English language already, just mapped to different letters).
- Not every country is related to the Roman culture.
Anyone can listen and understand when a person spells "Q U E U E", even if they don't speak English, so long as they share a common (spoken) language. Not so with "列"
Wait, so anyone can understand written English even if they don't speak English? So if I type in German you can understand it even if you don't speak German? Wie funktioniert das?
People can spell out words in a language they don't understand, as long as they understand the letters. This is good for names. People can only do this if they understand the letters.
They don't have to pronounce those: they can just spell out the letters: M-E-B-D. It's not that hard. But if the customer's name is "坂本", how exactly is the bank employee supposed to say that?
> Everyone can read, write, speak and hear the basic Latin alphabet.
Let’s set aside everyone who is illiterate. There isn’t even a single way to speak the Latin alphabet, as pronunciation depends on the language. I have a hard time believing there aren’t people who can only read and write in their native non-Latin alphabet.
I don’t know what you mean when you say everyone can hear the Latin alphabet. Listening to sounds has no relation to the language spoken. I can hear Korean just fine, doesn’t mean I understand the meaning of the words or understand their alphabet.
> Not even an English/US thing
It is common for languages which use the Latin alphabet to have diacritics and characters not present in ASCII.
The Latin alphabet does not have a single pronunciation. And the vowels are ambiguous across the most common languages so you can't even make an argument that they are intelligible.
I instantly conjure up an image of a Roman citizen, a retired soldier now, speaking Vulgar Latin, picking up the phone and dialling the Comitia Centuriata to lodge a complaint about an unpaid engagement in the Punic Wars.
No. This is in the EU, and specifically Belgium. The claimant's name probably uses characters that are used in French as French uses diacritics and is an official language of Belgium.
So I suspect that the ruling really is that the bank should use a system that allows to correctly write the country's official languages, which seems quite reasonable.
Now under EU laws this would probably extend to EU languages, with the likely caveat that, I think, different alphabets are expected to be transliterated into the local one. E.g. Greek names are expected to be transliterated into the latin alphabet in Western Europe.
I am pretty sure that there is no expectation that people can use their names written in, say, Chinese characters as that is not reasonably legible.
> Taking into account the purposes of the processing, the data subject shall have the right to have incomplete personal data completed, including by means of providing a supplementary statement.
You can just put a note on the account: "Actual name contains accent acute over second e of first name: our system is unable to render this. User is quite sensitive about this so apologize again for our inability."
Not every organization would require this, it depends on the processing. A bank would since it sends you a bunch of correspondence. A restaurant taking a reservation wouldn't if they wouldn't save and sell the information or use it for marketing.
I find it mildly fascinating, because the parent's argument boils down to: it is more work. As others have already pointed out, airlines may not be the role model for this. More to the point, it likely is better to have actual individual identifiers assuming identification is the actual goal.
Naturally, maybe the goal is just to get this train moving somehow.
Airlines are probably the laziest of them all. You should always store the original name. If other foreign people are supposed to write or read it, then ask passengers to write in their Romanized name as well. Both should be displayed, but only the original should be used for "ID" purposes.
It is not clear whether airlines hold travellers hostage to spelling sorrows and miseries, or the airlines are being held hostage to the spelling sorrows and miseries by travel reservation systems.
The airlines get their data feeds from the travel reservation systems – the ones we (or the travel agent) interface with via the airline website or a dedicated web portal. There are two major global ones, Amadeus and SABRE (there are other ones as well). I do not know how any of them interconnect, though.
There have been numerous attempts to modernise both (including rewrites from scratch), and all the attempts have failed so far due to the complexity of the logic: calculating the optimal travel time for connecting flights in a multi-leg trip is, like, very hard, apparently, plus applying the correct airfare prices based on the selected itinerary and a myriad of other variables. Subsequently, there have been decades' worth of growth of the said logic – there was an epic story on here some years back about one such endeavour (I think it was an Amadeus rewrite).
From what I remember, the character encoding was not even considered to be a problem in that endeavour – as in «acknowledged, is trivial to solve, now let's move on onto the actual problems».
I'm Russian and I have to live with three similar names: my real one written in Cyrillic, the stupid transliteration into Latin I have in my international passport (and by extension everywhere I show it abroad), and Gregory when I introduce myself to someone who doesn't speak Russian.
Will a bank in a country that doesn't use the Cyrillic alphabet agree to write my name in Cyrillic? I'm 99% sure no. So, organizations fully supporting the writing system of the country in which they operate is a reasonable expectation. But I never expect anyone abroad to agree to write my name in an alphabet they can't even read.
Tangentially, Turkey renamed itself in English into "Türkiye". This suffers from the same issue — there is no letter ü in English. No one knows how to spell that. So it's no surprise the new name didn't stick.
I don't really care how I'm identified, provided I know it's me.
But the security problems are real and very annoying, even with a "normal" name. For some silly historical reason, my middle name has two different spellings depending on which official document you're looking at, which causes no end of problems. Conversely, I know people with very simple, common names, the kind where two people born in the same city on the same day have the same name, that suffer a load of different but equally frustrating issues.
These problems, however, aren't just limited to computer systems [0]. Border officers, bank clerks and government officials and other general bureaucrat types are all just as bad as a strict strcmp implementation.
The issue is that many people operating in an official capacity seem to work under the assumption that a name is a globally unique identifier that has exactly one representation, but that's not how names work in any culture.
On the other hand, if not names, then what? Combined with birth dates, it's the closest thing we have to a globally recognised unique immutable identifier, and any better alternative is going to feel invasive and face a lot of opposition on privacy grounds.
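To illustrate the strcmp point: even the "same" name can fail a strict comparison when one system stores a precomposed character and another stores the base letter plus a combining mark. A small sketch, where `same_name` is a hypothetical helper and NFC plus case folding is just one possible choice of normalisation, not a complete answer to name matching.

```python
import unicodedata

def same_name(a: str, b: str) -> bool:
    """Compare two names after Unicode NFC normalisation and case folding."""
    def norm(s: str) -> str:
        return unicodedata.normalize("NFC", s).casefold()
    return norm(a) == norm(b)

composed = "J\u00fcrgen"     # 'ü' as a single code point
decomposed = "Ju\u0308rgen"  # 'u' followed by a combining diaeresis

print(composed == decomposed)           # False, the strict comparison
print(same_name(composed, decomposed))  # True
print(same_name("Jürgen", "Jurgen"))    # False, dropping the umlaut is a different string
```

This only fixes the encoding-level mismatches; it does nothing for the two-spellings-on-two-documents problem, which needs a human or an alias list.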
In Germany, it is intended to be hard to change your name; in the USA, the situation is different. Also in Germany, if you change your name voluntarily (without having a very good reason (CLARIFICATION: of course marriage is a good reason)), this is considered to be a strong sign that you deeply hate your parents (and did the name change because of that). Indeed, the people I know who changed their first name to their middle name did it exactly because of this.
I am also German. The only people (in Germany) who I know and did a (forename) name change did this because they hated their parents, since changing the name is an open signal that you deeply hate what the parents did to you your whole life (including giving you your old name).
Here in Flanders some do, most don't. Why would they? They are as much a person as their husband, with an identity and family that is not any less valid than their husband's.
My point isn’t that they should (my wife didn’t) only that it’s very common for them to do this so thinking of names as mostly immutable is incorrect even where other forms of name changes are rare.
This is in fact one of the iconic places where lack of diversity in software teams causes problems. Men may have a blind spot around name changes that women don’t, because of their lived experience.
I rather had forenames in the back of my mind when I wrote my post. But indeed, because it is hard to change your name in Germany, I know quite some women who used marriage to get rid of a surname that they did not like, because there is hardly any other socially accepted way to get rid of it.
How the state identifies you, is one of those things that require a single source of truth. Civil registry is usually considered the source of truth, as in that everyone that doesn't agree has to agree with it.
“In the United States, vital records such as birth certificates, death certificates, and frequently marriage certificates are maintained by the Office of Vital Statistics or Office of Vital Records in each individual state. Other documents such as deeds, mortgage documents, name change documents, and divorce records, as well as marriage certificates for those states not centralizing these records, are maintained by the clerk of court of each individual county. However, the term 'civil registry' is not used.”
I think that means your marriage may be documented in county C1, your subsequent divorce in county C2, and your name change in county C3, none of which need be in your state of birth.
Ask anyone with a western name who moved to Japan (or a myriad of other countries, but I am in the process of doing the same right now so it hits close to home) whether the state has a single source of truth for their name.
Yeah, some people think about their heritage as not compatible with the brave new world, so get rid of it. I know people giving their child dedicatedly ascii-only name common in the USA to make sure their children won't have a "weird name" when going to the USA.
> Yeah, some people think about their heritage as not compatible with the brave new world, so get rid of it. I know people giving their child dedicatedly ascii-only name common in the USA to make sure their children won't have a "weird name" when going to the USA.
And? That's what I did - a simple name that he won't have to repeat or spell out for English speakers, won't have to spell out for many non-English speakers, and won't have trouble with in non-Latin scripts.
Anything you can do to make your child's life easier trumps any value you think they might get from maintaining cultural or traditional links with a mostly dead past.
I legally changed my name about twenty years ago, after experiencing multiple days of wasted effort dealing with systems that a) couldn't render my name correctly and b) couldn't cope with fact that their incorrect rendering of my name might not match other legal documents that correctly render my name, or incorrectly render it in a different way.
This is the kind of compromise I'd never take. (Yet still a million times closer to my world view than the decision I mentioned. Guess I'm stubborn in some topics.)
In some places there is an officially acknowledged list of valid given names (for citizens born in the jurisdiction, eg. Hungary), and that format of the name might not be on it. Elon Musk would be in trouble here naming his children.
It is a debatable practice, yet I think this is somewhat protecting the children from silly decisions of parents (some examples from Hungarian tabloid media where parents were outraged: Fradi / Fradika (from the slang name Fradi of the FTC football club, without and with diminutive appendix), Traktorka (tractor with diminutive appendix), various names from soap operas, WoW characters, Shrek, Satan).
So think of the children! :) But really, this is the other end of the spectrum. And this (giving especially weird names to children) is a fine recipe for making the child subject to bullying in my opinion.
I can accept this, but who decides what is too silly? Like here it is already clear from the allowed names list (reviewed, extended every year actually, and contains some quite silly entries already in my opinion.)
Also we have the idea of "nameday", which is a minor occasion for celebration and giving small gifts every year, based on the person's given name, and the calendar contains the mapping between names and dates; someone without an official name would be in trouble getting that free bottle of wine :D
Well, it's not so much about 'silly', silly is fine, Musk's names would be silly, it's more about ensuring the kid won't have a guaranteed bad start to life with a crappy name like "hatesjews". On the other hand if you call a kid 'boggle', they might get teased, but might own it and be super cool and love it.
IMO only the names that would be damaging beyond a reasonable doubt should be prohibited.
Growing up in Ireland in the 80s/90s, the only Joel we were familiar with was “Billy Joel”. On Irish radio, his surname was universally pronounced as two clearly distinct syllables: Jo-Elle. It wasn’t until I was in my thirties that a work colleague from another country pointed out that we were all saying it wrong.
Even the Icelandic singer, Björk sometimes had her name pronounced more correctly by DJs. I guess the diacritic was a big clue that the vowel is not pronounced like the ‘o’ in “fork” (the most common way her name was pronounced) and pretty much everyone knew that the ‘j’ sounds like a ‘y’ in English.
I have been working on obtaining Greek citizenship, and it has been crazy dealing with the documentation from my late grandfather who never updated anything upon moving to America. There are two ways to spell his surname depending on if the documents come from Greece or Albania (due to borders changing between countries since his birth). When he was in the process of immigrating to the US through Ellis Island, his last name was converted to Greeklish (English alphabet representing Greek characters) on all of the manifests. This resulted in a few mixed spellings. At Ellis Island they ruled his name would be too confusing for Americans, so they entirely altered the spelling to make it phonetically more pleasing to an American. Fortunately, his name was short enough where the new name sounds the same as it does in Greek. My grandfather proudly took his new name and enjoyed his new life! Many immigrants from Greece had their names shortened and completely altered. To a Greek, they are incoherent gibberish with no meaning.
Then there is the issue where women’s surnames are conjugated into a feminine ending, so women have similar but different last names than their husbands. The US didn’t like this and didn’t understand this practice for a long time. I know a few women that have dual citizenship and maintain a passport for each country with a different spelling of their last name. Updating documentation in one country to the other causes all sorts of issues with this setup (since everything requires a translation via an apostille). My friends recently got married in Greece and then separately again in the US at a courthouse to avoid all of the paperwork and translation pains from the woman’s Greek maiden name differing from her American maiden name.
Many of these issues are easier to deal with now, but they have a long history of confusion and extra effort going back more than 100 years. Safe to say, I’m relying on documentation from my other Greek grandfather where things are more straightforward for pursuing my citizenship.
Question for you as this seems to be your lived experience: how would you correct the issue stated in the article? They give an example of characters like "á, è, ô, ü, ç". Say your name included a "ç" and you're at a movie theater trying to get your call in tickets. Their keyboard notably does not have a "ç" key. How do you guide them toward success?
This story happens in Belgium. All national languages (Dutch, French, German) have plenty of accented characters. This EBCDIC variant was never a valid technical choice for the country. Sad that we needed the GDPR to finally whack some sense into this bank.
I worked in a similar national organization. In 2015 I fought to have a newly built system created in anything other than cp1252 and ISO8859-1. I lost. The architects flat out forbade UTF-8 and required 1252, in the name of consistency.
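For anyone wondering what that choice costs in practice, here is a rough sketch in Python, using only the standard codecs, that lists which characters of a name a given encoding cannot store. The helper name is made up, and the sample name is the Romanian place name mentioned earlier in the thread.

```python
def unencodable(text: str, encoding: str) -> list:
    """Return the characters of `text` that `encoding` cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing

name = "S\u00e2ngeorz-B\u0103i"  # a-circumflex is in Latin-1, a-breve is not

for enc in ("cp1252", "latin-1", "utf-8"):
    print(enc, unencodable(name, enc))
```

Both single-byte code pages lose the a-breve, while UTF-8 stores the whole name. A check like this can at least tell you up front which customers a legacy code page will misspell.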
In Dutch we also have the letter IJ which is part of my last name. But neither ASCII nor EBCDIC has it, so some family members have changed it to 'y' and some are using 'ij' to make up for that. So on paper our family lineage is now split because these changes have meanwhile made it into birth certificates and legal names and it's so ingrained that it can't really be changed any more.
In Italy, in Veneto (the region around Venice) the local dialect tends to omit the last letter of a word if such letter is a vowel.
Because of that, my grand father and his sister happened to have different family name in their ID cards: when they got registered by the priest at the beginning of 1900, on a paper book of course, the priest who registered the sister added an "I" at the end of the family name as he thought that that should be the right spelling in proper Italian. At that time ID cards did not even exist so nobody bothered.
In Greece there are different male/female variants of the same surname, so it would be normal for a married couple to have slightly different surnames.
e.g Papadopoulou vs Papadopoulos.
Then there are the cases where a Greek wife takes the male surname form to co-exist more easily in other countries where it is expected for a husband/wife surname to match (or if different, be more considerably different).
I think most Dutch people nowadays think of that as two letters: i-j. It's called a "long ij", as a counterpart to the "short ei" (same pronunciation), which definitely is considered two letters.
The only place where the difference between 1 letter and 2 letters still is visible is in names that start with a long ij, because of the capitalization. E.g. the name of the city IJmuiden is written like that and not as Ijmuiden.
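The capitalization rule is easy to get wrong in software, because a generic "uppercase the first character" routine produces Ijmuiden. A minimal sketch in Python follows. The function name is made up, and it simply assumes every leading "ij" is the Dutch digraph, which will not hold for every word or language.

```python
def capitalize_dutch(word: str) -> str:
    """Capitalize a word, treating a leading 'ij' as a single letter.

    Simplification: any leading 'ij' is assumed to be the Dutch digraph.
    """
    if word[:2].lower() == "ij":
        return "IJ" + word[2:]
    return word[:1].upper() + word[1:]

print(capitalize_dutch("ijmuiden"))   # digraph-aware capitalization
print("ijmuiden".capitalize())        # what the generic routine does
```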
Yes, it's changing. Fortunately 'my' branch of the family picked 'ij' so we're in the clear ;) But for some of the older members it is causing all kinds of trouble, especially when interacting with the authorities, more so if one set of documents uses 'ij' and another through accidents of history uses 'y'. Some schools even teach 'ij' as the 25th letter of our alphabet, when they really should be using 'y' for that spot and 'ij' as a two letter combination.
Also, when spelling words people use 'ij' as a single letter; for instance, a Dutch person would spell 'mijn' as 'm', 'ij', 'n' (and they would likely not say 'lange ij' because there is no word 'mein' in Dutch).
It also causes no end of trouble with storing and searching because a Dutch person might expect IJ to be sorted after 'X' but instead it appears sorted as the combination I J . The fact that different reference works use different methods doesn't help either.
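To illustrate the two sorting conventions, here is a rough sketch in Python. It is not a real Dutch collation: the key function is hypothetical and naively treats every "ij" as the digraph and files it under "y", which is only one of the conventions mentioned and would misplace words where i and j merely happen to be adjacent.

```python
def dutch_sort_key(word: str) -> str:
    """Sort key that files the 'ij' digraph where 'y' would go.

    Simplification: every 'ij' is treated as the digraph.
    """
    return word.lower().replace("ij", "y")

words = ["zon", "ijs", "xylofoon", "idee", "jas"]

print(sorted(words))                      # 'ijs' lands among the i-words
print(sorted(words, key=dutch_sort_key))  # 'ijs' lands after the x-words
```

A production system would use a proper locale-aware collation library rather than a string replace. The point is only that the two orderings really do differ.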
Ahhh interesting, I didn't know it was an entirely separate, single letter! I was in Amsterdam a few weeks back and saw a lot of signage featuring "ij" had it stylised with the "i" sitting just above the tail of the "j" but never considered that it was just one letter.
Czech has this too, sort of, where "ch" is considered one letter but it is composed of two separate characters.
Ha, yes, I have an 'eij' in my last name and constantly explaining it's both the short ei and the long ij is maddening. In reality of course, it's just an e followed by an 'ij' but yeah no never mind.
To be fair, digitally, IJ is simply spelled using the separate letters I and J in Dutch, despite it being a digraph. That codepoint you used is deprecated in Unicode and only there for historic compatibility reasons. Your family members using 'ij' are simply applying the correct orthography, with those using 'y' just digging themselves into a hole.
I don't know what the correct codepoint is but it certainly isn't deprecated in the language and that's the bit that counts, historic compatibility wasn't part of ASCII and that's what caused this. 'ij' takes the same spot as the Greek letter 'y' in the Dutch alphabet and that's what stops this from being resolved to everybody's satisfaction. "Those using 'y'" -> people who didn't have a choice in the matter because the official that made the change did so without their consent in a couple of cases and once it is on your birth certificate good luck trying to change it retroactively across all of your documentation.
You are confusing letter with codepoint and glyph. The IJ is one letter, which consists of two glyphs, and in the Unicode implementation two codepoints. This isn't some historic oversight in Unicode, it was implemented like this based on the Dutch orthography.
Yes, this can lead to bugs in software, but so can anything related to names.
It is deprecated in Dutch to use a single codepoint for the IJ. That was never really an option in any of the character encodings in popular use.
The fact that it is one letter is relevant in cases like (vertical) lettering (which most designers nowadays fuck up), in typography (the number of fonts which make ij look awkward and unaligned is huge), and in collation and sorting using a Dutch locale. I will defend its proper use and treatment where possible, but representing it as a single codepoint is not a sensible goal, and never was.
I do not believe this to be correct. Wikipedia says it's a digraph of two letters. It does say that the codepoint is deprecated, but it's only defined as "compat", not as deprecated in the unicode data.
It is (sometimes) one letter culturally speaking and in lettering. A Dutch alphabet as taught to children used to end in 'X IJ Z' instead of 'X Y Z', although this is no longer the case ever since people started eating 'yoghurt' in the twentieth century. In capitalisation of words too it is treated as a single 'letter' (e.g., 'IJsselmeer' for the lake; note the uppercase 'J').
If you dig deeper on the Unicode website you'll find that the reason those codepoints are included is compatibility with 'certain very rare legacy (non-Unicode) character encodings'. They are not 'deprecated' as compatibility characters for those old legacy encodings, but 'deprecated' as suitable for rendering Dutch text unencumbered by those early code pages.
Words have meanings. "deprecated" has a very specific meaning in Unicode and it does not apply to this code point. A lot of characters we use on a daily basis are marked <compat>.
This is not a codepoint in daily use. It never was outside of those few legacy encodings. If it is not deprecated, it is not so because it never was in common use in the first place. The concept of the ij/IJ as a single codepoint is deprecated, regardless of the technical classification in Unicode.
If you are claiming that 0x0132 is a codepoint in common use or required for correctly spelled Dutch, you are mistaken.
> If you are claiming that 0x0132 is a codepoint in common use or required for correctly spelled Dutch, you are mistaken.
I made no comment in support or opposition of that. I cannot talk to that as I’m not familiar with either Dutch or that letter. However there are many compat characters in Unicode and there are incredibly few deprecated ones so I was addressing the deprecation claim (and what I believe is a misuse of the term letter). You’re interpreting things into my replies that just aren’t there.
Compat characters are very useful and even if they are not stored all the time, they often show up in text processing in memory for better glyph selection.
> The Unicode Standard encodes these two compatibility characters [0x0132 IJ and 0x0133 ij] to provide support for roundtrip conversion of the Dutch letter 'ij' in certain very rare legacy (non-Unicode) character encodings. It is strongly preferred (and far more common) to use the two character ASCII sequence 'ij' to represent this letter instead.
You can dig in the Unicode mailing lists for discussions on this from over twenty years ago. The bottom line is that you shouldn't use 0x0132 and 0x0133 in modern text. By now this is a resolved issue.
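For what it's worth, the compatibility mapping is visible from Python's standard unicodedata module. The sketch below (the helper name is made up) shows the decomposition recorded for U+0132 and uses NFKC normalization to turn the single-codepoint form into the two-letter sequence. Note that NFKC also rewrites every other compatibility character in the text, so an explicit replace of just these two codepoints may be the safer choice.

```python
import unicodedata

LIGATURE_UPPER = "\u0132"  # LATIN CAPITAL LIGATURE IJ
LIGATURE_LOWER = "\u0133"  # LATIN SMALL LIGATURE IJ

def to_plain_ij(text: str) -> str:
    """Replace compatibility characters (including U+0132/U+0133) via NFKC."""
    return unicodedata.normalize("NFKC", text)

print(unicodedata.name(LIGATURE_UPPER))
print(unicodedata.decomposition(LIGATURE_UPPER))   # tagged <compat>
print(to_plain_ij(LIGATURE_UPPER + "sselmeer"))    # two plain letters I and J
```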
I'm not sure this issue can ever satisfactorily be resolved by ignoring it. Yes, it is customary even among Dutchies to just type i-j because it's easier, but the Dutch alphabet does have a dotted ij as its 25th letter. It's not a ligature in the source language, it is a single letter; as evidenced by the fact that the proper noun IJsselmeer (among others) is written with two capital letters (if not using the single 0x0132 IJ). The letter y does not exist in the Dutch script, only in loanwords (of which we have plenty).
Stop repeating that nonsense! The current Dutch alphabet has 'Y' as its 25th letter. Not 'IJ'. This is a historical thing.
The 'IJ' is a letter, culturally speaking, in the sense that it is capitalized as one, and that it is often rendered as a single unit (which you can see in vertical lettering, if done right), and sorted as if it was a single letter. In terms of character encoding however, it is a 'i' followed by a 'j'.
Swiss story 1: immigration authorities insisted adding my wife's birth name to her official name because "that's how it's done here". 20 years later it's still popping up here and there causing confusion.
Swiss story 2: I was outvoted on having an application properly internationalized; instead they dumped all resources into extra database columns because "this country will only ever need three languages". Fast forward: it took one year to move it into resources, because somebody high up decided English and Rumantsch would be needed too and rewriting all those language-related SQLs turned out to be overkill.
I had a professor with a foreign name from someplace that didn't use the Roman alphabet. His credit report had like 30-some-odd spellings of his name.
When asked "So what is the correct spelling?" he snarkily shot back "You tell me."
It's not a standard western name. It wasn't originally written in English. Attempts to translate it to English -- well, they are guesses on how to spell what someone thinks they are hearing from a language that the Roman alphabet isn't designed to capture.
Yeah I just bought a new home in Birżebbuġa, Malta and you bet every single piece of official and unofficial documentation has it as Birzebbuga. No one wants to deal with this.
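One way systems could meet people halfway here is to keep the real spelling and use the stripped one only as a lookup key. A minimal sketch in Python follows. The function and the tiny index are hypothetical, and the approach only works for letters that decompose into a base letter plus a combining mark, so characters like ł or ø pass through unchanged.

```python
import unicodedata

def strip_marks(text: str) -> str:
    """Drop combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Store the correct spelling, index it by the folded form (illustrative data).
places = ["Bir\u017cebbu\u0121a", "Valletta"]
index = {strip_marks(p).casefold(): p for p in places}

def find_place(typed: str):
    """Look up whatever the clerk typed, with or without diacritics."""
    return index.get(strip_marks(typed).casefold())

print(find_place("birzebbuga"))  # returns the stored spelling with diacritics
```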
> "No, I'm sure I'm in your system. Try this other spelling."
Every Turk, Russian, and Serb enters the chat.
Transliteration without deterministic, reversible fidelity sucks.
PS: I have a "normal" given name people spell 4 different ways and surname people spell 3 ways. ]: I always spell them out. Just be glad your last name doesn't have 4 "y"s as their legal American name like a former coworker.
I see you haven't interacted with companies in the US much. I write my name on an online form, a minimum-wage employee reads my name from a screen and types it into another application, and gets it wrong. Now the online account and the actual billing have my name spelled differently.
U.S. concept of automation is that a cheap employee does something in the back office..
Even if you are 146% careful, there is still a chance that your own country officially renames you by introducing new rules for transliteration between its alphabet and the Latin one. Happened to me in 2020 - the name I use online matches my old passport, but not the new one.
I live in China. According to my bank, my full name is LASTNAMEFIRSTNAMEMIDDLENAME. All caps, no spaces.
When it comes to the amount of ass-pain this can cause, that's just the tip of the iceberg.
Sometimes it's funny though. At hospitals, there's an automated system that announces the next person by name (GDPR? Privacy? What's that?). For non-Chinese names, it reads the name letter by letter. So now everyone in earshot knows there's a foreigner around somewhere.
That is a very narrow view. You assume that your set of characters is appropriate for representing names in the entire world. Why this specific set of characters? Cyrillic or greek could work just as well, they have a limited set and no diacritics.
I don't even have a name that's very "weird", I have an Irish last name, in Boston, and the O' part is something no computer can reasonably tolerate in 2023. I've missed flights. I've had duplicate accounts because some places drop the apostrophe and some don't, I've been given badges at tech conferences where its O' - it's infuriating.
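The duplicate-account part of this at least is avoidable if the matching key ignores how the apostrophe was typed. A rough sketch in Python follows. The set of apostrophe-like characters and the function name are assumptions for illustration, and the key is meant for matching only, while the stored and printed name should stay exactly as the customer wrote it.

```python
# Characters commonly typed or auto-substituted for an apostrophe (assumed list).
APOSTROPHE_LIKE = "'\u2019\u2018\u02bc`"
_DROP = {ord(ch): None for ch in APOSTROPHE_LIKE}

def surname_match_key(surname: str) -> str:
    """Key for matching only: drop apostrophe variants and spaces, fold case."""
    return surname.translate(_DROP).replace(" ", "").casefold()

variants = ["O'Brien", "O\u2019Brien", "OBrien", "O Brien", "o'brien"]
print({surname_match_key(v) for v in variants})  # all variants share one key
```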
This hits so very close to home for me. I'd rather prefer that certain things stay in ASCII, even though my name and surname are not originally written in English.
This problem is too well-known to everyone who has their original docs written in a non-English alphabet (read - everyone with diacritics or cyrillic script or hebrew or heaps of other non-Latin-based writing systems).
To add insult to injury - this is not even about some common transliteration tables from non-Latin to Latin script. Sometimes rules of transliteration are changed based on local govt whims, and if you get caught in such bureaucracy - all hell breaks loose. You can't obtain a document with the previous "latin spelling", because it's changed in the system. This way you may have a ticket with one spelling and a passport with another. Or, like in my case, my brother and I have different surname transliterations to English. And my first name would be spelled with 4 different letters than it is in all my other documents if I had to issue a document from scratch. Good luck explaining all this nonsense to anyone who is trying to "character-match" someone's hard-to-pronounce surname from an official document. Even worse when they try to re-type any UTF-8 characters without knowing the correct character codes or having the corresponding layout installed (most of the time in international airports and customs).
So, I'm personally fine with something being incompatible with GDPR, as long as certain systems stay ASCII as long as they can. We'll be opening a whole new can of worms if internationally used documents are in UTF-8, imo.
So the bureaucrats are winning (as they always do).
Demanding that your name is written down in a specific way is just stupid.
I always found it elitist when people insist on a specific spelling. Typical examples are "Philip/Phillip/Philipp", "Stefan/Stephan", "Harald/Harold", "Erik/Eric/Erich", "Michael/Michel/Mikail" and so on... all these variations refer to the same name. Different dialect, different language, the spelling varies. So what? It's still the same name!
IMHO, the cited Article 16 doesn't necessarily demand a rectification of the spelling. This is about meaning, not syntax. But that's surely up to interpretation.
Missing context in the blog post: in the original court document, the bank mentions that it is already upgrading to a new system that doesn't have this issue. It just isn't done yet. So the bank isn't really refusing to fix this. They are just saying they can't fix it right now.
Also, the idea in the blog post that everybody can simply choose not to use EBCDIC, is a bit naive. For example, the Visa protocol for payment messages contains message fields encoded in EBCDIC. This is a specification implemented by thousands of Visa member banks and various types of intermediaries. You can't "just change" anything in a spec like that without a massive amount of planning and pain.
The document states that the modifications to the system should have been made in 2020, but in 2021 they still couldn't spell the names of all their customers right, and the bank refused to provide a concrete timeline for the fix. The project that was supposed to fix their naming problem ended up being too complicated to implement and they stuck with their old systems.
International VISA transfers weren't the problem here. Bank statements and online environments carried the wrong name, and those had to be corrected. No sensible court would make a bank disconnect its payment systems because the receiving end uses a shitty system, but when it comes to communication between the bank and its customer, there is no such constraint.
Furthermore, EBCDIC had already been extended to support the problematic characters long before the lawsuit took place. "Yes but EBCDIC database" wasn't a good excuse because EBCDIC itself could handle this specific edge case just fine.
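This is easy to check for the EBCDIC code pages that ship with Python. The sketch below assumes cp037 and cp500 as examples, since the thread does not say which code page the bank actually ran. It round-trips a few accented letters through each and shows that the Latin-1 repertoire survives, while letters outside it still do not.

```python
def roundtrips(text: str, encoding: str) -> bool:
    """True if `text` can be encoded in `encoding` and decoded back unchanged."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeEncodeError:
        return False

# cp037 and cp500 are EBCDIC code pages in the standard codecs (assumed examples).
for enc in ("cp037", "cp500"):
    print(enc, roundtrips("\u00e1\u00e8\u00f4\u00fc\u00e7", enc))  # accented Latin-1 letters

print(roundtrips("\u0151", "cp037"))   # o-double-acute is outside this code page
print("A".encode("cp037"))             # same letter, different byte than ASCII
```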
By the time an investigation was done, the bank had made advancements and the name of the original plaintiff was representable, so they didn't receive a fine on that count (though they ended up receiving an impartiality related fine for their privacy officer) according to https://ellentimmer.com/2022/01/12/banken-7/
EBCDIC was obsolete in the 90s, and people in Europe have had names that EBCDIC couldn't spell since at least then (technically true! :D).
The issue is that whatever bottom of the barrel priced contractor they used gave them a "cheaper" new system in 1995 that was already obsolete. Now given the era, they could have made the choice to use wide chars instead and it would have been a reasonable choice (wrong in the long term, but still reasonable - unicode was only a few years old at the time), and wchars are easily (ymmv) extendable to utf-16 without having to re-encode everything which is the problem they have with EBCDIC.
I agree on the object level that banks really ought to spell our names right.
However I strongly doubt that we should force them to do so by law. And I am outright against using something called 'General Data Protection Regulation' to force internationalisation on organisations.
If you want to force an organisation to support a sensible character set, please make a law or regulation that explicitly demands that. Instead of sneaking it in via the backdoor.
Scope creep is just as bad for laws as it is for any other project.
The law clearly and unambiguously states that inaccuracies must be corrected by the data controller. That their tech stack doesn't support internationalization, rendering them incapable of fixing inaccuracies, is irrelevant (and their problem). This isn't a backdoor, it's the law working as intended.
> This isn't a backdoor, it's the law working as intended.
I certainly think the lawmakers' intention was that for example, if someone called Stefan has had their name mis-entered as Steffan, data controllers must correct the inaccuracy. So I agree correcting names is within the scope of the law.
But if some comedian decided to legally change their name to "Baron Venom Balrog Sabretooth Vader Megatron Vegeta Robotnik Magneto Bison Sephiroth Lex Luthor Skeletor Joker Grind" and declares none of it can be omitted - I'm not sure lawmakers intended to require all data controllers to support that? Envelopes are only so large, after all.
So I think legislators intended to make some allowances for the limitations of computer systems.
Most, if not all, EU countries have laws on personal names. In my country if a person has a name and surname that are comprised of more words, they need to choose which of those words will be used in legal documents. So in your example, depending on comedians choice, all data controllers would have the comedians name as "Venom Bison".
Strong agree. Competition around personal banking is notoriously weak. If it were stronger the banks would have a strong incentive to offer a good customer experience, but without it they'll stick with an old system that is "good enough" because that is cheaper than replacing it with one that actually offers a better customer experience. Well, now they have a strong financial incentive (fines) to make it better. Good job GDPR.
Even if there was competition, banks, as an industry, are given a monopoly on a public service. Just like we shouldn't let the water company do whatever the f*k they want, but hold them to standards of service, the same is expected of banks.
If my bank doesn't support the letter A in names, they can't spell my first or last name correctly. Why shouldn't the bank—in the year of the linux desktop 2023—be forced to fix this? We live in an age where any character is easily representable in multiple encodings.
Wrongly spelling people's names (regardless of the reason) increases the risk of error and fraud. It also makes it difficult or impossible to request that information about you be removed or turned over if you wish to request it (how are you supposed to request information about yourself if their records have your name spelled wrong, and there's no way to correct it?). Just because the fix here is a technological one doesn't mean that it isn't a problem.
Representation is one thing, but for the user, presentation matters; the user in the article wouldn't have been happy if the bank had stored the diacritics but still communicated without them. I'll bet that 99% of systems that happily handle a Russian or Korean name will store a Mongolian name correctly but utterly fail at printing it.
I don't. It's clear that if there's no legislative push to make them comply with common sense they'll just keep using the systems they set up in the 60s forever, regardless of the incompatibilities or security risks it poses. Upgrading to a new system costs money, aka the only thing they care about. So having some heavy fines ready to incentivize complying as the cheaper option is required.
Customer choice won't make any difference on this sort of issue since it only affects rare cases. Especially when this sort of news is published with them being called "bank X" and no way for anyone to know which scumbag bank this is so it can't be boycotted.
Does it really count as "internationalisation" when the names in question are common local names that have been around for longer than the bank probably has?
> I agree on the object level that banks really ought to spell our names right.
That is even more so as it spills into CC verifications, so bank limitations “infect” many payment aspects. I had that issue because I moved country and my original bank had no support for customers with a main residence outside the country, so the entire billing address had a weird-ass setup to be fit (a nonsense zip code and the actual country of residence filled in an ancillary field), and using the correct address for billing would fail their checks.
> However I strongly doubt that we should force them to do so by law.
Why not? This is literally a generations-old issue, clearly technical and social pressure has not fixed the issue.
> If you want to force an organisation to support a sensible character set, please make a law or regulation that explicitly demands that. Instead of sneaking it in via the backdoor.
There is no backdoor, that’s a front matter of the law. While the safeguarding of data is the most well known prong of GDPR, it has always included a right to accuracy and rectification, quoth Article 16:
> The data subject shall have the right to obtain from the controller without undue delay the rectification of inaccurate personal data concerning him or her.
And Article 5 1.d:
> [Personal data shall be] accurate and, where necessary, kept up to date; every reasonable step must be taken to ensure that personal data that are inaccurate, having regard to the purpose for which they are processed, are erased or rectified without delay
I found it genuinely bizarre that my old UK bank's international money transfer system could not handle such wildly obscure international letters as ä and ö in my new address outside the UK. This was perhaps five years ago so absolutely inexcusable.
I am not suggesting that one law be applied to remedy the transgression of a different law. I meant that banks are "forced" to be accurate by multiple laws, and it's in the bank's interest to fix its encoding challenges.
I wouldn't mind not caring about regulating banks, had we not made a deal: banks are given a monopoly to be a (quasi-) public service, which in effect, is a license to print money. In return, we require them to provide that public service to most.
We could always nationalize banks (which may not be a bad idea), but as long as they're a public service, they should behave like one. And if the easiest way to do it is using privacy laws, so be it.
they are already public service-ing through KYC and SAR (suspicious activity report)
I have brutal firsthand experience of how compliance cripples commerce (simple wire transfer between EU member states 2 months of oops wait a bit compliance).
No, the public service is having a banking system.
KYC and SAR are part of the costs we ask the banks to pay for getting the almost-literal license to print money. Just like we can require them to use people's correct name.
pardon my irony. what I wanted to point out is that the public service thing the current retail banking system does is surveillance, and it's very effective at it.
I'd like to say that the banks argument is only valid if they're being lazy. AS/400 (now System i) can support Unicode through extensions. They may need to rewrite portions of their code, but it's not impossible, and in the last few decades is inexcusable.
Having worked for 20 years in fintech, I have seen many large European and Asian banks running on COBOL programs written 30 years ago, using EBCDIC. In many cases the sources for some of those programs are lost, due to the fact that proper version control systems did not exist at that time, and nobody dares to re-implement them because they are scared of regressions.
Yep, this is true, but I spent years working on AS/400 systems dealing with this exact issue. The thing is that the data isn't stored on a standard file-system. The record definition can be updated, in most situations, without modifying the application code very much at all. This isn't like an app that most people these days have experience with. IBM did everything they could so, if you followed the guidelines when the code was written, it was minimally painful to fix stuff like this in the future.
Everyone (including me) loves to give IBM shit, but they do actually know what they're doing when it comes to this stuff. There are many lessons and examples from mid-range and mainframe servers from IBM that are either ignored or re-invented in modern tech.
If I remember correctly, in 2010 or so Bank of China was using EBCDIC + a specific codepage by IBM to store successfully Chinese names in COBOL systems. Communication with those systems would happen using IBM MQ Series, a message queue system.
The codepage was very specific, it was not part of the standard Java JDK but it was included in the IBM Java JDK.
Small nitpick: they have been maintained since 50 years ago. Very few programs written 50 years ago have spent the last 10 years without some sort of patch.
Usually, though, some more modern system is retrieving data from that legacy system. The modern system even probably already maps additional data to the records kept there, so adding real_name would be a known pattern.
I think this is a key point. Old tech running perfectly fine; if it ain't broken don't fix it.
There's no way to be fully confident that what appears to be a simple update might not break the whole system. And I'd imagine this is likely what the bank is wanting to avoid rather than having to pay programmers to implement the changes.
You only have to look as far as TSB for a cautionary tale.
But it is broken, as it is unable to support frequent European names.
There's also the case to be made that if the sources are lost, you need to rewrite from scratch, as you never know when you need to implement new functionality that business deems critical. If you only start the new implementation after business is breathing down your neck, you are in a world of hurt.
Imagine being born after the system was installed, having nobody to ask questions and having to fix that, with a possibility to get the whole processing down.
I can totally imagine myself both making this complaint and being the person who totally doesn't want to touch this shit with a 30 meter pole.
It's been pointed out in the article's comments that the article is wrong. EBCDIC code pages 37 and 1047 each have support for all of Latin-1. The bank just doesn't want to support it. Furthermore, UTF-EBCDIC is a thing.
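The code page 37 part of that claim is easy to check from Python, whose standard library ships a `cp037` codec. This is only a sketch: the names are made-up examples, the helper functions are hypothetical, and code page 1047 is left out because it is not among Python's built-in codecs.

```python
# Sketch: round-trip Latin-1 names through EBCDIC code page 37.
# "cp037" is the name of Python's built-in codec for IBM code page 37.

def to_ebcdic(text: str) -> bytes:
    """Encode text as EBCDIC cp037; raises UnicodeEncodeError if a character is unsupported."""
    return text.encode("cp037")

def from_ebcdic(raw: bytes) -> str:
    """Decode EBCDIC cp037 bytes back into a Python string."""
    return raw.decode("cp037")

# Illustrative names only, not taken from the article.
names = ["Zoë", "Gérard", "Müller", "Søren"]
for name in names:
    raw = to_ebcdic(name)
    assert from_ebcdic(raw) == name  # one byte per character, nothing lost
    print(name, raw.hex())

# Every character in the upper half of Latin-1 (U+00A0..U+00FF) survives the round trip.
latin1_upper = "".join(chr(cp) for cp in range(0xA0, 0x100))
assert from_ebcdic(to_ebcdic(latin1_upper)) == latin1_upper
```

Characters outside Latin-1, such as the Polish Ł, still raise `UnicodeEncodeError` with this codec, so it covers Western European names but not every Latin-alphabet name.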
Wow, UTF-EBCDIC is a new one on me! Ugh. Wikipedia says "This encoding form is rarely used, even on the EBCDIC-based mainframes for which it was designed" so that's encouraging.
Huh, someone on Wikipedia must have seen my link to their Code Page 37 article and deleted the entire article 6 hours later. Seems pretty harsh. Wikipedia can have lists of all available Troll Dolls but it can't have an article on an important EBCDIC code page because it's not "encyclopedic" enough.
my name has 1 (one) letter with an accent. i have had people and machines tell me "your name is invalid" my entire life. my citizenship certificate doesn't even match my birth certificate after i migrated.
i will never yield to anglocentrism, the resistance only emboldens me. do your worst; my children will have names that'll make your qa leads have a conniption.
it's really funny (see: depressing) hearing countries and organizations tout how "multicultural", "diverse", and "accepting" they are but will immediately turncoat upon being asked to do the work to support their claims.
either store my name correctly or don't store it at all, jesus christ.
That was my interpretation as well. I was making a joke that the OP said his name had one letter, then indicated that his name was "jesus christ". (All lower-case, too!)
As someone with diacritics in my family name, I can relate. Some years back I could not retrieve concert tickets bought at the FNAC (a large French music shop) even by name, and we figured out later that it was only because the è in my name had been replaced by a space due to mainframe conversion ^_^
putting aside that there is no language agnostic way to replace non us-ascii letters in Latin languages, the sometimes absurd ways it can get done wrong are just ridiculous
like replacing letters with accents with spaces or some unknown letter mark instead of the letter without accent
this also doesn't just apply to human names e.g. if you replace äöüß in German city names you can get the names of different existing cities and also it's often done wrong too (äöü are not accents in Germanic languages, they are fully independent letters whose official us-ascii representations are ae, oe, ue)
The point is it's a different name, which can lead to name collisions where before there were none, or a collision which only happens on some systems.
The result can be not so funny things from less harmful things like problems with reservations for idk. concert tickets or worse hotel rooms to very harmful things like you getting wrongly negative credit scores, being wrongfully investigated for a crime, getting your business bank account locked because of wrongful detection of likely supporting terrorists etc.
And while some people might argue that given that human names are not unique none of this should happen, that sadly isn't how the world works. And while I can't find links to it anymore because it was a few years ago, there have been instances of the first two cases. And at least instances of the last wrt. other kinds of spelling errors, so it definitely can happen.
Obviously that is a troll. Who cares that your name on a bank statement does not contain your diacritics?
Yes, of course it would be nice if all inter-banking systems would support UTF8 or at least ASCII. But they don't. There are various inter-banking systems that simply are so old that they only support EBCDIC. Newer do, but even quite modern SWIFT still requires the EBCDIC set to make sure data goes through to each and every bank on this planet.
And in practice, if you have diacritics in your name, and do any kind of international business or travel, you are used to having found a transcription to use.
My wife has a diacritic in her last name. Even if forms at airlines now support entering them, she NEVER does. Because you can be sure that sooner or later in your travel, with this name getting copied to some immigration system, Covid health certificate database etc, it will get corrupted.
She never had any problems whatsoever with having her name transcribed on all travel documents, even though her passport contains her name with diacritics. Because, surprise, while I don't know what exact encoding the digital part of the passport contains, at least the machine readable line also uses only the ASCII or EBCDIC character set. So the best way to handle this technological debt is not to sue someone, but just use whatever that machine readable line in your passport says.
Names suck, anyway. The idea of having NON-UNIQUE identifiers to uniquely identify human instances is stupid anyway. Everyone on this planet should simply get a UNIQUE name in an universally agreed character set. I'd be even OK with Base64 of UTF8 ;)
Sometimes the lack of a diacritic can change the meaning of the name for the worse and either way, names are deeply personal things that people are attached to, so they might want to have it written correctly, regardless. I wouldn't want someone to misspell my name.
My surname once broke authentication system behind my university. Any time I tried to login, it would crash for a few minutes logging out other users. Years later when I got my British passport it also contained my surname "latinised". I was told I should mail Home Office and get some certification that naturalisation certificates and passports don't include special accented letters and in fact the passport contains a mistake that cannot be corrected in the passport. Technically I have my Polish passport with different surname than the British one. This story isn't that unusual.
Also Polish, my pet peeve is having to write my latinised name into forms that specifically say "please enter your name EXACTLY like in your passport".
My pet peeve is online payment forms which ask to 'enter your name exactly as it is on the card' but require ASCII-only in that field - my name as embossed on the card has non-ASCII accented characters.
Anecdotally, it's slightly easier for Greeks and Bulgarians/Russians/etc. than for those of us whose languages use the latin alphabet with diacritics. The transliteration is generally standardised and English speakers tend to avoid even trying to use the other alphabet.
Pyotr Ilyich Tchaikovsky (note: Often anglicized as Peter Ilich Tchaikovsky; also standardized by the Library of Congress. His names are also transliterated as Piotr or Petr; Ilitsch or Il'ich; and Tschaikowski, Tschaikowsky, Chajkovskij, or Chaikovsky. He used to sign his name/was known as P. Tschaïkowsky/Pierre Tschaïkowsky in French (as in his afore-reproduced signature), and Peter Tschaikowsky in German, spellings also displayed on several of his scores' title pages in their first printed editions alongside or in place of his native name. The modern transliterations of Russian produce the following results for 'Пётр Ильич Чайковский' — ISO 9: Pëtr Ilʹič Čajkovskij, ALA-LC: Pëtr Ilʹich Chaĭkovskiĭ, BGN/PCGN: Pëtr Il'ich Chaykovskiy.)
I’m told today the transliterations are standardised enough that it doesn’t happen much. Greek and Russian friends describe less mangling of their names than I get.
Well, my Cyrillic full name has only two realistic transliterations, but I have friends who have it much worse. Especially people who have a Cyrillic Ë somewhere in their name, because it could be spelled without the diacritic in Cyrillic itself, leading to different transliterations.
Luckily I live in Armenia now, which not only has its own alphabet, but also has a tradition of armenising Slavic patronyms! (Which is fair, I guess, as Armenian patronyms are commonly russified in Slavic countries.)
Anyway, the transliteration of Cyrillic at least is not really standardised in my experience. It used to be worse, with Polish, French and English styles, now it settled on English, but there are still several sticking points which cause a combinatorial explosion of variants. Not to mention that my perspective is centered on Russian, and there is a different standard of transliteration for Ukrainian at least (Kyiv and Kiev are Levenshtein 2 apart in Latin, but Levenshtein 1 apart in Cyrillic), and there is a ton of other Cyrillic languages.
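The Kyiv/Kiev distances quoted above can be reproduced with a textbook edit-distance routine. This is a minimal sketch: the function is a plain dynamic-programming Levenshtein, and the NFC normalisation step is my own assumption, added so that a letter like ї counts as one character even if the input arrives in decomposed form.

```python
import unicodedata

def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions and substitutions each cost 1."""
    # Normalise so a precomposed letter and its decomposed form compare equal.
    a = unicodedata.normalize("NFC", a)
    b = unicodedata.normalize("NFC", b)
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

print(levenshtein("Kyiv", "Kiev"))  # Latin transliterations
print(levenshtein("Київ", "Киев"))  # Ukrainian vs Russian spelling in Cyrillic
```

A fuzzy name matcher tuned to a threshold of 1 would therefore link the two Cyrillic spellings but not the two Latin ones.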
My perspective is on the experience of an immigrant in an English-speaking country. Greek and Russian friends give their standardised Latin-for-English transliterations everywhere and it works because there's no diacritics and no English speakers bother trying to read the Greek or Cyrillic versions. My own Romanian name being Latin alphabet with diacritics seems to get mangled in more ways, from what we've compared.
He's also Volodymyr / Vladimir, and it's like with Kyiv: the Levenshtein difference is increased by transliteration.
Also, I discovered that Levenshtein himself was from USSR, and that's why he's not Levenstein. Though I would not be surprised if some of his relatives would prefer Levenstein, just like Ekaterina (or Yekaterina?) Schulmann [1] doesn't like to be spelled as Shulman.
I don't think transcription from Greek is standardised. I've met several people "officially" called Vassilios or Vassileios or Vasileios or Vasilios while they were in the UK, but they all had the same name (Βασίλειος) in Greek, I think. I don't think the transcription of Russian names has been effectively standardised, either. I think I heard about a recent effort to standardise the Latin transcription of Bulgarian names, but that just suggests that they weren't standardised before then (or were standardised differently, at least).
Also, remember that it's Gorbachev in English, Gorbatchev in French, Gorbatschow in German, and so on, so there isn't likely to be a single Latin transcription unless it's one that's incompatible with existing usage.
Also note that it's usually "Rachmaninoff" in English, because he moved to the USA and became a US citizen at age 69, though you'd normally expect that name to be written as "Rakhmaninov" in English, I think; it's written "Rachmaninow" in German, for example. But that's an example of a different problem, really: people changing their names. While they're alive perhaps you might want to use their current official name, but when they're dead and if they're famous then you probably don't want to be forced to use the name that they adopted as a joke just before they died.
Names are not unique anyway. For most official purposes you should identify people with some kind of number and use the name as a check or to help a human sort things out when something doesn't match and you suspect that the number is wrong.
Today the transliteration of names is standardised, yes. You can ask for a different transliteration on your passport if you really want, but there is one official (sadly lossy) conversion table used for everything government-related, from people's names to street names. Street names in particular can get silly when we have a street named after a foreigner whose name is spelled phonetically in Cyrillic, but then is transliterated back into Latin, e.g. various things named after James David Bourchier are re-latinised as "Dzhejms Baucher"; "Dzheyms Bautcher"; or "James Boucher". Technically the first one is "standard". At least the boulevard bearing his name is now "James Bourchier" in Google Maps. Maybe one of these days they'll also support our addresses properly...
I am not sure about that part, iirc Bulgarian folks can pick how their name gets to be transliterated during the application for passport.
>English speakers tend to avoid even trying to use the other alphabet
Why care about English only? Most of Europe is not natively English speaking (save for the UK, Ireland and Malta) - the odds are the name is transliterated once to 'English', then pronounced in Spanish (think of the glorious H and J) or German (W vs V)
Mostly doesn't matter though, they sound entirely different compared to the letters without those changes. And you can't just replace them either, lots of words would suddenly have multiple meanings when written.
A large number of Irish and Irish descendants have a character in their surname that software developers tend to associate with various injection attacks.
There's almost no chance of getting an apostrophe into a lot of these forms
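Rejecting the apostrophe is not needed to stay safe from injection; binding the value as a parameter is the usual approach. A minimal sketch with Python's built-in `sqlite3` module follows. The `customers` table and the helper names are hypothetical, and a real bank would be using a different database entirely.

```python
import sqlite3

# In-memory database with a hypothetical customers table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, surname TEXT)")

def add_customer(surname: str) -> None:
    # The "?" placeholder sends the value separately from the SQL text,
    # so an apostrophe in the name is stored as data, never parsed as SQL.
    conn.execute("INSERT INTO customers (surname) VALUES (?)", (surname,))

def find_customers(surname: str) -> list[str]:
    rows = conn.execute(
        "SELECT surname FROM customers WHERE surname = ?", (surname,)
    )
    return [row[0] for row in rows]

add_customer("O'Brien")
add_customer("D'Arcy")
print(find_customers("O'Brien"))
```

With bound parameters the form validation can accept the name as written, and the apostrophe needs no escaping or stripping anywhere in application code.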
Seems reasonable tbh. UTF-8 is 31 years old at this point. Unicode is even older, and various 8-bit code pages older still. At some point you have to get with the times.
Nobody wants to spend money and risk everything. There are so many industries where "getting with the times" is absolutely out of the question due to legal issues, as well as momentum. Aviation, healthcare, banking, …
And, as this lawsuit shows, not getting with the times exposes the company to repercussions as well. I doubt a business would prefer their own schedule over one set by the court.
ING Belgium found out the hard way, and I doubt this change to their systems was cheap. They were forced to alter a system that was due to be replaced (though that didn't happen in the expected timeline, surprise surprise) when they could've rolled out this change over a long period of time somewhere in the last two decades.
That portrays it as if all the risk is on changing the system.
Banks probably have to deal with a constant level of customer support if they don't store names properly. Credit checks probably go wrong, sometimes in ways that incur real financial losses on them.
There is very real risk in refusing to make your software match reality.
If someone brings this up, I am actually happy if legislation forces them to do it. More often than not, it is possible and companies end up with better systems.
UTF-8 may have been invented 31 years ago, but it wasn't widely used until fairly recently. I don't remember when the switch happened, but I think it was widely adopted sometime in the mid to late 2000s. GNU/Linux distros changed their default locale to UTF-8 and web sites started to use it.
However it should be noted, that the encoding that UTF-8 was replacing was largely (not exclusively) iso-8859-1 which probably had the characters we are talking about.
I saw that mentioned in a comment under the blogpost. Where would info on which code page is meant to be used be stored? I assume it's not inline with the text.
I never really understood the insistence on diacritics. My name is properly written in Hebrew characters, and contains sounds not present in the English language. Many more people use CJK, Cyrillic, Arabic, or other characters. The writing system in use by many people and companies for the English language does not support any of these characters, and nor does it support diacritics. Some variants (in particular, see airlines) use only capital letters, with no spaces or dashes; this also seems reasonable to me (even when my name is somewhat mangled as a result). So just like a CJK/Cyrillic/Semitic name has to be roughly transliterated into Latin characters, and inevitably pronounced differently than in its origin language, why shouldn't the same be true for names with diacritics? Saying that the One True International Alphabet is latin-1 or whatever is rather arbitrary. And no, the solution isn't to give up on cross-cultural communication, nor to require that every human learn all the characters in Unicode.
A Belgian bank requiring a Belgian person to transliterate their Belgian name to English characters is somewhat similar to an Israel bank requiring to transliterate your name from Hebrew to Korean (which has as much relevance to Israel as English has for Belgium), and refusing to use Hebrew in their communication.
There are official national languages and alphabets, the involved characters are key parts of those national alphabets - who cares about what the English alphabet is? Neither the bank nor the customer are in England or in an English-speaking country, and there is a legal duty for the bank to support the national language(s).
You're right, I was thinking of this in the context of an English-speaking country/institution (mostly since I do often see this demand in those contexts as well; to be clear, I think it's a reasonable request, but not a reasonable demand or legal requirement). In the context of a Belgian customer of a Belgian bank it's definitely more surprising; but I also wouldn't categorize it as "inaccurate personal information", but awkward transliteration to a foreign alphabet; as you said, that may be strange and possibly even run afoul of national language laws, but I wouldn't use GDPR against it
> [Korean] has as much relevance in Israel as English has in Belgium
Legally maybe, but certainly not demographically/culturally; I mean, certainly there is a difference in the expected probability of being understood, especially given the highly-overlapping character sets
Yeah. This is how you get a company like Blizzard making super expensive complete flops like Diablo IV. By having stupid beliefs about what your users want. Or even better, by having your own wants, and projecting those wants onto your understanding of what the users want. You lie to yourself for long enough and you start believing it.
Recently the way I approach choosing software has changed. I found I spend too much time dealing with problems and issues because whoever produced the software does not really care to ship working product or features that would make my life better. So now I prioritise choosing reliable software and companies that care for me, the user (at least more than their competitors).
You can imagine things like basically every Micro$oft product got dumped immediately. I now refuse my family requests to support their MS-powered machines and tell them if they want Windows they will have to support it themselves, I don't have time to deal with this crap.
I found by sweeping the stuff that gives me headaches I have improved my life and productivity even if I have to use some more expensive or sometimes functionally inferior products (and sometimes do it manually).
A summary search appears to show that the game was riddled with bugs and that killed some momentum. Given the amount of copies sold, it was still probably a success.
> Users don't care about technical debt.
>
> > Falsehoods Project Managers Believe About Technical Debt
If users care about something, then it's either a missing feature or a bug (/ UX flaw), _by definition_.
Sure, we could argue definitions, but over the years I've concluded that "technical debt" is only a useful label if it means "things that make development harder but don't impact the user interface."
Latency optimization? That's a UX improvement, not tech debt.
Supporting characters in existing customer names? Bugfix, not tech debt.
But technical debt slows everything down. I’ve seen first hand whole modules of a system ignored because nobody wanted to touch them. All the bugs in them can’t be fixed in 5 minutes.
GDPR is handled by the state - no idea how/where frivolous part comes from. The people can file complaints with the state/agency but not sue companies directly on their own for violations.
Sure, just store it that way, and make sure the output is converted to UTF8 at every point where a modern system is involved (envelope and letter printing, statement printing, customer website, internal customer support tools, internal anti fraud and audit tools).
Also convert input from UTF8 to EBCDIC at every point a user or employee can type into a modern keyboard/OS. Eg searching customers by name.
Teach employees on inputting characters. Are we going to allow all unicode characters? Will they be able to find records if the customer doesn't give the diacritics in one context such as phone?
Also some routine tasks will be done using a mainframe terminal which doesn't support displaying these characters. More training there as they will just display the UTF-EBCDIC version of the field.
Are there connections like tax agencies, credit reporting agencies, other banks, Visa/Mastercard, and what character sets do they support? We always sent them ASCII CSVs via FTP and everybody who understands it no longer works here.
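On the lookup question above (finding a record when the customer leaves the diacritics out on the phone), one common pattern is to store the name exactly as given and search on a separate folded key. A rough sketch using only the standard library; the records and the `search_key` helper are hypothetical.

```python
import unicodedata

def search_key(name: str) -> str:
    """Fold a name into an accent-insensitive, case-insensitive lookup key."""
    # NFKD splits "ë" into "e" plus a combining diaeresis; drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()

# Hypothetical records: the stored (display) name keeps its diacritics.
records = ["Zoë Janssens", "Gérard Dupont", "Zoe Peeters"]

# Index by folded first name; the original spelling stays untouched in the record.
index: dict[str, list[str]] = {}
for record in records:
    first_name = record.split()[0]
    index.setdefault(search_key(first_name), []).append(record)

print(index[search_key("zoe")])     # matches both Zoë and Zoe
print(index[search_key("GERARD")])  # matches Gérard
```

The key is only for searching and should never be printed back to the customer. It is also lossy in language-specific ways: letters such as ø or ł have no decomposition and pass through unchanged, and the German ä to ae convention is not applied.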
The quoted bit stops too soon; the original goes on to say that in the "near future" they'll be upgrading all their systems, after which it can deal with accents, and a bit further it says that they expect it'll be ready in 2020. That's why they didn't modify the existing system: they didn't feel it was worth it since they're working on replacing the lot.
This wasn't accepted as a defence with "it's 2018 and you're still using a system from 1995 wtf?!" so they still had to pay a (small) fine.
Oh, I get it - modifying the system costs money that the bank is not willing to spend. After all, only one person complained.
If it happened here, in the US, fixing the issue would be a very expensive process, probably requiring replacement of the entire system and triggering tons of audits, etc.
but it is broken, the need for utf8 support wasn't new when the lawsuit happened
there is also the issue that I would be surprised if that system _didn't_ have tons of security vulnerabilities (which just weren't found because of its obscurity). Heck, I have heard of enough cases where old banking systems had tons of well known security issues in their interface they just "defined away" by means like "oh you need to be a bank employee to abuse them so they are not an issue" or "pst, don't speak about it we (supposedly) can't afford an update and they would fire people if they need to do an update".
And ING's Unite be+nl project didn't end up meeting the dates they provided the court, so even if the court had accepted that excuse (it didn't), they still would've needed to alter their systems anyway. In fact, the attempt to unite the Dutch bank with their Belgian counterpart, which would've modernised the system, was cancelled in 2020 after barely making progress after four years, followed by a thousand jobs being cut worldwide.
"We're going to replace this anyway" is a pretty bad excuse for a fix that should've been completed ten years ago, especially when the replacement is clearly not coming any time soon and you're legally obligated to provide the fix.
The fact that a personal name is stored in EBCDIC is missing the point. the mainframe has supported unicode for 20 years. The bank account holders name is stored in a character column in a database. Every character column can be defined with a different codepage, so it is possible to define the name column as stored in UTF-8, EBCDIC or any of a number of other codepages.
For the bank to claim that the reason that they can't store names containing diacritics is because the name is coded in EBCDIC is nonsense. The bank just needs to change the database column definition to UTF-8.
You were making a really solid-sounding technical argument, but then you used the phrase "just needs to", which, for me, triggers a tendency to discard whatever the person is telling me.
It might not be as easy as you think if the system has somehow survived without an upgrade for decades. Who knows, maybe it's that ancient?
Back in the day I worked on CODASYL style databases (IDMS) that were EBCDIC. This is before relational databases were a thing. We had no PCs: all of the companies’ data was in the Mainframe in EBCDIC. No ASCII anywhere, let alone UTF-8.
That’s assuming they didn’t develop the database in-house. If they concocted their own system, it very well could be that this data is in fixed-length records without much room for a quick fix.
I have a Scottish last name starting with "Mc" like "McTavish". Just getting people/companies to capitalise the third letter is an uphill battle some days.
(Oh, and forget about having it sorted next to MacTavish as is tradition.)
I have worked on systems that had to do this in the long-ago past, and the headline is simply wrong.
There are many 8-bit EBCDIC code pages that support many languages (multi-hundred page book of them last I looked at it), at least as many as ISO-8859 alphabet soup encodings... and there are also shift-in/shift-out double byte EBCDIC encodings for Japanese, Korean, and Simplified and Traditional Chinese. They've been around since the 1980s at least. Since nearly the beginning of utf8 acceptance, there is also UTF-EBCDIC (shudder)
The bank developers just didn't write their code in a way that can support the multi-codepage way of storing EBCDIC.
Funny, just yesterday I listened to a podcast where, for a small part, they were talking about retro-computing. They were talking about AS400 and that they are still in use, if you think about Amadeus, one of the worldwide flight-booking systems for example. That some banks in their backend still use AS400 and EBCDIC. The reason why these systems are not being updated is more like: never change a winning team. Because you can spend millions to update those systems just to discover that there is an error in your update. So you spent millions for nothing or even worse. There is no real technical benefit for them to update as long as the systems run.
I think that I am on the banks side. The name they have in their database represents you, the person, this is the same concept as the account number that represents you. If the character set they use is unable to handle the characters you normally use, they will substitute characters to get as close as they can.
As an extreme example, say I go to japan where the restaurant owner dutifully writes my name down as "ソマット" should I get angry because it is not in latin characters?
Shocking that a bank would have such an old system in operation. As a student, we visited ING HQ where a big server room was shown to us that ran virtualized version of much older systems.
But I dare say that Rabobank and ABN-AMRO would have similar prehistoric systems in operation.
Partly it's a lie that the stack forces them to use EBCDIC. There's plenty good ASCII and even UTF-8 support on mainframes even in the traditional OS's. They just refuse to rewrite the customer facing portion of their application for whatever reason.
I wonder whether it is assumed that personal data has to come in a superset of Latin.
I wonder if EU banks have to render your CJK name correctly, or a cyrillic one if their customer happens to be Bulgarian. And use that without falling back to some sort of transliteration.
I interned at a large-ish retail company in the US and we had an IBM mainframe that hosted the core DB2 server. I remember when I first learned of EBCDIC because of that IBM machine and I was blown away. Never had I worked with a character encoding that wasn't at least ASCII compatible.
ASCII's table layout had some real benefits to it, like being able to flip the fifth (?) bit to change case of characters. Did the EBCDIC layout have any serious benefits? I have to assume there are but my quick web search didn't come up with anything.
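On the case-flip trick: in ASCII the case difference is the 0x20 bit (bit 5 when counting from zero), and as far as I can tell EBCDIC has the same property with the 0x40 bit. A small sketch to check it, assuming Python's built-in `cp037` codec as the EBCDIC variant; the helper names are made up and both functions are only meaningful for the unaccented letters A-Z and a-z.

```python
ASCII_CASE_BIT = 0x20   # 'a' is 0x61, 'A' is 0x41
EBCDIC_CASE_BIT = 0x40  # in cp037, 'a' is 0x81 and 'A' is 0xC1

def flip_ascii_case(ch: str) -> str:
    """Toggle the case of a single ASCII letter by flipping one bit."""
    return chr(ord(ch) ^ ASCII_CASE_BIT)

def flip_ebcdic_case(ch: str) -> str:
    """Toggle the case of a single letter via its EBCDIC (cp037) byte."""
    raw = ch.encode("cp037")[0]
    return bytes([raw ^ EBCDIC_CASE_BIT]).decode("cp037")

for letter in "aZ":
    print(letter, flip_ascii_case(letter), flip_ebcdic_case(letter))

# Unlike ASCII, the EBCDIC alphabet is not contiguous: there is a gap after 'i'.
print(hex("i".encode("cp037")[0]), hex("j".encode("cp037")[0]))
```

So the single-bit case flip is not unique to ASCII, although the gaps in the EBCDIC alphabet make range checks like `'a' <= c <= 'z'` less straightforward there.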
The information wasn't inaccurate, it was encoded as accurately as possible in a specific encoding. I can't see how anything would be "inaccurate" in any way the lawmakers intended.
Worse than that: it took roughly 9 months for a credit card of mine to be enabled. After many, many calls it turned out my name was too long... or more likely that I had submitted a middle name and the bank had no notion of such a thing, so they filed it as part of the first name (glued together, as is tradition).
Even both together are not so long - fewer than 4 bits in length.
In the end the bank shortened my middle name, keeping it as a single letter.
If the person had applied for a credit card, used it, then refused to pay because the name is different from his legal one, I suppose it would be quickly fixed
Why are companies so willing to ignore information technology maintenance? If you have a 40-year-old roof, it will need to be replaced at some point. Why is this any different? Aren’t they able to depreciate investments in information technology like any other asset? I’m not an MBA, so I don’t get it.
If it ain’t broke don’t fix it. No one is going to be happy if you’re upgrading your system to allow diacritics in the name and suddenly all their money disappears. There’s a reason they use COBOL so much and that is because they haven’t needed to replace it. Replacing it just because it’s old just seems like chasing the new ideas for the sake of it.
There's plenty good ASCII and UTF8 support on mainframes. They just don't want to rewrite any of their ancient application, which could easily have a more modern UI slapped on it that could in various ways support more modern encodings, even if they left parts of the backend in EBCDIC.
I don't see how it can be done with backward compatibility. The ship has sailed.
I'd wager that Unicode is actively influencing Gen-Z CJK language users by introducing subtle character errors. I've seen elementary school students confused by differences between what they see in text books and on screen, and write wrong characters (because the online dictionary picked the wrong font).
Codepage 1047, which is EBCDIC based, supports all the same characters that ISO8859-1 (latin 1) does. This includes diacritic characters. Beyond that, IBM mainframes also work with UTF-16 natively. Can't the Dutch bank upgrade their mainframe?
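A quick way to sanity-check the Latin-1 claim, sketched in Python. Assumption: as far as I know, Python's standard library does not ship a cp1047 codec, so this uses the built-in cp500 (another EBCDIC code page with the Latin-1 repertoire) as a stand-in. It illustrates the point rather than testing CP1047 itself, and `round_trips` is a made-up helper name.

```python
CODEC = "cp500"  # stand-in EBCDIC code page (assumption, see above)


def round_trips(text: str, codec: str = CODEC) -> bool:
    """Return True if text survives encode -> decode in the code page."""
    try:
        return text.encode(codec).decode(codec) == text
    except UnicodeEncodeError:
        return False


# Every character in the Latin-1 supplement block (U+00A0..U+00FF).
latin1_supplement = "".join(chr(c) for c in range(0xA0, 0x100))

print(round_trips(latin1_supplement))  # accented Western European letters
print(round_trips("Zo\u00eb"))         # e with diaeresis is in Latin-1
print(round_trips("B\u0103i"))         # a with breve is outside Latin-1
```

The last case shows the limit: a Latin-1-sized EBCDIC code page handles Dutch, French or German names, but not every European name.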
I agree, but I saw it claimed EBCDIC doesn't cover diacritic characters. That's not true.
If ISO8859-1 covers the exact same set that CP1047 does and that doesn't meet GDPR's requirements, that implies ISO8859-1 doesn't either and a unicode based character encoding is a necessity for GDPR compliance.
Then again, unicode doesn't cover the late artist formerly known as Prince's written name either.
All things considered, IBM should work with that Dutch bank to move their system over to UTF-16. It's the natural progression of such things.
Same goes for airline tickets. When you buy a ticket, the travel agency says that you should write your name exactly as in your passport, but on the ticket and boarding pass they then change it: Hyphenation is removed as well as accents.
What counts as accent vs letter depends a lot on the language. For example, in German the umlauts are considered own letters and not accented vowels, in Spanish ñ is considered an own letter and not an accented n etc.
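To make the difference concrete, here is a rough Python sketch of ticket-style name folding. This is not any airline's actual rule set. The `GERMAN` table and the function names are assumptions for illustration, and real systems follow their own transliteration tables.

```python
import unicodedata

# Hypothetical per-language table: German treats umlauts as letters of
# their own, conventionally expanded rather than stripped.
GERMAN = {"Ä": "AE", "Ö": "OE", "Ü": "UE", "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def strip_marks(name: str) -> str:
    """Decompose characters and drop combining marks (naive accent removal)."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def ticket_name(name: str, table=None) -> str:
    """Fold a name to upper-case A-Z, ticket style: no accents, no hyphens."""
    if table:
        composed = unicodedata.normalize("NFC", name)
        name = "".join(table.get(ch, ch) for ch in composed)
    return strip_marks(name).replace("-", "").upper()


print(ticket_name("M\u00fcller"))            # naive stripping
print(ticket_name("M\u00fcller", GERMAN))    # language-aware expansion
print(ticket_name("Pe\u00f1a-N\u00fa\u00f1ez"))
```

The same input yields two different "correct" folded names depending on which convention is applied, which is exactly why the folded form may not match what is printed in a passport.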
I would have expected IBM to have defined UEBCDIC-8 by now, where the single-byte encodings are identical to EBCDIC ;). Those only include punctuation and control characters though, so that might be the problem.
"It baffles me that it was still being used in 1995 - let alone today"
This person obviously has never seen the innards of a bank's IT department. That core mainframe isn't going to be replaced anytime soon, it runs code older than he is, and workarounds that would satisfy the complaint would be very large projects with a small chance of success.
Pragmatic solutions should be acceptable, such as putting the correctly spelled name at the top of a report with a note that the rest of the documents will refer to this person by the EBCDIC-coded name.
"realistically" .. I don't think it's much to ask a Belgian bank (or offering accounts in Belgium) to actually support Belgian customers?
Unless they let go of every single customer with diacritics in their name, the complaint would still stand. Especially since the case is brought by a government body, not the end-customer.
The "throw your hands in the air" move for the bank wouldn't be to ditch this single customer, it'd be to ditch the entire eurozone. Remediation is likely cheaper.
It seems doubtful they’d win in cassation as the GDPR does literally say that personal data shall be accurate (Article 5(1)(d)) and that data subjects have a right to rectification (Article 16).
Their only possible argument would be that fixing their shit is not a “reasonable step”.
I’m not sure that even falls under the purview of the cour de cassation, but IANAL so I’m not quite clear on what a “matter of law” is exactly.
They were already in the process of overhauling the systems, which will allow diacritics, and expected it will be finished in 2020 (I don't know if they met that target).
This actually happened at my previous company. The sole maintainer of a legacy system was a contractor in his sixties and he died due to COVID. The system was still unmaintained and falling apart when I left.
This is why my bank only stores people's names as bitmap images. Simply draw your name in the provided blank on the website. Just make sure the pixels are identical each time. For a password, whistle a melody in your microphone. Your password must contain at least 3 microtones, a rising pitch, a falling pitch and a glottal click; also it cannot be similar to any known pop songs.
Oh awesome, I’m about to sue every airline that can’t handle the hyphen in my name. EBCDIC can handle hyphens, but it would be funny if some other system behind the scenes turns out to be just as limited, making for an amusing quagmire.
I’m an American though, in the US. How can I leverage GDPR for this?
I’ve booked flights throughout Europe before on European airlines. But I’ve also accessed American airline sites from Europe. I can VPN through Europe and do the same thing right now. Which version lets me lob a complaint? And to which country?
Close your account and go to another bank that can deal with your name properly if that is so important to you, but do not waste resources of the legal system with stuff like this. That is what we have free markets for.
So I guess the bank was just too lazy to provide the character conversion before storing to EBCDIC, made cheap excuses and is now forced by GDPR? Good!
What does "store as binary" even mean in this instance? Binary is just ones and zeros. You have to decode it somehow for it to carry meaning. Maybe you meant opaque, but even that would then prevent normal display of the data to the user.
The user is not going to accept a hexadecimal binary form of their name; you can't sidestep having to deal with encodings by just not specifying one.
Also a good moment to point out that you can't do a binary comparison of Unicode strings; you need a Unicode library that handles normalization.
That would mean that Zoë (spelt with U+0065 Latin Small Letter E followed by U+0308 Combining Diaeresis) wouldn’t match Zoë (spelt with U+00EB Latin Small Letter E with Diaeresis) even though they are the same name that just happen to be typed in on different platforms.
Names exist at a different level of abstraction to binary and it’s a mistake to treat them as binary.
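The Zoë case can be reproduced in a few lines. A minimal Python sketch using the standard `unicodedata` module; `same_name` is a made-up helper, and NFC is just one reasonable choice of normalization form.

```python
import unicodedata

decomposed = "Zoe\u0308"   # e followed by U+0308 Combining Diaeresis
precomposed = "Zo\u00eb"   # U+00EB Latin Small Letter E with Diaeresis


def same_name(a: str, b: str) -> bool:
    """Compare two names after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(decomposed == precomposed)           # raw code point comparison
print(len(decomposed), len(precomposed))   # 4 code points vs 3
print(same_name(decomposed, precomposed))  # comparison after normalization
```

Both strings render identically, yet only the normalized comparison treats them as the same name. A byte-level comparison of their encoded forms fails for the same reason.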
I don't know about that, there still needs to be some validation. Is your legal name really "Bobby $@134 Tables"? In this case validation comes in the form of "is every character EBCDIC-compatible?"
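That kind of check is simple to sketch. A hedged Python example, assuming the built-in cp500 codec stands in for whatever EBCDIC code page the bank actually uses; `unsupported_chars` is a hypothetical helper, not part of any banking system.

```python
import unicodedata


def unsupported_chars(name: str, codec: str = "cp500") -> list:
    """List the characters of name that the code page cannot represent."""
    bad = []
    # Normalize first so that precomposed letters are tested as one unit.
    for ch in unicodedata.normalize("NFC", name):
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            bad.append(ch)
    return bad


print(unsupported_chars("Bobby $@134 Tables"))
print(unsupported_chars("S\u00e2ngeorz-B\u0103i"))
```

Note what this does and does not tell you: "Bobby $@134 Tables" passes, because every one of its characters exists in the code page, while a perfectly legitimate name can fail. A character-set check measures what the storage can hold, not whether the name is plausible.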
Those appear to be points you just don't like. None of them define reality, e.g. loyalty cards (or even personal ID cards used as loyalty ones) effectively exchange personal data for services, i.e. monetary discounts.
Discriminating against a non small part of the population of the country you work in is stupid???
You know what is stupid? The attitude of "works for me, must work for everyone". It's objectively stupid because it has probably caused not just millions but billions in damages.
The vast majority of widely spoken languages in the world use letters not contained in US-ASCII, and this applies to names, too.
I didn't expect people to not know that there is more than just accents when it comes to non-US-ASCII letters. Quite an eye-opener for how shortsighted people can be.
Also, it is discrimination because it's a different name, and the difference can lead to issues that can cause non-trivial monetary harm.
In legal documents you transliterate from foreign alphabets to the local one according to legal rules on how the transliteration should be performed. In this case, all the "incompatible" letters are a key part of the local language, which the bank is required to support - most of the world is not like USA which famously doesn't technically implement the concept of an official language.
But technically your point about transliteration is a valid solution (and one that I have actually seen in practice in banking systems) - the bank is free to transliterate the name to something that fits in EBCDIC, as long as it then also transliterates it back appropriately, so that the proper name is used in documents and communication. The law doesn't mandate any technical nuances of data storage, as long as they get the proper result.
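One way to picture "store something that fits, get the proper name back" is a reversible escape scheme. To be clear, this swaps in a mechanical escape encoding rather than a real linguistic transliteration, and it is only a sketch: cp500 stands in for the bank's code page, the function names are made up, and a name containing a literal backslash followed by `u` would be mangled.

```python
import re

# Matches the escapes produced by Python's "backslashreplace" handler
# for characters above U+00FF: \uXXXX or \UXXXXXXXX.
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})|\\U([0-9a-fA-F]{8})")


def to_legacy(name: str, codec: str = "cp500") -> bytes:
    """Encode for the legacy store, escaping characters that do not fit."""
    return name.encode(codec, errors="backslashreplace")


def from_legacy(data: bytes, codec: str = "cp500") -> str:
    """Decode from the legacy store and restore the escaped characters."""
    text = data.decode(codec)
    return _ESCAPE.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), text)


stored = to_legacy("S\u00e2ngeorz-B\u0103i")
print(stored.decode("cp500"))   # what a legacy screen would show
print(from_legacy(stored))      # what goes on documents and letters
```

The cost is that the stored field grows (six characters per escaped letter) and that every system reading the field has to know about the convention, which is probably where the real project effort goes.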
The key point here is that it's not an English-speaking country where being limited to English would be acceptable.
There are official state languages, which have official alphabets, which include non-English letters. There are legal names of people and organizations which include non-English letters.
The law requires the bank to have and apply correct customer information, and spelling someone's official name wrong is simply not acceptable.
I mean, this is about business requirements, not tech. The bank can use any encoding they want to accomplish their requirements. Nobody is dictating a specific technical standard, just the results.
It is the fate of all public online communities to asymptotically approach the intellectual level of Reddit, unless a strong & constant disciplinary force prevents it.