McDonald's AI drive-thru bot accused of breaking biometrics privacy law (theregister.com)
66 points by donohoe on June 13, 2021 | 41 comments


> The software apparently has an 85 per cent accuracy rate.

That number seems really low to me for a task-specific system, i.e. it doesn't need to understand every possible thing someone could say, just the subset of language used to place orders at McDonald's. For reference (https://paperswithcode.com/sota/speech-recognition-on-libris...) SOTA models are at ~5% WER for general speech.
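For anyone unfamiliar with the metric: WER (word error rate) is the word-level edit distance between what was said and what was recognized, divided by the number of words actually said. A minimal sketch, assuming simple whitespace tokenization and lowercasing (real benchmarks normalize text more carefully); the function name and the sample order are made up for illustration:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion (reference word dropped)
                            curr[j - 1] + 1,      # insertion (extra recognized word)
                            prev[j - 1] + cost))  # substitution or exact match
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical order: one wrong word out of five gives a WER of 0.2.
said = "two cheeseburgers no ketchup please"
heard = "two cheeseburgers forty ketchup please"
print(word_error_rate(said, heard))
```

Note that WER counts words, not orders, so it is not directly comparable to an "85 per cent accuracy" figure unless we know what that figure measures.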

And besides, who wants to order from an AI that's going to fuck up 15% of the time? Compared to a human that error rate is perhaps on-par, but a human has an internal confidence measurement. We know when we've heard something wrong. Speech systems don't really have that (1). So the AI will just blunder forward with your order. I'd much rather interact with a system that says "Sorry, what was that?" 15% of the time (i.e. the current meat based voice recognition that fast food restaurants use) versus a system where I constantly have to check the screen and tell it "Oh, no, sorry, can you fix the, ummm, uhh, we don't want 20 orders of ketchup packets... Oh, god, no we don't want 40 ketchup packets! No our order isn't done! WAIT!"

(1) Yes, you can guess at confidence by measuring the logits, but that doesn't work in practice. It's nowhere near a human's capability to self-measure confidence in our predictions.


Using your own link, SOTA is 2.6%. WER could be lower for the much smaller vocabulary used in a drive-thru.

I'm not sure what you mean by "that doesn't work in practice" re: using logits. Word-level confidence is pretty useful with GCP Speech-to-Text [1]

[1] https://cloud.google.com/speech-to-text/docs/word-confidence....


> I'm not sure what you mean by "that doesn't work in practice" re: using logits.

Like the other comment points out, models today aren't "calibrated" to give that kind of information. More precisely, we aren't training our models to explicitly tell us how confident they are in their predictions. They're simply trained to give the predictions that result in the lowest average error over the training dataset.

For example, we can consider the simple task of recognizing the words "yes" or "no". A naive model could return (0%, 100%) all the time (always guess "no") and, if the dataset is balanced, would get a score of 50%. Another naive model could return (50%, 50%) all the time and get the same score, 50%. Yet in practice we'd rather have the latter model because it better expresses that model's level of confidence. The former model, even though it gets the same average error rate, expresses a level of confidence in its answers that isn't there.
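To make the yes/no example concrete, here is a small sketch on a made-up balanced dataset. Both naive models score 50% on accuracy, so accuracy alone can't tell them apart. The Brier score (mean squared error of the predicted probability) is not something mentioned above; I'm using it here only as one assumed way to put a number on "expresses confidence that isn't there". All names and data are illustrative.

```python
def accuracy(p_yes, labels):
    # Predict "yes" only when P(yes) is strictly above 0.5; ties go to "no".
    preds = [1 if p > 0.5 else 0 for p in p_yes]
    return sum(int(pred == y) for pred, y in zip(preds, labels)) / len(labels)


def brier_score(p_yes, labels):
    # Mean squared gap between predicted P(yes) and the true label (lower is better).
    return sum((p - y) ** 2 for p, y in zip(p_yes, labels)) / len(labels)


labels = [1, 0] * 50                # balanced toy data: 1 = "yes", 0 = "no"
always_no = [0.0] * len(labels)     # the (0%, 100%) model
coin_flip = [0.5] * len(labels)     # the (50%, 50%) model

for name, probs in [("always no", always_no), ("coin flip", coin_flip)]:
    print(name, accuracy(probs, labels), brier_score(probs, labels))
```

The accuracies match, while the overconfident model gets the worse (higher) Brier score.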

As of today, we only train models on the overall error rate, so our training methods don't prefer one kind of output over the other. That's why measuring the logits to guesstimate confidence isn't actually a good metric. It just happens to accidentally be one sometimes.

A speech recognition model might get to the same WER as a human, but humans are keenly aware of when they didn't hear a word right. That's invaluable information to a food ordering system which can then respond by asking for clarification, rather than blindly following its "best guess" which results in the aforementioned ordering of 40 ketchup packets.
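A rough sketch of what "ask for clarification instead of following the best guess" could look like, assuming the recognizer hands back a confidence per word. The threshold, the function name and the sample confidences are all hypothetical, and the whole thing is only as good as those confidence numbers, which is exactly the problem described above.

```python
from typing import List, Tuple

CONFIDENCE_THRESHOLD = 0.80  # hypothetical cutoff, would need tuning on real data


def next_action(words: List[Tuple[str, float]],
                threshold: float = CONFIDENCE_THRESHOLD) -> str:
    """Ask the customer to repeat if any word is below the threshold, else accept."""
    unsure = [word for word, conf in words if conf < threshold]
    if unsure:
        return "Sorry, what was that? I was not sure about: " + ", ".join(unsure)
    return "Adding to order: " + " ".join(word for word, _ in words)


# Made-up recognizer output as (word, confidence) pairs.
print(next_action([("two", 0.97), ("ketchup", 0.95), ("packets", 0.93)]))
print(next_action([("forty", 0.41), ("ketchup", 0.95), ("packets", 0.93)]))
```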

And as far as I'm aware there are no loss functions for training confidence measurements into a system, so this is very much an unsolved problem in speech recognition systems.


> I'm not sure what you mean by "that doesn't work in practice" re: using logits.

I suspect they mean that ML models are usually poorly calibrated and that the softmax-over-logits probabilities generally don't reflect actual error rates, so they're tough to use meaningfully for asking people to repeat themselves.
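One common way to put a number on "poorly calibrated" is expected calibration error: bucket predictions by stated confidence and compare each bucket's average confidence with how often it was actually right. A minimal sketch with equal-width bins; the function name, the bin count and the sample data are my own assumptions, not anything from the article.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy in each bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 lands in the last bin
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        mean_conf = sum(conf for conf, _ in bucket) / len(bucket)
        acc = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += (len(bucket) / total) * abs(mean_conf - acc)
    return ece


# Made-up data: a model that says "90% sure" every time.
says_90 = [0.9] * 10
right_9_of_10 = [True] * 9 + [False]      # calibrated: gap is about 0
right_5_of_10 = [True] * 5 + [False] * 5  # overconfident: gap is about 0.4
print(expected_calibration_error(says_90, right_9_of_10))
print(expected_calibration_error(says_90, right_5_of_10))
```

If the gap is large, a fixed "ask again below X" threshold on the raw softmax output won't fire at the right times.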

Personally, if I have to deal with an automated order system, I'd rather have some kind of search tree that lets me traverse it using three (left, right, back) well-separated noises and a "dumb" back-end, instead of having to pretend an ML system and I are having the meeting of the minds that a voice-based discussion implies.
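Something like this, as a toy sketch: a binary menu tree where "left" and "right" descend and "back" returns to the parent. The menu contents, class and function names are invented for illustration, and detecting the three noises is assumed to happen elsewhere.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    label: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def navigate(root: Node, signals: List[str]) -> str:
    """Follow 'left' / 'right' / 'back' signals from the root; return the final label."""
    path = [root]
    for signal in signals:
        current = path[-1]
        if signal == "back":
            if len(path) > 1:  # "back" at the root does nothing
                path.pop()
        elif signal == "left" and current.left is not None:
            path.append(current.left)
        elif signal == "right" and current.right is not None:
            path.append(current.right)
        # anything else, or a move past a leaf, is ignored
    return path[-1].label


# Hypothetical two-level menu.
menu = Node("menu",
            left=Node("food", left=Node("cheeseburger"), right=Node("fries")),
            right=Node("drinks", left=Node("coke"), right=Node("milk")))

print(navigate(menu, ["left", "right"]))                  # food, then fries
print(navigate(menu, ["left", "back", "right", "left"]))  # changed my mind: coke
```

The back-end never has to guess at intent; the only recognition problem left is telling three sounds apart.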

I understand that such a system would be hard or impossible to train lay-people to use, but it would be nice to have a "cut the crap" option to let people interface more effectively with the order system and not take part in the charade of a "discussion"


> SOTA models are at ~5% WER for general speech.

Do SOTA results measure performance in difficult environments though?

Presumably dealing with people talking out of their car window next to a busy road would be a lot more difficult than dealing with a relatively clear audio recording.

EDIT: It looks like the linked results are for an audio book dataset. That seems like an optimal environment where you’re going to get clear enunciation with minimal background noise.


I never understood why the speakers in drive-throughs are so shitty. It is like the low-resolution CCTV camera recordings released by police that are close to useless for identifying perpetrators. What is the point? This is 2021; DSP technology is a commodity on the order of a few cents to a few dollars per chip. There is absolutely no reason for an intercom to sound so bad, ditto for CCTVs. A commercial-grade surveillance/intercom system that costs 3-4 figures should at the very least perform better than consumer tech like Amazon Ring.


15% error is wildly less than I normally get ordering at the McDonalds drive-through. Not quite half the time I have to give my order again at the second window, then again at the third window.


Here in the UK when you go through the drive-through at McDs, the order appears on the screen next to the microphone and they ask you to confirm that's what you ordered. Never had a wrong order this way. Seems like a very simple solution if something like this isn't standard everywhere.


All that matters is what is cheaper for McDs. Meat-based order processors are relatively expensive. Like, way more expensive than 15% incorrect orders.


As I've gotten older I've cut back on my fast-food intake quite a bit. Not really for health reasons either, though it's a nice side benefit. It just really isn't very good food, and it seems to have gotten worse over the past 20-30 years. Especially for what it costs.

However, I will never order from a voice-driven AI system. If that is the choice, then I decline. I don't use it in any other area of my life, in fact I cannot stand talking to computers despite using them professionally as a career. When I call someplace and have to speak to a computer I mumble jibberish until the system gives up and transfers me to a human. I'm a human, I want to speak to other humans, not computers.


Sorry to be Butlerian about it, but "conversing" with machine simulacra disrespects me, the customer.

Someone should deploy a voice-producing AI to interface with companies' voice-processing AIs so I can order from the command line or from an app.


Conversing with the machine simulacra disrespects you, but instructing your machine to converse with their simulacra doesn't?


English isn't my first language so I dread any kind of voice-based systems, and automatic recognition is just the worst. Calling a help line where you have to say what you're calling about to an AI first is just futile. If that's the only way to order at McDonald's then I'll just stop eating at McDonald's.


> mumble jibberish until the system gives up

I mostly just press * (sometimes zero) until it gives up and connects me to a human. I will never speak anything to a computer.


> I've cut back on my fast-food intake quite a bit

Same here. I learned to cook and spice foods properly when I finally decided to go on a diet. I love making foods spiced just the way I like them. Now I can barely eat at McDonalds because the food is hideous - fat, sugar & salt that starts me craving and leaves me feeling sick.


Personally, I don't mind having all my data harvested. I'm aware this is a hot-button issue on HN that keeps getting people outraged, again and again, but I believe that eventually people will start shrugging about how invasive technology can be.

Ultimately things are moving towards the end of privacy and even (within 50? 100 years?) the end of ownership. I don't see how a civilization that can - easily and with currently available means - house, clothe and feed everyone on the planet, can sometimes be so lost on petty issues as we are (myself included).

I'm more concerned with the health hazards posed by modern technology and industrial processes - but that's me and the reality I subscribe to.


Things are not moving towards the generalized end of privacy and ownership. That may be the case for a specific subset of the population (the 99 percenters), but overall things are simply moving towards a centralization in information availability and asset ownership.

The "elites" (sorry for the boogeyman word Overton window'd out of acceptable use, but it's accurate) own more and more, know more and more about you and your peers, and you still don't know anything about them, or have any democratic control over them.

For someone presumably working in the information industry, calling privacy a petty issue is an interesting opinion, when data is the underlying lifeblood of most of our businesses.


Please allow me to express my shock that you can even formulate this statement.

Can you not see the danger that, as you are profiled and data gets correlated across providers, you will be subject to levels of discrimination you cannot foresee?

Some simple examples:

- You live at a certain location and are looking to move? -> A certain provider will decide not to show you jobs if you happen to live in a certain neighborhood (already happening...)

- You live in a neighborhood that is somehow less respectable, or respectable but where inhabitants are considered more prone to have car accidents? -> Your car insurance will be more expensive

- You are deemed to cruise through McDonald's once in a while? -> Your health insurance will be more expensive...

- You happen to be correlated with a certain group of people due to location data, even if you do not know these persons? -> You are likely to be questioned by the police about whether you know anything or have seen anything...

- You share when you have your meals, or you consume fewer meals? -> You will be classified as probably part of a certain religious group

- You spend a certain time in a certain hospital department? -> Your current health status will be shared with your current employer, dooming your chances of promotion...

- You spend a certain amount of time at certain bars, entertainment venues or restaurants within your town? -> Your sexual preferences will be inferred...

Privacy guarantees, liberty and peace, plus societal opportunities are the defining fights of these times. To see a statement like this frankly inspires both sadness and concern.


> the end of ownership

I think it more likely the acceleration of ownership. Near-infinite copyright, patented math, the trend has been more greatly in one direction than the other. Even the nature of FOSS/Creative Commons relies on this (even as many suppose it to be breaking free of it). (As someone from some documentary I can't fully recall put it: we will not be free and the world will not be fixed until every square inch of it is owned privately.)

In previous western society, it was acceptable to own the physical human being. The investment was in ontology. But now the investment is moving towards epistemology. People are data. Own the data, own people.


Privacy is a one-dimensional spectrum which runs from living in isolation on one end to living under thought police on the other end.

You cannot move towards the end of privacy without losing control over your thoughts.

In the world you're describing, everything everyone says will be recorded. Neural networking will be used to process tone and read body language to determine your thoughts before you even consciously recognize them. As a child you will be trained to recognize and halt bad thinking subconsciously.

Consider the episode of The Office where Jim is offered the position of branch manager, and he turns it down because he thinks everything's working fine without a branch manager so why have one?

Never pass up the chance to take an active role in sustaining a good thing. If you feel the balance of privacy is good today, take an active role to ensure it slips no further, or else someone with more ambition than you will gladly push that point along the line a little further.


Every time I see this take, I imagine Lenny saying "Goodbye, dental plan!"

We'll all be toothless sooner or later. Why not abolish dental hygiene and medicine altogether? Think of the savings.


Someone will always own it, it just won’t be you.


Someone will always have privacy, it just won't be you.


> Ultimately things are moving towards the end of privacy and even (within 50? 100 years?) the end of ownership

Yeah we've actually already tried that a few times. It's never ended well.


> the end of ownership

Maybe you'll stop owning stuff, but your self-interest will be owned.


Hear ye, hear ye. It's just the direction things will go towards, and I know most other people will feel uncomfortable (and so do I), but I think it'll be something that could be changed by changing societal expectations.

An analogy could be that 100 years ago the outrage was over women wearing revealing clothing and how it would affect the purity of society, but attitudes have changed and now we barely bat an eye.


The McDonalds app is pretty good for special orders and large orders, which would otherwise be irritating for everyone involved. My local restaurant has always correctly removed the ketchup, onions, and pickle from the $1 cheeseburger.

Saving my "Favorites" and the deals giving $3 off makes up for the mildly clunky app performance. Being able to take everyone's order before leaving the house is also nice.


Wow! That law is interesting!

-----

The Illinois Biometric Information Privacy Act (BIPA) states: “No private entity may collect, capture, purchase, receive through trade, or otherwise obtain a person's or a customer's biometric identifier or biometric information.” unless it receives written consent.

...

Under the BIPA, people can receive up to $5,000 in damages from private entities for each violation committed “intentionally or recklessly,”

-------

Unless I'm missing something people in Illinois should get Amazon Echoes or Halos which both "voiceprint" users and then sue Amazon to collect their 5k reward.


There are Amazon Go stores in Chicago. I wonder how this could affect them? Is a 3D point cloud of your body considered to be part of your biometrics? Is accepting the TOS written consent as defined by the law mentioned in the article?


Isn't this what Google has been doing? Collecting our data (voice and whatnot) and using it to build automated systems?


They must have language in their Terms and Conditions that allows this. McDonald's probably has something similar too. "By using our drive thru service you agree to have your voice collected..." Also, what about the message "This call may be monitored for quality control purposes"? That's not asking permission, but it is a warning, I suppose.


I wonder why they don't forward the drive thru calls to a call center even today.


Using voice recognition to automate drive-thrus...

Seems like they could've added QR code + online ordering to each of their parking spots instead but I guess AI is cool too.


I've used touchscreen kiosks at Taco Bell and McDonald's and Subway. TBH, none of them are great experiences for simple orders.

Order ahead is available from the app, but I wouldn't be surprised if regular audio is actually faster in most cases.


Audio is probably faster per person. However, it is serial in nature. I walk up to 4 kiosks and want a combo #3 with a coke, and I get to skip the three families, each trying to talk their 4 year old into choosing fruit and milk over fries and a coke.
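A back-of-the-envelope sketch of the serial vs. parallel point. Every number here is invented: I assume the kiosk is 1.5x slower per order than talking, the three families each take a couple of minutes, and my own order is quick. The function name is illustrative too.

```python
import heapq


def finish_times(service_seconds, stations):
    """Completion time of each customer, served first-come first-served
    by `stations` identical stations."""
    free_at = [0.0] * stations  # when each station next becomes free
    heapq.heapify(free_at)
    done = []
    for service in service_seconds:
        start = heapq.heappop(free_at)  # earliest available station
        end = start + service
        heapq.heappush(free_at, end)
        done.append(end)
    return done


# Three families ahead of me, then my quick combo order (seconds, all hypothetical).
audio_lane = finish_times([120, 120, 120, 20], stations=1)
kiosks = finish_times([180, 180, 180, 30], stations=4)
print("audio lane, I'm done at", audio_lane[-1])
print("four kiosks, I'm done at", kiosks[-1])
```

Under these assumptions the single audio lane has me finishing after everyone ahead of me, while with four kiosks I finish first despite the slower interface.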

Also, order verification is nice with a kiosk (for now). If I want something without pickles, I can see that right in front of me. It seems like every drive-thru and PoS that has a display which can provide order feedback has switched to displaying ads to buy some other product. I'm sure it's just a matter of time before the kiosks start to have modals which have to be cleared to continue with the order: Have you tried the hot apple pie? How are you enjoying the kiosk experience? Really, the people doing these things can never resist.


Audio is much faster. McDs is way behind with mobile order and pickup. I've done curbside orders where I sat in the slot and watched the entire DT empty out before someone came out with my food.

If you want to see a group that has their technical shit together, check out Chick-fil-A.


They could offer a phone number to call in to place your order so orders could be placed in parallel


McDonalds does offer mobile ordering, but their own-branded apps are segmented by country. If my App Store is set to the US and I order in Canada (or vice versa) then I have to fall back to more traditional ordering methods like going to a kiosk/cashier/drive thru.

Honestly this (cross-country ordering and payment) is one of the things Uber Eats does really well, but you definitely pay a price premium for it over, say, the Starbucks or McDonalds apps.


People who own cars aren't guaranteed to own smartphones, and McDonald's wants their business too.

In addition, the McDonald's drive-thru is iconic, and the stickiness of that part of the brand means it won't go away soon.


And the fact remains, and McDonalds knows this, that their food doesn't survive 10 minutes in the bag, much less 30 with a delivery driver.


They already have it so that you can order by phone. A touchpad/kiosk type thing would make perfect sense too except maybe it required more hardware than they wanted to spend money on?



