As a Google employee, I really don't want to be saying "Ok, Google" in my home all the time. It's totally possible for me to go to work in a subway that has Google ads, waste time on my Pixel, work at the Google office for eight hours, waste time on my Pixel, walk past the same ads on the way home, watch YouTube videos and do Google searches about random topics, and ask Google to set an alarm before I go to sleep. It's too much. :)
"OK Google"/"hey Google" is one of the reasons I went with Amazon's products instead! (other reasons include that they seem to understand my voice more reliably than Google or Microsoft's systems). I get to call it Computer instead of Alexa too, though if I were stuck with Alexa it would still be preferable to Google (or Amazon for that matter).
It isn't just branding though, if I'm right. The wake word needs to be something that is easy to pick out of a complex audio environment and isn't a regular part of common speech, or it will cause confusion ("computer" fails a bit in that regard), so allowing completely custom wake words might cause reliability issues. Also, some might choose wake words that the brands don't want to be seen listening for: I might choose "slave" as a Blake's 7 reference, for instance, but that could easily offend someone who overhears it, and there are many other epithets, slurs, and swears that would not be deemed suitable either.
I do seem to recall that being able to set a custom wake-word was in the works at one point.
I don't see why the system could not be designed such that one could submit a wake-word, which is analyzed for suitability in terms of being sufficiently distinct in complex audio environments, and then checked against a blacklist.
Shouldn't be beyond Google's technical capabilities.
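As a rough illustration of the kind of vetting pipeline described above (the blocklist entries, syllable threshold, and function names here are all invented for the sketch, not anything a vendor actually ships):

```python
# Hypothetical server-side wake-word vetting. A real system would score
# acoustic distinctness from audio models; here a crude syllable estimate
# stands in for "easy to pick out of a complex audio environment."

BLOCKLIST = {"slave"}  # example entry only

VOWELS = set("aeiou")

def syllable_estimate(word: str) -> int:
    """Rough syllable count: number of vowel runs in the word."""
    count, prev_vowel = 0, False
    for ch in word.lower():
        is_vowel = ch in VOWELS
        if is_vowel and not prev_vowel:
            count += 1
        prev_vowel = is_vowel
    return count

def vet_wake_word(word: str) -> tuple[bool, str]:
    """Return (accepted, reason) for a proposed wake word."""
    w = word.strip().lower()
    if w in BLOCKLIST:
        return False, "word is on the blocklist"
    if syllable_estimate(w) < 3:
        return False, "too short to reliably pick out of noisy audio"
    return True, "ok"
```

Short common words like "echo" would be rejected under this heuristic while a three-syllable name like "alexa" passes, which is roughly the trade-off the parent comment describes.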
The thing is, a LOT of people avoid Google Home/Assistant because they simply can't bring themselves to blurt out cheesy branding like that.
They really ought to solve that problem. Probably they will, right about the time they port Inbox bundling to Gmail and provide a Drive sync client for Linux (it's been, what, a decade?).
Why is this a problem that needs solving? If the user chooses an ineffective word they'll figure it out pretty quickly... and if they choose an offensive one-- that's on them.
They could also walk around saying offensive words when not addressing the computer.
It wouldn't be a problem if all people were decent and rational.
Unfortunately some would reprogram other people's devices to cause offence for a jape and some will get offended and blame the company for letting it happen.
The difference from Google/Amazon/Apple/MS's point of view is that if I run around yelling the N word and their device doesn't even notice then they can't be seen to be complicit by unreasonable people. It would just be me being an arse.
Heck, some would get offended at the possibility of one of a company's devices responding to loaded words like that even if it never actually happened.
Of course there is another reason: having people use the same word reinforces brand recognition, even if the wake word is not the actual brand, just a word/name people associate with it.
I'd prefer it be treated like passwords - upon installation, you should change the default immediately.
I refuse to use a service that requires me to say its globally recognized name so often that I'll probably become brainwashed by it. And then there are the older hacks with TV commercials that took advantage of those defaults, the (cooler) ultrasonic voice-command attacks, the ones delivered by vibrating the device's microphone with a laser, etc.
None of these attacks would have worked if the product trigger wasn't so predictable from the get-go.
Eventually even Raspberry Pi stopped using the default pi/raspberry combination. How we invoke our voice-activated programs should be treated with equal care.
This reminds me of the great Sci-fi book series "Old Man's War", where the soldiers get a thought-controlled computer called "Brain Pal™" installed in their head. After installation they first have to choose a name and almost everyone uses a swear word for that. The main protagonist then keeps activating it with the phrase "Hey Asshole".
I'm pretty sure that this book came out way before Google Assistant.
I'd also add their insistence on using the same branding approach for experimental apps as their core offerings. A new app named `Google $RandomNoun` has a really high chance of being killed a few years from now, while `Google Search` and `Google Maps` don't. I'm sure the company wants to use the same structure to give new products an initial boost, but they seem weirdly indifferent to the long term damage it's doing to the overall brand.
So, I can't find the reference googling it now, but apparently Jeff Bezos really wanted Alexa to be called "Amazon" (pretty similar to Ok Google).
I think he didn't want to dilute the brand, and wanted to keep the association front of mind or something, but the people on the Alexa project managed to convince him to go with "Alexa" instead (as "Amazon" would be confusing, and "Alexa" is arguably a better name).
I think that's part of the reason for the different wake words, one of which is "Amazon" (though confusion with people actually named Alexa is likely a much bigger one; it's probably why they have "Amazon" in that list at least).
Very interesting. I can't help but feel the irony, given Alexa itself was a brand Bezos acquired in 1999 to get into the search engine business just as Google was on the rise...
I can hear it now “We finally have a use for that brand we spent all that money on..”
Agreed. And I have a small child. I _really_ don't want them to be forming "relationships" with brands by asking robot assistants named after corporations to do stuff.
This is what has led us to stop using our assistants around the house. I have 4 of those little google pucks around the house that we used for various tasks and automations. Now that we have a 1-year-old we almost never use them because I don't want him learning the phrase "ok google".
It sounded dumb enough for two adults to say it all day, but the convenience was worth it. Having my little one start using the phrase to talk to things is definitely where I draw the line.
Besides, now that it tries to make suggestions after every other command all you really hear in my house in regards to them is "hey google shut the fuck up and do what you're told". Unfortunately the stupid puck doesn't allow you to interrupt the unsolicited FAQ any longer.
Come to think of it, I think I'm getting rid of most of these things this weekend.
My Alexa really needs a way to whitelist commands. I bought it to be a wireless clapper and fancy alarm clock. If it thinks I'm asking it anything outside of those two domains, I am not.
Of course Amazon isn't likely to ever do such a thing, because god forbid we don't have the ability to suddenly order something from Amazon on every device at all times.
> Of course Amazon isn't likely to ever do such a thing, because god forbid we don't have the ability to suddenly order something from Amazon on every device at all times.
Comically, that's actually one of the things it's the worst at. I tried to do it exactly once and swore I would never do it again. In the time it takes to read the first search result, I could have found and ordered what I want from my phone. If that's what they were going for, I can't believe they didn't scrap the project.
Supposedly they're fairly good at re-ordering, but given how often Alexa mis-hears me, I'm not a fan of doing anything involving money on it.
That's interesting. The Amazon devices can be configured with a wake word of "Alexa", "Amazon", "Echo" or "Computer". I see Google offers no way to not say "Google".
We have kids too. We use "Computer" which is one of the wake words offered by Amazon because we don't want our kids to get used to the idea of bossing around a person.
Also, I've mapped all of the functions we use to "recipes" that include the word "please," so they don't work unless you say "please." For example, the upstairs lights are actually named "the Chicago lights," and there is a recipe that triggers "turn on the Chicago lights" when someone utters "Computer, please turn on the upstairs lights." So if you say "Computer, turn on the upstairs lights," it says "I don't know what upstairs lights are."
The disadvantage of this approach is that you need to say the string exactly as stored.
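The exact-match limitation described above can be softened a little with normalization before lookup. A toy sketch (the routine names and phrases here are made up, not Alexa's actual routine engine):

```python
# Toy phrase router: routines are keyed by a canonical phrase, and
# utterances are normalized (lowercased, punctuation stripped) before
# lookup, so "Computer, please..." and "computer please..." both match.

import string

ROUTINES = {
    "computer, please turn on the upstairs lights": "turn_on_chicago_lights",
}

def normalize(utterance: str) -> str:
    """Lowercase and drop punctuation so minor transcription
    differences don't break the lookup."""
    table = str.maketrans("", "", string.punctuation)
    return utterance.lower().translate(table).strip()

NORMALIZED = {normalize(k): v for k, v in ROUTINES.items()}

def route(utterance: str):
    """Return the routine name for an utterance, or None."""
    return NORMALIZED.get(normalize(utterance))
```

Leaving out "please" still fails the lookup, which is exactly the behavior the parent comment wants; only incidental punctuation and casing differences are forgiven.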
Most likely, everyone you know, especially outside of tech, when referencing a search engine, says to "google it".
There are, unfortunately, millions of ways we form relationships with brands. A brand's reason for existing is to live somewhere in our psyche, either subconsciously or consciously. We say "Q-tip" instead of cotton swab, "band-aid" instead of bandage, "Advil" when we need ibuprofen, or "Tylenol" when we want acetaminophen. When we see polar bears around Christmas time, we think of Coca-Cola, and so on and so forth.
All that to say: kudos to you for taking a principled approach, but I'm not sure it's going to achieve much.
I think you can trigger Google assistant by using "OK Boo Boo". It works on the Google Home and on my Android phone, and is actually easier for toddlers.
Just wait (a few years, perhaps) until you buy or rent that sweet new self-driving car and try to get out of town for some rest and relaxation... you turned off the radio, but that doesn't matter, because Big G uses the windshield as a billboard looming into your personal space, beaming ads to the most captive audience that exists.
The future is now. Already my local gas station has a screen that starts playing ads the moment you start pumping gas. The mute button on it is broken from being pushed too many times, too hard.
Last year my parents rented a room at a 5 star hotel. The hotel had a smart mirror and I only ever saw that mirror play ads. It's insane.
The insistence on making the spoken interaction feel as "human" and "natural" as possible honestly introduces way more confusion than it needs to and makes the whole thing feel uncomfortable for its parasociability and stiltedness.
In Star Trek they were perfectly comfortable saying "Computer! Do the thing" in a more specific 'computer' intonation. It was all fairly natural language, but there was no attempt to pretend the computer was a person. That made the thing feel more futuristic than what they're trying to do now.
It's not even that natural; with a new baby in the house I've really grown to dislike Alexa just for how much I have to yell at it. We're not a loud household, but talking to Alexa is like talking to my grandfather without his hearing aids. Everything has to be said at least three times, in increasing volume levels.
I just rewatched all of TNG, and that's actually not the case: there are several instances where a crew member (often Geordi?) would speak to the computer in a more "human" conversational way. I recall one episode in particular where Geordi's trying to set the mood in his quarters for an impending date, and he's very conversationally refining the music choice until he gets what he wants.
Yep, Star Trek computers understand addressing, the conversation is modal: one does not need to begin every sentence with the keyword, a first use of the hotword (or implicitly in some cases, like entering a turbolift) combined with a specific tone makes the computer “open” the conversation. From then on, tone only is sufficient for the computer to know when it is being addressed. With a conversation opened, context is remembered.
I am flabbergasted that the following hasn’t been an option:
- hey Siri
- yes?
- what are the last three releases from <artist>?
- X Y and Z
- search again without EPs
- W X and Z
- Play the first one
- <playing W>
- thank you Siri
<conversation closed>
Also, with attention tracking that -already- exists with the FaceID array, the phone can know when it is addressed and when it’s not. You know, just like when you’re talking to someone, you usually look at them...
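The modal, "open conversation" flow sketched in that exchange can be expressed as a tiny state machine. Everything here is illustrative (the hotword, the 30-second timeout, the closing phrase); no shipping assistant exposes such an API:

```python
# Minimal sketch of a modal voice session: one hotword opens it,
# follow-up turns need no hotword, and a closing phrase or an
# inactivity timeout ends it.

import time

class ModalSession:
    def __init__(self, hotword: str = "hey siri", timeout: float = 30.0):
        self.hotword = hotword
        self.timeout = timeout          # seconds of inactivity before closing
        self.opened_at = None

    def is_open(self) -> bool:
        return (self.opened_at is not None
                and time.monotonic() - self.opened_at < self.timeout)

    def handle(self, utterance: str) -> str:
        u = utterance.lower().strip()
        if not self.is_open():
            if u.startswith(self.hotword):
                self.opened_at = time.monotonic()
                return "yes?"
            return ""                   # not addressed; ignore entirely
        if u.startswith("thank you"):
            self.opened_at = None
            return "<conversation closed>"
        self.opened_at = time.monotonic()  # each turn refreshes the timeout
        return f"<handling: {utterance}>"
```

In a real device the "refresh" step would come from tone or gaze detection rather than a fixed timer, but the key property is the same: context persists across turns until the conversation is explicitly or implicitly closed.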
As someone who recently acquired a Clapper... I had forgotten just how finicky it could be. Can't clap too quietly, too loudly, too slowly, or too quickly. Often takes me three or four tries! Not exactly an enjoyable experience for the person sitting next to you...
I vaguely remember a sitcom where someone had to watch an important event with friends on TV and someone else installed a clapper on said TV. Everything went well until the applause kicked in.
Yep. "That sounds like a clap from the next room" versus "that sounds like a quiet clap in this room" is not an impossible distinction to make (humans could do it most of the time), but not trivial either.
It's experiences like this that make me question why anyone would spend this much effort and money setting up a system that isn't as good as what it replaces.
I have a convenient, quiet, perfectly reliable single-bit computer on the wall in every room in my house. It costs $1, can be operated by a 2 year old, is perfectly intuitive, and the only downside I can think of is that occasionally it necessitates I be very slightly less lazy than I might otherwise be able to aspire to, i.e. when I need to go downstairs and turn off a forgotten light.
Considering my house is all using LED lights on hydro power, it's probably better for me to just leave a 5w light on all night than it is to install a Google Home setup here, in terms of my carbon footprint.
That is the key point, for me anyway. "Computer, all lights off" instead of getting out of bed when I've left something on is one of my lazy wins from voice control. Setting multiple timers without touching things with messy hands while faffing in the kitchen is another.
-- is Google misinterpreting your voice? E.g. does it hear a sound it thinks is "play" in the middle of your phrase?
-- or is it some weird statistical model that because of invisible and irrelevant correlations, sometimes concludes it's more likely you're asking for music? Like the song with that title is currently in the top 40, or was played by you in the past, or something?
-- is it because you don't speak with a 20-30 year old white male techbro bay area accented voice, so Google never even bothered testing whether it'd work for you?
(My brother in law is Australian living in the US, and has to use his "sarcastically fake American accent" to be understood on the phone. He, more politely than I, calls it his "phone voice". It's the same voice all my friends here use when parodying American stereotypes... I bet it works on Ok Google too.)
You really don't think Google doesn't test in other huge markets? A study a couple years ago found that Google did awfully well with English speakers from the US, India, and China. [1]
From a quick search, it looks like national accents are less of a problem, and the real difficulties lie in dialects and foreign accents (e.g. a Spanish person speaking English).
But the idea that the training corpus used for speech recognition models is all "20-30 year old white male techbro bay area" guys is ludicrous. Where are you even getting this? Speech corpora used for testing are put together by people who take demographic diversity seriously. On the other hand, they are sometimes biased towards a single mainstream national dialect. (E.g. "standard" American English, as opposed to a thick Boston accent.)
I feel like the true spirit of Google is that they test fifty shades of blue, then apply the "quantitatively best" shade of blue in an unrelated product where it makes the text unreadable. Either that or I have no idea what the heck they're testing for.
I strongly suspect that was an example of malicious compliance.
Googler #1: "Marissa want to change the link colors _again!_ What can we do to make her stop micromanaging shit like this?"
Googler #2: "All we need to do is convince her we are obviously wrong about 'graphic design-y things' because we're developers, and that she can prove we are wrong by making a 'data driven' decision. Then we'll say 'The only way to prove we are wrong would be an insane global multivariate test of all possible colours!!!' and she'll _insist_ on it. Then we build a tool and run the test and point at the results every time she tries to make us change it again. And we can offer to run the tool for _everything else_ she micromanages us about in future too!"
Googler #1: "Brilliant! So, do we build the tool and run the test? Or just make up the results and present them to her?"
For any company Google's size (including Google), it's rarely ever accurate to say "$COMPANY does $X". Maybe some parts of Google quantitatively tests everything, but it is safe to assume that there's huge variation across the company in testing rigor.
> From a quick search, it looks like national accents are less of a problem, and the real difficulties like in dialects and foreign accents (e.g. a Spanish person speaking English).
Can confirm, in high school I had to occasionally go AFK after being beckoned to talk to the Amazon Echo because it wouldn't understand my parents' accents.
A lot of immigrant children get used to translating for their parents, but repeating commands but with a native accent is a newer phenomenon.
Settings > Siri > Language > English (United Kingdom)
not fix it for you? And possibly setting
Settings > Siri > Siri Voice > British
as well?
Apple advertises:
> Since Siri is designed to recognize accents and dialects of the supported countries or regions, Siri’s accuracy rate will be highest for native speakers.
Or is this problem just with American friends' devices?
If that were true, then Maps would stop working for anyone traveling to another country. I know that's certainly not the case on Android, and I suspect it's not on iOS either.
The new Android 11 power menu screen has been a life changer for me. I now control my lights with my phone 90% of the time since it's literally one power button click away, and I always have my phone on me.
It doesn't work with Hue lights, so it's worthless to me. Besides, I just have an on/off widget on my home screen, which is faster and more reliable than holding the power button.
Oh, good to know. But then it wants me to create a Hue account and link that with Google... which I don't really want / care for. So I'll be sticking with my widgets.
80% of the time my lights would turn on.
20% of the time, I’d be greeted with: “Ok, playing ‘Turn on the Lights’ by Future on Spotify.”
And I’d stand there in the dark, listening to music I don’t like, questioning my life decisions.