To me this highlights the core failure with all these voice-driven UIs: you have absolutely no idea what is possible and what isn't. Discoverability is zero, which makes big changes like this even more disruptive.
I've already decided that I'm done with Google's smart assistant stuff in any case. I have a Google Home with a screen in my kitchen and the most-used feature (aside from just existing as a photo frame) was an integration with a really useful shopping list app called AnyList. It certainly wasn't complex, we'd say "Hey Google, add <x> to the shopping list" and it would do it. But it was very useful: I'd have something in my hands I just pulled from the fridge (e.g. milk) and be able to add it to the list without interrupting what I'm doing. If it had to wait until I was done for me to pull out my phone I'd inevitably forget.
Then one day Google decided to disable that integration. Now the only shopping list you can add to is one Google provides (which naturally has way fewer features than AnyList). They've never provided even the remotest defense for why they removed it; it's very obviously to lock us into the Google ecosystem. So our Google Home is now a glorified photo frame that plays music from time to time (and even then it prioritizes cover versions and YouTube videos over actual songs, presumably because $$$).
On top of what you mentioned in your first sentence, any ideas to improve discoverability are horrible. I hate it when Alexa does not play my podcast right away but instead explains to me how I can change episodes. I despise Alexa for telling me that I have some notification before starting my timer. And so on - any voice-only attempt to explain something to me without me asking for it adds a lot of friction between me and my goal.
The constant "By the way..." from Alexa drove me nuts.
If you say "Alexa, stop by the way", it'll get the device to stop responding with follow-ups for ~24h. I ended up creating a routine that runs every night at 4am to lower the volume to zero, automatically say "stop by the way" to the device, and then raise the volume a minute later, and Alexa has stopped with the follow-ups.
I just discovered this today, but if you whisper to your Echo, it will skip all of the "by the way" BS (it will also reply in a whisper, which is kind of creepy, but you win some, you lose some).
Yes... when it does anything other than what I ask, it makes me feel like it's wasting my time.
Alexa recently started responding with "Good afternoon! <the normal response>" and it irks me more than it probably should. I've looked to see if I can turn it off and can't find the option.
As a Brit, when I bark "Alexa, set a timer for 5 minutes" and Alexa prefixes her reply with "Good evening <pause>", TO ME that sounds like a bitter correction-by-example to my abruptness. It's the sarcastic opposite of a pleasantry.
Can you instruct it to "stop the greetings - go straight to the response"? Doesn't Alexa (yet) have some LLM behind it that will create a profile just for you?
Spot on! A lot of voice assistants have been following the "if we could" line instead of the "if we should" line. For many straightforward applications, clicking through a well-defined interface can be the least error-prone way to get the job done.
Seriously though. The trend of every company trashing its FAQs in favor of an "assistant" or third-party forums is awful. Trying to get help with technology is almost always infuriating now.
I recently had to contact Windows help, and they were useless. The Windows forum points to a third-party piece of software that I am convinced is malware.
And while I'm ranting, why can't the download folders in windows be set to group by "none" and sort by "name" by default? Why are both "most recently edited" by default? AND WHY CAN'T I CHANGE IT MICROSOFT?
Maybe WinSetView [0] and the options [1] it provides might be of use to you? It was able to resolve the issues I was facing so hopefully it can work for you as well.
I'm not dying for them to record more than absolutely necessary, but I really hope all the voice UI engineers do log curses to know when their app is doing annoying stuff. The number of times these things screw up, I definitely get really spicy and hope that feedback makes it back.
Luckily the Google Assistant is basically worthless.
Setting a timer and having it remind you that you could have set an alarm instead is annoying, but there have been multiple occasions where we set a timer and the Google Home Mini seems to forget it.
> okay Google, set a timer for 1 hour
>> Certainly, and did you know that I can also look up Wikipedia entries for you.
>> Timer for 1 hour has been set and is starting now
> okay Google, how much longer left on timer
>> You don't appear to have any timers set at this time.
I'm pretty sure that our robot overlords will either forget, get distracted with generating mis-information, or become artists painting portraits of furries.
I remember (maybe it's still like this!) there used to be a page you could go to and see all your Alexa voice clips since forever. It's amazing Amazon has gone so long not caring about privacy without backlash.
I'm an Android user and have never owned an iPhone, but that sounds to me like you're talking about a push notification setting that is completely separate from an Alexa response setting. I could be wrong, but my guess is that that's why you're getting downvoted.
Thanks for posting that possible misinterpretation :)
Everything in my post was Alexa-specific. Someone who has the stated problem can follow that precise and non-obvious menu tree within the Alexa app, and it will solve their problem. It's a device-specific setting, so can be done on the kitchen Alexa without affecting other units.
Both "notification" [1] and "Do Not Disturb" [2] in my post were Alexa-specific terms that are unrelated to their mobile phone brethren.
The kids used to love our Google mini speaker. There were voice activated games, normal stories, choose your own adventure style stories and a fun little “animal of the day” feature.
Every single one of those things has been deprecated and removed for… reasons?
All we can do with the speaker now is ask it to play music, get the weather, or set a timer.
It's not just voice - conversational interfaces in general have this discoverability problem. That includes ChatGPT. I'm glad people are discovering it, if you'll excuse the pun. CUIs are not a panacea, and not a replacement for conventional UIs.
I think there's a line worth drawing here: pre-LLM voice interfaces required you to guess the command(s) the designer of the thing had in mind for the action you want to perform. With LLMs you can be 10ft deep into human-level vagueness and metaphor and your intent might still survive.
So the difference wrt discovery is that you only have to gesture at what you wanna do and, if a matching action exists, there is a chance it will be understood.
I'd wager we'll see a renaissance of voice assistants with LLMs, especially once the good-enough ones can run on device.
When you put it that way I am reminded of my youthful efforts at solving puzzles in Sierra Online text adventures by guessing the right prompt. I guess I got early training for successfully interacting with LLMs :)
There are obviously cases where CUIs work well: while driving if you don’t have a physical button to do the thing you want, the CUI is better than a screen UI, and if it’s a complex thing, you would never have a physical button for it anyways. In the home when you want to control devices in the morning “open all the shades” or “open the north shades”, unlock the door and start the car, etc…
Not really conversations though, more like accessible commands. I don’t think we get conversations until the tech improves and the latency goes way down, meaning on device processing of speech at the very least.
I’ve had exceptionally bad experiences with car CUIs - all the problems of trying to figure out exactly how to tell Alexa what to do (and then figure out what Alexa actually _did_) while going 70mph in traffic. Just give me physical buttons.
it reminds me of all the old text games like Zork or whatever, where they gave you a cursor and left you to figure out what you could ask/say/do. which is fine for a game. it sux for a "tool"
Funny fact: this is exactly my core use case. I use Siri to add things to the Reminders app, where I have a smart list called "shopping list." It's amazing how great this is, and how many times I use it.
This and music control are the only cases where Siri Just Works for me. “Add toothpaste to my shopping list” - no fluff, no nonsense replies, just “I added that to the shopping list” - and it’s on my phone’s Reminders in that list.
I wonder how heavily they’ve optimized Siri for just a few specific use cases like this, because nothing else seems to work reliably on it.
Today for the first time I asked Siri for the weather forecast for next week and it actually replied in a sensible way.
Until now I have only been saying simple commands like "countdown n minutes" or "call my favorite wife" (I only have one but thankfully she is also my favorite.)
I think this kind of highlights the ios vs android user mentality. afavour is annoyed that google is trying to lock them into only using google's shopping list while you love when apple "just works" within their ecosystem.
As an iOS user and dev, I’ve been reading these comments confusedly.
Apple has exposed Siri to us as developers in several ways, one of which is called “Siri intents.”
The long and short of it is that app developers can hook into the Siri system so users can use Siri commands to control certain parts of third-party apps.
This means that users aren’t locked into using, say, Apple’s own Reminders app, and they can tell Siri to add things to third-party to-do list apps.
Users also have the option to create their own workflow via Shortcuts and give it a command phrase that Siri then responds to.
I assumed something similar was available on Android, but the responses indicate that this might not be the case.
I second this. My two other derived commands I use every day are “remind in 2 hours to do X” or “remind me Friday at 7 to take Y”. This and “timer X minutes”.
I would be very sad if these were to disappear or to stop working as well as it does now (I like that if I say “remind me tomorrow” when it’s right past midnight it asks for confirmation that I actually mean the same day since it’s “already tomorrow”)
My Google Home - Lenovo Smart Clock just says "I'll remind you today at $TIME".
The device is out of support and has its glitches (e.g. it's still able to play and pause podcasts, but after pausing it says "Sorry, something's gone wrong"; if asked to snooze an alarm it says "OK, alright" and then "snoozing for 10 minutes"). Looking forward (not!) to Google making a breaking change and it becoming e-waste soon.
Hah, in the imaginary future where we have cyborg assistants instead of climate destruction, my robot butler will pour half a cup of tea, go back to the kitchen, and then return to pour the second half of the cup, and I'll tell my guest, "Yeah, it's a 5 year old model, the startup that made him went bust so I'm using firmware from a Ukrainian forum... Джевс, можна мені трохи цукру?" ("Jeeves, may I have a little sugar?")
I still have a Google Home in the kitchen, but what I really want it to display is data from my weather station, transit info, the time, etc. I can't control that damn thing at all. I've got a TidByt coming today, which I have high hopes for!
I haven't used it myself, but Home Assistant can cast to a Google Home Hub, so afaiu, you can set up a whole dashboard of widgets and controls and use it from a stock Home Hub.
Nice! What I really want, though, is a local API! I can use a starlark script to grab entity states from my local Home Assistant super easy... but then I have to cUrl them up to a Tidbyt server somewhere, only to have it sent right back into my house. Seems kinda silly, and will break anytime my internet goes out.
Are you referring to casting a URL to open a browser on the home hub? I tried that in the past but it was kinda janky. It's definitely possible with something like https://demille.github.io/url-cast-receiver/ though
This problem can be solved pretty soon though; given a good enough LLM, and perhaps for power users a configurable set of integrations, you should be able to make any language request, translate it into verbs, and ask for an explanation if you get confusing results. With in-context learning perhaps you can even get to the holy grail of “when I ask for X, please always apply interpretation Y” for personalization.
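A minimal sketch of that "translate it into verbs" step, assuming the model is prompted to reply with a JSON action object instead of free text. The verb names and dispatch table here are made up for illustration, not any real assistant's API:

```python
import json

# Hypothetical verb table: each verb maps to a concrete action the
# assistant backend knows how to perform.
ACTIONS = {
    "set_timer": lambda minutes: f"Timer set for {minutes} minutes",
    "add_to_list": lambda item, list_name: f"Added {item} to {list_name}",
}

def dispatch(llm_reply: str) -> str:
    """Parse the model's JSON reply and run the matching action,
    asking for clarification instead of guessing on a miss."""
    try:
        action = json.loads(llm_reply)
        verb = action.pop("verb")
        return ACTIONS[verb](**action)
    except (json.JSONDecodeError, KeyError, TypeError):
        return "Sorry, can you rephrase that?"

# e.g. the model turned "put milk on the shopping list" into:
print(dispatch('{"verb": "add_to_list", "item": "milk", "list_name": "shopping"}'))
```

The "explanation for confusing results" part would live in that except branch: instead of a canned apology, feed the failed parse back to the model and ask it to re-try or to tell the user what went wrong.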
I think the broader context here is that Google is downsizing the current Assistant team in preparation for an LLM-based replacement, perhaps once Gemini has rolled out.
Yeah, beware “no true Scotsman” here. But I think it’s a reasonable hypothesis that GPT-4 is “good enough” or close to it, and when the current assistant services get wired up to LLMs we’ll see a step function in their utility. Apple and Google are both definitely working on this. OpenAI’s voice mode is closer capability-wise, but doesn’t have the integrations; that would be an obvious product for them to do this year.
If this doesn’t seem obvious in 1yr I’d say the hypothesis is likely wrong.
Sounds to me like a silver bullet idea. "LLMs will make it good".
I don't think voice is a good interface. "It chats like a human" is the lowest possible hanging fruit in terms of product design, and bets everything on the smarts of the tech that's behind it.
We are so used to tooling faster than voice. Keyboards and taps are very, very fast. I want digital assistants as smart AND as fast as that, not something smart but incredibly slow to interact with because it needs to dumb itself down to human speech I/O.
To me, this is also not about modality or making it more generic. I just don't want an anthropomorphized smart-ass assistant. I want smart tools that actually assist me directly, no chat.
Agreed. I find it utterly exhausting to talk to most humans, at least when my goal is purely transactional or functional. Human conversation takes work. Let's save it for what matters.
People hate call centers. I don't want to have a conversation with some human or human-parity AI at my airline, I want to change my flight in 3 clicks. In retail and fast food, self-checkout and online or kiosk ordering is popular for a reason.
I've found myself feeling those same feelings when being forced to have conversations with a chatbot. Chatting with ChatGPT just for fun can be fun, but it can be just as painful as a call center if you need to get something specific done. In order to cancel a hotel reservation, I was having to chat with some bot. It made it into a whole conversation, with brief pauses. It should have been 3 clicks.
I don’t think I’m naively expecting LLMs to magically fix things. The reason voice is currently a bad interface is that the agents are dumb, mapping verbs to hand-coded action trees. The missing pieces are first intelligence and second long-term adaptability/personalization.
The removal of the AnyList integration drove us to get an Alexa assistant device in the kitchen. It still works there.
Never thought Amazon would have the better assistant ecosystem. Google already has my email and calendar and other stuff; with Amazon I have to auth into them like a third party. But it's become true.
I've been meaning to get around to trying an Apple HomePod Mini. The AnyList integration on my iPhone works okay (though as an infuriating two-step process: "Hey Siri, add to shopping list" ... "milk"), so hopefully it would be replicable on the pod, too. Seems like the least worst option.
I've been playing with Home Assistant's voice integration and it's pretty darn good.
There are some GPT and Llama experiments you can run there, and it's finally getting to the point where a local assistant you host yourself might be viable soon.
Getting audio into it and having a little speaker so it can talk back is both the easiest and the hardest part. I need to play around more with some ESP32s and microphone modules.
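For anyone wanting to poke at this without the microphone hardware: Home Assistant's documented REST API exposes the same conversation agent over HTTP at `POST /api/conversation/process`, so you can script "voice" commands as text. A minimal sketch, assuming a long-lived access token; the hostname and token values are placeholders:

```python
import json
import urllib.request

def build_request(text, base_url, token, language="en"):
    """Build the POST for Home Assistant's conversation endpoint
    (/api/conversation/process from the documented REST API)."""
    return urllib.request.Request(
        f"{base_url}/api/conversation/process",
        data=json.dumps({"text": text, "language": language}).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )

# Sending it is one call once you have a long-lived access token:
# with urllib.request.urlopen(build_request("turn off the kitchen light",
#         "http://homeassistant.local:8123", token)) as resp:
#     print(json.load(resp))
```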
Quite the opposite. I bike the kids to school every morning. The weather is too important to be summed up by a voice summary, I need the hour by hour details.
> I'd have something in my hands I just pulled from the fridge (e.g. milk) and be able to add it to the list without interrupting what I'm doing.
Non user-hostile technology would happily solve this problem: you pull from the fridge what is about to run out, or take the empty container on its way to the trash, put it in front of a mini camera mounted on the fridge, image (possibly also barcode) recognition identifies the product, then you say a magic word and it's added to your list directly on the phone; then software calculates the best path through the shops that have the products in stock.
Sadly, this would be hijacked by businesses in no time, by making it dependent on some cloud service, then selling your data to advertisers or pestering you with offers for similar competing products.
As of today, I still prefer a pen and a piece of paper over any type of automation.
Who in their right mind would bring this kind of intrusive technology into their home, right into their most intimate space? I seriously don't get it. I mean what's wrong with keeping a pen & paper shopping list on your fridge?
I’m not denying that personal conversations happen in the kitchen. But the most intimate space? That’s pure hyperbole.
Not to mention I think the paranoia about these home assistant devices is overblown. Do you use a smartphone? In your intimate kitchen?? It has just as much capability to listen to you.
You’re making an argument of false equivalency. Anyone who has used Siri can tell you that its listening capabilities are vastly overstated. There is a reason they make smart speakers and don’t just leverage your phone, despite the fact that almost everyone with a smart speaker has both.
We have a saying that loosely translates to “If the barn burns down, so does the house”. The meaning is that it’s not really wise to let the barn burn and assume the house will be fine.
Anything inside my house is the intimate space. I block the “smart” devices at the router firewall level so that they cannot call home - and here in this thread dozens of people use those spying devices that listen to you and your family members and send that information to who knows where. This is just mind-boggling to me. To each their own for sure, but using “smart home” devices made by corporations is absolute insanity in my head.
Yikes. About the discoverability issue. I can add a little story.
I cannot for the life of me get Siri to resume playback in my native tongue. It is supported by Siri; I just have no clue on earth what phrases she accepts. Tried many, started feeling dumb. Turned off Siri.
I think it's hilarious that you can't reliably ask Assistant "what is this song?" any more but they've provided a dedicated button for what should be an easy query to process.
In other words, they have just closed down that entire ecosystem. I don't think there's an antitrust angle here: Forcing e.g. Apple to allow third-party apps or app stores seems within reach (at least in the EU); this would be more like forcing Apple to enable apps in a world where they don't even exist (anymore).
"If there is no banner, your Actions project is not a Conversational Action and won’t be affected."
Here, "conversational" means your voice command should prompt the app to ask you a question, driving a state machine. (Set a reminder. When do you want to be reminded?)
I haven't tried to use it, but the docs indicate CRUD-over-voice ("built-in intents") is unaffected.
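For concreteness, the state-machine shape described above can be sketched in a few lines. The dialog wording is illustrative, not Google's actual API - the point is just that the app holds state between turns until the missing slot is filled:

```python
# Minimal "Conversational Action" shape: the first utterance doesn't
# carry enough information, so the app asks a follow-up question and
# remembers what it's still waiting for.
class ReminderDialog:
    def __init__(self):
        self.pending = None  # slot we still need from the user

    def handle(self, utterance: str) -> str:
        if self.pending == "time":
            self.pending = None
            return f"OK, reminder set for {utterance}."
        if utterance.startswith("set a reminder"):
            self.pending = "time"
            return "When do you want to be reminded?"
        return "Sorry, I can only set reminders."

dialog = ReminderDialog()
print(dialog.handle("set a reminder"))   # asks: When do you want to be reminded?
print(dialog.handle("tomorrow at 9"))    # confirms the reminder
```

A one-shot "built-in intent" skips the `pending` state entirely, which is presumably why it survives the deprecation.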
I had the same issue with the Google Assistant/AnyList integration, and I hacked together a pair of Lambda functions to move Keep list items to an AnyList list.
Both unofficial APIs ;)
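The interesting part of a hack like that is just the idempotent sync loop; here is a sketch of that loop only. `fetch_keep_items` and `add_anylist_item` stand in for whatever unofficial Keep and AnyList clients you wire up (e.g. community packages) and are not real library calls here:

```python
def sync(fetch_keep_items, add_anylist_item, already_synced):
    """Copy any new Keep list items over to AnyList, skipping ones
    we've moved before so reruns (e.g. on a Lambda schedule) are safe."""
    moved = []
    for item in fetch_keep_items():
        if item not in already_synced:
            add_anylist_item(item)
            already_synced.add(item)
            moved.append(item)
    return moved
```

In a real Lambda pair you'd persist `already_synced` somewhere durable (S3, DynamoDB, or just the target list itself) so the scheduled runs don't duplicate items.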
A big portion of the limited discovery of available commands is deliberate so as not to limit what you might ask it to do, which can be used as input for future capabilities.
> "Hey Google, add <x> to the shopping list"... was very useful... Then one day Google decided to disable that integration. Now the only shopping list you can add to is one Google provides
If I recall correctly, "add to my list" originally went to shoppinglist.google.com. They rolled it into Keep[1] ("note to self," tags, sharing).
It works well. However...
In 2016, I figured "tell me showtimes for $MOVIE in $CITY" should work. It still doesn't work. It may never work, because "robot butler with ads" is no one's idea of the future. (Maybe Jeff.)
A fire-and-forget voice action like "add to list" should still work(?). For conversations[2], we can pray Gemini is less infuriating.
Did you actually try your cinema prompt recently? I just asked my pixel phone "tell me showtimes for aquaman in Berlin" and it immediately opened an overview of cinemas with showtimes for the Aquaman movie today with options to buy the ticket.
Same for the weather prompt someone else posted above and contrary to their experience it showed me exactly what I expected.
I'm a bit confused if those examples are something that didn't work in the past or if it's something that somehow doesn't work properly in the US right now.
Vim is well designed and stable, so you can learn the interface and be confident that it will work for you the next time you use it. It's an amazing tool.
Today's voice assistants are the opposite. They are unreliable and completely unstable. They don't have a clear list of commands they understand, let alone some sort of menu system - the documented commands on the manufacturers' websites often don't work. They also randomly change what they can do: stuff stops working for no obvious reason.
For example - yesterday, my son's Nest Audio suddenly refused to set a music alarm, claiming "this device doesn't support that feature yet" (it's literally a speaker for music...). It worked the day before.
Vim has an actual manual and things stay relatively stable across releases. It's not the case for the voice assistants, where stuff breaks randomly and features are added/removed without notification.