
Now that I see this, here is my wish (I know there are security and privacy concerns, but let's pretend they're not there for this wish): an app that runs on my desktop and has access to my screen(s) while I work. At any time I can ask it something about what's on the screen, it can jump in and let me know if it thinks I made a mistake (think pair programming) or offer a suggestion (drafting a document). It can also quickly take over if I ask it to (copilot on demand).

Except for the last point and the desktop version, I think it's already shown in the math demo video.

I guess it will also pretty soon refuse to let me come back inside the spaceship, but until then it'll be a nice ride.



Here you go: UFO - A UI-Focused Agent for Windows OS Interaction

"UFO is a UI-Focused dual-agent framework to fulfill user requests on Windows OS by seamlessly navigating and operating within individual or spanning multiple applications."

https://github.com/microsoft/UFO?tab=readme-ov-file


Agreed. I’m excited about reaching a point where the experience is of being in a deep work ‘flow’ with an ultra intelligent colleague, instead of jumping out of context to instant message them.


> with an ultra intelligent colleague

Ultra knowledgeable but pretty stupid actually.


A very eager very well read intern.


So far. :)


If you understand the first thing about LLMs you'll know it'll never be less stupid, just better at hiding its stupidity.


This makes me think, we're seeing all these products inject AI and try to be "smart" on their own, but maybe the experience we really need is a smart OS that can easily orchestrate dumb products.

I know that Siri/Google Assistant/Cortana(?) can already integrate with third-party apps, so maybe something like this but much smarter. E.g. instead of "send the following email" you would tell the assistant "just write the email yourself". At that point your email app doesn't need integrated AI anymore, just hooks for the assistant.

I imagine once Google puts that kind of brains on Android and Chrome, many product devs will no longer need to use AI directly. Two birds one stone situation, since these devs won't need OpenAI.
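A minimal sketch of that idea, with entirely hypothetical names: instead of each app embedding its own AI, apps register plain "hooks" (capabilities) with an OS-level assistant; the assistant does the generation itself and hands finished content to the dumb app. The `Assistant` class, `register_hook`, and the canned draft text below are all assumptions for illustration, not any real OS API.

```python
# Hypothetical sketch: apps expose hooks, the OS assistant does the "smart" part.
from typing import Callable, Dict


class Assistant:
    def __init__(self) -> None:
        self.hooks: Dict[str, Callable[..., str]] = {}

    def register_hook(self, name: str, fn: Callable[..., str]) -> None:
        """An app advertises a capability, e.g. 'send_email'. No AI inside the app."""
        self.hooks[name] = fn

    def handle(self, request: str) -> str:
        # Stand-in for the model: the assistant drafts the email text itself...
        body = f"Drafted by assistant for request: {request!r}"
        # ...then hands the result to the dumb app via its hook.
        return self.hooks["send_email"](to="alice@example.com", body=body)


# The email app only provides a hook; it never talks to a model.
def send_email(to: str, body: str) -> str:
    return f"email to {to}: {body}"


assistant = Assistant()
assistant.register_hook("send_email", send_email)
print(assistant.handle("just write the email yourself"))
```

The point of the design is the inversion: the app's surface stays a dumb, testable function, and the orchestration lives in one place at the OS level.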


1984's telescreens aren't something you're supposed to wish for


One of my favorite tweets:

    Sci-Fi Author: In my book I invented the Torment Nexus as a cautionary tale

    Tech Company: At long last, we have created the Torment Nexus from classic sci-fi novel Don't Create The Torment Nexus
https://twitter.com/AlexBlechman/status/1457842724128833538?...


You mean I can't wish for a brave new world?

Joking aside, I agree. It's too bad, though, that we know anything (this or anything else, technological or not) that could be used for good and for improving ourselves will almost always be diverted toward something bad...


This basically already exists and the companies that sell this are constantly improving it. For better or worse.


“You haven’t done anything productive for 15 minutes. Are you taking an unauthorized break?”


Something similar already exists, see https://www.rewind.ai/ and https://www.perfectmemory.ai/


They showed just that in the demo with the voice call example. A screen share can be a live video feed.


HAL. HAL? HAL!



