Self Driving Desktop (github.com/hofstadter-io)
103 points by verdverm on May 8, 2019 | 38 comments


One of the most overlooked tools (it has been around for more than 10 years): http://sikulix.com/ It lets you play back mouse/keyboard event scripts, BUT it can also find components (coordinates) via screen OCR, so you can make your scripts resolution- and desktop-independent. It's also Java based, so you can run it across operating systems.
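For anyone who hasn't tried it: a SikuliX script is basically Jython. A minimal sketch, assuming a button screenshot captured with the IDE and saved as save_button.png (that file name is made up here), looks like this:

    # SikuliX (Jython) sketch: find a button by its screenshot and click it,
    # wherever it happens to sit on screen.
    wait(Pattern("save_button.png").similar(0.8), 10)   # wait up to 10 s for the button to appear
    click(Pattern("save_button.png").similar(0.8))      # click it at whatever coordinates it was found
    type("report.txt" + Key.ENTER)                       # type into whichever dialog pops up

The similar(0.8) threshold is what tolerates small rendering differences across resolutions and themes.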


This is an excellent tool! But you forgot to mention that the user codes it in Python, it comes with a purpose-built IDE, and it recognises both text and images, the latter with approximate matching.


There are a number of other commercial and free desktop automation tools that exist, some of which I've used to automate GUI testing in the past.

https://en.m.wikipedia.org/wiki/Comparison_of_GUI_testing_to...

My favorite on the Windows side was vTask Studio, but it looks like the domain is down and the link was removed from that wiki page.


You can still grab vTask Studio via the Wayback Machine. Thanks for that, I shall be trying it out:

https://web.archive.org/web/20170927151003/http://www.vtasks...


Yeah! I've used Sikuli to automate a legacy UI-driven application with an embedded scripting engine -- I wanted to rig CI to run an automated test suite for scripts that executed inside the application, but a lot of pointing and clicking was needed to get the app into a state where it was willing to execute scripts. Sikuli was handy! The embedded image recognition is cool and pretty easy to use -- detecting the buttons certainly wasn't the most fragile part of that Rube Goldberg test automation setup.


I don't think Sikuli is unknown at all - I have used it for a long time. But there has not been much progress over the last few years, and the OCR features in particular are lacking. A good alternative to Sikuli is the newer Kantu, which is also much easier to install (just a browser extension plus a small native EXE).

https://a9t9.com/kantu/x/desktop-automation


Sikuli is amazing! I've used it (to great success) for data processing automation and MMO grinding.


In 2008, we were at CeBIT showing off the then-brand-new KDE 4 desktop. (The booth was sponsored by a Linux-focused media company.) The biggest attention magnet was a script that we had hacked together the evening before, which clicked through the application menu and demoed various desktop features in a loop. For a booth, it's absolutely vital to have something that moves, not just static posters and people standing around waiting.


What is it? The page just says "Desktop Automation framework" and then lists a bunch of commands and switches.

Perhaps 2-3 paragraphs describing what it does?


At a glance, macros. Or maybe the "System Events" portion of AppleScript, but for Linux. Something like that. Indeed, the page would benefit from an explanation and maybe a rationale.


Seems like a small wrapper around PyAutoGUI, which I've used before and is great: https://pyautogui.readthedocs.io/
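For a sense of what that wrapper presumably boils down to: the calls below are real PyAutoGUI API, though the mapping to the repo's own playlist commands (like "mv x y s") is a guess on my part.

    import pyautogui

    # Rough PyAutoGUI equivalents of playlist-style commands.
    pyautogui.moveTo(400, 300, duration=1.5)            # "mv x y s": glide the cursor to (400, 300) over 1.5 s
    pyautogui.click()                                   # left click at the current position
    pyautogui.typewrite("hello world", interval=0.05)   # type text with a small per-key delay
    pyautogui.hotkey("ctrl", "s")                       # press a key chord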


Or an alternative to Automagica: https://github.com/OakwoodAI/Automagica



What's different about this compared to a shell script that invokes xdotool, save for being much more verbose?
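For comparison, the xdotool route, here driven from Python instead of a shell script (X11 only; the xdotool subcommands are standard, the little wrapper is just illustrative):

    import subprocess

    def xdo(*args):
        # Thin wrapper that shells out to the xdotool CLI once per call.
        subprocess.run(["xdotool", *args], check=True)

    xdo("mousemove", "--sync", "400", "300")      # jump the cursor to (400, 300)
    xdo("click", "1")                             # left click
    xdo("type", "--delay", "50", "hello world")   # type with 50 ms between keystrokes
    xdo("key", "ctrl+s")                          # press a key chord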


I wish this had a ‘Record’ feature. That kind of logging could be incredibly useful. I use tools like Katalon on the web and they are great for making a first pass at test development. It doesn’t need to be entirely visual, but if it can capture the flow visually, it can be refactored in code and become much more accessible and usable.


I use OBS for recording and Flowblade for editing. Got sick of editing my mistakes out, so then this repo came to be. Planning to add some playlists to start that up, set file names, begin/end recording.

self-driving-desktop will be part of a demo automation framework that is in progress.


I did have a recording function around at one point, to track mouse movement. The issue is that the mouse movement gets verbose, and you would have to clean that up somehow.
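One way to keep such a recording manageable (a sketch with pynput, not what this repo does): only log a move event once the cursor has travelled some minimum distance, and always keep clicks.

    import math
    from pynput import mouse

    MIN_DIST = 25      # only log a move after the cursor travels this many pixels
    last = None
    events = []

    def on_move(x, y):
        global last
        if last is None or math.hypot(x - last[0], y - last[1]) >= MIN_DIST:
            events.append(("move", x, y))
            last = (x, y)

    def on_click(x, y, button, pressed):
        if pressed:                              # clicks are always worth keeping
            events.append(("click", x, y, button.name))

    # Blocks until the listener is stopped (e.g. with Ctrl-C); events then holds
    # a far shorter trace than raw per-pixel movement would produce.
    with mouse.Listener(on_move=on_move, on_click=on_click) as listener:
        listener.join()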


Sounds like a candidate for machine learning - and an excuse to learn it.


I was going to say "sounds like a candidate for a Kalman filter".


Nah, just Xlib events being printed to the tty.


If you could access window dimensions, and if a mouse click yielded an action, you could probably back out that click's coordinates to the button it hit and toss the rest of the cursor data.
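Tossing the cursor data is the easy part; reusing the hypothetical (kind, x, y, ...) event tuples from the recorder sketch above:

    def clicks_only(events):
        # Collapse a raw event trace down to just the clicks; the cursor
        # wandering in between carries no information about what was done.
        return [e for e in events if e[0] == "click"]

    trace = [("move", 10, 10), ("move", 120, 80), ("click", 400, 300, "left"),
             ("move", 410, 290), ("click", 700, 20, "left")]
    print(clicks_only(trace))   # [('click', 400, 300, 'left'), ('click', 700, 20, 'left')]

Backing a click out to the specific button it hit would still need the window geometry, as the parent comment says.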


There is also xnee (Xnee is Not an Event Emulator).

https://xnee.wordpress.com/

Worked well last I tried it.


> mv x y s;: move the mouse to x,y in s seconds

The problem with tools like this is that they create an API that the developers don't know about and have no intention of supporting. I broke one recently just by having the app maximize on startup, but everything from adding UI elements to rearranging them to timing differences can introduce breakage.

Considering it's scripting anyway, an actual API would be easier.


It would have been cool to have screenshots on the front page. That gives a much better sense of what the thing on GitHub actually is, because I couldn't figure it out (without spending more time) from the GitHub page alone.


So it's basically AutoHotKey?


I think I have been looking for a framework this simple and straightforward for about...12 years now? Ever since I got my own personal computer as a college student, pretty much.

I can't wait to completely go off the wrong quadrant of this chart with it.

https://xkcd.com/1205/


re: xkcd, sometimes, it's not just about the time in minutes you save in aggregate. I often find routines especially helpful during flow states -- maximizing time for more creative work.

There's also just something satisfying about using something like Alfred to launch a complex sequence of things that would have taken many mouse clicks and hand movement. Or using keyboard shortcuts to resize and move multiple windows around monitors. It feels almost... powerful? Not sure why.


It mostly does not matter. The main goal of automating something is rarely to save time nowadays (the low-hanging fruit is much rarer). It is to document procedures, prevent defects, or to test before running.


I love this xkcd, but it's hard to see the compounding or exponential savings that arise.


Is it normal for devs to be able to read and understand GitHub repos without any explanations, introductions or context beyond the title? I remember much more of this in GitHub's early days, and I always wondered whether it fazes the talented devs reading them.


I think it would be fair to say that you shouldn't expect anyone to understand a bare repo at a glance, but if you're well versed in the technologies the repo uses and you know of similar products, then I think you can guess at it.

Here's how my thought process went on this one:

# I open the repo on GitHub and look at the README

1. Okay it's doing something automatic

2. It uses python

3. Okay, there's this playlist thing which has a bunch of commands in it. Looks kind of like an AutoHotkey script.

# I look at the file list

4. Okay, I know lark. Looks like the author wrote a domain-specific-language parser for their input files. They probably get those commands out as a nested list from the parser (see the toy grammar sketch after this list).

# I look in test.txt

5. Okay that doesn't tell me much new

# I look in main.py

6. Oh there aren't any comments in here...

7. Alright the main function parses the commands from the input file and runs "do" on them.

8. Okay, this is just like AutoHotkey.
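For reference, a toy version of the lark setup being described; the repo's real grammar and command set will differ, this just shows the pattern of parsing playlist-style lines into nested lists:

    from lark import Lark

    # Toy grammar for lines such as "mv 400 300 2"; not the repo's actual grammar.
    grammar = r"""
        start: command+
        command: WORD NUMBER* NEWLINE
        %import common.WORD
        %import common.NUMBER
        %import common.NEWLINE
        %import common.WS_INLINE
        %ignore WS_INLINE
    """

    parser = Lark(grammar)
    tree = parser.parse("mv 400 300 2\nclick 1\n")
    for cmd in tree.children:
        name, *args = [t.value for t in cmd.children if t.type != "NEWLINE"]
        print(name, args)   # e.g. mv ['400', '300', '2']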


For Mac there is also talonvoice.com, which offers a lot of similar functionality along with ways to hook into keyboard shortcuts, voice/dictation control and noise control.


Ruby would have been well suited for the DSL this project is trying to implement.


I really was hoping for a desktop computer on wheels :(


Autohotkey: Xdotool edition


Kind of like KiXtart, IIRC.


It's "Grammar"


The title is a bit misleading, which leads to disappointment. I was expecting something like a self-driving car: you just give the desktop an objective and it figures out how to get there, and then gets you there.



