Self Driving Desktop (github.com/hofstadter-io)
103 points by verdverm on May 8, 2019 | 38 comments


One of the most overlooked tools (it has been around for more than 10 years): http://sikulix.com/ It lets you play back mouse/keyboard event scripts, BUT it can also find components (coordinates) via screen OCR, so you can make your scripts resolution- and desktop-independent. It's also Java based, so you can run it across operating systems.
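For anyone who hasn't tried it: a SikuliX script is basically Jython. A minimal sketch, assuming a button screenshot captured with the IDE and saved as save_button.png (that file name is made up here), looks like this:

    # SikuliX (Jython) sketch: find a button by its screenshot and click it,
    # wherever it happens to sit on screen.
    wait(Pattern("save_button.png").similar(0.8), 10)   # wait up to 10 s for the button to appear
    click(Pattern("save_button.png").similar(0.8))      # click it at whatever coordinates it was found
    type("report.txt" + Key.ENTER)                       # type into whichever dialog pops up

The similar(0.8) threshold is what tolerates small rendering differences across resolutions and themes.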


This is an excellent tool! But you forgot to mention that the user codes it in Python, it comes with a purpose-built IDE, and it recognises both text and images, the latter with approximate matching.


There are a number of other commercial and free desktop automation tools that exist, some of which I've used to automate GUI testing in the past.

https://en.m.wikipedia.org/wiki/Comparison_of_GUI_testing_to...

My favorite on the Windows side was vTask Studio, but it looks like the domain is down and the link was removed from that wiki page.


You can still grab vTask Studio via the Wayback Machine. Thanks for that, I shall be trying it out:

https://web.archive.org/web/20170927151003/http://www.vtasks...


Yeah! I've used Sikuli to automate a legacy UI-driven application with an embedded scripting engine -- I wanted to rig CI to run an automated test suite for scripts that executed inside the application, but a lot of pointing and clicking was needed to get the app into a state where it was willing to execute scripts. Sikuli was handy! The embedded image recognition is cool and pretty easy to use -- detecting the buttons certainly wasn't the most fragile part of that Rube Goldberg test automation setup.


I don't think Sikuli is unknown at all - I have used it for a long time. But there has not been much progress over the last few years, and the OCR features in particular are lacking. A good alternative to Sikuli is the newer Kantu, which is also much easier to install (just a browser extension plus a small native EXE).

https://a9t9.com/kantu/x/desktop-automation


Sikuli is amazing! I've used it (to great success) for data processing automation and MMO grinding.


In 2008, we were at CeBIT showing off the then-brand-new KDE 4 desktop. (The booth was sponsored by a Linux-focused media company.) The biggest attention magnet was a script that we had hacked together the evening before, which clicked through the application menu and demoed various desktop features in a loop. For a booth, it's absolutely vital to have something that moves, not just static posters and people standing around waiting.


What is it? The page just says "Desktop Automation framework" and then lists a bunch of commands and switches.

Perhaps 2-3 paragraphs describing what it does?


At a glance, macros. Or maybe the "System Events" portion of AppleScript, but for Linux. Something like that. Indeed, the page would benefit from an explanation and maybe a rationale.


Seems like a small wrapper around PyAutoGUI, which I've used before and is great: https://pyautogui.readthedocs.io/
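For a sense of what that wrapper presumably boils down to: the calls below are real PyAutoGUI API, though the mapping to the repo's own playlist commands (like "mv x y s") is a guess on my part.

    import pyautogui

    # Rough PyAutoGUI equivalents of playlist-style commands.
    pyautogui.moveTo(400, 300, duration=1.5)            # "mv x y s": glide the cursor to (400, 300) over 1.5 s
    pyautogui.click()                                   # left click at the current position
    pyautogui.typewrite("hello world", interval=0.05)   # type text with a small per-key delay
    pyautogui.hotkey("ctrl", "s")                       # press a key chord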


Or an alternative to Automagica: https://github.com/OakwoodAI/Automagica



What's different about this compared to a shell script that invokes xdotool, save for being much more verbose?
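For comparison, the xdotool route, here driven from Python instead of a shell script (X11 only; the xdotool subcommands are standard, the little wrapper is just illustrative):

    import subprocess

    def xdo(*args):
        # Thin wrapper that shells out to the xdotool CLI once per call.
        subprocess.run(["xdotool", *args], check=True)

    xdo("mousemove", "--sync", "400", "300")      # jump the cursor to (400, 300)
    xdo("click", "1")                             # left click
    xdo("type", "--delay", "50", "hello world")   # type with 50 ms between keystrokes
    xdo("key", "ctrl+s")                          # press a key chord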


I wish this had a ‘Record’ feature. That kind of logging could be incredibly useful. I use tools like Katalon on the web and they are great for making a first pass at test development. It doesn’t need to be entirely visual, but if it can capture the flow visually, it can be refactored in code and become much more accessible and usable.


I use OBS for recording and Flowblade for editing. Got sick of editing my mistakes out, so then this repo came to be. Planning to add some playlists to start that up, set file names, begin/end recording.

self-driving-desktop will be part of a demo automation framework that is in progress.


I did have a recording function around at one point, to track mouse movement. The issue is that the mouse movement gets verbose, and you would have to clean that up somehow.
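One way to keep such a recording manageable (a sketch with pynput, not what this repo does): only log a move event once the cursor has travelled some minimum distance, and always keep clicks.

    import math
    from pynput import mouse

    MIN_DIST = 25      # only log a move after the cursor travels this many pixels
    last = None
    events = []

    def on_move(x, y):
        global last
        if last is None or math.hypot(x - last[0], y - last[1]) >= MIN_DIST:
            events.append(("move", x, y))
            last = (x, y)

    def on_click(x, y, button, pressed):
        if pressed:                              # clicks are always worth keeping
            events.append(("click", x, y, button.name))

    # Blocks until the listener is stopped (e.g. with Ctrl-C); events then holds
    # a far shorter trace than raw per-pixel movement would produce.
    with mouse.Listener(on_move=on_move, on_click=on_click) as listener:
        listener.join()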


Sounds like a candidate for machine learning - and an excuse to learn it.


I was going to say "sounds like a candidate for a Kalman filter".


Nah, just Xlib events being printed to the tty.


If you could access window dimensions, and if a mouse click yielded an action, you could probably back out that click's coordinates to the button it hit and toss the rest of the cursor data.
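Tossing the cursor data is the easy part; reusing the hypothetical (kind, x, y, ...) event tuples from the recorder sketch above:

    def clicks_only(events):
        # Collapse a raw event trace down to just the clicks; the cursor
        # wandering in between carries no information about what was done.
        return [e for e in events if e[0] == "click"]

    trace = [("move", 10, 10), ("move", 120, 80), ("click", 400, 300, "left"),
             ("move", 410, 290), ("click", 700, 20, "left")]
    print(clicks_only(trace))   # [('click', 400, 300, 'left'), ('click', 700, 20, 'left')]

Backing a click out to the specific button it hit would still need the window geometry, as the parent comment says.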


There is also xnee (Xnee is Not an Event Emulator).

https://xnee.wordpress.com/

Worked well last I tried it.


> mv x y s;: move the mouse to x,y in s seconds

The problem with tools like this is that they create an API that the developers don't know about and have no intention of supporting. I broke one recently just by having the app maximize on startup, but everything from adding UI elements to rearranging them to timing differences can introduce breakage.

Considering it's scripting anyway, an actual API would be easier.


It would have been cool to have screenshots on the front page. That gives a much better sense of what the thing on GitHub actually is, because I couldn't figure it out (without spending more time) from the GitHub page alone.


So it's basically AutoHotKey?


I think I have been looking for a framework this simple and straightforward for about...12 years now? Ever since I got my own personal computer as a college student, pretty much.

I can't wait to completely go off the wrong quadrant of this chart with it.

https://xkcd.com/1205/


re: xkcd, sometimes, it's not just about the time in minutes you save in aggregate. I often find routines especially helpful during flow states -- maximizing time for more creative work.

There's also just something satisfying about using something like Alfred to launch a complex sequence of things that would have taken many mouse clicks and hand movement. Or using keyboard shortcuts to resize and move multiple windows around monitors. It feels almost... powerful? Not sure why.


It mostly does not matter. The main goal of automating something is rarely to save time nowadays (the low-hanging fruit is much rarer). It is to document procedures, prevent defects, or to test before running.


I love this xkcd, but it's hard to see the compounding or exponential savings that arise.


Is it normal for devs to be able to read and understand GitHub repos without any explanations, introductions or context beyond the title? I remember much more of this in GitHub's early days, and I always wondered whether it fazes the talented devs reading them.


I think it would be fair to say that you shouldn't expect anyone to understand a bare repo at a glance, but if you're well versed in the technologies the repo uses and you know of similar products, then I think you can guess at it.

Here's how my thought process went on this one:

# I open the repo on GitHub and look at the README

1. Okay it's doing something automatic

2. It uses python

3. Okay, there's this playlist thing which has a bunch of commands in it. Looks kind of like an AutoHotkey script.

# I look at the file list

4. Okay, I know lark. Looks like the author wrote a domain-specific-language parser for their input files. They probably get those commands out as a nested list from the parser (see the toy grammar sketch after this list).

# I look in test.txt

5. Okay that doesn't tell me much new

# I look in main.py

6. Oh there aren't any comments in here...

7. Alright the main function parses the commands from the input file and runs "do" on them.

8. Okay, this is just like AutoHotkey.
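For reference, a toy version of the lark setup being described; the repo's real grammar and command set will differ, this just shows the pattern of parsing playlist-style lines into nested lists:

    from lark import Lark

    # Toy grammar for lines such as "mv 400 300 2"; not the repo's actual grammar.
    grammar = r"""
        start: command+
        command: WORD NUMBER* NEWLINE
        %import common.WORD
        %import common.NUMBER
        %import common.NEWLINE
        %import common.WS_INLINE
        %ignore WS_INLINE
    """

    parser = Lark(grammar)
    tree = parser.parse("mv 400 300 2\nclick 1\n")
    for cmd in tree.children:
        name, *args = [t.value for t in cmd.children if t.type != "NEWLINE"]
        print(name, args)   # e.g. mv ['400', '300', '2']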


For Mac there is also talonvoice.com, which offers a lot of similar functionality along with ways to hook into keyboard shortcuts, voice/dictation control and noise control.


Ruby would have been well suited for the DSL this project is trying to implement.


I really was hoping for a desktop computer on wheels :(


Autohotkey: Xdotool edition


Kind of like KiXtart, IIRC.


It's "Grammar"


The title is a bit misleading, which leads to disappointment. I was expecting something like a self-driving car: you just give the desktop an objective and it figures out how to get there, and then gets you there.



