
Since this is a research paper with promising ideas but non-functional code, what are people using as the best-in-class agents for computer automation? For example:

1. Claude for computer use

2. Various startup offerings (if you have recommendations, please list them)

3. Established tools like Playwright, Selenium, and WebDriver, combined with screenshots and LLM-based guidance

What tools or approaches are actually working for building useful automation solutions?



Are you sure about the non-working code point?

I haven't tried it yet, but my understanding is that the repo here has working code along with installation instructions:

https://github.com/microsoft/OmniParser


I can confirm it works: I got the Gradio demo running locally and the results are pretty reasonable.

There are slight rough edges (to be expected) and you do need to read the README carefully, but that's all par for the course. I had to install einops, which wasn't in the requirements.txt, and even though I had already downloaded the HF models they released, the demo still pulled in another model the first time I ran it.
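
For anyone hitting the same kind of gap, here is a minimal sketch of checking for missing modules before launching the demo. It uses only the Python standard library. The helper name `missing_modules` and the list passed to it are my own assumptions for illustration, not anything from the OmniParser repo; einops is the only package I actually found missing.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names in `names` that cannot be found for import."""
    missing = []
    for name in names:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical list: only einops is known to be absent from requirements.txt.
    needed = ["einops"]
    gaps = missing_modules(needed)
    if gaps:
        print("Install before running the demo: pip install " + " ".join(gaps))
    else:
        print("All listed modules are importable.")
```

This only tells you whether a module can be located, not whether its version is compatible, so treat it as a quick sanity check rather than a replacement for the README.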


Thanks for the tip, will try again.


Our agent is available via npm: http://testdriver.ai



