
Impressive stuff! Has anyone tried it for computer/browser control? How does it fare with graphs and charts?


The 'point' skill is trained on a ton of UI data; we've heard of a lot of people using it in combination with a bigger driver model for UI automation. We are also planning on post-training it to work end-to-end for this in an agentic setting before the final release -- this was one of the main reasons we increased the model's context length.

Re: chart understanding, there are a lot of different types of charts out there, but it does fairly well! We posted ChartQA benchmarks in the blog: it's on par with GPT5* and slightly better than Gemini 2.5 Flash.

* To be fair to GPT5, it's going to work well on many more types of charts/graphs than Moondream. To be fair to Moondream, GPT5 isn't really well suited for deployment in a lot of vision AI applications due to cost/latency.


I'm labeling a dataset with it. We'll see how it turns out.


Pretty good so far. I have 100,000 detections.



