Hacker News

Every website and application is just layers of data. Playwright and similar tools can take snapshots that capture the text, forms, buttons, and other elements that can be interacted with on a site. All the calls a website makes are just API requests. Even a native application is made up of UI components (WinForms controls, for example) that can be inspected.
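To make the "layers of data" point concrete, here is a rough sketch of what a structured snapshot might look like. It does not use Playwright; it uses Python's standard-library `html.parser`, and the `snapshot` function, the `SnapshotParser` class, and the list of "interactive" tags are all assumptions made up for illustration, not any tool's actual rules.

```python
from html.parser import HTMLParser

# Tags treated as "interactive" in this sketch (an assumption, not Playwright's rules).
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}
VOID_TAGS = {"input"}  # elements that never have a closing tag


class SnapshotParser(HTMLParser):
    """Collects interactive elements and the text nested inside them."""

    def __init__(self):
        super().__init__()
        self.elements = []  # every interactive element seen, in document order
        self._open = []     # stack of elements still collecting text

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTIVE_TAGS:
            return
        entry = {"tag": tag, "attrs": dict(attrs), "text": ""}
        self.elements.append(entry)
        if tag not in VOID_TAGS:
            self._open.append(entry)

    def handle_endtag(self, tag):
        # Close the innermost open interactive element if the tag matches.
        if self._open and self._open[-1]["tag"] == tag:
            self._open.pop()

    def handle_data(self, data):
        # Text is attributed to the innermost open interactive element.
        if self._open:
            self._open[-1]["text"] += data


def snapshot(html):
    """Return a list of dicts describing the interactive elements in html."""
    parser = SnapshotParser()
    parser.feed(html)
    parser.close()
    return [
        {"tag": e["tag"], "attrs": e["attrs"], "text": " ".join(e["text"].split())}
        for e in parser.elements
    ]
```

Calling `snapshot()` on a page's HTML gives back plain dicts (tag, attributes, text) that a program can act on without rendering anything. A real tool would also need to handle scripts, visibility, and dynamically generated content, which this sketch ignores.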


Ah, so now you're turning LLMs into web browsers capable of parsing JavaScript to figure out what a human might be looking at. Let's see how many levels deep we can go.


Just inspect the memory content of the process. It's all just numbers at the end of the day & algorithms do not have any understanding of what the numbers mean other than generating other numbers in response to the input numbers. For the record I agree w/ OP: screenshots are not a good interface, for the same reasons that trains, subways, & dedicated lanes for mass transit are obviously superior to cars & their attendant headaches.


Maybe some day, sure. We may eventually live in a utopia where everyone has quick, efficient, accessible mass transit available that allows them to move between any two points on the globe with unfettered grace.

That'd be neat.

But for now: The web exists, and is universal. We have programs that can render websites to an image in memory (solved for ~30 years), and other programs that can parse images of fully-rendered websites (solved for at least a few years), along with bots that can click on links (solved much more recently).

Maybe tomorrow will be different.


The point was that process memory is the source of truth; everything else is derived from it & only throws away information that a neural network could use to make better decisions. Presentation of data is irrelevant to a neural network; it's all just numbers & arithmetic at the end of the day.



