
Makes sense, thanks. I wonder whether human web-browsing strategies are optimal for an LLM, e.g. given how much faster LLMs are than humans at reading the webpages they find. Regardless, it does seem likely that Google’s dataset is good for something.


Take this example:

A human googles "how much does a tire cost?"

They pick a website from the search results, then navigate within it to the correct product page and maybe scroll until the price is visible on screen.

Google captures a lot of that data on third-party sites. From Perplexity:

Google Analytics: If the website uses Google Analytics, Google can collect data about user behavior on that site, including page views, time on site, and user flow.

Google Ads: Websites using Google Ads may allow Google to track user interactions for ad targeting and conversion tracking.

Other Google Services: Sites implementing services like Google Tag Manager or using embedded YouTube videos may provide additional tracking opportunities.

So you can imagine that Google has a kajillion training examples that go: search query (which implies task) -> pick webpage -> actions within webpage -> user stops (success), or user backs off site/tries different query (failure)
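To make the shape of such an example concrete, here is a rough Python sketch. Everything in it is hypothetical: the `Session` fields, the event names, and the labeling rule are my own illustration of the chain above, not anything Google is known to store or do.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Session:
    """One browsing session as described above (hypothetical schema)."""
    query: str                      # search query, which implies the task
    chosen_url: str                 # the result the user picked
    actions: List[str] = field(default_factory=list)  # clicks, scrolls, etc.
    # Assumed event names: "stopped", "back_to_results", "new_query"
    final_event: str = "stopped"


def is_success(session: Session) -> bool:
    """Assumed rule: stopping means success; backing off or re-querying means failure."""
    return session.final_event == "stopped"


def to_training_example(session: Session) -> Dict[str, object]:
    """Flatten a session into a task / trajectory / label record."""
    trajectory = [f"open {session.chosen_url}"] + list(session.actions)
    return {
        "task": session.query,
        "trajectory": trajectory,
        "success": is_success(session),
    }


good = Session(
    query="how much does a tire cost?",
    chosen_url="https://example.com",
    actions=["click product page", "scroll to price"],
)
bad = Session(
    query="how much does a tire cost?",
    chosen_url="https://example.org",
    final_event="new_query",
)

examples = [to_training_example(s) for s in (good, bad)]
```

The label is only a proxy: a user who stops may have given up rather than succeeded, so a real pipeline would presumably need a less naive signal.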

Even if an AI agent is super efficient, it still needs to learn how to formulate queries, pick a site to visit, navigate through it, and do all the same things a human does to perform tasks. Google's dataset is perfect for this: huge and unparalleled.



