
It's relatively easy. Start by crawling a page every hour. If it changed since the last crawl, halve the interval; if it hasn't, double it. Set some limits, like no more often than once a minute and no less often than once a month. You can also adjust the multiplier: instead of a factor of 2, use something like 1.2. That way the interval tracks the page's actual update rate more precisely.
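A minimal sketch of that adjustment rule, in case it helps. The names (`next_interval`, `MIN_INTERVAL`, `MAX_INTERVAL`, `START_INTERVAL`) are made up for illustration, and treating "a month" as 30 days is my assumption; how you detect that a page "changed" (hash of the body, diff, etc.) is left out.

```python
# Bounds from the comment: once a minute to once a month (30 days assumed).
MIN_INTERVAL = 60
MAX_INTERVAL = 30 * 24 * 3600
START_INTERVAL = 3600  # first guess: once an hour


def next_interval(current, changed, factor=2.0):
    """Return the next crawl interval in seconds.

    Divide by `factor` if the page changed since the last crawl,
    multiply by `factor` if it did not, then clamp to the bounds.
    """
    proposed = current / factor if changed else current * factor
    return max(MIN_INTERVAL, min(MAX_INTERVAL, proposed))


if __name__ == "__main__":
    interval = START_INTERVAL
    # Hypothetical crawl history: unchanged, unchanged, changed.
    for changed in (False, False, True):
        interval = next_interval(interval, changed)
        print(f"changed={changed} -> recrawl in {interval:.0f}s")
```

With `factor=1.2` the interval moves in smaller steps, so it takes more crawls to settle but overshoots less once it is close.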

Also, HTTP headers (e.g. Last-Modified, Cache-Control) and sitemap.xml (lastmod, changefreq) can hint at how often pages change.
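As a rough illustration of the sitemap part, here is a sketch that reads `<changefreq>` values from a sitemap using only the standard library. The function name `sitemap_hints`, the sample URLs, and the mapping from `changefreq` values to seconds are my own assumptions; sites often set these values carelessly, so treat them as a starting guess rather than the truth.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org 0.9 schema.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Assumed mapping from changefreq values to a crawl interval in seconds.
CHANGEFREQ_SECONDS = {
    "always": 60,
    "hourly": 3600,
    "daily": 24 * 3600,
    "weekly": 7 * 24 * 3600,
    "monthly": 30 * 24 * 3600,
    "yearly": 365 * 24 * 3600,
}


def sitemap_hints(xml_text):
    """Map each <loc> to a suggested interval, skipping unknown values."""
    root = ET.fromstring(xml_text)
    hints = {}
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        freq = url.findtext("sm:changefreq", default="", namespaces=SITEMAP_NS)
        freq = freq.strip().lower()
        if loc and freq in CHANGEFREQ_SECONDS:
            hints[loc] = CHANGEFREQ_SECONDS[freq]
    return hints


# Made-up sample sitemap for demonstration.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc><changefreq>daily</changefreq></url>
  <url><loc>https://example.com/about</loc><changefreq>yearly</changefreq></url>
  <url><loc>https://example.com/misc</loc></url>
</urlset>"""

if __name__ == "__main__":
    for loc, seconds in sitemap_hints(SAMPLE).items():
        print(loc, seconds)
```

A hint like this could replace the one-hour starting guess for a page, with observed changes taking over from there.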


