
Oh, interesting. I never would have thought AI would be used for this. Does it also check things like the "revised" meta tag? From some Googling, it seems it should officially be "revision", but "revised" appears to be very common.
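For anyone curious what checking that tag could look like, here is a minimal sketch. It assumes the page HTML is already available as a string and accepts either spelling of the name. `findRevisedMeta` is a hypothetical helper for illustration, not something taken from metascraper or aboutideasnow, and a regex scan like this is only a rough stand-in for a real HTML parser.

```typescript
// Hypothetical helper: returns the raw content of a
// <meta name="revised"> or <meta name="revision"> tag, or null if absent.
function findRevisedMeta(html: string): string | null {
  // Collect every <meta ...> tag; a real HTML parser would be more robust.
  const metaTags: string[] = html.match(/<meta\b[^>]*>/gi) ?? [];
  for (const tag of metaTags) {
    const nameMatch = /\sname\s*=\s*["']?([^"'\s>]+)/i.exec(tag);
    const name = nameMatch?.[1]?.toLowerCase();
    if (name !== "revised" && name !== "revision") continue;
    // The content attribute may use double or single quotes.
    const contentMatch = /\scontent\s*=\s*(?:"([^"]*)"|'([^']*)')/i.exec(tag);
    const value = contentMatch?.[1] ?? contentMatch?.[2];
    if (value) return value.trim();
  }
  return null;
}
```

The returned string is whatever the site author wrote, so it still has to be parsed and sanity-checked before being treated as a date.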


I tried using https://www.npmjs.com/package/metascraper before which I believe does check this meta tag.

But a few websites set their updated date to the current date, which was annoying. Maybe to rank better in Google? And some people (including me) only mention the update time in the page text content.

I've used GPT to parse human-formatted dates in another project too; it's quite reliable if you validate the output timestamp. It's also relatively cheap if you only pass in the first part of the page text.
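A rough sketch of the "validate the output timestamp" part, leaving the model call itself out. It assumes the model is prompted to answer with a plain `YYYY-MM-DD` string. The names `firstPart` and `validateModelDate`, the 2000-character cutoff, and the 1995 lower bound are all assumptions made for illustration, not details from the project.

```typescript
// Assumption: dates before this are treated as implausible for a web page.
const EARLIEST_PLAUSIBLE_MS = Date.UTC(1995, 0, 1);

// Keep only the start of the page text so the prompt stays small and cheap.
function firstPart(text: string, maxChars: number = 2000): string {
  return text.slice(0, maxChars);
}

// Returns a Date if the model output is a real calendar date that is not
// implausibly old and not in the future; otherwise returns null.
function validateModelDate(raw: string, now: Date = new Date()): Date | null {
  const m = /^(\d{4})-(\d{2})-(\d{2})$/.exec(raw.trim());
  if (!m) return null;
  const year = Number(m[1]);
  const month = Number(m[2]);
  const day = Number(m[3]);
  const ms = Date.UTC(year, month - 1, day);
  const d = new Date(ms);
  // Reject values that roll over, such as 2024-02-31 or month 13.
  if (
    d.getUTCFullYear() !== year ||
    d.getUTCMonth() !== month - 1 ||
    d.getUTCDate() !== day
  ) {
    return null;
  }
  if (ms < EARLIEST_PLAUSIBLE_MS || ms > now.getTime()) return null;
  return d;
}
```

Anything that fails validation is probably better stored as "no date" than as a guess.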


I can see how it's a tricky problem. I wish HTML had more structure here (and that people followed the structure, which is a whole other problem...). FWIW, my now page has a "last updated" date, but it shows up as 1969 in aboutideasnow.

Oh, now aboutideasnow shows no date at all.
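A 1969 date is often a sign that a missing or zero timestamp got formatted anyway: Unix epoch 0 is midnight UTC on January 1, 1970, which displays as December 31, 1969 in time zones behind UTC. Whether that is what happened here is only a guess. A minimal guard, with `formatUpdatedDate` as a hypothetical name, might look like this:

```typescript
// Hypothetical guard: treat missing, non-finite, or non-positive timestamps
// as "no date" instead of formatting them as a day near the Unix epoch.
function formatUpdatedDate(timestampMs: number | null | undefined): string | null {
  if (timestampMs == null || !Number.isFinite(timestampMs) || timestampMs <= 0) {
    return null;
  }
  // Format as YYYY-MM-DD in UTC so the result does not depend on the viewer's time zone.
  return new Date(timestampMs).toISOString().slice(0, 10);
}
```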


The vast majority of indexed websites don't have a date unfortunately :(

What's your website so I can take a look at the parsing?


https://mattgreer.dev/now

It says last updated today because I really did update it today :)

Anyway, cool project!


Ahh, I believe we're not trusting the current date at the moment because of spam potential: https://github.com/lindylearn/aboutideasnow/blob/main/apps/a...

Which seems bad if you update your site and submit it afterwards. I'll remove this check for now.
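To make the trade-off concrete, here is a sketch of what such a check could look like with a switch to turn it off. This is not the code behind the link above; `sameUtcDay`, `acceptUpdatedDate`, and the `trustToday` flag are hypothetical names used only for illustration.

```typescript
// True when both dates fall on the same calendar day in UTC.
function sameUtcDay(a: Date, b: Date): boolean {
  return (
    a.getUTCFullYear() === b.getUTCFullYear() &&
    a.getUTCMonth() === b.getUTCMonth() &&
    a.getUTCDate() === b.getUTCDate()
  );
}

// Hypothetical filter: with trustToday = false, a parsed date that equals the
// crawl day is discarded as possible spam; with trustToday = true it is kept.
function acceptUpdatedDate(parsed: Date, crawledAt: Date, trustToday: boolean): Date | null {
  if (!trustToday && sameUtcDay(parsed, crawledAt)) return null;
  return parsed;
}
```

With `trustToday` set to false, a page that was genuinely updated on the day it was submitted loses its date, which is the case described above.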



