Hacker News

I love this idea, and admire the work you put into it. I'm a fan of long reads and historical non-fiction, and Google's results are truly garbage.

I have a criticism that I think may pertain to the ranking methodology. I searched for "discovery of Australia". Among the top results were:

* A site claiming that the biblical flood was caused by Earth colliding with a comet (with several other pages from that site also making the top search results with other wild claims, e.g. that the Egyptians discovered Arizona);

* Another site claiming the first inhabitants of Australia were a lost tribe of Israel;

* A third site claiming that Australia was discovered and founded by members of a secret society of Rosicrucians who had infiltrated the Dutch East India Company and planned to build an Australian utopia...

These were all pages heavy with HTML4 tags and virtually devoid of JavaScript, the kinds of pages you'd frequently see in the late 1990s from people who had built their own static websites in a text editor or exported HTML from MS Word. At that time, there were millions of those sites, with people paying for their own unique domain names, so the proportion of them that were home to wild-eyed conspiracy theories was relatively small.

What I think has happened is that the kooks kept these sites up, to the point where it's almost a visual trope now to see a red <h1> in Times New Roman and think, uh oh, I've stumbled onto an "ancient aliens" site. Meanwhile, scholars and journals offering higher-quality information have moved to more modern platforms that rely more heavily on modern browsers, with or without their own domain names. As a result, what seemed to surface here were the fragments of the old web that remain live, possibly because people living in cabins in Montana forget to cancel their web hosting, or because the nature of old-school conspiracy theorists is to just keep packing their old sites with walls of text wrapped in <p> tags.

Arguably, this seems to rank pages the way Google's engine used to, back when it couldn't run JS and wanted to punish sites that used code to change markup at render time. At least, when I used to do onsite SEO work, it was always about simple tag hierarchies.

I wonder whether there isn't some better metric of validity and information quality than the markup a page uses. Some of the sites that surfaced further down could be considered interesting and valuable resources. I think not punishing simple wall-of-text content is a good thing. But punishing more complicated layouts may have the perverse effect of downranking higher-quality sources of information, i.e. people and organizations who can afford to build a decent website, or who care to migrate to a modern blogging platform.
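To make the concern concrete, here's a minimal sketch (Python, standard library only) of the kind of markup-simplicity heuristic I'm imagining the ranker uses; the function name, the tag list, and the scoring formula are all my own guesses for illustration, not anything the engine is known to do:

```python
from html.parser import HTMLParser

class MarkupCounter(HTMLParser):
    """Hypothetical heuristic: count <script> tags versus plain
    text-bearing tags to estimate how 'old-web' a page looks."""
    def __init__(self):
        super().__init__()
        self.script_tags = 0
        self.text_tags = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_tags += 1
        elif tag in ("p", "h1", "h2", "h3", "li", "blockquote"):
            self.text_tags += 1

def simplicity_score(html):
    """Return the fraction of counted tags that are simple text tags.
    1.0 = pure 1990s-style static markup, 0.0 = script-heavy page."""
    counter = MarkupCounter()
    counter.feed(html)
    total = counter.script_tags + counter.text_tags
    if total == 0:
        return 0.0
    return counter.text_tags / total

old_web = "<h1>Discovery</h1><p>wall of text</p><p>more text</p>"
modern = "<script src='app.js'></script><div id='root'></div>"

print(simplicity_score(old_web))  # 1.0
print(simplicity_score(modern))   # 0.0
```

The catch, of course, is that both an earnest hand-built history essay and a conspiracy screed score an identical 1.0 here; the metric measures how a page was built, not whether its claims are true.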



Those three pages sound pretty interesting; I don't see this as a problem.


I don’t want my search engine to somehow try to judge the believability of the results. I’d like to be the judge of that myself.



