
Suppose I go off and make a search engine, with my own algorithm for ranking search results. At first it works great. And then five years later, I find that, by a quirk in my algorithm, a new site has risen to the top of all my results. This new site was created as a prank, and it contains authoritative-sounding but incorrect answers to common questions. The results returned are still relevant to my users, but they are now factually incorrect.

Should I simply accept that this site is the correct top result for my users' queries, since it has been ranked by my unbiased algorithm? Or should I decide to put my finger on the scale, and redesign my algorithm (possibly by simply reweighting results from this particular site) to change the rankings?

Does the answer change if the site is not a prank site, but is propaganda? Or an earnest-but-incorrect flat earther?
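To make the dilemma concrete, "putting a finger on the scale" can be as small as a per-domain weight table layered over the otherwise neutral score. This is purely a hypothetical sketch - every name in it (Result, DOMAIN_WEIGHTS, base_score, the example domains) is invented for illustration, not anything DDG or Google actually does:

```python
from dataclasses import dataclass
from urllib.parse import urlparse

@dataclass
class Result:
    url: str
    base_score: float  # output of the "unbiased" ranking algorithm

# Manually curated overrides: 1.0 = untouched, < 1.0 = downranked.
DOMAIN_WEIGHTS = {"prank-answers.example": 0.1}

def rank(results):
    """Sort results by base relevance, scaled by any per-domain override."""
    def final_score(r):
        domain = urlparse(r.url).netloc
        return r.base_score * DOMAIN_WEIGHTS.get(domain, 1.0)
    return sorted(results, key=final_score, reverse=True)
```

The point of the sketch is how cheap the intervention is: one table entry flips the ordering, which is exactly why it feels like an editorial act rather than an algorithmic one.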



I think the role of a search engine is to index the web, and to help users find what they're looking for. It's a difficult task that requires somehow quantifying the relevance of answers for a given text string.

Whatever quantification is used is an implicit kind of 'censorship', and the 'relevance' of results for a given string is inherently a value judgement. That's where the value of a search engine comes from. We choose search engines based on how good their value judgements are.

So, I wish DDG released more details here, because this announcement otherwise just sounds like standard practice. Abusing "quirks in the algorithm" is just called SEO, and it's been a cat-and-mouse battle between search engines and abusers. Without details, it's harder to interpret this move generously.


That sounds about right. A search engine has to try and intuit what users are going to want. Decisions have to be made to show one thing above another. In order for us to trust the engine though, we should have a way of knowing what values are referred to when making these decisions.


I'd say a good search engine should try to deliver the results the user is looking for. Simple as that. It probably can never be perfect, as the search engine can only guess what the user is looking for. But it can try to do its best.

There is no rule that it should rank sites by page rank or number of keywords or whatever.

If I don't want to see "Russian Disinformation", I probably won't search for it to begin with. On the other hand, maybe I want to see what all the fuss is about - then why should the search engine stand in my way?


The problem is that the definition of disinformation isn't really about accuracy - it's mostly "what is (woke) California prepared to accept". For example, most of the coronavirus "disinfo" bans have been handed out roughly on the basis of "is this useful for the goal of making people adhere to covid policy?" If you say vaccines won't prevent transmission before Cali is okay with it, byebye, even though the statement has always been accurate. It just hasn't been part of blue California's story until lately. Talking about the lab leak? It's okay now, but it has gotten people banned for the last couple of years.

So what will "Russian disinfo" be like? It's hard to even know what the facts are since truth is the first casualty of war.


I think you're framing the discussion in the wrong way.

Yes, if you're looking for "Russian Disinformation", you should be able to find it.

But if I'm looking for "U.S. midterm elections", I shouldn't get Russian disinformation.

What is the problem with modifying the algorithm to favor actual information over disinformation? Or sites that are known to propagate disinformation?

The only thing I could say is that it's weirdly specific to Russian disinformation. Disinformation should be disfavored no matter the country of origin.


Why would they show Russian information about the war when you search for Midterm elections?

The issue is that they are introducing a manual intervention in the algorithm out of political motives.

They are not automatically a great authority on correctness of information regarding the war. I don't think DDG has investigators on the ground in Ukraine.

Of course ultimately all search engines will apply some criteria for correctness or relevance of information. Maybe we should applaud DDG for at least being transparent about it.

So in the end presumably you have to pick the search engine just like you pick your other news sources.

It is just a pity that DDG previously advertised itself as "unbiased", and now they throw that goal out of the window.

Still it feels like the search engine feeding me results that they want me to find, not the ones that I most likely would want to find.


The problem is not "U.S. midterm elections" but "Ukraine invasion 2022".

Part of the problem is that defining "relevance" is hard, and quickly descends into gray areas and morality. Russian disinformation about Ukraine is relevant in that it actually is content about the Ukraine invasion in 2022. But it's not relevant in that it's not facts about the Ukraine invasion in 2022.

Maybe there should be a button that lets users opt-out of "factual only" results to see the full uncensored internet, much like how there is a button that lets users opt-out of porn filters.

For that matter, we should have more filters, but users should be able to see what they are and opt out of them manually. Porn, violence, suspected "disinformation" and conspiracy theories, etc. I would love to have all of these turned off by default, but be able to re-enable them selectively.

Of course, that means you still have the problem where one organization (or a handful of organizations) gets to dictate what is considered "disinformation", but that's arguably a separate problem from not letting users control their experience.


Ultimately isn’t “disinformation” control exactly the problem? There are abundant examples of disinformation just being information that was 24-96 hours ahead of the news cycle (sometimes 6-9 months).

DDG has now decided that they are editors because their perspective is “correct”, but on what authority?


I shouldn’t get American disinformation, but your trusted sources are untouchable.


All general critiques about DDG aside, it is absolutely untrue that search engines never target American news sources for downranking. Google/DDG and other sites have removed and downranked American news sites on multiple occasions. Scroll through this very comment section and you'll find someone complaining about DDG's handling of Drudgereport[0].

It was not that long ago that we were having really fierce debates about when and how aggregators and algorithms should filter/downrank vaccine misinformation, and a ton of that debate revolved around downranking American news sources and commentators.

----

[0]: I do want to note that it isn't immediately clear to me that DDG did downrank Drudgereport, and sometimes people just get kind of conspiratorial about things, but I'm taking commenter at their word since I assume they have some source for that they just didn't mention.


They won’t down-rank American disinformation from sources they trust or sources they favor.

This can all be solved by adding an option for results without down-ranking political disinformation.


> They won’t down-rank American disinformation from sources they trust or sources they favor.

Nobody suppresses a source that they trust. Of course search engines don't downrank sources that they think are trustworthy. Why would they? There's also nothing specific to America about that, DDG also doesn't suppress foreign news sources that it personally trusts as accurate, because... they're sources they trust.

That sentence is a frankly kind of impressive effort to rephrase a critique that is essentially, "I disagree with their decisions, and I think the sources they trust aren't actually trustworthy" as some kind of much broader general criticism, like search engines are bad for not making editorial/ranking decisions that are the opposite of what they think are the correct editorial/ranking decisions to make.


> Why would they?

They should down-rank their trusted sources when their trusted sources publish disinformation. The problem is they don’t.


> They should down-rank their trusted sources when their trusted sources publish disinformation.

If search engines thought that their trusted sources were publishing disinformation, they wouldn't be trusted sources.

You just disagree with their decisions about what is trustworthy, that's all. There's nothing deeper going on, it's not surprising that a search engine trusts a source that it trusts.


> If search engines thought that their trusted sources were publishing disinformation, they wouldn't be trusted sources.

You’re confusing propaganda sources with fallible trusted sources.


No, I'm commenting on the fact that search engines do block propaganda sources in the US when they think they're a significant source of harmful propaganda, but very obviously they don't block propaganda sources that they don't think are propaganda.

Your problem is that these engines disagree with you about what is and isn't a harmful propaganda source. That's a reasonable disagreement to have. But you're trying to phrase this like it's some kind of deliberate action or general policy on their part, and it just doesn't make any sense. They do block American sites when they think those sites are significant sources of misinformation. And for extremely obvious reasons that should not be confusing or surprising to anyone, they don't block sites for violating policies that they don't think the sites have violated, because that would be an absurd system for moderating content.

It's like asking, "why won't the police officers arrest all of the guilty people that they think are innocent?" Because they think they're innocent.


> they don't block propaganda sources that they don't think are propaganda

That’s the point of contention.

> But you're trying to phrase this like it's some kind of deliberate action or general policy on their part, and it just doesn't make any sense.

It doesn’t make sense because I’m saying that.

Users upset with DDG understand that everyone is blinded by bias, so search engines shouldn't attempt to filter topics that can be affected by it.

The solution is give those users options for unfiltered results like they do with safe search.


> The solution is give those users options for unfiltered results

Important to remember at this point in the conversation, DuckDuckGo didn't say that they were going to filter these results (although I also wouldn't really have a problem with that), they announced that they were going to downrank some of the sites.

Safe search toggles turn off actual content removal, which kind of makes sense -- there's a list of "mature" sites that are included in the list of possible search results or excluded. But ranking is different, turning off a site ranking doesn't make any sense in the context of a search engine. You want a toggle to make results no longer be a list?

Everything on DDG is ranked, everything is. There aren't separate categories of ranked and unranked content, there's no set of websites where DuckDuckGo isn't ranking them alongside other websites. It doesn't make any sense to say that DuckDuckGo shouldn't rank political content or news sites when returning them in searches, I don't know from a UI perspective what that would even look like.

I guess completely randomly sorting the search results for those queries? But... I mean, no one would want that feature, you would never be able to find relevant information for a political query.

Even before DuckDuckGo made this announcement and even before the war in Ukraine started, DuckDuckGo was always ranking these sites. There was never a period of time where these sites weren't being ranked higher or lower on search pages than other sites, and that ranking was always being determined in part by DuckDuckGo's internal bias about how rankings should work and what was and wasn't a "relevant" or accurate news source. From day one, from the start of the search engine, they were always ranking political content.


"If I don't want to see "Russian Disinformation", I probably won't search for it to begin with."

? Nobody is really searching for 'disinformation'.

That's the whole point.

Is 'disinformation' really 'what they were looking for'?

Or were they looking for good information, and the 'misinformation' - which doesn't appear as 'misinformation' - comes up first. They click it and become unwittingly 'misinformed'.

It's obviously very nuanced, but there is definitely such a thing as misinformation and especially propaganda.

For example, I see a ton of feeds sharing RUS losses with video snippets etc. but not a lot of UKR losses. Unless it's civilians in which case they want the information out there i.e. 'war crimes'.

Or more nuanced: the words of Putin himself. He 'misinforms' arguably by misrepresenting literally every thing he talks about, and obfuscating other realities. There are 100% pro-Nazi sympathisers in UKR forces. But it's also pretty clear that the government is not a 'Nazi Regime' by any stretch.

Issues such as 'NATO Threat' which is in some ways real, but used as an excuse really, because there is no material threat of the invasion of Russia.

So it's complicated.

There's also the fact that little bits of misinformation can contribute a lot to public opinion. It can just be 'populist' stuff, like a funny picture of Biden next to a 'strong and commanding Putin'. Caricatures influence people. That's a bit of a different domain, but also relevant.

When wars break out, we have to be a bit more pragmatic and also vigilant.


In the war situation, I assume everything is potentially misinformation, and want to see statements from all sides.

I mean if you read Western media, they will literally write "Russia is putting out a lot of disinformation", so that would be at least one claim I could try to verify myself, by looking at actual Russian media.

But the more important point is, how does DDG decide what is misinformation and what is not? Sure a search engine always has to try to rank information by some criteria. The issue I have here is with a manual interference in the algorithm that seems politically motivated.

Realistically, I guess we should be thankful that at least DDG points out that they are doing that, whereas others are simply doing it without telling anybody (on all sorts of issues, not just this war).

Also, I think some "internet savvyness" has to be expected from search engine users. The assumption that just because some statements by Putin show up in search results, people would just flat out believe them, is rather insulting and belittling.


1) There is a difference between 'rah rah nationalism' and 'disinformation'. The Western media is not putting out disinformation so much as focusing on the things that benefit them. This is perennial.

2) It's a myth that there is some kind of 'neutrality' in terms of search - you have to pick an algorithm. It's rational to want to choose sources that have integrity as opposed to those that do not. 'Information Populism', i.e. just picking the 'most popular link', leads to 'National Enquirer' type information.

3) "how does DDG decide what is misinformation and what is not" and "politically motivated"

This moral relativism is the problem with the 'free speech' advocates. There is such a thing as the 'truth', and some parties are better at communicating it. Some parties actually just make up whatever they want and say it. There are ways to make that determination. Everyone is biased, some more than others.

4) " I think some "internet savvyness" has to be expected from search engine users. "

"The assumption that just because some statements by Putin show up in search results, people would just flat out believe them, is rather insulting and belittling."

Both of these statements are essentially wrong.

Almost nobody has the 'internet savviness', time and wherewithal to actually fact-check. Less than 0.1% of people. Most people are not paying close attention to any issues, let alone a specific one, and don't have the wherewithal to do anything about the Tweets they see. Moreover, 'most people' are populist: they like to tweet jokes, get angry, and like things that make fun of the group they don't like.

As for b) 30% of Americans believe that the 'election was stolen' by Joe Biden, when there isn't a shred of evidence to support that lie. And that's a pretty big lie. So imagine that - at least 30% of people will believe you if they just want to believe you. 30% of people believe that Police are evil and just want to arbitrarily arrest and beat black people, and that merely the act of getting pulled over is dangerous, which is also ridiculous. Racism exists, but anti-racism hysteria has given people a context that simply is not true. The evidence actually supports that.

This is why 'Putin's words alone' can be dangerous. His current 'propaganda' about 'Denazification' - that the UKR government is a bunch of 'Nazis and Drug Addicts', a government full of Nazis - is perverse.

But 75% of Russians believe it fully.

How do you think he managed to convince 75% of the Russian population of things that are ridiculously untrue?

Russians are getting calls from relatives in Ukraine, and literally not believing them when they hear of bombings in Kiev; they'd rather believe the propaganda than first-hand evidence.

That's the power of misinformation and it speaks to the fact that people aren't often even interested in the truth at all, but rather that which makes them feel better.

Finally - if you want access to Putin's words, it's all there. There's nothing hidden. If that's what you are really searching for, you will get it. You can easily find on DDG and Google his 'Mein Kampf' style works where he's justifying the grand narratives of the invasion. In that context, anyone searching specifically for that is going to find it in the 'quality' context we'd expect from DDG or Google.

Nothing is being hidden or censored, it's all there if you actively search for it.

What we don't want is Putin's propaganda seeping through the cracks into everyday populist rhetoric. We don't want his army of social influencers able to make their lies and propaganda part of the daily vocabulary of common information ingestion.


"The Western media is not putting out disinformation so much as focusing on the things that benefit them."

What do you mean - Western media doesn't lie, only shifts focus? Sorry, that is just flat out wrong. The last couple of years should have taught you that.

"This moral relativism is the problem with the 'free speech' advocates. There is something as the 'truth' and, some parties are better at communicating it."

Um no sorry, your type of thinking is exactly the problem. There is no neutral way to establish truth, so nobody should be given the authority to say what is true or not.

"Both of these statements are essentially wrong."

People believe stupid shit - the solution is not to establish some "authority" forcing people to believe just some stupid shit.

The Trump fans may seem stupid for believing Biden stole the election, but his opponents were equally stupid for believing in the Russian collusion theory. Yet that was relentlessly pushed by mainstream media.

"This is why 'Putins words alone' can be dangerous."

Yeah because Russians have no other news sources. That is exactly why it is bad to limit access to news sources, as DDG is doing indirectly.

"it speaks to the fact that people aren't often even interested in the truth at all, but rather that which makes them feel better."

So DDG is just trying to make us feel better?

"Nothing is being hidden or censored, it's all there if you actively search for it."

That is such a hypocritical claim. Given all your arguments, you seem to be fully aware that that is not how information flow works.


> contains authoritative-sounding but incorrect answers to common questions. The results returned are still relevant to my users, but they are now factually incorrect.

IMO the difference is between web-wide decisions vs something else. DDG have stated they're specifically targeting something beyond web pages.

If DDG has some way of measuring "misinformation" then fair play, we can assume it isn't specific to Russia because it'd be a universal solution.

Some sort of decree from the CEO about what is true and what is not just sounds dangerous. We might not fully understand the automation, as in all search engines, but it's a safe wager that he doesn't understand it any better; otherwise he would've coded it in already and it'd be a non-story for the searchers of the world.


So if I could only detect this one prank site, but did not have a way to detect and downrank all prank sites at once, you would say it's invalid for me to just take action against the one I know about, since that's not a universal solution?


define prank.

dictionary definition (according to Google)

>a practical joke or mischievous act.

Maybe if that's the definition then all Twitter parody accounts need to be removed. Would you agree?


Well that depends. If my position were that you have to act via "web-wide decisions" and "universal solutions", then I might conclude that my only options were to leave up all prank sites, including those damaging the usefulness of my search results, or remove everything including harmless parody accounts.

On the other hand, if I were to accept that it's valid to target individual sites or subsets of sites only as they become a problem, then I might conclude that removing this one troublesome prank site that is pointing my users to bad information does not require me to take action against random twitter parody accounts.


> I might conclude that my only options

I guess the point is that you were able to conclude that in the first place. You concluded there are options. An algorithm/decision removing one side of it means you do not, more so if you do not even get to understand why that choice was removed when you're making that choice.


What would you direct searches for "strawman" to?


>Should I simply accept that this site is the correct top result for my users' queries,

Yes, your users aren't using your search engine as an oracle for the truth, they're using it to find pages with similar text. It's their job (and specifically not yours) to determine how well the text they find corresponds with reality. Perhaps they're even aware the site is factually incorrect and enjoy reading it for the novelty, perhaps they want to cite it to warn others that sometimes things online are wrong. You have no idea what the people are doing with the information.

Google has started censoring their results because they decided people should use their search engine as a truth oracle (with the results you would expect; let's not forget the "when did George Washington go to the moon" thing from a year or so ago). I think that's a mistake, and it is a large part of the reason I use DuckDuckGo.

You may have caught this one instance but there are likely an unlimited number of others. You're not going to catch everything and you make your search engine worse with every exception you make to the algorithm.


>they're using it to find pages with similar text.

This is one vision of what a search engine's job is, but I'm not sure it's what most users are looking for. Even barebones pagerank goes far beyond just finding similar text - it uses structure of the web's link graph to estimate the quality of a page. Now obviously pagerank is not directly trying to ascertain which pages are truthful and which are not, but it is arguably using connectedness as a proxy - the assumption is that people tend to link to reliable, useful pages and tend not to link to incorrect, harmful ones.
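That connectedness signal can be sketched as a minimal power-iteration PageRank over a toy link graph. This is illustrative only - real engines blend this with many other signals, and the function name and structure here are my own simplification:

```python
def pagerank(links, damping=0.85, iters=50):
    """links: {page: [pages it links to]}. Returns an estimated rank per page."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        # Every page keeps a (1 - damping) baseline share...
        new = {p: (1 - damping) / n for p in pages}
        for p, outs in links.items():
            if outs:
                # ...and passes the damped remainder of its rank to its outlinks.
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank
```

Notice that nothing in the iteration inspects page content, let alone truthfulness - the "quality" estimate is entirely a proxy built from who links to whom, which is the value judgement embedded even in "vanilla" ranking.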

Does vanilla pagerank also violate your boundaries for what a search engine should do?


> Yes, your users aren't using your search engine as an oracle for the truth, they're using it to find pages with similar text.

I’m confident this is factually incorrect for a very very large subset of searches. Probably a majority.


Could you source what you're basing your facts on? I know many people like myself are just looking to expand on the text we're searching for, and letting the free market decide what's a good search result.


The best supporting data is the rise in no-click searches. See https://sparktoro.com/blog/in-2020-two-thirds-of-google-sear...

No-click searches continue to grow in share while google continues to adopt design that presents information outside a traditional link, like a weather widget or the fact you were looking for presented at the top in large text outside the scope of a link.

This suggests a growing share of searches weren't looking for links in the first place; they were looking for information, and when the search engine provided that information, the user left without clicking a link.


It's never going to work for that though. This is like changing a hammer design because some people are hurting themselves using it as a comb.


It does work for that. For a large number of searches, on relatively uncontroversial topics, search engines have effectively become oracles of truth.

This is how google ended up with their modern design that discourages website clicks, they did research into their users and realized most people searching “Abraham Lincoln’s birthday” want to see the top result “February 12, 1809” and not the top result “https://www.loc.gov/item/today-in-history/february-12”

Obviously I’m aware that not all data is uncontroversial and agreed upon, but it’s a fact that a large percentage of searches are quite literally treating search engines as oracles of truth


>It does work for that.

It does not. I used the moon thing from last year as an example. Here[1] is a very noncontroversial query that still yields factually incorrect results in the box at the top. Yes if you read the results carefully and think about it you'll realize it's wrong but then you're doing the same thing you would be doing otherwise.

[1] https://www.google.com/search?sxsrf=APq-WBv0RNgr6Q3zUOjDc1lM...


And this example https://i.imgur.com/UkcR945.png that's still broken as it was half a year ago. ( https://news.ycombinator.com/item?id=27622613 )


> It does not.

In the "Abraham Lincoln’s birthday" example that the person gave, it absolutely does work that way in regards to the user's motivations.

When a user googles "Abraham Lincoln’s birthday", they are almost certainly attempting to use a search engine as an oracle of truth.

> that still yields factually incorrect results

Search engines aren't perfect. That does not change the fact that the oracle of truth model is a good model to describe a user's motivations.

Such as when they google "Abraham Lincoln’s birthday".


helen___keller wrote:

>This is how google ended up with their modern design that discourages website clicks, they did research into their users and realized most people searching “Abraham Lincoln’s birthday” want to see the top result “February 12, 1809” and not the top result “https://www.loc.gov/item/today-in-history/february-12”

But this is a false dichotomy. There are more options than either an infobox that says "February 12, 1809" and a "February 12 in history" page. I think you can agree that a link to the Wikipedia article on Abraham Lincoln will be a fine search result. At most the search engine needs to be smart enough to match "birthday" in the search query against variations of "born on" that are present in the article.


> But this is a false dichotomy. There are more options than either an infobox

I am not saying that there is only a singular way of displaying information.

Instead the point here is that Yes, users who search that are actually trying to use Google as an Oracle of Truth here, and if the top results are anything other than what his actual birthday is, then that is a problem.

So in other words, yes it is being used as an oracle of truth, even if there are different ways of precisely displaying that.

> I think you can agree that a link to the Wikipedia article on Abraham Lincoln will be a fine search result

Only if the linked article actually contains the truth, and not some other number. So yes, the search engine is being used as an oracle of truth, to find out what his actual birthday is.


"The Covid vaccine doesn't prevent deaths 100% of the time. So it does not work."

Something doesn't need to be perfect to work well.


I’m rarely an expert on the broad keywords I search for. That’s why I’m searching: to learn.

I would use a search engine that makes some attempt to rank higher quality, more accurate information higher than others.


> Yes, your users aren't using your search engine as an oracle for the truth, they're using it to find pages with similar text.

The reason google removes factually incorrect results is because this is actually exactly what the vast majority of people use search engines for.


Your view is completely opposed to all of the recent "Google search is totally broken due to garbage SEO sites" HN articles.



