I pay for an email service that gives me a lot of aliases, and most of them have not been pwned yet. So with his tool I would be flagged as a bot. Honestly, that doesn't sound like a great idea.
There must be large swaths of people that have either been careful or have specific emails that they use for certain purposes that haven't been pwned.
The question is: what should happen if I haven't been pwned? Should I not be able to purchase the thing, or would I face some annoying captcha?
I like Troy Hunt, but this idea penalizes people with good habits, and that is just something I can't support.
Sort of. He does encourage this use-case in the final paragraph.
> Applying "Pwned or Bot" to your own risk assessment is dead simple with the HIBP API and hopefully, this approach will help more people do precisely what HIBP is there for in the first place: to help "do good things after bad things happen".
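For reference, a minimal sketch of the lookup the quote describes, assuming the HIBP v3 `breachedaccount` endpoint, which needs a paid `hibp-api-key` plus a `user-agent` header and answers 404 when an address appears in no breach. The function names and the user-agent string are hypothetical; rate limiting and retries are left out.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

API_BASE = "https://haveibeenpwned.com/api/v3/breachedaccount/"


def build_request(email: str, api_key: str) -> urllib.request.Request:
    """Build a HIBP v3 breachedaccount request (hypothetical helper)."""
    url = API_BASE + urllib.parse.quote(email)  # "@" is percent-encoded as %40
    return urllib.request.Request(
        url,
        headers={"hibp-api-key": api_key, "user-agent": "pwned-or-bot-sketch"},
    )


def breach_names(email: str, api_key: str) -> list[str]:
    """Return breach names for the address; an empty list means 'not found'."""
    try:
        with urllib.request.urlopen(build_request(email, api_key), timeout=10) as resp:
            # The default (truncated) response is a JSON array of {"Name": ...} objects.
            return [breach["Name"] for breach in json.load(resp)]
    except urllib.error.HTTPError as err:
        if err.code == 404:  # address is in no known breach
            return []
        raise  # 401 (bad key), 429 (rate limited), etc. are real errors
```

An empty result is only an absence of evidence; how much weight to give it is the question the rest of the thread argues about.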
This is a common investigative technique that predates HIBP; however, more people are starting to automate it now (using non-HIBP datasets). I think this, combined with the new request-based pricing on the HIBP API, implies he just wants to make some money off being the quick-to-implement 75% solution.
No, it doesn't penalize them (at least not his idea; implementations might), it simply fast-tracks pwned emails and doesn't apply the normal bot checks that would otherwise apply to everyone.
That's not how he's suggesting it would work. All checks would normally be applied to build a "how human are you" or "humanness" score. He's suggesting a pwned email test and arguing it would be a good signal for "humanness". The implementation might not make it an explicit penalty (-1 to your "humanness" score), but not being pwned might not help your case (+1 if you are pwned, but +0 if you're not).
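To make the "+1 if pwned, +0 if not" distinction concrete, here is a toy additive score, assuming three hypothetical boolean signals a site might already collect. The signals and weights are invented for illustration and are not from the article.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    # Hypothetical inputs; a real site would have many more.
    email_pwned: bool
    residential_ip: bool
    passed_captcha: bool


def humanness_score(s: Signals) -> int:
    """Toy additive 'humanness' score: each positive signal adds one point."""
    score = 0
    if s.email_pwned:
        score += 1  # seen in a breach: mild evidence of a long-lived, real inbox
    # Not pwned adds +0 rather than -1: no explicit penalty, but no help either.
    if s.residential_ip:
        score += 1
    if s.passed_captcha:
        score += 1
    return score
```

An unpwned user with otherwise identical signals scores one point lower, which is exactly the implicit disadvantage described above even without an explicit penalty.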
Yeah it would definitely be good to integrate it into a Bayesian approach where it is mixed with other factors to generate a % chance of being human vs. bot.
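A minimal sketch of that Bayesian mixing, assuming the signals are conditionally independent (a naive-Bayes simplification) and that each one comes with a likelihood ratio P(signal | human) / P(signal | bot). The function name is hypothetical and any ratios you feed it would have to be estimated from real traffic.

```python
import math


def posterior_human(prior_human: float, likelihood_ratios: list[float]) -> float:
    """Naive-Bayes update: add log likelihood ratios to the prior log-odds."""
    if not 0.0 < prior_human < 1.0:
        raise ValueError("prior_human must be strictly between 0 and 1")
    log_odds = math.log(prior_human / (1.0 - prior_human))
    for lr in likelihood_ratios:
        # Assumes signals are independent given the class (human vs. bot).
        log_odds += math.log(lr)
    return 1.0 / (1.0 + math.exp(-log_odds))
```

For example, starting from even odds, a single signal with a likelihood ratio of 3 gives a 75% chance of human, and a second signal with a ratio of 1/3 cancels it back to 50%. A "not pwned" result would be one such ratio, ideally close to 1.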
It depends on the risk. I have an account that was pwned (with the same password), but there is no risk to me, as there isn't anything useful in that account (not even a DoB, address, or name). Worst case, someone changes the password and locks me out. Then I'll create another account, as it's not a big deal.
The point would not be that it's a threat to you (though it may be), it's that compromised accounts (like one you don't care about) are a threat to an ecosystem that can't identify whether a "user" is a human or a bot.
That is, your compromised account could be used in an attack and it would look like a human.
Facebook and Twitter are basically closed to new users. If you've gone this far without an account, your new one will be shut down for being a bot within hours of creating a new account, or flagged for "extra verification" which requires sending a government ID to these companies so they can verify that you didn't photoshop a fake government ID.
This new approach seeks to extend this feature to the entire internet. What could possibly go wrong?
I create a new account on Facebook every 6 months or so, when my old one gets banned for "violating the community guidelines", and I haven't been asked for an ID since 2019. Twitter, though, is way worse; I had to give up and I'm currently just buying aged accounts. That violates even more parts of the ToS than just ban evasion, but at least the accounts last for years instead of weeks.
I gave up on Facebook (but I wasn't trying that hard), but it seemed to be using the extortion practices a lot of services use now: at first it appears to let you create an account, but upon logging in for the first time it demands a phone number for 'verification'. Microsoft was even worse when they migrated my Mojang account; it let me use it for a little while before demanding the number.
Back when I had a Facebook account, I recall it suddenly demanding one day that I scan my driver's license or I couldn't log in again... on the same and only machine I had ever actually used Facebook on.
If you don't like somebody on Facebook, you can report them for offensive content. I posted something that some asshole Facebook friend didn't appreciate, and they dug through my Facebook feed, found one image that was borderline (somebody in politics in their drawers), and sent in a complaint to Facebook. My account got banned for a first strike. The same douche could have sent in more complaints and Facebook would have happily given me three strikes. It's a Stasi system. Don't like your boss or your neighbor? Report them...
It's not about what's against the ToS, it's about getting the monkeys who review the reports to judge that it's against the ToS. Given their working conditions, they have little incentive in making an accurate determination and may just be pressing buttons at random, so spamming enough reports will eventually yield a ToS violation even on perfectly clean content.
Your question contains the implicit assumption that "TOS" is some bright shining line that everyone, from all posters, to all of the AIs and humans analyzing whether something conforms, completely agrees with. Therefore, "just don't break the TOS" is a reasonable solution.
This is manifestly and obviously false, in numerous ways. I don't even need to cite capriciousness, cultural differences, or potential political bias; even ignoring those things, it simply isn't and can not ever be a bright shining line.
This is even before we consider that TOSs have been known to change retroactively. YouTube just made such a change; it doesn't affect whether videos are removed, but they retroactively changed the monetization standards, with large effect. "Just don't break the TOS" is a non-starter in such an environment.
You generally can't actually check the government databases directly, but you can still determine this.
First, companies can catch most fraudulent documents simply by looking at the document (eg. are the fonts all correct, does the checksum on the MRZ add up, does the data in the MRZ match the data on the face of the document, does the data on the document match previously collected data about the individual, etc.). Some will go further by combining this with a "liveness" check (eg. they might ask you to take a picture of yourself in a certain pose, or to record a short video looking side to side).
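The MRZ checksum mentioned above is the ICAO Doc 9303 check digit: each character is mapped to a value (digits as-is, A to Z as 10 to 35, the `<` filler as 0), multiplied by the repeating weights 7, 3, 1, and the sum is taken modulo 10. A minimal sketch follows (the function name is ours). Because the algorithm is public, this check only catches typos and careless forgeries, not a deliberately generated document.

```python
def mrz_check_digit(field: str) -> int:
    """ICAO 9303 check digit: weighted sum (7, 3, 1 repeating) modulo 10."""
    weights = (7, 3, 1)
    total = 0
    for i, ch in enumerate(field):
        if "0" <= ch <= "9":
            value = int(ch)
        elif "A" <= ch <= "Z":
            value = ord(ch) - ord("A") + 10  # A=10 ... Z=35
        elif ch == "<":
            value = 0  # filler character
        else:
            raise ValueError(f"invalid MRZ character: {ch!r}")
        total += value * weights[i % 3]
    return total % 10
```

For example, the document-number field `L898902C3` yields 6 and the date field `740812` yields 2; a validator compares these against the digit printed after each field.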
Second, companies can use a soft credit check (if authorised by the user, which would need to be in the fine print when you sign up or when you are asked for such a document). Such a credit check won't affect your credit score, but companies can use it to see whether an individual with your details exists. Companies that offer such credit data in the UK, US, and other western countries typically boast of 90-95% match rates across a population, but younger people are less likely to be found, since they are less likely to have a credit history. This is typically aggregated with data from non-credit sources (electoral roll information, county court judgements, etc.) to reach those high match rates. They might also geolocate the IP address from which you accessed their site and compare it against any address information they have on you (which could come from what you provided at sign-up, could be extracted from the document if it's a driving license or similar, or could come from any credit records they found relating to you).
For Facebook specifically, they might look at other online activity - other social media accounts they can link to you, etc. And throw all of that into the mix.
If at the end of all that they don't have a clear answer, they might fall back to a manual process, or allow the account to be created but have content posted by the account flagged for manual review.
> eg. are the fonts all correct, does the checksum on the MRZ add up, ...
Is that hard?
A quick googling shows websites that will generate a California driver's license for virtually no money, so I'd assume someone with decent programming skills could put together a generator.
I created a Facebook account a few months ago to use Marketplace. The profile has only a name and unique-to-facebook email. I always use it in Firefox Containers.
Still active, and I've sold a handful of things with it.
So they admit that new generations are not interested in FB or Twitter and they will die with the boomer generation? If not then this logic makes little sense :)
Is it a black-and-white, one-call-destroys-'em-all silver bullet? Not even close. But, as he states in his article, from a "defence in depth" perspective it's another strong signal.
Are you a bad guy just because you have a weirdo email (which I do)? No.
Are you a bad guy just because you use Tor? No.
Are you a bad guy just because you're trying to make a purchase during an extreme surge? No.
Are you PROBABLY a bad guy if you have a weirdo email, you're on Tor, and you're trying to buy during a surge in purchases? I would say yes. I might not ban you outright, but you're going to jump through a lot more hoops than someone with an ancient email and a residential ip address.
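The reasoning above, where no single weak signal is damning but several together add friction, might be sketched like this. The three flags and the response tiers are hypothetical; a real system would weight signals rather than simply count them.

```python
def friction_level(weird_email: bool, via_tor: bool, during_surge: bool) -> str:
    """Map the number of weak risk signals to an escalating response (hypothetical tiers)."""
    flags = sum((weird_email, via_tor, during_surge))  # each True counts as 1
    if flags <= 1:
        return "allow"  # any one signal on its own: "No."
    if flags == 2:
        return "captcha"
    return "extra_verification"  # all three: more hoops, but not an outright ban
```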
> you're going to jump through a lot more hoops than someone with an ancient email and a residential ip address.
I understand this kind of reasoning.
At the same time I see a potential to snowball. This will encourage people to move away from weird addresses. Which will make it an even more effective filter and will justify stricter measures. So more people will move away. Etc.
That's a really good point. I'm working through this space right now, so I'm kind of myopic to stuff like this.
I use a self-hosted VPN (DigitalOcean), but if I were a site under duress, I'd be a jerk to me too. To be honest, most sites are, lol. I've given up YouTube and Google because I get reCAPTCHA'd to death...
To your actual point, I don't think it would be a deal killer per se in implementation. Weirdo@Weirdo.com isn't blocked, because it shows up in Troy's list of known emails.
Fakebook@Weirdo.com is suspicious in this model because it has not been seen before.
Wouldn't bad actors just push their fake email addresses to haveibeenpwned in fake leaks? Steps:
1- Periodically set up a legitimate-looking service, possibly proxying real services.
2- Wait a year or two for your fake service to permeate the www and for search engines to index it.
3- Mix your bot email addresses with legitimate previously pwned addresses.
4- Proclaim "woe is me, for thyself hasth been pwned".
You can set up this process so that you can inject a couple hundred thousand bot email addresses every couple of months.
This is an incredibly shortsighted idea with the potential to hurt a lot of innocent people.
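One partial mitigation for the fake-leak scenario, sketched under the assumption that full breach details are requested from the HIBP API (`truncateResponse=false`), is to count only breaches HIBP itself marks as verified and not fabricated or a spam list. The helper name is hypothetical, and this raises the bar rather than closing the hole.

```python
def trusted_breaches(breaches: list[dict]) -> list[dict]:
    """Keep only breaches that HIBP flags as verified and not fabricated or spam lists."""
    return [
        b
        for b in breaches
        if b.get("IsVerified")
        and not b.get("IsFabricated")
        and not b.get("IsSpamList")
    ]
```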
It is going to happen, and some people will make money off it by farming such addresses, but it raises the time and the cost to obtain a plausible email address for fraud.
At that point you'd be better off making those emails and signing up to a bunch of services. Bot emails aren't fresh for 2 years, and if they are somebody isn't doing their job properly.
I think the point is bot emails shouldn't be fresh.
It's the same way some people set up businesses with random names in tax-shelter territories and sell the company 10 years later to add a sense of legitimacy.
This is a cute "hack" for bot detection, but it's too unpredictable for the real world. Far too many users with good security hygiene are penalized by this system.
Plus, this might incentivize hackers to defeat the system by logging into and using email accounts pwned in these breaches.
> Plus, this might incentivize hackers to defeat the system by logging into and using email accounts pwned in these breaches.
This already happens at a large scale anyway.
There are hundreds, if not thousands, of "account shops" and sellers online selling hacked accounts for all sorts of services. Everything from Spotify to Twitter to news sites.
They ingest new breaches (or use automated tools to go hack sites and dump databases), and automatically test the leaked credentials against loads of shit using tools like OpenBullet or SentryMBA.
Those tools even integrate rotating proxies, captcha solvers, etc.
There are a few good talks on this topic: credential spraying and account shops.
The only security hygiene that can stop your email from leaking is using a different address for literally every service you ever log into. This is of course possible with your own domain, but in practice totally infeasible for the vast majority of people.
> (…) or even using a masked email address service such as the one 1Password provides through Fastmail. Absence of an email address in HIBP is not evidence of possible fraud, that's merely one possible explanation.
It can, but that's kind of the nature of anti-spam systems these days. Come in on a Tor IP with a randomly-generated burner e-mail and a Curl user-agent, and you're gonna get blocked from almost anything that spammers have an interest in. Come in with the same e-mail address, aged cookies, and a geolocation that has been associated with your credit card for years, and you're gonna be fine. Do things in the middle and expect some amount of false positives.
This feels a lot like email providers assuming that if you're running your own mail server, you must be spamming people.
This depends on people not using good tools like Firefox Relay to anonymize accounts. I mean, HIBP is great, but Troy is self-consciously not interested in handling subaddressing, which would improve his service and its (mis)use in detecting "humanness".
> but Troy is self-consciously not interested in handling subaddressing, which would improve his service
I don't think Troy is uninterested in handling subaddressing in the general sense; I think he just dismisses it as "not worth the time" given current statistics.
If it is worth the time and you were writing one of these "Pwned or Bot" "email credit score" detectors, it is easy: strip the +whatever before the @ and check whether that address exists as well. (Check both!)
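A minimal sketch of the "check both" approach, assuming plus-addressing (`user+tag@domain`). The helper name is hypothetical, and each candidate would then be looked up separately. It cannot undo catch-all aliases such as `anything@mydomain`, where there is no base address to recover.

```python
def lookup_candidates(email: str) -> list[str]:
    """Return the address as given plus its untagged base address, if different."""
    local, sep, domain = email.rpartition("@")
    if not sep:
        raise ValueError(f"not an email address: {email!r}")
    base_local = local.split("+", 1)[0]  # drop everything from the first '+'
    candidates = [email]
    # Skip an empty base (e.g. "+tag@domain") and the no-tag case.
    if base_local and base_local != local:
        candidates.append(f"{base_local}@{domain}")
    return candidates
```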
> which would improve his service
It's not actually his service he's talking about in this particular article. He doesn't run an explicit "Pwned or Bot" "email credit score" service. He's pointing out it is an interesting use of the HIBP API and also to do it right it needs some sort of value add/scoring system, which he hints at ways to do that but does not provide one (and especially not as a service).
HIBP itself doesn't support subaddressing as a feature, but that's on purpose for a different reason: many of the people that use subaddressing, especially consistent users, use HIBP to narrow down specific account threats and it is useful to them today that HIBP tracks all of their subaddresses independently.
Maybe I'm an outlier, but the e-mail address I have used for online payments and shops for over 10 years now has not been pwned. Maybe that's because I don't use this email for other sites where no money is involved, or for social media. But I don't think HIBP is a great bot indicator.
So the crux of the technique is to roughly date how long an email has existed for, using leaked databases as a timestamping measure. I'm not sure this metric is a good one though, as older and importantly "pwned" emails are far more likely to have been taken over.
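The timestamping idea can be sketched as taking the earliest `BreachDate` among the breaches an address appears in (a field HIBP returns when full breach details are requested) as a lower bound on the address's age. The function is hypothetical. An address with no breaches gets no bound at all, which is the false-positive problem raised upthread, and the bound says nothing about whether the original owner still controls the inbox.

```python
from datetime import date
from typing import Optional


def min_age_years(breach_dates: list[str], today: date) -> Optional[float]:
    """Lower bound on an address's age, from its earliest breach date (YYYY-MM-DD)."""
    if not breach_dates:
        return None  # never seen in a breach: no bound, which is not proof of youth
    earliest = min(date.fromisoformat(d) for d in breach_dates)
    return (today - earliest).days / 365.25
```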
Without an idea for the percentage of emails that are still in the original owners hands, this risks a high false negative rate.
> This is called "sniping", where an individual jumps the queue and snaps up products in limited demand for their own personal gain and consequently, to the detriment of others.
This reminds me of Utility Monsters[0]. From Wikipedia:
> the utility monster, receives much more utility from each unit of a resource that it consumes than anyone else does. For instance, eating a cookie might bring only one unit of pleasure to an ordinary person but could bring 100 units of pleasure to a utility monster.
I'm a utility monster, and shops and convenience stores either love or hate us (since the monster consumer derives a skewed amount of utility from certain items). Some stores deliberately up their prices on certain items if they see utility monsters taking advantage, other times, they let the price remain stagnant, in full knowledge the utility monster brings them good business.
GeekedIn: In August 2016, the technology recruitment site GeekedIn left a MongoDB database exposed and over 8M records were extracted by an unknown third party. The breached data was originally scraped from GitHub in violation of their terms of use and contained information exposed in public profiles, including over 1 million members' email addresses. Full details on the incident (including how impacted members can see their leaked data) are covered in the blog post "8 million GitHub profiles were leaked from GeekedIn's MongoDB - here's how to see yours".
Compromised data: Email addresses, Geographic locations, Names, Professional skills, Usernames, Years of professional experience
> it may be that they're uniquely subaddressing their email addresses (although this is extremely rare)
That “extremely rare” is about plus-addressing. My experience is that catch-all subaddressing (e.g. *@chrismorgan.info in my case) is considerably more popular, only rare rather than extremely rare.
I have always wondered why pricing can’t fix the issue. On launch day, or for your first batch or whatever, start the pricing higher than you expect most anybody to pay. Target a constant rate of purchase by gradually lowering and raising the price to maintain some target sales per min/hour. Bots and scalpers get stuck holding the bag if they buy on launch day because the price will likely never be higher than what they had to pay to get the product. The company makes marginally more money on launches. People who really really want the product get it at a fair price (they were willing to pay).
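The pricing idea is essentially a feedback controller: start above the expected market price and nudge the price each period toward a target sales rate. A minimal sketch, assuming a proportional rule capped at a hypothetical 5% change per period; `next_price` and its parameters are illustrative, not any retailer's actual mechanism.

```python
def next_price(
    price: float,
    sales_last_period: int,
    target_sales: int,
    max_step: float = 0.05,
    floor: float = 0.0,
) -> float:
    """Adjust price toward a target sales rate (proportional control, capped per period)."""
    if target_sales <= 0:
        raise ValueError("target_sales must be positive")
    # Relative error: +1.0 means we sold twice the target, -1.0 means we sold nothing.
    error = (sales_last_period - target_sales) / target_sales
    # Selling faster than target raises the price, slower lowers it; cap the move.
    adjustment = max(-max_step, min(max_step, max_step * error))
    return max(floor, price * (1 + adjustment))
```

Called once per period, starting from the launch price: selling exactly at the target leaves the price unchanged, selling nothing lowers it by 5%, and selling far above the target raises it by at most 5%.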
> People who really really want the product get it at a fair price (they were willing to pay).
If people were content to get the product at a fair price, scalpers wouldn't be a problem in the first place. The whole reason scalpers are considered a problem is that people want the product at a cheaper than fair price, and scalpers prevent that by buying up any inventory that is being sold for below the market rate.
Basically, if companies employed the strategy you suggest, then they'd effectively become the scalpers in the eyes of people who consider scalping a problem, with all the PR issues associated with that.
That's not to say it's necessarily a bad idea though. Once you accept the fact that scalpers exist, it makes sense for companies to capture those profits themselves rather than let scalpers just have them for free.
Yes I understand and I do see your point. I wonder if the problem can be solved semantically. Instead of thinking of the company as the scalpers you want people thinking of the launch event as an auction. I don't think people would be so fickle if you framed the practice as “launch auctions”. Then it would be clear to everyone that there is no msrp until the supply and demand stabilize. And people would be at liberty to pay whatever they thought a fair price for the item all things considered. If someone else bids higher, well, tough luck.
I think the bot-vs-real-person problem needs to be solved by governments. Every government doing its own thing to tackle this wouldn't work; it would be great if they created an open-source project/standard that they all implement. An alternative would be using bank accounts to verify that you are a real person, which is actually what Scandinavian countries do (e.g. in Sweden it is BankID).
All these methods of trying to recognize government ID pictures and etc. just seem very inefficient and not accurate enough for wide-spread use.
Unfortunately, not many governments are well-run enough to manage such solutions.
What about the unbanked (i.e. people who don’t have bank accounts)?
And even if bank accounts were free, getting a bank account means accepting the terms and conditions written by the bank. Not to mention the laws and regulations regarding banking, which include sending your bank details to the US government, even if you are a European using a European bank.
> Now, think of it from Nike's perspective: they've launched a new shoe and are seeing a whole heap of new registrations and purchase attempts. In amongst that lot are many genuine people... and this guy. How can they weed him out such that snipers aren't snapping up the products at the expense of genuine customers?
Is it true that Nike actually wants to cut the snipers out? It seems like they're selling the shoes either way, possibly faster this way, and the resellers are doing free promotion for their shoes in order to resell them.
Ha I do this all the time for buying second hand concert tickets! Scammers usually use throw away email addresses. If the seller has a pwned account I trust them more ;)
Everything from that Stripe data about IP addresses in Europe is just pure cringe.
Did you know that, at least in my country, nearly everybody is behind CGNAT, so hundreds if not thousands of households have exactly the same external IP address, and it rotates very often?
So you constantly share an IP address that is also hosting tons of torrents of porn or movies (nobody cares about torrents in my country), etc.
How does this work with data protection laws? Is there a way for me to object to a company doing this, i.e. an automated background check on my email address with stolen personal data?
I guess I cannot effectively object to my email being included in data leaks…
Very bad idea. Most people are not terminally online like HN folks; they barely register for anything and barely appear in leaks. Unless every single Facebook, Instagram, WeChat, etc. user is leaked, it will already have too many false positives.
Snipers are just as much legitimate customers as anyone else. They only snipe underpriced products, so if you don't want people reselling them, don't sell them so cheaply.
Astonishing how people will argue against their own interests in defence of a free market. If goods are underpriced and snipers are prevented, consumers pay the lower prices. If goods are "right-priced", consumers pay a higher price.
Wanting an RNG/ping-based system is not in everyone's interests. Plenty of people want to be able to just buy a product whenever they feel like it, and not have to spam refresh at a specific time for a chance to get it. Resellers offer this convenience, and clearly people are willing to pay for it.
It's either some consumers get a good deal while others get nothing, or all consumers pay a fair price and get it.
This isn't even getting to the higher resale price if resellers are blocked because there is less competition between resellers.