
There are only 2^32 IPv4 addresses; if you know the nonce, you can just try them all... no privacy provided.

If you don't know the nonce, you can't match against other users-- so it's not useful for combating abuse either.

But I'm skeptical re: abuse uses. For commenters, sure-- you may need to store IPs to combat abuse. But for readers? At most you would need sampled data or in-memory counters (e.g. to catch high-volume bots).

Unfortunately, there really isn't any penalty for failing to minimize private data collection.
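To make the first point concrete, here's a rough sketch of how small that 2^32 search really is once the nonce is known. It assumes the site hashes nonce||dotted-quad-IP with a fast hash like SHA-256; the nonce value and target IP below are made up for illustration.

    import hashlib
    import ipaddress

    # Assumed scheme: the site logs sha256(nonce || dotted-quad IP).
    nonce = b"leaked-per-site-nonce"                             # hypothetical value
    target = hashlib.sha256(nonce + b"203.0.113.7").hexdigest()  # one logged entry

    def recover(target_hex: str, nonce: bytes) -> str:
        # Linear scan over all 2^32 IPv4 addresses.
        for n in range(2**32):
            ip = str(ipaddress.IPv4Address(n)).encode()
            if hashlib.sha256(nonce + ip).hexdigest() == target_hex:
                return ip.decode()

Pure Python makes the loop slow, but the same scan in native code with hardware SHA extensions is hours of work on a single core, and minutes on a GPU: a fast hash plus a known nonce provides essentially no protection.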



But of course, the real reason is that those IPs are worth analytics $$$.


It's also useful forensic data if your site is ever hacked.


An example of using IPs to combat abuse is Wordfence. It's a WordPress plugin which blocks traffic from known abusive IPs. A quick glimpse at the "live traffic" for one of my websites reveals several IPs within the last hour that have attempted to access the site which were blocked.

A site I was repairing after a hack fortunately had server logs which included IP data. That IP allowed me to identify the specific exploit used.

So, there are definitely uses for IP data in security terms.


If you use a difficult hash function that takes ~1 second to calculate, then it would take over 120 years to iterate through the IPv4 address space. At the very least, this could cut down on dragnet surveillance.
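For illustration, a deliberately slow, memory-hard hash of the kind being suggested could be built from stdlib scrypt. The salt and cost parameters here are placeholders, not a recommendation; you'd tune n until one call costs about a second on your hardware.

    import hashlib

    SITE_SECRET = b"per-site secret salt"   # hypothetical; must be kept out of the logs

    def slow_ip_hash(ip: str) -> str:
        # scrypt is memory-hard; a large n makes each call deliberately expensive.
        # Parameters are illustrative -- raise n until one call takes ~1 second.
        digest = hashlib.scrypt(ip.encode(), salt=SITE_SECRET,
                                n=2**17, r=8, p=1, maxmem=2**28, dklen=16)
        return digest.hex()

    print(slow_ip_hash("203.0.113.7"))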


This requires that you add ~1 second of latency to every request that requires you to hash the IP. Even if we assume relatively aggressive caching, that is still completely unacceptable from a user-experience perspective.

Assuming you do that, you are looking at about 1,193,046 hours to hash the entire address space. More specifically, you are looking at 1,193,046 CPU-hours.

You can rent a 96-vCPU c5.24xlarge instance from AWS for $4.08/hour, or $0.0425/CPU-hour. Assuming this offers the same per-CPU hashrate as a general-purpose web server, you are looking at a cost of about $50,704 to construct a rainbow table. That is nowhere near a prohibitive sum of money.

You can probably reduce the cost by shopping around for compute or using bare metal. You could see significant cost reductions by using hashing optimized ASICs.

Combine this with the fact that no website is going to spend 1000ms computing a hash for every request (even if you allow for caching), and the fact that an attacker could considerably narrow down the address space they are interested in if they wanted to save money.

2^32 is just too small a space to create a viable asymmetry between legitimate use and an attack.
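For anyone checking the arithmetic, the figures above fall out of a few lines:

    # Back-of-the-envelope from the numbers quoted above.
    seconds = 2**32                  # one 1-second hash per IPv4 address
    cpu_hours = seconds / 3600       # ~1,193,046 CPU-hours (~136 CPU-years)
    cost_per_cpu_hour = 4.08 / 96    # c5.24xlarge: $4.08/hr over 96 vCPUs
    print(f"{cpu_hours:,.0f} CPU-hours, ~${cpu_hours * cost_per_cpu_hour:,.0f}")
    # -> 1,193,046 CPU-hours, ~$50,704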


From a user-experience perspective, you can perform the computation asynchronously. There are also hash algorithms designed to be ASIC-resistant.

But yeah, everything else you said makes sense.
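As a rough sketch of the asynchronous approach (the queue, worker, and handler names are made up, and the scrypt parameters are the same illustrative ones as in the sketch above): the request handler only records a timestamp and enqueues, while the slow hash runs in a background worker.

    import hashlib, queue, threading, time

    log_queue = queue.Queue()

    def log_writer():
        # Background worker: the slow hash happens here, off the request path.
        while True:
            ts, ip = log_queue.get()
            digest = hashlib.scrypt(ip.encode(), salt=b"per-site secret salt",
                                    n=2**17, r=8, p=1, maxmem=2**28, dklen=16)
            print(ts, digest.hex())      # stand-in for the real log sink

    threading.Thread(target=log_writer, daemon=True).start()

    def handle_request(client_ip: str):
        # The handler never waits for the hash; it captures the timestamp now,
        # so entries can be ordered later even if hashing lags behind.
        log_queue.put((time.time(), client_ip))

The latency only delays when an entry lands in the log, not the timestamp it carries.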


And now you have ~1000ms of latency between when an event happens and when you can log it. Even assuming all such events get logged, you will be left with a jumbled mess of out-of-order entries.


Why does your logging system rely on the order of entry insertion and not on the entry timestamp?


Yes, but then I’m burning a second of compute time every time I want to log something.

Also, by removing unlikely candidates (IPs owned by irrelevant entities, or addresses that are not US-based) you could get the search range much, much smaller, and with the FBI's budget you could probably compute it all in a few days even with a 1-second hash time.


But then a single user clicking on links quickly would bring your webserver to its knees. So much for using those addresses to combat abuse... :)

Plus the FBI could probably narrow their search to a few hundred thousand addresses (relevant ISPs, no unroutable/multicast/etc), then only use the list to confirm.

Finally, if it takes 120 years on one core, it'll take about 1.4 months on 1000 cores. I'm willing to bet the FBI has access to more computing power than I do. ~100 CPU-years isn't a particularly daunting amount of computing work, even for fairly low-stakes research.

That search would also decode all addresses in the logs, not just one targeted one...
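To put a number on the "narrow the search" point: just dropping blocks that can never appear as a public client address already removes roughly 600 million candidates (about 14% of the space), before you even restrict to specific ISPs' allocations. A quick count with the stdlib, using a partial list of excluded blocks rather than an authoritative bogon list:

    import ipaddress

    # Partial list of ranges that can't be a public client address.
    excluded = [
        "0.0.0.0/8", "10.0.0.0/8", "100.64.0.0/10", "127.0.0.0/8",
        "169.254.0.0/16", "172.16.0.0/12", "192.168.0.0/16",
        "224.0.0.0/4",    # multicast
        "240.0.0.0/4",    # reserved
    ]
    removed = sum(ipaddress.ip_network(b).num_addresses for b in excluded)
    print(f"{removed:,} of {2**32:,} addresses excluded ({removed / 2**32:.0%})")
    # -> 609,353,728 of 4,294,967,296 addresses excluded (14%)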


A hash that takes 1 second on a CPU can easily run 100x faster on a GPU, and that work can then be distributed over thousands of GPUs. For reference, argon2 was supposed to be an ASIC-resistant, GPU-resistant, memory-hard hashing algorithm, but a K20X from 2013 is 5x faster than a CPU [1], and GPUs have only gotten faster since then compared to CPUs.

[1]: https://github.com/WebDollar/argon2-gpu


The best model would be to publicly display commenters' IPs, never store readers' IPs, and keep error logs (e.g. of people brute-forcing a password).

You'd have a triple virtuous effect: people would stop being such insufferable asses once they understand that their name is basically on the comment, readers would be completely safe, and abusers would still be logged.

It's probably even what most websites do. It's news to me that sites keep the IP of every visitor; I'd have pruned them.


And then my modem reconnects and I get a new IP that used to belong to some insufferable asshole, and suddenly I’m blocked / blackholed / shadowbanned everywhere and some vigilante is flood pinging me.


Bingo. IRC tried the strategy of banning users by IP and half the time you'd end up k-lining entire countries because their ISPs were too cheap to buy more endpoints.


Maybe in the 56k days, but my DOCSIS ISP rarely re-assigns IPs.


Any examples? I like the transparency and self-filtering. What is/isn't this approach suitable for? Anonymous is a very common pen-name.



