Storing Passwords in a Highly Parallelized World (hynek.me)
167 points by hynek on Jan 7, 2016 | 65 comments


It does not really matter what password hash you choose in 2016. Argon2 is great, and if you have a library for your platform that does it, use it. But if all you have is PBKDF2, please don't freak out and come up with some complicated alternative to storing passwords altogether.
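
For what it's worth, a minimal sketch of salted PBKDF2 using nothing but the Python standard library (the iteration count is illustrative, not a recommendation):

  import hashlib
  import hmac
  import os

  ITERATIONS = 200000  # illustrative; tune to your own hardware

  def hash_password(password):
      salt = os.urandom(16)  # fresh random salt per user
      digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
      return salt, digest

  def verify_password(password, salt, digest):
      candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
      return hmac.compare_digest(candidate, digest)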

The real, big risk is not using a password hash at all, and instead using "salted hashes".


Unless you generate a high entropy password for the user, in which case you can just use one round of a hash function and be done. For web authentication, most users will just have their browser remember it anyway, and if not they can write it on a piece of paper.


Are you referring to the comment -

> The real, big risk is not using a password hash at all, and instead using "salted hashes".

If so, your statement is incorrect. Without salting you are vulnerable to trivial rainbow table attacks. [1]

[1] https://en.wikipedia.org/wiki/Rainbow_table
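
As a quick illustration of why (plain SHA-256 here only to show the effect of a salt, not as a password hash recommendation):

  import hashlib, os

  pw = b"hunter2"

  # Unsalted: every user with this password gets the same digest,
  # so one precomputed (rainbow) table covers them all at once.
  print(hashlib.sha256(pw).hexdigest())

  # Salted: a fresh random salt makes every stored digest unique,
  # so precomputed tables are useless and each entry must be attacked separately.
  for _ in range(2):
      salt = os.urandom(16)
      print(salt.hex(), hashlib.sha256(salt + pw).hexdigest())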


I think the parent was assuming a lovely world where all passwords would be large randomly-generated strings. In that case, salting is no better than just increasing the size of the randomly-generated string, right?


Yes, thanks. And the thing is they don't even have to be that long. Still inconvenient to enter a bunch of those every day, but the same is true with any password and most of us don't do that - we maybe set a master password and our web browser remembers them.

Salting is one of the best and easiest measures if there is a possibility of lowish-entropy passwords, but if not, it doesn't help. (Edit: on further thought, I think there is a fairly substantial entropy range where the only viable attack is a very large-scale batch attack and salting would prevent that, although the cost/benefit of actually mounting such an attack makes it unlikely anyway.) Iteration and memory-hard functions also effectively add just a few bits of entropy. But when users choose passwords, a huge portion will be low enough entropy that nothing you can do will help enough.
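
Back of the envelope, a work factor of N buys you roughly log2(N) bits' worth of extra guessing cost, so even a million-fold slowdown is only ~20 bits:

  import math

  # Slowing the attacker down by a factor of N is worth about log2(N) bits.
  for work_factor in (1000, 100000, 1000000):
      print("%7dx slower ~ %.1f extra bits" % (work_factor, math.log2(work_factor)))
  #    1000x slower ~ 10.0 extra bits
  #  100000x slower ~ 16.6 extra bits
  # 1000000x slower ~ 19.9 extra bits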

A second factor does help in that case, and can provide additional protection to high entropy passwords (as long as the second factor is distinct from the system that stores the high entropy password, which in theory it should be to be called a second factor).

technion: yeah, my suggestion won't work if there is any way for the user to set the password.


Which the user will promptly reset to the same password they use everywhere else, meeting whatever entropy requirements are in place.


I don't have anything to add to this discussion, I just wanted to thank the author; this blog is essentially what I've always wished my blog to be: clean, with few but very informative updates, especially for us Pythonistas. The articles about deployment have been particularly useful. Thanks!


Aw thank you!


Is there any reason to prefer Argon2 over scrypt? The article dismisses it as not popular, but doesn't say why we should invest in making Argon2 popular over scrypt.


The scrypt-derived yescrypt participated alongside Argon2 in the PHC[1], and Argon2 was chosen in favour of it – mainly because scrypt is a bloody complex mess to implement – but yescrypt is still recommended if you need backwards compatibility with it.

[1](https://password-hashing.net/)


One of the nice things that Argon2 has is server relief. Instead of doing the entirety of the computation on the server, you can have the client do the most demanding parts and send those intermediate values to the server, which can then finish the much easier computation.


Really, any password hash `PH(m,p)` can have server relief by adding a regular cryptographic hash function `H` and making `H∘PH` your hash function. That is, you store `H(PH(m,p))` (alongside the salt `m`) in the database, and have the client compute `PH(m,p)` and send that to the server rather than sending the password `p`. You still have to send the salting information `m` to the client first, though.
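
A rough sketch of that construction, with PBKDF2 standing in for PH and SHA-256 for H (salt delivery and parameter negotiation are hand-waved away):

  import hashlib, hmac

  def client_side(password, salt):
      # The expensive password hash PH(m, p) runs on the client.
      return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200000)

  def server_store(intermediate):
      # The server keeps only H(PH(m, p)); it never sees the password.
      return hashlib.sha256(intermediate).digest()

  def server_verify(intermediate, stored):
      return hmac.compare_digest(hashlib.sha256(intermediate).digest(), stored)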

I don't see the server relief in Argon2's latest specification document though.


I.e., CRAM-MD5 and related password schemes. It's a shame it fell out of favour; I'd sure prefer something like this over sending plain-text passwords via TLS…


Ideally we eventually get rid of passwords and use public/private keypairs in the browser, or have some browser-based SRP.


Can someone help me understand why the author of the python library chose to raise an exception on verification failures as opposed to returning false?


Two reasons:

1. Mainly: the C library has no concept of "wrong password", only "verification failed with an error". If you want to know why it failed, you need the error. As you can see in the example, a wrong password is "Decoding failed", which can also be your fault. It seems like they want to interpret their own failures as little as possible, therefore raising an exception with the error seemed the best way forward.

2. Secondarily: in a security context, I tend to prefer loud failures for dangerous problems so they don't pass unnoticed by accident. YMMV.


On that topic, if it's raising an exception in the middle of testing the hash, could this permit a timing attack?


It isn’t raised in the middle of testing the hash. The testing is completely done in the Argon2 C library and the bindings raise an error if it returns an error. The Python library doesn’t do anything smart at all except calling C functions on strings.


Comparisons after hashing are naturally resistant to timing attacks, because you are not in direct control of the bytes being compared.

Just ask Bitcoin miners how hard it is to pick an input which results in a hash with a desired n-bit prefix.

But as a belt-and-suspenders you often see an attempt at fixed time comparisons of digests in any case.

Coincidentally, hashing before comparing can be used in scripting languages where the compare function will often be optimized out from under you, making constant time compare difficult or impossible to actually guarantee.
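
For example (Python, purely illustrative; hmac.compare_digest is the more direct tool where it's available):

  import hashlib

  def safe_equal(a, b):
      # Hash both sides first: even if == short-circuits, the timing only
      # leaks how many digest bytes match, which the attacker cannot steer.
      return hashlib.sha256(a).digest() == hashlib.sha256(b).digest()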


A core tenet of Python is that "it's easier to ask for forgiveness than permission":

https://docs.python.org/2/glossary.html#term-eafp


This principle doesn't necessarily mean functions should never return booleans, though.

Booleans are used in a variety of (popular) Python libraries when checking whether a password is correct (e.g. Django's `check_password` returns False if the password is wrong).


This is most likely the reason. It allows your code to follow the happy path for logging in, and treat a failed verification as the exceptional case. Definitely a design choice by the author, who self-identifies as a Pythonista.

  def handle_login():
    try:
      ph.verify(hash, "s3kr3tp4ssw0rd")
      log_login()
      redirect_to_page()
    except VerificationError:
      log_bad_login()
      redirect_to_login()


While we're talking Python best practices, I highly recommend using an `else` clause to keep the code being 'excepted' to one line:

  def handle_login():
    try:
      ph.verify(hash, "s3kr3tp4ssw0rd")
    except VerificationError:
      log_bad_login()
      redirect_to_login()
    else:
      log_login()
      redirect_to_page()
It's not a big deal for this code, but in general this is good practice because

1. It makes it very obvious to the next developer which line is the one that is expected to raise that exception

2. One of the other lines could unintentionally raise that exception and mistakenly trigger the except clause. (This is more of an issue with Python's built in exceptions than with something very specific like this `VerificationError` example.)


Given the recent ACM article about faster-than-cpu persistent memory on the horizon, how does that affect memory-hard hashing algorithms?

A machine with a measly 256GB of memory can do around half a million Argon2 hashes simultaneously (using the Python library defaults for memory complexity of 512k), and that doesn't include the memory built into high end GPUs, or what could be added to ASICs or FPGA boards.
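
(The arithmetic behind that estimate, taking 512 KiB per hash at face value:)

  total_kib = 256 * 1024 * 1024     # 256 GB of RAM expressed in KiB
  per_hash_kib = 512                # memory cost per hash at these settings
  print(total_kib // per_hash_kib)  # 524288, i.e. roughly half a million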


I do not think the ACM article is very relevant here, because what it speaks about is, essentially, the fact that what we traditionally thought of as "disks" is now getting as fast as RAM, because persistence is achieved via new technologies.

> A machine with a measly 256GB of memory can do around half a million Argon2 hashes simultaneously

Not necessarily. There is always the limitation of memory bandwidth, although I do not see saturating that bandwidth as a design goal of Argon2.


Precisely. If a hash is too intensive, it's a clear vector for DDoS attacks.


Memory being faster-than-cpu does not mean faster-than-gpu or faster-than-asic.


I'm skimming info from all over the place. Does anyone know how Argon2 performs compared to the other options? I'm figuring out whether to use bcrypt or Argon2 for a side project.

https://www.npmjs.com/package/bcrypt

https://github.com/ranisalt/node-argon2

edit: found it. I'm still reading

https://password-hashing.net/submissions/specs/Argon-v3.pdf


Can you unroll argon2 for cpu-vs-memory tradeoff like scrypt? Both are supposed to be memory-intensive, but in scrypt you can get 2x slowdown by using 2x less memory and just computing the missing bits. (http://blog.ircmaxell.com/2014/03/why-i-dont-recommend-scryp...)


From the white paper https://password-hashing.net/argon2-specs.pdf:

> Argon2i is more vulnerable to tradeoff attacks due to its data-independent addressing scheme. We applied the ranking algorithm to 3-pass Argon2i to calculate time and computational penalties. We found out that the memory reduction by the factor of 3 already gives the computational penalty of around 2^14. The 2^14 Blake2b cores would take more area than 1 GB of RAM (Section 2.1), thus prohibiting the adversary to further reduce the time-area product. We conclude that the time-area product cost for Argon2d can be reduced by 3 at best.


Password hashes focused too long on computation time, neglecting memory usage. It's great to see some progress on that topic.

On the other hand, we still see password databases stored in plain MD5, sometimes even without salt. So in addition to providing better password hashes, making them more widespread is important, too.


> Password hashes focused too long on computation time, neglecting memory usage. It's great to see some progress on that topic.

You mean, like scrypt, seven years ago?


Since you're here: any comment on Argon2 versus your own scrypt? (I've been meaning to dive into it, but...)


It's a minor improvement at best. As far as I'm concerned, the winner of the PHC was scrypt, in that the PHC demonstrated that nobody could do significantly better than scrypt.


I would agree with you that scrypt's design was vindicated by the PHC, but I think calling it the winner is a bit much. I wouldn't want to go back to scrypt, given the choices available now. Incremental progress is still progress.


yescrypt?


One thing I wonder about, as password hashes get both processor- and memory-hard -- are we opening up a trivial DoS attack on our servers by basically letting any IP submit a request for the server to dedicate this (non-trivial) amount of RAM and processor time to a given task (hashing a random string to verify that this login is not correct)?

I suppose the answer is yes, but it's worth it. It's possible to fix this sort of hole (partially, at least) by capping the number of hashes processed concurrently, as sketched below. But most simple implementations will just assume "we're not likely to have more than X users ever signing in concurrently, so we can set the work factors based on that plus some headroom".
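
A crude sketch of such a cap (a plain semaphore around the expensive call; `ph` is the Argon2 PasswordHasher discussed elsewhere in the thread, and the numbers are made up):

  import threading

  MAX_CONCURRENT = 8  # made-up number; size it to your RAM/CPU budget
  _slots = threading.BoundedSemaphore(MAX_CONCURRENT)

  def verify_capped(ph, stored_hash, password):
      # Blocks once too many verifications are already in flight, which
      # bounds the memory and CPU an attacker can tie up via the login form.
      with _slots:
          return ph.verify(stored_hash, password)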


If you're not throttling the number of incorrect password guesses per IP, or per IP per account, or both, you're opening your users up to dictionary attacks anyway. If I can get a box close to yours (~5ms is doable if we use the same colo), then it's completely feasible.

You're right however that this is going to be painful, because most web frameworks don't have any of this throttling architecture (like persistent in-memory hashtables) in place, and as soon as they switch to Argon2/Scrypt, it's going to be an easy DoS vector... particularly for services running on weak VPSs.

It's another reason why solid, secure password authentication protocols, that do client-side hashing, will eventually happen. Even in fancy zero-knowledge asymmetric protocols however, online rate-limiting is essential.
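
A toy, single-process version of that throttling (a real deployment would want it shared across workers, e.g. in Redis; the window and limit are illustrative):

  import time
  from collections import defaultdict

  WINDOW = 300       # seconds
  MAX_FAILURES = 20  # failed attempts per IP per window
  _failures = defaultdict(list)

  def allow_attempt(ip):
      now = time.time()
      recent = [t for t in _failures[ip] if now - t < WINDOW]
      _failures[ip] = recent
      return len(recent) < MAX_FAILURES

  def record_failure(ip):
      _failures[ip].append(time.time())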


Rate limiting per-IP assumes an attack from a single IP, or a very small range of them (so it only defends against a trivial DoS, not DDoS... which are sadly easy to set up these days).

Per-IP per-account as well doesn't work if the attacker has a large list of usernames. Even brute-force "dictionary" attacks can dodge simple limiters by submitting one password with 2 million diff usernames, then a second password with 2 million usernames, etc..

I'm not saying these are bad (though if someone can trivially stop your real users from signing in by hitting the limit on their accounts, that's just a DoS in another shape). But we're agreed already... these are non-simple problems, really.


Just an aside, if you're using nginx, I'd look at ngx_http_limit_req_module which can alleviate this case.
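
Something along these lines (the directives are real; the endpoint and the numbers are just an example):

  # in the http {} block
  limit_req_zone $binary_remote_addr zone=login:10m rate=5r/m;

  # in the relevant server {} block
  location = /login {
      limit_req zone=login burst=10 nodelay;
      # ... proxy/fastcgi config for the app goes here
  }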


If you throttle your password validations, a large enough attack will effectively be a DoS anyway, since you would be processing more invalid than valid requests even if your server is "still working".


You should rate-limit your logins anyway, lest you end up like Apple and the Fappening.


We still see passwords stored in plain text, let alone MD5. But you're absolutely right.


We also still see 123456 and friends...


It's sounding like it's too difficult to keep up with password security to try to do it all yourself. You'd be better off outsourcing it, such as with OAuth, if you can.


Implementing OAuth to avoid picking a password hash is a crazy security tradeoff.

This is the kind of thing that made me feel like the Password Hashing Contest might be a net negative for systems security. In reality, despite what this blog post says, you could select the worst of the password hashes --- PBKDF2 --- in relatively weak parameters, and still be fine.

If you give nerds 3-4 things that do the same thing, we will find 10^(3-4) different ways to have arguments about them.


While it certainly would be nice to outsource this problem,

    ph.hash("s3kr3tp4ssw0rd")
    ph.verify(hash, "s3kr3tp4ssw0rd")
Is pretty damn easy.


You're missing the point (which is hilarious, because the very first sentence of the linked article spells it out explicitly). The APIs for password hashing are simple, and always have been (cf. man 3 crypt).

It's the problem area that is complicated and rapidly changing. And use of APIs and mechanisms that just a few years back were Best Practices is now discouraged. That's not something that can be treated by a cute Python API.


The best practice a few years ago was "use the best PBKDF available", which hasn't changed – just what algorithm to use. This isn't different from any other part of crypto – we have a constant algorithm churn as we identify and replace problematic algorithms.
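
In practice that churn usually just means re-hashing on a successful login, e.g. with argon2_cffi's PasswordHasher (check_needs_rehash comes from a later release of that library than the one the article covers, so treat this as a sketch):

  from argon2 import PasswordHasher
  from argon2.exceptions import VerificationError

  ph = PasswordHasher()

  def login(stored_hash, password):
      try:
          ph.verify(stored_hash, password)
      except VerificationError:
          return None  # wrong password (or corrupt hash)
      if ph.check_needs_rehash(stored_hash):
          # Parameters or algorithm changed since this hash was written:
          # transparently upgrade it on a successful login.
          stored_hash = ph.hash(password)
      return stored_hash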

And OAuth is no exception: If you implemented OAuth five years ago, it was OAuth1, which was obsolete and ripe for replacement… four years ago.


Uh... the specific advice I was replying to was precisely to use a specific python wrapper around Argon2, which is certainly not isomorphic to the "best PBKDF available" (even if it might be right now).

And the OAuth point is misdirected: we're talking about hash choices here, which OAuth is not. An OAuth1 client, while perhaps obsolete for other reasons, would certainly be protected against the need for a change in password hash, because all that stuff happens upstream and it never sees the password. That is by design, therefore remote authentication against highly-trusted password verifiers[1] is a good idea, which is what the great-grandparent post was saying.

[1] Which, sure, has its own list of worries unrelated to hash behavior.


So what is your point then? No matter what you do – whether you verify password yourself or externally – the state of the art constantly changes and you have to update your code either way. There's no escape from that. The reasons change, but not the code churn.


Uh... no. That's exactly wrong. If bcrypt or whatever gets broken tomorrow, and my OAuth provider needs to reset all the passwords in the database (Linode literally did this last week, though not because of a hash change, it's a common kind of thing), I don't need to change a line of code.

Ergo, if you're worried about changes to password hash vogues, OAuth is a good idea. Thus the point way upthread, which both you and the other poster seem to have missed.


Because it's a nonsensical point. You trade one very simple thing to worry about – password hashing algorithms, for which you can count all variations in the past ten years on one hand – with a massively complex system like OAuth (1.0? 1.0a? 2.0?), which in turn relies on TLS for transport security, for which best practices change every two or three months.

If bcrypt gets broken tomorrow, your passwords are safe until you have a data breach. If OAuth gets broken tomorrow, you're immediately at risk.


What about the code for taking initial passwords, changing passwords, password recovery, securing login forms, storing passwords, transitioning to new hash methods, and even 2-factor authentication? OAuth providers would do all this for you in an undoubtedly more secure way than most people would have time for.


Do you mean outsourcing to OAuth regarding the various implementations, or outsourcing to an OAuth provider?


Unfortunately, the user experience with the NASCAR buttons is pretty awful.


One concern I have with Argon2 is that it uses BLAKE internally. This is understandable, since BLAKE was developed by one of the conveners of the Password Hashing Competition, but it seems to me that using SHA3 internally would be a better choice, in order to leverage the industry-wide investments in development & testing which we're likely to see.

Assuming that it's well-designed, Argon2's structure shouldn't rely on BLAKE.


SHA-3 (Keccak) is not a better choice, because it has a considerable performance difference between hardware and software implementations, while BLAKE is very fast in software (on current general-purpose CPUs), which is good for most defenders (as they use general-purpose hardware) and bad for attackers (as the cost of specialized hardware is increased). None of the Password Hashing Competition submissions used Keccak, and even the authors of Keccak recommended against using it for password hashing.


Gambit used Keccak.


Ah, true, I forgot about it. With the following remark in specs (https://password-hashing.net/submissions/specs/Gambit-v1.pdf):

> Keccak is known to be very fast in hardware, which opens up the path to highly optimized cheap circuits.

(although with strange follow-up: "but the same can be said about modern CPUs and GPUs, which closes the gap")

As I wrote when discussing candidates, "I like the simplicity of Gambit, but it would look a bit silly if we selected an algorithm as a winner of competition with a notice that it better be used in the future, when Intel adds a SHA-3 instruction."


BLAKE was pretty close to becoming SHA3. The only reason Keccak was chosen is that BLAKE was too similar to SHA2 (which has no known weaknesses).


When can we expect something for PHP?



(Serious question) Why don't people just use Firefox Sync or the Chrome password store? Why worry about all this? In both cases passwords are salted+encrypted and uploaded.


Firefox Sync and the Chrome password store are for users storing their own passwords. They are not hashed (that would make it impossible to log into a site: you need the actual password to log in, and a hash is irreversible). I don't know what you mean by salted, because as far as I know salts are only used with hashes, and these are not hashed.

Bcrypt, scrypt, and argon2 are for server storage of login information. The server does not store the actual password, it just stores a hash.



