Modular password hashing with pwhash

When building web applications, we usually also have to store user authentication data. When doing so, there’s generally two choices: Either we use an external authentication provider like OAuth, or we store passwords for our users in our database. This blog post focuses on how to do the latter correctly.

Password hashing basics

Before we dive into the specifics of the pwhash library, we briefly discuss the basics of what we need to be wary of when storing passwords. The first, and most important point is that we do not actually ever store our users’ passwords. Instead, we run them through a cryptographic hash function. This way, in case our database ever gets compromised, the attacker doesn’t immediately know all our users’ passwords. But this is not considered enough any more. Because if we just put our users’ passwords through a simple hash function, like SHA-3, two users that pick the same password will end up with the same hash value. Knowing this is of course useful to an attacker, because now they can attack multiple passwords at the same time, if they get the user data.

Editor’s note: so choose passwords likely to be unique… and unguessable even to people who know you. Please.

Enter salts

To mitigate this problem, we use so-called salts. For each password, we generate a random salt, and prepend (or append) it to the password, before passing it through the hash function. This leaves us with
sha3(firstsalt:password) = 0bee3940e2d74f5155e73a9e90ea75b5d06407db85527f62acefa97af2d59f69589b276630e20a1ff8b0b781a372ae17db88b9f782acf7ed0022ab4c2fc766df
sha3(secondsalt:password) = edaa1e8b31e2d2943d689928a5ba1c503bb3220ddc164d6feff9e178dd18a4727b1905400252ef071d28b370c0b9727420cd0e2011109ca3e8934a4082d2bf9e
As we can see, this yields two completely different hashes even though two users used the same (admittedly very bad) password. The salt can be stored alongside the hash in the database. An attacker now has to break each password individually, instead of attacking them all at the same time. Great, right? Surely, now we’re done and can get to the actual library? Not quite yet. The next problem we have is that modern graphics cards are really fast at calculating these kind of hashes. For example, an Nvidia GTX 1080 calculates around 800 million SHA-3 hashes per second.

Dedicated password hashing functions

To get rid of this problem, we do something we usually don’t want to do in computing: We make things intentionally slow. There are several functions that can be used for this. An older idea is simply applying many rounds of the same hash function over and over again. An example of this approach is the PBKDF2 (Password-Based-Key-Derivation-Function). But since GPUs are really fast these days, we also want functions that are harder to compute on GPUs. Functions that use a lot of memory, and a lot of branches are generally very hard to compute on graphics cards. An example of this is argon2, a function which was specifically designed for hashing passwords, and won the Password Hashing Competition in 2015. The question, then, is how do we create a re-usable approach that allows us to stay current with always-current password hashing requirements?

The pwhash library

While there are already Java implementations and bindings for argon2 and other password hashing functions such as bcrypt, what is still missing in the Java ecosystem is a library that unifies them under a single interface. Functions like argon2 come with different versions, and a lot of adjustable parameters. So if we hardcode those parameters into our application in 2018, the parameters are probably going to be outdated (and inefficient or insecure) in five or ten years. So ideally, we want our library to handle this for us. We change our parameters in one place, and the library automatically takes care of upgrading both new and existing password hashes every time a user logs in or signs up. In addition to this, it would also be convenient to not just switch between parameters of a single algorithm, but also to switch the algorithm to something newer and better altogether.
That’s the goal of the pwhash library.

The HashStrategy interface

To offer all this, the core functionality of pwhash is the HashStrategy interface. It offers 3 methods:

  1. String hash(String password)
  2. boolean verify(String password, String hash)
  3. boolean needsRehash(String password, String hash)

This interface is largely inspired by the PHP APIs for password hashing, which offers the functions password_hash, password_verify and password_needs_rehash. In the author’s opinion, this is one of the better things in the PHP core library, and hence I decided to port the functionality to Java in a bit more Object-Oriented style.
The first two methods are relatively straightforward: String hash(myPassword) produces a hash, alongside with all its parameters, which can later be passed to boolean verify(myPassword, storedHash) in order to verify if the password actually matches the given hash.
The third method is the magic that lets us upgrade our hashes without needing to write new code every time. In the first step, it checks whether the given password actually verifies against the hash, in order to make sure that we don’t accidentally overwrite hashes when the user supplied an incorrect password. In the next step, the parameters used for the current HashStrategy (which are generally supplied by the constructor, or some factory method) are compared with the ones stored alongside the password hash. If they match, false is returned, and no further action needs to be taken. If they do not match, the method returns true instead. Now, we call hash(myPassword) again, and store the newly generated hash in the database.
This way, we only have to write code once for our login, and every time we change our parameters, they are automatically updated every time users log in. However, for a single implementation, like the Argon2Strategy, this only handles changes in parameters. If somehow a weakness is discovered, we still do not have a good way to migrate away from Argon2 to a different algorithm.

The MigrationStrategy class

Migration between two different algorithms is handled by the MigrationStrategy class, which implements the above interface, and in its constructor, accepts two other strategies: One to migrate from, and another to migrate to. Its implementation for String hash(String) simply calls the hash functions of the latter strategy, since we want all new hashes to be performed with the new algorithm.
Its implementation of boolean verify(String, String) first attempts to verify against the old strategy, and if that fails, against the new strategy. If neither succeed, the password was incorrect. If either succeeds, the password was correct, and true can be returned.
Again, the boolean needsRehash(String, String) method is a little bit more complicated. It first attempts to verify the given password against the old strategy. If this succeeds, the password is obviously in the old format, and needs to be rehashed, and we immediately return true. In the second step, we attempt to verify it with the new strategy. If this is also unsuccessful, we return false as we don’t want to rehash if the user supplied an incorrect password. If the verification succeeds, we return whatever newStrategy.needsRehash(String, String) returns.
This way, we can adjust parameters on our new strategy while some passwords are still hashed with the old strategy, and we receive the expected results.
Note that it is not necessary to use this class when migrating parameters inside one implementation. It is only used when switching between different classes.

Chaining MigrationStrategy

Fun fact about MigrationStrategy: Since it is also an implementation of the HashStrategy interface, you can even nest it in another MigrationStrategy like so:

MigrationStrategy one = new MigrationStrategy(veryOldStrategy, somewhatBetterStrategy);
MigrationStrategy two = new MigrationStrategy(one, reallyGoodStrategy);

This might not seem useful at first, but it is useful if you have a lot of users. You may want to switch to yet another strategy at some point, while not all your users are migrated to the intermediate strategy yet. Chaining the migrations like this allows you to successfully verify against all three strategies, still allowing logins for even your oldest users who haven’t logged in in a while, and still migrating all passwords to the newest possible algorithm.

Examples

Examples for the usage of the pwhash library are hosted in the repository on GitHub.
Currently, there are two examples. One uses only the Argon2Strategy and upgrades parameters within that implementation. The second example uses the MigrationStrategy to show code that migrates users from a legacy Pbkdf2Strategy to a more modern Argon2Strategy.
The best thing about these examples: The code that actually authenticates users is exactly the same in both examples. The only thing that is different is the HashStrategy that is plugged in. This is the main selling point of the pwhash library: You write your authentication code once, and when you want to upgrade, all you have to do is plug in a new strategy.

Where to get it

pwhash is available on Maven Central. For new applications, it is recommended to only use the -core artifact. Support for PBKDF2 is mainly targeted at legacy code bases wanting to migrate away from PBKDF2. JavaDoc is available online on the project homepage. JAR archives are also available on GitHub, but it is strongly recommended you use maven instead.

JavaChannel’s Interesting Links podcast, episode 13

Welcome to the thirteenth ##java podcast. It’s Tuesday, January 30, 2018. Your hosts are Joseph Ottinger (dreamreal on IRC) and Andrew Lombardi (kinabalu on IRC) from Mystic Coders. We have a guest this podcast: Cedric Beust, who’s always been very active in the Java ecosystem, being a factor in Android and author of TestNG as well as JCommander and other tools – and it’s fair to say that if you’ve used modern technology, Cedric’s actually had something to do with it. Really.
As always, this podcast is basically interesting content pulled from various sources, and funneled through the ##java IRC channel on freenode. You can find the show notes at the channel’s website, at javachannel.org; you can find all of the podcasts using the tag (or even “category”) “podcast”, and each podcast is tagged with its own identifier, too, so you can find this one by searching for the tag “podcast-13”.
A topic of discussion from ##java last week centers on code coverage: what numbers are “good”? What numbers can be expected? What’s a good metric to consider? Joseph likes (apparently) absurdly high numbers, like 90% or higher; Cedric recommends 50% code coverage as a good baseline; Andrew targets 70%. Expect a poll in the channel on this! It’s a really good discussion, but it’s not really going to be summarized here; listen to the podcast!

  1. Grizzly – an HTTP server library based on NIO – has been donated to EE4J. That’s not particularly interesting in and of itself, but there’s a question of whether all the projects being donated to EE4J imply an abandonment of Java EE as a container stack. It may not be; after all, EE4J is an umbrella just like Java EE itself is, so this may be very much what we should expect – which makes pointing it out as news rather odd. (The original item was from Reddit.)

  2. Pivotal gave us a really interesting article, called “Understanding When to use RabbitMQ or Apache Kafka.” Kafka and RabbitMQ are both sort of message-oriented, but there’s a lot of confusion about when you’d use one against the other; this article discusses both RabbitMQ’s and Kafka’s strengths and weaknesses. It would have been nicer to talk about AMQP as opposed to RabbitMQ, but the article works nonetheless. Kafka is a high-performance message streaming library; it’s not transactional in the traditional sense; it’s incredibly fast. AMQP is slower (but still really fast, make no mistake) and provides traditional pub/sub and point to point messaging models. The main point of the article, though, is that if you need something other than a traditional model, Kafka is there… but it’s going to involve some effort.

  3. Gradle 4.5 has been released. It’s supposedly faster than it was, and has improvements for C/C++ programmers. It also has better documentation among other changes; Gradle’s good, and this release is important, but it’s not earth-shattering. This discussion veered off quickly and sharply to Cedric’s homegrown build tool, kobalt – and mentioned Eclipse’ Aether library, since migrated to Apache under the maven-resolver project.

  4. More Java 9 shenanigans: Java EE modules – including CORBA, specifically – aren’t part of the unnamed module in Java 9. This comes to us courtesy of InfoQ, which pointed out CORBA specifically – CORBA being harder to reach isn’t really a big deal, I’d think, because nobody’s intentionally dealt with it who hasn’t absolutely had to. And it’s not really a Java EE module, really, so pointing out the removal along with Java EE is accurate but misleading. What does this mean? Well, if you’re using one of the nine modules removed, you’re likely to have to include flags at compilation and runtime to make these modules visible for your app. (See http://openjdk.java.net/jeps/320 for the actual Java Enhancement Proposal.)

  5. There’s a Java Enhancement Proposal for multiline strings. It’s in draft, but has Brian Goetz’ support; this is one of those features that Java doesn’t have that’s left people wondering why for a long time, I think – every other JVM language seems to include it. This doesn’t come up very often – if it was actually all that critical it would have been done a long time ago – but it’ll be nice to see it when (and if) it does make it into Java. It’s done with backticks; it does not use interpolation. Interesting, though.

  6. Baeldung has an article called “The Trie Data Structure in Java,” which, well, presents a Trie. It’s a good article, and explains the data structure really well – but doesn’t explain why you’d use a Trie as opposed to some other similar data structures. Tries represent a tradeoff between data size and speed; Tries tend to be incredibly fast while being more memory-hungry than some of their counterparts. Incidentally: there’s a question of pronunciation! “Trie” is typically pronounced the same was as “tree” is – while Joe pronounces it like “try” and struggled mightily to concede to peer pressure and say “tree.” Naturally, he was inconsistent about it; early pronunciation was in fact like “try” but, as stated, convention says “tree.” And it is a tree structure…

  7. Simon Levermann, sonOfRa on the channel, published a reference to his new pwhash project, a result of a series of discussions that seem to have gone on for a few weeks on the channel. It’s a password hashing library; it provides a unified interface to a set of hashing algorithms, like argon2 and bcrypt.