Dustin Boswell - Brain Dumps about Computers, Programming, and Everything Else
dustwell.com About Me Past Academic Work
Blog Posts How to hash passwords Snapshotting with Rsync "An hour" vs. "A hour" Yes/No proposition bets div, span, and CSS "display:" Pair programming + screen SSH keys in 2 easy steps Fixing Your Flaky Internet How X-over-SSH works Drinking Distilled Water djb-dns installation Vim Cheat Sheet Oil ETFs leak money
Storing User Passwords Securely: hashing, salting, and Bcrypt
June 18, 2012

In this article, I'll explain the theory for how to store user passwords securely, as well as some example code in Python using a Bcrypt library.

Bad Solution #1: plain text password

It would be very insecure to store each user's "plain text" password in your database:
user account plain text password
john@hotmail.com password
betty@gmail.com password123

This is insecure because if a hacker gains access to your database, they'll be able to use that password to login as that user on your system. Or even worse, if that user uses the same password for other sites on the internet, the hacker can now login there as well. Your users will be very unhappy.

(Oh, and if you think no one would ever store passwords this way, Sony did just this in 2011.)

Bad Solution #2: sha1(password)

A better solution is to store a "one-way hash" of the password, typically using a function like md5() or sha1():
user account sha1(password)
john@hotmail.com 5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8
betty@gmail.com cbfdac6008f9cab4083784cbd1874f76618d2a97
Even though the server doesn't store the plain text password anywhere, it can still authenticate the user:
def is_password_correct(user, password_attempt):
    return sha1(password_attempt) == user["sha1_password"]

This solution is more secure than storing the plain text password, because in theory it should be impossible to "undo" a one-way hash function and find an input string that outputs the same hash value. Unfortunately, hackers have found ways around this.

One problem is that many hash functions (including md5() and sha1()) aren't so "one-way" afterall, and security experts suggest that these functions not be used anymore for security applications. (Instead, you should use better hash functions like sha256() which don't have any known vulnerabilities so far.)

But there's a bigger problem: hackers don't need to "undo" the hash function at all; they can just keep guessing input passwords until they find a match. This is similar to trying all the combinations of a combination lock. Here's what the code would look like:

database_table = {
  "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8": "john@hotmail.com",
  "cbfdac6008f9cab4083784cbd1874f76618d2a97": "betty@gmail.com",

    if sha1(password) in database_table:
        print "Hacker wins! I guessed a password!"

You might think that there are too many possible passwords for this technique to be feasible. But there are far fewer common passwords than you'd think. Most people use passwords that are based on dictionary words (possibly with a few extra numbers or letters thrown in). And most hash functions like sha1() can be executed very quickly -- one computer can literally try billions of combinations each second. That means most passwords can be figured out in under 1 cpu-hour. Programs like John The Ripper are able to do just this.

Aside: years ago, computers weren't this fast, so the hacker community created rainbow tables that have pre-computed a large set of these hashes ahead of time. Today, nobody uses rainbow tables anymore because computers are fast enough without them.

So the bad news is that any user with a simple password like "password" or "password123" or any of the billion most-likely passwords will have their password guessed. If you have an extremely complicated password (over 16 random numbers and letters) you were probably safe.

Also notice that the code above is effectively attacking all of the passwords at the same time. It doesn't matter if there are 10 users in your database, or 10 million, it doesn't take the hacker any longer to guess a matching password. All that matters is how fast the hacker can iterate through potential passwords. (And in fact, having lots of users actually helps the hacker, because it's more likely that someone in the system was using the password "password123".)

sha1(password) is what LinkedIn used to store its passwords. And in 2012, a large set of those password hashes were leaked. Over time, hackers were able to figure out the plain text password to most of these hashes.

Summary: storing a simple hash (with no salt) is not secure -- if a hacker gains access to your database, they'll be able to figure out the majority of the passwords of the users.

Bad Solution #3: sha1(FIXED_SALT + password)

One attempt to make things more secure is to "salt" the password before hashing it:
user account sha1("salt123456789" + password)
john@hotmail.com b467b644150eb350bbc1c8b44b21b08af99268aa
betty@gmail.com 31aa70fd38fee6f1f8b3142942ba9613920dfea0

The salt is supposed to be a long random string of bytes. If the hacker gains access to these new password hashes (but not the salt), it will make it much more difficult for the hacker to guess the passwords because they would also need to know the salt. However, if the hacker has broken into your server, they probably also have access to your source code as well, so they'll learn the salt too. That's why security designers just assume the worst, and don't rely on the salt being secret.

But even if the salt is not a secret, it still makes it harder to use those old-school rainbow tables I mentioned before. (Those rainbow tables are built assuming there is no salt, so salted hashes stop them.) However, since no-one uses rainbow tables anymore, adding a fixed salt doesn't help much. The hacker can still execute the same basic for-loop from above:

    if sha1(SALT + password) in database_table:
        print "Hacker wins! I guessed a password!", password
Summary: adding a fixed salt still isn't secure enough.

Bad Solution #4: sha1(PER_USER_SALT + password)

The next step up in security is to create a new column in the database and store a different salt for each user. The salt is randomly created when the user account is first created (or when the user changes their password).
user account salt sha1(salt + password)
john@hotmail.com 2dc7fcc... 1a74404cb136dd60041dbf694e5c2ec0e7d15b42
betty@gmail.com afadb2f... e33ab75f29a9cf3f70d3fd14a7f47cd752e9c550
Authenticating the user isn't much harder than before:
def is_password_correct(user, password_attempt):
    return sha1(user["salt"] + password_attempt) == user["password_hash"]
By having a per-user-salt, we get one huge benefit: the hacker can't attack all of your user's passwords at the same time. Instead, his attack code has to try each user one by one:
for user in users:
    PER_USER_SALT = user["salt"]

    for password in LIST_OF_COMMON_PASSWORDS:
        if sha1(PER_USER_SALT + password) in database_table:
            print "Hacker wins! I guessed a password!", password

So basically, if you have 1 million users, having a per-user-salt makes it 1 million times harder to figure out the passwords of all your users. But this still isn't impossible for a hacker to do. Instead of 1 cpu-hour, now they need 1 million cpu-hours, which can easily be rented from Amazon for about $40,000.

The real problem with all the systems we've discussed so far is that hash functions like sha1() (or even sha256()) can be executed on passwords at a rate of 100M+/second (or even faster, by using the GPU). Even though these hash functions were designed with security in mind, they were also designed so they would be fast when executed on longer inputs like entire files. Bottom line: these hash functions were not designed to be used for password storage.

Good Solution: bcrypt(password)

Instead, there are a set of hash functions that were specifically designed for passwords. In addition to being secure "one-way" hash functions, they were also designed to be slow.

One example is Bcrypt. bcrypt() takes about 100ms to compute, which is about 10,000x slower than sha1(). 100ms is fast enough that the user won't notice when they log in, but slow enough that it becomes less feasible to execute against a long list of likely passwords. For instance, if a hacker wants to compute bcrypt() against a list of a billion likely passwords, it will take about 30,000 cpu-hours (about $1200) -- and that's for a single password. Certainly not impossible, but way more work than most hackers are willing to do.

If you're wondering how Bcrypt works, here's the paper. Basically the "trick" is that it executes an internal encryption/hash function many times in a loop. (There are other alternatives to Bcrypt, such as PBKDF2 that use the same trick.)

Also, Bcrypt is configurable, with a log_rounds parameter that tells it how many times to execute that internal hash function. If all of a sudden, Intel comes out with a new computer that is 1000 times faster than the state of the art today, you can reconfigure your system to use a log_rounds that is 10 more than before (log_rounds is logarithmic), which will cancel out the 1000x faster computer.

Because bcrypt() is so slow, it makes the idea of rainbow tables attractive again, so a per-user-salt is built into the Bcrypt system. In fact, libraries like bcrypt on pypi store the salt in the same string as the password hash, so you won't even have to create a separate database column for the salt.

Let's see the code in action. First, let's install it:

sudo apt-get install libffi-dev libssl-dev
sudo pip install bcrypt
python -c "import bcrypt"   # did it work?
Now that it's installed, here's the Python code you'd run when creating a new user account (or resetting their password):
from bcrypt import hashpw, gensalt
hashed = hashpw(plaintext_password, gensalt())
print hashed    # save this value to the database for this user

Let's dissect that output string a little:

As you can see, it stores both the salt, and the hashed output in the string. It also stores the log_rounds parameter that was used to generate the password, which controls how much work (i.e. how slow) it is to compute. If you want the hash to be slower, you pass a larger value to gensalt():

hashed = hashpw(plaintext_password, gensalt(log_rounds=13))
print hashed

Notice that there is now a 13 where there was a 12 before. In any case, you store this string in the database, and when that same user attempts to log in, you retrieve that same hashed value and do this:

if hashpw(password_attempt, hashed) == hashed:
    print "It matches"
    print "It does not match"

You might be wondering why you pass in hashed as the salt argument to hashpw(). The reason this works is that the hashpw() function is smart, and can extract the salt from that $2a$12$... string. This is great, because it means you never have to store, parse, or handle any salt values yourself -- the only value you need to deal with is that single hashed string which contains everything you need.

Final Thoughts: choosing a good password

If your user has the password "password", then no amount of hashing/salting/bcrypt/etc. is going to protect that user. The hacker will always try simpler passwords first, so if your password is toward the top of the list of likely passwords, the hacker will probably guess it.

The best way to prevent your password from being guessed is to create a password that is as far down the list of likely passwords as possible. Any password based on a dictionary word (even if it has simple mutations like a letter/number at the end) is going to be on the list of the first few million password guesses.

Unfortunately, difficult-to-guess passwords are also difficult-to-remember. If that wasn't an issue, I would suggest picking a password that is a 16-character random sequence of numbers and letters. Other people have suggested using passphrases instead, like "billy was a turtle for halloween". If your system allows long passwords with spaces, then this is definitely better than a password like "billy123". (But I actually suspect the entropy of most user's pass phrases will end up being about the same as a password of 8 random alphanumeric characters.)

blog comments powered by Disqus