He's not pointing to the length of the password, but to something subtler. He's basically claiming that when asked to create strong passwords (i.e. with numbers, mixed case, and punctuation), people end up following a predictable formula. The formula is explained in the first panel, with an estimate of how much variation there is in each piece, measured in bits. E.g. a number can be stored in just three bits (actually wrong, you need four), while a single bit suffices to indicate whether there's an initial capital or not.
I was rounding to the closest bit, since that gives you a smaller error on the resulting time than rounding up, even though four is the number of bits required (since for storage you obviously have to round up). But there's actually another reason: I don't think all numbers are equally probable, so the entropy is actually less than 3 bits. Here's a table of ending digit frequency from an actual set of decrypted passwords:
The four-common-words password is also a formula, and his estimate of 11 bits for each one is equivalent to assuming that the words are chosen from a list of 2048 words. (Which is an adequate definition of "common".)
I actually cheated a little on that one, because "staple" isn't actually in the 2048-most-common wordlist I checked, but it sounded funnier to me :P (and it depends on your choice of list anyway.)
I think he's underestimating the "strong password" variability-- e.g. three bits for "common substitutions" means he thinks there are only 8 possibilities, which is awfully low. It looks like he's just counting the vowels, but there are other easy substitutions.
There are other easy substitutions, but they're made in a pattern, so the practical entropy is again reduced. People usually either do all the numeral substitutions or none, and the more esoteric ones (like 7 for t) come up a lot less frequently. That aside, you also have to look at how frequently there are opportunities for substitutions in the pool of base words. I came up with my number based on frequencies of common and uncommon substitutable letters in a list of six- to nine-letter words. But my guesses for what substitutions are most common could, of course, be wrong!
On the other hand, I think he's underestimating the pool of dictionary words too. No one has the OED memorized, but an educated speaker knows at least ten times that number of words, so his total goes up to 56 bits. And even more if you use a quirky non-dictionary word.
I did random sampling from the default Debian dictionary and from a few other corpuses, and against whatever algorithm they have at http://testyourvocab.com/ and decided that even if people *were* picking randomly, 60,000 was a generously high estimate for the number of base words in their vocabulary. 30,000 is closer to typical based on dictionary words. I suspect including non-dictionary words doesn't expand the list nearly as much as one might think; we spend all our lives reading and learning dictionary words, and comparatively little of the text we write pulls from any larger vocabulary of strings. Now, if I took into account the fact that people are without a doubt not capable of generating anything close to a "random word", my entropy would actually be a serious overestimate, but I decided to be generous with that one to make the comparison more fair, since I was assuming that in the passphrase example, the person had a good method for picking a random word. In practice, I bet if you asked a bunch of people on the street to pick a random word they thought no one else would guess, the result would have 8 or 9 bits of entropy at best; I mean, half of them would say "lol, ok, i'm so random ... penguin!"
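For anyone who wants to check the arithmetic being argued over, here is a minimal sketch of how a pool size turns into bits, assuming every option in the pool is equally likely (which, as noted above, real people don't manage). The pool sizes are the ones quoted in this thread; the `uniform_bits` helper and the labels are just names made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Bits of entropy for one uniformly random pick from a pool of $n options.
sub uniform_bits {
    my ($n) = @_;
    return log($n) / log(2);
}

# Pool sizes quoted in the thread (labels are illustrative only).
my @pools = (
    [ 'trailing digit (10 options)',         10     ],
    [ 'initial capital or not',              2      ],
    [ 'common substitutions (8 options)',    8      ],
    [ 'word from a 2048-word list',          2048   ],
    [ 'word from a 30,000-word vocabulary',  30_000 ],
    [ 'word from a 60,000-word vocabulary',  60_000 ],
);

printf "%-38s %5.2f bits\n", $_->[0], uniform_bits($_->[1]) for @pools;

# Independent picks add their bits together.
printf "%-38s %5.2f bits\n", 'four words from the 2048-word list',
    4 * uniform_bits(2048);
```

A single digit comes out a little above 3 bits, which is why it can be rounded to 3 for an estimate even though storing it takes 4.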
cinnamon leopard thunder boycycle
A dictionary attack *could* get the first three words, although it would take a mindboggling amount of work to get all three of those in the right order. However, "boycycle" does not exist in the dictionary, so now you just have a very long, secure password.
"If you've got access to the password hash, I'm told that rainbow tables make passwords of any reasonable length mostly useless, unless the password+salt exceeds the size of the rainbow table, which becomes more and more improbable each day.
So, where does the highly-entropic password actually still have a use?"
Rainbow tables aren't magical. In fact, even without a salt a strong password won't get cracked.
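On the salt part of that question, a small sketch may help: a precomputed table is built for one particular way of hashing, so giving each user a different salt means the same password no longer maps to the same hash. This assumes SHA-256 (via the core Digest::SHA module) purely as a stand-in hash; the `salted_hash` helper and the salt strings are hypothetical, and none of this is taken from any particular site's scheme.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);

# Hash a password with a salt prepended. SHA-256 is only a stand-in here.
sub salted_hash {
    my ($salt, $password) = @_;
    return sha256_hex($salt . $password);
}

my $password = 'cinnamon leopard thunder boycycle';

# Hypothetical per-user salts; a real system would generate these randomly.
for my $salt ('', 'k3Jx9', 'Qp7mZ') {
    printf "salt=%-7s %s\n", "'$salt'", salted_hash($salt, $password);
}
```

Each line of output shows a different hash for the same passphrase, so a table precomputed for the unsalted case does not cover the salted ones.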
#!/usr/local/bin/perl
# Print 15 words picked at random from the system dictionary.
use Fcntl 'O_RDONLY';   # supplies the O_RDONLY constant passed to tie
use Tie::File;
tie (@D, 'Tie::File', "/usr/share/dict/words", mode => O_RDONLY) or die;
print ($D[rand $#D+1] . "\n") foreach (1..15);
untie @D;
As my /usr/share/dict/words has 234936 lines, the first n lines of output are a password with n * 17.8 bits of entropy. You should, however, select more familiar words and conjugate them to obtain something you'll remember; that greatly reduces the entropy, but you'll certainly avoid the penguin penguin penguin problem.
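To get a feel for how much restricting the list to "more familiar" words costs, something like the following could be run against the same dictionary. It is only a sketch: the filter (lowercase ASCII words of 4 to 8 letters) is an arbitrary stand-in for "familiar", and `bits_per_word` and `familiar_words` are made-up helper names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Bits of entropy per word when every word in the list is equally likely.
sub bits_per_word {
    my (@words) = @_;
    return 0 unless @words;
    return log(scalar @words) / log(2);
}

# Hypothetical stand-in for "familiar": lowercase ASCII words of 4 to 8 letters.
sub familiar_words {
    return grep { /^[a-z]{4,8}$/ } @_;
}

my $dict = '/usr/share/dict/words';   # same path as the script above
if (open my $fh, '<', $dict) {
    chomp(my @all = <$fh>);
    close $fh;
    my @familiar = familiar_words(@all);
    printf "all words:      %6d -> %.1f bits each\n",
        scalar @all, bits_per_word(@all);
    printf "filtered words: %6d -> %.1f bits each\n",
        scalar @familiar, bits_per_word(@familiar);
} else {
    warn "cannot open $dict: $!\n";
}
```

Halving the list only costs one bit per word, so even a fairly aggressive filter leaves most of the entropy intact as long as the pick itself stays random.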
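#!/usr/local/bin/perl
# Build a pronounceable password: 10 random consonants, with a vowel after every odd-numbered one.
%vowel_freqs = (a => 82, e => 127, i => 70, o => 75, u => 28, y => 20);
# Repeat each vowel once per unit of weight, so a uniform pick from @vowels is a weighted pick.
while (($c,$f) = each %vowel_freqs) { push(@vowels,$c) foreach (1..$f); }
foreach $i (1..10) {
    # Draw letters until one is not a vowel.
    do { $c = chr(ord('a') + rand 26) } while ( $c =~ /[aeiouy]/ );
    print $c;
    print $vowels[rand $#vowels+1] if ($i % 2);
}
print "\n";
You could slightly tweak up the entropy by using a flatter probability distribution on the vowels.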
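To put a rough number on the vowel distribution point, here is a sketch that computes the Shannon entropy of the weighted vowel choice and compares it with a flat choice among the same six vowels. The weights are copied from the script above; `shannon_bits` is a made-up helper name, and the total assumes 10 uniformly chosen consonants (20 options each) plus 5 vowels, which is how I read the loop.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Shannon entropy, in bits, of a weighted choice: -sum(p * log2(p)).
sub shannon_bits {
    my (@weights) = @_;
    my $total = 0;
    $total += $_ for @weights;
    my $bits = 0;
    for my $w (@weights) {
        next unless $w > 0;
        my $p = $w / $total;
        $bits -= $p * log($p) / log(2);
    }
    return $bits;
}

# Vowel weights copied from the script above.
my %vowel_freqs = (a => 82, e => 127, i => 70, o => 75, u => 28, y => 20);

my $weighted  = shannon_bits(values %vowel_freqs);
my $flat      = shannon_bits((1) x scalar(keys %vowel_freqs));
my $consonant = log(20) / log(2);   # 26 letters minus the 6 vowels

printf "weighted vowel: %.2f bits, flat vowel: %.2f bits\n", $weighted, $flat;
printf "password total: %.1f bits weighted, %.1f bits flat\n",
    10 * $consonant + 5 * $weighted,
    10 * $consonant + 5 * $flat;
```

By my arithmetic the gap works out to roughly one bit over the whole password, which fits "slightly".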
The Wikipedia article on Passphrases gives a good overview.
posted by Gary at 1:25 AM on August 10, 2011 [1 favorite]