pickle teapot enthusiasm membrane
December 15, 2011 4:53 AM

This xkcd panel talks about human-friendly passwords: it argues that there is more entropy in four random English words, e.g. 'pickle teapot enthusiasm membrane', than in the standard password format of about ten gibberish characters, e.g. 'S@Nt&KLaWs'. Has anyone integrated this idea into a public-key cryptography system to allow keys that are words and therefore more memorable, or to encrypt and hash into English words? What [are / would be] the [problems / other interesting qualities] of such a system?
posted by compound eye to Computers & Internet (27 answers total) 5 users marked this as a favorite
Do see this earlier thread about this.
posted by Gyan at 4:56 AM on December 15, 2011

(The earlier thread probably says the same thing in other ways.) My uneducated guess is that concatenatedwordpasswords have more entropy only as long as gibberish passwords can be used as well; if you're only using concatenated words, entropy and available password space shrink considerably.
posted by 3mendo at 4:59 AM on December 15, 2011

posted by 3mendo at 5:00 AM on December 15, 2011

Response by poster: Actually, what I want to know is not whether Randall is right, but whether this idea could be, or has been, applied to public key cryptography.

I once had to exchange challenge phrases and responses over the phone. They were alphanumeric sequences:

'D9abh23...' which spoken is 'Capital D, Number 9, lower case A' etc

(26+26+10) different characters as the encoding symbols meant long, painful exchanges. Actually, I can't remember if there were lowercase letters, so maybe there were only 36 symbols.

if instead there was a selection of several thousand words as the symbols (with no homonyms) we could have had more entropy with fewer symbols and the spoken exchanges would have been shorter and more human friendly:

'donkey houseboat banana...'
posted by compound eye at 5:14 AM on December 15, 2011

OPIE generates passwords consisting of a sequence of short 'words' (they're not actually words, but they are pronounceable).
posted by pharm at 5:20 AM on December 15, 2011

Response by poster: I've re-read my original post and realize I wasn't very clear. It looks like I'm talking about passwords. I'm wondering if the idea of using words instead of character strings has been applied to encryption mechanisms, specifically public key cryptography
posted by compound eye at 5:20 AM on December 15, 2011

Best answer: PGP will produce a human readable collection of words to allow you to verify another user's public key over the phone.
posted by devnull at 5:23 AM on December 15, 2011

Best answer: Well, it would be trivial to encode each pair of characters to a word, which would need (26+26+10)²=3844 words to choose from.
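For what it's worth, the pair-to-word idea can be sketched roughly like this. Everything here is an assumption for illustration: the ordering of the 62-character alphabet, the function names, and above all the word list, which is just 3844 made-up placeholder tokens rather than real, well-separated English words.

```python
import string

# Assumed 62-symbol alphabet: digits, then uppercase, then lowercase.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase
BASE = len(ALPHABET)  # 62

# Placeholder word list: 62 * 62 = 3844 made-up tokens, NOT real words.
# A real system would load a curated list of distinct English words.
WORDS = [f"word{i:04d}" for i in range(BASE * BASE)]
WORD_INDEX = {word: i for i, word in enumerate(WORDS)}


def encode_pairs(text):
    """Map each pair of alphanumeric characters to one word."""
    if len(text) % 2:
        raise ValueError("text length must be even")
    words = []
    for i in range(0, len(text), 2):
        hi = ALPHABET.index(text[i])      # first character of the pair
        lo = ALPHABET.index(text[i + 1])  # second character of the pair
        words.append(WORDS[hi * BASE + lo])
    return words


def decode_pairs(words):
    """Invert encode_pairs: one word back to two characters."""
    chars = []
    for word in words:
        hi, lo = divmod(WORD_INDEX[word], BASE)
        chars.append(ALPHABET[hi] + ALPHABET[lo])
    return "".join(chars)
```

With a real word list in place of the placeholders, a six-character challenge like 'D9abh2' would come out as three spoken words, and decode_pairs would turn them back into the original string.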

A cleverer and easier to use system might use a Markov chain to make the words stick together a little better, and would filter the dictionary words to be a little more distinct from each other (you probably don't want both weather and whether in the dictionary, let alone wether, a castrated male goat).

But it would be a very fun project. If PGP hadn't done it already.
posted by ambrosen at 5:29 AM on December 15, 2011

Best answer: It is possible, but impractical for longer key lengths. The Oxford dictionary contains 176000 words in current use, which is about 17 bits of information per word. A PGP key of 128 bits can be expressed with 8 words. But what if we only used common words? Then the dictionary would be around 8000 words, which is about 13 bits per word. A bigger key, 1024 bits, can then be expressed in 79 words.
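The arithmetic is easy to check with a few lines of Python. This is only a sketch: words_needed is a made-up helper name, and it assumes every word is picked independently and uniformly from the dictionary.

```python
import math


def words_needed(key_bits, dictionary_size):
    """Smallest number of words that can carry key_bits bits of information.

    Each word drawn from a dictionary of dictionary_size entries carries
    log2(dictionary_size) bits, so we divide and round up.
    """
    bits_per_word = math.log2(dictionary_size)
    return math.ceil(key_bits / bits_per_word)
```

Here words_needed(128, 176000) gives 8 and words_needed(1024, 8000) gives 79, which matches the figures above.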
posted by Psychnic at 5:39 AM on December 15, 2011

Response by poster: thank you devnull, i didn't know that
i'll read up on what tricks pgp has up its sleeve.

ambrosen, I can't follow - what would you be using the Markov chain for?

(bedtime in australia, I'll come back tomorrow)
posted by compound eye at 5:42 AM on December 15, 2011

But what if we would only use common words?

Practically, you could also add in a lot of proper nouns like Bob and Sarah and Utah and France, which are just as memorable as common words without being obscure.
posted by smackfu at 5:47 AM on December 15, 2011

Potentially related.
posted by devnull at 6:04 AM on December 15, 2011

I think you are right, but as Psychnic says, it doesn't save as much as it would seem.

I assume you are talking about using each word as a "character" or symbol. I.e., the numbers 0-9 are base 10: each symbol has 10 different possibilities. The alphanumerics are base-62: each symbol has 62 possibilities. So if you choose, say, 1000 common words as your alphabet, your system is base-1000. A one-character password has 1000 possible guesses. A two character password has 1000^2 possible guesses, a three character password has 1000^3, and so on.

But the trouble is that inside the computer, each "character" has to be encoded into binary. One bit of entropy is where you have two choices, 0 or 1. Two bits of entropy is where you have four choices, 00, 01, 10 or 11. Three bits is 8 choices, four is 16, and so on. Two (number of choices per character) to the power of the number of characters. To convert between your arbitrary number of choices per character and how it is represented in binary, you take the log2 of the number of choices. (The logx of y being "to what power do I have to raise x to get y?") Then you multiply that by the number of characters you are using.

So, using 1000 common words, a single character password has 1000 different possibilities, which is 9.96 bits of entropy per character, so a 6 "character" password has 59.79 bits of entropy. While it takes 10 characters of alphanumerics to get 59.54 bits of entropy. If you make the pool 5000 words, you only reduce your word-character count by one.

What xkcd was saying is that doghouseelephant is way easier to remember (and say) than f3Gi81d, and it has WAY more entropy because, as far as the computer is concerned, it's just 16 characters versus 7, or 95 bits of entropy versus 41. Even if an attacker knows that you are only using words, he doesn't know the lengths of the words, the length of the whole password, and so on. As far as he is concerned, that password has 52 bits of entropy, while the 7 character one only has 41 bits, because he has to go through the entire dictionary (170,000 words) times the entire dictionary times the entire dictionary (170,000^3, or 4.913*10^15 iterations), versus 62^7, or 3.522*10^12 iterations.
posted by gjc at 7:35 AM on December 15, 2011 [1 favorite]

In other words, you gain more by adding length to a password than you do by adding complexity.
posted by gjc at 7:38 AM on December 15, 2011 [2 favorites]

Best answer: Wow a lot of the answers here are not really getting your question at all.

I'm wondering if the idea of using words instead of character strings has been applied to encryption mechanisms, specifically public key cryptography

The point of the words in the xkcd panel is to exploit people's natural ability to remember specific words to help them use what amounts to a very large random number as their password. With an 8000 word dictionary, a random 4 word sequence gives you about 4 × 10^15 values, whereas an 8 character random password drawn from 26 (lowercase) + 26 (uppercase) + 10 (numerals) + 10 (symbols) gives about 7 × 10^14 values, which is comparable. The xkcd panel points out that people usually don't bother making a truly random 8 character password though, which means that far fewer distinct passwords are actually used in that scheme. Both of those schemes are easier than memorizing a 16 digit base-10 number though, which is the numerical equivalent.

With public key cryptography, the keys involved are much larger. A 1024-bit key, which is the standard minimum key size for RSA, has 2^1024, or roughly 10^308, possible values. That's huge. Memorizing a random value that large would be nearly impossible no matter what scheme you use, which is why nobody is expected to memorize keys in public key encryption schemes. Your public key is not secret, so you can write it down in 10 different places. Your private key is secret, but there's no reason for you to ever give it to anyone else because it's not used for authentication, so you can physically keep it yourself in digital form somehow.

'D9abh23...' which spoken is 'Capital D, Number 9, lower case A' etc

(26+26+10) different characters as the encoding symbols meant long, painful exchanges. Actually, I can't remember if there were lowercase letters, so maybe there were only 36 symbols.

What you're talking about here is having an encoding scheme that works well when exchanging data through human voices. English is pretty terrible for this sort of thing, in that h and H do not have distinct one or two syllable spoken names, and even letters that do have unique names are not distinct enough (so if you are exchanging hex values, 3, B, and C all sound enough alike that there will be transmission errors). The NATO phonetic alphabet that devnull linked to above is the most popular solution to this problem, and the words chosen for it are purposely distinct enough that they can be transmitted correctly even over significant noise (such as a flaky radio connection).

Most encoding schemes designed for data like public keys are not designed to be spoken aloud, but instead exchanged over things like email that require values to be in the form of standard printable characters. So you end up with the various binary-to-text schemes, such as hex or base64. As devnull also mentioned, PGP has its own scheme for voice-based transmission, and that Wikipedia article lists a few other, less well-known examples of similar schemes. I don't know offhand if that scheme would be much faster than the standard phonetic alphabet if the speakers were both experienced with it. Also note that other schemes designed for plaintext transmission do not work well for random data: Morse code, for example, is excellent for text transmission but performs much more poorly when each character is equally likely. Overall, the gains you get from going word-based as an encoding scheme are much smaller in terms of speaking clarity than they are for memorization.
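To make the binary-to-text part concrete, here is a small sketch using Python's standard library. The helper names to_hex and to_base64 are made up for this example; the underlying calls (bytes.hex and base64.b64encode) are the standard ones.

```python
import base64


def to_hex(data):
    """Hex encoding: two printable characters per byte (4 bits each)."""
    return data.hex()


def to_base64(data):
    """Base64 encoding: four printable characters per three bytes (6 bits each)."""
    return base64.b64encode(data).decode("ascii")
```

For 16 bytes of key material, to_hex produces 32 characters and to_base64 produces 24. Both are fine for email, and neither is pleasant to read aloud, which is the gap the word-based schemes try to fill.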
posted by burnmp3s at 7:53 AM on December 15, 2011

I'm hoping 40-50 character long phrases in a variety of languages will be safe enough.

Not that I have anything of immense value to protect.
posted by flippant at 9:08 AM on December 15, 2011

From what I've read (and correct me if this is out of date), a password of sufficient length (which I think was something like 12 or more characters) is pretty much impossible to crack, even using the latest techniques and government supercomputers, in anything approaching a reasonable amount of time.

So something to the effect of "I freaking hate passwords." would be super super safe, and the presence of punctuation and uppercase letters would satisfy most corporate networks' retardiculous "complexity" requirements.
posted by Afroblanco at 11:21 AM on December 15, 2011

Response by poster: Yes gjc, you're right that a long word-based passphrase takes more memory to represent in a binary machine if it is stored as text. But I'm assuming PGP doesn't do this, and that it has some kind of sequence-to-dictionary lookup, perhaps for each two characters as has been suggested.

but what I'm really interested in is how much memory it requires to represent in my head and my speech, and as you have pointed out that's where it is a (relative) winner. We've all spent a lot of time learning words, we can remember them as a chunk not a sequence, and they have all sorts of associations with them.

Even if I can ALPHA CHARLIE TANGO with the best of them, and even if I use 'anton caesar theodor' for the lower case, I'm using a very small subset of the words I know.

If I use 3844 words instead, it will halve the number of chunks I have to say or remember when compared to alphanumeric strings.
posted by compound eye at 2:20 PM on December 15, 2011

Response by poster: I just checked the PGP word list. It has fewer words than I expected, but if PGP is using only one letter case it's just as effective.
posted by compound eye at 2:35 PM on December 15, 2011

Best answer: Computer memory shouldn't be an issue for most of the cryptosystems you want to be using. Passphrases should be immediately hashed and/or strengthened to a fixed-length byte sequence. If I remember the PGP/GPG protocol properly, this key is used via conventional encryption to unlock the larger private key.
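As a rough illustration of "hashed and/or strengthened to a fixed-length byte sequence", here is a sketch using PBKDF2 from Python's hashlib. To be clear, this is a swapped-in stand-in and not the routine PGP/GPG actually uses; derive_key, the iteration count and the 32-byte output length are all assumptions made for the example.

```python
import hashlib


def derive_key(passphrase, salt, iterations=200_000, length=32):
    """Stretch a passphrase of any length into a fixed-length key.

    Uses PBKDF2-HMAC-SHA256 as a stand-in key derivation function.
    The salt should be random and stored alongside the encrypted data.
    """
    return hashlib.pbkdf2_hmac(
        "sha256",
        passphrase.encode("utf-8"),
        salt,
        iterations,
        dklen=length,
    )
```

Whether the passphrase is four words or forty, the result is always 32 bytes, so the length of the text you type has no bearing on how much the cryptosystem has to store.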

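The PGP word list was primarily used to communicate key fingerprints, a 160-bit hash of the larger 1024- or 2048-bit public key. While you could use the word list to communicate a full 1024-bit public key, that would require 128 words using the PGP word list (8 bits per word) or 94 words using the S/Key 2048-word list (11 bits per word). A 160-bit fingerprint, by contrast, fits in 20 PGP words, or 15 words at 11 bits per word.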
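Here is a rough sketch of how a byte string can be chopped into words, if you want to check word counts yourself. It is not the actual PGP or S/Key encoding (PGP alternates between two 256-word lists, and S/Key adds checksum bits), and the word lists below are placeholder tokens invented for the example, as is the bytes_to_words name.

```python
# Placeholder lists of made-up tokens, NOT the real PGP or S/Key words.
WORDS_256 = [f"w{i:03d}" for i in range(256)]     # 8 bits per word
WORDS_2048 = [f"w{i:04d}" for i in range(2048)]   # 11 bits per word


def bytes_to_words(data, words):
    """Encode bytes as words, taking log2(len(words)) bits per word."""
    bits_per_word = len(words).bit_length() - 1
    if 2 ** bits_per_word != len(words):
        raise ValueError("word list size must be a power of two")
    total_bits = len(data) * 8
    n_words = -(-total_bits // bits_per_word)  # ceiling division
    value = int.from_bytes(data, "big")
    # Pad with zero bits on the right so the bits divide evenly into words.
    value <<= n_words * bits_per_word - total_bits
    mask = len(words) - 1
    # Emit the most significant chunk first.
    return [
        words[(value >> (i * bits_per_word)) & mask]
        for i in range(n_words - 1, -1, -1)
    ]
```

With these assumptions a 20-byte (160-bit) fingerprint comes out as 20 words from the 256-word list or 15 from the 2048-word list, and a 128-byte (1024-bit) key comes out as 128 or 94 words respectively.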
posted by CBrachyrhynchos at 3:04 PM on December 15, 2011

Response by poster: the S/Key 2048 looks like part of the answer I'm looking for.
I can memorize 16 words
posted by compound eye at 3:10 PM on December 15, 2011

Seriously though, if your security strategy involves remembering passphrases in your own head instead of using something like KeePass on your actual physical keyring, you're probably doing it wrong.

By using a personal password safe, the only passphrase you need to remember - indeed, the only password you need ever have seen - is the one that gets you into the safe itself, and because you use that every time you need to authenticate for any of the services you use, you'll be typing it frequently enough to keep it in muscle memory. And provided you can remember it, a randomly generated sequence of 16 letters and numbers is quicker to type than a 16 word passphrase.

My own passwords.kdb file currently holds authentication details for 24 different services. Each of those includes a randomly generated password with at least 120 bits of entropy, none of which I know. I actually have a good memory for this kind of thing, but I would probably react to a need to maintain a mental list of 24 long passphrases by doubling-up on at least some of them, or at the very least feeling a strong disincentive to give them the six month expiry time that's so easy to set up with KeePass.
posted by flabdablet at 7:34 PM on December 15, 2011 [2 favorites]

The company I work for has a partnership with a large credit card company where we pre-screen guests on our website (who give us their name/address information) for credit card offers. Since our guests are accessing a secure section of the CC company's website, we have to send credentials along in an HTTP request. Currently the password is something like "understand-delicious-groaning-meatball" - basically a string of four plaintext words.

I don't know for sure what they're doing on their end, but it could very well be that this string gets decrypted on the CC company's site somehow to make the actual password. Reading through this thread has made me very curious. I know we don't encrypt this passphrase on our end so at what point does this become really secure?
posted by bendy at 11:19 PM on December 15, 2011

PLEASE tell me that the HTTP request you're sending those credentials in is actually an HTTPS request.
posted by flabdablet at 12:03 AM on December 16, 2011

It's not about how much memory it takes up, but the computational power to crack. Each bit of entropy doubles the number of guesses a computer has to make. No matter what the original "alphabet" you use, it has to get converted to binary at some point inside the computer, so that's the common denominator used. Binary bits. The more bits of entropy, the stronger the scheme will be. And if one scheme has the same number of bits of entropy as another, they are equally secure.

Each "character" in the password only has so many possible values. Using words adds a lot of possible values to each character, but the logarithmic nature of adding those possibilities together into a string of "characters" means that you gain a LOT more by stringing spots together than you do by adding complexity.

The point is that adding characters to your alphabet reaches a point of diminishing returns, ie, a logarithmic curve. Because it all comes down to binary inside the computer, every time you double the wordcount, you only add one bit of entropy to each spot.

Or, like you mention, if you are using the 256 word alphabet, in order to cut the length of the password in half you have to square the size of your dictionary, to 65,536 words, which is 256x as many.

16 words using the S/Key alphabet gets you 176 bits of entropy (11 bits per word). If you instead just used plaintext and concatenated a string of those same dictionary words together, you would only need to remember approximately 12 words to get the same amount of entropy.

It doesn't really matter whether it is PGP or any other thing. As far as I know, you can create your own public key.
posted by gjc at 8:30 AM on December 16, 2011

Because it all comes down to binary inside the computer, every time you double the wordcount, you only add one bit of entropy to each spot.

Just to be clear, "wordcount" here refers to the size of the list of candidate words, not to the number of those eventually strung together to make a passphrase.

If there are P words in the passphrase, each of which is selected at random from a list of W words, then the entropy of each word is log2(W) and the entropy of the whole passphrase is P × log2(W).

It might be thought that using your entire vocabulary as a candidate word list must yield stronger passphrases than choosing from a restricted list like the PGP word list, but in fact this might not be true at all; human beings are notoriously poor random number generators, and skewing your choices toward words you remember more easily or use more often could easily drop the entropy of at least some of them below 8 bits per word. Using lots of words picked mechanically at random from a restricted list is the right way to do it.
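A minimal sketch of "picked mechanically at random", assuming a 2048-entry list: the list here is made of placeholder tokens rather than real words, and the function names are invented for the example. Python's secrets module draws from the operating system's cryptographic random source, which is the important part.

```python
import math
import secrets

# Placeholder list of 2048 made-up tokens; substitute a real word list.
WORDS = [f"word{i:04d}" for i in range(2048)]


def passphrase_entropy_bits(num_words, list_size):
    """Entropy of num_words words chosen uniformly: P * log2(W)."""
    return num_words * math.log2(list_size)


def random_passphrase(words, num_words):
    """Pick words independently and uniformly using a secure RNG."""
    return [secrets.choice(words) for _ in range(num_words)]
```

passphrase_entropy_bits(4, 2048) comes to 44.0 bits. That figure only holds if the machine does the picking; a person choosing "random" words from the same list would get less.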
posted by flabdablet at 7:11 PM on December 16, 2011

This thread is closed to new comments.