Statistics filter: How do I calculate the keyspace size of a password?
August 27, 2011 1:35 AM   Subscribe

How do I figure out what the size of this password's keyspace is?

I currently work for a large company with an archaic IT infrastructure and am forced to change my password every 90 days. While I don't specifically think that such a policy is unwarranted, I am constantly annoyed by the arbitrary restrictions that are placed on the passwords that they will allow me to use. I'm currently estimating the total number of distinct passwords that are possible in this system to be in the realm of 160 trillion, an astonishingly small keyspace for a modern password.

Help me figure out what the exact size of the keyspace is given the following requirements:
  • Must be exactly 8 characters.
  • Must contain at least 1 uppercase character
  • Must contain at least 1 lowercase character
  • Must contain at least 1 number
  • Must contain a leading letter (upper or lower)
  • May contain up to 2 special characters ($ or # only)
  • May not have repeating characters
posted by vmrob to Computers & Internet (10 answers total) 2 users marked this as a favorite
 
Uppercase characters: 26
Lowercase characters: 26
Digits: 10
Characters including specials (upper, lower, digits, special): 26+26+10+2=64

We have to choose an upper or a lowercase letter first, so we'll say 26*2=52.

Now, the other characters are in no particular order. We want to pick 1 character from the uppercase/lowercase set we didn't use before (from 26), 1 number (from 10), and then five more characters from a set of 61 (none of the characters already picked - so 61 choose 5, or 5949147). Multiplying those together gives us 5949147*26*10 for the seven non-leading characters; putting those into any order we multiply our space by 7!=5040, and then we multiply back in the initial 52 (for our leading character).

This gives us:
5949147*26*10*5040*52, which google says is about 4.05379636 × 10^14

To put this in perspective, compare this with an exactly-8-character password selected only from letters and numbers, with no restrictions like "at least 1 of these" or "no repeating", and you get 62^8 ~= 2.2*10^14.

It's been forever since I did stats, someone please check my math? But I'm reasonably certain the thought process is right.
posted by spaceman_spiff at 2:02 AM on August 27, 2011


Can you clarify "no repeating characters"? Does that mean you can't have two capital As in succession, or that you can't have two capital As at all?
posted by The Shiny Thing at 2:11 AM on August 27, 2011


Best answer:
  1. leading letter: 52 (25*2 characters left)
  2. letter of opposite case: 25 (24*2 characters left)
  3. mandatory digit: 10 (9 digits + 24*2 characters left)
  4. pick 9 digits + 24*2 characters + 2 special - 59 (subtract 1 or 2 characters each time depending on weather picking special/digit/character)
  5. 58
  6. 57
  7. 56
  8. 55
perl -MList::Util=sum -E 'say sum(map{log}52,25,10,59,58,57,56,55)/log(2)'
42.8284526970396

So 42.8 bits... This is assuming A=a when counting duplicates. And it makes your best bet to use both special characters and all digits where you can. Choosing a character like 'z' removes 'Zz' from the choices. It's the same if case is significant, your choices start going down by one character for each you pick. Things like ignoring the special characters really don't change the outcome much, taking characters and subtracting 2 from the available each time gives:

perl -MList::Util=sum -E 'say sum(map{log}52,25,10,59,57,55,53,51)/log(2)'
42.5634625763891

posted by zengargoyle at 2:40 AM on August 27, 2011 [1 favorite]


The reset-every-n-days policy is cargo-cult security - at best it's of almost no benefit and can sometimes make things worse. Unfortunately it's often incredibly difficult to explain this because it sounds so reasonable at first.

If they have published a very strict, specific password content policy then it becomes a template for guessing the password, particularly because people will tend to satisfy it as closely as possible - if it says "at least one uppercase" most people will include one uppercase letter, etc. This isn't going to be a problem for casual attempts but is something to bear in mind if your company may attract attention from more serious threats.

The mathematical bit size for particular policy is usually much higher than the actual range in passwords created using the policy, as normal people do not choose random passwords. Password guessing software also does not use purely random passwords. The mathematical, theoretical security properties of the password are often far less of a risk than the actual human use of the passwords. There's a lot of psychology involved as well as mathematics.
posted by BinaryApe at 3:54 AM on August 27, 2011 [1 favorite]


Response by poster: I should have clarified: No repeating consecutive characters where lower case and upper case characters are treated as different. "A" does not equal "a".

bLizz4rd would be invalid whereas bLizZ4ard would be fine.

It is this requirement and the requirement of up to 2 special characters that makes this a difficult problem to me.

b#IzZ$#d would be invalid because it contains 3 special characters as would anything with 4, 5, 6, or 7 special characters. Of course, if you have 6 or 7 special characters, you would break some of the other rules as you wouldn't have room for a number or uppercase and lowercase letters.
posted by vmrob at 9:55 AM on August 27, 2011


Just to point out the standard way most people get around the "change your password every X days" rule, just add a digit at the end and increment it every time you change the password.

You can pretty much bet the farm that the vast majority of people with forced regular changes do exactly that - Which, incidentally, has bearing on your original question, in that it effectively reduces the real-world password complexity to seven chars, times ten.

As an aside... exactly eight? That makes me think they chose the rule more to deliberately make it possible to crack your passwords on an overnight-ish scale of time, than out of a genuine interest in security. Given modern CPU power, anything less than ten chars and you may as well just use some token "keeps out the honest folks" password, because anyone determined to get in, will.
posted by pla at 6:36 PM on August 27, 2011


Best answer: Perhaps the Password Haystack analysis page at grc.com would help. It looks like it does all of the math for you, if you can supply an example of a password that matches your criteria.
posted by Wild_Eep at 7:01 PM on August 27, 2011


No repeating consecutive characters where lower case and upper case characters are treated as different. "A" does not equal "a".

That doesn't change things much. I still back-of-the-envelope calculate between 42 & 43 bits of entropy using basically the same approach as zengargoyle above. To help you wrap your head around the special characters thing, you might try doing the same sort of calculation for 0, 1, and 2 special characters.
posted by axiom at 8:37 PM on August 27, 2011


Yeah, I tried thinking for a while and the more detailed rules don't really change it much, maybe one extra bit.
  1. 52 - letter
  2. 26 - letter of opposite case, automagically not a duplicate letter
  3. 10 - digit
  4. 52 + 10 + 2(specials) - the digit you chose previously = 63
  5. 52 + 10 + 2(specials) - the character previously chosen
  6. ditto on and on. You always loose one character to avoid duplication.
52*26*10*63*63*63*63*63. If you choose the special characters, they don't come back so you end up with some 62*61*61... in there. Either way you get around 43 bits plus some fractiona l bit. An extra bit would be doubling the space, but is peanuts compared to forcing a digit or upper and lower case which give you a 62->26 or 62->10 reduction of factors. Same with the two special characters, not much of a difference, they actually hurt you if you choose them because they reduce the available choices.

# best case
perl -MList::Util=sum -E 'say sum(map{log}52,26,10,63,63,63,63,63)/log(2)'
43.6092071486691

# no special characters available at all
perl -MList::Util=sum -E 'say sum(map{log}52,26,10,61,61,61,61,61)/log(2)'
43.376494218984

posted by zengargoyle at 9:32 PM on August 27, 2011


The up to two special characters part is easy. Just find three keyspaces.
1 - find keyspace with zero special characters
2 - find keyspace with one special character
3 - find keyspace with two special characters
posted by BurnChao at 2:38 AM on August 28, 2011


« Older How do you avoid getting physical when a kid just...   |   Am I being flirted with without knowing it? Newer »
This thread is closed to new comments.