Why are my bank's confirmation numbers non-random?
December 27, 2022 3:39 PM

When I log in my bank sends me a 7 digit confirmation number via SMS. I've noticed these are not random. They don't exactly conform to Benford's Law, but a leading digit of 1 is much over-represented (42% of all numbers), as is a leading 2 (26%). See the table of my last 137 confirmations here. I would have thought these would be generated by a random number generator.

Is it likely the method they use to generate these could be reverse engineered, and why wouldn't they just send a random number between 1,000,000 and 9,999,999?
posted by Rumple to Science & Nature (13 answers total) 7 users marked this as a favorite
 
Taking a look at my recent codes, they seem to be pretty decently random with the exception that none of them start with the same number as the phone number that the text comes from. If you're using a bigger bank, maybe they have more phone numbers and they're doing something similar? They might exclude a bunch of numbers for some reason, like numbers that match the pattern of their account numbers?
posted by Garm at 3:56 PM on December 27, 2022


Humans are really, really bad at detecting randomness. What we perceive as random is often just evenly distributed. One-time passwords (i.e., the 6-digit codes sent to your phone or generated in your authenticator app) are typically pseudo-random, produced by a time-based algorithm. According to this article in Wired, Google at the very least does omit some possible numbers that may be confusing or insensitive, but the codes aren't generated based on a pattern.
posted by muddgirl at 4:09 PM on December 27, 2022 [4 favorites]


There are a number of standards relating to One Time Passwords. The main doc page for pyOTP links to them. Your bank probably uses pyOTP or an equivalent in another language, and almost certainly attempts to follow those standards, along with any banking-specific OTP requirements.
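
For a rough idea of what those standards describe, here is a minimal sketch of RFC 4226 (HOTP, counter-based) and RFC 6238 (TOTP, time-based) using only the Python standard library rather than pyOTP itself. The function names `hotp` and `totp` are just illustrative, and nothing here is known about what the bank in question actually runs.

```python
import hashlib
import hmac
import struct
import time


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """Counter-based one-time password (RFC 4226), HMAC-SHA1 variant."""
    # HMAC over the counter encoded as an 8-byte big-endian integer.
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low nibble of the last byte picks a 4-byte window.
    offset = mac[-1] & 0x0F
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    # Keep the last `digits` decimal digits, padding with leading zeros.
    return str(value % 10 ** digits).zfill(digits)


def totp(secret: bytes, now=None, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP over a time-step counter."""
    if now is None:
        now = time.time()
    return hotp(secret, int(now) // step, digits)


# Shared secret used in the RFC test vectors.
print(hotp(b"12345678901234567890", 0))
```

Codes built this way can start with any digit, 0 included, and come out very nearly uniform, so a heavy skew toward 1 and 2 would suggest something other than a plain implementation of these two RFCs.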
posted by rockindata at 4:35 PM on December 27, 2022


This from your Benford's Law link intrigues me:
Another example is the leading digit of 2^n. The sequence of the first 96 leading digits (1, 2, 4, 8, 1, 3, 6, 1, 2, 5, 1, 2, 4, 8, 1, 3, 6, 1, ... (sequence A008952 in the OEIS)) exhibits closer adherence to Benford’s law than is expected for random sequences of the same length, because it is derived from a geometric sequence.[14]

If they took a random 7 digit chunk of the list of first digits of powers of 2, for example, they would get a Benford like distribution such as you see.
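
A quick way to see how closely the leading digits of powers of 2 track Benford's Law is to count them. This is only a sketch: the helper names and the choice of the first 1000 powers are arbitrary, not anything a bank is known to do.

```python
import math
from collections import Counter


def leading_digits_of_powers(base: int, count: int) -> list:
    """Leading decimal digit of base**n for n = 0 .. count-1."""
    return [int(str(base ** n)[0]) for n in range(count)]


def benford_share(digit: int) -> float:
    """Share of leading `digit` predicted by Benford's Law."""
    return math.log10(1 + 1 / digit)


digits = leading_digits_of_powers(2, 1000)
observed = Counter(digits)

# Print digit, observed share, and Benford's predicted share side by side.
for d in range(1, 10):
    print(d, observed[d] / len(digits), round(benford_share(d), 3))
```

Benford's Law predicts about 30% leading 1s and about 18% leading 2s, whereas the question reports 42% and 26%, so a geometric sequence like this would only loosely match the observed table.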

And if all their machines knew the number that generated the list of powers on a given day — I gather that it doesn’t have to be 2 — they wouldn’t have to have the machine that gave you the number and the machine that checked it communicate with each other to be able to quickly calculate that it was in fact a legitimate number.
posted by jamjam at 4:35 PM on December 27, 2022


Also consider that random number generators are only pseudo random.
posted by oceano at 4:53 PM on December 27, 2022 [2 favorites]


What if they were taking random numbers up to 2^31 mod 10e7, or something? That pattern of the digits doesn’t suggest a Benford's Law-like geometric pattern to me but rather some kind of threshold effect.
posted by goingonit at 5:12 PM on December 27, 2022 [3 favorites]


Best answer: Good eye, but I'm pretty sure you're conflating a few things. I can generate perfectly random numbers that skew heavily to those starting with one. I can generate random walks that always end where they started.

This is part of why mathematicians often prefer the word "stochastic" when being precise.

What your numbers are not is "uniformly random", i.e. every number in the universe of discourse having an equal chance of being picked.

But there are lots of stochastic processes (i.e. ones that cannot be deterministically predicted, as with rolls of dice) that produce 'random' number streams from probability distributions that are far from uniform (hence the random walk that returns to its starting position).

So: having a certain digit (or lots of traits really) being overrepresented compared to a uniform distribution emphatically does not imply that the numbers are non-random, or that they can be easily guessed/cracked.
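
To make that concrete, here is a small sketch of numbers that are random but far from uniform: seven-digit values drawn log-uniformly, which pushes the leading digit toward 1. The function name, the seed, and the log-uniform choice are all just for illustration, and `random` is used only so the run is repeatable (real codes would want something like the `secrets` module).

```python
import random
from collections import Counter


def skewed_code(rng: random.Random) -> int:
    """Seven-digit number drawn log-uniformly from [1_000_000, 10_000_000)."""
    # A uniform exponent between 6 and 7 gives a log-uniform value;
    # min() guards against the rare floating-point edge case of exactly 10**7.
    return min(int(10 ** rng.uniform(6, 7)), 9_999_999)


rng = random.Random(2022)  # fixed seed so the example is repeatable
codes = [skewed_code(rng) for _ in range(20_000)]

shares = Counter(str(c)[0] for c in codes)
share_of_ones = shares["1"] / len(codes)
print(f"leading 1: {share_of_ones:.1%}")
```

Each draw is unpredictable without knowing the generator's state, yet about 30% of the values start with 1, which is the distinction between "random" and "uniformly random".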

Finally: one-time codes are often deliberately deterministic and pseudorandom, so that two independent devices can agree on a string of digits being 'correct' long after any secure communication between the devices (or people) has been severed. This is more the heritage of a one-time pad, which doesn't have to be random at all to help a lot in security, just ideally hard to guess. A one-time pad can be a page from an agreed-upon phone book, or the 'ma' page of a dictionary.

So: interesting stuff here for sure! But the issues are different than you seem to think. Happy to provide more detail/links if the above isn't clear :)
posted by SaltySalticid at 5:23 PM on December 27, 2022 [5 favorites]


Oh and to add: I get 2FA codes on my phone from an institution in blocks of 10, to aid in code distribution (what if I don't have SMS service?). The first digits always progress 0-9. They even give a hint: your code starts with a '2' or whatever. Basically you should expect some aspects of well-designed modern 2FA single-use codes to have some sort of usable structure that is there intentionally and is not considered a weakness due to lack of randomization.
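
A purely hypothetical sketch of how such a block could be built: the first digit is just the code's position in the block, and only the remaining digits are random. The function name and the 8-digit length are assumptions, not details of any real institution's scheme.

```python
import secrets


def backup_code_block(digits: int = 8) -> list:
    """Ten single-use codes whose first digit is the code's position (0-9)."""
    block = []
    for index in range(10):
        # Only the tail is secret; the leading digit is a human-readable label.
        tail = secrets.randbelow(10 ** (digits - 1))
        block.append(f"{index}{tail:0{digits - 1}d}")
    return block


for code in backup_code_block():
    print(code)
```

In a design like this the leading digit carries no secrecy at all, so the designer just has to make the random tail long enough on its own.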
posted by SaltySalticid at 5:36 PM on December 27, 2022


Best answer: SaltySalticid: you are talking about synced, shared-secret OTPs, right? I don't see any particular benefit to having structure to SMS-shared codes.

I have been fiddling to try and find some silly procedure to go from a set of integers chosen uniformly at random to what you see and here's what I can get:
  1. Pick an integer uniformly at random between 1,000,000 and 25,000,000
  2. If the integer is 8 digits long, divide by 10.
Why you'd do this, I don't know. The 2^31 thing made some kind of sense at least but 2s wouldn't be as common as you're seeing.

This procedure and related ones would have the distinctive property that numbers starting with 2 with high second digits would be less common -- if you check that, it may shed some more light on what's happening.
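
The two-step procedure above can be sketched as follows. Reading "divide by 10" as integer division (dropping the last digit) is an assumption, as are the names, the seed, and the sample size.

```python
import random
from collections import Counter

LOW, HIGH = 1_000_000, 25_000_000  # bounds taken from step 1


def make_code(rng: random.Random) -> int:
    n = rng.randint(LOW, HIGH)      # step 1: uniform, both ends included
    if n >= 10_000_000:             # step 2: eight digits, so drop the last one
        n //= 10
    return n


rng = random.Random(137)            # fixed seed so the run is repeatable
codes = [make_code(rng) for _ in range(50_000)]

first = Counter(str(c)[0] for c in codes)
share_1 = first["1"] / len(codes)
share_2 = first["2"] / len(codes)

# Among codes starting with 2, how many have a second digit of 5 or more?
twos = [c for c in codes if str(c)[0] == "2"]
high_second = sum(str(c)[1] >= "5" for c in twos) / len(twos)

print(f"leading 1: {share_1:.1%}, leading 2: {share_2:.1%}")
print(f"2x codes with second digit >= 5: {high_second:.1%}")
```

Counting the cases exactly, this procedure gives roughly 46% leading 1s and 25% leading 2s, against the 42% and 26% in the question, and far fewer than half of the codes starting with 2 have a high second digit.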
posted by goingonit at 5:44 PM on December 27, 2022 [1 favorite]


Best answer: I think goingonit is on the right track. More simply, you would get this kind of distribution by taking the leftmost seven digits of a random number less than 26,000,000,000,000,000, or any other large number that begins with 26 (or 25 or 27 or thereabouts).

So my guess is that they generate the OTP by taking the leftmost seven digits of a random unsigned 48-bit integer (2^48 = 281,474,976,710,656), or something like that.
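
A sketch of that guess, to see what distribution it would give. This is a simulation of the hypothesis, not the bank's code; the function name, seed, and sample size are arbitrary, and a real system would presumably draw from a cryptographic source such as `secrets.randbits` rather than `random`.

```python
import random
from collections import Counter


def leftmost_seven(rng: random.Random, bits: int = 48) -> int:
    """Leftmost seven decimal digits of a uniform unsigned `bits`-bit integer."""
    while True:
        n = rng.getrandbits(bits)
        if n >= 1_000_000:          # need at least seven digits to truncate
            return int(str(n)[:7])


rng = random.Random(48)             # fixed seed so the run is repeatable
codes = [leftmost_seven(rng) for _ in range(50_000)]

first = Counter(str(c)[0] for c in codes)
share_1 = first["1"] / len(codes)
share_2 = first["2"] / len(codes)
print(f"leading 1: {share_1:.1%}, leading 2: {share_2:.1%}")
```

Worked out exactly, a uniform 48-bit integer gives about 39% leading 1s and about 33% leading 2s, with every other digit near 4%. That is the same shape as the observed 42% and 26%, though not an exact match.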
posted by Syllepsis at 6:18 PM on December 27, 2022 [7 favorites]


One benefit I described: you can batch send the codes and then send a plain text, human-readable unique identifier. Granted it's not a huge benefit, but it's helped me in that use case. Here's someone at Wired speculating that they may be designed to be more easily usable by humans, so that they can be typed over with less human error. I've definitely been annoyed at the time it takes to punch in really long pseudorandom codes carefully, so I'd vote that as a good application too.

I don't know how or why this specific system in question uses this structure. My main point is that structure and bias in the digits do not mean a sequence of numbers is non-random, and do not necessarily mean things are prone to reverse engineering or being 'hacked'. I suppose it's true in theory that for a given string length, any structure is an aid to guessing. But that needn't be any barrier to using it anyway. As an example, you can just decide how many digits you need and then append whatever structure you want. The Wired article talks like 6 digits is a given, I often get 7 digit codes, and some systems make me wrestle with 10! So lots of room there.
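
To put a rough number on how much structure helps a guesser, here is a sketch that assumes Syllepsis's model from upthread (leftmost seven digits of a uniform 48-bit integer). That model is a guess, and `prefix_count` is a name made up for this example.

```python
def prefix_count(code: int, limit: int) -> int:
    """How many integers in [0, limit) have `code` as their leftmost digits."""
    total = 0
    scale = 1
    while code * scale < limit:
        # Integers of the form code followed by k more digits, clipped to limit.
        low = code * scale
        high = min((code + 1) * scale, limit)
        total += high - low
        scale *= 10
    return total


LIMIT = 2 ** 48
best = prefix_count(1_000_000, LIMIT) / LIMIT   # one of the most likely codes
uniform = 1 / 9_000_000                          # flat over all 7-digit codes
advantage = best / uniform

print(f"best single guess: {best:.2e} vs uniform {uniform:.2e}")
print(f"advantage: {advantage:.2f}x")
```

Under that assumption the best single guess is roughly 3.5 times more likely to hit than with uniform codes, which is still well under one in a million per attempt.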
posted by SaltySalticid at 6:05 AM on December 28, 2022


Why would they be completely random?

Sure, a random component is good but what they’re attempting to protect against is someone guessing the code they just sent to your phone/mail. Four digits would be fine for that and five would be overkill.

Seven strikes me as either people trying to impress you or people who have encoded additional data in the number.
posted by Tell Me No Lies at 9:22 AM on December 28, 2022 [2 favorites]


Best answer: Going from Syllepsis's comment, there is a pretty standard function drand48() built on a 48-bit pseudo-random generator (it returns a floating-point value in [0, 1) derived from 48 bits of internal state). There's probably a programming environment that uses it as a default random number generator. You would hope they would do better than just taking the left decimal digits, but it's probably good enough as is.
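
For the curious, the generator behind the drand48() family is a 48-bit linear congruential generator, which can be sketched in a few lines. The multiplier, increment, and seeding rule below are the standard POSIX ones; taking the leftmost seven decimal digits of the raw state is only the guess from this thread, and the helper names are made up.

```python
A = 0x5DEECE66D            # multiplier used by the drand48 family
C = 0xB                    # increment used by the drand48 family
MASK = (1 << 48) - 1       # keep results to 48 bits


def seed_state(seed: int) -> int:
    """State after srand48(seed): low 32 bits of seed on top, 0x330E below."""
    return ((seed & 0xFFFFFFFF) << 16) | 0x330E


def next_state(state: int) -> int:
    """One step of the generator: (A * state + C) mod 2**48."""
    return (A * state + C) & MASK


def codes(seed: int, count: int) -> list:
    """Hypothetical: leftmost seven decimal digits of successive 48-bit states."""
    state = seed_state(seed)
    out = []
    while len(out) < count:
        state = next_state(state)
        if state >= 1_000_000:      # skip the rare states with too few digits
            out.append(int(str(state)[:7]))
    return out


print(codes(1, 5))
```

A generator like this is fully determined by its state, so it is not considered cryptographically secure; Python's `secrets` module is the usual choice when unpredictability matters.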
posted by Horselover Fat at 10:54 AM on December 28, 2022 [2 favorites]


This thread is closed to new comments.