Not all random numbers are created equal
October 25, 2009 11:30 PM   Subscribe

How do I get a controlled distribution of random numbers to fairly determine a start position.

In a sporting event, start position is decided based on the last digit of your registration number. Each week, random numbers are drawn to decide the start order. For example, the random draw order for a single week is 4, 0, 3, 5, 1, 8, 7, 9, 6, 2. So everyone with a number ending in 4 starts first. Everyone with a number ending in 0 starts second. And so on. The next week the draw order is again random.

While this works, the distribution of of numbers can end up being unfair (one particular number can be "lucky" or "unlucky" for many weeks). Statistically, how would one generate a set of "random" start orders so that the value of each registration number was roughly equal over the course of a season (for ease of calculation, let's say 10 weeks).

I don't know anything about math or statistics, so my description of this situation probably uses lots of words incorrectly. I'd google this, but I don't even know how to start.

Basically, is it possible for value of all of the numbers to even out. But in a random order. So, for example, that 0's aren't always going after 4's. And one group doesn't always start in the middle.
posted by monkeystronghold to Education (21 answers total)
 
As you've just found out, random numbers are only fair when the number of random numbers generated is large compared to the number of possibilities. If fairness is what you're after, you need to reduce your randomness, or increase the number of weeks you're evaluating fairness over. In fact, given N possible starting positions for N events, complete fairness is not even possible.

For 10 positions and events, you won't do better than round-robin position allocation, where you run the first event with starting order 0, 1, 2, 3, 4, 5, 6, 7, 8, 9; second event with 1, 2, 3, 4, 5, 6, 7, 8, 9, 0; third with 2, 3, 4, 5, 6, 7, 8, 9, 0, 1 and so on until 9, 0, 1, 2, 3, 4, 5, 6, 7, 8. That gives everybody a chance at starting first, but it does mean that people with a number ending in N will start before people with a number ending in N+1 in nine out of ten events. Whether that's adequately counterbalanced by the fact that in the one out of ten events where that start order is reversed it's way reversed will, I'm sure, cause loads of debate.

You can probably improve the perceived fairness of round-robin by using all ten cyclic permutations of a random draw, rather than those of a sequential numbering. So your first event could go 4, 0, 3, 5, 1, 8, 7, 9, 6, 2; your second 0, 3, 5, 1, 8, 7, 9, 6, 2, 4 and so on until 2, 4, 0, 3, 5, 1, 8, 7, 9, 6. Base the next season on a different initial draw, and hopefully your participants will never catch on :-)
posted by flabdablet at 12:47 AM on October 26, 2009


We'd like each digit to get each starting position exactly once over the course of the races, right? So we could just let each one start the race in order:

[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[1, 2, 3, 4, 5, 6, 7, 8, 9, 0],
[2, 3, 4, 5, 6, 7, 8, 9, 0, 1],
[3, 4, 5, 6, 7, 8, 9, 0, 1, 2],
[4, 5, 6, 7, 8, 9, 0, 1, 2, 3],
[5, 6, 7, 8, 9, 0, 1, 2, 3, 4],
[6, 7, 8, 9, 0, 1, 2, 3, 4, 5],
[7, 8, 9, 0, 1, 2, 3, 4, 5, 6],
[8, 9, 0, 1, 2, 3, 4, 5, 6, 7],
[9, 0, 1, 2, 3, 4, 5, 6, 7, 8]]

That's totally fair, but boring. 3 always follows 2 always follows 1. Let's mix things up by scrambling the columns. By moving a whole column at a time, we're maintaining the fairness. Each digit still gets one chance in each starting position, but in a random order:

[[5, 6, 2, 9, 4, 8, 0, 7, 1, 3],
[6, 7, 3, 0, 5, 9, 1, 8, 2, 4],
[7, 8, 4, 1, 6, 0, 2, 9, 3, 5],
[8, 9, 5, 2, 7, 1, 3, 0, 4, 6],
[9, 0, 6, 3, 8, 2, 4, 1, 5, 7],
[0, 1, 7, 4, 9, 3, 5, 2, 6, 8],
[1, 2, 8, 5, 0, 4, 6, 3, 7, 9],
[2, 3, 9, 6, 1, 5, 7, 4, 8, 0],
[3, 4, 0, 7, 2, 6, 8, 5, 9, 1],
[4, 5, 1, 8, 3, 7, 9, 6, 0, 2]]

Ah, but this is still a bit too predictable. If I know that position 5 went first last time, I know that position 6 will go first next time. Let's mix things up by scrambling the rows. Again, we're maintaining the fairness by moving a whole row at a time:

[[2, 3, 9, 6, 1, 5, 7, 4, 8, 0],
[6, 7, 3, 0, 5, 9, 1, 8, 2, 4],
[4, 5, 1, 8, 3, 7, 9, 6, 0, 2],
[8, 9, 5, 2, 7, 1, 3, 0, 4, 6],
[1, 2, 8, 5, 0, 4, 6, 3, 7, 9],
[5, 6, 2, 9, 4, 8, 0, 7, 1, 3],
[0, 1, 7, 4, 9, 3, 5, 2, 6, 8],
[3, 4, 0, 7, 2, 6, 8, 5, 9, 1],
[7, 8, 4, 1, 6, 0, 2, 9, 3, 5],
[9, 0, 6, 3, 8, 2, 4, 1, 5, 7]]

Each digit appears once in each row and each column, so this is still fair, but it's pretty random.
posted by aneel at 1:25 AM on October 26, 2009


Best answer: Here's some python code that generates these matrices, in case that would be useful:

import random

# build fair matrix
first = range(10)
races = [[(f + i) % 10 for f in first] for i in range(10)]

# choose an order for the columns
columns = range(10)
random.shuffle(columns)

# apply that order to the columns
races = [[race[c] for c in columns] for race in races]

# shuffle the rows
random.shuffle(races)
print races
posted by aneel at 1:31 AM on October 26, 2009 [1 favorite]


And aneel's answer boils down to the same as flabdablet's, I think ("all ten cyclic permutations of a random draw"). Nice exposition.
posted by hattifattener at 1:33 AM on October 26, 2009


flabdablet brings up a good point about the relative advantages of numbers. The method I suggested doesn't attempt to make things fair between pairs of racers, just overall. For example, if you look at the example I gave, 1 gets to go before 3 in 6 of the 10 races. I think that really bad pairwise matchups will be rare, but they're definitely possible (since one of the possible scrambles is no scramble, which has the 9 of 10 problem).
posted by aneel at 1:45 AM on October 26, 2009


hattifattener: flabdablet's suggestion makes it possible to trivially tell what the order of the second race will be based on the first.
posted by aneel at 1:51 AM on October 26, 2009


I'm going to take a more low tech look at this. Make some cards with 0-9 on them. Make some piles for starting 1st - 10th and make sure these piles are as hard to manipulate as possible each holding a set of those 0-9 cards we made earlier. Pick a number randomly from each pile for each starting position before the race. Don't put that number back in the pile.

aneel's method is great, but since a human is still in charge of putting the numbers in an order there can still be some bias in the ordering.

With mine it's more work and if people have been paying attention they'll know the order of the last race (or last race before you restock the piles) when you announce the order of the second to last race.

Short of manually picking the numbers and making sure that every number starts before any other number 5 times I don't know of any solution that would make sure 1 doesn't start ahead of 3 an unfair amount of the time.

It's also roughly what you've been doing so it isn't too foreign of a system. The difference here is that you're not putting the number back in the pile after you determine the order for the race. 10 is a terrible sample size, so don't worry about how things even out over 10 weeks.
posted by theichibun at 4:17 AM on October 26, 2009


Best answer: After thinking about this a bit more, it seemed to me that doing repeated faro shuffles on your starting positions ought to give you a more satisfactory spread than doing cyclic permutations. So I learned just enough python to try it out:

def faro(list):
    cut = len(list) / 2
    for i in range(0, cut):
        list.insert(i * 2, list.pop(cut + i))

start = range(10)

for i in range(10):
    print start
    faro(start)

which results in the following output:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[5, 0, 6, 1, 7, 2, 8, 3, 9, 4]
[2, 5, 8, 0, 3, 6, 9, 1, 4, 7]
[6, 2, 9, 5, 1, 8, 4, 0, 7, 3]
[8, 6, 4, 2, 0, 9, 7, 5, 3, 1]
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
[4, 9, 3, 8, 2, 7, 1, 6, 0, 5]
[7, 4, 1, 9, 6, 3, 0, 8, 5, 2]
[3, 7, 0, 4, 8, 1, 5, 9, 2, 6]
[1, 3, 5, 7, 9, 0, 2, 4, 6, 8]

which doesn't immediately reveal any gross fairness faults. And if you do a random draw for the first start rather than using sequential numbers, then do faro shuffles for the next nine, like this

import random

def faro(list):
    cut = len(list) / 2
    for i in range(0, cut):
        list.insert(i * 2, list.pop(cut + i))

start = range(10)
random.shuffle(start)

for i in range(10):
    print start
    faro(start)

you can get something like this:

[2, 5, 8, 0, 1, 9, 6, 4, 3, 7]
[9, 2, 6, 5, 4, 8, 3, 0, 7, 1]
[8, 9, 3, 2, 0, 6, 7, 5, 1, 4]
[6, 8, 7, 9, 5, 3, 1, 2, 4, 0]
[3, 6, 1, 8, 2, 7, 4, 9, 0, 5]
[7, 3, 4, 6, 9, 1, 0, 8, 5, 2]
[1, 7, 0, 3, 8, 4, 5, 6, 2, 9]
[4, 1, 5, 7, 6, 0, 2, 3, 9, 8]
[0, 4, 2, 1, 3, 5, 9, 7, 8, 6]
[5, 0, 9, 4, 7, 2, 8, 1, 6, 3]

which is equally fair and looks random enough to satisfy the average sports fan.
posted by flabdablet at 6:21 AM on October 26, 2009


First, let me note that the mechanism you're currently using should generate real, honest to god random numbers. But being random means that sometimes a number will be on the short end for a few trials. Random doesn't mean "fair," and one of the commonest ways that humans screw up random number selection is to enforce "fairness" on the string of numbers they're producing. The easiest way to make randomization "fairer" is to just do it more often, so that instead of one random draw each week you have one random draw for each individual competition.

Here are two easy ways to generate random starting orders on a per-competition basis instead of a per-week basis.

The easiest way is if you have a computer with you at these sporting events. Enter the names quickly in Excel. Add a column called random, fill that column with "=rand()". This will assign every athlete a random number between 0 and 1. Then sort on that number, and the athletes will be in a random order.

Less easy but still easy way, if you don't have a computer. First decide on an initial ordering. Maybe alphabetically; it doesn't matter. Then, before you show up, use Excel to generate a long list of random numbers between zero and one. Next to each random number, print a box. On game day, if the first competition has ten athletes, put their names down in the first ten slots on the basis of your initial ordering. This will assign each athlete a random number. Then look at the numbers and lowest goes first, etc. If the second competition has nine athletes, put their names down in eleven through nineteen. And so on; just keep moving through your list of random numbers.

Alternately, if your sporting event has levels like heats and quarters and semis and finals, an option is to use the random order of the week for the initial round, and then use the reverse of the random order of the week for the next round, and then back to the order, and then reverse it, ad infinitum.
posted by ROU_Xenophobe at 6:38 AM on October 26, 2009


Random doesn't mean "fair," and one of the commonest ways that humans screw up random number selection is to enforce "fairness" on the string of numbers they're producing.

Yeah, that was kind of my initial point. But the primary goal here is to generate demonstrable fairness for the purpose of avoiding complaints from aggrieved competitors, and this is probably best done by avoiding mathematically satisfactory randomness except in the initial draw for the first start.
posted by flabdablet at 6:56 AM on October 26, 2009


an option is to use the random order of the week for the initial round, and then use the reverse of the random order of the week for the next round, and then back to the order, and then reverse it, ad infinitum

If you do that, you will get complaints from competitors who always end up starting near one another, and from competitors who never get to start in pole position. The faro shuffle method spreads things around well enough to avoid those complaints.

Incidentally, for ten possible starting positions, the sixth through tenth faro shuffled lineups are in fact the exact reverses of the first through fifth.
posted by flabdablet at 7:01 AM on October 26, 2009


If you want maximum fairness, you should also do the shuffling with the full list of registration numbers, not just on the last digit.
posted by Rhomboid at 7:43 AM on October 26, 2009


theichibun: aneel's method is great, but since a human is still in charge of putting the numbers in an order there can still be some bias in the ordering.

There's no need for human intervention in the method I suggested. See the code I posted.
posted by aneel at 9:01 AM on October 26, 2009


I'm going to take a more low tech look at this. Make some cards with 0-9 on them. Make some piles for starting 1st - 10th and make sure these piles are as hard to manipulate as possible each holding a set of those 0-9 cards we made earlier. Pick a number randomly from each pile for each starting position before the race. Don't put that number back in the pile.

So the number on the card is the digit who starts in that place? What happens when, for the first race, you draw "4" from pile 1 and pile 2, and "5" from no piles at all?
posted by aneel at 9:03 AM on October 26, 2009


flabdablet's faro shuffle method is predictable after the second race. In the example, if I'm #9, I can look to see what position #2 had last time and I know that'll be my number this time. With the randomized starting numbers, I can't figure out who I'm "following" until the second race, but after that, it's easy.

(For all schemes where everybody gets one turn in every position, the order will become more predictable as the races go on. The last race will have everyone in the position they've never been in. The second to last race each player will only have two positions they've never been in, so they know they have a 50% chance of each. And so on.)
posted by aneel at 9:16 AM on October 26, 2009 [1 favorite]


Response by poster: These are great answers, everyone. Thanks.

Just for some clarification. While doing this for each individual racer would be the best way, it would be time consuming. The sporting event has anywhere from 1200 - 1700 participants, broken up over 7 or 8 races. The time involved with lining people up individually would make it impossible.

And I agree with the statements above that what we're seeking is fairness above true randomness. But the appearance of being un-guessable for the following week is valuable.
posted by monkeystronghold at 9:24 AM on October 26, 2009


flabdablet: Incidentally, for ten possible starting positions, the sixth through tenth faro shuffled lineups are in fact the exact reverses of the first through fifth.

This is a neat observation. This means that this ordering is pairwise fair, since for every race I get to go before you, there's a race where you get to go before me.

You could make this method less predictable by generating the race positions using this method, and then choosing them in a random order.
posted by aneel at 9:26 AM on October 26, 2009


"choosing them in a random order" is unclear. I mean scrambling the list of complete races.
posted by aneel at 9:33 AM on October 26, 2009


Yes, I was just thinking the same thing.

import random

def faro(list):
    cut = len(list) / 2
    for i in range(0, cut):
        list.insert(i * 2, list.pop(cut + i))

start = range(10)
random.shuffle(start)

starts = []
for i in range(10):
    starts.append(start[:])
    faro(start)

random.shuffle(starts)
for start in starts:
    print start

Sample result:

[9, 8, 6, 4, 7, 3, 5, 1, 0, 2]
[4, 1, 9, 7, 0, 8, 3, 2, 6, 5]
[2, 0, 1, 5, 3, 7, 4, 6, 8, 9]
[1, 7, 8, 2, 5, 4, 9, 0, 3, 6]
[6, 3, 0, 9, 4, 5, 2, 8, 7, 1]
[5, 6, 2, 3, 8, 0, 7, 9, 1, 4]
[0, 5, 7, 6, 9, 2, 1, 3, 4, 8]
[8, 4, 3, 1, 2, 9, 6, 7, 5, 0]
[3, 9, 5, 8, 1, 6, 0, 4, 2, 7]
[7, 2, 4, 0, 6, 1, 8, 5, 9, 3]

I could get used to this Python thing!
posted by flabdablet at 3:21 PM on October 26, 2009


How I would do this depends on what "fair" means in this sport. But here's my simple proposal (no computer needed).

Get a deck of cards and pull out Ace through 10. These are your digits. Shuffle those ten and lay them out. This is your first run. Then take the first five and last five, shuffle them separately and lay them back out but in the opposite group (previous first five is now the last five). This has the advantage of being simple to understand, randomish, and you can quickly see the worst and best cases (1st and 6th is best, 5th and 10th is worst). A small and probably better variation would be to split them up into groups of 3a, 3b, and 4, shuffle those, then lay them back out as 4, 3a, and 3b. This has the advantage of eventually shuffling across those sub groups but has the disadvantage of being harder to see the best/worst cases.
posted by chairface at 6:20 PM on October 26, 2009


If you want to do manually something equivalent to what my Python script does:

Select ten cards, Ace through 10, from a deck and put the rest aside.
Randomly shuffle them, then lay them out in a line, face up.

Do the following ten times:
  • Write down the corresponding digits on a fresh slip of paper.
  • Slide the second, fourth, sixth, eighth and tenth cards down, to turn the line OEOEOEOEOE into two like OEOEOEOEOE.
  • Slide all the E cards to the left and the O cards to the right to get EEEEEOOOOO.
You now have ten slips of paper with one starting lineup on each. Shuffle these (or draw them from a hat) and that's your starts program done.

The shuffle in the middle of that is actually the inverse of a faro, but for ten cards on a tabletop it's easier to get right and in this application it does the same job.
posted by flabdablet at 6:51 PM on October 26, 2009


« Older Is there a UK teasmade that will work in the U.S.?   |   Where was Mr. Show filmed (besides Los Angeles... Newer »
This thread is closed to new comments.