Random Numbers In Excel, Without Duplicates, Based On Weighted Value
May 8, 2014 12:43 PM   Subscribe

I need a way to generate a random set of numbers in excel, with some numbers having a more likely chance to be selected based on a weighted value associated with that number. But without the number being duplicated.

I am trying to create a random list of 5 questions from a list of 25 questions. Some of these questions are more important than others, and therefore need to be generated more often than others. I would like to tie a weighted score to those more important questions which would in turn increase the possibility that it would be generated more often than others.

I would then like to create an additional 5 questions, but I do not want them to be a repeat of any that were generated the first time.

Any ideas?
posted by ndillon to Computers & Internet (10 answers total) 4 users marked this as a favorite
rand() function for the random number from 0 to 1.
Then make a range. If rand() returns say a number from 0-0.4 then that's question #1, the rest are questions #2-25 (for example).
posted by St. Peepsburg at 12:49 PM on May 8, 2014 [1 favorite]

Try Testmoz or another, similar site.

I used to do this when I was a teacher. I'd make 4 versions of the test and scramble the questions to foil cheating.
posted by Ruthless Bunny at 12:50 PM on May 8, 2014

Create a random number between zero and one for each question via Excel's random number generator. Divide each number by the weighting value for that particular question. Take the questions with the lowest ten numbers.
posted by Mapes at 12:50 PM on May 8, 2014 [1 favorite]

This is not in excel-speak, but this is the process I would go through to do this:

Make the weighted score an integer between 0 and 10. For the first run, assign each question a score from 1-10 (no zeroes).

Make a list of questions as follows: For each question, add it to the list the number of times equal to its weighted score (1 means 1 time (poor Prim!), 5 means 5 times, etc).

Choose 5 random numbers between 1 and the number of questions on the list. Those are your 5 questions for the first iteration.

For the next iteration, change the weight on the first 5 picked to 0 so they won't end up on the question list at all.
posted by sparklemotion at 12:54 PM on May 8, 2014

So technically random numbers means no weighting. Each number has the same likelihood of being picked and can be repeated. So, question 3 has a likelihood of appearing twice/thrice etc within 5 questions.

I think to tackle your problem , it would be better if you make two categories. (High weight , low weight). For example Question 1-5 (High weight). Question 5-25(Low weight).

The test should have 1 or 2 high weight and 3 low weight equally across all students to be fair. Then use randbetween(1-5) for 2 high weight and randbetween(1-20) for 3 low weight.
Based on number generated , pick question from bank.
posted by radsqd at 1:08 PM on May 8, 2014

Take the questions with the lowest ten numbers.

Follow-up: it looks like you can sort automatically in Excel by right-clicking the sheet tab and inserting the following code:

Private Sub Worksheet_Change(ByVal Target As Range)
Target.EntireColumn.Sort Key1:=Target, Order1:=xlAscending, Header:=xlGuess, OrderCustom:=1, MatchCase:=False, Orientation:=xlTopToBottom
End Sub

Some tweaking could make the desired 5+5 questions show up automatically when the random numbers are generated.
posted by Mapes at 1:10 PM on May 8, 2014

I wouldn't mess with macros for this. Just use the formula as Mapes suggests, turn autofilter on, and set to filter to the bottom 10 values on the random number column. Then just hit Data > Reapply (in the Sort & Filter group) to re-roll the dice.
posted by zixyer at 1:13 PM on May 8, 2014

What I want to use this for is Audits at work. I have a database of 25 questions.

I want a formula that randomly chooses questions, but takes into account how many times we had an audit findings.


Question 1 - 10 Findings
Question 2 - 5 Findings
Question 3 - 7 Findings
Question 4 - 7 Findings
Question 5 - 11 Findings
Question 6 - 10 Findings
Question 7 - 0 Findings
Question 8 - 2 Findings
Question 9 - 5 Findings
Question 10 - 6 Findings

Based on that list, I would like the probability that Questions 1, 5, and 6 chances of being selected are greater than say Questions 7 or 8. But not necessarily so that 1, 5, 6 are always the first to be selected?

So then to say that out of 10 times, maybe 1,5,6 are selected 50% of the time, while 7 and 8 maybe 25% of the time.

Does this explain it better? Maybe?
posted by ndillon at 1:17 PM on May 8, 2014

Yeah, create a column for the random numbers and use the formula

=RAND() / (IF(B2=0,1,B2)

(assuming the number of findings is in column B and your data starts on row 2.)

RAND generates a number between 0 and 1 and you're dividing by the number of findings. So if there's 1 finding, it would give you a number between 0 and 1, but if there are 10 findings, it would give you a number between 0 and 1/10.

Therefore 90% of the time the random number for the item with 10 findings will be less than the random number for the item with 1 finding.

The turn on the filters (Data > Filter), and filter the random number for the bottom 10 values (click the drop down arrow on the column, then pick Number Filters > Top 10 > Bottom 10 items).

When you want to get a new random set, use the command Data > Reapply.
posted by zixyer at 1:24 PM on May 8, 2014 [1 favorite]

So technically random numbers means no weighting. Each number has the same likelihood of being picked and can be repeated.

All it means for something to be random is that it varies based on chance. The probability a given number is picked doesn't have to be the same to be random. It just so happens that most random number generators (like RAND in excel) are designed so that all numbers are given equal weight. This is because most of the time that's what people want. Thankfully there are ways to change the distribution, as demonstrated by the above answers.

To further explain something about zixyer's answer: the IF statement in the formula is used to cover the situation where there were 0 findings which would generate an error because you can't divide by 0. When there are 0 findings that formula will treat it as if there was 1 finding. Otherwise the formula uses the number of findings from column B. The downside to this is that questions with 0 or 1 findings are given the same probability. If you want to give findings of 0 and 1 different probabilities then you could use this formula:

=RAND() / (B2+1)
posted by Green With You at 9:28 AM on May 9, 2014

« Older Quick! Cameras in Atlanta?   |   Prevent staining on crappy plastic laminate... Newer »
This thread is closed to new comments.