Optimize my experimental setup
January 5, 2024 6:13 AM

My experimental setup is analogous to the following (no people involved): I have 8 groups of 4 subjects, and I know each subject's weight. There are 4 treatments, so in the end each treatment has 2 groups, i.e. 8 subjects. I want the variation in weight between the groups and the variation within each group to be minimal. For this scenario I brute-forced it: there are 2520 possible assignments (8!/2!^4), so I calculated an ANOVA for every one of them and picked the one with the highest p-value. I did this in R on my work PC and it took about 3 minutes.

It would also be possible to split each group in half, and then I could optimize the group composition further. However, there are too many possibilities to brute-force. I can split the 4 individuals in each group into two pairs in three ways (aabb, abab, abba), so there are 3^8 = 6561 ways to split all 8 groups. That leaves 16 half-groups to distribute over my four treatments, and getting all of those combinations is too much for my PC: there are 16!/4!^4 = 63,063,000 of them for every one of the 6561 split scenarios. How would I go about finding the perfect group split for my experiment?
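In case it helps, here is a minimal sketch of the kind of brute-force loop I ran for the first problem (the placeholder weights and all the variable names are just for illustration, not my actual script):

    # Brute force over all 2520 assignments of 8 groups to 4 treatments (2 each).
    # w holds the weights: one row per group, one column per subject.
    w <- matrix(rnorm(32, mean = 25, sd = 3), nrow = 8)  # placeholder data

    best_p <- -Inf
    best_trt <- NULL
    for (t1 in combn(8, 2, simplify = FALSE)) {                 # treatment 1's groups
      for (t2 in combn(setdiff(1:8, t1), 2, simplify = FALSE)) {
        rest <- setdiff(1:8, c(t1, t2))
        for (t3 in combn(rest, 2, simplify = FALSE)) {
          t4 <- setdiff(rest, t3)                               # treatment 4 gets the rest
          trt <- integer(8)
          trt[t1] <- 1; trt[t2] <- 2; trt[t3] <- 3; trt[t4] <- 4
          d <- data.frame(weight = as.vector(t(w)),
                          treatment = factor(rep(trt, each = 4)))
          p <- summary(aov(weight ~ treatment, data = d))[[1]][["Pr(>F)"]][1]
          if (p > best_p) { best_p <- p; best_trt <- trt }
        }
      }
    }
    best_trt  # the group-to-treatment assignment with the highest p-value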
posted by SweetLiesOfBokonon to Science & Nature (6 answers total) 2 users marked this as a favorite
 
The standard for this in the vast, vast majority of applications is to just randomize group assignment. Is there a reason you can’t do that?
posted by A Blue Moon at 8:57 AM on January 5 [2 favorites]


Another option could be stratified random assignment using weight bins as your strata.
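Roughly like this, as a sketch (the number of bins and all the names here are placeholders):

    # Stratified random assignment using weight bins as strata.
    w <- rnorm(32, mean = 25, sd = 3)             # placeholder subject weights
    strata <- cut(w, breaks = 4, labels = FALSE)  # 4 weight bins
    treatment <- integer(length(w))
    for (s in unique(strata)) {
      idx <- which(strata == s)
      # deal the treatments out evenly within each stratum, in random order
      treatment[idx] <- sample(rep(1:4, length.out = length(idx)))
    }
    table(strata, treatment)                      # check the balance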
posted by neutralhydrogen at 11:20 AM on January 5 [1 favorite]


When you say you want the variation in weight between the groups and the variation within each group to be minimal, do you mean you want each of those to be minimized mathematically? Outside of some very specific sets of weights among your subjects, there's not going to be a way to minimize both at the same time.
posted by augustimagination at 12:45 PM on January 5 [1 favorite]


Best answer: There is a famous problem in decision theory called the "Secretary Problem".

Let's say you're trying to hire a secretary and you have a pool of 100. You don't have time to interview all 100. Instead you interview, say, 10 of them and then hire the next interviewee who is better than all 10. You can read more about it (the problem setup is not exactly like yours), but the basic idea is: once you have sampled some random fraction of the entire population and chosen the best from that sample, the chances of finding another candidate who is vastly superior are quite small. In a normal distribution there will always be "better" candidates out there, but the marginal gain over the best candidate you find by choosing the best of a fairly small, randomly selected subgroup is generally going to be very small indeed.

Point is, if you can generate even 100 randomly selected combinations from your problem space, and choose the best from among those, there is a good chance that best choice of 100 is pretty darn close to the best there is.

Continue generating another 100 randomly selected combinations and choose the best from among the 200. Is the new choice much better than the first? Do the same with another 100, and another 100, and so on.

As you do this, the point of diminishing returns will soon become very obvious to you. You could do another million simulations and maybe increase your best answer by another 2% or whatever.

What you need to accomplish this is some way to generate random distributions throughout the space (ideally without any bias, or at least without too much bias) and some way to compare two of the generated distributions to decide which is better.
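To make that concrete, here is a rough sketch in R (since that's what you're working in); the placeholder weights and every name are mine, just to illustrate the idea:

    # Random search: each iteration randomly splits each group of 4 into two
    # pairs, randomly deals the 16 half-groups to the 4 treatments, and
    # scores the result. Keep the best seen so far.
    w <- matrix(rnorm(32, mean = 25, sd = 3), nrow = 8)  # 8 groups x 4 subjects

    random_candidate <- function(w) {
      halves <- vector("list", 16)
      for (g in 1:8) {
        idx <- sample(4)                    # picks one of the 3 pairings at random
        halves[[2 * g - 1]] <- w[g, idx[1:2]]
        halves[[2 * g]]     <- w[g, idx[3:4]]
      }
      trt <- sample(rep(1:4, 4))            # deal 16 half-groups to 4 treatments
      data.frame(weight = unlist(halves),
                 treatment = factor(rep(trt, each = 2)))
    }

    score <- function(d)                    # "better" = higher ANOVA p-value
      summary(aov(weight ~ treatment, data = d))[[1]][["Pr(>F)"]][1]

    best <- -Inf
    for (i in 1:10000) {
      p <- score(random_candidate(w))
      if (p > best) { best <- p; cat("iteration", i, "best p so far", best, "\n") }
    }

Here "better" simply reuses the criterion from your first pass, the ANOVA p-value, and printing each new best lets you watch the diminishing returns for yourself.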
posted by flug at 12:52 PM on January 5


Maybe I’m misunderstanding the problem but you’ve already optimized your 8 groups of 4, right? So why are you treating this as if you are now trying to optimize 16 groups of 2?

Why not just look at each group of 4 separately and see if you should pair the four people up as 1-2, 1-3, or 1-4?
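Something like this sketch, say (the criterion for choosing among the three pairings, matching the two halves' means, is just one possibility):

    # For each group of 4, enumerate its three possible pairings and keep
    # the one whose two halves have the most similar mean weight.
    w <- matrix(rnorm(32, mean = 25, sd = 3), nrow = 8)  # 8 groups x 4 subjects
    pairings <- list(c(1, 2), c(1, 3), c(1, 4))          # subject 1's partner
    best_split <- function(g) {
      gap <- sapply(pairings, function(p)
        abs(mean(g[p]) - mean(g[setdiff(1:4, p)])))
      pairings[[which.min(gap)]]
    }
    apply(w, 1, best_split)   # one column per group: the pair containing subject 1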
posted by cali59 at 1:46 PM on January 5


Response by poster: Thanks flug, I did exactly what you said, and after about 300 iterations through random groupings I got a p-value of 1 for one of the random combinations!
posted by SweetLiesOfBokonon at 3:18 AM on January 30 [1 favorite]


