Oh, how very random (or, how best to randomise my experiment)
April 16, 2009 5:50 AM
Okay, here is my dilemma. I'm trying to get an experiment running in which I present a series of BMPs to my participants. After months of pain and suffering, I have finally tweaked a set of stimuli which work for me, so I'm swimming in around 3500 BMPs.
But, since I can only torture my participants for an hour, I can only ask them to go through about 1200 of these BMPs, so I need some way to quickly and easily generate a selection of random BMPs from my bigger collection.
Here is where my dilemma comes in. I can't use any stock-standard randomisation, because these stimuli are broken up into a variety of different conditions, and I want to keep the number of stimuli from each condition consistent (say, 30 items from each of 24 conditions). Now, for added fun, I want to, at will, be able to ignore certain conditions and the stimuli which fall within those and create a list of trials only using what is left.
I have tried to use the randomisation and loop functions within the experimental control software that I'm using, but the problem is that you have to declare every BMP that could possibly be used. This means that the application tries to load 3500 BMPs into RAM, which, apart from being slow and redundant, is impossible on a 32-bit system.
This also leads into another impossible demand I have. I would love to be able to somehow manipulate the output, so that it is easier to transfer it into the experimental control program I'm using (Neurobehavioral Systems' Presentation, in case anyone is interested).
As odinsdream points out, it depends on how/if your software can import file lists.
As I'm lazy, I'd shuffle the images, then check the shuffled array to see if it matches my constraints. Purists will be horrified (no guarantee it will halt), but I have processor cycles to burn.
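A minimal Python sketch of that shuffle-and-check idea, on toy data. The file names, the condition labels `A`/`B`/`C` and the `max_tries` cap are all made up for illustration; the cap is only there so the loop is guaranteed to stop.

```python
import random
from collections import Counter

def shuffle_until_balanced(items, per_condition, rng, max_tries=10_000):
    """items is a list of (filename, condition) pairs.

    Shuffle everything, keep the first per_condition * n_conditions items,
    and accept the result only if every condition appears exactly
    per_condition times. Otherwise reshuffle and try again.
    """
    conditions = {cond for _, cond in items}
    n_needed = per_condition * len(conditions)
    pool = list(items)
    for _ in range(max_tries):
        rng.shuffle(pool)
        candidate = pool[:n_needed]
        counts = Counter(cond for _, cond in candidate)
        if all(counts[c] == per_condition for c in conditions):
            return candidate
    raise RuntimeError("no balanced shuffle found within max_tries")

# Toy data: 3 hypothetical conditions with 4 files each, 2 wanted per condition.
items = [(f"{cond}{i}.bmp", cond) for cond in "ABC" for i in range(4)]
trials = shuffle_until_balanced(items, per_condition=2, rng=random.Random(1))
```

On a toy set like this a balanced shuffle turns up quickly. With 24 conditions and 30 items each, the odds of a plain shuffle landing exactly on target are tiny, so sampling within each condition is probably the more practical route there.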
posted by Leon at 6:00 AM on April 16, 2009
Response by poster: @valkyryn, bitmaps. Those image files of yesteryear.
@odinsdream. Basically, I'm after something quick that I can use to script this thing in. I'd rather not fiddle with Excel and macros, which initially came to mind, but will try anything at this stage. I could use jpeg or png, but the only advantage is that it saves space on the HDD. Once Presentation loads the images into RAM, they are all the same size regardless of their original file size.
@Leon, I'm happy to do the import by hand as long as I can fiddle with the output so it looks the way I want it to. Any suggestions on what's best to use to get the shuffling done?
posted by doctor.dan at 6:07 AM on April 16, 2009
Response by poster: @odinsdream, I've bugged the Presentation folks about this a few times, and their reply, to me as well as to others on the forums, is to stick to bitmap images over JPEGs as there is no real gain.
As for OS, both XP and Vista, both 32-bit. The way Presentation handles this stuff is that you declare your images, like so:
bitmap {filename = "stimuli\\BMP\\FemaF1.C.C2.bmp"; } FemaF1CC2;
bitmap {filename = "stimuli\\BMP\\FemaF1.L.C2.bmp"; } FemaF1LC2;
…
and you then add these stimuli to different arrays:
array {
picture { description = "24;FemaF1CC2"; bitmap FemaF1CC2; x = 0; y = 0;};
picture { description = "24;FemaF1LC2"; bitmap FemaF1LC2; x = 0; y = 0;};
} Array1;
You can then manipulate the arrays using a custom programming language.
Ideally, I'd want my randomisation output to provide me with both the stuff that should go into my arrays, and what goes into my bitmap declaration. (Note: each individual bitmap will reside in four arrays because of how the stimuli work).
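For what it's worth, here is a rough Python sketch of how a script could print both kinds of text from a list of file names. It only mimics the two line formats quoted above; the helper names are made up, the rule "declaration name = file name without dots or extension" is inferred from the two examples, and the `code` value ("24") is simply copied from the description field without assuming what it means.

```python
from pathlib import PureWindowsPath

SEP = r"\\"  # two literal backslashes, as written in the declarations above

def sdl_name(filename):
    """FemaF1.C.C2.bmp -> FemaF1CC2 (assumed naming rule)."""
    return PureWindowsPath(filename).stem.replace(".", "")

def bitmap_declaration(filename, folder=("stimuli", "BMP")):
    """One 'bitmap { ... } name;' line for a single file."""
    path = SEP.join([*folder, filename])
    return f'bitmap {{filename = "{path}"; }} {sdl_name(filename)};'

def picture_entry(filename, code="24"):
    """One 'picture { ... };' line to go inside an array block."""
    name = sdl_name(filename)
    return f'picture {{ description = "{code};{name}"; bitmap {name}; x = 0; y = 0;}};'

def array_block(filenames, array_name, code="24"):
    """A whole 'array { ... } name;' block for a list of files."""
    lines = ["array {"]
    lines += [picture_entry(f, code) for f in filenames]
    lines.append(f"}} {array_name};")
    return "\n".join(lines)

files = ["FemaF1.C.C2.bmp", "FemaF1.L.C2.bmp"]
declarations = "\n".join(bitmap_declaration(f) for f in files)
block = array_block(files, "Array1")
```

Printing `declarations` and `block` (or writing them to a text file) gives something that can be pasted into the scenario by hand.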
posted by doctor.dan at 6:23 AM on April 16, 2009
Response by poster: @odinsdream, AFAIK, Presentation is happy to read a text file to get file names. Not sure if it will do the same for arrays though. The files created by Presentation are nothing more than Unicode text files with different extensions.
It is getting late at my end of the world, so I won't be able to respond until tomorrow. Thanks to everyone for the help thus far.
posted by doctor.dan at 6:26 AM on April 16, 2009
The page for your software contains instructions for how to load and unload images so that you don't have to have them all in memory at once. It seems like the answer is to create a matrix (or database, but a flat file will work), where each row is an image and there are columns for each characteristic you'd like to randomize on. If there is more than one such characteristic, defining the restricted randomization requires a little care.
Next write a loop in whatever software that you like which goes through the 24 groups of images that you'd like to randomize within, picks 30*N random elements where N is the number of subjects you'll test (without replacement if 30*N <
Then a little text manipulation in perl or python turns the list of images for each scenario into an SDL file for your software.
posted by a robot made out of meat at 6:39 AM on April 16, 2009
Oh yeah, html
30*N < min(group_size), otherwise do this N times with 30 picks), permutes the order, and tacks them onto the end of each of the scenarios you're building. That kind of randomization may not be optimal experimental design (depending on what you're going to be looking for), but it's easy to understand and implement.
Then a little text manipulation in perl or python turns the list of images for each scenario into an SDL file for your software.
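Roughly, in Python, the per-group loop for a single subject could look like the sketch below. The row layout (file name, condition), the condition names `C1` to `C4` and the `skip` argument are assumptions for illustration; in practice the rows would be read from the flat file.

```python
import random
from collections import defaultdict

def stratified_trials(rows, per_condition, skip=(), rng=None):
    """rows is an iterable of (filename, condition) pairs.

    Picks per_condition files at random, without replacement, from every
    condition that is not in skip, then shuffles the combined trial list.
    """
    rng = rng or random.Random()
    groups = defaultdict(list)
    for filename, condition in rows:
        if condition not in skip:
            groups[condition].append(filename)
    trials = []
    for condition in sorted(groups):
        # random.sample raises ValueError if a group has too few files
        chosen = rng.sample(groups[condition], per_condition)
        trials += [(filename, condition) for filename in chosen]
    rng.shuffle(trials)
    return trials

# Toy data: 4 hypothetical conditions with 10 files each.
rows = [(f"C{c}_{i:02d}.bmp", f"C{c}") for c in range(1, 5) for i in range(10)]
trials = stratified_trials(rows, per_condition=3, skip={"C2"}, rng=random.Random(42))
```

Because each condition is sampled separately, every condition that is left in contributes exactly the same number of trials, and dropping a condition is just a matter of naming it in `skip`.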
posted by a robot made out of meat at 6:41 AM on April 16, 2009
JPEG compression is lossy. The amount of information from the original file just isn't there, even when it's "uncompressed" for viewing.
Right. But RAM doesn't store information, it stores data. And the lossiest JPEG expands to just as much data as a 24-bit PNG of the same size, once it's uncompressed.
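A back-of-the-envelope version of that point, in Python. The 800x600 image size is purely a guess for illustration (the thread never gives the dimensions); only the 3500 image count and the 24-bit depth come from the discussion.

```python
def uncompressed_bytes(width, height, bits_per_pixel=24):
    """Size of one decoded image in memory, whatever format it was stored in."""
    return width * height * bits_per_pixel // 8

per_image = uncompressed_bytes(800, 600)  # hypothetical dimensions
total = 3500 * per_image                  # every stimulus loaded at once
address_space = 2 ** 32                   # the most a 32-bit process can address

print(f"{per_image:,} bytes per image, {total:,} bytes in total")
print("fits in a 32-bit address space:", total < address_space)
```

With smaller images the total shrinks accordingly, but the on-disk format drops out of the calculation either way.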
But back to the original question: this Presentation software requires you to predeclare all of your bitmaps, but it lets you do it in a text file? So write a randomization script in something else (Python, Perl, whatever you're already familiar with or can pick up fast) that produces such a properly formatted text file as output. Then when you need a new set you rerun your script (with some options to turn off specific conditions) and restart Presentation, which then only has to load the specific list of bitmaps that it will need for your next run.
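The "options to turn off specific conditions" part could be as small as this Python sketch using the standard `argparse` module. The option names (`--skip`, `--per-condition`, `--out`) and the condition labels in the example call are invented for illustration.

```python
import argparse

def parse_options(argv=None):
    """Command-line options for a hypothetical stimulus-list generator."""
    parser = argparse.ArgumentParser(description="Build a randomised stimulus list.")
    parser.add_argument("--per-condition", type=int, default=30,
                        help="number of stimuli to draw from each condition")
    parser.add_argument("--skip", nargs="*", default=[], metavar="CONDITION",
                        help="conditions to leave out of this run")
    parser.add_argument("--out", default="trials.txt",
                        help="text file to write the generated list to")
    return parser.parse_args(argv)

# Equivalent to running: script.py --skip C3 C7 --per-condition 20
opts = parse_options(["--skip", "C3", "C7", "--per-condition", "20"])
```

The parsed `opts.skip` and `opts.per_condition` would then be handed to whatever does the actual sampling.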
posted by roystgnr at 6:52 AM on April 16, 2009
Is there a reason to retain all 3600 images and not just pick 1200 and go with those? Or make 3 sets from 1-1200, 1201-2400 and 2401-3600? I mean if no one is ever going to see all 3600 what good are the extras (unless there is some arguable subjectivity in your sets you are trying to cancel out, but I can just argue you now have three times as much subjectivity).
Do any of your images appear in more than one stimulus set? Like, oh, let's call one set puppies and the other kittens. Is there ever a case when a kitten and a puppy are in a picture so that picture goes in both? (I think that's what you're saying, but I want to make sure.) If so, give each condition a prime number and rename your files "product of primes"-"old name". Now you can write a pretty simple program that loads in the name and can tell what sets it belongs in by finding all the image names where the tag is neatly divisible by, say, seven.
I'd do my proposed rename, make a text file of filenames and then write code that generated lists from that. Then you can grind out a thousand or so lists of previously randomized filenames and go through them one at a time.
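A small Python sketch of the prime-tag idea. The prime assigned to each condition and the third condition name are arbitrary choices for illustration.

```python
from math import prod

# One distinct prime per condition (arbitrary assignment).
PRIMES = {"kittens": 2, "puppies": 3, "ducklings": 5}

def tagged_name(old_name, conditions):
    """Prefix a file name with the product of its conditions' primes."""
    tag = prod(PRIMES[c] for c in conditions)
    return f"{tag}-{old_name}"

def belongs_to(tagged, condition):
    """True if the numeric tag is divisible by the condition's prime."""
    tag = int(tagged.split("-", 1)[0])
    return tag % PRIMES[condition] == 0

name = tagged_name("img001.bmp", ["kittens", "puppies"])
in_kittens = belongs_to(name, "kittens")
in_ducklings = belongs_to(name, "ducklings")
```

Here `name` comes out as `6-img001.bmp`, which is divisible by 2 and 3 but not by 5, so the file counts as a kitten and a puppy but not a duckling.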
posted by Kid Charlemagne at 7:03 AM on April 16, 2009 [1 favorite]
If 30*N > group_size, when you go through the loop, make a temporary group from which you remove the ones you've already used. When the temporary group is less than 30 in size, use what's left and replace it with the original for that set and keep going. That way you avoid embarrassing randomizations like one image getting used every time or never (or the extreme, everybody gets the same set of 30).
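In Python that could look something like the sketch below, for one group of images. It adds one detail not spelled out above: when the temporary group is refilled halfway through a subject's set, images that subject already has are skipped for now and left for the next subject. The function name and the toy numbers (50 images, 30 per subject, 3 subjects) are made up.

```python
import random

def draw_sets(group, per_subject, n_subjects, rng):
    """Draw per_subject images for each subject from one condition's group.

    Images are drawn without replacement from a shuffled temporary pool.
    When the pool runs dry it is refilled from the original group, so no
    image is reused until every image has been used once.
    """
    if len(set(group)) < per_subject:
        raise ValueError("group is smaller than one subject's set")
    pool = []
    sets = []
    for _ in range(n_subjects):
        picked = []
        while len(picked) < per_subject:
            if not pool:
                pool = list(group)
                rng.shuffle(pool)
            # Skip images this subject already has; they stay in the pool.
            idx = next(i for i, item in enumerate(pool) if item not in picked)
            picked.append(pool.pop(idx))
        sets.append(picked)
    return sets

group = [f"img{i:02d}.bmp" for i in range(50)]
sets = draw_sets(group, per_subject=30, n_subjects=3, rng=random.Random(7))
```

With 90 draws from 50 images, every image ends up used either once or twice, and no subject sees the same image twice.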
posted by a robot made out of meat at 7:13 AM on April 16, 2009
I'm with Kid Charlemagne - use random sets of your stimuli and go from there. Does it even need to be exactly 1200 images? With enough samples, you may be able to get away with fewer images. How many Rorschach inkblots were there? OK, OK, your field, not mine :humble bow to your superior knowledge:
As an *absolute* last resort, PowerPoint can shuffle images just fine, though I don't know if that's what you're looking for. It might take a while for a 1200 slide 'presentation' to load, but if the computer is fast enough it shouldn't be an issue.
posted by chrisinseoul at 7:42 AM on April 16, 2009
Consider switching to more powerful, flexible tools like Matlab and Psychtoolbox. We've migrated almost entirely off Presentation for exactly these kinds of reasons.
posted by fake at 7:46 AM on April 16, 2009
This would be pretty trivial to accomplish with perl. You don't even need a database, though you'll have to set up the files initially.
Create a file to classify your conditions. You could use this example from perlmonks.
my %AllConditions = (
    'ConditionA' => {
        1 => 'file1a',
        2 => 'file2a',
        3 => 'file3a',
        # ...
    },
    'ConditionB' => {
        1 => 'file1a',    # same number as in ConditionA: one image, two conditions
        5 => 'file2b',
        6 => 'file3b',
        # ...
    },
    # ...
);
Once that's done you'll just be able to pull that file into your Perl script, run through it, and take out as many or as few items as you like.
A hash is overkill BUT you could keep your numbers unique so that if you DO have images in multiple conditions (as Kid C mentions) you could use that mechanism to make sure you don't duplicate images if you don't want to. I did it in my example above - you could write it so you keep track of used numbers as you read them so you don't re-use.
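Since Python keeps coming up in the thread as well, here is the same bookkeeping sketched in Python rather than Perl. The dictionary mirrors the example above; the function name and the choice of two picks per condition are just for illustration.

```python
import random

all_conditions = {
    "ConditionA": {1: "file1a", 2: "file2a", 3: "file3a"},
    "ConditionB": {1: "file1a", 5: "file2b", 6: "file3b"},
}

def pick_without_reuse(conditions, per_condition, rng):
    """Pick per_condition files from each condition, never reusing a number."""
    used = set()   # image numbers already handed out
    picks = {}
    for name, files in conditions.items():
        available = [num for num in files if num not in used]
        chosen = rng.sample(available, per_condition)
        used.update(chosen)
        picks[name] = [files[num] for num in chosen]
    return picks

picks = pick_without_reuse(all_conditions, per_condition=2, rng=random.Random(0))
```

Image 1 sits in both conditions, but because its number goes into `used` the first time it is picked, it can never turn up twice in the same run.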
Once that's done you can just write out a template file. Just have your perl program use a standardized naming scheme (or pass it as a parameter) and you can have your actions and things kept separate.
You don't even need to randomize across the conditions because you can use the template option to mix em up at that point.
Hope this gives you some ideas.
posted by phearlez at 12:11 PM on April 16, 2009
Response by poster: a robot made out of meat, thanks for the tip. I have seen the unloading "feature" that Presentation has. This, however, renders the software useless as I lose millisecond accuracy (the only reason I'm willing to shell out $300 a year to use this application). It does look like getting familiar with Perl or Python might help me out.
Definitely agree that the randomisation will have to be sophisticated enough to not reuse stimuli in some silly way. My initial attempt at this experiment failed because the randomisation procedure built into Presentation shuffled all of the stimuli and selected from them, rather than the arrays, so I was left with some arrays whose stimuli were used 12 times, while others had their stimuli used 40 times in the one run.
roystgnr, my weakness is that I'm not a programmer, but have to pretend to be one when it comes to things like this. Looks like Perl or Python are the way to go.
Kid Charlemagne, splitting the list of stimuli might bring in sequence effects when the actual experiment is run, so any randomisation procedure is preferable. My main sticking point was that each individual bitmap sits in four different categories (say "kittens", "puppies", "ducklings" and "infants"). I think I might have to get my head around working with the arrays and not the actual image files (if I have 3456 image files and they take part in 4 categories, that is 13,824 items!).
chrisinseoul, looks like MeFi has spoken, the way to go is with pre-randomised lists of stimuli. I wish my stimuli were as sexy as Rorschachs (if you want an idea of what one of my stims looks like, take a look at this sample).
odinsdream, yeah, no way in hell I'll be using powerpoint (sorry chrisinseoul).
fake, I am seriously considering it, but I'm also at that frustrating stage with this project that I need some results now and don't have the time to invest in Matlab. I am definitely going to look into Matlab and Psychtoolbox/Cogent for all future experiments.
phearlez, thanks for the example. I think I'll give Perl a quick whirl and see what happens. This is one of the many times I wish I was more familiar with any programming language. I'll probably have to move onto using a hash for the filenames.
Thanks for all your help guys.
posted by doctor.dan at 3:42 PM on April 16, 2009
Do you need millisecond accuracy for the whole hour? Perhaps you could unload / load in chunks at defined breaks.
posted by a robot made out of meat at 6:31 PM on April 16, 2009
This thread is closed to new comments.
Dear Ask.MeFi users, how do you go about randomising something when you have more rules and restrictions than a convent?
posted by doctor.dan at 5:52 AM on April 16, 2009