significantly negative about the education I got in statistics
September 28, 2009 6:03 AM

Is a contingency table the right thing for this kind of data (detailed within), and either way, how do I analyze it in Prism 5.0?

The experiment includes at least two different treatments, sometimes more. There are always 10 experimental samples per run. Each treatment is replicated three times, for a total of 30 experimental samples per treatment. Each experimental sample can have one of three outcomes: positive, negative, or inconclusive (if necessary, we could count the inconclusive as "negative"). The question my stats need to answer: does one treatment lead to significantly more positive (or more negative) samples than another?

Any recommended texts on the subject would be greatly appreciated -- I'm going to need to have sound documentation for how I'm doing this at some point, although for the time being, at least understanding would be a great help.

If it helps any, these experiments are sterilization testing (think autoclave or bleach, not tubal ligation...) and involve treatments that can achieve: total inactivation/sterilization (no positives), fractional inactivation (some positives, some negatives), or what I would view as probable lack of inactivation (all positives). Most of the results I am seeing now involve two treatments with fractional inactivation, and I am trying to get a handle on whether there is a way to say that one treatment resulting in fractional inactivation is better at inactivating than another treatment resulting in fractional inactivation. Coworkers and boss want to think that more negatives = better inactivation, but when two treatments have 4/30 positives versus 10/30 positives, I am really not sure how to calculate whether that 4/30 is significantly more inactivated than 10/30.

I've searched MetaFilter for info on contingency tables, searched Google for the same, and tried to figure out whether I should be using some other type of analysis... I am more confused than when I began. It seems like the examples I find involve more complex data than what I am working with...
posted by Tandem Affinity to Science & Nature (6 answers total)
 
Best answer: A standard test of "are these rows the same?" would be one of the contingency table statistics, like Fisher's exact test.
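As a rough illustration of what Fisher's exact test does with the 4/30 versus 10/30 example from the question, here is a minimal Python sketch. The function name `fisher_exact_two_sided` is made up for this example, and the two-sided rule used (sum the probabilities of every table no more likely than the observed one) is one common convention; other software may define the two-sided p-value differently.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are treatments, columns are outcomes (positive, negative).
    Hypothetical helper written for this illustration only.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x positives in row 1, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum over every table that is no more likely than the observed one.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)

# Treatment A: 4 positive, 26 negative; Treatment B: 10 positive, 20 negative.
p_value = fisher_exact_two_sided(4, 26, 10, 20)
print(f"two-sided p = {p_value:.4f}")
```

Like the test itself, this treats all 30 samples per treatment as independent, which may not hold if samples processed together in one batch are related.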

If you want to say "X is better" with just a table, stats like the relative risk of a positive are the most descriptive (and come with tests and CIs built into Prism). You can apply both cutpoints (that is, both ways of collapsing the three outcomes into two) and show the probability of being positive for each treatment.
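To make the relative risk concrete, here is a small sketch using the 4/30 versus 10/30 counts from the question. The confidence interval uses the usual log-scale normal approximation; that choice, and the function name `relative_risk`, are assumptions for illustration, and Prism may compute its interval differently. The formula breaks down if either group has zero positives.

```python
from math import exp, log, sqrt

def relative_risk(pos1, n1, pos2, n2, z=1.96):
    """Relative risk of a positive (group 1 vs group 2) with an approximate 95% CI.

    Hypothetical helper; requires pos1 > 0 and pos2 > 0.
    """
    rr = (pos1 / n1) / (pos2 / n2)
    # Standard error of log(RR) under the normal approximation.
    se_log_rr = sqrt(1 / pos1 - 1 / n1 + 1 / pos2 - 1 / n2)
    lower = exp(log(rr) - z * se_log_rr)
    upper = exp(log(rr) + z * se_log_rr)
    return rr, lower, upper

rr, lower, upper = relative_risk(4, 30, 10, 30)
print(f"RR = {rr:.2f}, approx. 95% CI {lower:.2f} to {upper:.2f}")
```

An interval that includes 1 means the data are compatible with no difference between the two treatments.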

If you want to keep it in 3 categories (although it seems like you can't be partially sterile), try Spearman correlation with the treatment, or ordered logistic regression. Spearman correlation is in Prism, but I've never used it. If you have other covariates that you want to build in (say, the humidity was higher on the day when X was done), then you should use ordered logistic regression. That's not in Prism.

Another thing to think about is if there are any other sources of commonality between your samples. Were the treatment X's all done on the same day? Were 10/30 all done together? Were 10/30 of each done together?
posted by a robot made out of meat at 8:17 AM on September 28, 2009


Oh, and you're right, in a way: randomized experiments are some of the simplest data structures possible, assuming that there aren't any other sources of variance (for example, if the order that you did them in matters). There should be a section on cross-tabs in your intro stat book of choice. If not, the Stata manual for "tabulate 2-way" might be a good place to start.
posted by a robot made out of meat at 8:19 AM on September 28, 2009


Response by poster: Thanks for the help and confirmation that the contingency table is the way to go. When building the table, how would I label the rows and columns? To look at variability between groups under the same treatment conditions, I could see labeling the rows with the treatment and having each column contain the number of positives from one group. How would I go about comparing two different treatment groups, though?

The 30 samples are split into three groups of 10 because the machine that sterilizes can only process 10 at a time. So the 10 are run at different times. There could be other variables that are not immediately obvious, even though we do try to keep everything the same...

Thanks again for your help. I guess I'm just still a little hung up on the nitty-gritty of setting up the tables...
posted by Tandem Affinity at 8:51 AM on September 28, 2009


Response by poster: P.S. Upon trying to use the contingency table feature in Prism, I find two problems:
a. It won't let me select Fisher's test for analysis of a contingency table, only a chi-square test, even though its own help table specifically recommends using Fisher's whenever possible.
b. "1" is not an acceptable value for a chi-square analysis. But "1" is actually the value I have in many cases. What can be done? Is it acceptable to use percentages?
posted by Tandem Affinity at 9:14 AM on September 28, 2009


Best answer: Sadly, I have never used Prism. Do you mean a cell which has only 1/10 positive entry in it? Or which could only have 1 entry in it? For the chi-square approximation to be reliable, you want an expected count of at least 5 in each cell of a 2x2 table. You should (assuming independence) be using Fisher's exact test, but until proven otherwise I absolutely believe that the samples in the same batch are related.

What format is the data in? Could you make a text file like

Tx , Batch, Result=1, Result=2
1, 1, 3, 4
1, 2, 2, 7
1, 3, 1, 4
2, 1, 0, 2

etc., where Result=1 is sterile and Result=2 is "kinda sterile"?
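For what it's worth, a file in that layout is easy to read back and total up per treatment. The sketch below uses only the four illustrative rows shown above, held in a string rather than a real file; the variable names are made up for the example.

```python
import csv
import io
from collections import defaultdict

# The illustrative rows from the layout above, inlined instead of read from disk.
raw = """Tx , Batch, Result=1, Result=2
1, 1, 3, 4
1, 2, 2, 7
1, 3, 1, 4
2, 1, 0, 2
"""

reader = csv.reader(io.StringIO(raw))
header = [name.strip() for name in next(reader)]  # skip and tidy the header row

# totals[treatment] = [count with Result=1, count with Result=2]
totals = defaultdict(lambda: [0, 0])
for row in reader:
    tx, batch, result1, result2 = (int(value) for value in row)
    totals[tx][0] += result1
    totals[tx][1] += result2

for tx in sorted(totals):
    print(f"Treatment {tx}: Result=1 -> {totals[tx][0]}, Result=2 -> {totals[tx][1]}")
```

Keeping the batch column in the file, rather than only the per-treatment totals, leaves the door open to checking whether batches within a treatment differ from each other.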
posted by a robot made out of meat at 12:23 PM on September 28, 2009


Response by poster: Thanks for helping, especially when I keep carping on Prism! What I have figured out is that Fisher's test is only possible on a 2x2 table? Because that works fine in Prism, and I have found the same results using some online calculators. If you don't use Prism, would you recommend any of the online calculators? The simpler the better, of course...
For any other configuration in Prism (and I guess anywhere), I can only do chi-squared. Here is how I am setting up tables:


                Treatment 1   Treatment 2   Total
Positive             11            16         27
Negative             29            24         53
Column totals        40            40         80


where the experiments included 4 sets of 10 for each of two treatments and two possible outcomes for each sample. The Fisher's analysis I got back was:

P value: 0.3444
P value summary: ns
One- or two-sided: Two-sided
Statistically significant? (alpha<0.05): No

What do you think of my table setup? "Positive" would be samples that don't get sterilized; "negative" are samples that do... Since the p value is not significant, I guess there's no correlation between my treatments and the likelihood of whether the samples are sterilized or not.
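As a cross-check on this table, the expected counts and the chi-square test (with Yates' continuity correction, a common choice for 2x2 tables) can be worked out by hand. The sketch below is an illustration under that assumption, not necessarily what Prism computes; for one degree of freedom the chi-square tail probability equals `erfc(sqrt(chi2 / 2))`.

```python
from math import erfc, sqrt

# Rows: Positive, Negative. Columns: Treatment 1, Treatment 2.
table = [[11, 16],
         [29, 24]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Expected count in each cell if treatment and outcome were unrelated.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square statistic with Yates' continuity correction (2x2 table, 1 df).
chi2 = sum(
    (abs(table[i][j] - expected[i][j]) - 0.5) ** 2 / expected[i][j]
    for i in range(2)
    for j in range(2)
)
p_value = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df

print("expected counts:", expected)
print(f"chi-square = {chi2:.4f}, p = {p_value:.4f}")
```

Every expected count here is well above 5, so the chi-square approximation and Fisher's test should give similar answers for this table.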

As for my previous problem of having "1" as a result for some samples, I guess Prism will actually do the calculation -- it just gives me a stern warning... Also, I now understand about the "binning" manipulation that can be done to avoid having only a count of 1. Still difficult to avoid in some cases...I hope there's not some better analysis I could be doing to deal with getting just 1 positive....
posted by Tandem Affinity at 10:49 AM on September 29, 2009

