# Biostatistics filter

December 18, 2012 9:29 AM

What statistical test should I use to determine whether there is a significant difference in the percent change in the presence of bacterial species observed among five groups before and after treatment?

I performed counts pre- and post-treatment for each group:

| | Grp1 | Grp2 | Grp3 | Grp4 | Grp5 |
|---|---|---|---|---|---|
| Pre-treatment | 1329 | 1236 | 1346 | 1252 | 1213 |
| Post-treatment | 697 | 768 | 891 | 928 | 1045 |
| % Change | -47.6% | -37.9% | -33.8% | -25.9% | -13.8% |

Each group was screened by chip analysis for the same set of bacteria (~20,000 species); the values in the pre-treatment and post-treatment rows are the numbers of positives.

So, I would like to know whether the changes observed among these groups are different.

I use Excel and StatView, but suggestions outside of these programs might also be helpful, for example an online calculator I can use.

Response by poster: But an ANOVA requires a continuous variable for each group, an array. I have binary (yes/no) data that creates a frequency. I was thinking more along the lines of chi-square but for 5 groups rather than 2? Or is there a type of ANOVA that deals with frequencies rather than an array?

posted by waving at 10:54 AM on December 18, 2012

Ah, I didn't realize the data was binary but then that may be from my misreading ("number of positives"). Then no, you can't use ANOVA.

My knowledge of chi-square is limited so I'll let someone else answer that question.

posted by Young Kullervo at 11:09 AM on December 18, 2012

I've never done this type of analysis before and frankly know precious little about it, but you might want to look into using a generalized estimating equation analysis. I know GEEs are supposed to deal with correlated data (e.g., repeated-measurements) and can model binary outcomes. Here is the SPSS helpfile, and here's a more statistic-y (scientific!) article on the subject meant to introduce the technique to people who aren't statisticians. Not sure if this is overkill?

posted by Keter at 12:47 PM on December 18, 2012

Best answer: I'm not sure that ANOVA strictly requires that *both* variables be continuous -- that is, I think that it's sort of defined as determining the "probability that two sets of data are part of two different distributions, rather than one overlapping one," which can be a question along a single axis.

Specifically, I used one in my Ph.D. work many moons ago when looking at a bunch of electrical recordings from mice expressing one of two genes, and the question was whether the recordings were statistically different. So the x axis was pretty much binary, but the y axis was a cloud (or two clouds). A meaningful value for probability could be obtained. I no longer recall what tweaks to the ANOVA were required to apply it to my data, but I imagine you can track it down. (If nothing else, Memail me and I'll try to hunt up a reference by the guy I worked with on that part of my thesis.)

A bigger problem is whether you have enough data points to really speak in terms of one or two distributions. I think you would want to plot every individual, rather than the combined groups, in which case you might be ok.

posted by acm at 1:16 PM on December 18, 2012

Clarification question: So you're asking whether the relative proportions of the various groups are different pre- and post-treatment?

In other words, the null hypothesis would be that the pre- and post-treatment populations appear to have the same proportions of groups.

posted by Mercaptan at 1:58 PM on December 18, 2012 [1 favorite]

If I understand your design correctly, you'll need repetitions of this design in order to say anything meaningful. Presumably, if you did this experiment multiple times, these proportions would change. It's that variation - each group around its mean percentage - that is used in a statistical test of significance.

If the appearance/disappearance of a bacterial species in a group were completely independent of the appearance/disappearance of another species, you could do a test with the data you have, but that doesn't seem like the case here. Can you describe the design/data in more detail? What are the "groups"?

posted by Philosopher Dirtbike at 2:55 PM on December 18, 2012 [1 favorite]

All scientific measurements should have error bars; I'm not sure you can make good sense of these numbers without first knowing your measurement errors.

But a general tip: there are many different ways to perform statistical testing, and you should use whatever tests are preferred in your field. What do you see people doing in the literature for your sub-field?

posted by Tooty McTootsalot at 7:55 PM on December 18, 2012

I agree with Philosopher Dirtbike. You might be able to treat killing each species as an independent Bernoulli trial. However, if you want to publish this in a journal, I would recommend finding a real-life biostatistician and chatting with them to make a model that accurately reflects your beliefs and goals.

If you're just in the market for some casual consulting, then I think estimating a model where all groups have the same species elimination rate and getting a p value from that would work.

Do Messrs. Dirtbike and Mercaptan concur with the following test?

H0 is

Group1 ~ Binomial(1329, theta)

Group2 ~ Binomial(1236, theta)

Group3 ~ Binomial(1346, theta)

Group4 ~ Binomial(1252, theta)

Group5 ~ Binomial(1213, theta)

H1 is

Group1 ~ Binomial(1329, theta1)

Group2 ~ Binomial(1236, theta2)

Group3 ~ Binomial(1346, theta3)

Group4 ~ Binomial(1252, theta4)

Group5 ~ Binomial(1213, theta5)

Also, if Dirtbike's approach is justified, my eyeballs tell me your groups are certainly different.

posted by zscore at 10:13 PM on December 18, 2012 [1 favorite]
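
One way to compare the H0 and H1 models above is a likelihood-ratio test. The following is only a sketch under strong assumptions: it treats every post-treatment positive as a "survivor" among the pre-treatment positives (so theta is a survival rate), and it treats species as independent Bernoulli trials, which is disputed elsewhere in this thread. The variable names are illustrative, and the closed-form tail probability used here is valid only for 4 degrees of freedom.

```python
import math

# Counts copied from the table in the question.
pre = [1329, 1236, 1346, 1252, 1213]   # positives before treatment (n_i)
post = [697, 768, 891, 928, 1045]      # positives after treatment (y_i)


def binom_loglik(y, n, theta):
    """Binomial log-likelihood, omitting the constant that cancels in the ratio."""
    return y * math.log(theta) + (n - y) * math.log(1.0 - theta)


# H0: one shared theta, estimated by pooling all groups.
theta_pooled = sum(post) / sum(pre)
ll_h0 = sum(binom_loglik(y, n, theta_pooled) for y, n in zip(post, pre))

# H1: a separate theta_i = y_i / n_i for each group.
ll_h1 = sum(binom_loglik(y, n, y / n) for y, n in zip(post, pre))

# Likelihood-ratio statistic; approximately chi-square with 5 - 1 = 4 df under H0.
g_stat = 2.0 * (ll_h1 - ll_h0)
df = len(pre) - 1

# For df = 4 the chi-square upper tail is exp(-x/2) * (1 + x/2).
p_value = math.exp(-g_stat / 2.0) * (1.0 + g_stat / 2.0)

print(f"pooled theta = {theta_pooled:.4f}")
print(f"G = {g_stat:.1f} on {df} df, p = {p_value:.3g}")
```

If the independence assumption fails, the statistic is inflated and the p-value is too small, so a result from this sketch is at best an optimistic bound.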

If you only have one data point for your before and after condition, you can't really say whether it's statistically significant or not. If you want to know whether the trend you're observing is real (and not just a chance fluctuation in the data or error in measurement) you need to make multiple measurements or, better still, run multiple replicate experiments. If you do that and show similar drops in all replicates (but not in your control which got a dummy treatment) then you can say that your results are statistically significant.

There is also clinical significance. If every time you look at your treatment and Grp. 1 you see a similar decrease, then yes, that's statistically significant. If your post treatment value remains about 700, but 25 is enough of this bacteria to kill you dead then your 50%ish reduction isn't clinically significant because it doesn't do you any good.

Clinically significant might not be the exact right term depending on what you're doing.

posted by Kid Charlemagne at 3:16 AM on December 19, 2012

Best answer: If you can find a copy of Andy Field's book Discovering Statistics Using R (or the SPSS edition, but the R book is more recent and improved), there is a flowchart on the inside back cover that is absolutely brilliant and will show you exactly which test you need to run for any set of data.

posted by iamkimiam at 3:31 AM on December 19, 2012

(sorry for my sort of lame answer here...I commented without refreshing the page (it said "no new updates"!) and there was only one comment at the time. Upon posting mine, I see there were actually several.)

posted by iamkimiam at 3:33 AM on December 19, 2012

Response by poster: Thanks for your suggestions! Some clarification:

- The study is an environmental survey in which we took swab samples at five locations in a room before and after it was treated with an anti-microbial agent. Each group is a separate swab representing a different location and swabbed material (so I would not necessarily expect them to show the same level of DNA elimination, given previous work showing that material affects killing efficacy).

- These data were generated from a chip analysis of a universal bacterial gene (16S). There are ~20,000 bacterial species represented from 60,0000 probes on the chip.

- All groups have been normalized against a control swab that should not have contained DNA.

The null hypothesis is that the five groups will show the same DNA species present after treatment.

Since this is a treatment outcome study, I'm thinking chi-square. It's not clear to me how one would treat every data point separately in a treatment outcome study, rather than comparing groups.

posted by waving at 3:44 AM on December 19, 2012

Best answer: I haven't been able to find much on analyzing an ANOVA with a binary outcome, but what about McNemar's test? It seems similar to Chi-Square and I think suitable for pretest/posttest data, but you'll have to determine if it suits your hypotheses.

posted by Young Kullervo at 8:27 AM on December 19, 2012
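
For reference, McNemar's test works on paired per-species outcomes within a single group: it uses only the discordant counts (species positive before and negative after, and the reverse). The totals posted in the question do not contain those counts, so the numbers below are HYPOTHETICAL and purely illustrative. Note also that this compares pre against post within one group; by itself it does not compare the five groups with each other. A minimal sketch, without continuity correction:

```python
import math


def mcnemar(b, c):
    """McNemar chi-square (no continuity correction) from discordant pair counts.

    b: species positive before treatment and negative after
    c: species negative before treatment and positive after
    """
    stat = (b - c) ** 2 / (b + c)
    # Chi-square upper tail for 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# HYPOTHETICAL discordant counts for one group, for illustration only;
# the thread reports only totals, not per-species pairs.
b_lost, c_gained = 40, 10
stat, p = mcnemar(b_lost, c_gained)
print(f"McNemar chi-square = {stat:.2f}, p = {p:.3g}")
```

The same independence caveat applies here: the test treats each species as an independent pair.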

Reading your extended description convinces me that you have N=1 in each group. In order to have a test of any sort, you need to repeat the experiment multiple times to get replicates in each cell of your design. Treating each species as an independent observation (as would be required by a binomial/chi square test) would be unjustified.

posted by Philosopher Dirtbike at 8:51 AM on December 19, 2012

Response by poster: I tend to agree with Young Kullervo and came up with the McNemar's test too, but need to look into it more as I couldn't find a calculator for it.

Philosopher Dirtbike: I get what you're saying, but there are a couple of things to keep in mind. Each group is considered a population, and within that population there are subpopulations (subtaxa). For major groups, like phyla, there are thousands of taxa represented, so in essence there are thousands of replicates within each group. When doing a population study, or let's say a treatment study on humans, one uses an appropriately large number of individuals and observes their outcomes. Replications for each individual are not necessary. Error bars are not possible when evaluating binary data on a population set.

posted by waving at 1:01 PM on December 19, 2012

*Error bars are not possible when evaluating binary data on a population set.*

Sure they are. y/N has a standard error (sqrt[y/N *(1-y/N)/N]), but you have to assume independence to get it. The point here is that observations in a particular location are stochastically dependent - that is, even assuming that you know the true probability of a species dying after the treatment, knowing that one species appeared in a location would change your bet about whether some other species would appear. That dependence structure is unknown to you, but it is there nonetheless. What that means is that your effective sample size is less than your actual sample size; how much less, you don't know.

Think about it this way: in your example about human populations, what would you say to a researcher who did a survey of 50 people in a particular province, but brought all the people to one place to do it at the same time, where the participants talked about the survey before they did it, they all did the survey in the same room, with the same questioner, etc.? Compare that to a situation where you have the same 50 people, but you visit each one of their houses separately. In the first situation, you've induced dependencies in the answers that are not of interest: dependencies on the questioner, dependencies on other people, etc. You've got 50 people, but you might only have 10 people's worth of data when all is said and done (if the dependence is absolute, all people give the same answer and the effective N=1).

Likewise, in your design, there are dependencies in each location that you don't know. Those dependencies are quite apart from the dependency of interest, that is, your treatment. To get around this problem, we do repetitions, while counterbalancing aspects of the design (location, in your case). As long as the repetitions are independent (or you know the dependence structure) then you're fine.

BUT, if you wish to treat all observations as independent (and discounting the possibility of new species appearing after the treatment, so that all species after treatment are "survivals") then you can do a Chi-square test. Your table will be 2 by 5, with one row corresponding to the number of deaths, and one row to the number of survivals. The columns are group. Any software you like will do that Chi-square test for you (however, my advice reading your description would be that this is wrong, due to the lack of independence).

posted by Philosopher Dirtbike at 1:35 PM on December 19, 2012 [1 favorite]
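
The 2 by 5 chi-square test described above can be sketched as follows. This assumes exactly what the comment warns about: that all observations are independent and that no new species appear after treatment, so "deaths" are simply pre minus post. The closed-form p-value is specific to 4 degrees of freedom; general-purpose software would compute it for any table.

```python
import math

# Counts copied from the table in the question.
pre = [1329, 1236, 1346, 1252, 1213]
post = [697, 768, 891, 928, 1045]

# Build the 2 x 5 table: one row of deaths, one row of survivals.
survivals = post
deaths = [n - y for n, y in zip(pre, post)]
table = [deaths, survivals]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)

# Pearson chi-square: sum of (observed - expected)^2 / expected over all cells.
chi2 = 0.0
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi2 += (observed - expected) ** 2 / expected

df = (len(table) - 1) * (len(col_totals) - 1)  # (2 - 1) * (5 - 1) = 4

# Chi-square upper tail for df = 4 only: exp(-x/2) * (1 + x/2).
p_value = math.exp(-chi2 / 2.0) * (1.0 + chi2 / 2.0)

print(f"chi-square = {chi2:.1f} on {df} df, p = {p_value:.3g}")
```

With a smaller effective sample size than the nominal counts, the true evidence would be weaker than this statistic suggests, and by an unknown amount.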

Response by poster: In biological systems there are generally dependencies, but that does not preclude viewing each outcome as independent. When looking at any one part or effect, there is a subsystem in place, such as genetics, nutrition, whatever. There is always going to be some level of co-dependence.

When a drug study is undertaken, people all have the same disease, more or less; they all come to the clinic to get treatment, and blood is drawn. They have congregated for a specific purpose due to circumstances that are likely similar (dependency). They are monitored and given a 1/0 outcome (simplistically). Thousands, hundreds, or quite often only tens of subjects are in the group, depending on the availability of resources in the study and the frequency of cases in the population. The subjects are not repeatedly treated; however, since there is a significant sample size given the expected outcome based on your power calculation, one can readily apply statistics to individual outcomes and compare the different groups as sub-populations.

The bacteria are really no different, as they are environmental and have had no selection pressure outside of their environment--they are not cultured, for example. The greatest level of independence amongst the groups I am examining is sample site material. The data reject the null hypothesis that sample site is not a factor in sterilization efficacy, and the chi-square will correct for sample size, although my samples are quite large, often over 1,000 data points, and the variation between groups is sufficiently large in some cases.

Given such a large N within each group, standard error will be quite small, as the estimate of the population mean improves. Standard deviation will be unaffected by sample size. In cases like this, St. Error is meaningless. Do you know of an equation to calculate the standard deviation within the groups?! I am not familiar with how that is calculated with binary data.

So, no, I do not agree that lack of independence is an issue here any more than it would be in a drug trial outcome.

Chip results are often not shown as replicate experiments, and I don't think they need to be given the groups are adequately selected. What is more important is that the internal controls are extremely well thought out and data subtraction is done correctly given the circumstances. Do most bacterial population chip results on this scale run replicates in your experience? I have not seen that, but I'm sure it's possible that replicates are performed given a larger budget. This analysis alone cost $15,000!!

Thanks for your feedback!

posted by waving at 7:13 AM on December 20, 2012

*Do you know of an equation to calculate the standard deviation within the groups?! I am not familiar with how that is calculated with binary data.*

In binary data, since the data depend on only one parameter, the mean and variance (or standard deviation, if you like) are redundant: if you know the within-group mean (that is, p-hat, your estimate), you know the variance automatically, since it is p-hat*(1 - p-hat). Just like the mean (p-hat), the standard deviation is computed the same way it is in other contexts. And just as in other contexts, the standard error is the standard deviation divided by sqrt(N), so you can just multiply your standard error by sqrt(N) if you want a within-group standard deviation. See any stat textbook's description of the binomial distribution.

To address the rest of your comment, if your standard errors are so small as to be "meaningless" (I don't know what you mean by "meaningless", in this context; they are perfectly meaningful) then you already have your answer. To the extent that the numerical estimates of the percentages differ, you know that the true means differ, because your estimate is arbitrarily close to the true value. You don't need a further test (unless, as I said, you don't have independence).

posted by Philosopher Dirtbike at 9:08 AM on December 27, 2012
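
As a small worked example of the relationship described above, the sketch below computes p-hat, the within-group standard deviation sqrt(p-hat*(1 - p-hat)), and the standard error sd/sqrt(N) for each group in the question. It assumes independent observations within each group and treats post/pre as the proportion of interest; the labels and variable names are illustrative.

```python
import math

# Counts copied from the table in the question.
labels = ["Grp1", "Grp2", "Grp3", "Grp4", "Grp5"]
pre = [1329, 1236, 1346, 1252, 1213]   # N for each group
post = [697, 768, 891, 928, 1045]      # y for each group

results = []
for label, n, y in zip(labels, pre, post):
    p_hat = y / n                            # estimated proportion still positive
    sd = math.sqrt(p_hat * (1.0 - p_hat))    # within-group standard deviation
    se = sd / math.sqrt(n)                   # standard error of p_hat
    results.append((label, p_hat, sd, se))
    print(f"{label}: p_hat = {p_hat:.4f}, sd = {sd:.4f}, se = {se:.4f}")
```

Multiplying each standard error by sqrt(N) recovers the standard deviation, which depends only on p-hat and not on the sample size.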

This thread is closed to new comments.

posted by Young Kullervo at 10:36 AM on December 18, 2012 [1 favorite]