July 4, 2007 9:37 AM

Graphing question: My boss has an idea of how our data should be presented. Please help me figure out how to make it happen!

We have three biological categories, each with several patients. We'd like to show that the different categories have different ages of onset of disease.

I've drawn an example of what my boss would like the graph to look like.

Basically, the three categories are along the y-axis. The month of onset is along the x-axis. The data points for each category at a particular month are spread out to show the number of points at that intersection (the points could be spread in a cloud instead of a line).

Please help!
posted by nprigoda to Science & Nature (5 answers total)

Sure thing. Create this using an x-y dot graph in Excel. In Office 2007 this is called "Scatter"; in earlier versions I think it's called an X-Y graph. First create the data. One column for X: this holds either 1, 2, 3, 4, or 5. One column for Y: you will space the data in clusters, like 1.1, 1.2, 1.3, then 2.1, 2.2, 2.3. So, for the first column of data on the flickr graph there, the first column would be all "1"s and the second would be 2.1, 2.2, 2.3, 3. Highlight the two columns, click the first button under "Scatter," and you're done. If you want to add the averages, put them in the same way. Then, once you've made the graph, click on the data point for each average and change (edit) its symbol from a circle to a sideways line.
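
If typing the spread-out values by hand gets tedious, the two columns could be generated instead. This is only a rough sketch, not anything Excel requires: the records are made up, `spread_points` and the `step` size are hypothetical names chosen for illustration, and it assumes the month goes in one column and the category position (nudged by a small offset) in the other.

```python
from collections import defaultdict

def spread_points(records, step=0.1):
    """Return (x, y) pairs for a scatter chart.

    x is the onset month; y is the category position (1, 2, 3, ...)
    plus a small offset so repeated points do not sit on top of each other.
    """
    categories = sorted({cat for cat, _ in records})
    position = {cat: i + 1 for i, cat in enumerate(categories)}

    # Count how many points share each (category, month) intersection.
    totals = defaultdict(int)
    for key in records:
        totals[key] += 1

    seen = defaultdict(int)
    points = []
    for cat, month in records:
        n = totals[(cat, month)]
        k = seen[(cat, month)]
        seen[(cat, month)] += 1
        # Centre the cluster on the category line.
        offset = (k - (n - 1) / 2) * step
        points.append((month, round(position[cat] + offset, 3)))
    return points

# Made-up example data: (category, month of onset)
records = [("A", 1), ("A", 1), ("A", 1), ("B", 1), ("B", 2)]
points = spread_points(records)
for x, y in points:
    print(f"{x}\t{y}")  # tab-separated, so it can be pasted into two columns
```

The printed columns can then be highlighted and charted as a scatter in the way described above.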

posted by Eringatang at 10:18 AM on July 4, 2007

Thanks Eringatang, this was the quick and dirty answer I was looking for!

I'll have a look at GraphPad Prism for future graphing!

Thanks again!

posted by nprigoda at 10:31 AM on July 4, 2007

You say in the example that you want little lines for the means of each group. But each cluster represents the number of patients in that category and that month. As I see it, you have only one value - N - for each data point. There is nothing to average to get a mean.

You might want to look at a bubble graph, where the center of the bubble is located at the category and the month, and the size of the bubble is the number of patients N. The graph your boss suggested is going to get very crowded if N gets a bit large.
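
For what it's worth, the numbers behind a bubble graph are just a tally of patients at each category/month intersection. A minimal sketch with made-up records (the variable names are illustrative only):

```python
from collections import Counter

# Made-up example data: (category, month of onset), one entry per patient
records = [("A", 1), ("A", 1), ("A", 1), ("B", 1), ("B", 2), ("B", 2)]

# N for each (category, month) intersection
counts = Counter(records)

# One row per bubble: month, category, and the count N that sets its size
bubbles = [(month, cat, n) for (cat, month), n in sorted(counts.items())]
for month, cat, n in bubbles:
    print(f"{month}\t{cat}\t{n}")
```

Each output row is one bubble. A charting tool would most likely need the category turned into a number (1, 2, 3) before it can be used as an axis position.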

posted by mediaddict at 11:42 AM on July 4, 2007

IA(Almost)AStatistician

First, there are lots of little tricks and possible problems in the analysis of longitudinal data, particularly this kind of longitudinal data. I strongly recommend sitting down for an hour with whatever stat/epi consultants are available to you before you try to publish a result with a statistical interpretation.

Second, if you have to have something close to this, I'd suggest an alteration or two to this plot. It makes more sense to put the dependent variable (onset time) on the y-axis. In addition, these kinds of stacked plots rapidly become very difficult to interpret. I'd suggest a standard vertical box/whisker plot for each category, or a solid shape (i.e., circle or box) sized in proportion to the number at each month.
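
To give a feel for what a box/whisker plot summarizes, here is a small sketch that computes the minimum, quartiles, and maximum of onset month for each category. The data are invented, and `statistics.quantiles` (Python 3.8+) is just one of several quartile conventions, so other software may give slightly different box edges.

```python
import statistics

# Made-up onset months per category
onset_months = {
    "A": [2, 4, 6, 8, 10, 12, 14],
    "B": [1, 3, 3, 5, 7, 9, 11],
}

summaries = {}
for category, months in onset_months.items():
    # Three cut points: lower quartile, median, upper quartile
    q1, median, q3 = statistics.quantiles(months, n=4)
    summaries[category] = (min(months), q1, median, q3, max(months))

for category, summary in summaries.items():
    print(category, summary)
```

The box spans the lower and upper quartiles, with a line at the median. Whiskers here would run to the minimum and maximum, though many packages cut them shorter and draw outliers separately.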

Finally, I think that the most accurate and informative way to display this kind of data would be a set of Kaplan-Meier survival curves.
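
As a rough illustration of what goes into one such curve, here is a bare-bones sketch of the Kaplan-Meier product-limit estimate for a single category. The data are invented, and the `observed` flags are an assumption: the question does not say whether any patients left follow-up before onset (censoring). "Survival" here means the estimated fraction of patients still free of disease at each month. A stats package would do this properly, with confidence intervals.

```python
def kaplan_meier(times, observed):
    """Return [(t, S(t))] at each time where at least one onset was observed.

    times:    month of onset, or of last follow-up for censored patients
    observed: 1 if onset was seen at that time, 0 if the patient was censored
    """
    data = sorted(zip(times, observed))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        leaving = 0
        # Gather everyone whose time equals t.
        while i < len(data) and data[i][0] == t:
            events += data[i][1]
            leaving += 1
            i += 1
        if events:
            # Product-limit step: multiply by the fraction that did not have onset.
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Made-up example: five patients, one censored at month 2
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 1])
for t, s in curve:
    print(t, round(s, 3))
```

Running this once per category and plotting the results as step functions on the same axes gives the set of curves to compare.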

posted by a robot made out of meat at 11:48 AM on July 4, 2007

posted by porpoise at 10:06 AM on July 4, 2007