

[R] you experienced with plot() and barplot() ?
April 9, 2009 4:39 PM

How do I graph these data using R's plot() and barplot() commands?

I have data that I need to graph using R. Unfortunately, the examples I have found in books and on the web are too simplistic and do not give me much insight into the process. I have never used R before, and have been trying for hours and hours to figure out how to do what I need to do. I also already read the previous posts about R here on Ask MeFi, but I am still stumped.

I've made a fake set of data that represents the structure of my real data (which you don't want to see, it's too confusing).

Here is my sample data.

Take a quick look at that.

There is a main factor/category, Factor1 with values of "BOB" and "JILL", and a grouping within that called SubFactor with values "Q" and "P". I also have blocked reps of everything, labeled A, B, C, D. My measurements are Measurement1, Measurement2, Measurement3, etc which I will abbreviate as M1, M2, M3, etc.

To cut to the chase, I made a few hand drawings of the graphs I need. Please note that the graphs are illustrative only, and do not correspond to the data above.

Please walk me through this step-by-step and assume I know absolutely nothing about how to get my data into R, how to manipulate it as needed, and finally, how to produce the following graphs.

ALSO if any of you R graphing experts want to take me under your wing and help me learn more than I have outlined here (surely I will have future questions), please send me a MeMail.

----------

GRAPH 1...

In the first, I want to make a bar graph of just the average of the M1 measurements for each treatment, split into categories "BOB" and "JILL" with the grouped "P" and "Q" subfactors.

So that we're on the same page here, the mean of BOB(Q) for M1 = (100+500+200+300)/4 = 275, for example.

Here is my drawing for this: Bar Graph 1

----------

GRAPH 2...

In the second, I want a plot of the measurements for "BOB", with M1, M2, M3, M4, and M5 on the X-axis. The lines will represent the "P" and "Q" subfactors.

Here is my drawing for this: Plot 1

----------

GRAPH 3...

In another bar graph, I want to have M1, M2, M3, M4, M5 be the bars, and have them grouped by BOB (Q, P) and JILL (Q, P) on the x-axis. The Y-axis would show the means.

Here is my drawing for this: Bar Graph 2

----------

GRAPH 4...

In the final plot, I want to graph only the mean measurements for "BOB" sub-level "P" over time. Let's say that each M1, M2, M3, M4, M5 was taken at unique times T1, T2, T3, T4, T5 where the times could have values T1=15, T2=45, T3=55, T4=65, T5=70 and they need to be spaced properly on a timeline (i.e. the distance between T1 and T2 should be 30 units, T2 and T3 10 units, T3 and T4 10 units, T4 and T5 5 units) -- I spaced them evenly on the graph, so ignore that.

Here is my drawing for this: Plot 2

----------


THANKS!!!
posted by bengarland to Education (6 answers total) 2 users marked this as a favorite
 
A good book for learning graphing in R is A First Course in Statistical Programming in R.

If you get stuck on a command, type ?cmd for help. For example, ?rect will tell you about drawing rectangles, and ?lines for drawing lines.

I'll do the first graph to help you get started. The rest of your graphs are variations on a theme or can be modified fairly easily from your initial case:

outputFile <- "barGraph1.png"
outputFileWidth <- 6
outputFileHeight <- 6
resolution <- 150

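# Assumes a tab-separated file with a header row and eight columns (rep, two factor columns, M1..M5);
# adjust col.names and colClasses if your file differs. rep is a letter (A-D), so it is read as character.
samples <- read.table("sample_data.txt", header=TRUE, sep="\t", fill=TRUE, col.names=c("rep", "factor", "subfactor", "m1", "m2", "m3", "m4", "m5"), colClasses=c("character", "character", "character", rep("numeric", 5)))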

# set up subsets
# use the vectorized & here; && does not compare row by row
bobPSubset <- subset(samples, factor == 'BOB' & subfactor == 'P')
bobQSubset <- subset(samples, factor == 'BOB' & subfactor == 'Q')
jillPSubset <- subset(samples, factor == 'JILL' & subfactor == 'P')
jillQSubset <- subset(samples, factor == 'JILL' & subfactor == 'Q')

# calculate values
bobPMean <- mean(bobPSubset$m1)
bobQMean <- mean(bobQSubset$m1)
jillPMean <- mean(jillPSubset$m1)
jillQMean <- mean(jillQSubset$m1)
bobMax <- max(bobPMean, bobQMean)
jillMax <- max(jillPMean, jillQMean)

# set up the dimensions of the plot
xmin <- 0
xmax <- ifelse(bobMax > jillMax, bobMax, jillMax)
ymin <- 0
ymax <- 5
boxWidth <- 0.4;

# plot the graph
bitmap(file=outputFile, type="png256", width=outputFileWidth, height=outputFileHeight, res=resolution)
par(mar=c(1,1,1,1))  # margins are set through par(), not passed to plot()
samplePlot <- plot(range(xmin, xmax), range(ymin, ymax), type="n", axes=FALSE)
bobPRect <- rect(1-boxWidth, 0, 1+boxWidth, bobPMean, col="red", border=NA)
bobQRect <- rect(2-boxWidth, 0, 2+boxWidth, bobQMean, col="red", border=NA)
jillPRect <- rect(3-boxWidth, 0, 3+boxWidth, jillPMean, col="red", border=NA)
jillQRect <- rect(4-boxWidth, 0, 4+boxWidth, jillQMean, col="red", border=NA)
horizLabel <- text(seq(1:4), 0-(xmax * 0.05), labels=c("BobP", "BobQ", "JillP", "JillQ"), adj=0.5)
dev.off()

posted by Blazecock Pileon at 5:29 PM on April 9, 2009


I mixed up x and y axes. Here's a correction:

# set up the dimensions of the plot
xmin <- 0
xmax <- 5
ymin <- 0
ymax <- ifelse(bobMax > jillMax, bobMax, jillMax)
boxWidth <- 0.4;

# plot the graph
bitmap(file=outputFile, type="png256", width=outputFileWidth, height=outputFileHeight, res=resolution)
par(mar=c(1,1,1,1))  # margins are set through par(), not passed to plot()
samplePlot <- plot(range(xmin, xmax), range(ymin, ymax), type="n", axes=FALSE)
backgroundRect <- rect(xmin, ymin, xmax, ymax, col=NA, border="black", lwd=1)
bobPRect <- rect(1-boxWidth, 0, 1+boxWidth, bobPMean, col="red", border=NA)
bobQRect <- rect(2-boxWidth, 0, 2+boxWidth, bobQMean, col="red", border=NA)
jillPRect <- rect(3-boxWidth, 0, 3+boxWidth, jillPMean, col="red", border=NA)
jillQRect <- rect(4-boxWidth, 0, 4+boxWidth, jillQMean, col="red", border=NA)
# xpd=TRUE lets the labels draw outside the plot region instead of being clipped
horizLabel <- text(1:4, 0-(ymax * 0.05), labels=c("BobP", "BobQ", "JillP", "JillQ"), adj=0.5, xpd=TRUE)
vertLabel <- text(0-(xmax*0.05), seq(0, ymax, 200), labels=seq(0, ymax, 200), adj=1, xpd=TRUE)
dev.off()
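
Since the question asked about barplot(), here's a rough sketch of the same chart done that way, with tapply() computing the means. The data frame is made up: only the BOB/Q values (100, 500, 200, 300) come from the question, everything else is invented, and the column names are just the ones I chose in read.table() above.

```r
# Made-up data in the same shape as the question's sample file.
# Only the BOB/Q m1 values (100, 500, 200, 300) come from the question;
# every other number is invented for illustration.
samples <- data.frame(
  rep       = rep(c("A", "B", "C", "D"), times = 4),
  factor    = rep(c("BOB", "JILL"), each = 8),
  subfactor = rep(rep(c("Q", "P"), each = 4), times = 2),
  m1        = c(100, 500, 200, 300,   # BOB Q (from the question)
                150, 250, 350, 450,   # BOB P (invented)
                400, 600, 500, 700,   # JILL Q (invented)
                200, 300, 100, 400)   # JILL P (invented)
)

# Mean of m1 for every subfactor/factor combination.
# Rows of the result are subfactors (P, Q); columns are factors (BOB, JILL).
m1Means <- tapply(samples$m1, list(samples$subfactor, samples$factor), mean)

# beside=TRUE draws one cluster of bars per column (BOB, JILL),
# with one bar per row (P, Q) inside each cluster.
barplot(m1Means, beside = TRUE, col = c("grey30", "grey70"),
        ylab = "Mean of M1", legend.text = rownames(m1Means))
```

Run at the R prompt, this draws to the screen; wrap the barplot() call in bitmap(...) and dev.off() as above to write a file instead.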

posted by Blazecock Pileon at 5:37 PM on April 9, 2009


I think that it would take 40 minutes or so for me to write down the code for your plots (R graphics can be a frustrating enterprise, might take longer). What's above is a good start. If you're looking for a hint on something, Kickstart might be good, but there are lots of R graphics guides out there. The R tag has lots of hints. Your keys will be:

plot(..., type="n") is used above because the built-in plots for most things aren't what you want (or are ugly). Some might be OK (graphs 2 and 4 are standard line plots, type="l" or "b", once you have the data wrangled).

mtext - put text in the margins
text - put text in the main field
axis - draw an axis
legend - draw a legend
points - draw points
lines / segments - draw lines
arrows - draw arrows (e.g. error bars)
rect - draw rectangles
polygon - draw polygons
symbols - draw things like circles, stars, bars
?par - learn about options
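
As a rough sketch of how a few of these fit together for the timeline graph (graph 4): the times are the ones from the question, but the means are invented placeholders, and bobPAvg is just a name I made up.

```r
# Times from the question (graph 4); the means are invented placeholders.
times   <- c(T1 = 15, T2 = 45, T3 = 55, T4 = 65, T5 = 70)
bobPAvg <- c(300, 420, 380, 510, 460)   # hypothetical means of M1..M5 for BOB/P

# Empty canvas. Using the real times as x coordinates is what spaces
# the points correctly (gaps of 30, 10, 10 and 5 units).
plot(range(times), range(0, bobPAvg), type = "n", axes = FALSE,
     xlab = "", ylab = "")
lines(times, bobPAvg)                        # connect the means
points(times, bobPAvg, pch = 19)             # mark each measurement
axis(1, at = times, labels = names(times))   # ticks at the actual times
axis(2, las = 1)                             # y axis with horizontal labels
mtext("Time", side = 1, line = 2.5)          # axis titles as margin text
mtext("Mean for BOB / P", side = 2, line = 3)
box()                                        # frame around the plot region
```

If two ticks end up too close together, R may skip one of the labels rather than overprint them.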
posted by a robot made out of meat at 6:54 PM on April 9, 2009 [1 favorite]


Thanks for the hints so far. This all really helps me out a lot and points me in the right direction. Of course, more answers are very much appreciated!!! Especially regarding the timeline graph.
posted by bengarland at 7:12 PM on April 9, 2009


I've produced code for all of these graphs here.

As far as I am concerned, the first rule of R graphics is: do not use the legacy graphics functions. This means don't ever use plot() and friends. Instead, use the lattice library, which comes with R. It makes it much easier to do the sort of multilevel plots you are interested in. There is a little bit of code in there to reshape the data, but lattice takes care of all of the details of graphics programming for you. It is extremely configurable, but the defaults are pretty good (except for the hideous default color scheme retained for historical reasons; use trellis.par.set(col.whitebg()) to fix it).
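
To give a flavor of the approach, here is a minimal sketch for graph 1. It is not the linked code, just an illustration: the data frame is made up (only the BOB/Q values 100, 500, 200, 300 come from the question), and the column names Factor1, SubFactor and M1 are assumed from the question's description.

```r
library(lattice)

# Made-up data; only the BOB/Q values come from the question.
samples <- data.frame(
  Factor1   = factor(rep(c("BOB", "JILL"), each = 8)),
  SubFactor = factor(rep(rep(c("Q", "P"), each = 4), times = 2)),
  M1        = c(100, 500, 200, 300,   # BOB Q (from the question)
                150, 250, 350, 450,   # BOB P (invented)
                400, 600, 500, 700,   # JILL Q (invented)
                200, 300, 100, 400)   # JILL P (invented)
)

# Reshape: one row per Factor1/SubFactor combination, holding its mean.
means <- aggregate(M1 ~ Factor1 + SubFactor, data = samples, FUN = mean)

# Plot: one cluster of bars per Factor1 level, one bar per SubFactor level.
p <- barchart(M1 ~ Factor1, groups = SubFactor, data = means,
              origin = 0, ylab = "Mean of M1", auto.key = TRUE)
print(p)   # lattice plots are objects; print() is what draws them
```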

For beginners wanting to learn R, I recommend: Introduction to S & S-PLUS (which works just fine for R) and Lattice: Multivariate Data Visualization with R.

I do not recommend: the free documentation. It is frequently poorly written and makes this stuff even harder to learn.
posted by grouse at 7:39 PM on April 9, 2009 [1 favorite]


Here's another version that makes more explicit the separation between data reshaping and plotting.
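
The general shape of that separation, as a hedged sketch rather than the linked code: the data are invented (apart from the BOB/Q M1 values from the question), only three measurement columns are shown to keep it short, and the column names are assumed from the question's description.

```r
library(lattice)

# Invented data: M1 for BOB/Q is from the question, the rest is made up.
m1 <- c(100, 500, 200, 300,  150, 250, 350, 450,
        400, 600, 500, 700,  200, 300, 100, 400)
samples <- data.frame(
  Factor1   = factor(rep(c("BOB", "JILL"), each = 8)),
  SubFactor = factor(rep(rep(c("Q", "P"), each = 4), times = 2)),
  M1 = m1, M2 = m1 + 100, M3 = m1 + 50
)

# --- Step 1: reshape ---------------------------------------------------
# Means per group, still "wide" (one column per measurement).
wide <- aggregate(cbind(M1, M2, M3) ~ Factor1 + SubFactor,
                  data = samples, FUN = mean)

# Wide to long: one row per group and measurement.
long <- reshape(wide, direction = "long",
                varying = c("M1", "M2", "M3"), v.names = "value",
                timevar = "measurement", times = c("M1", "M2", "M3"),
                idvar = c("Factor1", "SubFactor"))
long$measurement <- factor(long$measurement)

# --- Step 2: plot (graph 2) --------------------------------------------
# Measurements on the x axis, one line per SubFactor, BOB only.
p <- xyplot(value ~ measurement, groups = SubFactor,
            data = subset(long, Factor1 == "BOB"),
            type = "b", ylab = "Mean", auto.key = TRUE)
print(p)
```

Once the data are in the long form, switching to a different graph is mostly a matter of changing the formula and the subset.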
posted by grouse at 7:42 PM on April 9, 2009 [1 favorite]

