# Simple data analysis in Excel

May 15, 2013 10:00 AM Subscribe

How can I easily analyse test score data in a spreadsheet with multiple independent and dependent variables in each row?

I'm currently trying to do some simple data analysis on a spreadsheet of student test results that looks roughly like this in each row:

Gender | Age | Level | etc. | Section 1 result | Section 2 result | Section 3 result | etc. | Overall result

(The first few cells in each row describe attributes of the student; the remaining cells describe results attained by the student)

I'd like to see things like which gender struggles with which section, which age group does better overall, etc. At the moment I'm doing this by making a copy of the data, sorting it by the attribute that I'm examining, summarising that in cells at the bottom of the data, and then making a graph of the summary. I feel like there's probably an easier way to do this. Maybe PivotTables, but I haven't used those before, and can't seem to get them to do what I want.

Ideally, I'd also like to be able to find out things like whether people who did poorly in e.g. section 1 are more likely to have done poorly in section 3.

While my current task is analysing this particular data, any tips or resources for similar/related analyses would be appreciated; this isn't the first time I've run into a situation like this and hit a brick wall.

I'm currently trying to do some simple data analysis on a spreadsheet of student test results that looks roughly like this in each row:

Gender | Age | Level | etc. | Section 1 result | Section 2 result | Section 3 result | etc. | Overall result

(The first few cells in each row describe attributes of the student; the remaining cells describe results attained by the student)

I'd like to see things like which gender struggles with which section, which age group does better overall, etc. At the moment I'm doing this by making a copy of the data, sorting it by the attribute that I'm examining, summarising that in cells at the bottom of the data, and then making a graph of the summary. I feel like there's probably an easier way to do this. Maybe PivotTables, but I haven't used those before, and can't seem to get them to do what I want.

Ideally, I'd also like to be able to find out things like whether people who did poorly in e.g. section 1 are more likely to have done poorly in section 3.

While my current task is analysing this particular data, any tips or resources for similar/related analyses would be appreciated; this isn't the first time I've run into a situation like this and hit a brick wall.

Best answer: You can do this with Pivot Tables.

But you may want to organize your data differently, think more like database, and less like spreadsheet.

So Section would be on column label. Then use the section number in the row.

So your data would have multiple rows of the same person.

Gender Age Level Section Result

male 50 1 1 80

male 50 1 2 20

male 50 1 3 60

Like that. You'll have three entries for that one student.

THEN you can do pivot tables.

You can put one of the three attributes (Gender, Age, Level) on the Row Labels, and then you can do Section on the Column Labels. Then you can average or sum the scores in the Values Section.

You can do the next pivot on Age (you may want to do ranges) and the section after that on the Level.

It seems counter-intuitive but it's BOSS when you get it, because once your data is lined up correctly, you can do a pivot in less than 20 seconds!

posted by Ruthless Bunny at 10:24 AM on May 15, 2013 [1 favorite]

But you may want to organize your data differently, think more like database, and less like spreadsheet.

So Section would be on column label. Then use the section number in the row.

So your data would have multiple rows of the same person.

Gender Age Level Section Result

male 50 1 1 80

male 50 1 2 20

male 50 1 3 60

Like that. You'll have three entries for that one student.

THEN you can do pivot tables.

You can put one of the three attributes (Gender, Age, Level) on the Row Labels, and then you can do Section on the Column Labels. Then you can average or sum the scores in the Values Section.

You can do the next pivot on Age (you may want to do ranges) and the section after that on the Level.

It seems counter-intuitive but it's BOSS when you get it, because once your data is lined up correctly, you can do a pivot in less than 20 seconds!

posted by Ruthless Bunny at 10:24 AM on May 15, 2013 [1 favorite]

Pivot tables are a good option, but something else to consider is to just use a filter. Basically, it will allow you to filter the data you want to a new range of cells. It's similar to what you're doing with copy and paste, but it will save you from manually sorting everytime. For example, you can filter male for gender AND within just the male records, Section 1 < 50 (or whatever score cut-off you are looking for). Then from the new spreadsheet range that you put the filtered data, you can summarize and create a chart of those results.

posted by Eicats at 10:52 AM on May 15, 2013

posted by Eicats at 10:52 AM on May 15, 2013

p.s. I would use the advanced filter option, not the auto-filter. The auto-filter just puts drop-down arrows on all of the column titles and allows you to select from any option that your spreadsheet contains. Advanced filter will allow you to use formulas and more advanced parameters and will allow you to put your results in a permanent new area on the spreadsheet (auto-filter just hides the rows in place of whatever gets filtered out)

posted by Eicats at 10:54 AM on May 15, 2013

posted by Eicats at 10:54 AM on May 15, 2013

Best answer: I would tabulate my results like so:

Split the overall score into categories - low, medium, high. Then, separately within each overall score category, I would calculate the mean and standard deviation of age, the % white, the %black, the mean and standard deviation of section1_score, etc. I could then look across my overall score categories, and see if the mean age was lower in the lower score group, if the mean score in section1_score was about the same across overall score group, etc.

I would do this in excel by making a new tab, putting 'overall score=low', overall score=med', and 'overall score=high' in row 1, cols B:D. Then in col A, rows 2:... I would put 'age', 'race=black', 'race=white', etc. I would then use a formula to get the mean and standard deviation (or percent) over the cells of interest from the original sheet.

This would give you a nice, comprehensive exploratory look at your data.

I don't think you need R, tests, regressions, or p-values if you just want to see what's going on in there in a general way.

posted by everythings_interrelated at 12:39 PM on May 15, 2013

Split the overall score into categories - low, medium, high. Then, separately within each overall score category, I would calculate the mean and standard deviation of age, the % white, the %black, the mean and standard deviation of section1_score, etc. I could then look across my overall score categories, and see if the mean age was lower in the lower score group, if the mean score in section1_score was about the same across overall score group, etc.

I would do this in excel by making a new tab, putting 'overall score=low', overall score=med', and 'overall score=high' in row 1, cols B:D. Then in col A, rows 2:... I would put 'age', 'race=black', 'race=white', etc. I would then use a formula to get the mean and standard deviation (or percent) over the cells of interest from the original sheet.

This would give you a nice, comprehensive exploratory look at your data.

I don't think you need R, tests, regressions, or p-values if you just want to see what's going on in there in a general way.

posted by everythings_interrelated at 12:39 PM on May 15, 2013

All that rigamarole seems more complicated than "tab sex, summ(thingofinterest)" or "describe.by(dataset,dataset$sex)."

posted by ROU_Xenophobe at 1:03 PM on May 15, 2013 [1 favorite]

posted by ROU_Xenophobe at 1:03 PM on May 15, 2013 [1 favorite]

do you need to show statistical evidence, or just make some tables? if the later, pivot tables will be the easier thing. if the former, there are many tools that will do it, but which tool, and how much background you have in statistics will play a big part in which will be the easiest given your time constraints. since you asked your question, i'd have to assume you don't know much about statistics. which is why my first question is so important. if you don't need to say "group X had significantly higher scores than group Y in this section (p<.05)" there's no point in getting you confused with all this statistics stuff you don't need.

posted by cupcake1337 at 5:38 PM on May 15, 2013 [1 favorite]

posted by cupcake1337 at 5:38 PM on May 15, 2013 [1 favorite]

Response by poster: Thanks for the answers so far!

Yep, just see what's going on in a general way/make some tables. As you've correctly guessed, I have very little statistics knowledge.

I'm working on this for a non-profit, so unfortunately purchasing any statistics packages won't be possible. Free statistics packages are OK, but I'd prefer to keep it in Excel if possible - even if it's slightly more cumbersome, the other staff are more likely to keep using it after I leave.

posted by mxc at 7:28 PM on May 15, 2013

*I don't think you need R, tests, regressions, or p-values if you just want to see what's going on in there in a general way.**do you need to show statistical evidence, or just make some tables?*Yep, just see what's going on in a general way/make some tables. As you've correctly guessed, I have very little statistics knowledge.

I'm working on this for a non-profit, so unfortunately purchasing any statistics packages won't be possible. Free statistics packages are OK, but I'd prefer to keep it in Excel if possible - even if it's slightly more cumbersome, the other staff are more likely to keep using it after I leave.

posted by mxc at 7:28 PM on May 15, 2013

This thread is closed to new comments.

In stata, it would be as simple as "tabulate gender, summarize(overallresult)" or if you cared about statistical significance "ttest overallresult, by(gender)." In R, it looks like it would be as simple as bolting on the psych package and using the describe.by function.

Stata would cost you/your employer some moneys. R is free. PSPP, which claims to be a FOSS equivalent of SPSS, is free.

posted by ROU_Xenophobe at 10:13 AM on May 15, 2013