Statistics Question
October 10, 2007 4:36 PM

Statistics filter: Interval or Ordinal data?

Suppose a survey has the following question:

How old are you?
20-24, 25-29, 30-34, 35-39, 40-44, 45-49, 50+

You are to circle the range that you fit in.

Is the data collected from this question ordinal or interval? My first thought was ordinal, but then the intervals are all the same except for the last one, 50+. Because of that 50+ I still think ordinal. What if I got rid of that 50+ option, would the data collected then be interval? Am I on the right track?

Next question. If the data is ordinal is it appropriate to calculate the standard deviation? What about the variance?

Thanks for your help. If you need me to clarify my questions please ask.
posted by mjger to Education (11 answers total)
Kelvin scale = ratio. What's the deal with the Kelvin scale? 0 = absolute zero! A ratio measurement has a true zero. Interval doesn't.
posted by k8t at 4:51 PM on October 10, 2007

I am a bad reader.

Wikipedia is your friend.
Ordinal measurement
In this classification, the numbers assigned to objects represent the rank order (1st, 2nd, 3rd etc.) of the entities measured. The numbers are called ordinals. The variables are called ordinal variables or rank variables. Comparisons of greater and less can be made, in addition to equality and inequality. However, operations such as conventional addition and subtraction are still meaningless. Examples include the Mohs scale of mineral hardness; the results of a horse race, which say only which horses arrived first, second, third, etc. but no time intervals; and many measurements in psychology and other social sciences, for example attitudes like preference, conservatism or prejudice and social class. The central tendency of an ordinally measured variable can be represented by its mode or its median; the latter gives more information.

See also: Ordinal scale
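
To make the mode/median point concrete for the age brackets in the question, here is a minimal Python sketch. The function names and the sample responses are made up for illustration, and the median shown is the lower median (for an even number of responses it picks the lower of the two middle brackets), which is only one possible convention.

```python
from collections import Counter

# Age brackets in rank order, as listed in the survey question.
BRACKETS = ["20-24", "25-29", "30-34", "35-39", "40-44", "45-49", "50+"]

def modal_bracket(responses):
    """Return the most frequently circled bracket (ties go to the lower bracket)."""
    counts = Counter(responses)
    return max(BRACKETS, key=lambda b: counts[b])

def median_bracket(responses):
    """Return the bracket holding the middle response (lower median)."""
    ranks = sorted(BRACKETS.index(r) for r in responses)
    return BRACKETS[ranks[(len(ranks) - 1) // 2]]

# Made-up responses, for illustration only.
sample = ["20-24", "25-29", "25-29", "30-34", "50+", "50+", "25-29"]
print(modal_bracket(sample))   # 25-29
print(median_bracket(sample))  # 25-29
```

Neither function does arithmetic on the ages themselves; both use only the order of the brackets, which is all an ordinal scale guarantees.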

Interval measurement
The numbers assigned to objects have all the features of ordinal measurements, and in addition equal differences between measurements represent equivalent intervals. That is, differences between arbitrary pairs of measurements can be meaningfully compared. Operations such as addition and subtraction are therefore meaningful. The zero point on the scale is arbitrary; negative values can be used. Ratios between numbers on the scale are not meaningful, so operations such as multiplication and division cannot be carried out directly. But ratios of differences can be expressed; for example, one difference can be twice another. The central tendency of a variable measured at the interval level can be represented by its mode, its median, or its arithmetic mean; the mean gives the most information. Variables measured at the interval level are called interval variables, or sometimes scaled variables, though the latter usage is not obvious and is not recommended. Examples of interval measures are the year date in many calendars, and temperature in Celsius scale or Fahrenheit scale.
posted by k8t at 4:53 PM on October 10, 2007

Totally semi (read: poorly) educated guess, but I'd say ordinal - because of the differing interval sizes (as you point out, 50+, and maybe even the missing 0-19). Collected like that, you could really only use it to group/rank responses into categories.

SD & variance? Why not, presuming you know the # of samples in each category (and know, or can assume a reasonable value for, the upper limit for the 50+ category).
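
A rough Python sketch of that idea, to show how much the answer leans on the assumed upper limit for 50+. The counts are hypothetical, the function name is invented here, and each bracket is represented by its midpoint, which is itself an assumption about how ages are spread within a bracket.

```python
import math

# Hypothetical counts per closed bracket (low, high), for illustration only.
counts = {(20, 24): 50, (25, 29): 72, (30, 34): 40,
          (35, 39): 30, (40, 44): 20, (45, 49): 10}
count_50_plus = 15  # hypothetical count for the open-ended bracket

def grouped_mean_sd(closed_counts, open_count, open_upper):
    """Population mean and SD using bracket midpoints.

    open_upper is an ASSUMED upper age for the 50+ bracket; the survey
    itself does not provide one.
    """
    groups = [((lo + hi) / 2, n) for (lo, hi), n in closed_counts.items()]
    groups.append(((50 + open_upper) / 2, open_count))
    total = sum(n for _, n in groups)
    mean = sum(m * n for m, n in groups) / total
    var = sum(n * (m - mean) ** 2 for m, n in groups) / total
    return mean, math.sqrt(var)

# Try a few assumed upper limits to see how far the results move.
for upper in (59, 69, 79):
    mean, sd = grouped_mean_sd(counts, count_50_plus, upper)
    print(f"assumed upper limit {upper}: mean={mean:.2f}, sd={sd:.2f}")
```

If the mean and SD shift noticeably between assumed limits, the 50+ bracket is driving the result and the numbers should be treated with caution.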

I like this one. I'm going to print it and ask my stats 101 lecturer this afternoon. It's the sort of question that's either going to make me look insightful or stupid ;-)
posted by Pinback at 5:41 PM on October 10, 2007

Response by poster: @Pinback

Please let me know what you find out after your lecture.
posted by mjger at 5:49 PM on October 10, 2007

Best answer: It's ordinal. Age is potentially ratio, but the way you've collected your data means that you've lost some of the detail. It is not interval level data because of the 50+ category. We should not assume the distances between the cut points are equivalent. If you didn't have the 50+, you could consider the data interval-level. And yes, if you do drop the 50+, you should feel free to calculate the SD, variance, etc. In the future, always err on the side of too much detail in your data. You can always simplify, but you can't get more detail once the data is in your hands.
posted by B-squared at 6:36 PM on October 10, 2007

Ordinal measures are those where you can tell there is a rank order, but you cannot tell how far each one is from the other. Example: first, second, third place. We know that first is better than second and second better than third, but we don't know by how much. Another example is measuring satisfaction: very poor, fair, good, excellent. We know which is better but have no sense of how much better one is than the other.

Interval measures have the rank order and the distance between each one has meaning. (We know how much more 24 is than 20, for example).

However, there is also the ratio measure, which is interval plus a true zero point. Generally, ages are considered to be either interval or ratio, depending on how you look at it (is zero really an age? etc.). So, those might actually be ratio.
posted by sneakin at 7:25 PM on October 10, 2007

Good call, B-squared. That 50+ will bring it back down to ordinal.
posted by sneakin at 7:26 PM on October 10, 2007

Response by poster: Thanks, B^2. That answers my question perfectly.
posted by mjger at 8:18 PM on October 10, 2007

Just for completeness: Spoke to my lecturer and yup, B-squared is on the money. The only thing to add is that, if you've got a reasonable number of samples in each group, it might be appropriate/possible to assume a normal distribution in each range and use the midpoint of the range as its mean, e.g. 50 of age 20-24 becomes 50 of age 22, 72 of age 25-29 becomes 72 of age 27, etc.
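
The substitution described above can be sketched in Python like this. The two counts are the ones from the example; the function names are made up for illustration, and the open-ended 50+ bracket is deliberately not handled because it has no midpoint without a further assumption.

```python
# Counts taken from the example above; the other brackets are omitted.
bracket_counts = {"20-24": 50, "25-29": 72}

def midpoint(bracket):
    """Midpoint of a closed bracket written as 'low-high', e.g. '20-24' -> 22.0."""
    low, high = (int(x) for x in bracket.split("-"))
    return (low + high) / 2

def midpoint_mean(counts):
    """Mean age when every respondent is replaced by their bracket midpoint."""
    total = sum(counts.values())
    return sum(midpoint(b) * n for b, n in counts.items()) / total

print(midpoint("20-24"))                        # 22.0
print(round(midpoint_mean(bracket_counts), 2))  # 24.95
```

Passing "50+" to midpoint raises a ValueError, which is a reasonable reminder that this bracket needs its own decision.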

(Now to go and beat my head against the textbook examples of ANOVA...)
posted by Pinback at 12:17 AM on October 11, 2007

Pinback, I think you have an important point. In a long, possibly too long, career managing and analysing questionnaire data, I often used midpoints of age ranges, with moderate sized groups. You state it well. This approach, I think, preserves the maximum information in the data.
posted by judybxxx at 6:45 AM on October 11, 2007

Oops, I should have added that in practical terms, the key here is what you know about the population and what to do with the 50+. It depends a lot on the study and situation.
posted by judybxxx at 6:47 AM on October 11, 2007
