Paired T Test sounds like parakeet test to me
November 13, 2009 11:37 AM

Hi, I have a specific question about paired t-tests!

Let's say I'm doing a paired t-test for timed trials.
The spreadsheet looks like this:

Subject A   Subject B   Difference
5           3            2
4           4            0
5           6           -1

I don't understand why the "Difference" column allows for negative numbers. That would imply that there was a meaningful difference in the order of the two columns. If the order of the two columns isn't meaningful (it doesn't matter what data goes in col.1 and what data goes in col.2, since the trials were not completed in a set order) then shouldn't the "Difference" column transform the difference into the positive number? (For example, the -1 above would become a 1). Or is the order of the columns meaningful somehow that it would need to be preserved through the negative number?

posted by amethysts to Grab Bag (13 answers total)

What is your study? Whether or not these negative numbers matter is entirely dependent on what you're doing & looking at.
posted by brainmouse at 11:48 AM on November 13, 2009

Not sure what the question has to do with paired t-tests, it seems like a question about Excel, or whatever spreadsheet software you are using. I assume whoever set up the "difference" column set it to be SubjectA - SubjectB. Just change the definition to be abs(SubjectA - SubjectB).
posted by sophist at 11:48 AM on November 13, 2009

I like to explain a paired t-test like this: the purpose of it is to take meaningful pairs of measurements where both items in the pair are similar in some respect.

One of the most common uses of a paired t-test is for a "before treatment" and "after treatment."

Say you have an experiment where you give children in a class a math test, then tutor them, and then give the math test again. If each and every student improves by 1 or 2 points consistently, then the tutoring "worked".

It may be the case that if you analyzed the "before" scores and found a mean of 71% with an SD of 15, and the after scores were a mean of 73% with an SD of 15, then this likely would not come out to be significant, if you analyzed the results as a two-sample t-test. The bell curves would be overlapping quite a bit.

But, if you took the "after" score for each person, and subtracted the "before" score, you might get a significant result. In effect, you are saying that for the purposes of assessment, each of the "before" and "after" scores themselves are not as important as the difference between them and whether that difference is reliable across subjects.

So, for the paired t-test, which column you subtract from which does not affect the significance, but it will affect the interpretation of the results as showing that tutoring led to an improvement or decrement in scores. Also, it is important that all of the "before" scores go in the same column, and all of the "after" scores go in the other column.
posted by Maxwell_Smart at 11:51 AM on November 13, 2009
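
To make the tutoring example above concrete, here is a minimal sketch in Python using only the standard library. The scores are invented for illustration (they are not from anyone's study), and the helper names paired_t and welch_t are hypothetical. It only computes t statistics, not p-values. The idea is that when every student gains a point or two, the paired statistic comes out large while the two-sample statistic on the same numbers stays small, and swapping the columns only flips the sign.

```python
import math
import statistics

# Hypothetical scores (not from the thread): each student gains 1 or 2 points.
before = [52, 58, 63, 67, 71, 75, 79, 84, 90, 95]
gains = [1, 2, 2, 1, 2, 1, 2, 2, 1, 2]
after = [b + g for b, g in zip(before, gains)]

def paired_t(x, y):
    """t statistic for the mean of the per-pair differences x - y."""
    diffs = [a - b for a, b in zip(x, y)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se

def welch_t(x, y):
    """Two-sample (Welch) t statistic, which ignores the pairing."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se

t_paired = paired_t(after, before)    # after minus before
t_unpaired = welch_t(after, before)   # same data, pairing ignored
t_swapped = paired_t(before, after)   # before minus after

print(f"paired t = {t_paired:.2f}, two-sample t = {t_unpaired:.2f}")
print(f"paired t with columns swapped = {t_swapped:.2f}")
```

The spread between students is what swamps the two-sample version. The paired version removes it by looking only at each student's own change.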

Of course, in contrast to what Maxwell Smart says, if your question is "does doing X CHANGE the scores," but it is irrelevant whether the scores go up or go down, then what you want is an absolute value of the difference of the scores, not a direct after-before, because you may cancel out any results if half the people do better and half the people do worse, but each person is having a large net change.
posted by brainmouse at 11:53 AM on November 13, 2009

And, if I did not make this sufficiently clear, I meant that both items in the pair must be related in some way. In my example, the scores in each pair are related in the sense that they come from the same student.

In your example with columns titled "Subject A" and "Subject B", if these represent different people, most likely a paired t-test would not be applicable here.
posted by Maxwell_Smart at 11:54 AM on November 13, 2009

The study is for one participant to do a set of tasks on website A and then do the same set of tasks on website B. The numbers represent the number of seconds it took to complete. Each row represents one participant's data. I should have set that up more clearly but I'm on a bit of a deadline, my fault.

So, for the paired t-test, which column you subtract from which does not affect the significance, but it will affect the interpretation of the results as showing that tutoring led to an improvement or decrement in scores.

In this test, the subjects weren't presented with the websites in the same order (by design), therefore the order of them shouldn't matter.

We received the spreadsheet from a statistics expert, so I was wondering if he set it up this way because it's all part of the mysterious magic of the formula that I didn't understand. But apparently it's not, and it should be an absolute value in this case. I get it now! Thanks guys.
posted by amethysts at 12:37 PM on November 13, 2009

I don't think you've come to the right conclusion.

Your columns should be "result on website A" and "result on website B", regardless of the order in which participants visited the sites. You should NOT use the absolute value. Instead, arbitrarily choose which site to subtract from the other, and make it clear which you subtracted from which when you present the results.
posted by brainmouse at 12:58 PM on November 13, 2009
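
As a rough illustration of the "arbitrarily choose which site to subtract" point, here is a small Python sketch using the three rows from the question. The helper name t_from_diffs is made up for this example, and three rows is far too few for a real test, so treat the numbers as arithmetic only. It shows that A minus B and B minus A give t statistics of the same size with opposite signs.

```python
import math
import statistics

# The three rows from the question (seconds on website A and website B).
site_a = [5, 4, 5]
site_b = [3, 4, 6]

def t_from_diffs(diffs):
    """One-sample t statistic testing whether the mean difference is zero."""
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se

a_minus_b = [a - b for a, b in zip(site_a, site_b)]
b_minus_a = [b - a for a, b in zip(site_a, site_b)]

t_ab = t_from_diffs(a_minus_b)
t_ba = t_from_diffs(b_minus_a)

print("A - B:", a_minus_b, "t =", round(t_ab, 3))
print("B - A:", b_minus_a, "t =", round(t_ba, 3))
```

Either direction leads to the same conclusion about significance. The sign just tells you which site had the larger times, which is why you need to report which way you subtracted.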

Oh OK, so it is part of the magic of the formula. I'm just not clear on why one site gets subtracted from another, since it's not really a "before" and "after" situation. It's fine if I don't get it, I just want to make sure I'm doing the right thing.
posted by amethysts at 1:01 PM on November 13, 2009

Because you care if people were faster on one site than the other. If all you cared about was whether people performed DIFFERENTLY on one site or the other, you would use an absolute value. If you care whether the 2 websites had different effects (e.g., one was, by and large, better than the other), you would use a difference.

Let me explain a scenario where you would use the absolute value (these numbers are fudged, but the results are real):

Let's say you wanted to see how good people are at estimating their own weight, and you want to look at men vs. women.

So you have 2 columns: an actual weight column, and an estimated weight column.

Men tend to be off by 5-10 pounds. However, they tend to be off in random directions from their actual weight.

Women tend to be off by 5-10 pounds. However, they tend to always estimate low.

So if you use a raw difference, what you will get is that men will average out to having ~0 difference between estimated weight and real weight, and women will have ~5-10 pounds difference.

If you use an absolute value difference, however, what you will get is that men and women both are off by ~5-10 pounds.

So which you use depends on what question you're asking: For the first one, the question is how much do people underestimate their weight. For the second, the question is how incorrect are people on their weight estimations.

For your example you care about WHICH website is faster and which is slower. You don't care about HOW DIFFERENT are the two websites, overall.

Did that make any more sense?
posted by brainmouse at 1:08 PM on November 13, 2009
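
The weight-estimation scenario above can be sketched in a few lines of Python. The numbers below are invented to match the description (they are not real data), and the function names are hypothetical. The sketch compares the mean signed error with the mean absolute error for each group.

```python
import statistics

# Invented (actual, estimated) weights in pounds, in the spirit of the example.
men = [(180, 187), (200, 192), (165, 171), (210, 204), (175, 183), (190, 181)]
women = [(140, 133), (155, 149), (130, 122), (165, 158), (150, 144), (135, 126)]

def mean_signed_error(pairs):
    """Average of estimated minus actual; over- and under-estimates cancel."""
    return statistics.mean([est - actual for actual, est in pairs])

def mean_absolute_error(pairs):
    """Average size of the miss, ignoring direction."""
    return statistics.mean([abs(est - actual) for actual, est in pairs])

for label, group in (("men", men), ("women", women)):
    signed = mean_signed_error(group)
    absolute = mean_absolute_error(group)
    print(f"{label}: signed = {signed:.2f}, absolute = {absolute:.2f}")
```

With these made-up numbers the men's signed error lands near zero because their misses go both ways, while both groups have a similar absolute error. Which summary you want depends on the question being asked.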

Ah, I see! Thank you!
posted by amethysts at 1:13 PM on November 13, 2009

Actually, I disagree with brainmouse, and I'll come back to that after a bit of exposition.

To be specific, I think that in your design, the sign matters. First, there are two understandings of the word difference. Our conventional understanding of the word "difference" is that difference is always positive, and it is the magnitude of distance between two numbers. Like abs(a-b), which is the same as abs(b-a). Mathematicians might talk about the "signed difference", which means the magnitude plus a sign (+ or -) that indicates the direction of the difference. In this case "a-b" gives the same result as "b-a" EXCEPT for the sign. For a-b, a (+) sign means that a is bigger, and a (-) sign means that b is bigger.

I assume that for your purposes you want to find out whether a participant is faster on website A or website B, as a result of a specific experimental manipulation that differs between the websites (like A has the menu at the top, and B has the menu at the bottom).

Now, since you are a good experimenter, you realized that there may be an order effect-- a participant will most likely be faster on the second website than the first. So you randomized the order that the websites are presented. That is a good thing. The ordering of the websites should not have anything to do with which column the data is placed in. All numbers for website A, regardless of whether the participant saw A first or not, should be in the first column, and all numbers for website B should be in the second column.

If this layout is followed and the difference is computed as A minus B, then if website A has lower times than B, each of the *signed* differences will more likely be negative. A paired t-test that is significant would then support the conclusion that A is faster than B. If B is faster, the differences will more likely be positive, and a significant result would support B being faster. In either case, the null hypothesis is that the times are the same for website A and website B, and if there is no pattern of the signed differences tending to be positive or negative, then you will not be able to reject the null, and you will not have a significant result.

So, what I have outlined is how to test whether website A or website B is faster, with A and B representing the experimental manipulation of interest, and not the confound (i.e., which website was presented first).
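
Here is a minimal Python sketch of the column layout described above, assuming the raw data were logged in presentation order. The record format, the times, and the helper name by_site are all hypothetical. Each record is re-sorted so that the time on website A always lands in the first slot, no matter which site the participant saw first.

```python
# Hypothetical log: (participant, site shown first,
#                    seconds on first site, seconds on second site)
trials = [
    (1, "A", 41.0, 36.5),
    (2, "B", 39.0, 44.0),
    (3, "A", 52.5, 47.0),
    (4, "B", 35.0, 38.5),
]

def by_site(trial):
    """Return (time on A, time on B) regardless of presentation order."""
    _, first_site, first_time, second_time = trial
    if first_site == "A":
        return first_time, second_time
    return second_time, first_time

# Signed differences by website: A minus B.
signed_diffs = [a - b for a, b in map(by_site, trials)]

# For contrast: first-shown minus second-shown, which tracks order, not site.
order_diffs = [first - second for _, _, first, second in trials]

print("A - B:", signed_diffs)
print("first - second:", order_diffs)
```

In this made-up log the by-website differences are all positive, while the by-order differences flip sign from row to row. That is the confound the column layout is meant to keep out of the test.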

The suggestion was that you could use abs(a-b) to see "does doing X CHANGE the scores"-- well, first of all, abs(a-b) is always positive. So a t-test with the null hypothesis that abs(a-b) = 0 is biased towards finding a significant result. I mean, what are the odds that subject 1 navigates "website a" in exactly 60.23 seconds, and "website b" in exactly 60.23 seconds as well? You expect variation, and you expect abs(a-b) to always be positive, and for that degree of positivity to correspond to the natural variation in the length of time that it takes someone to do something twice-- in other words the error term. Also, if the times for a and b are distributed normally, abs(a-b) will no longer be a normal distribution. You do not want to do this.
posted by Maxwell_Smart at 1:51 PM on November 13, 2009
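
A quick simulation can illustrate the bias described above. This is only a sketch under made-up assumptions: 200 simulated participants, and both sites drawing times from the same normal distribution (mean 60 seconds, standard deviation 5), so there is no true difference to find. The helper name t_from_diffs is hypothetical.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Null scenario: both sites share the same true mean time (made-up values).
n = 200
site_a = [random.gauss(60, 5) for _ in range(n)]
site_b = [random.gauss(60, 5) for _ in range(n)]

def t_from_diffs(diffs):
    """One-sample t statistic testing whether the mean of diffs is zero."""
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se

signed = [a - b for a, b in zip(site_a, site_b)]
absolute = [abs(d) for d in signed]

t_signed = t_from_diffs(signed)
t_absolute = t_from_diffs(absolute)

print(f"t on signed differences:   {t_signed:.2f}")
print(f"t on absolute differences: {t_absolute:.2f}")
```

Even though nothing distinguishes the two sites here, the t statistic on the absolute differences should come out far from zero, because the mean of abs(a-b) reflects ordinary trial-to-trial noise and not an effect of the website.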

Oops, ok, now on further viewing, I agree with brainmouse, and see that we are saying about the same thing, or at least coming to the same conclusion that in this case you most probably want to use the signed differences.

And about using abs(a-b) as a measure, that does not make a lot of sense, unless you introduce another variable (like gender) to use as your comparison of interest.

So, in conclusion, I agree with him now.
posted by Maxwell_Smart at 1:55 PM on November 13, 2009

This has been so helpful. Thank you so much for your detailed responses!
posted by amethysts at 2:20 PM on November 13, 2009

Ask MetaFilter
