Is there circularity in the methodology used to establish my students' educational goals?
September 16, 2010 2:59 PM

I have some questions about the use of normalized testing data to establish goals for struggling students.

First, some background information. I teach remedial middle school math, and my school is signing up for Pearson's AIMSweb program, which uses standardized tests of basic skills in reading and math to establish educational goals and to monitor progress.

Here is the process in a nutshell:

Students are given a test at the beginning of the year. Pearson/AIMSweb have a ton of data from students who have taken the test across the country.

Students who score in the below average range (the lowest quartile) are flagged, and receive interventions designed by a team of teachers.

These students take short tests every week, and their scores on these tests establish a rate of improvement which is then used to predict whether they will reach their goal on the test given at the end of the year.

Now, here is how the goals are established:

Using one method, we try to get a student into the "average" range, meaning we look for the score that students at the 25th percentile get on the end of the year test, and have the student shoot for that.

Using a different method, we take the average rate of improvement for students at the 25th percentile, double it, and use that to establish the student's goal.

Here are some examples, to clarify how all of this works:

Suzie is a fourth grader who read 50 words per minute correctly on the beginning-of-year test. This places her in the lowest quartile. Students who scored at the 25th percentile on the end of the year test read 68 words per minute, so that's the goal we set for her.

Frank is a fourth grader who read 30 words per minute correctly on the beginning-of-year test. This places him in the lowest 10% of students at his grade level. The students who scored at the 25th percentile on the end of the year test had an average rate of improvement of .5 words per minute every week. To establish Frank's goal, we double this rate of improvement to 1 whole word per minute per week, multiply that by 36 weeks between the two tests, and add that to his base score of 30 words per minute to get a target of 66 words per minute on the end of the year test. This doesn't quite get him out of the lowest quartile, but it's still quite a bit of improvement.
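To make the arithmetic concrete, here's a minimal sketch of the two methods in Python, using Suzie's and Frank's numbers from above. The function names are mine, not anything official from AIMSweb:

```python
# A minimal sketch of the two goal-setting methods, using the numbers
# from the Suzie and Frank examples. Function names are mine, not AIMSweb's.

def goal_by_percentile(end_of_year_25th_percentile):
    """Method 1: aim for the 25th-percentile score on the end-of-year test."""
    return end_of_year_25th_percentile

def goal_by_doubled_roi(base_score, reference_roi_per_week, weeks=36):
    """Method 2: double the reference rate of improvement, project it
    over the 36 weeks between tests, and add it to the starting score."""
    return base_score + 2 * reference_roi_per_week * weeks

print(goal_by_percentile(68))        # Suzie: 68 wpm
print(goal_by_doubled_roi(30, 0.5))  # Frank: 30 + 1.0 * 36 = 66.0 wpm
```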

Now, for my questions:

It seems to me that there's an inherent circularity to using averages to establish educational goals. Asking a student to DOUBLE the average rate of improvement, for example, seems to be asking a lot! My understanding of averages tells me that for Frank to improve at twice the average rate, there needs to be the equivalent of another student out there who does not improve at all!

This would all be well and good if these students' improvement came at the expense of improvement among the top 75%, but it would be REALLY weird for it to work out that way, and besides, the average rate of improvement we're using is that of the 25th percentile. This, to me, implies that for each of these struggling kids who improved as fast as we wanted them to, there is the equivalent of a struggling kid who did not improve at all (or maybe several struggling kids who improved slightly less rapidly than we would have liked).

Another thing that would alleviate my apprehension about this would be if there were something "special" about the stuff that we're doing, since we're singling these students out for intervention. But Pearson/AIMSweb are giving us data that was aggregated from schools full of teachers also using this program, who are just like us and who are similarly working hard and implementing interventions for their kids.

Finally, maybe it IS possible that everyone in the country could score at the benchmarks set as described above. But wouldn't that mean the scores were highly variable from year to year and therefore not reliable? And a student could hit the benchmark yet still wind up in the bottom 25%, because you can NEVER get everyone out of the lowest quartile.

I guess, in essence, my question is this. These two statements appear to me to be contradictory:

(1) It is possible for a large number of students to be successful at improving at an above average rate.

(2) The benchmark scores used to establish these goals are stable and reliable.

I think #1 is the one that's bogus. Am I right?

A lot of effort is being put into this at my school, and teachers may be evaluated based on how successfully they meet these goals, not to mention the kids who are already struggling and who now have more testing to deal with.

Bonus question!

The main goal of all this is to bring kids up to grade level NOT as defined by Pearson/AIMSweb but by state testing results. I asked the presenter whether their data were correlated with the NJ ASK, and I was told that the scores on Pearson/AIMSweb's tests that predict with 70 or 80 percent confidence that a student will pass our state test are so high as to be unusable. I was also told that this was because our state test is "too hard." However, my statistical intuition tells me instead that Pearson/AIMSweb's program simply isn't a good predictor of performance on the NJ ASK because the correlation is weak. What's the deal?
posted by alphanerd to Education (6 answers total) 1 user marked this as a favorite
 
Sped teacher here.

Basically? The deal here is that Pearson has been overpaid for what is essentially a reinvention of EVERY SINGLE INTERVENTION program since the beginning of time.

It won't change anything; it will help the kids as much as any dedicated teacher helps them, if they're lucky.

In short, it's a lot of crap. Marketing BS.

Yes, of course your state test is "too hard." So is mine in Massachusetts. And so it is in every single state where Pearson has sold an intervention product.

Don't believe the hype. The entire thing is nonsense.
posted by dzaz at 3:14 PM on September 16, 2010 [1 favorite]


My understanding of averages tells me that for Frank to improve at twice the average rate, there needs to be the equivalent of another student out there who does not improve at all!

There is no natural law that demands the average rate be unchanged by the intervention. They may also be calculating the reference distribution from kids who were not intervened on.


(1) It is possible for a large number of students to be successful at improving at an above average rate.

(2) The benchmark scores used to establish these goals are stable and reliable.


If a large fraction (say, all) of the bottom 25% were targeted by a highly effective intervention, then the 25th percentile would move up, and "normal," recomputed from the intervened-on group, would drift upward over time. Were that actually the case, it would be a good problem to have.
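Here's a toy simulation of that drift; the score distribution and the flat boost are invented purely for illustration:

```python
import random

random.seed(0)

def simulate_cutoff_drift(n=10000, years=5, boost=5.0):
    """Toy model: scores start roughly normal; each year everyone below
    the 25th-percentile cutoff gets a flat `boost`, and the cutoff is
    then recomputed from the treated population. The distribution and
    the boost are invented for illustration."""
    scores = [random.gauss(100, 15) for _ in range(n)]
    for year in range(years):
        scores.sort()
        cutoff = scores[n // 4]  # 25th-percentile score
        print(f"year {year}: 25th-percentile cutoff = {cutoff:.1f}")
        scores = [s + boost if s < cutoff else s for s in scores]

simulate_cutoff_drift()
```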

If the intervened-on group is small, or doesn't always make it out of the bottom quartile, then the influence on "normal" won't be that large.

I don't know why you think it's not possible for targeted resources to cause students to improve more than they would have otherwise.
posted by a robot made out of meat at 5:24 AM on September 17, 2010


Response by poster: Thanks for the input, robot made out of meat.

It's not that I think it's not possible for targeted resources to cause students to improve more than they would have otherwise; it's that I think the "averages" we're being given are averages of students who are receiving targeted interventions just like ours are.

My suspicion is based on this: AIMSweb/Pearson uses proprietary tests, and the web portal we're using to keep track of our kids is (I think, and this is what everything hinges on) the very same system that aggregates the data that these averages are based on. It's being used by teachers just like us who are monitoring progress and using targeted interventions on their kids; otherwise they would be subscribing to it for no reason.

Interestingly, our school purchased the package only for students in grades 4 and 5, meaning students in grades 6, 7, and 8 will not be receiving these targeted interventions. This would have been a PERFECT opportunity for AIMSweb/Pearson to get control data on these grades. They could have offered us a discount on the software program in exchange for us permitting them to test these students who are not receiving interventions. I think schools piloting the program on only a few grades is a somewhat frequent phenomenon, and I'm puzzled as to why they didn't take advantage of this chance to get control data.

But of course I could be missing something huge.

Also, say everyone in the bottom 25% DID move up, so that none of them finished below the previous year's cutoff for the bottom 25%. That would still leave me with my question about the reliability of the measure. I would say that the "correct" target for these students would have been where the cutoff for the bottom 25% ACTUALLY wound up, not where it was projected to wind up.

Also, assuming that everyone in the bottom 25% on the beginning-of-year test did better than everyone in the bottom 25% on last year's beginning-of-year test, I would be wondering why it was necessary to implement interventions for kids who, had they taken the test the previous year and gotten the same score, would have been deemed to be performing adequately.
posted by alphanerd at 1:49 PM on September 17, 2010


Best answer: I don't work for them; I'm not here to defend any particular system. Their test may or may not be particularly valid or reliable. They may or may not have done a meaningful analysis of their overall dataset to design interventions. I'm just answering your question: your concerns about the validity of using (potentially) post-test results to set goals are not really something to worry about. The goals selected are semi-arbitrary; the data they use just sets a useful range.

It's not that I think it's not possible for targeted resources to cause students to improve more than they would have otherwise; it's that I think the "averages" we're being given are averages of students who are receiving targeted interventions just like ours are.

Then what the student is being told is where they need to be at the end of the year to no longer be behind, taking into account that other kids in the bottom quartile are receiving help too. That's as useful as, or more useful than, your suggestion to use the cutoff that would hold were no other students being intervened on.

Also, say everyone in the bottom 25% DID move up, so that none of them finished below the previous year's cutoff for the bottom 25%. That would still leave me with my question about the reliability of the measure.

I don't see why that would really say anything about the measure. It says either that there is massive negative longitudinal correlation or that the intervention works.

Also, assuming that everyone in the bottom 25% on the beginning-of-year test did better than everyone in the bottom 25% on last year's beginning-of-year test, I would be wondering why it was necessary to implement interventions for kids who, had they taken the test the previous year and gotten the same score, would have been deemed to be performing adequately.

They're still the weakest students. Quantiles don't tell you everything (the entire range may be adequate; the range may be extremely narrow). It's still not a crazy policy to say that the weakest students get extra help to bring them into the middle. The weakest students at Cal Tech are probably more proficient in math than the strongest ones at third-tier colleges, but the school is still interested in bringing their performance closer to that of their peers.
posted by a robot made out of meat at 6:07 PM on September 17, 2010


Response by poster: Thanks again for the reply.

I think I've got a better idea of what's going on here, and that I either misread or didn't fully comprehend what you wrote in your first response about what happens to the bottom 25%.

Where I'm at with this is that the feasibility of meeting goals set as described above hinges on just how many kids in the bottom 25% of the comparison sample were receiving interventions. That is, if the sample is a true control, meaning NONE of them were getting the intervention, then of course I'd be inclined to believe that interventions would lead to improvement well above that seen in the control population.

However, if ALL of the kids in the reference population were getting the intervention, then I think the goal is being circularly defined: you no longer have the interventions as an explanation for how you intend to get above-average improvement, since your average is an average of kids who are already receiving interventions.

And if SOME of the kids in the reference population were getting the intervention, then I suppose the fewer of them who were, the more likely it would be that those who were could exceed the average rate of improvement for all kids.
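Here's how I'd check that intuition with a toy simulation; the base rate, the noise, and the effect size are all invented numbers:

```python
import random

random.seed(1)

BASE_ROI = 0.5  # untreated struggler's improvement, wpm/week (invented)
EFFECT = 0.5    # extra improvement from the intervention (invented)

def reference_avg_roi(frac_intervened, n=100_000):
    """Average rate of improvement over a reference group in which a
    fraction `frac_intervened` of the kids got the intervention."""
    total = 0.0
    for _ in range(n):
        roi = random.gauss(BASE_ROI, 0.2)
        if random.random() < frac_intervened:
            roi += EFFECT
        total += roi
    return total / n

for frac in (0.0, 0.5, 1.0):
    avg = reference_avg_roi(frac)
    print(f"{frac:4.0%} of reference group treated: "
          f"avg = {avg:.2f}, doubled goal = {2 * avg:.2f}, "
          f"treated kid's expected rate = {BASE_ROI + EFFECT:.2f}")
```

Under these made-up numbers, doubling a purely untreated reference average gives a goal a treated kid can expect to hit on average, but doubling a fully treated reference average gives a goal twice what the intervention actually delivers.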

Do you think I have this right?

As a remark, I do believe that targeted interventions can lead to above-average improvement (as compared to a control population), and I do think it's great to have well-defined goals for kids that are correlated to scores on state tests. My concern here is about the feasibility of reaching these goals, since the teachers in my school may wind up being evaluated on the basis of whether or not they reach them. And we've gotten way more information about how well this software defines goals and evaluates progress toward them than we've gotten about how realistic those goals are.

In any case, I really appreciate your insight.
posted by alphanerd at 5:45 PM on September 18, 2010


It's true that getting out of the end-of-year bottom 25% doesn't just mean "improving more than you would have"; it means "improving more than you would have, and possibly more than the other kids receiving the intervention."

Simple illustration: suppose the 20 kids in a class have scores 1 to 20. The previous cutoff was 5, and the intervention adds 2.1 points to the bottom 4 kids. One of those kids now scores higher than the old cutoff but is still in the bottom group. On the other hand, if the intervention added 8 points, they would all be out of the bottom 4.
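Working that out in a few lines (the numbers are straight from the illustration; I'm treating the "bottom 4" as the bottom group throughout):

```python
# All numbers come from the illustration: 20 kids scoring 1..20, an old
# cutoff of 5, and an intervention on the bottom 4 kids.
start = {kid: kid for kid in range(1, 21)}  # kid i starts with score i
treated = [1, 2, 3, 4]
OLD_CUTOFF = 5

for boost in (2.1, 8):
    new = {k: v + boost if k in treated else v for k, v in start.items()}
    bottom_group = set(sorted(new, key=new.get)[:4])  # new bottom 4
    for kid in treated:
        print(f"boost {boost}: kid {kid} -> {new[kid]:4.1f}, "
              f"above old cutoff: {new[kid] > OLD_CUTOFF}, "
              f"out of bottom group: {kid not in bottom_group}")
```

With the 2.1-point boost, the kid who started at 4 escapes the bottom group entirely, while the kid who started at 3 clears the old cutoff of 5 but stays in it; with the 8-point boost, all four get out.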

That effect is real; I'm just saying it doesn't invalidate using "6" as the goal for them. You could pick an even higher number as a reasonable target; it depends completely on how effective the intervention is. If your real concern is that the goals are unachievable, isn't that really an empirical question? You are (probably) being evaluated relative to other teachers, so the goal being high for everybody shouldn't matter.
posted by a robot made out of meat at 6:26 PM on September 18, 2010

