Standardized tests as a way to evaluate and rank public high schools
January 16, 2020 12:34 PM

In Ontario, we have EQAO testing. The Fraser Institute (which has conservative ties) uses this data to rank schools. I don't believe these standardized tests and rankings are particularly useful in evaluating whether a given high school will serve the average student better than a lower-ranked high school (assume no special needs, disadvantaged socio-economic circumstances, etc.). Give me your succinct arguments supporting this position.
posted by walkinginsunshine to Education (15 answers total) 1 user marked this as a favorite
 
In the United States we have standardized test scores, and the only thing they have been shown to meaningfully correlate with is family income. So, to extrapolate, all these rankings will show is the wealth of the families of the student population.
posted by jessamyn at 12:38 PM on January 16 [13 favorites]


In addition to the great point about income from Jessamyn, recent immigrants and new English language learners don't do as well on those tests, on average, as kids who grew up in the country/culture the test comes from. If you value a globally diverse school community, going by test scores isn't the best way to find that.
posted by horizons at 12:50 PM on January 16 [1 favorite]


Even if test scores correlated well with overall academic ability or potential (which is not a safe assumption, as jessamyn points out), they do not demonstrate what causes better or worse academic performance. A test score doesn't even attempt to show what caused any particular test-taker to get a high score--it could be superior pedagogy, or healthier students, or it could be the absence of some unknown pernicious factor.

It also doesn't follow that a school with higher average test scores will be a better educational environment for any particular student than a school with lower averages. A school that expels all its lower-performing students would dramatically boost its average scores but not necessarily improve its teaching performance. A standardized test is a measure of output, and even if you uncritically accept that the output represents academic achievement, that doesn't tell you what kind of inputs they're dealing with, and therefore doesn't tell you what kind of educational quality the school provides.
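To put toy numbers on that (all made up), here's a quick Python sketch: drop the lowest scorers and the school average jumps, even though no individual score and no teaching changed.

# Made-up scores; nothing about the teaching changes, only who gets counted.
scores = [45, 50, 62, 70, 78, 85, 90]
print(sum(scores) / len(scores))       # about 68.6 with everyone included
kept = [s for s in scores if s >= 60]
print(sum(kept) / len(kept))           # 77.0 after the bottom two are gone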
posted by skewed at 1:00 PM on January 16 [2 favorites]


The Worst 8th Grade Math Teacher in NYC. As discussed on MeFi.

PS: The margin of error is 35-53%.
posted by DarlingBri at 1:13 PM on January 16


The other really simple limitation on testing like this is that it ignores/handwaves cohort factors. Every year it's a new bunch of kids, so it's hard to compare how a school is doing over time.
posted by freethefeet at 1:25 PM on January 16 [1 favorite]


And teachers who know they are being evaluated by these test scores learn to teach to the test instead of teaching the actual subject matter.
posted by Obscure Reference at 2:12 PM on January 16 [1 favorite]


"And teachers who know they are being evaluated by these test scores learn to teach to the test instead of teaching the actual subject matter."

That can be a feature, not a bug, if they weren't teaching much of anything before, and if students are being tested on relevant subject matter, and if the test scores are a good measure of how much they are learning. There are lots of problems with standardized testing, but it's not like they disrupted a universal utopia of deep learning.
posted by Mr.Know-it-some at 2:33 PM on January 16


What happens if a school is terrible but it's in a richer area where the parents send their kids to Kumon or have tutors for them? Wouldn't the EQAO score for the school be higher than its own teaching would warrant?

Plus, programs such as gifted and IB will skew a school's results higher but won't help if your kid isn't in that program, with the converse being if your kid is in one of those programs you likely wouldn't worry about a lower EQAO score because that doesn't reflect the teaching that your kid would receive.
posted by any portmanteau in a storm at 2:50 PM on January 16


In addition to the excellent points above, most large high schools have multiple teachers covering each major subject (English, Math, etc.). I think everybody has had teachers that were particularly good for them or bad for them in a given subject; some of this is general teacher skill and some of this is individual learning styles and preferences. It's entirely plausible that a specific child of median ability could experience very different results going through the same school taking the same classes depending on the set of teachers they were assigned. (Or that two students of similar ability could take the same classes with the same teachers but experience different results.)

Fundamentally, there's so much noise in the inputs to standardized testing that the only signal they can meaningfully produce at the school-by-school level is socioeconomic status.
posted by Homeboy Trouble at 2:59 PM on January 16


One issue with this is that the direction of effects is muddy. Do "good" schools cause high test scores? Or do high test scores "cause" "good" schools (i.e., a cluster of good students, as an artifact of segregation, gives you the appearance of being a good school)? From a U.S. perspective at least, there is a lot to unpack with school rankings like these, because we are generally highly segregated, generally have a neighborhood school system, and, as jessamyn already said, test scores are highly correlated with family socioeconomic status (which is also correlated with race). I'm sorry to say I don't know enough about Canada's education system, but I'd venture to guess much of this is applicable to some extent.

That said, peer effects are also a thing (students who attend schools with high-achieving peers tend to do a little better themselves). And these kinds of test scores/report cards are also self-perpetuating as homeowners and teachers might use them as a signal for a good place to live/work (and thus you get more resource-rich families and perhaps more experienced/better teachers). So the average student might be "better served" at a school with higher test scores, but it's not necessarily because of something inherent about the school, if that makes sense.
posted by kochenta at 3:01 PM on January 16


"won't help if your kid isn't in that program, with the converse being if your kid is in one of those programs you likely wouldn't worry about a lower EQAO score because that doesn't reflect the teaching that your kid would receive"

This, basically. So much of educational outcomes is the particulars: the kid, the family, the teacher(s). I tell this story often, so sorry if this is repetitive, but the school district I attended was, by standardized test scores, the worst in the state of Ohio at the time. Not entirely undeserved - we also had the highest teen pregnancy rate in the state, we were ahead of our time in the opioid crisis, and the Globe and Mail once proclaimed my town the unhappiest city in the entire United States. All around, not the type of place you'd look at statistics and think "this is where my kid will have a chance to succeed". And yet, none of it really affected me or my friends, because we were in the AP program. I was Dean's List in college, a handful of my classmates went to the Ivy League, a bunch got Ph.D.s, and, as I always like to remind people, a guy a couple years ahead of me has an EGOT (he was also one of the Ivy Leaguers). If you take the top 10% or so of our low-performing school, it would compare favorably with many high-performing schools. And the district was careful to segregate us, keeping us away from the general population (as much as possible - only so much you can do when you're walking back from the bathroom and a gang-related knife fight breaks out). So yeah, we had our own little spot where we could flourish, and given the chance, I'd never trade that for a private school experience. So yeah, unless your kid is the exact average student, their experience is going to be fairly hard to predict without a lot more information.
posted by kevinbelt at 5:38 PM on January 16 [1 favorite]


This year, my kids' teacher sent home a letter asking us to opt out of the BC provincial testing. Here are the links that convinced me to opt out; basically, it boils down to the Fraser Institute abusing the data:

https://curriculum.gov.bc.ca/sites/curriculum.gov.bc.ca/files/pdf/agpa_report.pdf

https://www.bctf.ca/fsa.aspx
posted by Valancy Rachel at 5:53 PM on January 16 [1 favorite]


Thinking of the Ontario context specifically:

(1) Your standardized testing in high school only takes place in Grades 9 (for math) and 10 (for literacy), which is a bit different from some of the other provinces the Fraser Institute does school rankings for, where diploma exams take place in Grade 12. Given that difference, you could argue that in Ontario, standardized test rankings really only account for achievement in the very early half of a student's high school career. Unlike in jurisdictions where standardized testing happens at the end of high school, the EQAO test suite doesn't really attempt to sum up a student's 4-year exposure to the curriculum.

(2) The other argument in Ontario is that there is such a vast EQAO achievement gap between applied and academic students that rankings are, to some extent, a function of a school's applied/academic mix. The average student province-wide takes mostly academic Grade 9 and 10 courses, and the vast majority of those who take Grade 9 Academic math perform at or above the provincial standard on their EQAO test; what happens with the applied cohort bears little resemblance to that average student's high school experience.
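As a toy illustration of that mix effect (the pass rates below are hypothetical, not real EQAO figures), two schools whose students do identically well within each stream can still land far apart in a ranking purely because of their applied/academic split:

# Hypothetical pass rates, identical at both schools within each stream.
pass_rate = {"academic": 0.85, "applied": 0.45}

def school_average(share_academic):
    # A school's overall result is just a weighted average of the two streams.
    return (share_academic * pass_rate["academic"]
            + (1 - share_academic) * pass_rate["applied"])

print(school_average(0.90))  # mostly academic cohort -> about 0.81
print(school_average(0.50))  # even mix -> about 0.65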

FWIW, the high school I went to a million years ago has generally been in the top 10 public high schools in Ontario since the Fraser Institute started its rankings, but the average student there (as well as at most of the schools I recognize at the top of the list) isn't the average Ontario high school student. To any portmanteau in a storm's point, these schools mostly have very large gifted/IB/FI/EF/regional arts program enrolments and relatively small numbers of students taking applied courses. These schools aren't even average among the affluent suburbs they mostly serve.
posted by blerghamot at 6:41 PM on January 16 [2 favorites]


More to follow, but the tl;dr is that there are things you could do that would at least be sincere attempts to generate measures of how well schools are performing from test scores. Fraser isn't doing that, so they aren't even trying to estimate how well schools are performing. Their ranking is entirely meaningless.

There are things you could in principle do to generate reasonable measures out of test data, but that "in principle" is doing an awful lot of work there. The basic idea would be something similar to what NYC was using with the value-added measure, linked before.

The basic idea was to measure the difference between how well students actually score and how well you would have expected them to score based on a huge array of variables about them. How would you expect someone to score with one parent earning $78K and the other staying at home, both of whom have graduate degrees, both of whom are native English speakers, who lives in this neighborhood, who has only one arm, etc. etc. etc.? Then you compound that with an entirely different set of variables describing the school, since teachers have no immediate control over that larger learning environment, and probably some more to control for the time of day that class is taught and a whole bunch of other stuff. An enormous cross-classified multilevel model.
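For the curious, here is a minimal Python sketch of that idea: regress scores on a couple of background variables, then average each school's residuals. The variable names, the simulated numbers, and the plain OLS fit are stand-ins of my own invention; the real NYC models were far more elaborate (cross-classified, multilevel, many more covariates).

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "school": rng.integers(0, 20, n),          # 20 hypothetical schools
    "family_income": rng.normal(70, 25, n),    # $K, made-up distribution
    "english_learner": rng.integers(0, 2, n),
})
# Simulated scores driven entirely by background plus noise (no real school effect).
df["score"] = (60 + 0.2 * df["family_income"]
               - 5 * df["english_learner"] + rng.normal(0, 10, n))

# Expected score from background alone (plain OLS standing in for the real model).
X = np.column_stack([np.ones(n), df["family_income"], df["english_learner"]])
beta, *_ = np.linalg.lstsq(X, df["score"], rcond=None)
df["expected"] = X @ beta

# "Value added" = how far a school's students land above or below expectation.
value_added = (df["score"] - df["expected"]).groupby(df["school"]).mean()
print(value_added.sort_values(ascending=False).head())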

The core problem is that the empirical results of the model are crap. From dim memory of the stuff NYC was doing, some absurdly high proportion of teachers had margins of error around their ranking so wide that all you could say was "Not in the top 5%" or "Somewhere between very bad and very good." And, as the old discussion noted, low correlations within teachers teaching multiple grade levels. Which just means Mr. X was rated an excellent teacher for Grade 8 Underwater Basketweaving but terrible for Grade 9 Underwater Basketweaving, which is of course laughably implausible if this were a measure of teacher quality.

This failure isn't automatic, and it isn't because this is an inherently terrible way to try to measure things; it's just that student achievement is so noisy and random that trying to predict how well a group of 25-50 students is going to do is horrendously error-prone, and that error necessarily propagates forward into the rankings or ratings.

Anyway, you would know that Fraser was halfway serious about doing this if they had generated something like a value-added measure or some similar way to measure the gap between expected student performance and actual student performance from data Fraser might plausibly have obtained. If they had done this, their results would still almost certainly be garbage, but at least arguably-sincere garbage. They didn't even attempt to do that, so it's not even that good of garbage.
posted by GCU Sweet and Full of Grace at 9:51 PM on January 16 [1 favorite]


Test scores alone can't tell you why they are high... are they high due to teaching to the test, external factors (e.g. parental involvement or high socioeconomic status), or many years of high quality instruction across multiple teachers and schools? In addition, standardized tests only measure a tiny portion of what school "should" be about... isn't having access to extracurriculars, caring and competent teachers, and having a good peer group important too? Moreover, standardized tests can create perverse incentives. If teachers are evaluated on how their students perform, then they are less likely to want to teach the most difficult (lowest performing) students. If schools are being evaluated, then they may try to counsel out the weakest students to alternative schools. If the outcome matters to students, then they have an incentive to cheat, but if the outcome doesn't, then they have no reason to try. Finally, the scores themselves assume that the test is a good measure of whatever it is supposed to measure and isn't biased towards certain groups.
posted by oceano at 9:59 PM on January 16

