February 23, 2009 4:34 AM

[Statistics Filter]: What's the best way to calculate a meta-median?

I'm working with aggregated benchmarking data, and don't have access to the raw data behind them. The benchmarks include averages, percentiles, number of measurements, and medians (usually no standard deviations or variances).

What would be the best way to calculate a meta-median, i.e. a median based on several median values from different benchmarks? Should I:

- take the median of medians

- average median values (maybe weighted?)

- use any other kind of approximation from the data I have?

Thanks for your help!
posted by lord_yo to Science & Nature (4 answers total) 2 users marked this as a favorite

As it turns out, there's no way to get the median of the combined data from the medians of the individual data sets alone.

Example: Start with the sets

{1, 2, 3}, Median 2

{1, 1, 1}, Median 1

Then the union of these (keeping repeats) is {1, 1, 1, 1, 2, 3}, which has median 1, the minimum of the two medians. It's not hard to see that with other sets having the same two medians you can also get the maximum, or any number in between which has denominator at most 2.
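To make the point concrete, here is a minimal Python sketch. The first pair is the example above; the other two pairs are made-up sets chosen only because they have the same two medians (2 and 1). The helper name `pooled_median` is hypothetical.

```python
from statistics import median

# In every pair the first set has median 2 and the second has median 1.
pairs = [
    ([1, 2, 3], [1, 1, 1]),  # the example from the text
    ([1, 2, 3], [0, 1, 2]),  # made-up pair, same medians
    ([2, 2, 2], [0, 1, 2]),  # made-up pair, same medians
]

def pooled_median(a, b):
    """Median of the combined data, keeping repeated values."""
    return median(a + b)

# (median of first set, median of second set, median of combined data)
results = [(median(a), median(b), pooled_median(a, b)) for a, b in pairs]

for row in results:
    print(row)
```

The individual medians are identical in all three rows, yet the combined median comes out as the minimum, a value in between, and the maximum, so the medians alone cannot pin it down.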

With extra information (such as the percentiles), you might be able to do more, but it's going to be very inexact. Calculating a median throws away almost all of your information about the data, unlike an average.
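As a rough illustration of what "doing more" with percentiles and counts could look like, here is a Python sketch. It is one possible approximation, not an established method: it assumes each benchmark reports its number of measurements and a few (value, cumulative fraction) percentile points including the median, draws a straight-line CDF between those points (held flat outside the reported range), mixes the CDFs weighted by count, and searches for where the mixture crosses 0.5. All names and numbers are hypothetical.

```python
def cdf(x, points):
    """Piecewise-linear CDF through sorted (value, fraction) points.

    Held flat outside the reported range, which is a crude assumption.
    """
    if x <= points[0][0]:
        return points[0][1]
    for (v0, f0), (v1, f1) in zip(points, points[1:]):
        if x <= v1:
            return f0 + (f1 - f0) * (x - v0) / (v1 - v0)
    return points[-1][1]

def approx_pooled_median(benchmarks, tol=1e-9):
    """Approximate the median of the combined data by bisection.

    benchmarks: list of (count, points); every benchmark must include
    its median among the points so that the search is bracketed.
    """
    total = sum(n for n, _ in benchmarks)

    def mixture(x):
        # Count-weighted average of the per-benchmark CDFs.
        return sum(n * cdf(x, pts) for n, pts in benchmarks) / total

    lo = min(pts[0][0] for _, pts in benchmarks)
    hi = max(pts[-1][0] for _, pts in benchmarks)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mixture(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return hi

# Hypothetical benchmarks: (count, [(value, cumulative fraction), ...])
benchmarks = [
    (100, [(10, 0.25), (20, 0.50), (40, 0.75)]),
    (300, [(30, 0.25), (50, 0.50), (60, 0.75)]),
]
estimate = approx_pooled_median(benchmarks)
print(round(estimate, 3))
```

The result depends entirely on the straight-line assumption between percentiles, so it is only as good as the percentile grid is fine, which is the inexactness mentioned above.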

posted by TypographicalError at 5:59 AM on February 23, 2009 [1 favorite]

I'll agree with TypographicalError on this one, but there may be some shared characteristics that let your data sets be described somewhat better. Still, the fact that you don't know the variance or standard deviation of each data set leaves you with little useful information.

Now with that said, I've done some SPC work with averaged aggregated data, but I had verified that my methods of data collection did not impact my averaged data, as well as confirmed that variances between groupings were within tolerance, and that each sample was taken identically. Without being able to decree that each piece of benchmarking data is applicable, significant, and taken under the same conditions... I'd be hesitant to use more than one piece of it at a time.

posted by Nanukthedog at 8:05 AM on February 23, 2009

All of the tests that I can think of that do something that could be loosely described as comparing medians require additional data and don't sound like they're aimed at the purposes you have in mind. (Of course, I'm an amateur at best--my knowledge of these tests is more in the way of knowing they exist than in being skilled in selecting or applying them.)

The Median Test -- now in disuse

The Mann-Whitney U

Kruskal-Wallis Variance Analysis

What are you trying to accomplish, more generally speaking? What relationships do you need to show/investigate?

posted by snuffleupagus at 7:10 AM on February 24, 2009

posted by shothotbot at 5:32 AM on February 23, 2009