Diversity/concentration measure for nonexclusive categories?
November 15, 2008 5:20 PM

Measures of diversity / fragmentation / concentration for nonexclusive categories?

So I have data on the financial interests of legislators, and one of the too many things I want to do with them is look at the diversity, fragmentation, or concentration of industries represented in the legislature.

This is pretty trivial for occupations, because if your occupation is "Lawyer," it isn't "Farmer." So for occupation, you can use (and people have used) the Herfindahl index from econ. For those playing the home game, this is the sum of the squares of the market shares.
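For concreteness, here is a minimal Python sketch of the index as just described. It assumes shares are given as fractions that sum to 1 (0.2 rather than 20%). The function name `herfindahl` is made up for illustration.

```python
def herfindahl(shares):
    """Herfindahl index: the sum of squared shares.

    Assumes `shares` are fractions of one whole that sum to 1,
    i.e. every observation falls in exactly one category.
    """
    return sum(s * s for s in shares)


# Five occupations at 20% each versus one occupation holding everyone.
even = herfindahl([0.2] * 5)   # about 0.2, the minimum possible with 5 categories
monopoly = herfindahl([1.0])   # 1.0, maximum concentration
print(round(even, 4), monopoly)
```

The index runs from 1/n, when n categories are equally sized, up to 1, when a single category has everything. Lower values therefore mean a more diverse membership.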

But I have financial interests. And you can get income from a law firm and income from a farm at the same time. This means that I can't directly use the Herfindahl index, because now the sum of the market shares isn't 100%, it's 105 or 120 or 150%.
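Here is a toy illustration of why the shares stop summing to 100% once categories overlap. The four legislators and their sectors are invented. Each share is the fraction of legislators who have any income from that sector.

```python
from collections import Counter

# Hypothetical toy chamber: each legislator's set of income sectors.
chamber = [
    {"law", "farm"},
    {"law"},
    {"business", "farm"},
    {"business"},
]

# Count how many legislators report each sector, then divide by chamber size.
counts = Counter(sector for member in chamber for sector in member)
shares = {sector: n / len(chamber) for sector, n in counts.items()}

print(shares)                 # each sector is reported by half the legislators
print(sum(shares.values()))   # 1.5: the shares total 150%, not 100%
```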

So does anyone know of a standard method or summary statistic used to measure diversity or concentration where any individual observation can be in more than one category at the same time? Like, something used to measure linguistic diversity that allows people to speak more than one language, or something else canned?

I can think of ways around this -- modifying the Herfindahl, or doing factor analysis and counting the number of recovered dimensions. But if there's something well-specified and simple from another discipline, I'd prefer to use that.
posted by ROU_Xenophobe to Science & Nature (10 answers total)
Are you looking at within-concentration or between-concentration for industries? In other words, are you hypothesizing about concentration across legislators or for each legislator? Either way you could use the HHI: for total concentration (shares of all dollars donated to the whole legislature) or for each legislator (shares of all dollars donated to that legislator).
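Here is a rough sketch of the two ways of applying the HHI described here, using invented dollar amounts. The names `donations` and `hhi` are hypothetical. The within-legislator version uses each legislator's own total. The chamber-wide version pools dollars by industry before computing shares.

```python
# Hypothetical data: dollars donated to each legislator, by industry.
donations = {
    "Smith": {"law": 500.0, "farm": 500.0},
    "Jones": {"law": 900.0, "education": 100.0},
}


def hhi(amounts):
    """HHI over nonnegative amounts, converted to shares of their total."""
    total = sum(amounts)
    return sum((a / total) ** 2 for a in amounts)


# Within each legislator: shares of that legislator's own donations.
per_legislator = {name: hhi(d.values()) for name, d in donations.items()}

# Across the chamber: pool all dollars by industry first, then take shares.
pooled = {}
for d in donations.values():
    for industry, amount in d.items():
        pooled[industry] = pooled.get(industry, 0.0) + amount
chamber_hhi = hhi(pooled.values())

print(per_legislator, round(chamber_hhi, 3))
```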

I am not sure concentration is meaningful if you are not doing it one of these ways, since otherwise the donors are not really in competition. You might, however, want to look at simple correlations between donations, or use cluster analysis to tease out relations between the various donor industries.
posted by blahblahblah at 6:44 PM on November 15, 2008

It's not totally clear to me what you're trying to do with your data, but there are a lot of ecological techniques that look at both the number of different things in each observation and their relative abundance. Might that work? One thing you could do is group together the observations that are most similar in their composition.
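When you only have presence/absence data, one common way to measure how similar two observations are in composition is the Jaccard similarity. That choice is mine; the comment does not name a specific measure. Here is a minimal sketch with invented legislators:

```python
from itertools import combinations

# Hypothetical presence/absence data: sectors each legislator draws income from.
members = {
    "A": {"law", "farm"},
    "B": {"law", "farm", "business"},
    "C": {"education"},
}


def jaccard(x, y):
    """Jaccard similarity: categories in both sets over categories in either."""
    union = x | y
    return len(x & y) / len(union) if union else 1.0


# Print the similarity for every pair of legislators.
for (m1, s1), (m2, s2) in combinations(members.items(), 2):
    print(m1, m2, round(jaccard(s1, s2), 2))
```

You could then pass a matrix of these pairwise similarities to any clustering method you prefer. Using 1 minus each similarity turns it into a distance.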

It's hard to recommend a specific analysis without knowing more about the data set.
posted by fshgrl at 6:53 PM on November 15, 2008

To be clearer, here are the data I have:

For each legislator, I have a dummy variable denoting "Did you have income from that economic sector according to your mandatory conflict-of-interest filing?" across a variety of sectors. Not campaign donations, just where they get their own personal money apart from their legislative pay. The number of observations in each chamber is just the number of legislators.

So for each legislative chamber, I know that 30% received income from agriculture, 40% from law firms, 60% from unspecified general business, 15% from education, and so on.

There's an old article that looks at legislators' occupations, but with occupations you can just use a straightforward Herfindahl to say that Virginia, which is 60% lawyers, has a less diverse membership than some other legislature with 20% each of five occupations. Which is in fact what was done. There's other stuff I'm doing with the raw data, but I thought it would be fun to basically replicate the old study with the new data, with some extensions.
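The Virginia comparison can be checked with a little arithmetic, even though only the lawyer share is given. Squaring the 60% lawyer share already gives 0.36, and the other 40% can only add to that. Five occupations at 20% each give 5 × 0.04 = 0.20. Here is a small Python check, where the five-occupation legislature is the hypothetical one from the text:

```python
def herfindahl(shares):
    """Sum of squared shares (fractions of a single whole)."""
    return sum(s * s for s in shares)


# Hypothetical comparison legislature: five occupations at 20% each.
five_even = herfindahl([0.2] * 5)

# Virginia: only the 60% lawyer share is known, so its squared share alone
# is a lower bound on the index; the remaining 40% can only add to it.
virginia_lower_bound = 0.6 ** 2

print(round(five_even, 2), round(virginia_lower_bound, 2))
```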

All I'm looking for is a reasonable and simple summary statistic for how diverse the distribution of financial interests is across chambers, taking into account that each legislator can have several financial interests. I do not need confidence intervals around them. Rescaling the Herfindahl as blahblahblah describes is one of the fallbacks I have in my pocket.
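The rescaling fallback can be sketched as follows: divide each sector's share by the sum of all shares so they total 1 again, then square and sum. This sketch uses the percentages listed a comment earlier. It assumes those four sectors are the only ones, but the real list continues with "and so on," so the number it produces is purely illustrative.

```python
def rescaled_herfindahl(shares):
    """Herfindahl after dividing each share by the sum of all shares.

    With nonexclusive categories the raw shares can sum to more than 1;
    rescaling makes them sum to 1 before squaring.
    """
    total = sum(shares.values())
    return sum((s / total) ** 2 for s in shares.values())


# Fractions of legislators with income from each sector (assumed complete here).
chamber = {"agriculture": 0.30, "law": 0.40, "business": 0.60, "education": 0.15}
print(round(rescaled_herfindahl(chamber), 3))
```

This choice has one consequence: each (legislator, sector) pair counts as one unit. A legislator with three income sources therefore carries three times the weight of one with a single source.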

Once I have them, I intend to dump them into a table and say "Well, lookee there! The old data said that X was more diverse than Y, and that's still true!" Maybe, at most, I'll run some manner of regression (OLS or a count model, most likely) with the summary statistic as the DV and a few IVs. High science this ain't; this is more an homage to the original study and the dude what did it.
posted by ROU_Xenophobe at 8:45 PM on November 15, 2008

Could you use a Shannon index to assign a simple diversity number to each legislator and use those numbers to come up with an average for the legislative body?
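Here is a minimal sketch of this suggestion. It assumes each legislator's income proportions are known, which, as it turns out further down, they are not. The Shannon index is H = -Σ p ln p, and the legislators below are invented.

```python
import math


def shannon(proportions):
    """Shannon index H = -sum(p * ln p), skipping zero proportions."""
    return sum(-p * math.log(p) for p in proportions if p > 0)


# Hypothetical income proportions for each legislator in one chamber.
legislators = [
    [0.25, 0.25, 0.5],
    [1.0],
    [0.5, 0.5],
]

# One index per legislator, then a simple average for the chamber.
per_member = [shannon(p) for p in legislators]
chamber_average = sum(per_member) / len(per_member)
print([round(h, 3) for h in per_member], round(chamber_average, 3))
```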
posted by kuujjuarapik at 9:14 PM on November 15, 2008

*looks that up*

Maybe. I'll try computing the measure for both the old data and the new and see whether it returns something reasonable for each. But it looks like the index assumes that you can't be a velociraptor and a wombat at the same time.

But I don't really care whether individual legislators have a single financial interest or several -- I don't need the average diversity of individual members. All I'm looking for is "This legislature, with a whole bunch of financial interests "represented" and none of them dominant, is more diverse than that other one where half are lawyers, two thirds are businessmen, and that's all there is."
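One way to get a chamber-level number that matches this description is to rescale the sector shares so they sum to 1 and then apply the Shannon index to the rescaled shares. The rescaling step is an assumption on my part. Here is a sketch comparing two invented chambers: one scattered across seven sectors, and one matching the lawyers/businessmen example.

```python
import math


def chamber_shannon(shares):
    """Shannon index of a chamber's sector shares after rescaling them to sum to 1.

    Rescaling is an assumption: it treats each (legislator, sector) pair as one
    unit, which sidesteps the one-category-per-observation requirement.
    """
    total = sum(shares)
    return sum(-(s / total) * math.log(s / total) for s in shares if s > 0)


# Hypothetical chambers, shares as fractions of legislators.
scattered = [0.2] * 7        # seven sectors, none dominant
two_sector = [0.5, 2 / 3]    # half lawyers, two thirds businessmen, nothing else

print(round(chamber_shannon(scattered), 3), round(chamber_shannon(two_sector), 3))
```

Higher values mean more diversity here. When n sectors are equally represented, the index equals ln n.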
posted by ROU_Xenophobe at 10:20 PM on November 15, 2008

You could flatten the data for one legislature from a matrix (legislator×industry->revenue) to a vector (industry->revenue).

When flattening: if you think legislator influence is measured in money, don't normalise; if you think it is measured in votes, normalise each legislator's row by that legislator's total revenue.

Then you can compute a 1D diversity measure on that vector: the Shannon index, the Herfindahl index or similar.
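Here is a minimal sketch of the flattening step, using a made-up revenue matrix and the Herfindahl index as the 1D measure. The `normalise` flag switches between the two weightings. The money weighting sums raw revenue. The vote weighting first divides each legislator's row by that legislator's total.

```python
def flatten(matrix, normalise):
    """Collapse legislator -> {industry: revenue} into industry -> total.

    normalise=False: weight by money (raw revenue is summed).
    normalise=True: weight by votes (each row is divided by its total first,
    so every legislator contributes exactly 1 in total).
    """
    vector = {}
    for row in matrix.values():
        row_total = sum(row.values())
        if row_total == 0:
            continue  # a legislator with no reported revenue adds nothing
        for industry, revenue in row.items():
            weight = revenue / row_total if normalise else revenue
            vector[industry] = vector.get(industry, 0.0) + weight
    return vector


def herfindahl(vector):
    """Sum of squared shares of the flattened vector."""
    total = sum(vector.values())
    return sum((v / total) ** 2 for v in vector.values())


# Hypothetical revenue matrix for one legislature.
matrix = {
    "X": {"farm": 90_000.0, "law": 10_000.0},
    "Y": {"law": 5_000.0},
}
for normalise in (False, True):
    print(normalise, round(herfindahl(flatten(matrix, normalise)), 3))
```

Suppose only 0/1 indicators are available instead of revenue. The normalised version then splits each legislator's single vote equally among that legislator's income sources.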
posted by Tobu at 5:42 AM on November 16, 2008

I do not have and cannot get data on revenue. Most states only require that you disclose any income sources over a limit, not how much that income was. So I know that Mr. X got money from farming and a small business and a state pension, but not how much money from any of them.
posted by ROU_Xenophobe at 7:57 AM on November 16, 2008

Well, that changes things a bit. My thought on the Shannon index was that you could assign each legislator a diversity number. For example, a legislator with 25% of income from A, 25% from B and 50% from C would get:
Legislator diversity number = -[.25(ln .25) + .25(ln .25) + .5(ln .5)]
where .25, .25 and .5 are the shares of total income. If you don't have income amounts but do know the number of different income sources, you could do something similar:
Diversity number = -[.33(ln .33) + .33(ln .33) + .33(ln .33)] for a legislator with three incomes and
Diversity number = -[.5(ln .5) + .5(ln .5)] for a legislator with two incomes, etc., where each share is simply 1 divided by the number of incomes.
You would get a different number for each count of income sources, and you could average across the legislative body to compare one state to another. It wouldn't give much detail, but it might work.
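The Shannon index is conventionally H = -Σ p ln p, where the minus sign makes it nonnegative. With k equal shares of 1/k, it collapses to ln k, which is why each count of income sources gets its own number. Here is a short sketch with invented counts. The helper name `equal_share_shannon` is hypothetical.

```python
import math


def equal_share_shannon(k):
    """Shannon index when a legislator's k income sources get share 1/k each.

    The sum of k terms -(1/k) * ln(1/k) simplifies to ln(k).
    """
    return sum(-(1 / k) * math.log(1 / k) for _ in range(k))


# Hypothetical counts of income sources for each legislator in a chamber.
source_counts = [3, 2, 1, 2]
per_member = [equal_share_shannon(k) for k in source_counts]
chamber_average = sum(per_member) / len(per_member)
print([round(h, 3) for h in per_member], round(chamber_average, 3))
```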
posted by kuujjuarapik at 8:18 AM on November 16, 2008

Right, but that would give me the average income diversity of legislators, which isn't what I want.

I'm after the diversity of interests "represented" in the legislature -- is there a single dominant sector and some noise, two big sectors and some noise, or are the financial interests of legislators so scattered that many sectors are "represented" and none of them is obviously dominant? (This is never really the case, given that "misc business" is always strongly represented because of its catchall nature.)
posted by ROU_Xenophobe at 8:45 AM on November 16, 2008

Oh. I see. Never mind, then.
posted by kuujjuarapik at 8:51 AM on November 16, 2008
