Constructing a relatively simple econometric index.
September 7, 2008 6:12 AM

I am trying to construct an index with variables such as population density, population age, gross individual income and the like. As it is my first attempt, I could certainly use pointers to books/websites/papers on the subject. A more detailed explanation with the math is available inside.

The index would be calculated as follows:

i=sum(xpos_i*alpha_i) - sum(xneg_i*alpha_i)

where:

i is a "how 'nice' the area is" index comparing different areas.

sum(xpos_i*alpha_i) is the sum of all the positive xpos variables (believed to be desirable), each one weighted with an alpha_i in [0,1] according to how much I believe it should weigh.

sum(xneg_i*alpha_i) is the sum of all the positive xneg variables (believed to be NOT desirable), each one weighted with an alpha_i in [0,1] according to how much I believe it should weigh.

The general idea is that all the areas A to Z present a set of variables, such as population density, that are either desirable or undesirable. For instance, population density is related to overcrowding and would, therefore, be considered an xneg; yet as population density necessarily has positive values, it would be included in the xneg set so that its positive value is turned into a negative by the "-" operator in the index. The surface of available public gardens is believed to belong to the xpos set, as gardens provide a nice sight and an entertainment area. The areas with a greater surface of gardens would, therefore, increase the value of xpos more than other areas that have less surface dedicated to gardens.
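
As a minimal sketch of the formula above in Python: the area names, the values, and the alphas are all made up for illustration, and niceness_index is just a hypothetical helper name.

```python
# Hypothetical data: every area name, value, and weight here is invented.
areas = {
    "A": {"garden_area": 12000.0, "pop_density": 3500.0},
    "B": {"garden_area": 4000.0, "pop_density": 9000.0},
}

# Weights alpha in [0, 1], picked by judgment (assumed values).
alpha_pos = {"garden_area": 0.5}  # desirable variables (xpos)
alpha_neg = {"pop_density": 0.8}  # undesirable variables (xneg)


def niceness_index(values, alpha_pos, alpha_neg):
    """Return sum(xpos * alpha) - sum(xneg * alpha) for one area."""
    positive = sum(values[name] * a for name, a in alpha_pos.items())
    negative = sum(values[name] * a for name, a in alpha_neg.items())
    return positive - negative


index = {name: niceness_index(v, alpha_pos, alpha_neg) for name, v in areas.items()}
```

With raw values like these, the result depends entirely on the units each variable happens to be measured in.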

This all makes some sense to me, but as I am sailing in uncharted waters without a guide, any compass would be appreciated :) !
posted by elpapacito to Education (6 answers total)
 
If you're willing to not assign positive and negative values: just throw it into principal components factor analysis.
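
A rough sketch of what that could look like, assuming invented data for five areas and three variables. The power iteration below is only a stand-in chosen to keep the example dependency-free; a real analysis would more likely use a statistics package. The first principal component's loadings play the role of the weights, so the data (not you) decide the signs and sizes.

```python
import math
import statistics

# Hypothetical rows: garden_area, pop_density, income (all numbers invented).
data = [
    [12000.0, 3500.0, 31000.0],
    [4000.0, 9000.0, 22000.0],
    [8000.0, 6000.0, 27000.0],
    [16000.0, 5500.0, 35000.0],
    [6000.0, 7500.0, 24000.0],
]


def standardize(column):
    """Subtract the mean and divide by the sample standard deviation."""
    mean = statistics.mean(column)
    sd = statistics.stdev(column)
    return [(v - mean) / sd for v in column]


columns = [standardize(list(col)) for col in zip(*data)]
n = len(data)
k = len(columns)

# Correlation matrix of the standardized variables.
corr = [
    [sum(a * b for a, b in zip(columns[i], columns[j])) / (n - 1) for j in range(k)]
    for i in range(k)
]

# Power iteration: repeatedly multiply by corr and renormalize to approach
# the eigenvector with the largest eigenvalue (the first principal component).
vec = [1.0] * k
for _ in range(200):
    nxt = [sum(corr[i][j] * vec[j] for j in range(k)) for i in range(k)]
    norm = math.sqrt(sum(x * x for x in nxt))
    vec = [x / norm for x in nxt]

# One score per area: the standardized variables weighted by the loadings.
scores = [sum(vec[j] * columns[j][row] for j in range(k)) for row in range(n)]
```

Note that the overall sign of a principal component is arbitrary, so whether a high score means "nice" or "not nice" still has to be read off the loadings.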

If you're not, and this isn't uber-serious: divide each variable by its standard deviation, and forget about the alphas unless you want to require that this is twice as important as that.

If it's uber-serious: do what the relevant literature does.
posted by ROU_Xenophobe at 6:32 AM on September 7, 2008


ROU_Xenophobe's suggestion of dividing by the standard deviation is actually pretty important because it will normalize your numbers. Imagine that you have both number of libraries and square feet of public garden in your function. Libraries will be a small integer, while square feet of garden will be a large number (>1,000). If you just include both these numbers then the big one will swamp the results. Normalization also allows you to see how important a number is relative to that number's variation.
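
A small Python sketch of the scaling step, with invented figures for four areas (scale_by_sd is a hypothetical helper name):

```python
import statistics

# Hypothetical figures for four areas.
libraries = [2, 5, 3, 8]                            # small integers
garden_sqft = [15000.0, 42000.0, 8000.0, 27000.0]   # large numbers


def scale_by_sd(values):
    """Divide each value by the sample standard deviation of the list."""
    sd = statistics.stdev(values)
    return [v / sd for v in values]


libraries_scaled = scale_by_sd(libraries)
garden_scaled = scale_by_sd(garden_sqft)
```

After scaling, both lists have a standard deviation of 1, so neither variable dominates just because of its units.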

I would go one step further and subtract the mean from the number, so now you have what's called a z-score. What goes into your function is (x - mean(x)) / stdev(x). The variables are now all comparable and have an intuitive interpretation. A number > 2 is "lots" above average and a number <>
On coefficients, your alphas: if you are doing this for yourself just make up whatever alphas you want. If you are doing this for any sort of public consumption you will need to put a lot more thought into it.
posted by shothotbot at 8:13 AM on September 7, 2008


My comment above got a little borked. I meant to say "A number > 2 is "lots" above average and a number <>
posted by shothotbot at 10:50 AM on September 7, 2008


OK, I see what happened. Shot 0, HTML 2. I really meant to say that a number greater than 2 is "lots" above average and a number less than -2 is "lots" below average. Once all your numbers are z-scores it is correct, or correct enough for your purposes, to add them all up (signs changed as you feel is appropriate) and call the result your index.
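
To make that concrete, a minimal Python sketch with invented data for four areas; the variable names and the +1/-1 signs are assumptions for illustration:

```python
import statistics

# Hypothetical data for four areas; all numbers are invented.
data = {
    "garden_area": [12000.0, 4000.0, 8000.0, 16000.0],
    "pop_density": [3500.0, 9000.0, 6000.0, 5500.0],
}
# +1 for variables you consider desirable, -1 for undesirable ones.
signs = {"garden_area": 1, "pop_density": -1}


def z_scores(values):
    """Return (x - mean(x)) / stdev(x) for every x in the list."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


z = {name: z_scores(values) for name, values in data.items()}
n_areas = len(next(iter(data.values())))

# The index for each area is the signed sum of its z-scores.
index = [sum(signs[name] * z[name][i] for name in data) for i in range(n_areas)]
```

Multiplying each term by an alpha before summing would bring back the weights from the original formula.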
posted by shothotbot at 10:53 AM on September 7, 2008


ROU is right. If you want this to be meaningful to anyone, look at the relevant lit (if I understand you, urban development/econ) and do as they do. If you do not know what the relevant lit is, then you should figure that out.
posted by a robot made out of meat at 11:47 AM on September 7, 2008


Response by poster: Thanks for your help so far. Indeed I think I will normalize and use z-scores in order to keep units of measure and absolute magnitudes from influencing the value of the index.

As for the alphas, as I plan to eventually submit this as a mini "test work", I think I should at least state that an analysis of how each x_i correlates with quality of life should be attempted, so that the alphas are not just made up by pure guesstimation.
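
A sketch of what such a correlation check might look like in Python. It assumes some external quality-of-life score per area exists (the quality_of_life list here is entirely hypothetical, as are the other numbers), and the correlations are only a rough guide to the sign and relative strength of each variable, not ready-made alphas.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical values for four areas.
quality_of_life = [7.0, 3.0, 5.0, 8.0]  # assumed external score, e.g. a survey
garden_area = [12000.0, 4000.0, 8000.0, 16000.0]
pop_density = [3500.0, 9000.0, 6000.0, 5500.0]

r_garden = pearson(garden_area, quality_of_life)
r_density = pearson(pop_density, quality_of_life)
```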

So far my three books on statistics have been rather useful, yet I still can't find any "worked example" text allowing me to grasp more intuitively some of the applications. It's good exercise, tho.
posted by elpapacito at 12:17 PM on September 7, 2008

