How would I pick which equation to use?
November 4, 2009 10:49 PM   Subscribe

I want to determine the difference between distributions. When would I use KL-divergence and when would I use RMSE? It seems like both equations reduce deviation to a single number, but I couldn't find a comparison between them.
posted by lpctstr; to Science & Nature (5 answers total) 3 users marked this as a favorite
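[As a minimal sketch of the two quantities the question mentions, assuming two discrete distributions given as hypothetical normalized histograms p and q over the same bins, each collapses the discrepancy to a single number:]

import numpy as np

# Hypothetical example histograms (illustration only)
p = np.array([0.1, 0.4, 0.5])
q = np.array([0.2, 0.3, 0.5])

# KL divergence D(p || q) = sum_i p_i * log(p_i / q_i)
kl = np.sum(p * np.log(p / q))

# RMSE treats the two probability vectors as ordinary numeric vectors
rmse = np.sqrt(np.mean((p - q) ** 2))

print(kl, rmse)  # each reduces the difference between p and q to one number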
 
Depends what you mean by difference. KL is not a distance metric. There are distance metrics on function spaces, but that may or may not be what you want. Any distance metric will give you a real number. You'll have to provide more information about what you're after...
posted by devilsbrigade at 1:06 AM on November 5, 2009


I generally see KL-divergence for this. I haven't seen RMSE used to measure the difference between probability distributions; where have you seen this? And, yes, what are you using this for?

The Dictionary of Distances says the following distances are usually used for probability distributions: "Bhattacharya 2, Hellinger, Kullback-Leibler and (especially, for histograms) χ², Kolmogorov-Smirnov, Kuiper distances." And, of course, it has a whole chapter on other distances for probability distributions, with precious little guidance on why you would want to use one over another.
posted by grouse at 5:18 AM on November 5, 2009
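[For concreteness, a minimal sketch of one distance from that list, assuming hypothetical normalized histograms p and q over the same bins: the Hellinger distance H(p, q) = (1/√2)·‖√p − √q‖₂.]

import numpy as np

def hellinger(p, q):
    # Hellinger distance between normalized histograms p and q
    return np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2)

# Hypothetical example histograms (illustration only)
p = np.array([0.1, 0.4, 0.5])
q = np.array([0.2, 0.3, 0.5])

print(hellinger(p, q))  # symmetric, and always between 0 and 1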


Response by poster: Thanks for the initial responses.

I guess what I'm wondering then is why isn't RMSE used to calculate divergence between distributions? What are the advantages of a distance metric giving a real number?
posted by lpctstr; at 4:31 PM on November 5, 2009


RMSE is used to measure the error of a model relative to a dataset. It is not meant to measure a distance in a function space. In particular, usually you try to minimize the MSE during model fitting. This notion isn't really the same as comparing two probability density functions. KL-divergence is useful for quantifying the amount of information gained by a Bayesian update ("as more data becomes available, how much does this data affect the resulting distribution?"), but it is not a metric since it is not symmetric. Other metrics reflect different properties. The inner product in some function spaces (the space of square-integrable functions, for instance) is very useful in defining conditional expectation in probability.

Really, it all depends what you're trying to get out of a notion of distance between two distributions, what the distributions are, etc.
posted by devilsbrigade at 8:19 PM on November 5, 2009
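[A minimal sketch of the asymmetry noted above, assuming hypothetical discrete distributions p and q with strictly positive entries: KL(p‖q) and KL(q‖p) generally differ, which is why KL divergence is not a metric.]

import numpy as np

def kl(p, q):
    # D(p || q) for discrete distributions with strictly positive q
    return np.sum(p * np.log(p / q))

# Hypothetical example histograms (illustration only)
p = np.array([0.1, 0.4, 0.5])
q = np.array([0.3, 0.3, 0.4])

print(kl(p, q), kl(q, p))  # the two directions give different numbers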


To echo the above, most of the time I'm interested in showing that two distributions are the same (or become so in some limit), and I use whichever divergence measure I can control in that setting.
posted by a robot made out of meat at 9:01 AM on November 9, 2009

