# How would I pick which equation to use?

November 4, 2009 10:49 PM

I want to determine the difference between distributions. When would I use KL-divergence and when would I use RMSE? It seems like both equations reduce deviation to a single number, but I couldn't find a comparison between them.

I generally see KL-divergence for this. I haven't seen RMSE used to measure the difference between probability distributions; where have you seen this? And, yes, what are you using this for?

The Dictionary of Distances says the following distances are usually used for probability distributions: "Bhattacharya 2, Hellinger, Kullback-Leibler and (especially, for histograms) χ², Kolmogorov-Smirnov, Kuiper distances." And, of course, it has a whole chapter on other distances for probability distributions, with precious little guidance on why you would want to use one over another.

posted by grouse at 5:18 AM on November 5, 2009

Thanks for the initial responses.

I guess what I'm wondering then is why isn't RMSE used to calculate the divergence between distributions? What are the advantages of a distance metric giving a real number?

posted by lpctstr; at 4:31 PM on November 5, 2009

RMSE is used to measure the error of a model relative to a dataset. It is not meant to measure a distance in a function space. In particular, you usually try to minimize the MSE during model fitting. This notion isn't really the same as comparing two probability density functions. KL-divergence is useful for quantifying the amount of information gained by a Bayesian update ("as more data becomes available, how much does this data affect the resulting distribution?"), but it is not a metric since it is not symmetric. Other distances reflect different properties. The inner product in some function spaces (the space of square-integrable functions, for instance) is very useful in defining conditional expectation in probability.
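As a rough illustration of the symmetry point, here is a minimal sketch in Python with NumPy. It assumes two discrete distributions on the same three outcomes. The values of `p` and `q` and the helper names `kl_divergence` and `rmse` are made up for this example and are not from the thread. It also assumes `q` is nonzero wherever `p` is nonzero, since otherwise the KL-divergence is infinite.

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(p || q) in nats for discrete distributions on the same outcomes.

    Assumes q[i] > 0 wherever p[i] > 0 (hypothetical helper for illustration).
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # terms with p[i] == 0 contribute zero by convention
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def rmse(p, q):
    """Root-mean-square difference between the two probability vectors."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sqrt(np.mean((p - q) ** 2)))

# Made-up example distributions over three outcomes.
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.4, 0.4, 0.2])

# Swapping the arguments changes the KL-divergence but not the RMSE.
print("KL(p||q):", kl_divergence(p, q), "KL(q||p):", kl_divergence(q, p))
print("RMSE(p,q):", rmse(p, q), "RMSE(q,p):", rmse(q, p))
```

For these particular numbers the two KL values differ slightly while the two RMSE values are identical. Symmetry alone does not make RMSE the right tool here, though: it treats the probabilities as plain vector entries, whereas KL-divergence weights each log-ratio by how likely the outcome is under `p`.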

Really, it all depends what you're trying to get out of a notion of distance between two distributions, what the distributions are, etc.

posted by devilsbrigade at 8:19 PM on November 5, 2009

To echo the above, most of the time I'm interested in showing that two distributions are the same (or become the same in some limit), so I use whichever divergence measure I can control in that setting.

posted by a robot made out of meat at 9:01 AM on November 9, 2009

posted by devilsbrigade at 1:06 AM on November 5, 2009