How would I pick which equation to use?
November 4, 2009 10:49 PM

I want to determine the difference between distributions. When would I use KL-divergence and when would I use RMSE? It seems like both equations reduce deviation to a single number, but I couldn't find a comparison between them.
posted by lpctstr; to science & nature (5 comments total) 3 users marked this as a favorite
Depends what you mean by difference. KL is not a distance metric. There are distance metrics on function spaces, but that may or may not be what you want. Any distance metric will give you a real number. You'll have to provide more information about what you're after...
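
To make the "not a distance metric" point concrete, here is a minimal sketch assuming two discrete distributions on the same three outcomes. The function name `kl_divergence` and the example values are made up for illustration, and the code assumes q is nonzero wherever p is nonzero.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) in nats for discrete distributions on the same support.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical example distributions (each sums to 1).
p = [0.5, 0.4, 0.1]
q = [0.3, 0.3, 0.4]

kl_pq = kl_divergence(p, q)
kl_qp = kl_divergence(q, p)

# The two directions disagree, so KL cannot be a metric.
print(f"D(p||q) = {kl_pq:.4f}, D(q||p) = {kl_qp:.4f}")
```

Swapping the arguments changes the answer, which a true distance would never do.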
posted by devilsbrigade at 1:06 AM on November 5


I generally see KL-divergence for this. I haven't seen RMSE used to measure the difference between probability distributions; where have you seen this? And, yes, what are you using this for?

The Dictionary of Distances says the following distances are usually used for probability distributions: "Bhattacharya 2, Hellinger, Kullback-Leibler and (especially, for histograms) Χ2, Kolmogorov-Smirnov, Kuiper distances." And, of course, it has a whole chapter on other distances for probability distributions, with precious little guidance on why you would want to use one over another.
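
For a rough feel of how a few of these behave on histograms, here is a sketch in plain Python. The function names and the two example histograms are hypothetical, and the chi-square form used is only one of several conventions in circulation (the symmetric one, without a factor of 1/2).

```python
import math
from itertools import accumulate

def hellinger(p, q):
    """Hellinger distance between two discrete distributions (range 0 to 1)."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)) / 2)

def chi_square(p, q):
    """Symmetric chi-square distance: sum of (a - b)^2 / (a + b).

    Other texts define it differently; treat this form as an assumption.
    """
    return sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)

def kolmogorov_smirnov(p, q):
    """Largest gap between the two cumulative distributions (bins must be ordered)."""
    return max(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))

# Hypothetical normalized histograms over the same three ordered bins.
hist_a = [0.5, 0.4, 0.1]
hist_b = [0.3, 0.3, 0.4]

print("Hellinger:         ", hellinger(hist_a, hist_b))
print("Chi-square:        ", chi_square(hist_a, hist_b))
print("Kolmogorov-Smirnov:", kolmogorov_smirnov(hist_a, hist_b))
```

Hellinger and this chi-square form compare bins one at a time, while Kolmogorov-Smirnov depends on the bin order, so it only makes sense when the outcomes are ordered.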
posted by grouse at 5:18 AM on November 5


Thanks for the initial responses.

I guess what I'm wondering then is why isn't rmse used to calculate divergence between distributions? What are the advantages of a distance metric giving a real number?
posted by lpctstr; at 4:31 PM on November 5


RMSE is used to measure the error of a model relative to a dataset. It is not meant to measure a distance in a function space. In particular, you usually try to minimize the MSE during model fitting, which isn't really the same thing as comparing two probability density functions. KL-divergence is useful for quantifying the amount of information gained by a Bayesian update ("as more data becomes available, how much does this data affect the resulting distribution?"), but it is not a metric, since it is not symmetric. Other metrics reflect different properties. The inner product on some function spaces (the space of square-integrable functions, for instance) is very useful in defining conditional expectation in probability.
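
One way to see the difference in practice is to apply both measures to the same probability vectors. The sketch below uses made-up numbers and hypothetical helper names (`rmse`, `kl_divergence`): two candidate distributions sit at the same RMSE from a reference, yet their KL-divergences from it differ several-fold, because KL weighs errors relative to the probability of each outcome.

```python
import math

def rmse(p, q):
    """Root-mean-square of the bin-by-bin differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)) / len(p))

def kl_divergence(p, q):
    """D_KL(p || q) in nats; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

# Hypothetical reference distribution with one rare outcome.
p = [0.50, 0.49, 0.01]
# q1 shifts 0.05 of mass between the two common outcomes.
q1 = [0.45, 0.54, 0.01]
# q2 shifts the same 0.05 onto the rare outcome.
q2 = [0.50, 0.44, 0.06]

print("RMSE:", rmse(p, q1), rmse(p, q2))                    # essentially equal
print("KL:  ", kl_divergence(p, q1), kl_divergence(p, q2))  # quite different
```

RMSE treats a 0.05 shift the same wherever it lands; KL notices that the rare outcome became six times as likely.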

Really, it all depends what you're trying to get out of a notion of distance between two distributions, what the distributions are, etc.
posted by devilsbrigade at 8:19 PM on November 5


To echo the above, most of the time I'm interested in showing that two distributions are the same (or are so in some limit), and I use whichever divergence measure I can control in that setting.
posted by a robot made out of meat at 9:01 AM on November 9

