# How to use kl-divergence for distributions with some non-overlapping elements?

January 11, 2010 4:10 PM

How do I find the divergence between 2 distributions if there are elements in distributionP that are not in distributionQ and vice versa? I would typically use kl-distance but if I use that directly and disregard the divide-by-zero, I can get negative values (which should be impossible).

For example, if distributionP is {'t', 'e', 's', 't', '1'} and distributionQ is {'t', 'e', 's', 't'}, then calculating the kl-divergence and ignoring the '1' will result in a negative value, according to my calculations.

I hope this makes sense.
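The negative values come from dropping the non-overlapping symbols: once you ignore '1', the remaining P terms no longer sum to 1, so the sum p·log(p/q) is no longer a true KL divergence and can go negative. One common workaround is additive (Laplace) smoothing over the union of both alphabets, so neither distribution ever assigns zero probability. A minimal sketch (function and parameter names are my own, not from the thread):

```python
from collections import Counter
from math import log

def kl_divergence(p_word, q_word, alpha=1.0):
    """KL(P || Q) between the character distributions of two words,
    with additive (Laplace) smoothing so Q(x) = 0 never occurs."""
    p_counts = Counter(p_word)
    q_counts = Counter(q_word)
    alphabet = set(p_counts) | set(q_counts)
    # Smooth over the union alphabet so both sides share support.
    p_total = sum(p_counts.values()) + alpha * len(alphabet)
    q_total = sum(q_counts.values()) + alpha * len(alphabet)
    kl = 0.0
    for x in alphabet:
        p = (p_counts[x] + alpha) / p_total
        q = (q_counts[x] + alpha) / q_total
        kl += p * log(p / q)
    return kl

print(kl_divergence("test1", "test"))  # positive, as KL must be
```

Because both smoothed distributions are proper (they sum to 1 over the same support), the result is guaranteed non-negative. The choice of alpha is a free parameter; smaller values stay closer to the raw counts.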


I'm trying to capture character overlap between words, but edit distance is not exactly what I want. I changed my metric to TV(P, Q). Thanks so much!

posted by tasty at 4:56 PM on January 11, 2010
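The total variation metric tasty switched to handles non-overlapping symbols with no special cases, since |P(x) - Q(x)| is defined even when one side is zero. A sketch of how it might look over character counts (my own naming, not tasty's actual code):

```python
from collections import Counter

def tv_distance(p_word, q_word):
    """Total variation distance between the character distributions
    of two words: (1/2) * sum_x |P(x) - Q(x)|.  Ranges from 0
    (identical distributions) to 1 (disjoint alphabets)."""
    p, q = Counter(p_word), Counter(q_word)
    p_total, q_total = sum(p.values()), sum(q.values())
    return 0.5 * sum(
        abs(p[x] / p_total - q[x] / q_total)
        for x in set(p) | set(q)  # union alphabet; Counter gives 0 for misses
    )

print(tv_distance("test1", "test"))  # 0.2
```

For the example in the question, P = {t: 2/5, e: 1/5, s: 1/5, 1: 1/5} and Q = {t: 1/2, e: 1/4, s: 1/4}, giving (1/2)(0.1 + 0.05 + 0.05 + 0.2) = 0.2.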


You are trying to capture the overlap between words treated as bags of symbols rather than strings? That is, the order within the word is not important? How about the bag distance metric?

Without knowing anything else about your problem, treating words as discrete probability distributions in the way you are doing seems bizarre.

posted by grouse at 5:15 PM on January 11, 2010




That said, what are you actually trying to do? You say 'the divergence,' but there are many different divergences, such as total variation, TV(P, Q) = (1/2) * sum_x |P(x) - Q(x)|, or Hellinger. There are also plenty of other functions you could write down. However, without knowing what properties you want from your divergence, it's difficult to say which is the appropriate one.

posted by bsdfish at 4:32 PM on January 11, 2010
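The Hellinger distance bsdfish mentions is another option that is well defined when supports differ, and it is bounded in [0, 1]. A minimal sketch (helper and function names are my own, not from the thread):

```python
from collections import Counter
from math import sqrt

def char_dist(word):
    # Hypothetical helper: normalized character counts as a dict.
    counts = Counter(word)
    total = sum(counts.values())
    return {x: counts[x] / total for x in counts}

def hellinger(p_word, q_word):
    """Hellinger distance between character distributions:
    sqrt((1/2) * sum_x (sqrt(P(x)) - sqrt(Q(x)))^2).
    0 for identical distributions, 1 for disjoint alphabets."""
    p, q = char_dist(p_word), char_dist(q_word)
    return sqrt(0.5 * sum(
        (sqrt(p.get(x, 0.0)) - sqrt(q.get(x, 0.0))) ** 2
        for x in set(p) | set(q)
    ))

print(hellinger("test1", "test"))
```

Unlike KL, Hellinger is symmetric and needs no smoothing, which makes it a reasonable drop-in alongside total variation for comparing words as bags of characters.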