Let's say you have a discrete probability distribution 'p' representing the true distribution of outcomes when rolling a fair six-sided die:
p(1) = 1/6
p(2) = 1/6
p(3) = 1/6
p(4) = 1/6
p(5) = 1/6
p(6) = 1/6
Now, let's consider a reference distribution 'q' that represents a biased die with the following probabilities:
q(1) = 1/2
q(2) = 1/6
q(3) = 1/12
q(4) = 1/12
q(5) = 1/12
q(6) = 1/12
To calculate the KL divergence of 'p' from 'q', apply the formula KL(p||q) = sum over i of p(i) * log(p(i) / q(i)), here using log base 2 so the result is in bits:
KL(p||q) = (1/6) * log2((1/6) / (1/2)) + (1/6) * log2((1/6) / (1/6)) + (1/6) * log2((1/6) / (1/12)) + (1/6) * log2((1/6) / (1/12)) + (1/6) * log2((1/6) / (1/12)) + (1/6) * log2((1/6) / (1/12))
         = (1/6) * log2(1/3) + (1/6) * log2(1) + 4 * (1/6) * log2(2)
         ≈ -0.2642 + 0 + 0.6667
KL(p||q) ≈ 0.403 bits
=> This result tells you that using the biased distribution 'q' to approximate the fair die distribution 'p' costs approximately 0.403 bits of information per roll of the die: the expected number of extra bits needed to encode outcomes drawn from 'p' using a code optimized for 'q'.
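To make the calculation concrete, here is a minimal Python sketch that computes the same quantity using only the standard library. The function name `kl_divergence` is chosen for illustration; it is not from any particular library.

```python
import math

# True distribution p: a fair six-sided die.
p = [1/6] * 6

# Reference distribution q: the biased die from the example above.
q = [1/2, 1/6, 1/12, 1/12, 1/12, 1/12]

def kl_divergence(p, q):
    """KL(p||q) in bits; assumes q(i) > 0 wherever p(i) > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

print(f"KL(p||q) = {kl_divergence(p, q):.4f} bits")  # prints ~0.4025
```

Note that the terms where p(i) = 0 are skipped, following the convention 0 * log(0/q) = 0; switching math.log2 to math.log would give the same divergence in nats (≈ 0.279) instead of bits.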