Mutual information measures the statistical dependence between two random variables that are sampled simultaneously. In other words, it quantifies how much information, on average, observing one variable conveys about the other.
The mutual information is 0 if and only if the two random variables are statistically independent.
Mutual information can also be defined as “the reduction in uncertainty about a random variable given knowledge of another” [3]. A high value of mutual information corresponds to a large reduction in uncertainty [3].
The mutual information of two random variables X and Y whose joint distribution is P_{XY}(x, y) is

I(X; Y) = \sum_{x} \sum_{y} P_{XY}(x, y) \log \frac{P_{XY}(x, y)}{P_X(x)\, P_Y(y)},
where P_X(x) and P_Y(y) are the marginal distributions. Note that I(X; Y) = H(X) - H(X|Y) = H(Y) - H(Y|X), which makes the “reduction in uncertainty” interpretation explicit.
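As an illustration, here is a minimal Python sketch that evaluates this sum for a small joint probability table; the table values and the choice of base-2 logarithms (bits) are assumptions made for the example, not taken from the text above.

import numpy as np

# Hypothetical joint distribution P_XY(x, y) for two binary variables;
# rows index x, columns index y, and the entries sum to 1.
P_XY = np.array([[0.30, 0.20],
                 [0.10, 0.40]])

def mutual_information(p_xy):
    """Compute I(X; Y) in bits from a joint probability table."""
    p_x = p_xy.sum(axis=1, keepdims=True)  # marginal P_X(x), shape (|X|, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)  # marginal P_Y(y), shape (1, |Y|)
    mask = p_xy > 0                        # skip zero-probability cells
    return float(np.sum(p_xy[mask] * np.log2((p_xy / (p_x * p_y))[mask])))

print(mutual_information(P_XY))  # dependent table: positive, about 0.12 bits

# An independent joint distribution (an outer product of its marginals)
# gives 0, matching the statement above.
print(mutual_information(np.outer([0.5, 0.5], [0.6, 0.4])))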
[1] Erik G. Learned-Miller, Entropy and Mutual Information, University of Massachusetts, Amherst - Department of Computer Science, 2013.
[2] Jens Christian Claussen, The Infomax principle: Maximization of Mutual Information, University of Lübeck, 2012.
[3] Peter E. Latham and Yasser Roudi, Mutual information, Scholarpedia, 2009.
[4] Frank Keller, Formal Modeling in Cognitive Science, University of Edinburgh - School of Informatics, 2006.
[5] Iftach Haitner, Joint & Conditional Entropy, Mutual Information, Tel Aviv University, 2014.
[6] What does maximizing mutual information do?, StackExchange.