Now, let's consider what happens when we are given one of the two outcomes of each event (a,b) from the joint distribution p(a,b) consistently, and we want to deduce the other. To be specific, let the events (a,b) be generated N times, and let the value of B be seen each time. What is the asymptotic average log number of ways to see A given that B is seen each time? Well, let $N_b$ be the number of times that $b$ occurs, and for these occurrences of $b$, count the occurrences of $a$, $N_{ab}$. Define the vectors $\vec{N}_b = (N_{ab})_a$; then for a fixed value $b$ of B there are
\[
\frac{N_b!}{\prod_a N_{ab}!}
\]
ways that the A values could be distributed for this $b$. Taking the product of these numbers of ways gives us the number of ways that the A values could be distributed given the B values. This is
\[
\prod_b \frac{N_b!}{\prod_a N_{ab}!}.
\]
Taking the logarithm and doing the asymptotics gives us
\[
\sum_b N_b \log N_b - \sum_{a,b} N_{ab} \log N_{ab} = -\sum_{a,b} N_{ab} \log \frac{N_{ab}}{N_b},
\]
where we have $N_{ab} \to N\,p(a,b)$ and $N_b \to N\,p(b)$. Averaging by dividing by N gives us the entropy of A given B, or
\[
S(A|B) = -\sum_{a,b} p(a,b) \log \frac{p(a,b)}{p(b)} = -\sum_{a,b} p(a,b) \log p(a|b).
\]
Note that S(A|B) = S(A,B) - S(B). Similarly, S(B|A) = S(A,B) - S(A). Note that we could have found the log number of ways that A could occur given a particular $b$ as $N_b\,S(A|B{=}b)$, where $S(A|B{=}b) = -\sum_a p(a|b)\log p(a|b)$, and then noted that asymptotically $N_b/N \to p(b)$ to average this and find the result above.
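As a concrete check of this derivation, here is a minimal numerical sketch, assuming a small hypothetical joint distribution p(a,b) and natural logarithms (all the specific numbers are illustrative only): it computes S(A|B) directly from the definition, verifies the identity S(A|B) = S(A,B) - S(B), and shows the finite-N count of arrangements per symbol approaching S(A|B) as N grows.

```python
import numpy as np
from math import lgamma

# Hypothetical joint distribution p(a,b); rows index a, columns index b.
p_ab = np.array([[0.30, 0.10],
                 [0.05, 0.25],
                 [0.05, 0.25]])

def entropy(p):
    """Shannon entropy -sum p log p (natural log), ignoring zero entries."""
    p = np.asarray(p).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log(p))

p_b = p_ab.sum(axis=0)                      # marginal p(b)
S_B, S_AB = entropy(p_b), entropy(p_ab)

# Conditional entropy directly from S(A|B) = -sum_{a,b} p(a,b) log p(a|b).
S_A_given_B = -np.sum(p_ab * np.log(p_ab / p_b))

# The identity S(A|B) = S(A,B) - S(B) derived above.
assert np.isclose(S_A_given_B, S_AB - S_B)

def log_multinomial(counts):
    """log( (sum counts)! / prod(counts!) ), computed via log-gamma."""
    return lgamma(sum(counts) + 1) - sum(lgamma(c + 1) for c in counts)

# Finite-N counting: with N_ab = N p(a,b), the log number of ways to arrange
# the A values given the B values, divided by N, approaches S(A|B).
for N in (10**2, 10**4, 10**6):
    N_ab = p_ab * N
    log_ways = sum(log_multinomial(N_ab[:, j]) for j in range(N_ab.shape[1]))
    print(N, log_ways / N, S_A_given_B)
```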
After working through these examples, the interpretation of entropy as an uncertainty - an additive quantity representing the state of ignorance of the outcome - is straightforward. For example, if A is determined by B, then there is no uncertainty in A given B, immediately S(A|B) = 0; further, there is no more uncertainty in the joint distribution than there is in the distribution of B, i.e. S(A,B) = S(B). Finally, note that the quantity
\[
M(A,B) = S(A) - S(A|B)
\]
gives the uncertainty change between not knowing B and knowing B, and is called the mutual information. It is symmetric in its arguments, and can be written as
\[
M(A,B) = S(A) + S(B) - S(A,B).
\]
For two random variables, the mutual information can be read as the information about one variable that is contained in the other, and vice-versa: it is the information that each random variable shares with the other. In section 3.12, higher order information functions of this nature, the information correlation functions, are defined; these can be interpreted as the information shared among a set of random variables.
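To make the symmetry concrete, the following sketch (reusing the same hypothetical joint distribution as in the earlier example) computes the mutual information three ways, as S(A) - S(A|B), as S(B) - S(B|A), and as S(A) + S(B) - S(A,B), and checks that they agree.

```python
import numpy as np

def entropy(p):
    """Shannon entropy -sum p log p (natural log), ignoring zero entries."""
    p = np.asarray(p).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Hypothetical joint distribution p(a,b); rows index a, columns index b.
p_ab = np.array([[0.30, 0.10],
                 [0.05, 0.25],
                 [0.05, 0.25]])
S_A = entropy(p_ab.sum(axis=1))     # marginal entropy S(A)
S_B = entropy(p_ab.sum(axis=0))     # marginal entropy S(B)
S_AB = entropy(p_ab)                # joint entropy S(A,B)

M_from_A = S_A - (S_AB - S_B)       # S(A) - S(A|B)
M_from_B = S_B - (S_AB - S_A)       # S(B) - S(B|A)
M_joint = S_A + S_B - S_AB          # S(A) + S(B) - S(A,B)

# All three expressions for the mutual information coincide.
assert np.isclose(M_from_A, M_from_B) and np.isclose(M_from_A, M_joint)
print(M_from_A, M_from_B, M_joint)
```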
There are several other information functions that are of interest. We may define the redundancy of one random variable in another as the mutual information of the two. We might also define the normalized redundancy of two random variables as the mutual information divided by the joint entropy, M(A,B)/S(A,B). This quantity has value zero only for independent processes, and has value one only when each process completely determines the other. For two or more random variables the redundancy has been defined as the sum of the single entropies minus the joint entropy,
\[
\sum_i S(A_i) - S(A_1,\ldots,A_n)
\]
[66, 93]. This redundancy is distinctly different from that of the information correlation functions to be defined in section 3.12. When there are only two processes this redundancy is the mutual information. A measure of correlation has been defined as 1 - S(B|A)/S(A). Note that this is asymmetric in the processes. It is 0 when the entropy of B given A is equal to the entropy of A, which for identically distributed variables occurs only when they are independent. A symmetric function with similar properties is 2(1 - S(A,B)/(S(A)+S(B))) = 2M(A,B)/(S(A)+S(B)).
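The sketch below evaluates these measures under the same illustrative assumptions as before (a hypothetical two-variable distribution, plus a randomly drawn three-variable distribution for the multivariate redundancy): the normalized redundancy M(A,B)/S(A,B), the asymmetric measure 1 - S(B|A)/S(A), the symmetric variant 2M(A,B)/(S(A)+S(B)), and the sum of single entropies minus the joint entropy.

```python
import numpy as np

def entropy(p):
    """Shannon entropy -sum p log p (natural log), ignoring zero entries."""
    p = np.asarray(p).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Hypothetical joint distribution p(a,b); rows index a, columns index b.
p_ab = np.array([[0.30, 0.10],
                 [0.05, 0.25],
                 [0.05, 0.25]])
S_A = entropy(p_ab.sum(axis=1))
S_B = entropy(p_ab.sum(axis=0))
S_AB = entropy(p_ab)
M = S_A + S_B - S_AB                            # mutual information

normalized_redundancy = M / S_AB                # M(A,B)/S(A,B)
corr_B_given_A = 1 - (S_AB - S_A) / S_A         # 1 - S(B|A)/S(A), asymmetric
corr_A_given_B = 1 - (S_AB - S_B) / S_B         # swapping the roles of A and B
symmetric_corr = 2 * (1 - S_AB / (S_A + S_B))   # = 2 M(A,B)/(S(A)+S(B))
assert np.isclose(symmetric_corr, 2 * M / (S_A + S_B))

# Multivariate redundancy: sum of single entropies minus the joint entropy,
# shown here for a randomly drawn three-variable distribution p(a,b,c).
rng = np.random.default_rng(0)
p_abc = rng.dirichlet(np.ones(8)).reshape(2, 2, 2)
single_sum = sum(entropy(p_abc.sum(axis=tuple(j for j in range(3) if j != i)))
                 for i in range(3))
redundancy_3 = single_sum - entropy(p_abc)

print(normalized_redundancy, corr_B_given_A, corr_A_given_B,
      symmetric_corr, redundancy_3)
```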