Now consider what happens when, for each event $(a,b)$ drawn from the joint distribution $P(A,B)$, we are consistently given one of the two outcomes and want to deduce the other. To be specific, let events from $P(A,B)$ be generated $N$ times, and let the value of $B$ be observed each time. What is the asymptotic average log number of ways to see $A$, given that $B$ is seen each time? Let $N_b$ be the number of times that $B=b$ occurs, and, among these occurrences of $B=b$, let $N_{a|b}$ count the occurrences of $A=a$.
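Define the vectors $\mathbf{N}_{A|b} = (N_{a_1|b}, N_{a_2|b}, \ldots)$ and $\mathbf{N}_B = (N_{b_1}, N_{b_2}, \ldots)$; then for a fixed value $b$ of $B$ there are
$$\binom{N_b}{\mathbf{N}_{A|b}} = \frac{N_b!}{\prod_a N_{a|b}!}$$
ways that the $A$ values could be distributed for this $b$.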
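As a small sanity check of this counting step, the sketch below compares the multinomial count $N_b!/\prod_a N_{a|b}!$ with a brute-force enumeration of the distinct sequences of $A$ values for one fixed $b$. The function names and the counts are hypothetical illustrations, not part of the original text.

```python
import math
from itertools import permutations


def multinomial(counts):
    """Return N_b! / prod_a N_{a|b}! for the per-value counts N_{a|b} of one fixed b."""
    total = math.factorial(sum(counts))
    for c in counts:
        total //= math.factorial(c)  # exact: every partial quotient is an integer
    return total


def brute_force_arrangements(counts):
    """Count the distinct sequences of A values having the given counts, by enumeration."""
    # e.g. counts [2, 1, 1] -> base sequence [0, 0, 1, 2]
    seq = [a for a, c in enumerate(counts) for _ in range(c)]
    return len(set(permutations(seq)))


# Hypothetical N_{a|b} for one fixed b: three A values seen 2, 1 and 1 times (N_b = 4).
counts_for_b = [2, 1, 1]
print(multinomial(counts_for_b), brute_force_arrangements(counts_for_b))  # 12 12
```

Enumeration is only feasible for tiny $N_b$; it is used here purely to confirm the closed form.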
Taking the product of these numbers of ways over all values $b$ gives the number of ways that the $A$ values could be distributed given the $B$ values:
$$W(A|B) = \prod_b \frac{N_b!}{\prod_a N_{a|b}!}.$$
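Taking the logarithm and applying Stirling's approximation $\log n! \approx n\log n - n$ (the linear terms cancel because $\sum_a N_{a|b} = N_b$) gives
$$\log W(A|B) \approx -\sum_{a,b} N_{a|b}\log\frac{N_{a|b}}{N_b} = -N\sum_{a,b} P(a,b)\log P(a|b),$$
where we have used $P(a,b) = N_{a|b}/N$ and $P(a|b) = N_{a|b}/N_b$ asymptotically.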
Averaging by dividing by $N$ gives the entropy of $A$ given $B$,
$$S(A|B) = -\sum_{a,b} P(a,b)\log P(a|b).$$
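The convergence of $\log W(A|B)/N$ to $S(A|B)$ can be checked numerically. This sketch assumes natural logarithms and a small hypothetical table of counts $N_{a|b}$ that is scaled up; it computes the exact $\log W(A|B)$ with `math.lgamma` (using $\log n! =$ `lgamma(n + 1)`) and reports the gap to $S(A|B)$ computed from the empirical frequencies. The function names are illustrative only.

```python
import math


def log_ways(counts):
    """Exact natural-log W(A|B) = sum_b [log N_b! - sum_a log N_{a|b}!].

    counts[b][a] holds N_{a|b}; math.lgamma(n + 1) equals log(n!).
    """
    total = 0.0
    for row in counts:
        total += math.lgamma(sum(row) + 1) - sum(math.lgamma(c + 1) for c in row)
    return total


def conditional_entropy(counts):
    """S(A|B) = -sum P(a,b) log P(a|b), with P(a,b) = N_{a|b}/N and P(a|b) = N_{a|b}/N_b."""
    n = sum(map(sum, counts))
    s = 0.0
    for row in counts:
        n_b = sum(row)
        for c in row:
            if c > 0:
                s -= (c / n) * math.log(c / n_b)
    return s


def gap(counts):
    """S(A|B) minus the per-event log number of ways, log W(A|B) / N."""
    n = sum(map(sum, counts))
    return conditional_entropy(counts) - log_ways(counts) / n


# Hypothetical base table of N_{a|b}: rows are values of B, columns values of A.
base = [[3, 1], [1, 1], [0, 2]]
for scale in (1, 10, 100, 1000):
    scaled = [[scale * c for c in row] for row in base]
    print(scale, gap(scaled))
```

Scaling keeps the frequencies, and hence $S(A|B)$, fixed while $N$ grows. The gap shrinks roughly like $(\log N)/N$, reflecting the Stirling terms dropped in the derivation, and it stays non-negative because a multinomial coefficient never exceeds $e^{N_b S(A|b)}$.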
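Note that, since $\log P(a|b) = \log P(a,b) - \log P(b)$, we have $S(A|B) = S(A,B) - S(B)$. Similarly, $S(B|A) = S(A,B) - S(A)$.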
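A quick numerical check of these identities, assuming natural logarithms and a hypothetical two-by-two joint distribution. The conditional entropies are computed straight from their definition, not from the identities they are checked against; helper names are illustrative.

```python
import math


def entropy(probs):
    """Shannon entropy -sum p log p in nats, skipping zero probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def marginal(P, index):
    """Marginal distribution of coordinate `index` of a joint distribution keyed by (a, b)."""
    out = {}
    for key, p in P.items():
        out[key[index]] = out.get(key[index], 0.0) + p
    return out


# Hypothetical joint distribution P(a, b).
P = {("x", 0): 0.3, ("y", 0): 0.1, ("x", 1): 0.2, ("y", 1): 0.4}
P_A, P_B = marginal(P, 0), marginal(P, 1)
S_AB = entropy(P.values())

# Conditional entropies from the definitions -sum P(a,b) log P(a|b) and -sum P(a,b) log P(b|a).
S_A_given_B = -sum(p * math.log(p / P_B[b]) for (a, b), p in P.items() if p > 0)
S_B_given_A = -sum(p * math.log(p / P_A[a]) for (a, b), p in P.items() if p > 0)

print(S_A_given_B, S_AB - entropy(P_B.values()))  # equal up to rounding
print(S_B_given_A, S_AB - entropy(P_A.values()))  # equal up to rounding
```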
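Note that we could also have found the log number of ways that $A$ could occur given a single value $B=b$ as
$$\log W(A|b) \approx -N_b\sum_a P(a|b)\log P(a|b) = N_b\,S(A|b),$$
and then noted that asymptotically $N_b/N \to P(b)$, so that averaging gives the result above, $S(A|B) = \sum_b P(b)\,S(A|b)$.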
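The same result, read as a $P(b)$-weighted average of the per-value entropies $S(A|b)$, can be checked with a short sketch. It uses a hypothetical joint table stored as `P[b][a]` and natural logarithms; `weighted` and `direct` should agree to floating-point precision.

```python
import math


def entropy(probs):
    """Shannon entropy -sum p log p in nats, skipping zero probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical joint distribution stored as P[b][a] = P(a, b).
P = {0: {"x": 0.3, "y": 0.1}, 1: {"x": 0.2, "y": 0.4}}
P_B = {b: sum(row.values()) for b, row in P.items()}

# S(A|b): entropy of the conditional distribution P(a|b) = P(a,b) / P(b) for each b.
S_A_given_b = {b: entropy([p / P_B[b] for p in row.values()]) for b, row in P.items()}

# Weight each S(A|b) by P(b), the asymptotic value of N_b / N.
weighted = sum(P_B[b] * S_A_given_b[b] for b in P)

# Direct definition: -sum_{a,b} P(a,b) log P(a|b).
direct = -sum(p * math.log(p / P_B[b]) for b, row in P.items() for p in row.values() if p > 0)

print(weighted, direct)  # equal up to rounding
```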
After working through these examples, the interpretation of entropy as an uncertainty, an additive quantity representing the state of ignorance of the outcome, is straightforward. For example, if $A$ is determined by $B$, then there is no uncertainty in $A$ given $B$, so immediately $S(A|B) = 0$; further, there is no more uncertainty in the joint distribution than there is in the distribution of $B$, i.e. $S(A,B) = S(B)$. Finally, note that the quantity
$$M(A,B) = S(A) - S(A|B)$$
gives the change in the uncertainty of $A$ between not knowing $B$ and knowing $B$, and is called the mutual information. It is symmetric in its arguments, and can be written as
$$M(A,B) = S(A) + S(B) - S(A,B) = \sum_{a,b} P(a,b)\log\frac{P(a,b)}{P(a)\,P(b)}.$$
For two random variables, the mutual information can naturally be labeled the information about one variable that is contained in the other, and vice versa: it is the information that each random variable shares with the other. In section 3.12, higher-order information functions of this kind, the information correlation functions, are defined; these can be interpreted as the information between a set of random variables.
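The sketch below checks, for a hypothetical joint distribution, that the three expressions for $M(A,B)$ agree and that swapping the arguments leaves the value unchanged. It also checks the determined case: with $A$ a function of $B$ (here, hypothetically, $A = B \bmod 2$ for $B$ uniform on four values), $S(A|B)=0$ and $S(A,B)=S(B)$. Natural logarithms are assumed and the helper names are illustrative only.

```python
import math


def entropy(probs):
    """Shannon entropy -sum p log p in nats, skipping zero probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def marginals(P):
    """Marginals P(a) and P(b) of a joint distribution keyed by (a, b)."""
    P_A, P_B = {}, {}
    for (a, b), p in P.items():
        P_A[a] = P_A.get(a, 0.0) + p
        P_B[b] = P_B.get(b, 0.0) + p
    return P_A, P_B


def mutual_information_forms(P):
    """M(A,B) three ways: S(A) - S(A|B), S(A) + S(B) - S(A,B), and the log-ratio sum."""
    P_A, P_B = marginals(P)
    S_A, S_B, S_AB = entropy(P_A.values()), entropy(P_B.values()), entropy(P.values())
    S_A_given_B = -sum(p * math.log(p / P_B[b]) for (a, b), p in P.items() if p > 0)
    log_ratio = sum(p * math.log(p / (P_A[a] * P_B[b])) for (a, b), p in P.items() if p > 0)
    return S_A - S_A_given_B, S_A + S_B - S_AB, log_ratio


# Hypothetical joint distribution, and the same one with the roles of A and B swapped.
P = {("x", 0): 0.3, ("y", 0): 0.1, ("x", 1): 0.2, ("y", 1): 0.4}
swapped = {(b, a): p for (a, b), p in P.items()}
print(mutual_information_forms(P), mutual_information_forms(swapped))

# A determined by B (hypothetical): B uniform on {0, 1, 2, 3} and A = B mod 2.
determined = {(b % 2, b): 0.25 for b in range(4)}
_, det_P_B = marginals(determined)
det_S_A_given_B = -sum(p * math.log(p / det_P_B[b]) for (a, b), p in determined.items())
# Expect S(A|B) == 0 and S(A,B) == S(B), up to rounding.
print(det_S_A_given_B, entropy(determined.values()) - entropy(det_P_B.values()))
```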
There are several other information functions of interest. We may define the redundancy of one random variable in another as the mutual information of the two. We might also define the normalized redundancy of two random variables as the mutual information divided by the joint information (entropy), $M(A,B)/S(A,B)$. This quantity is zero only for independent processes, and, since $S(A,B) = M(A,B) + S(A|B) + S(B|A)$, it equals one exactly when each process completely determines the other. For two or more random variables, the redundancy has been defined as the sum of the single entropies minus the joint entropy,
$$R(A_1,\ldots,A_n) = \sum_{i=1}^{n} S(A_i) - S(A_1,\ldots,A_n)$$
[66, 93].
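A sketch of the redundancy $R$ and the normalized redundancy, assuming natural logarithms and joint distributions stored as dictionaries keyed by tuples of outcomes (a hypothetical representation). For two variables `redundancy` is exactly $M(A,B)$. The examples are illustrative: the normalized redundancy is 1 for a one-to-one pair but only 0.5 when $B$ determines $A$ without the converse, and the three-bit parity example has $R = \log 2$ even though any two of its variables are independent.

```python
import math
from itertools import product


def entropy(probs):
    """Shannon entropy -sum p log p in nats, skipping zero probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def marginal(P, i):
    """Marginal of coordinate i of a joint distribution keyed by tuples."""
    out = {}
    for key, p in P.items():
        out[key[i]] = out.get(key[i], 0.0) + p
    return out


def redundancy(P):
    """R = sum_i S(A_i) - S(A_1, ..., A_n) for a joint distribution keyed by n-tuples."""
    n = len(next(iter(P)))
    return sum(entropy(marginal(P, i).values()) for i in range(n)) - entropy(P.values())


def normalized_redundancy(P):
    """M(A,B) / S(A,B) for a two-variable joint distribution."""
    return redundancy(P) / entropy(P.values())


# Hypothetical examples.
independent = {(a, b): 0.25 for a, b in product(range(2), range(2))}
one_to_one = {(0, "p"): 0.5, (1, "q"): 0.5}      # each variable determines the other
one_way = {(b % 2, b): 0.25 for b in range(4)}   # B determines A, but not conversely
xor3 = {(a, b, a ^ b): 0.25 for a, b in product(range(2), range(2))}  # third bit = parity

for name, dist in [("independent", independent), ("one_to_one", one_to_one), ("one_way", one_way)]:
    print(name, normalized_redundancy(dist))
print("xor3 R:", redundancy(xor3))
```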
This redundancy is distinctly different from the information correlation functions to be defined in section 3.12; when there are only two processes, it reduces to the mutual information. A measure of correlation has been defined as $1 - S(B|A)/S(A)$ [16]. Note that this measure is asymmetric in the processes. It is 0 when the entropy of $B$ given $A$ equals the entropy of $A$, which for identically distributed variables occurs only when they are independent.
A symmetric function with similar properties is
$$2\left(1 - \frac{S(A,B)}{S(A)+S(B)}\right) = \frac{2M(A,B)}{S(A)+S(B)}.$$
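Both correlation measures can be computed from the three entropies $S(A)$, $S(B)$ and $S(A,B)$. This sketch uses hypothetical example distributions; the ratios do not depend on the logarithm base, and natural logarithms are used. It checks that the symmetric form equals $2M(A,B)/(S(A)+S(B))$ and is unchanged when the variables are swapped, while the asymmetric measure generally changes.

```python
import math


def entropy(probs):
    """Shannon entropy -sum p log p in nats, skipping zero probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropies(P):
    """S(A), S(B), S(A,B) for a joint distribution keyed by (a, b)."""
    P_A, P_B = {}, {}
    for (a, b), p in P.items():
        P_A[a] = P_A.get(a, 0.0) + p
        P_B[b] = P_B.get(b, 0.0) + p
    return entropy(P_A.values()), entropy(P_B.values()), entropy(P.values())


def asymmetric_correlation(P):
    """1 - S(B|A)/S(A), with S(B|A) = S(A,B) - S(A)."""
    S_A, S_B, S_AB = entropies(P)
    return 1 - (S_AB - S_A) / S_A


def symmetric_correlation(P):
    """2(1 - S(A,B)/(S(A)+S(B))), which equals 2 M(A,B)/(S(A)+S(B))."""
    S_A, S_B, S_AB = entropies(P)
    return 2 * (1 - S_AB / (S_A + S_B))


# Hypothetical examples.
P = {("x", 0): 0.3, ("y", 0): 0.1, ("x", 1): 0.2, ("y", 1): 0.4}
swapped = {(b, a): p for (a, b), p in P.items()}
independent = {(a, b): 0.25 for a in "xy" for b in (0, 1)}
identical = {(0, 0): 0.5, (1, 1): 0.5}  # A and B always equal

print(asymmetric_correlation(P), asymmetric_correlation(swapped))
print(symmetric_correlation(P), symmetric_correlation(swapped))
```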