
Inference increases information about the true distribution on average

Let the data be given one sample point at a time. The sample data are thus ordered and will be indicated by $x_1, x_2, x_3, \ldots$. After each new sample is seen, Bayes' theorem is used to infer the unknown parameters, and the inference takes the form of a probability distribution over those parameters. The information in this inferred parameter distribution will be computed. Surprisingly, this information does not necessarily increase when new data are seen. However, when the information is averaged over possible data samples, the average information about the unknown parameters does increase when new data are seen. Thus, on average, more is learned about the unknown parameters when additional data are presented, even though in a particular case additional data can lead to more confusion.

The measure of the uncertainty of (and confusion about) the unknown parameters is the entropy of their distribution. Working in the probability density framework, the entropy of the distribution of the parameters $\theta$ after seeing the $n$ data samples $D_n = (x_1, \ldots, x_n)$ is

$$H_n(D_n) = -\int d\theta\; p(\theta \mid D_n)\,\log p(\theta \mid D_n).$$
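The sequential recipe (update by Bayes' theorem after each sample, then take the entropy of the resulting parameter distribution) can be sketched numerically. This is a minimal sketch, not the author's method. It assumes a one-dimensional parameter discretized on a uniform grid, a uniform prior over that grid, and, purely for illustration, a gaussian likelihood of known width 1. The function names `entropy_bits`, `posterior_entropies` and `gauss_log_lik` are hypothetical. The entropy integral is approximated by a Riemann sum.

```python
import numpy as np


def entropy_bits(log_unnorm, dtheta):
    """Differential entropy (bits) of a density known up to a constant on a uniform grid."""
    p = np.exp(log_unnorm - log_unnorm.max())  # shift by the max to avoid overflow
    p /= p.sum() * dtheta                       # normalize: sum(p) * dtheta == 1
    nz = p > 0                                  # treat 0 * log 0 as 0
    return -np.sum(p[nz] * np.log2(p[nz])) * dtheta


def posterior_entropies(data, theta, log_lik):
    """Entropy of p(theta | x_1..x_k) for k = 1..len(data), uniform prior on the grid."""
    dtheta = theta[1] - theta[0]
    log_post = np.zeros_like(theta)              # log of a uniform prior (constant dropped)
    entropies = []
    for x in data:
        log_post = log_post + log_lik(x, theta)  # Bayes: posterior ~ likelihood * previous posterior
        entropies.append(entropy_bits(log_post, dtheta))
    return entropies


def gauss_log_lik(x, theta, sigma=1.0):
    """Hypothetical likelihood: gaussian of known width sigma, up to a constant."""
    return -0.5 * ((x - theta) / sigma) ** 2


theta = np.arange(-20.0, 20.0, 0.01)             # grid comfortably wider than the data
data = np.random.default_rng(0).normal(0.3, 1.0, size=5)
H = posterior_entropies(data, theta, gauss_log_lik)
for n, h in enumerate(H, start=1):
    print(f"n = {n}: H_n = {h:.4f} bits")
```

For this gaussian choice the posterior after $n$ samples is a gaussian of width $1/\sqrt{n}$, so the printed values should match $\tfrac12\log_2(2\pi e/n)$ whatever data are drawn. That makes it a convenient check of the numerics before trying other likelihoods.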
The change in the uncertainty of the parameters upon seeing the $n$th data sample is then $\Delta H_n = H_n(D_n) - H_{n-1}(D_{n-1})$. At first it might be thought that this uncertainty should always decrease. However, this is not the case. Suppose, for example, that the location of a Cauchy (Lorentzian) distribution is to be inferred, while its width is known. Suppose further that $n=2$, and that $x_1$ and $x_2$ happen to lie very far apart from each other. This can happen by chance, especially for heavy-tailed data like these. In this case, $p(\theta \mid x_1)$ is going to be more sharply peaked than $p(\theta \mid x_1, x_2)$. This is because the one-sample ($n=1$) inference puts density only near $x_1$, while the two-sample ($n=2$) inference puts density near both of the data locations, which are far apart. In fact, taking the uniform distribution for the parameter prior, the one-sample inference is a single bump centered on $x_1$. If the two data samples are very far apart, the two-sample inference is approximately two identical bumps, each having half the height and the same width as the one-sample bump. The entropy of the two-sample inference is therefore approximately one bit greater than that of the one-sample inference. The example needs a heavy-tailed likelihood. For a gaussian of known width, the product of the two likelihoods is a single gaussian bump, narrower by a factor of $\sqrt{2}$, wherever $x_1$ and $x_2$ fall, so its entropy always drops by half a bit. Sometimes new data lead to increased confusion about the parameters.
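The following minimal sketch compares the one-sample and two-sample entropies numerically for two likelihoods of known unit width: a gaussian and a Cauchy (Lorentzian). It assumes a uniform prior on a finite grid. The grid limits, the data values $x_1 = 0$, $x_2 = 1000$ and all function names are hypothetical choices for illustration. With the gaussian, the two-sample posterior is a single bump narrower by $\sqrt{2}$, so the entropy falls by exactly half a bit. With the heavy-tailed Cauchy, the posterior splits into two nearly identical bumps and the entropy rises by close to one bit.

```python
import numpy as np


def entropy_bits(log_unnorm, dtheta):
    """Differential entropy (bits) of a density known up to a constant on a uniform grid."""
    p = np.exp(log_unnorm - log_unnorm.max())  # shift by the max to avoid overflow/underflow
    p /= p.sum() * dtheta                       # normalize: sum(p) * dtheta == 1
    nz = p > 0                                  # treat 0 * log 0 as 0
    return -np.sum(p[nz] * np.log2(p[nz])) * dtheta


def cauchy_log_lik(x, theta):
    """Cauchy (Lorentzian) log-likelihood, known unit width, up to a constant."""
    return -np.log1p((x - theta) ** 2)


def gauss_log_lik(x, theta):
    """Gaussian log-likelihood, known unit width, up to a constant."""
    return -0.5 * (x - theta) ** 2


theta = np.arange(-5000.0, 6000.0, 0.05)  # wide grid: holds both bumps and most Cauchy tail mass
dtheta = theta[1] - theta[0]
x1, x2 = 0.0, 1000.0                       # two samples that happen to lie very far apart

change = {}
for name, log_lik in [("cauchy", cauchy_log_lik), ("gaussian", gauss_log_lik)]:
    h1 = entropy_bits(log_lik(x1, theta), dtheta)                       # uniform prior, one sample
    h2 = entropy_bits(log_lik(x1, theta) + log_lik(x2, theta), dtheta)  # uniform prior, both samples
    change[name] = h2 - h1
    print(f"{name:8s}: H_2 - H_1 = {change[name]:+.3f} bits")
```

The Cauchy change comes out slightly below one bit. This is because each bump's far tail is trimmed by the other sample's likelihood, and the finite grid also trims the one-sample tails. The gap closes as $x_2 - x_1$ grows.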

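Now consider what happens when the change in uncertainty (confusion) is averaged over data sets $D_n$. The average of interest is the average change in confusion when a new data sample is seen,

$$\langle \Delta H_n \rangle = \int dD_n\; p(D_n)\,\bigl[H_n(D_n) - H_{n-1}(D_{n-1})\bigr], \tag{9.3}$$

where $\int dD_n$ denotes $\int dx_1 \cdots \int dx_n$, and $p(D_n) = \int d\theta\, p(\theta)\, p(D_n \mid \theta)$ is the probability of the data set.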
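The next step is to show that this average change in the confusion about the parameters is negative (more precisely, never positive). To do this, note that the following holds for any function $f(\theta, D_{n-1})$ that depends on the data only through the first $n-1$ samples:

$$\begin{aligned}
\int dD_n\, p(D_n) \int d\theta\, p(\theta \mid D_n)\, f(\theta, D_{n-1})
&= \int dD_{n-1} \int d\theta\, f(\theta, D_{n-1}) \int dx_n\, p(\theta, D_{n-1}, x_n) \\
&= \int dD_{n-1} \int d\theta\, f(\theta, D_{n-1})\, p(\theta, D_{n-1}) \\
&= \int dD_{n-1}\, p(D_{n-1}) \int d\theta\, p(\theta \mid D_{n-1})\, f(\theta, D_{n-1}) \\
&= \int dD_n\, p(D_n) \int d\theta\, p(\theta \mid D_{n-1})\, f(\theta, D_{n-1}),
\end{aligned} \tag{9.4}$$

where the last step uses $\int dx_n\, p(D_{n-1}, x_n) = p(D_{n-1})$.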
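This identity can be checked exactly in a discrete analogue, where the integrals over $\theta$ and over each $x_i$ become sums. The sketch below makes that swap as an assumption, since the text works with densities. It also assumes i.i.d. samples given $\theta$, with a random prior, a random likelihood table and an arbitrary $f(\theta, D_{n-1})$. The sizes `K`, `M`, `n` and the helper `joint` are hypothetical. Every data set $D_n$ is enumerated, and both sides of the identity are accumulated.

```python
import itertools

import numpy as np

rng = np.random.default_rng(1)
K, M, n = 3, 4, 3                              # hypothetical: K parameter values, M outcomes, n samples

prior = rng.dirichlet(np.ones(K))              # p(theta)
lik = rng.dirichlet(np.ones(M), size=K)        # lik[k, m] = p(x = m | theta = k); rows sum to 1
f = rng.normal(size=(K,) + (M,) * (n - 1))     # arbitrary f(theta, D_{n-1}): one entry per (theta, x_1..x_{n-1})


def joint(data):
    """p(theta, D) as a length-K vector, for samples that are i.i.d. given theta."""
    return prior * np.prod(lik[:, list(data)], axis=1)


lhs = rhs = 0.0
for data in itertools.product(range(M), repeat=n):   # every data set D_n = (x_1, ..., x_n)
    p_theta_dn = joint(data)                           # p(theta, D_n)
    p_dn = p_theta_dn.sum()                            # p(D_n)
    p_theta_dprev = joint(data[:-1])                   # p(theta, D_{n-1})
    f_vals = f[(slice(None),) + data[:-1]]             # f(theta, D_{n-1}) as a length-K vector
    lhs += p_dn * np.dot(p_theta_dn / p_dn, f_vals)                       # weight p(D_n) p(theta | D_n)
    rhs += p_dn * np.dot(p_theta_dprev / p_theta_dprev.sum(), f_vals)     # weight p(D_n) p(theta | D_{n-1})

print(lhs, rhs)
```

The two sums agree to rounding error for any choice of `f`, because `f` does not depend on the last sample $x_n$. If `f` were allowed to depend on $x_n$, the two sides would in general differ.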
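Now, expand the average change in the confusion written in equation 9.3 as

$$\langle \Delta H_n \rangle = -\int dD_n\, p(D_n) \int d\theta\, p(\theta \mid D_n) \log p(\theta \mid D_n) + \int dD_n\, p(D_n) \int d\theta\, p(\theta \mid D_{n-1}) \log p(\theta \mid D_{n-1}),$$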
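and simplify the inner integral of the second term on the right side using the identity proven in equation 9.4, with $f(\theta, D_{n-1}) = \log p(\theta \mid D_{n-1})$, to find

$$\langle \Delta H_n \rangle = -\int dD_n\, p(D_n) \int d\theta\, p(\theta \mid D_n) \log p(\theta \mid D_n) + \int dD_n\, p(D_n) \int d\theta\, p(\theta \mid D_n) \log p(\theta \mid D_{n-1}).$$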
Collect the logarithms and note that the integral over $\theta$ is a Kullback-Leibler distance. This allows us to apply the information identity (for probability densities $p(x)$ and $q(x)$, $\int dx\, p(x) \log \frac{p(x)}{q(x)} \ge 0$). The result is that the average change in the confusion about the parameters is never positive:
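$$\langle \Delta H_n \rangle = -\int dD_n\, p(D_n) \int d\theta\, p(\theta \mid D_n) \log \frac{p(\theta \mid D_n)}{p(\theta \mid D_{n-1})} \;\le\; 0.$$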
Because information is negative uncertainty, a non-positive average change in confusion corresponds to a non-negative average change in information. Thus we have proven the following theorems:

Theorem: Information increases on the average. For particular data, the information about the parameters may decrease upon seeing a new data sample. On average, however, the information about the parameters increases (more precisely, never decreases) upon seeing a new data sample.

Theorem: Average information increase is the Kullback-Leibler distance. The average increase in the information about the parameters upon seeing a new sample equals the Kullback-Leibler distance from the parameter distribution before the new sample to the distribution after it, averaged over data sets.
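Both theorems can be checked exactly on a small discrete example. Here sums replace the integrals, which is an assumption since the text works with densities, though the same algebra goes through. The setup is hypothetical: a binary parameter with a uniform prior, binary samples that agree with $\theta$ with probability 0.8, and $n = 2$. For the data set $(x_1, x_2) = (1, 0)$, the first sample moves the posterior to $(0.2, 0.8)$ and the second moves it back to $(0.5, 0.5)$. The entropy therefore rises from about 0.722 bits to 1 bit. The $p(D_2)$-weighted average change is still negative, and it equals minus the average Kullback-Leibler distance.

```python
import itertools

import numpy as np

prior = np.array([0.5, 0.5])        # hypothetical prior p(theta), theta in {0, 1}
lik = np.array([[0.8, 0.2],         # p(x | theta = 0), x in {0, 1}
                [0.2, 0.8]])        # p(x | theta = 1)


def entropy_bits(p):
    """Shannon entropy in bits of a discrete distribution."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))


def posterior(data):
    """Return p(theta | D) and p(D) for samples D that are i.i.d. given theta."""
    joint = prior * np.prod(lik[:, list(data)], axis=1)  # p(theta, D)
    return joint / joint.sum(), joint.sum()


delta_h, avg_delta_h, avg_kl = {}, 0.0, 0.0
for data in itertools.product(range(2), repeat=2):       # every D_2 = (x_1, x_2)
    post_2, p_d2 = posterior(data)                        # p(theta | D_2), p(D_2)
    post_1, _ = posterior(data[:1])                       # p(theta | D_1), D_1 = (x_1,)
    delta_h[data] = entropy_bits(post_2) - entropy_bits(post_1)
    kl = np.sum(post_2 * np.log2(post_2 / post_1))       # KL distance, after vs. before x_2
    avg_delta_h += p_d2 * delta_h[data]
    avg_kl += p_d2 * kl

print("Delta H_2 by data set:", {d: round(float(v), 3) for d, v in delta_h.items()})
print(f"<Delta H_2> = {avg_delta_h:+.4f} bits, <KL> = {avg_kl:.4f} bits")
```

Here the average change is about $-0.18$ bits, even though two of the four data sets increase the entropy by about 0.28 bits.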


David Wolf

Tue Mar 25 08:11:49 CST 1997