
Estimating from finite data is ubiquitous

The problem of estimating a function of an unknown distribution from finite samples of that distribution is ubiquitous in physics, arising for example in dimension estimation and in estimating correlations from data.

In information dimension estimation [31, 63], for instance, we imagine a discretization of a space containing an attractor. The attractor defines a probability density function across the space, and therefore a probability distribution across the bins of the discretization. We are interested in how the Rényi entropy of the distribution across the bins changes as the discretization changes. This behavior gives us the information dimension of the attractor, which is useful in non-linear time-series analysis in connection with estimating the embedding dimension (see [21, 63]). To measure the information dimension accurately, we need accurate estimates of the Rényi entropy over as wide a range of granularities of the discretization as possible, and in particular when the discretization is quite fine. In such a regime, the number of counts per bin, i.e., the values $n_i$, will be quite small. Accordingly, we are unavoidably faced with the "small sample statistics problem" of how to meaningfully perform inference with small samples. This is precisely the regime in which Bayesian techniques excel.
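The small-count regime described above can be made concrete with a minimal sketch. Everything in it is an illustrative assumption rather than the method developed in this work: the data are 1000 draws from a uniform distribution on the unit interval (standing in for an attractor of dimension 1), the function names `bin_counts` and `renyi_entropy_plugin` are hypothetical, and the estimator is the naive plug-in (frequency-count) estimate, not a Bayesian one. For this toy distribution the true Rényi entropy with m equal bins is log m, so the gap between the two entropy columns shows how the plug-in estimate degrades as the mean count per bin shrinks.

```python
import math
import random
from collections import Counter


def bin_counts(samples, n_bins):
    """Count how many samples in [0, 1) fall into each of n_bins equal-width bins."""
    hits = Counter(min(int(x * n_bins), n_bins - 1) for x in samples)
    return [hits.get(i, 0) for i in range(n_bins)]


def renyi_entropy_plugin(counts, q):
    """Naive plug-in Renyi entropy of order q (in nats), using p_i = n_i / N."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    if q == 1:
        # The q -> 1 limit of the Renyi entropy is the Shannon entropy.
        return -sum(p * math.log(p) for p in probs)
    return math.log(sum(p ** q for p in probs)) / (1.0 - q)


random.seed(0)  # fixed seed so the run is repeatable
samples = [random.random() for _ in range(1000)]  # assumed toy data, not an attractor

rows = []
for n_bins in (4, 16, 64, 256, 1024, 4096):
    counts = bin_counts(samples, n_bins)
    estimate = renyi_entropy_plugin(counts, q=2)
    exact = math.log(n_bins)  # true entropy of the uniform distribution over n_bins bins
    rows.append((n_bins, len(samples) / n_bins, estimate, exact))
    print(f"bins={n_bins:5d}  mean count/bin={len(samples) / n_bins:8.2f}  "
          f"plug-in H_2={estimate:.3f}  exact H_2={exact:.3f}")
```

With coarse bins the plug-in estimate tracks log m closely. Once there are more bins than samples it cannot exceed log N, where N is the sample size, so the estimated entropy flattens out and the slope against log m (the dimension estimate) is biased downward. This is the failure that motivates a more careful treatment of small counts.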



David Wolf
Tue Mar 25 08:11:49 CST 1997