Information is ubiquitous. Information is carried in physical systems by physical objects, and these objects are consumed by the users of that information. Each and every living system makes use of information to ensure its survival and to continue its existence. From finding food to reproducing to finding destructive infiltrating agents (the immune system), information is presented and acted upon. More speculatively, making use of information is perhaps fundamentally what separates life from non-life. Intrinsic to the use of information is the need to recognize the message being sent, by reducing the information in the physical object presented to the relevant message. This is a very general statement. To be more specific, in what ways can a reduced description of a system suffice to bring the relevant information to the consumer of that information?
In this dissertation the question of how information is carried in a physical system is examined. The systems studied here are simple, as are all systems which have a presentable analysis. Throughout, the point of view is taken that the states of the physical system of interest may be treated probabilistically: there is an underlying distribution which describes the probability that a particular state occurs. Certainly the thermodynamic systems studied here are treated on this basis, but more generally whenever such a distribution exists and is known, or is learnable, the methods of this work apply. Thus everything here starts with these distributions.
The relationships between the full distribution and reductions of the full distribution are of prime interest here, for it is the reductions of the full distribution that represent the process of ignoring information, some of it perhaps relevant. In making the reduction it is important to know just what is being lost, and it is similarly important to know when a reduction is satisfactory and when it is not.
To begin with a simple but important example (see a similar example in [84]), restrict attention to a typewriter symbol set, and suppose that three magnetic tapes are presented. Unknown to the consumer of the tapes, one contains a poem, one contains random noise in which the symbols occur with the same frequencies as those of the poem, and one, the blank tape, contains nothing but one symbol, zero, repeated forever. What is to be done to distinguish them? This is analogous perhaps to a researcher examining a biological system in which there are many long strands of DNA. In some of the cases the strands carry useful information, and in others there is nothing but random bases joined together. In others still there may be long regions of repeated bases which are there only because some enzyme went out of control at some point in the evolution of this genome. What is to be done to recognize the difference between these strands? What is to be done to distinguish useful DNA from non-useful DNA? In the case of the first two tapes, the poem and the noise tapes, both have the same first order entropies. But the second order entropy of the poem is probably lower than that of the noise, and so on for higher orders. At very high orders, though, the poem probably has very low entropy, with the next unseen symbol being nearly predictable. Thus at high orders, and seen in terms of the entropies, it appears similar to the third tape, the blank tape, which has zero entropy at all orders. Clearly, understanding the entropy of a system at just one order is not enough to make a clear assessment of the content. This example makes it clear that in order to understand a system, it is important to carefully consider the various reductions of that system and their relationships. No claim is made, however, that once this is done a useful distinction is possible. This analysis has left all of coding theory and image compression out of the picture; indeed these are not the focus of the effort here.
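The three-tape comparison can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the short repeated phrase standing in for the poem, the function name `block_entropy`, and the choice of the plug-in (empirical frequency) estimate of the entropy of length-k blocks as the "k-th order entropy". It is not the analysis of this dissertation, only a small numerical picture of the argument.

```python
import math
import random
from collections import Counter

def block_entropy(text, order):
    """Plug-in Shannon entropy (nats) of the empirical distribution of length-`order` blocks."""
    blocks = [text[i:i + order] for i in range(len(text) - order + 1)]
    counts = Counter(blocks)
    total = len(blocks)
    return -sum(c / total * math.log(c / total) for c in counts.values())

# Hypothetical stand-in for the poem tape: a short phrase repeated many times.
poem = "the rain it raineth on the just and also on the unjust fella " * 20

# Noise tape: the same symbols with the same frequencies, but in shuffled order.
chars = list(poem)
random.Random(0).shuffle(chars)
noise = "".join(chars)

# Blank tape: one symbol repeated.
blank = "0" * len(poem)

for name, tape in [("poem", poem), ("noise", noise), ("blank", blank)]:
    print(name, [round(block_entropy(tape, k), 3) for k in (1, 2, 3)])
```

The poem and noise tapes agree at first order by construction and separate at second order, while the blank tape gives zero at every order. With a finite tape the plug-in estimate becomes unreliable once the number of possible blocks approaches the tape length, so the high order behavior described in the text is only suggested by such a sketch.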
The important issues to be understood here are those surrounding the various reductions. In any system of any large degree of complexity, summary statistics like entropy cannot be expected to provide a clear picture of the system, and certainly they do not provide the information present in the full distribution. For the purposes of engineering or for mapping out relevant biological mechanisms, however, this is necessarily what the researcher must depend upon.
In order to quantify the information in a distribution the entropy is available, see chapter 2. The development here is uniquely centered on counting, with every effort made to show explicitly how counting states and entropy are related. The development of the entropy and conditional entropy is made in this fashion. From this immediately follows the mutual information. The notions of redundancy and relevancy are introduced and several competing definitions from the literature are noted. Several theorems about the entropy are concisely given, see section 2.2, and made use of throughout the work. In particular, the reduced entropy relationship and the reduced entropy per degree of freedom relationship are of immense value.
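The link between counting states and entropy can be checked numerically with the standard Stirling argument: the logarithm of the number of distinct sequences with fixed symbol counts, divided by the sequence length, approaches the Shannon entropy of the symbol frequencies. The sketch below assumes this textbook argument and is not taken from chapter 2; the function names and the example frequencies are hypothetical.

```python
import math

def log_multiplicity(counts):
    """ln of the number of distinct sequences with the given symbol counts (a multinomial coefficient)."""
    n = sum(counts)
    return math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)

def shannon_entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)

probs = [0.5, 0.3, 0.2]  # illustrative symbol frequencies
for n in (10, 100, 10_000, 1_000_000):
    counts = [round(p * n) for p in probs]
    # Entropy per symbol obtained by counting, next to the Shannon value.
    print(n, log_multiplicity(counts) / n, shannon_entropy(probs))
```

The per-symbol count approaches the Shannon value from below as the sequence length grows, with a correction that vanishes like ln(n)/n.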
In order to quantify the goodness of a reduction of the distribution there is an interesting set of techniques arising from many body theory in statistical mechanics which are centered on the notion of partitioning. These include methods for approximating the moment generating function (the cumulant expansion, see sections 3.1 and 3.3) and methods for approximating the distribution function itself (the Ursell expansion, see section 3.5, and the cluster expansion, see section 3.8). The usual moment functions, correlation functions, and the cumulants are made explicit. It is important to know when a set of random variables is telling you something new that some subset of the set could not. For this the information correlation functions, see section 3.12, are introduced. In many ways these functions are the proper generalization of the entropy and mutual information, and more generally of the notion of the amount of information available between a set of random variables. These functions have been noted for possible use in corners of the statistical mechanics community, especially with the fluid theorists [26] and plasma physicists [39, 84]. In this dissertation they are employed as tools for making the behavior of systems explicit. It is hoped that by presenting the tools of many body physics clearly and concisely, and by providing interpretations for the information correlation functions, the various communities that could make use of these tools will do so - especially the neural networks and machine learning communities - for these techniques indeed present powerful methods for function approximation, function learning, representation theory, etc. That all of these methods are linked together by the central concept of partitioning is the key result of chapter 3. There the linked cluster theorem, cluster expansions, the Ursell development and the cumulant expansion are tied together by one theorem about partitioned structures, see section 3.8. Clearly this approach provides for a rigorous description of many variable interaction and representation, awaiting numerous applications as the techniques become better known outside the theoretical physics community.
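As one concrete instance of partitioning, the cumulants of a single random variable can be written as a sum over all set partitions of the raw moments: each partition into b blocks contributes (-1)^(b-1) (b-1)! times the product of the moments of its block sizes. The sketch below implements this standard identity. It is not the theorem of section 3.8, and the function names and the Bernoulli example are assumptions chosen for illustration.

```python
import math

def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + partition

def cumulant(order, moment):
    """Cumulant of the given order, where `moment(k)` returns the raw moment E[X^k]."""
    total = 0.0
    for partition in set_partitions(list(range(order))):
        b = len(partition)
        term = (-1) ** (b - 1) * math.factorial(b - 1)
        for block in partition:
            term *= moment(len(block))
        total += term
    return total

# Illustrative input: a Bernoulli(p) variable, whose raw moments all equal p.
p = 0.3
print([cumulant(n, lambda k: p) for n in (1, 2, 3)])
```

For the Bernoulli case the first three values can be compared with the known closed forms p, p(1-p), and p(1-p)(1-2p). The number of partitions grows as the Bell numbers, so direct enumeration of this kind is only practical at low order.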
Chapter 4 continues with an analysis of an extremely simple statistical mechanical system, the bit string system, where correlations arise not because of some interesting structure in some Hamiltonian, but because there is a simple constraint involved. This is analogous to the situation in quantum mechanics, where a type of symmetry, the symmetries that must be obeyed by fermions and bosons, induces correlations in systems of these particles. The bit string example has an interesting twist - for non-physical values of bits, fractional bits, interesting discontinuities in the entropies occur, see section 4.9. The bit string system is shown to be equivalent to a lattice hard sphere gas, where in the non-lattice case there is a hypothesized phase transition.
Information is carried in physical objects described by probability distributions. These distributions change in time. Chapter 5 on the classical equations of information flow discusses the time evolution of these distributions, and thus the information that they carry. For Hamiltonian systems, this is the Liouville equation, see section 5.1. The reduction of the full probability distribution is also generally changing in time. The BBGKY hierarchy, see section 5.2, describes the time evolution of the reduced distribution function in two parts, the evolution of the reduced system due to itself, and the evolution of the reduced system due to the rest of the system. The presentation here focuses on the structure of the separation of the reduced system and the rest of the system, applicable to general flows, with the actual Hamiltonian flow description included later. This makes the presentation much simpler to follow than the usual textbook study. It is perhaps somewhat surprising that taking a maximally uninformative rest-of-system makes the entropy change of the reduced system zero, see section 5.5.
Models play an important role in the presentation here. Mention was made previously of the study of the simple bit string system with constraint induced correlations. In chapter 6 the Ising coupled spin system is discussed. Expressions for the entropies of collections of neighboring spins are given in closed form in section 6.8 and shown graphically. The presentation shows clearly how the various reduced entropies, moments, and information correlation functions indicate areas of the phase space of interest. In particular, the antiferromagnetic coupling case shows more structure than the ferromagnetic coupling case. Entropies for collections of non-neighboring spins are also given in closed form, see section 6.17. It is interesting to note that the reduced entropies per spin are ordered, see sections 6.15 and 6.17. Also shown (section 6.18) is that the information correlation functions are given by the mutual information between the first and last spins in the set considered. This indicates that all of the information available between a set of bounded spins is actually available from the endpoint spins alone, although this result is tied to the interpretation of the information correlation functions, see sections 3.13 and 3.14. Several other topological reductions of the information correlation functions are mentioned, leading to the observation that the information correlation functions may be used to explore many aspects of the redundancy structure of a set of random variables.
Quantum systems are of interest because the world is fundamentally quantum in nature, and because this is the technological frontier for the foreseeable future in computing. Quantum systems are described by density matrices. In chapter 7 the density matrix is described in terms of a state density function. The evolution of this density function is discussed, see section 7.2, giving rise to the quantum Liouville equation (coordinate dependent description). Analogous to the classical case, there is a quantum BBGKY hierarchy, see section 7.3, which breaks the evolution into two parts, the reduced system and the rest of the system. The Heisenberg and pure-state time development equations are discussed here too.
The quantum Heisenberg model of coupled spin-1/2 particles is studied in chapter 8. Here also is presented a brief outline of the classification of spin systems. Entropy and all of the other system functions mentioned so far are presented for the quantum Heisenberg model. The interesting behavior occurs in the antiferromagnetic coupling case, as it did in the Ising system, and is indicated by the structure of the high order information correlations, entropies and other system functions. Again, we find that it is important to consider the high order nature of these systems in order to understand them. Phase transitions between the dominant states are observed at many values of the external field applied to the system, depending on the number of spins in the system. The relationship between the intrinsic entropy of the system (the entropy of the density matrix) and the measurement entropy of the system is made explicit. One key result is that the measurement entropy is greater than or equal to the intrinsic entropy, see section 8.8. In potential applications, an important quantity to consider is the mutual information between a measurement, and an unmeasured but perhaps more desirable operator, see section 8.9. Because the operators for the energy eigenstates and the measurement eigenstates need not commute, the mutual information here is a time ordered (filter ordered) mutual information. Similarly, in more complex scenarios, the information correlation functions are generally time ordered information correlation functions. The mutual information spoken of here is the mutual information between the measured and unmeasured observables.
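The inequality between measurement entropy and intrinsic entropy can be seen in the smallest possible case, a single spin-1/2 described by a Bloch vector r, with density matrix (I + r.sigma)/2. This is a sketch under stated assumptions and not the Heisenberg model calculation of chapter 8: the intrinsic entropy is taken to be the von Neumann entropy, the measurement is a projective spin measurement along a chosen unit axis, and all function names are hypothetical.

```python
import math

def binary_entropy(p):
    """Shannon entropy (nats) of a two-outcome distribution (p, 1 - p)."""
    return -sum(q * math.log(q) for q in (p, 1.0 - p) if q > 0.0)

def intrinsic_entropy(bloch):
    """von Neumann entropy of rho = (I + r.sigma)/2, whose eigenvalues are (1 +/- |r|)/2."""
    r = math.sqrt(sum(c * c for c in bloch))
    return binary_entropy((1.0 + r) / 2.0)

def measurement_entropy(bloch, axis):
    """Shannon entropy of the outcomes of a spin measurement along the unit vector `axis`."""
    projection = sum(c * a for c, a in zip(bloch, axis))
    return binary_entropy((1.0 + projection) / 2.0)

state = (0.6, 0.0, 0.3)  # illustrative mixed state, |r| < 1
z_axis = (0.0, 0.0, 1.0)
print(intrinsic_entropy(state), measurement_entropy(state, z_axis))

pure = (1.0, 0.0, 0.0)   # pure state polarized along x
print(intrinsic_entropy(pure), measurement_entropy(pure, z_axis))
```

Because the projection of r on any axis is no larger in magnitude than |r|, the measurement entropy is never below the intrinsic entropy, with equality when the measurement axis is parallel to r. The pure state shows the extreme case: zero intrinsic entropy, yet a measurement along a perpendicular axis yields ln 2.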
Information comes via measurements, and by nature only a finite number of these are available. Chapter 9 is devoted to the extraction of information from finite data, and it is here that we come full circle and complete the picture. Information is carried by a physical system, the system is measured by the observer, and the relevant aspects of the system are then inferred. This is the basis of all pattern recognition, learning, and prediction. It is important to infer not only the value of a quantity but also the uncertainty that remains after some finite number of data are seen, both to know when to stop trying to obtain more data and because a single value representing a guess at the underlying system is almost completely irrelevant unless it is also known how good that guess is. Here the tools for making these guesses and presenting their uncertainties are developed completely. The chapter starts off with a discussion about learning the values of parameters determining a system. It is shown how, sometimes, more data can lead to greater uncertainty about the parameters, but that the uncertainty about the parameters decreases on the average. The exact amount of decrease is quantified, see section 9.2. Inferring the values of the moments of the entropy is the next problem solved. The expressions give closed forms for the inference of the value of the entropy, its uncertainty, and all of the other moments of the entropy. The presentation is quick, proceeding via several theorems which demonstrate how to do the required integrations, see section 9.3.8. The results for the entropy appear in section 9.3.9. The results for the estimators for moments, correlations and cumulants appear in section 9.3.11. After the entropy and moment inference tools are developed, the tools needed to express the inferences of the moments of the mutual information, chi-squared, and covariance are presented. Again this is done quickly by way of many theorems, see section 9.3.13, with results in section 9.3.14. The appendices for this chapter present the mathematics needed to make the development of this section both rigorous and simple, and should serve as a useful reference to many in similar work to follow.
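To give a flavor of inferring the entropy from finite data, the sketch below computes the posterior mean of the entropy from observed counts. It assumes a uniform Dirichlet prior over the underlying distribution, which is a choice made here for illustration and may differ from the priors treated in chapter 9. Under that assumption the posterior mean is the sum over outcomes of (a_i/A)[psi(A+1) - psi(a_i+1)], with a_i the count plus one, A their sum, and psi the digamma function; for integer arguments the digamma differences reduce to differences of harmonic numbers. The function names are hypothetical, and only the mean is shown, not the uncertainty or higher moments.

```python
import math

def harmonic(k):
    """k-th harmonic number, 1 + 1/2 + ... + 1/k (zero for k = 0)."""
    return sum(1.0 / j for j in range(1, k + 1))

def posterior_mean_entropy(counts):
    """Posterior mean of the entropy (nats) under an assumed uniform Dirichlet prior."""
    a = [n + 1 for n in counts]  # one prior pseudo-count per outcome
    total = sum(a)
    h_total = harmonic(total)
    return sum(ai / total * (h_total - harmonic(ai)) for ai in a)

def plugin_entropy(counts):
    """Entropy (nats) of the observed frequencies, with no account taken of sample size."""
    n = sum(counts)
    return -sum(c / n * math.log(c / n) for c in counts if c > 0)

for counts in ([1, 1], [5, 3, 2], [500, 300, 200]):
    print(counts, posterior_mean_entropy(counts), plugin_entropy(counts))
```

With no data at all, two outcomes give a posterior mean of one half nat, the average of the binary entropy over a uniformly distributed probability. As the counts grow the posterior mean and the plug-in value converge.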
Derivatives of the entropy in statistical mechanical systems at thermodynamic equilibrium are the subject of chapter 10. It is well known that these may be related to dynamical power spectra and the equations of classical thermodynamics, see section 10.2. There is a similar relationship between the derivatives of the reduced entropy and correlations of the energy and the reduced distribution function, see section 10.4. That the reduced entropies are ordered was shown in chapter 2. That the derivatives of the reduced entropies are not is shown in section 10.5.
In summary, throughout this work a great deal of attention is paid to the nature of the high order entropies, information functions, and cumulant functions, and how they indicate interesting behavior in physical systems, the content of chapters 4, 6 and 8. Also, a great deal of attention is paid to how to infer the underlying physics of a system from actual measurements made on the system, the content of chapter 9. Many things of interest remain to be done, and many interesting ideas need to be explored. Especially intriguing is the possibility of using clusters of magnetic spins in the design of quantum computers, manipulating external fields to set the information of the internal state, with the way information is placed within these systems analyzed using tools like the time ordered mutual information of section 8.9. Application of the techniques presented here, especially those of chapters 3 and 9, to the fields of biological system modelling, machine learning, image processing, and pattern recognition is also to come, as is their use in fields less traditional for physical scientists, such as economic analysis. There is doubtless a great deal of pattern recognition work yet to be done in recognizing and understanding the mechanisms of life, and the analysis presented in this work forms an ideal basis for such investigations. This work will serve as a reference and creative motivator for those involved in such work, and apart from the other results presented inside, this is perhaps its most important role.
A brief note on the notation used within. Distribution functions may be denoted by P, p or ρ, with the tendency that P or p are used for discrete distributions while ρ is used for continuous distributions. Logarithms in the sections where graphical results are presented are always base e. The coupling strengths in the chapters on the spin systems are referred to as ferromagnetic when they favor alignment of the spins, or as antiferromagnetic when they favor anti-alignment, in keeping with the usage in [24, 25, 44]. The parameter β always appears as b in the labels of the axes of graphs.