
# C.E. Shannon: A Mathematical Theory Of Communication

## The Bell System Technical Journal, Vol. 27 pp. 379-423, 623-656, July, October 1948.

This paper can be accessed online here.

This paper rationalised earlier work by Nyquist and Hartley, and initiated the area now known as communication theory. In particular, Shannon derived a measure of the information content of signals, which has become known as Shannon Entropy. However, much later work, particularly on Mutual Information registration, has ignored Shannon's warning that the entropy becomes scale-dependent when applied to continuous functions.

Shannon entropy is perfectly valid when applied to physically meaningful discrete probabilities, such as those of symbols in a signal drawn from a discrete alphabet, as such probabilities incorporate a well-defined scale parameter. In fact, Shannon Entropy is identical to likelihood in those circumstances.
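
As a minimal sketch of one reading of that statement (the message string and function names here are illustrative assumptions, not taken from the paper or the TINA memos), the entropy of the empirical symbol distribution of a message equals the mean negative log-likelihood per symbol of that message under the same distribution:

```python
import math
from collections import Counter

def shannon_entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mean_neg_log_likelihood(message):
    """Mean negative log2-likelihood per symbol, using the empirical
    symbol frequencies of the message itself as the model."""
    counts = Counter(message)
    n = len(message)
    return -sum(math.log2(counts[s] / n) for s in message) / n

# Hypothetical message over the discrete alphabet {A, B, C, D}.
message = "AABBBBCD"
counts = Counter(message)
probs = [c / len(message) for c in counts.values()]

H = shannon_entropy(probs)
nll = mean_neg_log_likelihood(message)
print(H, nll)  # both are 1.75 bits per symbol
```

No bin width or unit of measurement enters this calculation: the alphabet itself fixes the scale.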

Problems arise when entropy is applied to continuous distributions, which lack a defined scale, or discrete probabilities (such as those derived from histograms) produced by integrating a continuous distribution in the absence of a well-defined scale. The entropy becomes scale-dependent, and is therefore just a relative measure that cannot be used in processes such as optimisation. TINA Memos 2004-001 and 2004-005 contain examples of this effect.
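
The effect can be illustrated with a short sketch, assuming a zero-mean Gaussian as the continuous distribution (this example is not taken from the TINA memos, and the function names are hypothetical). Bin probabilities are obtained as definite integrals of the density, and the resulting discrete entropy is approximately the differential entropy minus the log of the bin width, so it shifts whenever the bin width or the units of the data change:

```python
import math

def normal_cdf(x, sigma=1.0):
    """CDF of a zero-mean Gaussian with standard deviation sigma."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def binned_entropy(sigma, bin_width, span=8.0):
    """Entropy (nats) of the histogram obtained by integrating the
    Gaussian density over bins of the given width, covering at least
    +/- span standard deviations."""
    n_bins = int(math.ceil(2.0 * span * sigma / bin_width))
    lo = -0.5 * n_bins * bin_width
    entropy = 0.0
    for i in range(n_bins):
        a = lo + i * bin_width
        # Discrete probability = definite integral of the density.
        p = normal_cdf(a + bin_width, sigma) - normal_cdf(a, sigma)
        if p > 0.0:
            entropy -= p * math.log(p)
    return entropy

def differential_entropy(sigma):
    """Differential entropy (nats) of a Gaussian."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)

h = differential_entropy(1.0)
for width in (0.2, 0.1, 0.05):
    # The two printed values agree closely: H_binned ~ h - log(width).
    print(width, binned_entropy(1.0, width), h - math.log(width))

# Same density shape expressed in units half the size (sigma doubles):
# the histogram entropy increases by about log(2).
shift = binned_entropy(2.0, 0.1) - binned_entropy(1.0, 0.1)
```

Halving the bin width, or doubling the numerical scale of the data at fixed bin width, adds roughly log 2 nats, even though nothing about the underlying signal has changed.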

There are now several families of entropy measures: TINA Memo 2004-004 discusses a few of them. In general, the same conclusion can be drawn for these measures. The only information measure that I am aware of that is consistent with quantitative statistics in the general case is the Fisher Information: the inverse of the Cramér-Rao bound, and so the maximum amount of information that a single data point can provide about the parameters of some assumed model in a likelihood framework.
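
For a concrete case, the sketch below assumes a Gaussian model with known standard deviation and an unknown mean (an illustrative choice, not one made in the text above; the function names are hypothetical). The Fisher information per data point is the expected squared score, which for this model is 1/sigma^2, and the Cramér-Rao bound on the variance of an unbiased estimate of the mean from n independent points is its inverse divided by n:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a Gaussian with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def fisher_information_mean(sigma, mu=0.0, n_steps=20001, span=10.0):
    """Fisher information about mu carried by one data point, computed
    by numerically integrating the squared score against the density."""
    lo = mu - span * sigma
    hi = mu + span * sigma
    dx = (hi - lo) / (n_steps - 1)
    total = 0.0
    for i in range(n_steps):
        x = lo + i * dx
        score = (x - mu) / sigma ** 2  # d/dmu of log f(x; mu, sigma)
        total += score * score * gaussian_pdf(x, mu, sigma) * dx
    return total

def cramer_rao_bound(sigma, n_points):
    """Lower bound on the variance of an unbiased estimator of mu
    from n_points independent observations."""
    return 1.0 / (n_points * fisher_information_mean(sigma))

info = fisher_information_mean(2.0)   # close to 1 / sigma^2 = 0.25
bound = cramer_rao_bound(2.0, 4)      # close to sigma^2 / n = 1.0
print(info, bound)
```

The bound is expressed in the squared units of the parameter, so the scale of the measurement is carried explicitly rather than assumed implicitly.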

The take-home message is this: when you swap between continuous and discrete probabilities you implicitly assume a certain scale, as the latter are definite integrals of the former. This scale must be properly defined if the probabilities are to be compared in any meaningful way. One way to enforce such a well-defined scale is to work in the equal variance domain: TINA Memo 2004-005 gives further details.

PAB 21/2/2005

Page last modified on March 05, 2012, at 03:51 PM