What on Earth is Information (III) – Similarity and Difference

[This is #3 in a series of posts exploring information: what is it, where does it come from, and what does it teach us about ourselves and the reality in which we live. – #1 can be found here]

As we have seen, not all information is equal in character. Particularly, it seems that the kind of information minds transmit using language (i.e. semantic information) differs somehow from the kind of information uncovered in the physical structure of the world (i.e. environmental information). We turn now to some concrete examples.

Let us first consider a representation of environmental information: a quartz crystal. Quartz contains a distinctive atomic pattern defined by how its constituent silicon and oxygen atoms bond with each other (Figure 1).

Figure 1 – lattice structure of quartz

This pattern, determined by the direction and strength of the atomic bonds, has an information content. The oxygen atoms bond to each silicon atom to form a specific local shape: four oxygen atoms surrounding a central silicon atom. This same shape repeats over and over again to form the pattern of the atomic structure. If we investigate different crystalline materials at the atomic level, we find different local shapes – some cubic, some hexagonal, some tetragonal – again repeating to form a pattern. Thus it appears there is some information content in the quartz structure, information which is an ’emergent’ property of the atomic bonding. Change the number, direction and strength of the bonds between the atoms, and different patterns arise.

But there are two significant things to note about the information contained in a crystal structure such as that of quartz.

First, the information is constrained. Because the oxygen and silicon atoms bond in a certain predefined way, the bonding pattern – and thus its associated information – will always be the same when we put these kinds of atoms together under the same conditions (e.g. temperature and pressure). If we think of a crystal structure like a kind of sentence in which each atom is a kind of word, these ‘words’ of the quartz crystal (silicon and oxygen) are locked into a predefined structure by their predetermined bonding affinities (Figure 1). This means that unlike the words of a language, the atoms in the crystal are not free to arrange themselves in any manner. We might say that the quartz is not ‘free’ to ‘say’ anything – it can only say one thing over and over again. It is as though we tried to speak in English, but the word ‘and’ could only ever be followed by ‘mudflaps’, and the word ‘I’ could only be followed by ‘stink’, and so on. This kind of specific ‘bonding’ would drastically limit the kinds of information our vocabulary could be used to represent.
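To make the analogy concrete, here is a minimal sketch in Python of such a ‘bonded’ vocabulary. The two pairings from the paragraph above are used, and the other two rules are our own invention, added purely to close the loop. Once every word has exactly one permitted successor, the language can only ever say one thing, repeated over and over:

```python
# A toy 'bonded' vocabulary. The pairings 'and' -> 'mudflaps' and
# 'I' -> 'stink' come from the text above; the other two rules are
# invented here simply to close the cycle.
FORCED_SUCCESSOR = {
    "I": "stink",
    "stink": "and",
    "and": "mudflaps",
    "mudflaps": "I",
}

def speak(start: str, length: int) -> str:
    """Generate the only 'sentence' the bonding rules permit."""
    words = [start]
    for _ in range(length - 1):
        words.append(FORCED_SUCCESSOR[words[-1]])
    return " ".join(words)

print(speak("I", 8))   # I stink and mudflaps I stink and mudflaps
print(speak("I", 12))  # the same cycle again, just with more repeats
```

However long we ask this ‘language’ to speak for, it can only ever produce the same cycle of words.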

Secondly, because the atoms configure in a set way, the information content of a crystal does not grow as the amount of material (i.e. the number of atoms) grows. We could imagine an entire planet made of quartz, but we would find that its information content would be virtually no different to that of the tiniest quartz fragment. After all, the largest structure is simply achieved by repeating the same small pattern (Figure 2) over and over again.

Figure 2 – the tetrahedral repeating unit of quartz

This means we do not get any new information by simply repeating this ‘unit cell’ more and more times: we just get a larger number of repeats. This is the chemical equivalent of a single word or sentence written many times over, like lines written on a chalkboard during a school detention. The number of words may be huge, but the amount of information is no bigger than the content of the repeated sentence, plus a number representing how many repeats there are.
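We can watch this ‘detention lines’ effect directly by handing repeated text to a general-purpose compressor. The sketch below (Python, with an invented detention line; zlib serves here only as a rough stand-in for the idea of a shortest faithful description) shows that the compressed size stays close to one line plus a repeat count, however many repeats we add:

```python
import zlib

line = b"I must not talk in class. "   # an invented detention line
detention = line * 1000                # the same sentence written 1,000 times

print(len(detention))                  # 26,000 bytes of raw text
print(len(zlib.compress(detention)))   # a tiny fraction of that: in effect,
                                       # one line plus a repeat count
```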

So how is the semantic information content of language – the freely arranged patterns of words communicated by minds – different on these two points: constraint and content?

First, the information we express using language is highly unconstrained. There are of course a few ordering conventions we follow, but overall, the ability of a language to encode semantic information effectively depends on the vocabulary being freely arrangeable. We do not simply throw words onto a page and watch them bind together in predetermined ways to miraculously form the works of Shakespeare. We do not throw notes onto a stave and watch them repel and attract like a handful of magnets thrown on a tray, miraculously arranging themselves into Bach’s Sonata No. 1 in G Minor. The entire reason language works for encoding semantic content is because the words do not self-arrange. The relations between signs are not fixed. Rather, language is a kind of flexible vessel that can be adapted to hold whatever semantic content intelligent beings freely choose to express. The more language rules we fix to state how words must combine, the less we can express.

On the second question of content, we find that unlike the crystal, the information content of meaningful linguistic communication generally does increase with the increasing length of a text. In a crystal, because the same small unit simply repeats over and over again, there is barely any more information in an infinite volume of material than in a single molecule. But the same is not true of the semantic information we represent using language, as we will now consider.

Assessing information content

Brief reflection reveals that the number of words is not the same thing as the amount of meaningful information.

For instance, consider these two texts:

Text 1:

“What what what what what what what what what what what what what what what what what what what what what what what what what what what what what what what what what what what what what what what what what what what what what what what what what what what what what what what what what what what what what what.”

Text 2:

“What a piece of work is man, How noble in reason, how infinite in faculty, In form and moving how express and admirable, In action how like an angel, In apprehension how like a god, The beauty of the world, The paragon of animals. And yet to me, what is this quintessence of dust? Man delights not me; no, nor Woman neither…”

Both these texts contain exactly 62 words, and both are written in English. Nonetheless, we instinctively know they are different – not just in style or comprehensibility, but in the amount of information they contain. After all, the first text inspires nothing but confusion; the second – an excerpt from one of Hamlet’s great soliloquies – plumbs the depths of man’s existential condition.

But the difference is not just qualitative – a sense that there is more meaning in one text than the other. An important and quantifiable difference is that the extract from Hamlet cannot be reduced in length without losing something of significance, whereas the first text can be radically shortened without losing meaningful information. If we replace the first text with the sentence “what repeated 62 times” or even “what x 62”, we end up with far less text than the original (3 or 4 words vs 62), but we have the same amount of information. Every piece of important meaning has been conserved, but it has been expressed in a much more efficient way.

Therefore one way to express the difference between the two texts is in terms of how compressible they are. The more we can reduce the length of a text whilst preserving its information content, the more compressible it is. The Hamlet text, being conceptually very rich, cannot be compressed much, if at all, without losing something. We could glibly summarise Hamlet’s sentiment as “it’s complicated being human”, but we would not legitimately have compressed all the information contained – all the concepts and their relations. Nor would we have preserved the emotive effect of the original.
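We can put rough numbers on this intuition using the same kind of general-purpose compressor. The sketch below is only indicative – a compressor like zlib measures statistical redundancy, not meaning – but the contrast between the two texts is still striking:

```python
import zlib

text1 = ("what " * 62).strip().capitalize() + "."
text2 = ("What a piece of work is man, How noble in reason, how infinite "
         "in faculty, In form and moving how express and admirable, In "
         "action how like an angel, In apprehension how like a god, The "
         "beauty of the world, The paragon of animals. And yet to me, what "
         "is this quintessence of dust? Man delights not me; no, nor Woman "
         "neither...")

for name, text in [("Text 1", text1), ("Text 2", text2)]:
    raw = text.encode()
    packed = zlib.compress(raw, 9)
    print(f"{name}: {len(raw)} bytes raw -> {len(packed)} bytes compressed")

# Text 1 collapses to a small fraction of its length;
# Text 2 shrinks far less, because far less of it is redundant.
```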

What this concept of compressibility gives us is a way (though not the only way) of getting a firmer grasp on how to compare different presentations of information. It lets us trim away what is really just repetition or filler, and determine the meaningful information content of a text.

Examples from binary

To see this illustrated more clearly, we can look at some examples of information represented in binary form. Binary is useful as a tool for exploring these questions, because it is perhaps the simplest way of presenting information – for each ‘bit’, or symbolic unit, we can specify either a ‘1’ or a ‘0’. In other words, we have a two-letter vocabulary, the simplest we could possibly have. This makes binary a good ‘sandpit’ for looking at these important questions of compressibility and meaningful information content.

For instance, imagine a sequence of 1s and 0s one million digits long:

101010101010101…

Now let’s ask the question: “How much information does it contain?” On the one hand we could say a lot: after all, there are a million bits of text here – far too much to comfortably read or write.

But it doesn’t take long to realise we can represent the same information by writing a simple instruction:

“Repeat the sequence ’10’ 500,000 times.”

This instruction contains 39 characters, including spaces. In character length, this is over 25,000 times shorter than the original text. Yet we could reasonably argue that this short sentence includes all the information contained in our original million-bit sequence.
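The arithmetic here is easy to check with a few lines of Python, in which the string-multiplication operator gives us a compact rendering of the instruction itself:

```python
instruction = "Repeat the sequence '10' 500,000 times."
sequence = "10" * 500_000   # the instruction, rendered directly as code

print(len(sequence))                      # 1,000,000 characters
print(len(instruction))                   # 39 characters, including spaces
print(len(sequence) // len(instruction))  # 25,641 - over 25,000 times shorter
```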

We could go further, and specify a sequence of 1s and 0s which repeats an infinite number of times. We could fill an infinite number of hard drives with the infinite digits of this infinite binary sequence. We could fill so many hard drives with it that the entire universe was filled with nothing but hard drives. In other words, we could have a nearly infinite amount of text. But ultimately there would be no more information contained in the universe full of hard drives than is contained in the simple 9-word sentence:

“The sequence ’10’ repeated an infinite number of times.”

This sentence is a simple example of an algorithm: a set of instructions for producing a result. For instance, the algorithm above completely describes the infinite sequence “1010101010101…”, but does so in a much more compact way.

This means that if we want a measure of how much information we actually have in a given linguistic expression, a good question or test might be the following:

“What is the size of the most compact algorithm that can represent this expression?”

On this measure, the infinite sequence “101010101010101…”, though infinitely long in text terms, actually contains very little information! Writing out the whole sequence is therefore a very, very inefficient way of representing the actual information content. It is much better to use the compact form: “the sequence ’10’ repeated an infinite number of times.”
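As an aside, the truly most compact algorithm for an arbitrary expression cannot in general be computed – this is the idea behind what mathematicians call Kolmogorov complexity – but an ordinary compressor gives a workable upper bound. A sketch in Python, comparing our patterned sequence with a random one of the same length:

```python
import random
import zlib

def approx_info(s: str) -> int:
    """Compressed size in bytes: a crude upper bound on information content."""
    return len(zlib.compress(s.encode(), 9))

patterned = "10" * 500_000                                      # one million digits
random.seed(0)
noisy = "".join(random.choice("01") for _ in range(1_000_000))  # also one million

print(approx_info(patterned))  # small: the compressor finds the repeating pattern
print(approx_info(noisy))      # far larger: there is little pattern to exploit
```

Two texts of identical length, yet one compresses almost to nothing while the other barely compresses at all.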

Anyone who has read a particularly waffling essay or article will have experienced this very principle: using more words doesn’t necessarily mean more meaningful content! Indeed, one of the signs of good writing is the ability to say what you are trying to say (i.e. express the semantic information) without using any more text than necessary.

Algorithmic compressibility

This concept of algorithmic compressibility is highly relevant to our previously observed difference between semantic information and environmental information.

In considering how to represent environmental information, we have already seen how a huge crystal structure can be ‘compressed’ to a representation which just specifies the smallest repeating unit, plus the number of repeats. In the same way, the data describing other physical structures and processes can often be compressed down to shorter instructions or algorithms. The amount of compression we can achieve provides a measure of how dense the meaningful information content is.

But this is not only the case for static representations of environmental information, the kind contained in the patterns of crystals and tree rings and the like. There is also an information content inherent to the dynamic behaviour of entities in the universe. The interplay of matter and energy in planetary systems, oceanic currents, tectonic dynamics, atmospheric processes and the like gives rise to measurable data, often most easily expressed as numeric quantities: positions, frequencies, durations, magnitudes, velocities and accelerations. These data, the kind of information collected using sensors, or even recorded in logbooks from naked-eye observation, are a form of information about how shapes, states and positions of things change; they are a form of dynamic environmental data. Of course the ‘things’ in question can be as large as planetary bodies or as small as subatomic particles; the principle is the same: there is information in the way things change. Swing a pendulum and you will find a number – a piece of information – associated with the frequency with which it moves back and forth. Drop a rock and you will find a number – another piece of information – associated with its acceleration towards the centre of the Earth. These are of course extremely simplistic examples; at the more complex end, gifted physicists devote whole careers to unpicking and measuring the dynamics of behaviour at the subatomic scale (think the Large Hadron Collider and Max Planck) and the most massive scales (think radio telescopes and Stephen Hawking).
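The pendulum case already shows the compression at work: the familiar small-angle formula T = 2π√(L/g) replaces an open-ended logbook of stopwatch timings with a single line. A minimal sketch (standard textbook physics; the only assumption is the usual small-angle approximation):

```python
import math

def pendulum_period(length_m: float, g: float = 9.81) -> float:
    """Small-angle approximation: T = 2 * pi * sqrt(L / g)."""
    return 2 * math.pi * math.sqrt(length_m / g)

# One short formula stands in for any number of stopwatch measurements:
for L in (0.25, 1.0, 4.0):
    print(f"L = {L} m -> T = {pendulum_period(L):.2f} s")
```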

One of the triumphs of science, particularly since the Enlightenment, has been to discover natural ‘laws’ which take dynamic environmental data and use mathematics to approximate and represent the ‘gist’ of the physical behaviour from which they flow. If one or more equations can be found that let us approximate what a set of data suggest about a given physical phenomenon, the phenomenon can then to a degree be ‘represented’ by the equation rather than by a potentially vast set of measurements. The huge advantage of these mathematical representations (i.e. physical ‘laws’) is to offer a much more compact – compressed – way of saying what is ‘going on’ in the universe: a form which makes it much easier to cut through less relevant aspects, and ask deeper questions which drive understanding forwards.

This process not only makes the information content of a set of data clearer: it also drives deeper discovery, since compressed mathematical representations make ‘gaps’ in our understanding more evident, helping to point towards fertile areas for further exploration.

Examples of this process abound in the history of physics. The Renaissance astronomer Tycho Brahe, for instance, was a systematic recorder of environmental information; over the course of his life he took vast numbers of detailed measurements – data – relating to aspects of planetary motion. The data in his numerous logbooks contained environmental information, but not in a form which provided a concise, understandable overview of the inter-related motions of the planets he observed. However, using Brahe’s measurements and the methods of mathematics, Johannes Kepler was able to infer three laws of planetary motion. These laws were a clearer way of representing the meaningful insights hidden in the vast number of “raw” measurements Brahe had made. The mathematical representations thus expressed the meaningful information content of a huge quantity of data in a more elegant and compact way. The derivation of the laws was a form of compression, reducing the information content down to the simplest representation which could adequately account for what was ‘going on’ in the data. Thus, such laws, expressed in mathematical form, can, to some extent, stand in for raw data when seeking to explain or represent the environmental information collected by examining physical reality. Of course, a given set of raw environmental data may contain additional meaningful information which is left out of any given set of laws derived from it. The point is that laws which are sensibly derived will generally be a denser – a more compressed – representation of the data they relate to, containing as much meaningful information as possible in relation to their size.
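Kepler’s third law gives a feel for the scale of this compression: the single relation T² ∝ a³ stands in for whole columns of observations. A quick check in Python, using rounded modern textbook values rather than Brahe’s own figures:

```python
# Kepler's third law as compression: T^2 / a^3 is constant
# (T = orbital period in years, a = semi-major axis in AU).
# Rounded modern textbook values, not Brahe's own measurements.
planets = {
    "Mercury": (0.387, 0.241),
    "Venus":   (0.723, 0.615),
    "Earth":   (1.000, 1.000),
    "Mars":    (1.524, 1.881),
    "Jupiter": (5.203, 11.862),
    "Saturn":  (9.537, 29.457),
}

for name, (a, T) in planets.items():
    print(f"{name:8s} T^2 / a^3 = {T**2 / a**3:.3f}")  # all very close to 1
```

One short relation, holding across every planet in the table: that is a great deal of logbook expressed in very few symbols.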

This process of deducing laws has been of immeasurable value to modern science. Modern scientific studies of phenomena such as atmospheric conditions or oceanic currents can lead to vast amounts of data being collected: countless hard drives full of information charting locations, speeds, pressures, directions and other parameters at millions or billions of points over extended periods of time. But by representing the measured processes (such as air and water flow and material deformation) with mathematical equations, powerful computing resources can be used to simulate how those environmental processes develop and change over time.

We see this process at work in the modern atmospheric simulations used to assist weather forecasting. In a weather model, an aspect of the physical universe (the atmosphere) has to some extent been represented by a compressed representation of information (mathematical equations) which get close enough to what is actually going on to be useful for prediction.

Given infinite time, a set of equations simulating oceanic currents or the weather or cosmological phenomena could generate an infinite amount of virtual measurements. As a thought experiment, we could imagine a fixed set of equations comprising a weather model, and a supercomputer which simulated weather using them, and did so for so long that it filled the universe with hard drives full of result data. But this near-infinite set of results would nonetheless trace back to a finite, fixed set of equations and boundary conditions – in other words, to entities with a non-infinite, and indeed relatively small, length. It would be the equivalent of our previous algorithm “repeat the sequence ’10’ an infinite number of times”.
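The thought experiment can be miniaturised. The sketch below uses the logistic map – a standard toy example of chaotic dynamics, standing in here for a full weather model – to show a fixed rule of a few lines generating an endless stream of ‘results’:

```python
# A tiny, fixed rule plus one boundary condition can generate output
# without limit. The logistic map is a standard toy example of chaotic
# dynamics, standing in here for a full weather model.
def simulate(x0: float, r: float = 3.9):
    x = x0
    while True:          # could run forever, filling any number of hard drives
        yield x
        x = r * x * (1 - x)

stream = simulate(x0=0.2)   # x0 is our single 'boundary condition'
for _, value in zip(range(5), stream):
    print(f"{value:.6f}")

# However long we let this run, the generating 'algorithm' itself
# remains just a few lines long.
```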

So, just as an algorithm can compress the representation of static structures, such as the pattern in a crystal or the text of a repeating pattern of words, equations can be used to compress the representations of dynamic processes: the motions and transformations of energy and matter. Relative to the raw data of environmental measurements, this compression increases the density of meaningful information, eliminating secondary and/or duplicated aspects as far as possible while conserving as much of the essential information as possible. The goal is, in a sense, to retain only the information that cannot be removed without reducing the explanatory and/or conceptual power of the whole.

The upshot of these observations is that we can ask questions of a given piece of information. Compressibility gives us a way of beginning to gauge how much unique meaning there is in a piece of text or data with which we are confronted. Algorithmic/mathematical compressibility appears to raise the likelihood that the information is environmental in character. On the other hand, relative incompressibility is a hallmark of semantic information – the kind of information communicated between minds.

Conclusion

We have seen that the concept of compressibility is a helpful tool for comparing the information content of different texts and data. Environmental data are often highly compressible, being capable of representation by much more compact algorithms and/or equations, which capture much of the meaningful information content in a far more efficient and useful way. But the brief example of Hamlet’s soliloquy suggested that the same cannot necessarily be said of the semantic information we convey from mind to mind using language. Representations of information flowing from minds do not appear to be easily compressible.

This means the issue of the mind is unavoidable in considering what semantic information is and where it comes from. Of course, mind is also relevant to environmental information, since the task of compression is something which requires a mind, or a computer program written by a mind. Thus we are drawn to consider the relationship between information and mind, the subject to which we will turn next.