- Claude Shannon, the "father of information science"
- Word pairs as a tool for analyzing linguistic style
- But is that computer actually saying anything?, or, philosophical considerations on computer-generated text

Shannon's master's thesis (MIT, 1937) investigated the use of electrical circuits to model logical statements. Shannon was one of the few people of his day familiar with both electrical engineering and the mathematical logic of Boolean algebra. He showed how electrical switches could be used to carry out calculations and detailed instructional procedures, thus foreshadowing the electronic computer.

After MIT, Shannon joined Bell Telephone Laboratories in 1941 as a research mathematician. In 1948, with the strong urging of his colleagues, he published the paper *A Mathematical Theory of Communication*; it was reissued in 1949 as a book with Warren Weaver, *The Mathematical Theory of Communication*. This work (probably inspired by his wartime work in cryptography) gave the first mathematical underpinning to the study of communications.

One of Shannon's major discoveries is that information has a very precise relationship to *entropy*. Entropy - a measure of the disorder of a system, and before Shannon used mainly in thermodynamics and statistical physics - can be seen as a *lack of information* about a system. This relationship implies that disordered systems can be "cleaned up" by using information about the system. It also shows that information about the world is never "free"; every bit of information gathered causes a tiny but definite disordering of the system under investigation.
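For the curious, Shannon's measure can be tried out in a few lines. The sketch below is not part of the Shannonizer; it simply applies Shannon's entropy formula, H = sum of p * log2(1/p) over the symbol probabilities p, to the character frequencies of a string. The function name `shannon_entropy` is made up for this illustration.

```python
import math
from collections import Counter

def shannon_entropy(text):
    """Entropy, in bits per character, of the character frequencies in text."""
    counts = Counter(text)
    total = len(text)
    # Each symbol with probability p contributes p * log2(1/p) bits.
    return sum((n / total) * math.log2(total / n) for n in counts.values())

print(shannon_entropy("aaaaaaaa"))  # one symbol only: nothing left to learn
print(shannon_entropy("abababab"))  # two equally likely symbols
print(shannon_entropy("abcdefgh"))  # eight equally likely symbols
```

The more evenly the symbols are spread, the less we can guess in advance about the next one, and the higher the entropy comes out.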

Shannon also showed that much of any "real world" communication is taken up with *redundancy*. (Shall I repeat that in different words?) He analyzed a huge number of communications, from code transmissions to telephone conversations to James Joyce novels, in order to understand the relationship between a) the message intended for transmission and b) the redundant information tacked on to ensure that the message is understood correctly. Redundancy is crucial for clear communication in a "noisy" environment, but when noise is low the redundancy can be stripped out and the message highly compressed (as PKZIP users can attest).
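The link between redundancy and compression is easy to see for yourself. The sketch below uses Python's standard `zlib` module (an implementation of the DEFLATE method, which ZIP archives also commonly use) as a stand-in for PKZIP; the helper name `compressed_ratio` and the sample data are made up for this illustration.

```python
import os
import zlib

# A highly redundant message: the same sentence over and over.
redundant = b"Shall I repeat that in different words? " * 50
# Random bytes of the same length: essentially no redundancy to strip out.
noise = os.urandom(len(redundant))

def compressed_ratio(data):
    """Compressed size as a fraction of the original size."""
    return len(zlib.compress(data, 9)) / len(data)

print(f"redundant text: {compressed_ratio(redundant):.2f}")
print(f"random bytes:   {compressed_ratio(noise):.2f}")
```

The repeated sentence should shrink to a small fraction of its original size, while the random bytes should hardly shrink at all.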

In this connection, Shannon examined the frequency of word correlations in the English language. Pairs of words which often appear together (for instance, "gutless" and "wonder") show a higher degree of redundancy than less common pairs (perhaps "elegant" and "quicksand"). Shannon showed that a randomly generated string of words could sound remarkably like meaningful English, so long as each word had a high correlation with the word before it. The resemblance to English is even greater if the nonsense string is generated using word triplets, rather than word pairs. The Shannonizer uses this effect to analyze texts and mimic their style using word pairs.
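The word-pair idea can be sketched in a few lines of Python. This is not the Shannonizer's actual code, which isn't shown here, only a minimal illustration of the general technique; the names `build_pairs` and `generate` and the sample sentence are made up. It assumes the text has at least two words.

```python
import random
from collections import defaultdict

def build_pairs(text):
    """Map each word to the list of words that follow it somewhere in text."""
    words = text.split()
    followers = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        # Duplicates are kept, so common pairs get picked more often.
        followers[current].append(nxt)
    return followers

def generate(followers, length=20, rng=random):
    """Build a string in which each word is drawn from the followers of the last."""
    word = rng.choice(list(followers))
    output = [word]
    for _ in range(length - 1):
        choices = followers.get(word)
        if choices:
            word = rng.choice(choices)
        else:
            # Dead end (the word only appeared at the very end): start afresh.
            word = rng.choice(list(followers))
        output.append(word)
    return " ".join(output)

sample = "the cat sat on the mat and the dog sat on the cat"
print(generate(build_pairs(sample), length=12))
```

Feed it a longer text and the output starts to pick up that text's vocabulary and rhythm, even though no sentence in it was ever planned.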

Not surprisingly, perhaps, there's a fair amount of material about Claude Shannon on the Web. The Shannon links run the gamut from the serious to the seriously silly. You'll learn that Shannon's grandfather was reputed to have invented the washing machine, and that Shannon himself invented motorized pogo sticks and chess-playing robots. A lifelong juggler, Shannon even devised a unicycle designed for riding while juggling, not to mention a tiny clockwork circus where three clowns juggle tiny clubs, balls and rings simultaneously... more grist for the Shannon legend in the 22nd century, no doubt.

Claude Shannon actually achieved an effect like this without a computer. Taking a novel, he picked a first word at random, then found the *next* place in the novel where this word appeared. He added the *following* word to the generated text, then repeated the process until he had a fair-sized sentence such as:

The head and in frontal attack on an English writer that the character of this point is therefore another method for the letters that the time of who ever told the problem for an unexpected.

(This is the kind of geeky fun that Shannon seemed to have all the time. He got his friends to join in all sorts of odd, semi-surreal word games like this and somehow turned the results into ground-breaking research. He also managed to retire at the age of fifty. It's enough to make you want to take up juggling.)
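Shannon's pencil-and-paper procedure translates almost word for word into code. The sketch below follows the description above: pick a starting word at random, scan forward to the next place that word appears, and write down the word after it. Two details are assumptions added here: the scan wraps around to the beginning when it runs off the end of the book, and the "novel" is just a short stand-in sentence. The name `shannon_by_hand` is made up.

```python
import random

def shannon_by_hand(words, length=15, rng=random):
    """Generate text by repeatedly finding the next occurrence of the current word."""
    n = len(words)
    pos = rng.randrange(n)  # pick a first word at random
    output = [words[pos]]
    while len(output) < length:
        target = words[pos]
        # Scan forward for the next place the word appears, wrapping around
        # if needed; in the worst case we arrive back where we started.
        for step in range(1, n + 1):
            hit = (pos + step) % n
            if words[hit] == target:
                break
        pos = (hit + 1) % n  # the word following that occurrence
        output.append(words[pos])
    return " ".join(output)

novel = "the cat sat on the mat and the dog sat on the cat".split()
print(shannon_by_hand(novel, length=12))
```

Unlike a program that tabulates every word pair in advance, this version keeps no tables at all, which is what made it practical to do by hand.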

Presumably, the mimicry would be even more accurate if we analyzed word *triplets* instead of just word *pairs*. This would take up a good deal more computing time. I also suspect that such a "third-order" approximation might make the Shannonizer a little too accurate to be amusing!
