The Unimaginable Mathematics of Borges' Library of Babel


  TWO

  Information Theory

  Cataloging the Collection

  It is a very sad thing that nowadays there is so little useless information.

  —Oscar Wilde, "A Few Maxims for the Instruction of the Over-Educated"

  INFORMATION THEORY IS ONE OF THE YOUNGEST fields in mathematics, essentially born in 1948 when Claude Shannon published "A Mathematical Theory of Communication." As a discipline, it is still unfolding, still crystallizing into a way to analyze and interpret the world. For the purposes of this book, we'll say that information theory is the study of the compression and communication of complex information. We consider each book in the Library to be a complex piece of information, and our inquiry takes the form of investigating how a catalogue of the Library might encode information about the content and location of books. Since the story was written while Borges was tasked with cataloguing the collection of the Miguel Cané Municipal Library, questions of this nature may have taken on rich significance for him.

  Typically, a library catalogue card, either physical or virtual, contains two distinct kinds of information. The first sort uniquely specifies a book in such a way that a reader with partial or incomplete information still might identify the book: a title, author, edition, publisher, city of publication, year of publication, and short description of the contents generally appear on a card and prove sufficient. An ISBN also uniquely specifies a book, but probably isn't much use in finding a book if we remember only a few digits of the number.

  The second type of information uniquely specifies a location in the library, although additional knowledge is usually required. For example, under most systems of cataloguing currently in use, the call numbers, in addition to uniquely specifying a book, include an abundance of letters and digits, often interspersed with decimal points. If one does not know, say, where the PQ books are shelved, the information is degraded. Even if the books were arranged alphabetically by author or title, for a large collection we'd still need to know in what general region to begin our search. By analogy, many dictionaries have thumbnail indentations which enable readers quickly to find a section of words beginning with one or several letters. Both of these categories of information are problematic for the Library of Babel.

  A form a catalogue might take in principle is: Book (identifiers), Hexagon (location), Shelf (only 20 per hexagon), Position on Shelf (only 32 books per shelf). Perhaps surprisingly, self-referentiality is not a problem. A volume of the catalogue, say the tenth, residing in Hexagon 39, Shelf 20, Position 14, could well be marked on the spine "Catalogue Volume Ten," and correctly describe itself as the tenth volume of the catalogue and specify its location in Position 14, on Shelf 20, in Hexagon 39: there is no paradox. However, beginning with the obvious, here are some of the difficulties that arise.
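  Before turning to those difficulties, the four-part entry format can be made concrete. The Python sketch below is a hypothetical numbering scheme, not anything Borges or the text prescribes: it assumes hexagons carry integer labels starting at 0, counts shelves and positions from 1, and maps a (hexagon, shelf, position) triple to a single running book index and back. The names `Location`, `to_index`, and `from_index` are illustrative.

```python
from dataclasses import dataclass

SHELVES_PER_HEXAGON = 20   # from the text: 20 shelves per hexagon
BOOKS_PER_SHELF = 32       # from the text: 32 books per shelf
BOOKS_PER_HEXAGON = SHELVES_PER_HEXAGON * BOOKS_PER_SHELF  # 640


@dataclass(frozen=True)
class Location:
    """Hypothetical catalogue location: hexagons from 0, shelves and positions from 1."""
    hexagon: int
    shelf: int
    position: int

    def __post_init__(self):
        if self.hexagon < 0:
            raise ValueError("hexagon must be non-negative")
        if not 1 <= self.shelf <= SHELVES_PER_HEXAGON:
            raise ValueError("shelf must be between 1 and 20")
        if not 1 <= self.position <= BOOKS_PER_SHELF:
            raise ValueError("position must be between 1 and 32")


def to_index(loc: Location) -> int:
    """Map a location to one 0-based book index under this hypothetical scheme."""
    return (loc.hexagon * BOOKS_PER_HEXAGON
            + (loc.shelf - 1) * BOOKS_PER_SHELF
            + (loc.position - 1))


def from_index(index: int) -> Location:
    """Inverse of to_index: recover hexagon, shelf, and position."""
    hexagon, rest = divmod(index, BOOKS_PER_HEXAGON)
    shelf, position = divmod(rest, BOOKS_PER_SHELF)
    return Location(hexagon, shelf + 1, position + 1)


# The self-describing catalogue volume from the text: Hexagon 39, Shelf 20, Position 14.
volume_ten = Location(hexagon=39, shelf=20, position=14)
print(to_index(volume_ten))                 # 25581
print(from_index(to_index(volume_ten)))     # Location(hexagon=39, shelf=20, position=14)
```

  Volume Ten's description of itself poses no difficulty for such a scheme; the trouble, as the following paragraphs show, lies in the size of the numbers it would need.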

  Clearly, the Library holds far too many books to be listed in one volume; any catalogue would necessarily consist of a vast number of volumes, which, perversely, are apt to be scattered throughout the Library. Indeed, reminiscent of the approach of another of Borges' stories, "The Approach to Al-Mu'tasim," and of the lines in "The Library of Babel,"

  To locate book A, first consult book B, which tells where book A can be found; to locate book B, first consult book C, and so on, to infinity. . . .

  an immortal librarian trying to track down a specific book likely has a better chance by making an orderly search of the entire library, rather than finding a true catalogue entry for the book. Every plausible entry from any plausible candidate catalogue volume would have to be tracked down, including regressive scavenger hunts. An immortal librarian would spend a lot of time traversing the Library, ping-ponging back and forth between different books purporting to be volumes of a true catalogue.

  After revealing the nature of the Library, the librarian notes that contained in the Library are "the faithful catalog of the Library, thousands and thousands of false catalogues, the proof of the falsity of those false catalogs, a proof of the falsity of the true catalogue..." This, then, is the second problem of any catalogue: the only way to verify its faithfulness would be to look up each book. Furthermore, the likelihood of any book being located within a distance walkable within the life span of a mortal librarian is, to all intents and purposes, zero. Sadly, even if we were fortunate enough to possess a true catalogue entry for our Vindication, presumably our Vindication would merely give details of the death we encountered while spending our life walking in a fruitless attempt to obtain the Vindication. (Recall in "The Library of Babel," Borges describes Vindications as "books of apology and prophecy which vindicated for all time the acts of every man in the universe and retained prodigious arcana for his future.")

  Let's consider the first category of information found on library cards, that which uniquely specifies the book. Authorship is moot. One might argue that the God(s), or the Builder(s) of the Library, is (are) the author(s) of any book. One might also make a claim that the author is an algorithm embodied in a very short computer program which would, given time and resources, generate all possible variations of 25 orthographic symbols in strings of length 1,312,000. One could make the Borgesian argument that One Man is the author of all books.

  For that matter, the writer Pierre Menard, a quixotic character in Borges' story "The Don Quixote of Pierre Menard," may as well be credited with authorship of all the books in the Library.
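  The "very short computer program" imagined above really can be short. The sketch below assumes a hypothetical 25-symbol alphabet (22 lowercase letters chosen arbitrarily, plus space, comma, and period; the text does not say which 22 letters) and enumerates every string of a given length. For the actual Library the length would be 1,312,000, far beyond anything that could run to completion, so the usage example builds a toy library of two-symbol books.

```python
from itertools import product

# Hypothetical stand-in for the 25 orthographic symbols:
# 22 lowercase letters (an arbitrary choice) plus space, comma, and period.
ALPHABET = "abcdefghijklmnopqrstuv" + " ,."
assert len(ALPHABET) == 25


def all_books(length: int):
    """Yield every string of the given length over ALPHABET, in the alphabet's order.

    For the real Library, length would be 1,312,000 (410 pages x 40 lines x 80 symbols).
    """
    for symbols in product(ALPHABET, repeat=length):
        yield "".join(symbols)


# A toy "library" of two-symbol books: 25**2 = 625 of them.
toy = list(all_books(2))
print(len(toy))    # 625
print(toy[:3])     # ['aa', 'ab', 'ac']
```

  The program is a few lines long no matter how long the books are; only its running time grows, as $25^{n}$ for books of length $n$.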

  Certainly there are many, many books whose first page resembles the one in figure 4. How many such books? Specifying one page means that 80 symbols for each of 40 lines are "frozen." This means that out of the 1,312,000 symbols of a book, the first 3,200 are taken, leaving 1,308,800 spaces to fill. By the work of the "Combinatorics" chapter, there are thus precisely $25^{1,308,800}$ books with a first page exactly the same as the depicted title page. (Using logarithms as in the first Math Aftermath, this number is seen to be approximately $10^{1,829,623}$ books.) Viewed from a complementary angle, there are $25^{3,200}$ possible first pages; although significantly smaller than the numbers we've been contemplating, it is yet another enormous number. The chance of randomly selecting a book with this particular first page is "only" 1 in $25^{3,200}$, approximately 1 in $10^{4,474}$, which means, essentially, that it will never happen. For comparison's sake, the chance of a single ticket winning a lottery is better than 1 in $100,000,000 = 10^8$. So finding such a book is equivalent to winning the lottery more than 559 times in a row. (In the equation below, each factor of $10^8$ signifies winning the lottery once.)

  $$10^{4,474} = \underbrace{10^8 \times 10^8 \times \cdots \times 10^8}_{559\ \text{factors}} \times 10^2$$
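  The logarithmic arithmetic in this paragraph can be checked with a few lines of Python. The sketch works with base-10 logarithms rather than the integers themselves, since $25^{1,308,800}$ has well over a million digits (recent Python versions also refuse, by default, to convert integers with more than 4,300 digits to strings). The variable names are illustrative.

```python
import math

LOG10_25 = math.log10(25)      # log10 of the number of orthographic symbols
BOOK_LENGTH = 410 * 40 * 80    # 410 pages x 40 lines x 80 symbols = 1,312,000
PAGE_LENGTH = 40 * 80          # one page = 3,200 symbols

free_slots = BOOK_LENGTH - PAGE_LENGTH        # symbols left after freezing page one
log_books_same_page = free_slots * LOG10_25   # log10 of 25**1,308,800
log_first_pages = PAGE_LENGTH * LOG10_25      # log10 of 25**3,200

# One lottery win at odds of 1 in 10**8 contributes 8 to the base-10 exponent.
lottery_wins = log_first_pages / 8

print(free_slots)                         # 1308800
print(math.floor(log_books_same_page))    # 1829623
print(round(log_first_pages, 1))          # 4473.4
print(math.floor(lottery_wins))           # 559
```

  Since $\log_{10} 25^{3,200} \approx 4,473.4$, the number $25^{3,200}$ has 4,474 digits, and dividing the exponent by 8 gives just over 559 lottery wins.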

  As a source of useful information for a catalogue entry, a title on the spine of a book, such as The Plaster Cramp, is similarly moot, for there must still be something like

  distinct books with the exact same orthographic symbols on the spine.

  Edition, publisher, city of publication, year of publication—all are meaningless in this Library. The one sort of information we mentioned that may possibly prove useful is that of a short description of the contents of the book. We'll take "short" to mean "half-page or less." It's much more difficult to say what we mean by "description." We'll take it to mean "something that significantly narrows the possible contents of the book." For example, "The book is utter gibberish, completely random nonsense," doesn't significantly narrow the possible contents of the book. (We are aware that this definition is problematic.)

  Any book published in the last 500 years likely has a short, reasonably limiting description. A book whose contents consist of the letters MCV repeated over and over evidently has a short description. A book whose entire contents are similar to the 80-symbol line

  unmenneo .ernreiuht.naper,utuytgn or fgioe,no,e,dn .roih senoi.,erg n cprih npp

  almost certainly doesn't have a short description. Or does it? A fascinating area of study in the field of information theory concerns the difficulty of deciding whether or not a line such as the one above has some sort of algorithmic description that is shorter than the line itself. Borges seems to have an intimation of this when he writes "There is no combination of characters one can make—dhcmrlchtdj, for example—that the divine Library has not foreseen and that in one or more of its secret tongues does not hide a terrible significance." Perhaps "hr,ns llrteee" is a more concise description of the line, or perhaps a succinct translation into English is "Call me Ishmael."

  It does no good to excerpt a passage as a short description; titanic numbers of books in the Library will contain the same passage. In an important sense, then, for all languages currently known by human beings, for the cataclysmic majority of books in the Library, the only possible description of the book is the book itself. This, in turn, leads to a lovely, inescapable, unimagined conclusion:

  The Library is its own catalogue.
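  One way to get a feel for the difference between a book of repeated MCVs and a line of gibberish is to run both through an ordinary compressor. The sketch below uses Python's `zlib` as a stand-in for "a short description." This is a swapped-in proxy: compressed size only gives an upper bound on the length of the shortest description, and the true shortest description length is not computable in general. The 25-symbol alphabet is again a hypothetical stand-in.

```python
import random
import zlib

# Hypothetical 25-symbol alphabet: 22 letters plus space, comma, and period.
ALPHABET = "abcdefghijklmnopqrstuv ,."


def compressed_ratio(text: str) -> float:
    """zlib-compressed size divided by original size (ASCII, one byte per symbol)."""
    raw = text.encode("ascii")
    return len(zlib.compress(raw, 9)) / len(raw)


n = 3_200                     # one page of the Library
rng = random.Random(0)        # fixed seed so runs are repeatable
patterned = ("MCV" * n)[:n]   # a page of the "MCV repeated" book
gibberish = "".join(rng.choice(ALPHABET) for _ in range(n))

print(f"patterned: {compressed_ratio(patterned):.3f}")
print(f"gibberish: {compressed_ratio(gibberish):.3f}")
```

  The patterned page shrinks to a tiny fraction of its size, while the random page cannot shrink much below about $\log_2 25 / 8 \approx 0.58$ of its ASCII size, the entropy per symbol of a uniformly random 25-symbol source.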

  Let's restrict the investigation to a slightly more agreeable collection of books: all those whose entire contents cohere and are recognizably in English, and whose first page contains precisely a short title and a half-page description, both of which accurately reflect the contents. Any rule of selection will have problems. Some associated with this one are: What does it mean to "cohere"? Would a collection of essays on different topics constitute a coherent work? Would sections of James Joyce's infamous novel Finnegans Wake register as "recognizably English"? What if the book contains a non-English word, such as "ficciones"? What if the title, as in the case of Ulysses, is more allusive than descriptive? Can any description "accurately reflect" the contents of a book? Regretfully, we'll ignore these and other legitimate, interesting concerns.

  For example, suppose the first page of a volume of the Library began with the following description, modified slightly from the back cover of the 2002 Routledge Press edition of Wittgenstein's Tractatus Logico-Philosophicus.

  Tractatus Logico-Philosophicus by Ludwig Wittgenstein

  Perhaps the most important work of philosophy written in the twentieth century, Tractatus Logico-Philosophicus was the only philosophical work that Ludwig Wittgenstein published during his lifetime. Written in short, carefully numbered paragraphs of extreme brilliance, it captured the imagination of a generation of philosophers. For Wittgenstein, logic was something we use to conquer a reality which is in itself both elusive and unobtainable. He famously summarized the book in the following words, "What can be said at all can be said clearly, and what we cannot talk about we must pass over in silence."

  If next came the precise contents of the book, including Bertrand Russell's introduction, followed by the appropriate number of pages consisting of nothing but blanks, then that Library volume would be included in the collection. We are also willing to include books longer than 410 pages, so long as the title page includes reference to an appropriate volume number. This allows, among other things, for the inclusion of this Catalogue of Books in English into the putative catalogue we are trying to define, which we may as well call Books in English.

  This amenable collection of books is designed to enable Books in English to include a title and short accurate description of the contents. This nearly accomplishes the first half of the task of a catalogue; although the books aren't uniquely specified, the scope of possibility is greatly constricted. However, the other half of a catalogue, that of specifying a location, is also fraught with difficulties.

  First, and most strongly emphasized by Borges, is the apparent lack of organization in the distribution of books. It is possible that there is an overarching pattern, but even if there is, it would be impossible to deduce it from local information. The librarian's "elegant hope" that the Library is (truly) infinite and periodic would provide a godlike observer with a kind of order for each book; every particular book would have an infinite number of exact copies, unimaginably distant from each other, and these infinite copies would constitute a set of regularly spaced three-dimensional lattice points. But this pattern does not serve our needs.

  Finite or infinite, the problem of identifying individual hexagons of the Library is insurmountable. If the Library is a 3-sphere or any of the other spaces described in the chapter "Topology and Cosmology," the number of hexagons is finite. However, since each hexagon holds 640 books, which is approximately $25^{2.007}$ books, more than $25^{1,311,997}$ (approximately $10^{1,834,095}$) hexagons are required to hold all the Library's books. This means that if one were to attempt to write out a number for each hexagon in our familiar base-10 notation, it would take 1,834,095 digits. Now each book in the Library has exactly 1,312,000 slots to fill, and, moreover, the orthographic symbols contain no (recognizable) digits. Writing a number out in words usually uses many more precious slots; for example,

  [one million, eight hundred thirty four thousand, and ninety five] versus 1,834,095.

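  The bracketed expression takes 64 spaces, counting its blanks and commas, while the second needs only nine. Since even the digit form of a typical hexagon number would need more symbols than the 1,312,000 slots in a book, and the spelled-out form would need several times as many, for almost every hexagon in the Library a volume of a hypothetical Books in English catalogue could not actually contain the number of the hexagon where a book is shelved.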
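  The hexagon and character counts in the last two paragraphs can be reproduced as follows. The sketch again works with logarithms; it assumes, as the text does, 640 books per hexagon and $25^{1,312,000}$ books in all, so the number of hexagons is about $25^{1,312,000}/640$.

```python
import math

BOOK_LENGTH = 1_312_000     # symbols per book
BOOKS_PER_HEXAGON = 640     # 20 shelves x 32 books

# 640 written as a power of 25.
exponent_640 = math.log(BOOKS_PER_HEXAGON, 25)

# log10 of the number of hexagons, roughly 25**1,312,000 / 640.
log10_hexagons = BOOK_LENGTH * math.log10(25) - math.log10(BOOKS_PER_HEXAGON)
decimal_digits = math.floor(log10_hexagons) + 1   # digits in the largest hexagon number

# Character counts, with spaces and commas each taking a slot.
in_words = "one million, eight hundred thirty four thousand, and ninety five"
in_digits = "1,834,095"

print(round(exponent_640, 3))          # 2.007
print(decimal_digits)                  # 1834095
print(len(in_words), len(in_digits))   # 64 9
```

  Since 1,834,095 exceeds 1,312,000, even the compact digit form of the largest hexagon numbers would not fit in a single book.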

  Trying to circumvent this problem, one may observe that many numbers have shorter expressions, such as 24,781, and legitimately wonder if every integer might have a remarkably condensed form. An insuperable problem is that there are many such condensed expressions, including the one above, that need a computer to calculate. More disturbing, though, is an example of a condensed verbal description of a "small" number (only 100 digits) that even we, using networked supercomputers, would be unable to find:

  The median of the prime numbers expressible in 100 digits.

  Thus, even if the catalogue entry for the Tractatus Logico-Philosophicus listed the location as

  Hexagon: the median of the prime numbers expressible in one hundred digits.

  Shelf: four.

  Position: seventeen.

  the information is as useless to us as it is to a librarian. (See the Math Aftermath "Numb and Number (Theory)" for more discussion about prime numbers and, in particular, why we are unable to determine the median of the prime numbers expressible in 100 digits.)
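  The median itself is easy to define and, for small digit counts, easy to compute by brute force. The sketch below reads "expressible in d digits" as "less than $10^d$" (an assumption; one could instead require exactly d digits), lists those primes with a sieve of Eratosthenes, and takes their median. It is practical only for a handful of digits: by the prime number theorem there are roughly $10^{100}/\ln 10^{100} \approx 4 \times 10^{97}$ primes below $10^{100}$, so no such list could ever be built. The sketch says nothing about whether a cleverer method exists; it only shows why the naive one fails.

```python
import math
import statistics


def primes_below(limit: int) -> list[int]:
    """All primes p < limit, via a sieve of Eratosthenes."""
    if limit < 3:
        return []
    is_prime = bytearray([1]) * limit
    is_prime[0] = is_prime[1] = 0
    for p in range(2, math.isqrt(limit - 1) + 1):
        if is_prime[p]:
            # Cross off multiples of p, starting at p*p.
            is_prime[p * p::p] = bytes(len(range(p * p, limit, p)))
    return [n for n in range(limit) if is_prime[n]]


def median_prime(d: int) -> float:
    """Median of the primes with at most d decimal digits (an even count averages the middle two)."""
    return statistics.median(primes_below(10 ** d))


for d in range(1, 5):
    print(d, len(primes_below(10 ** d)), median_prime(d))
```

  For one-digit primes the median is 4.0, the average of 3 and 5, which is not itself prime; for two-digit bounds it is 41, the middle of the 25 primes below 100.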

  Usually, outside of computer science, we use base 10 to represent the positive integers, meaning we use the 10 symbols {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} to label numbers. In these circumstances, though, one might try using a higher base than 10 for the integers, such as base 25, to number the hexagons. There are two problems associated with this: first, it would still take all but two slots of a book to list a hexagon number, which suffices to invalidate the usefulness. Second, since each book contains only 25 orthographic symbols, each such symbol would have to stand for a digit. So, if one were to write out the hexagon number in base-25 digits, it would usually look like complete gibberish. (In fact, it also leads to an unpleasant, yet valid, interpretation of the Library: it is the complete listing of all base-25 numbers composed of exactly 1,312,000 digits.) At any rate, such a book would not be "recognizably English"; thus it would not itself be listed in Books in English.
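  Here is what base-25 numbering with the Library's own symbols might look like. The sketch assumes a hypothetical assignment of the 25 orthographic symbols to the digits 0 through 24 (22 letters in alphabetical order, then space, comma, period); any fixed assignment would do. It also computes how many base-25 places the largest hexagon number would need, taking the number of hexagons to be about $25^{1,312,000}/640$ as before.

```python
import math

# Hypothetical digit symbols for 0..24: 22 letters, then space, comma, period.
DIGITS = "abcdefghijklmnopqrstuv ,."
BASE = len(DIGITS)  # 25


def to_library_numeral(n: int) -> str:
    """Write a non-negative integer in base 25 using DIGITS."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return DIGITS[0]
    out = []
    while n:
        n, r = divmod(n, BASE)
        out.append(DIGITS[r])
    return "".join(reversed(out))


def from_library_numeral(s: str) -> int:
    """Inverse of to_library_numeral."""
    n = 0
    for ch in s:
        n = n * BASE + DIGITS.index(ch)
    return n


print(to_library_numeral(1_834_095))   # erjnu

# Base-25 places for the largest hexagon number, about 25**(1,312,000 - 2.007).
needed = math.ceil(1_312_000 - math.log(640, 25))
print(needed, 1_312_000 - needed)      # 1311998 2
```

  Even a modest number such as 1,834,095 comes out as the meaningless-looking string `erjnu`. Padding every numeral to exactly 1,312,000 places with the zero symbol `a` gives the reading of the Library as a list of all base-25 numbers of that length.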

  What if, like Ireneo Funes, from Borges' celebrated short story "Funes the Memorious," we resolved to work in base 24,000? It would do no good: in the story, for each number up to 24,000 Funes created his own signifier, for example, names such as Brimstone, Clubs, and The Whale. In the Library, we are stuck with 25 orthographic symbols. Instead of combining 10 digits in various ways to fill five places to make a number between 1 and 24,000, we would need to combine the 25 symbols in a minimum of four places to distinguish 24,000 separate numbers, because

  $$25^4 = 390,625$$

  while

  $$25^3 = 15,625$$

  which doesn't provide enough distinct signifiers to take us up to base 24,000. Anyway, not only wouldn't this convention save much space, it also leads back to the previous dilemma: writing out the names of the numbers will result in waterfalls of gibberish.
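  The count of places comes from finding the smallest k with $25^k \ge 24,000$. The sketch below does this with integer arithmetic, which sidesteps floating-point rounding in logarithms; the function name `places_needed` is illustrative.

```python
def places_needed(count: int, base: int = 25) -> int:
    """Smallest k with base**k >= count, i.e. enough k-symbol strings to name count numbers."""
    k, capacity = 0, 1
    while capacity < count:
        k += 1
        capacity *= base
    return k


print(places_needed(24_000))            # 4, since 25**3 = 15,625 < 24,000 <= 25**4
print(places_needed(24_000, base=10))   # 5, matching the five decimal places
```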

  Finally, a potential catalogue entry might take a different tack. It might give coordinates, such as, "Go up ninety-seven floors, move diagonally left four thousand hexagons, and then move diagonally right another two hundred and twenty." Although this might, at first blush, seem appealing, the same sorts of problems arise, for most hexagons are unimaginably far away. The example provided above works simply because the numbers involved (97, 4,000, and 220) are so minuscule, so accessible. The Library is neither.

  The Library is its own catalogue. Any other catalogue is unthinkable.

  Math Aftermath: Numb and Number (Theory)

  A metaphysician is one who, when you remark that twice two makes four, demands to know what you mean by twice, what by two, what by makes, and what by four. For asking such questions metaphysicians are supported in oriental luxury in the universities, and respected as educated and intelligent men.

  —H. L. Mencken, A Mencken Chrestomathy

  Below are two outgrowths from the sprawling yet spare field of number theory; together they form a pair of relatively straightforward mathematical confections. Both revolve around using prime numbers decisively to reach interesting conclusions.

  Consider the $25^{1,312,000}$ distinct volumes in the Library: a simple rethinking of this number will produce a result surely unimagined by Borges. Now, as we all know, the number 25 factors as $5 \times 5$, so

  $$25^{1,312,000} = (5 \times 5)^{1,312,000} = 5^{2,624,000}$$

  A prime number is a positive integer greater than one that is divisible only by itself and by one. The unique factorization theorem, proved by Euclid in The Elements, says that every integer greater than one is decomposable into exactly one product of primes (apart from the order of the factors), each of which is raised to a power greater than or equal to one. For example, we all know that $100 = 10 \times 10$, and it's also true that $100 = 4 \times 25$. So, what is 100 equal to, $10 \times 10$ or $4 \times 25$? Of course you're laughing at us, because 100 is obviously equal to both products. Neither of these answers, though, is written exactly as a product of primes, in which each prime is raised to a power greater than or equal to one. Based on the two factorizations, $10 \times 10$ and $4 \times 25$, it's easy to see that