Random thoughts on a gray November noon

This is a revised summary of my talk from the November 29, 2012 Theory Lunch. I confess to having made an obnoxious number of errors during that presentation, for which I apologize to all the listeners. This version is hopefully correct.

There are at least two ways of defining randomness. One is about randomness of distributions of objects: this is the subject of the theory of probability. There is another one, which is about randomness of single objects, and is the subject of algorithmic information theory. Consider the following infinite sequences:

00000000000000000000000000000000 …
01010101010101010101010101010101 …
01000110110000010100111001011101 …
10111010110101100001101111000001 …

The first sequence does not look random at all: it is just a sequence of zeroes. The second one is only slightly more complicated: just a sequence of pairs 01. The third one seems much more complicated, until one notices that it is the sequence of finite nonempty binary strings in length-lexicographic order: 0, 1, 00, 01, 10, 11, 000, 001, and so on. The last one displays no clear pattern.
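As a quick check, here is a minimal Python sketch (the language and the function name are my choice, not part of the talk) that reproduces the third sequence by enumerating the nonempty binary strings in length-lexicographic order and concatenating them:

    from itertools import count, product

    def length_lex_binary():
        """Yield the nonempty binary strings in length-lexicographic
        order: 0, 1, 00, 01, 10, 11, 000, 001, ..."""
        for n in count(1):
            for bits in product("01", repeat=n):
                yield "".join(bits)

    gen = length_lex_binary()
    prefix = "".join(next(gen) for _ in range(14))
    print(prefix[:32])  # 01000110110000010100111001011101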

We are thus tempted to declare the first three sequences nonrandom, and the last one random. The underlying logic is that the former have short descriptions, while the latter does not seem to.

The formalization of this approach leads to the definition of the Kolmogorov complexity of a finite string. Kolmogorov complexity is defined relative to a given computer, modeled as a partial function C that, given a program p and an input y, both finite strings over a Q-ary alphabet A, possibly returns an output x = C(p,y). The Kolmogorov complexity of a string x relative to a string y with respect to a computer C is then defined as the quantity

K_C(x \mid y) = \min \{ n \geq 0 \mid \exists p \in A^n \,: \, C(p,y) = x \} \;,

with the usual convention \min \emptyset = \infty; we write K_C(x) for K_C(x \mid \varepsilon), where \varepsilon is the empty string. A computer C is said to be prefix-free, or a Chaitin computer, if for every input y the domain of the function p \mapsto C(p,y) is a prefix-free set: in this case, the prefix-free Kolmogorov complexity (also called Chaitin complexity) of x with respect to y and C is indicated by H_C(x \mid y). The following result, proved independently by Andrei Kolmogorov and Ray Solomonoff for ordinary computers and by Chaitin for prefix-free computers, is what makes the entire theory sound.
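To see the definition at work, here is a small Python sketch built around a toy computer, an invented example of mine which is certainly not universal; for a real universal computer no such brute-force search can work, since K is uncomputable:

    from itertools import product

    ALPHABET = "01"  # a binary alphabet, so Q = 2

    def toy_computer(p, y):
        """An invented, non-universal computer: a program '0' + w outputs
        y followed by w verbatim, while '1' + w outputs y followed by w
        repeated twice; every other program is outside the domain."""
        if p.startswith("0"):
            return y + p[1:]
        if p.startswith("1"):
            return y + p[1:] * 2
        return None

    def K(x, y="", computer=toy_computer, max_len=12):
        """Brute-force K_C(x | y): the length of a shortest program p
        with C(p, y) = x, searched in order of increasing length."""
        for n in range(max_len + 1):
            for bits in product(ALPHABET, repeat=n):
                if computer("".join(bits), y) == x:
                    return n
        return float("inf")  # the convention min(emptyset) = infinity

    print(K("01010101"))  # 5: the program '1' + '0101' doubles '0101'
    print(K("00010111"))  # 9: no doubling applies, so verbatim '0' + x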

Invariance Theorem. There exists a (prefix-free) computer U with the following property: for every (prefix-free) computer C there exists a constant c such that, for every program p and input y such that C(p,y) is defined, there exists a program q such that U(q,y) = C(p,y) and |q| \leq |p| + c.

A computer U with the property stated in the Invariance Theorem is said to be universal. The Invariance Theorem then states the existence of universal computers, and that the (prefix-free) Kolmogorov complexity according to a universal computer is well defined up to an additive constant. We can thus speak about the (prefix-free) Kolmogorov complexity of a string x with respect to an arbitrary, but fixed, universal (prefix-free) computer U: which we then omit from the subscript. As an immediate consequence of the Invariance Theorem, there exist constants c,c' \geq 0 such that K(x) \leq |x| + c and H(x) \leq |x| + 2 \log_Q |x| + c' for every finite word x.
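The second bound hides a standard self-delimiting trick, which is worth making explicit: prepend to x its length written in binary with every bit doubled, followed by the delimiter 01. Here is a sketch for Q = 2 (my own illustration of the standard construction):

    from itertools import product

    def self_delimiting(x):
        """A prefix-free code for binary strings: |x| in binary with each
        bit doubled, then the delimiter '01', then x itself; the overhead
        is 2*floor(log2 |x|) + 4 bits, matching H(x) <= |x| + 2 log |x| + c'."""
        header = "".join(b + b for b in bin(len(x))[2:])
        return header + "01" + x

    print(self_delimiting("0100011011"))  # '11001100' + '01' + '0100011011'

    # No codeword is a prefix of another: header pairs are '00' or '11',
    # so the first aligned '01' pair unambiguously ends the header.
    words = ["".join(p) for n in range(1, 6) for p in product("01", repeat=n)]
    codes = [self_delimiting(w) for w in words]
    assert not any(a != b and b.startswith(a) for a in codes for b in codes)

A prefix-free computer that strips such a header and prints the payload witnesses the stated upper bound.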

We can now define random strings according to the notion of prefix-free Kolmogorov complexity. We say that x is random if H(x) is maximal among the strings of the same length as x. Moreover, if \Sigma(n) = \max_{x \in A^n} H(x) and H(x) \geq \Sigma(|x|) - m, we say that x is m-random. It is then possible to prove that, if x is m-random with m \leq T - O(\log_Q T), then K(x) \geq |x| - T.

Another way to look at randomness is to consider statistical properties such as the law of large numbers or the law of the iterated logarithm: we then expect a random string to satisfy every “conceivable” statistical property. This, in turn, raises the question of what the word “conceivable” should mean.

The following idea is due to Per Martin-Löf, and dates back to 1966. Consider a set U \subseteq \mathbb{N} \times A^\ast with the following two properties:

  1. The level sets U_n = \{ w \in A^\ast \mid (n,w) \in U \} satisfy U_n \supseteq U_{n+1} for every n \in \mathbb{N}.
  2. For every m,n \in \mathbb{N} we have |A^m \cap U_n| < (Q^{m-n}) / (Q-1).

We call such a set U a Martin-Löf test (briefly, M-L test). Recall that A is a Q-ary alphabet: the analogy with statistical tests is thus that the level set U_n represents the confidence interval at level 1 - \alpha with \alpha = Q^{-n}/(Q-1). It is then straightforward to say that a word w passes an M-L test U at level n if w \not \in U_n. We thus only have to decide what “conceivable” means, and we do this by requiring U to be recursively enumerable. Then the concepts of Chaitin randomness and Martin-Löf randomness are almost equal in the following sense: almost every Chaitin random finite word is ultimately declared random by every M-L test.
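For concreteness, here is a toy M-L test, an invented example: level n catches the words that begin with at least n+1 zeroes, a blatant regularity. A short exhaustive computation confirms the cardinality condition, which for Q = 2 reads |A^m \cap U_n| < 2^{m-n}:

    from itertools import product

    Q, A = 2, "01"

    def U(n):
        """Level n of an invented M-L test: the words beginning with at
        least n+1 zeroes."""
        return lambda w: len(w) > n and set(w[: n + 1]) == {"0"}

    # Verify |A^m ∩ U_n| < Q^(m-n) / (Q-1) for small m and n.
    for n in range(4):
        level = U(n)
        for m in range(1, 10):
            count = sum(level("".join(w)) for w in product(A, repeat=m))
            assert count < Q ** (m - n) / (Q - 1)

A word passes this test at level n exactly when it does not begin with n+1 zeroes; the all-zero words fail at every level they are long enough to reach, matching the intuition that they are maximally nonrandom.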

Notably, Martin-Löf tests also have a universality property: there exists an M-L test V which improves all M-L tests at once, that is, for every U there exists c such that U_{n+c} \subseteq V_n for every n \geq 0.

The idea of M-L randomness can be adapted to deal with infinite words. However, a plain adaptation is impossible: defining an infinite word as random when all of its finite prefixes are would make the random infinite words a null set. In fact, if N_0(x;n) is the number of consecutive zeroes in x starting from position n, then \limsup_n N_0(x;n) / \log_2 n = 1 with probability 1, so that K(x_{[0:n]}) \leq n - O(\log n) for infinitely many n: however, it follows from the Invariance Theorem and the previously observed relation between K(x) and H(x) that K(w) = |w| + O(1) when w is Chaitin random.
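The statement about runs of zeroes is easy to probe empirically; the following quick simulation (an illustration, of course, not a proof) tracks the longest run of zeroes seen in ever longer prefixes of a pseudo-random binary sequence and compares it with \log_2 n:

    import random
    from math import log2

    random.seed(0)  # any seed will do; fixing it makes the run reproducible
    run = best = 0
    checkpoints = {2 ** k for k in range(10, 17)}
    for i in range(1, 2 ** 16 + 1):
        run = run + 1 if random.randint(0, 1) == 0 else 0  # extend or reset
        best = max(best, run)
        if i in checkpoints:
            print(f"n = {i:6d}  longest zero run = {best:2d}  log2 n = {log2(i):.1f}")

The longest run staying close to \log_2 n is exactly the regularity that the argument above exploits.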

What is done instead is to add the further requirement that the tests be sequential, that is, closed under extensions: for every x \in A^\ast and n \geq 0, if x \in U_n and y \in x A^\ast, then y \in U_n too. We then say that an infinite word x fails a sequential M-L test U if x \in \bigcap_{n \geq 0} U_n A^\mathbb{N}: which, because of sequentiality, means that for every n all sufficiently long prefixes of x fail U at level n. Notably, the class of sequential M-L tests also contains a universal element: there exists a sequential M-L test V such that, for every sequential M-L test U, there exists a constant c = c(U) such that U_{n+c} \subseteq V_n for every n \geq 0. This means that an infinite word passes every sequential M-L test U if and only if it passes the universal sequential M-L test V: we call these infinite words random.

Such an approach to randomness has another advantage: it can be reformulated in a way that allows extension to more general contexts. To see how, let w_n be the n-th word over A = \{a_0, \ldots, a_{Q-1}\} in length-lexicographic order, so that w_0 = \varepsilon is the empty word, w_1 = a_0, w_Q = a_{Q-1}, w_{Q+2} = a_0 a_1, and so on: then \widetilde{B}_n = w_n A^\mathbb{N} is a computable enumeration of a subbase (in fact, a base) of the product topology of A^\mathbb{N}. Consider then the enumeration E of finite sets of natural numbers, defined by i \in E(n) if and only if the i-th bit of n is 1: the derived sequence defined by \widetilde{B}'_n = \bigcap_{i \in E(n)} \widetilde{B}_i is a computable enumeration of a base of the product topology. Moreover, given two sequences \mathcal{U} = \{U_i\}_{i \geq 0}, \mathcal{V} = \{V_j\}_{j \geq 0} of open sets (recall that r.e. sets are open in the product topology) we say that \mathcal{U} is \mathcal{V}-computable if there is a recursively enumerable set M such that

U_i = \bigcup_{\pi(i,j) \in M} V_j \,,

where \pi(i,j) = (i+j) (i+j+1) / 2 + j is the standard primitive recursive bijection from \mathbb{N} \times \mathbb{N} to \mathbb{N}. Finally, let us recall that there is a product measure \mu_\Pi on the Borel \sigma-algebra of A^\mathbb{N}, defined via the Hahn-Kolmogorov theorem as the unique probability measure satisfying \mu_\Pi(w A^\mathbb{N}) = Q^{-|w|} for every w \in A^\ast.
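All these ingredients are elementary to implement; the sketch below (my own rendering, with function names of my choosing) realizes the length-lexicographic indexing w_n, the set enumeration E, and the pairing function \pi:

    def w(n, A="01"):
        """The n-th word over A in length-lexicographic order, i.e.
        bijective base-Q numeration; w(0) is the empty word."""
        Q, digits = len(A), []
        while n > 0:
            n -= 1
            digits.append(A[n % Q])
            n //= Q
        return "".join(reversed(digits))

    def E(n):
        """The finite set coded by n: i is in E(n) iff bit i of n is 1."""
        return {i for i in range(n.bit_length()) if (n >> i) & 1}

    def pi(i, j):
        """The pairing bijection pi(i, j) = (i+j)(i+j+1)/2 + j."""
        return (i + j) * (i + j + 1) // 2 + j

    print([w(n) for n in range(7)])  # ['', '0', '1', '00', '01', '10', '11']
    print(E(13))                     # {0, 2, 3}, since 13 = 1101 in binary
    # pi is injective on a sample, as a bijection to N must be:
    assert len({pi(i, j) for i in range(50) for j in range(50)}) == 2500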

Theorem. (Hertling and Weihrauch) Let x \in A^\mathbb{N}. The following are equivalent.

  1. x is M-L random.
  2. For every \widetilde{B}'-computable sequence \mathcal{U} = \{U_n\}_{n \geq 0} such that \mu_\Pi(U_n) < 2^{-n} for every n \geq 0, we have x \not \in \bigcap_{n \geq 0} U_n.

Observe that we require \mu_\Pi(U_n) < 2^{-n} even if A is a Q-ary alphabet: after all, what we expect from x is to pass the test at some level 1-\alpha, and we know that if it does, then it also passes at level 1-\beta for every \beta < \alpha.

Then, given a topological space X together with a computable enumeration B of a subbase of its topology and a computable measure \mu defined on the Borel \sigma-algebra, we can reason as before and define the random elements of the randomness space (X, B, \mu) as those that satisfy the following condition: for every (attention here!) B'-computable sequence \mathcal{U} = \{U_n\}_{n \geq 0} of open sets such that \mu(U_n) < 2^{-n} for every n \geq 0, we have x \not \in \bigcap_{n \geq 0} U_n. Recall that B' is the sequence derived from B, as \widetilde{B}' is derived from \widetilde{B} in the case X = A^\mathbb{N}. Observe that, as there are only countably many such \mathcal{U} and \mu is a measure, the set R_X of the random elements of X satisfies \mu(R_X) = 1. Then we can, for instance, speak about random configurations c \in A^G where G is a group satisfying some additional conditions: but this is going to be the topic of another talk.
