# Random thoughts on a gray November noon

This is a revised summary of my talk from the November 29, 2012 Theory Lunch. I confess to having made an obnoxious number of errors during that presentation, for which I apologize to all the listeners. This version is hopefully correct.

There are at least two ways of defining randomness. One is about randomness of distributions of objects: this is the subject of the theory of probability. There is another one, which is about randomness of single objects, and is the subject of algorithmic information theory. Consider the following infinite sequences:

00000000000000000000000000000000 …
01010101010101010101010101010101 …
01000110110000010100111001011101 …
10111010110101100001101111000001 …

The first sequence does not look random at all: it is just a sequence of zeroes. The second one is only slightly more complicated: just a sequence of pairs 01. The third one seems much more complicated, until one notices that it is the sequence of finite nonempty binary strings in length-lexicographic order: 0, 1, 00, 01, 10, 11, 000, 001, and so on. The last one displays no clear pattern.
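The pattern in the third sequence is easy to verify mechanically. The following Python sketch (an illustration of mine, not part of the talk) concatenates the finite nonempty binary strings in length-lexicographic order and reproduces the 32 digits shown above.

```python
from itertools import count, islice, product

def length_lex_bits():
    """Yield, one bit at a time, the concatenation of all finite
    nonempty binary strings in length-lexicographic order:
    0, 1, 00, 01, 10, 11, 000, 001, ..."""
    for n in count(1):
        for word in product("01", repeat=n):
            yield from word

# The first 32 bits match the third sequence above.
print("".join(islice(length_lex_bits(), 32)))
# → 01000110110000010100111001011101
```

A short program like this is exactly the kind of "short description" that disqualifies the sequence from being random.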

We are thus tempted to declare the first three sequences nonrandom, and the last one random. The underlying logic is that the former have short descriptions, while the last does not seem to.

The formalization of this approach leads to the definition of the Kolmogorov complexity of a finite string. Kolmogorov complexity is defined relative to a given computer, modeled as a partial function $C$ which, given a program $p$ and an input $y$, both finite strings over a $Q$-ary alphabet $A$, possibly returns an output $x = C(p,y)$. The Kolmogorov complexity of a string $x$ relative to a string $y$ with respect to a computer $C$ is then defined as the quantity

$K_C(x \mid y) = \min \{ n \geq 0 \mid \exists p \in A^n \,: \, C(p,y) = x \} \;,$

with the usual convention $\min \emptyset = \infty$; we write $K_C(x)$ for $K_C(x \mid \varepsilon)$, where $\varepsilon$ is the empty string. A computer $C$ is said to be prefix-free, or a Chaitin computer, if for every input $y$ the domain of the function $p \mapsto C(p,y)$ is a prefix-free set: in this case, the prefix-free Kolmogorov complexity (also called Chaitin complexity) of $x$ with respect to $y$ and $C$ is indicated by $H_C(x \mid y)$. The following result, proved independently by Andrei Kolmogorov and Ray Solomonoff for ordinary computers and by Chaitin for prefix-free computers, is what makes the entire theory sound.

Invariance Theorem. There exists a (prefix-free) computer $U$ with the following property: for every (prefix-free) computer $C$ there exists a constant $c$ such that, for every program $p$ and input $y$ such that $C(p,y)$ is defined, there exists a program $q$ such that $U(q,y) = C(p,y)$ and $|q| \leq |p| + c$.

A computer $U$ with the property stated in the Invariance Theorem is said to be universal. The Invariance Theorem then states the existence of universal computers, and that the (prefix-free) Kolmogorov complexity according to a universal computer is well defined up to an additive constant. We can thus speak about the (prefix-free) Kolmogorov complexity of a string $x$ with respect to an arbitrary, but fixed, universal (prefix-free) computer $U$: which we then omit from the subscript. As an immediate consequence of the Invariance Theorem, there exist constants $c,c' \geq 0$ such that $K(x) \leq |x| + c$ and $H(x) \leq |x| + 2 \log_Q |x| + c'$ for every finite word $x$.
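Although $K$ itself is uncomputable, the upper bound $K(x) \leq |x| + c$ can be made tangible with a computable stand-in: the length of a compressed encoding is an upper bound on description length with respect to the decompressor. The following Python sketch (my illustration, using `zlib` purely as a proxy; nothing here is from the talk) compares the three kinds of sequences from the introduction.

```python
import random
import zlib

# Compressed length is a crude, computable upper-bound proxy for
# Kolmogorov complexity; K itself is uncomputable.
rng = random.Random(0)  # fixed seed, so the "patternless" string is reproducible
samples = {
    "zeroes": "0" * 1024,                                      # first sequence
    "pairs":  "01" * 512,                                      # second sequence
    "noise":  "".join(rng.choice("01") for _ in range(1024)),  # no clear pattern
}
sizes = {name: len(zlib.compress(s.encode())) for name, s in samples.items()}
for name, size in sizes.items():
    print(name, size)
```

The two structured strings compress to a few bytes, while the pseudorandom one stays near its information content. The converse does not hold: a string that `zlib` fails to compress need not be random in the Kolmogorov sense.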

We can now define random strings according to the notion of prefix-free Kolmogorov complexity. We say that $x$ is random if $H(x)$ is maximal among the strings of the same length as $x$. Moreover, if $\Sigma(n) = \max_{x \in A^n} H(x)$ and $H(x) \geq \Sigma(|x|) - m$, we say that $x$ is $m$-random. It is then possible to prove that, if $x$ is $m$-random with $m \leq T - O(\log_Q T)$, then $K(x) \geq |x| - T$.

Another way to look at randomness is to consider statistical properties such as the law of large numbers or the law of the iterated logarithm: then we expect that a random string satisfies every “conceivable” statistical property. This, in turn, raises the question of what the word “conceivable” should mean.

The following idea is due to Per Martin-Löf, and dates back to 1966. Consider a set $U \subseteq \mathbb{N} \times A^\ast$ with the following two properties:

1. The level sets $U_n = \{ w \in A^\ast \mid (n,w) \in U \}$ satisfy $U_n \supseteq U_{n+1}$ for every $n \in \mathbb{N}$.
2. For every $m,n \in \mathbb{N}$ we have $|A^m \cap U_n| < Q^{m-n} / (Q-1)$.

We call such a set $U$ a Martin-Löf test (briefly, M-L test). Recall that $A$ is a $Q$-ary alphabet: the analogy with statistical tests is that the level set $U_n$ represents the confidence interval at level $1 - \alpha$ with $\alpha = Q^{-n}/(Q-1)$. It is then natural to say that a word $w$ passes a M-L test $U$ at level $n$ if $w \not \in U_n$. We thus only have to decide what “conceivable” means, and we do this by requiring $U$ to be recursively enumerable. Then the concepts of Chaitin randomness and Martin-Löf randomness are almost equal in the following sense: almost every Chaitin random finite word is ultimately declared random by every M-L test.
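To make the two conditions concrete, here is a brute-force check in Python of a toy test of my own invention (not one from the talk): take $Q = 2$ and let $U_n$ consist of the words beginning with at least $n+1$ zeroes, so that failing at level $n$ means exhibiting an implausibly long initial run of zeroes.

```python
from itertools import product

Q = 2  # binary alphabet A = {0, 1}

def in_level(n, w):
    """Toy M-L test: w is in the level set U_n iff it begins with at
    least n + 1 zeroes.  Then |A^m ∩ U_n| = Q^(m-n-1) for m > n,
    which lies strictly below the bound Q^(m-n)/(Q-1)."""
    return len(w) > n and w[:n + 1] == "0" * (n + 1)

for n in range(6):
    for m in range(1, 9):
        words = ["".join(t) for t in product("01", repeat=m)]
        level = [w for w in words if in_level(n, w)]
        # Condition 1: the level sets are nested, U_n ⊇ U_{n+1}.
        assert all(in_level(n, w) for w in words if in_level(n + 1, w))
        # Condition 2: the measure bound |A^m ∩ U_n| < Q^(m-n)/(Q-1).
        assert len(level) < Q ** (m - n) / (Q - 1)

print("both conditions hold on the sampled range")
```

The word $000$ fails this test at level $2$ while $001$ passes it, matching the intuition that only ever-longer runs of zeroes are rejected at ever-higher confidence levels.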

Notably, Martin-Löf tests also have a universality property: there exists a M-L test $V$ which improves all M-L tests at once, that is, for every $U$ there exists $c$ such that $U_{n+c} \subseteq V_n$ for every $n \geq 0$.

The idea of M-L randomness can be adapted to deal with infinite words. However, a plain adaptation is impossible: defining an infinite word as random whenever all of its finite prefixes are random would make the random infinite words a null set. In fact, if $N_0(x;n)$ is the number of consecutive zeroes in $x$ from position $n$, then $\limsup_n N_0(x;n) / \log_2 n = 1$ with probability $1$, so that $K(x_{[0:n]}) \leq n - O(\log n)$ for infinitely many $n$: however, it follows from the Invariance Theorem and the previously observed relation between $K(x)$ and $H(x)$ that $K(w) = |w| + O(1)$ when $w$ is Chaitin random.

What is done instead, is to add the further requirement that the tests are sequential, that is, closed under extensions: for every $x \in A^\ast$ and $n \geq 0$, if $x \in U_n$ and $y \in x A^\ast$, then $y \in U_n$ too. We then say that an infinite word $x$ fails a sequential M-L test $U$ if $x \in \bigcap_{n \geq 0} U_n A^\mathbb{N}$: which, because of sequentiality, means that none of the finite prefixes of $x$ pass $U$. Notably, the class of sequential M-L tests also contains a universal element: there exists a sequential M-L test $V$ such that, for every sequential M-L test $U$, there exists a constant $c = c(U)$ such that $U_{n+c} \subseteq V_n$ for every $n \geq 0$. This means that an infinite word passes every sequential M-L test $U$ if and only if it passes the universal sequential M-L test $V$: we call these infinite words random.

Such an approach to randomness has another advantage: it can be reformulated in a way that allows extension to more general contexts. To see how, let $w_n$ be the $n$-th word over $A = \{a_0, \ldots, a_{Q-1}\}$ in length-lexicographic order, so that $w_0 = \varepsilon$ is the empty word, $w_1 = a_0$, $w_Q = a_{Q-1}$, $w_{Q+2} = a_0 a_1$, and so on: then $\widetilde{B}_n = w_n A^\mathbb{N}$ is a computable enumeration of a subbase (in fact, a base) of the product topology of $A^\mathbb{N}$. Consider then the enumeration $E$ of finite sets of natural numbers, defined by $i \in E(n)$ if and only if the $i$-th bit of $n$ is 1: the derived sequence defined by $\widetilde{B}'_n = \bigcap_{i \in E(n)} \widetilde{B}_i$ is a computable enumeration of a base of the product topology. Moreover, given two sequences $\mathcal{U} = \{U_i\}_{i \geq 0}$, $\mathcal{V} = \{V_j\}_{j \geq 0}$ of open sets (recall that r.e. sets are open in the product topology) we say that $\mathcal{U}$ is $\mathcal{V}$-computable if there is a recursively enumerable set $M$ such that

$U_i = \bigcup_{\pi(i,j) \in M} V_j \,,$

where $\pi(i,j) = (i+j) (i+j+1) / 2 + j$ is the standard primitive recursive bijection from $\mathbb{N} \times \mathbb{N}$ to $\mathbb{N}$. Finally, let us recall that there is a product measure $\mu_\Pi$ on the Borel $\sigma$-algebra of $A^\mathbb{N}$, defined via the Hahn-Kolmogorov theorem as the unique probability measure satisfying $\mu_\Pi(w A^\mathbb{N}) = Q^{-|w|}$ for every $w \in A^\ast$.
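Both coding devices are elementary to implement. The following Python sketch (again only an illustration of mine) realizes $E$ and $\pi$, and checks on a finite triangle that $\pi$ is injective and maps onto an initial segment of $\mathbb{N}$.

```python
def E(n):
    """Finite set coded by n: i ∈ E(n) iff bit i of n is 1."""
    return {i for i in range(n.bit_length()) if (n >> i) & 1}

def pi(i, j):
    """Cantor pairing pi(i, j) = (i+j)(i+j+1)/2 + j, the standard
    primitive recursive bijection from N × N to N."""
    return (i + j) * (i + j + 1) // 2 + j

# 13 = 1101 in binary, so E(13) = {0, 2, 3}.
print(sorted(E(13)))
# → [0, 2, 3]

# On the triangle i + j <= s, the values pi(i, j) are exactly
# 0, 1, ..., s(s+1)/2 + s, so pi is a bijection onto that segment.
s = 20
values = sorted(pi(i, j) for i in range(s + 1) for j in range(s + 1 - i))
assert values == list(range(s * (s + 1) // 2 + s + 1))
```

$\pi$ enumerates $\mathbb{N} \times \mathbb{N}$ along the antidiagonals $i + j = s$, which is why each triangle fills an initial segment with no gaps or repetitions.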

Theorem. (Hertling and Weihrauch) Let $x \in A^\mathbb{N}$. The following are equivalent.

1. $x$ is M-L random.
2. For every $\widetilde{B}'$-computable sequence $\mathcal{U} = \{U_n\}_{n \geq 0}$ such that $\mu_\Pi(U_n) < 2^{-n}$ for every $n \geq 0$, we have $x \not \in \bigcap_{n \geq 0} U_n$.

Observe that we require $\mu_\Pi(U_n) < 2^{-n}$ even if $A$ is a $Q$-ary alphabet: after all, what we expect from $x$ is to pass the test at some level $1-\alpha$, and we know that if it does, then it also does so at level $1-\beta$ for every $\beta < \alpha$.

Then, given a topological space $X$ together with a computable enumeration $B$ of a subbase of its topology and a computable measure $\mu$ defined on the Borel $\sigma$-algebra, we can reason as before and define the random elements of the randomness space $(X, B, \mu)$ as those that satisfy the following condition: for every (attention here!) $B'$-computable sequence $\mathcal{U} = \{U_n\}_{n \geq 0}$ of open sets such that $\mu(U_n) < 2^{-n}$ for every $n \geq 0$, we have $x \not \in \bigcap_{n \geq 0} U_n$. Recall that $B'$ is the sequence derived from $B$, as $\widetilde{B}'$ is derived from $\widetilde{B}$ in the case $X = A^\mathbb{N}$. Observe that, as there are only countably many such $\mathcal{U}$'s and $\mu$ is a measure, the set $R_X$ of the random elements of $X$ satisfies $\mu(R_X) = 1$. Then we can, for instance, speak about random configurations $c \in A^G$, where $G$ is a group satisfying some additional conditions: but this is going to be the topic of another talk.