First of all, let us recall what the Church-Turing thesis is, and what it is not. Its statement, as reported by the Stanford Encyclopedia of Philosophy, goes as follows:

A function of positive integers is effectively calculable only if recursive.

Here, for a calculation procedure to be “effective” means the following:

- it has a finite description;
- it always returns the correct output, given any valid input;
- it can be “carried on by pencil and paper” by a human being; and
- it requires no insight or ingenuity on the human’s part.

One model of effective procedures is given by the recursive functions; another one, by the functions computable by Turing machines; a third one, by the functions which are representable in Church’s $\lambda$-calculus. Alan Turing and Stephen Cole Kleene proved that the three classes coincide: thus, in ordinary practice, the Church-Turing thesis is often stated with “Turing-computable” in place of “recursive”.

The class of Turing machines has the advantage of containing a *universal element*. There exist a special Turing machine $U$ and an encoding from the set of Turing machines to the set of natural numbers with the following property: when $U$ is given the encoding of an arbitrary Turing machine $M$ together with a valid input for $M$, it returns the output of $M$ on that input.
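The idea of "one fixed program that runs any machine handed to it as data" can be sketched in Python. This is only an illustration under stated assumptions. The interpreter below is not itself a Turing machine. The dictionary format (keys `start`, `halt`, `blank`, `delta`) is a hypothetical encoding standing in for the encoding as natural numbers. The names `run`, `UNARY_SUCC` and `FLIP` are made up for the example.

```python
from collections import defaultdict

def run(machine, tape_input, max_steps=10_000):
    """Interpret a Turing machine given as plain data on the string tape_input.

    Hypothetical encoding: machine is a dict with keys "start", "halt", "blank"
    and "delta"; delta maps (state, symbol) to (new_state, symbol_to_write, move),
    where move is -1 (left) or +1 (right).
    """
    blank = machine["blank"]
    tape = defaultdict(lambda: blank, enumerate(tape_input))
    state, head, steps = machine["start"], 0, 0
    while state != machine["halt"]:
        if steps >= max_steps:
            raise RuntimeError("no halt within the step budget")
        # A missing (state, symbol) entry raises KeyError: the machine is stuck.
        state, write, move = machine["delta"][(state, tape[head])]
        tape[head] = write
        head += move
        steps += 1
    if not tape:
        return ""
    cells = range(min(tape), max(tape) + 1)
    return "".join(tape[i] for i in cells).strip(blank)

# Unary successor: walk right over the 1s, then write one more 1 and halt.
UNARY_SUCC = {
    "start": "scan", "halt": "done", "blank": "_",
    "delta": {
        ("scan", "1"): ("scan", "1", +1),
        ("scan", "_"): ("done", "1", +1),
    },
}

# Bitwise complement: flip every bit and halt at the first blank.
FLIP = {
    "start": "flip", "halt": "done", "blank": "_",
    "delta": {
        ("flip", "0"): ("flip", "1", +1),
        ("flip", "1"): ("flip", "0", +1),
        ("flip", "_"): ("done", "_", +1),
    },
}

print(run(UNARY_SUCC, "111"))  # 1111
print(run(FLIP, "0110"))       # 1001
```

The same `run` executes both machines. Only the data changes, which is the point of a universal element.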

Now that we have written down what the Church-Turing thesis is, we can examine Akl’s theorem.

In his 2005 paper, Akl defines a universal computer as a system having the following features:

- It has means of communicating with the outside world, so as to receive input and send its output.
- It can perform all elementary arithmetic and logic operations.
- It can be programmed, according to the two previous rules.
- It has unlimited memory to use for input, output, and temporary values.
- It can only execute finitely many operations (evaluating input, producing output, performing an elementary operation, etc.) at each time step.
- It can simulate any computation performed by any other model of computation.

The statement of the theorem does not appear explicitly in the original paper. It is written down in the 2015 one, which clarifies the idea and addresses criticism, and is reported here *verbatim*:

**Nonuniversality in Computation Theorem (NCT):** No computer is universal if it is capable of exactly $T(i)$ operations during time unit $i$ of computation, where $i$ is a positive integer, and $T(i)$ is finite and fixed once and for all.

The main argument is that no such computer can perform a computation which requires more than $T(i)$ operations at some time $i$. Explicit examples arise in parallel computation, a field Akl is a master of, where the number of operations that can be performed in a time unit grows linearly with the number of processors. For instance, reading $n$ values in input can be done in time $1$ by a parallel machine with $n$ processors, but not by any machine with fewer than $n$ processors.
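The counting behind this argument can be made concrete with a toy simulation. It assumes a hypothetical model in which the $n$ input values can only be read during the first `available_steps` time units, after which they change or disappear. At most `ops_per_step` reads happen per time unit. The function name `time_to_read` is made up for illustration.

```python
def time_to_read(n_inputs, ops_per_step, available_steps=1):
    """Simulate reading n_inputs values with at most ops_per_step reads per time unit.

    Hypothetical model: inputs are readable only during the first available_steps
    time units. Returns the number of time units used, or None if some input is lost.
    """
    read, t = 0, 0
    while read < n_inputs:
        t += 1
        if t > available_steps:
            return None  # the remaining inputs are gone
        read += min(ops_per_step, n_inputs - read)
    return t

print(time_to_read(8, 8))  # 1: eight "processors" read everything at once
print(time_to_read(8, 7))  # None: one value is lost
```

Whatever fixed capacity $T$ is chosen, the instance with $T + 1$ inputs defeats it. The output-only notion of simulation in the Church-Turing thesis does not impose this timing requirement.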

Such a requirement, however, does not appear in the notion of universality at the basis of the original, and actual, Church-Turing thesis. There, to “simulate” a machine or algorithm means to be able to always reproduce the *same output* of the algorithm, given *any valid input* for it, *up to an encoding* of the input and the output. No hypothesis is made on *how* the output is obtained from the input. A simulation in *linear time*, where each step of the simulated algorithm is reproduced by exactly $k$ operations of the Turing machine, is as good as one where the simulation of the $n$th step takes $2^n$ operations of the Turing machine, or one where no such regularity appears.

Among the (counter)examples provided by Akl are:

- Computations with time-varying variables.
- Computations with time-varying computational complexity.
- Computations whose complexity depends on their placement on a schedule.
- Computations with interacting variables, e.g., states of entangled electrons.
- Computations with uncertain time constraints.

None of these, however, respects the definition of computation from the model of recursive functions, where the values of the variables are given once and for all: they may change across recursive calls, but not for the original call. They can be seen as instances of *unconventional* models of computation. By doing this, however, one changes the very notion of computation, which ceases to be the one at the basis of the Church-Turing thesis.

So my guess is that Akl’s statement about the falsity of the Church-Turing thesis actually falls in the following category, as reported in the humorous list by Dana Angluin:

*Proof by semantic shift:* Some standard but inconvenient definitions are changed for the statement of the result.

Actually, if we go back to Akl’s definition of a universal computer, it appears to be fine up to the very last point. The first two points agree with the definition of effective computation at the basis of the actual Church-Turing thesis, and the next three are features of any universal Turing machine. The problem comes from the last point, which has at least two weak spots. The first is that it does not define precisely what a model of computation is. This can be accepted, as Akl is talking about unconventional computation, and it is wiser to be open to other possibilities. But there is a more serious one, in that it is not clear

*what does the expression “to simulate” mean*.

Note that the Stanford Encyclopedia of Philosophy reports the following variant of the Church-Turing thesis, attributed to David Deutsch:

Every finitely realizable physical system can be perfectly simulated by a universal model computing machine operating by finite means.

Deutsch’s thesis, however, does not coincide with the Church-Turing thesis! (This holds notwithstanding Deutsch’s statement that “[t]his formulation is both better defined and more physical than Turing’s own way of expressing it”.) Moreover, there is another serious ambiguity, of the same kind as the one in Akl’s definition:

*what is “perfectly simulated” supposed to mean?*

Does it mean that every single step performed by the system can be reproduced in real time? In this case, Akl is perfectly right in disproving it *under the constraint of boundedly many operations at each time unit*. Or does it mean that the simulation of each elementary step of the process (e.g., one performed in a quantum of time) ends with the correct result if the correct initial conditions are given? In this case, the requirement to reproduce exactly what happens between the reading of the input and the writing of the output is null and void.

Worse still, there is a vulgarized form of the Church-Turing thesis, which Akl himself reports on page 172 of his 2005 paper (!), and which goes as follows:

Any computable function can be computed on a Turing machine.

If one calls *that* “the Church-Turing thesis”, then Akl’s NCT is absolutely correct in disproving it. But *that* is not the actual Church-Turing thesis! It is actually a rewording of what the Stanford Encyclopedia of Philosophy calls “Thesis M”. The Encyclopedia explicitly states that Thesis M is not equivalent to the original Church-Turing thesis, and that it is false as well. Again, the careful reader will have noticed that, in the statement above, being “computable by a Turing machine” is a well-defined property, but being “computable” *tout court* definitely is not.

At the end of this discussion, my thesis is that Akl’s proof is correct, but NCT’s consequences and interpretation might not be what Akl means, or (inclusive disjunction) what his critics understand. As for my personal interpretation of NCT, here it goes:

*No computer which is able to perform a predefined, finite number of operations at each finite time step is universal across all the different models of computation, where the word “computation” may be taken in a meaning different from that of the Church-Turing thesis.*

Is mine an interpretation by semantic shift? Discussion is welcome.

References:

- Selim G. Akl. The Myth of Universal Computation. *Parallel Numerics ’05*, 167–192.
- Selim G. Akl. Nonuniversality explained. *International Journal of Parallel, Emergent and Distributed Systems* **31:3**, 201–219. doi:10.1080/17445760.2015.1079321
- The Church-Turing Thesis. *Stanford Encyclopedia of Philosophy*. First published January 8, 1997; substantive revision August 19, 2002. http://plato.stanford.edu/entries/church-turing/
- Dana Angluin’s List of Proof Techniques. http://www.cs.northwestern.edu/~riesbeck/proofs.html


We consider languages made of symbols that represent either objects, or functions, or relations: in particular, unary relations, or equivalently, sets. A *sentence* on such a language is a finite sequence of symbols built according to the usual rules. The symbols come from the language and from the standard logical connectives and quantifiers ($\wedge$ for conjunction, $\vee$ for disjunction, $\neg$ for negation, $\to$ for implication, $\forall$ and $\exists$ for quantification, etc.), and every variable must be bound by some quantifier. A *first-order* sentence only has quantifiers on objects, while a *second-order* sentence can have quantifiers on functions and relations (in particular, sets) as well.

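For example, the set $Q$ (Robinson arithmetic) is made of the following first-order sentences on the language $\{0, {}', +, \cdot, <\}$, where $x'$ denotes the successor of $x$:

- $\forall x\, \forall y\, (x' = y' \to x = y)$
- $\forall x\, 0 \neq x'$
- $\forall x\, (x \neq 0 \to \exists y\, x = y')$
- $\forall x\, x + 0 = x$
- $\forall x\, \forall y\, x + y' = (x + y)'$
- $\forall x\, x \cdot 0 = 0$
- $\forall x\, \forall y\, x \cdot y' = (x \cdot y) + x$
- $\forall x\, \forall y\, (x < y \leftrightarrow \exists z\, z' + x = y)$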

Of course, second-order logic is much more expressive than first-order logic. The natural question is: *how much*?

The answer is: possibly, *too much* more than we would like.

To discuss how it is so, we recall the notion of *model*. Informally, a model of a set of sentences is a “world” where all the sentences in the set are true. For instance, the set of natural numbers with the usual zero, successor, addition, multiplication, and ordering is a model of $Q$; we denote this structure by $\mathbb{N}$. A model for a set of sentences is also a model for every *theorem* of that set, *i.e.*, every sentence that can be derived in finitely many steps from those of the given set by applying the standard rules of logic.

For sets of first-order sentences, the following four results are standard:

**Compactness theorem.** (Tarski and Mal’tsev) Given a set $\Gamma$ of first-order sentences, if every finite subset of $\Gamma$ has a model, then $\Gamma$ has a model.

**Upwards Löwenheim-Skolem theorem.** If a set of first-order sentences on a finite or countable language has a model of infinite cardinality $\kappa$, then it also has models of every cardinality $\lambda \geq \kappa$.

**Downwards Löwenheim-Skolem theorem.** If a set of first-order sentences on a finite or countable language has a model, then it also has a finite or countable model.

**Completeness theorem.** (Gödel) Given a set $\Gamma$ of first-order sentences, if a first-order sentence $\varphi$ is true in every model of $\Gamma$, then $\varphi$ is a theorem of $\Gamma$.

All of these facts fail for second-order theories. Let us see how:

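We start by considering the following second-order sentence, where $z$ is an object variable, $f$ a function variable, and $X$ a set variable:

$$\mathrm{Count} \equiv \exists z\, \exists f\, \forall X\, \Bigl( \bigl( X(z) \wedge \forall x\, ( X(x) \to X(f(x)) ) \bigr) \to \forall x\, X(x) \Bigr)$$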

**Lemma 1.** The sentence $\mathrm{Count}$ is true in a model $\mathcal{M}$ if and only if the universe of $\mathcal{M}$ is at most countable.

The informal reason is that $\mathrm{Count}$ intuitively means:

the universe is generated by a single element $z$ under iteration of a single function $f$

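Let us now consider the following second-order sentence:

$$\mathrm{Inf} \equiv \exists z\, \exists f\, \bigl( \forall x\, \forall y\, ( f(x) = f(y) \to x = y ) \wedge \forall x\, f(x) \neq z \bigr)$$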

**Lemma 2.** The sentence $\mathrm{Inf}$ is true in a model $\mathcal{M}$ if and only if the universe of $\mathcal{M}$ is infinite.

The informal reason is that $\mathrm{Inf}$ intuitively means:

the universe contains a copy of the natural numbers

**Theorem 1.** Both Löwenheim-Skolem theorems fail for sets of second-order sentences.

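*Proof.* $\mathrm{Inf} \wedge \mathrm{Count}$ has models, and only countably infinite ones. So it has an infinite model but no uncountable one, and the upward Löwenheim-Skolem theorem fails. $\mathrm{Inf} \wedge \neg\mathrm{Count}$ has models, and only uncountably infinite ones, although its language is finite. So the downward Löwenheim-Skolem theorem fails.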

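Let us now consider the set $Q \cup \{I\}$ made of all the sentences of $Q$ together with the following second-order sentence:

$$I \equiv \forall X\, \Bigl( \bigl( X(0) \wedge \forall x\, ( X(x) \to X(x') ) \bigr) \to \forall x\, X(x) \Bigr)$$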

Clearly, $I$ is the induction principle. It is a single axiom in second-order Peano arithmetic, but only an *axiom scheme* in first-order PA.

**Lemma 3.** Every model of $Q \cup \{I\}$ is isomorphic to the set $\mathbb{N}$ of natural numbers with zero, successor, addition, multiplication, and ordering.

The informal reason is that $Q$, though finite, is powerful enough to tell numbers apart. Therefore, in every model of $Q \cup \{I\}$, distinct numerals $\underline{n}$ (the $n$th iteration of the successor, starting from $0$) denote distinct elements of the universe. On the other hand, $I$ is powerful enough to ensure that every element of the universe is denoted by some numeral.

**Theorem 2.** The compactness theorem fails for sets of second-order sentences.

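*Proof.* Let $c$ be a constant outside the language of $Q$. For every natural number $n$, let $\underline{n}$ be the numeral $0'\cdots'$ with $n$ successor symbols. Consider the set $\Gamma$ made of all the sentences from $Q \cup \{I\}$ and all the sentences of the form $c \neq \underline{n}$. Then every finite subset $\Gamma_0$ of $\Gamma$ has a model. Such a model can be obtained from $\mathbb{N}$ by interpreting $c$ as some number strictly greater than every $n$ such that $c \neq \underline{n}$ belongs to $\Gamma_0$. However, a model of $\Gamma$ is also a model of $Q \cup \{I\}$, and must be isomorphic to $\mathbb{N}$. But no interpretation of the constant $c$ is possible within such a model, as every element of $\mathbb{N}$ is the value of some numeral $\underline{n}$.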

We can now prove

**Theorem 3.** The completeness theorem does not hold for second-order sentences.

In other words, *second-order logic is semantically inadequate*: it is no longer true that all “unequivocally true” sentences are theorems. The proof will be based on the following two facts:

**Fact 1.** (Gödel) The set of first-order sentences which are true in *every* structure, that is, the set of valid first-order sentences, is recursively enumerable.

**Fact 2.** (Tarski) The set of first-order sentences which are true in $\mathbb{N}$ *is not* recursively enumerable.

Fact 1 is actually a consequence of the completeness theorem. The set of valid first-order sentences is the same as the set of first-order sentences that are provable from no premises, and that set is recursively enumerable by producing every possible proof! To prove Theorem 3 it will thus be sufficient to prove that Fact 1 does not hold for second-order sentences, *i.e.*, that the set of valid second-order sentences is not recursively enumerable.

*Proof of Theorem 3.* We identify $Q$ with the conjunction of all its sentences, which are finitely many.

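Let $\varphi$ be a first-order sentence in the language of $Q$. Because of Lemma 3, which we used while discussing the compactness theorem, $\varphi$ is true in $\mathbb{N}$ if and only if it is true in every model of $Q \wedge I$. This, in turn, is the same as saying that $(Q \wedge I) \to \varphi$ is valid, that is, true in every structure for the language of $Q$. Indeed, let $\mathcal{M}$ be such a structure. If $\mathcal{M}$ is isomorphic to $\mathbb{N}$, then $(Q \wedge I) \to \varphi$ is true in $\mathcal{M}$ if and only if $\varphi$ is true in $\mathbb{N}$. If $\mathcal{M}$ is not isomorphic to $\mathbb{N}$, then $Q \wedge I$ is false in $\mathcal{M}$ by Lemma 3, which makes $(Q \wedge I) \to \varphi$ true in $\mathcal{M}$. This holds whatever $\varphi$ is.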

Fix a Gödel numbering for sentences. There exists a recursive function that, for every first-order sentence $\varphi$, transforms the Gödel number of $\varphi$ into the Gödel number of the second-order sentence $(Q \wedge I) \to \varphi$.
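To see why such a transformation is recursive, it may help to look at a toy Gödel numbering. The one below is a hypothetical choice, not the numbering of Boolos et al.: it reads the ASCII bytes of a sentence as a base-256 numeral. The strings `Q` and `I` are placeholders standing for the conjunction of the axioms of $Q$ and for the induction sentence.

```python
def godel_number(sentence: str) -> int:
    """Hypothetical Gödel numbering: the ASCII bytes of the sentence as a base-256 numeral."""
    return int.from_bytes(sentence.encode("ascii"), "big")

def decode(number: int) -> str:
    """Inverse of godel_number (valid for sentences not starting with a NUL byte)."""
    return number.to_bytes((number.bit_length() + 7) // 8, "big").decode("ascii")

# Placeholder text for the conjunction of the axioms of Q and the induction sentence I.
Q_AND_I = "(Q & I)"

def implication_code(number: int) -> int:
    """Map the code of a first-order sentence phi to the code of (Q & I) -> (phi)."""
    return godel_number(f"({Q_AND_I} -> ({decode(number)}))")

code = implication_code(godel_number("0 = 0"))
print(decode(code))  # ((Q & I) -> (0 = 0))
```

The map only decodes, concatenates, and re-encodes, so it always halts: that is all the proof needs from it.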

Suppose now, for the sake of contradiction, that the set of valid second-order sentences is recursively enumerable. Then we could get a recursive enumeration of the set of first-order sentences which are true in $\mathbb{N}$. Take the Gödel number of such a sentence $\varphi$ and turn it into that of $(Q \wedge I) \to \varphi$ via the aforementioned recursive function. Then feed the latter number to the semialgorithm for valid second-order sentences. But because of Tarski’s result, no such recursive enumeration exists.

Bibliography:

- George S. Boolos et al. Computability and Logic. Fifth Edition. Cambridge University Press, 2007


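Given a base $b \geq 2$, consider the base-$b$ writing of the nonnegative integer $n$:

$$n = a_k b^k + a_{k-1} b^{k-1} + \cdots + a_1 b + a_0,$$

where each $a_i$ is an integer between $0$ and $b - 1$. The *Cantor base-$b$* writing of $n$ is obtained by iteratively applying the base-$b$ writing to the exponents as well, until the only values appearing are integers between $0$ and $b$. For example, for $b = 2$ and $n = 35$, we have

$$35 = 2^5 + 2 + 1$$

and also

$$35 = 2^{2^2 + 1} + 2 + 1.$$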

Given a nonnegative integer $n$, consider the *Goodstein sequence* $(G_k(n))_{k \geq 1}$ defined by putting $G_1(n) = n$, and by constructing $G_{k+1}(n)$ from $G_k(n)$ as follows:

- Take the Cantor base-$(k+1)$ representation of $G_k(n)$.
- Convert each $k + 1$ into $k + 2$, getting a new number.
- If the value obtained at the previous point is positive, then subtract $1$ from it.

(This is called the *woodworm’s trick*.)
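The three steps above can be sketched in Python. In this sketch, `digits` extracts the nonzero base-$b$ digits, `bump` rewrites a number in Cantor base $b$ and replaces every $b$ with $b + 1$ (recursing into the exponents), and `goodstein` iterates bump-then-subtract. All three names are ours, chosen for illustration.

```python
def digits(m, b):
    """Nonzero base-b digits of m as (exponent, digit) pairs, highest exponent first."""
    out, e = [], 0
    while m > 0:
        m, r = divmod(m, b)
        if r:
            out.append((e, r))
        e += 1
    return out[::-1]

def bump(m, b):
    """Write m in Cantor base b and replace every b with b + 1 (exponents included)."""
    return sum(d * (b + 1) ** bump(e, b) for e, d in digits(m, b))

def goodstein(n, max_terms):
    """First terms G_1(n), G_2(n), ... of the Goodstein sequence, stopping early at 0."""
    terms, base = [n], 2
    while len(terms) < max_terms and terms[-1] > 0:
        terms.append(bump(terms[-1], base) - 1)
        base += 1
    return terms

print(goodstein(3, 10))  # [3, 3, 3, 2, 1, 0]
print(goodstein(4, 5))   # [4, 26, 41, 60, 83]
```

The `max_terms` cap matters. Already for $n = 4$, the sequence needs far too many steps to reach $0$ for any loop like this to get there.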

**Goodstein’s theorem.** Whatever the initial value $n$, the Goodstein sequence $(G_k(n))_{k \geq 1}$ ultimately reaches the value $0$ in finitely many steps.

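Goodstein’s proof relies on the use of ordinal arithmetic. Recall the definition: an ordinal number is an equivalence class of well-ordered sets modulo *order isomorphisms*, *i.e.*, order-preserving bijections. Observe that such an order isomorphism between well-ordered sets, if it exists, is unique. Suppose $A$ and $B$ are well-ordered sets, and $f, g : A \to B$ are two distinct order isomorphisms. Let $a$ be the least element of $A$ such that $f(a) \neq g(a)$, say $f(a) < g(a)$. As $g$ is surjective, $f(a) = g(a')$ for some $a' \in A$, and $a' < a$ because $g$ preserves the order. But then $f(a') = g(a') = f(a)$ by minimality of $a$, which contradicts the injectivity of $f$.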

An *initial interval* in a well-ordered set $A$ is a subset of the form $\{x \in A \mid x < a\}$ for some $a \in A$.

**Fact 1.** Given any two well-ordered sets, either they are order-isomorphic, or one of them is order-isomorphic to an initial interval of the other.

In particular, every ordinal $\alpha$ is order-isomorphic to the interval $\{\beta \mid \beta < \alpha\}$ of the ordinals smaller than it.

All ordinal numbers can be obtained via von Neumann’s classification:

- The zero ordinal is $0 = \varnothing$, which is trivially well-ordered as it has no nonempty subsets.
- A successor ordinal is an ordinal of the form $\alpha + 1 = \alpha \cup \{\alpha\}$, with every object in $\alpha$ being smaller than $\alpha$ in $\alpha + 1$. For instance, $3 = 2 + 1$ can be seen as $\{0, 1, 2\}$.
- A limit ordinal is a nonzero ordinal which is not a successor. Such an ordinal must be the least upper bound of the collection of all the ordinals below it. For instance, the smallest transfinite ordinal $\omega$ is the limit of the collection of the finite ordinals.

Observe that, with this convention, each ordinal is an element of every ordinal strictly greater than itself.

**Fact 2.** Every set of ordinal numbers is well-ordered with respect to the relation: $\alpha < \beta$ if and only if $\alpha \in \beta$.

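Operations between ordinal numbers are defined as follows (up to order isomorphism):

- $\alpha + \beta$ is a copy of $\alpha$ followed by a copy of $\beta$, with every object in $\alpha$ being strictly smaller than any object in $\beta$. If $\alpha$ and $\beta$ are finite ordinals, then $\alpha + \beta$ has the intuitive meaning. On the other hand, $1 + \omega = \omega$, as a copy of $1$ followed by a copy of $\omega$ is order-isomorphic to $\omega$. But $\omega + 1$ is strictly larger than $\omega$, as the latter is an initial interval of the former.
- $\alpha \cdot \beta$ is a stack of $\beta$ copies of $\alpha$, with each object in each layer being strictly smaller than any object of any layer above. If $\alpha$ and $\beta$ are finite ordinals, then $\alpha \cdot \beta$ has the intuitive meaning. On the other hand, $2 \cdot \omega$ is a stack of $\omega$ copies of $2$, which is order-isomorphic to $\omega$. But $\omega \cdot 2$ is a stack of $2$ copies of $\omega$, which is order-isomorphic to $\omega + \omega$.
- $\alpha^\beta$ is defined by cases. It is $1$ if $\beta = 0$, and $\alpha^\gamma \cdot \alpha$ if $\beta$ is the successor of $\gamma$. If $\beta$ is a limit ordinal, it is the least upper bound of the ordinals of the form $\alpha^\gamma$ with $\gamma < \beta$. If $\alpha$ and $\beta$ are finite ordinals, then $\alpha^\beta$ has the intuitive meaning. On the other hand, $2^\omega$ is the least upper bound of all the ordinals of the form $2^n$ where $n$ is a finite ordinal, which is precisely $\omega$. But $\omega^2 = \omega \cdot \omega$ is strictly larger than $\omega$.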
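The non-commutativity of ordinal addition can be checked mechanically for ordinals below $\varepsilon_0$. The representation is an assumption not found in the text: we write each ordinal in Cantor normal form $\omega^{\beta_1} c_1 + \cdots + \omega^{\beta_r} c_r$ with $\beta_1 > \cdots > \beta_r$. In Python this becomes a tuple of `(exponent, coefficient)` pairs, where each exponent is again such a tuple and `()` is $0$. With this encoding, Python's lexicographic tuple order coincides with the order of ordinals. The helper names `nat`, `OMEGA` and `add` are ours.

```python
ZERO = ()

def nat(k):
    """The finite ordinal k in Cantor normal form."""
    return ((ZERO, k),) if k > 0 else ZERO

OMEGA = ((nat(1), 1),)  # omega = omega^1 * 1

def add(a, b):
    """Ordinal sum a + b for ordinals in Cantor normal form.

    Terms of a with exponent smaller than the leading exponent of b are absorbed;
    a term with the same exponent has its coefficient added to b's leading one.
    """
    if not b:
        return a
    lead_exp, lead_coeff = b[0]
    head = tuple(t for t in a if t[0] > lead_exp)
    same = [c for e, c in a if e == lead_exp]
    if same:
        return head + ((lead_exp, same[0] + lead_coeff),) + b[1:]
    return head + b

print(add(nat(1), OMEGA) == OMEGA)  # True: 1 + omega = omega
print(add(OMEGA, nat(1)) > OMEGA)   # True: omega + 1 > omega
```

Multiplication and exponentiation can be handled in the same representation, but they are left out to keep the sketch short.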

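*Proof of Goodstein’s theorem:* To each integer value $G_k(n)$ we associate an ordinal number $\Omega_k(n)$ by replacing each $k + 1$ (which, let’s not forget, is the base $G_k(n)$ is written in) with $\omega$. For example, if $n = 35$, then

$$G_1(35) = 2^{2^2 + 1} + 2 + 1, \qquad \Omega_1(35) = \omega^{\omega^\omega + 1} + \omega + 1,$$

and $G_2(35) = 3^{3^3 + 1} + 3$ (which, incidentally, equals $22876792454964$), so that

$$\Omega_2(35) = \omega^{\omega^\omega + 1} + \omega.$$

We notice that, in our example, $G_2(35) > G_1(35)$, but $\Omega_2(35) < \Omega_1(35)$. Why is it so? Is it just a coincidence, or is there a rule behind this?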

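At each step where $G_k(n) > 0$, write the Cantor base-$(k+1)$ representation of $G_k(n)$ as $N + (k+1)^e \cdot c$. Here $(k+1)^e \cdot c$ is its rightmost nonzero term, $1 \leq c \leq k$, and $N$ collects the higher terms. Replacing $k + 1$ with $k + 2$ does not change the associated ordinal. So let $\lambda$ and $\beta$ be the ordinals associated with $N$ and $e$, and let $N'$ and $e'$ be obtained from $N$ and $e$ by replacing $k + 1$ with $k + 2$. Then $\Omega_k(n) = \lambda + \omega^\beta \cdot c$ and

$$G_{k+1}(n) = N' + (k+2)^{e'} \cdot c - 1.$$

Three cases are possible:

- $e = 0$. Then $\Omega_k(n) = \lambda + c$, and subtracting $1$ only lowers the last digit: $\Omega_{k+1}(n) = \lambda + (c - 1) < \Omega_k(n)$.
- $e > 0$ and $c > 1$. Then $\omega^\beta$ is a transfinite ordinal, and $G_{k+1}(n) = N' + (k+2)^{e'} \cdot (c - 1) + \bigl((k+2)^{e'} - 1\bigr)$. The number $(k+2)^{e'} - 1$ has all its base-$(k+2)$ digits equal to $k + 1$ and only exponents smaller than $e'$, so its associated ordinal $\delta$ satisfies $\delta < \omega^\beta$. Hence $\Omega_{k+1}(n) = \lambda + \omega^\beta \cdot (c - 1) + \delta < \lambda + \omega^\beta \cdot c = \Omega_k(n)$.
- $e > 0$ and $c = 1$. Then $G_{k+1}(n) = N' + \bigl((k+2)^{e'} - 1\bigr)$ is a number whose digit of $(k+2)^{e'}$ in base $k + 2$ is zero. Correspondingly, the rightmost term $\omega^\beta$ in $\Omega_k(n)$ is replaced by the smaller ordinal $\delta$ in $\Omega_{k+1}(n) = \lambda + \delta$.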

It is then clear that the sequence $\Omega_k(n)$ is strictly decreasing as long as $G_k(n) > 0$, while $\Omega_{k+1}(n) = \Omega_k(n) = 0$ once $G_k(n) = 0$. But the collection of all ordinals not larger than $\Omega_1(n)$ is a well-ordered set, and *every nonincreasing sequence in a well-ordered set is ultimately constant*. Hence, there must be a value $k$ such that $\Omega_{k+1}(n) = \Omega_k(n)$. The only way this can happen is when $\Omega_k(n) = 0$; in turn, the only option for $\Omega_k(n)$ to be zero is that $G_k(n)$ is zero as well. This proves the theorem.
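The decreasing behaviour of the ordinals can be checked numerically on the first few terms. The sketch below makes the same assumption as before: ordinals are encoded in Cantor normal form as nested tuples of `(exponent, coefficient)` pairs, so that Python's tuple order matches the ordinal order. The function `ordinal` computes $\Omega_k(n)$ from $G_k(n)$ by replacing the base with $\omega$. All helper names are ours.

```python
def digits(m, b):
    """Nonzero base-b digits of m as (exponent, digit) pairs, highest exponent first."""
    out, e = [], 0
    while m > 0:
        m, r = divmod(m, b)
        if r:
            out.append((e, r))
        e += 1
    return out[::-1]

def bump(m, b):
    """Write m in Cantor base b and replace every b with b + 1."""
    return sum(d * (b + 1) ** bump(e, b) for e, d in digits(m, b))

def ordinal(m, b):
    """Replace every b in the Cantor base-b writing of m with omega.

    Result: Cantor normal form as a tuple of (exponent, coefficient) pairs with
    decreasing exponents, each exponent being such a tuple itself, and () = 0.
    Python's lexicographic tuple order then coincides with the ordinal order.
    """
    return tuple((ordinal(e, b), d) for e, d in digits(m, b))

def ordinals_along(n, steps):
    """Pairs (G_k(n), Omega_k(n)) for k = 1, ..., steps; G_k(n) is written in base k + 1."""
    pairs, g, b = [], n, 2
    while True:
        pairs.append((g, ordinal(g, b)))
        if g == 0 or len(pairs) == steps:
            return pairs
        g, b = bump(g, b) - 1, b + 1

omegas = [w for _, w in ordinals_along(35, 4)]
print(all(x > y for x, y in zip(omegas, omegas[1:])))  # True
print(ordinals_along(35, 2)[1][0])                     # 22876792454964
```

Keep `steps` small for large $n$. The integers $G_k(n)$ grow so fast that extracting their digits soon becomes impractical, even though the ordinals keep decreasing.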

So why is it that Goodstein’s theorem is not provable in first-order Peano arithmetic? The intuitive reason is that the exponentiations can be nested arbitrarily deep, which requires having available all the ordinals up to

$$\varepsilon_0 = \omega^{\omega^{\omega^{\cdot^{\cdot^{\cdot}}}}},$$

that is, the least upper bound of $\omega$, $\omega^\omega$, $\omega^{\omega^\omega}$, and so on. This, however, is impossible if induction only allows finitely many steps, as is the case for first-order Peano arithmetic. A full discussion of a counterexample, however, would greatly exceed the scope of this post.


Let us recall the basic notions. In a *game in normal form* we have:

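- A set $N = \{1, \ldots, n\}$ of *players*.
- A set $S_i$ of *strategies* for each player $i \in N$.
- A collection of *utility functions* $u_i : S_1 \times \cdots \times S_n \to \mathbb{R}$, one for each player $i$. Each of them associates to each *strategic profile* $s = (s_1, \ldots, s_n)$ a real number $u_i(s)$, the *utility* player $i$ gets from the strategic profile $s$.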

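A *Nash equilibrium* for a game in normal form is a strategic profile $s^*$ with the following property. For every player $i$ and every strategy $s_i$ feasible for player $i$, it is the case that $u_i(s_i, s^*_{-i}) \leq u_i(s^*)$, where $(s_i, s^*_{-i})$ is the profile obtained from $s^*$ by replacing $s^*_i$ with $s_i$. We have seen that not every finite game in normal form admits a pure strategy Nash equilibrium: so, we introduced randomization.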

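A *mixed strategy* for player $i$ is a probability distribution $\sigma_i$ on $S_i$. If $S_i = \{s_{i1}, \ldots, s_{im_i}\}$ is finite, this is the same as assigning values $\sigma_{ij} = \sigma_i(s_{ij}) \geq 0$ for $1 \leq j \leq m_i$, with $\sum_{j=1}^{m_i} \sigma_{ij} = 1$. A *mixed strategy profile* is a collection $\sigma = (\sigma_1, \ldots, \sigma_n)$ of mixed strategies for each player, and $u_i(\sigma)$ denotes the expected utility of player $i$ under $\sigma$. A *mixed strategy Nash equilibrium* is a mixed strategy profile $\sigma^*$ such that, for every player $i$ and every mixed strategy $\sigma_i$ feasible for player $i$,

$$u_i(\sigma_i, \sigma^*_{-i}) \leq u_i(\sigma^*).$$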

The idea behind Nash’s proof goes as follows. If the game is finite, then a mixed strategy for player $i$, who has $m_i$ pure strategies, is identified with a point of

$$\Delta_i = \Bigl\{ x \in \mathbb{R}^{m_i} \ \Big|\ x_j \geq 0 \text{ for every } j, \ \sum_{j=1}^{m_i} x_j = 1 \Bigr\};$$

therefore, mixed strategy profiles can be identified with points of

$$\Delta = \Delta_1 \times \cdots \times \Delta_n,$$

which is compact and convex as all of its components are. Mixed strategy Nash equilibria are those points of $\Delta$ where each pure strategy $s_{ij}$, with $1 \leq i \leq n$ and $1 \leq j \leq m_i$, is used in the most efficient way. Relax this condition by allowing a small “slack” with respect to such most efficient way. It is then possible to define a continuous transformation of mixed strategy profiles into mixed strategy profiles, which will have a fixed point because of the Brouwer fixed-point theorem. By gradually reducing the slack, a mixed strategy Nash equilibrium is found as a limit point of such approximations.

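Suppose player $i$ has available the pure strategies $s_{ij}$ for $1 \leq j \leq m_i$. Let $\sigma \in \Delta$ be an arbitrary mixed strategy profile and $k \geq 1$ be an arbitrary integer. Consider the following quantities:

- $u_{ij}(\sigma) = u_i(s_{ij}, \sigma_{-i})$, the expected utility of player $i$ when playing the pure strategy $s_{ij}$ against the mixed strategies of the other players.
- $M_i(\sigma) = \max_{1 \leq j \leq m_i} u_{ij}(\sigma)$.
- $\varphi_{ijk}(\sigma) = \max\bigl(0,\ u_{ij}(\sigma) - M_i(\sigma) + 1/k\bigr)$.
- $\Phi_{ik}(\sigma) = \sum_{j=1}^{m_i} \varphi_{ijk}(\sigma)$.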

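Given $k$, the sum $\Phi_{ik}(\sigma)$ is bounded from below by $1/k$, because the term corresponding to a best pure strategy equals $1/k$. Hence the functions

$$\sigma \mapsto \frac{\varphi_{ijk}(\sigma)}{\Phi_{ik}(\sigma)}, \qquad 1 \leq j \leq m_i,$$

are continuous and nonnegative and satisfy $\sum_{j=1}^{m_i} \varphi_{ijk}(\sigma) / \Phi_{ik}(\sigma) = 1$ whatever $\sigma$ and $k$ are. As a consequence, the functions $T_k : \Delta \to \Delta$ defined by

$$\bigl( T_k(\sigma) \bigr)_{ij} = \frac{\varphi_{ijk}(\sigma)}{\Phi_{ik}(\sigma)},$$

that is, the functions which replace the probability of each pure strategy $s_{ij}$ with the normalized value of $\varphi_{ijk}(\sigma)$, are continuous transformations of $\Delta$ into itself. Let $\sigma^{(k)}$ be a fixed point of $T_k$, whose existence is ensured by the Brouwer fixed-point theorem. As $\Delta$ is compact, the sequence $(\sigma^{(k)})_{k \geq 1}$ has a limit point $\sigma^*$.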
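For two players, the map $T_k$ can be written down directly. The sketch below assumes a bimatrix representation in which `A[j][l]` and `B[j][l]` are the utilities of the two players when player 1 plays pure strategy $j$ and player 2 plays pure strategy $l$. The matching-pennies payoffs are hypothetical. The code only evaluates $T_k$: Brouwer's theorem guarantees a fixed point but gives no way to compute one, and iterating $T_k$ need not converge.

```python
def nash_map(A, B, p, q, k):
    """Apply the map T_k to the two-player mixed strategy profile (p, q).

    A[j][l], B[j][l]: utilities of players 1 and 2 when player 1 plays pure
    strategy j and player 2 plays pure strategy l. p, q: probability vectors.
    """
    # u_{1j}: player 1's pure strategy j against q; u_{2l}: player 2's pure l against p.
    u1 = [sum(A[j][l] * q[l] for l in range(len(q))) for j in range(len(p))]
    u2 = [sum(p[j] * B[j][l] for j in range(len(p))) for l in range(len(q))]

    def renormalize(u):
        best = max(u)                                     # M_i
        phi = [max(0.0, x - best + 1.0 / k) for x in u]   # phi_{ijk}
        total = sum(phi)  # Phi_{ik} >= 1/k: a best pure strategy contributes exactly 1/k
        return [x / total for x in phi]

    return renormalize(u1), renormalize(u2)

# Hypothetical matching-pennies payoffs.
A = [[1, -1], [-1, 1]]
B = [[-1, 1], [1, -1]]
print(nash_map(A, B, [0.5, 0.5], [0.5, 0.5], k=10))  # ([0.5, 0.5], [0.5, 0.5])
print(nash_map(A, B, [1.0, 0.0], [1.0, 0.0], k=10))  # ([1.0, 0.0], [0.0, 1.0])
```

The uniform profile is a fixed point. The pure profile is moved, since player 2 would rather switch.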

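Suppose, for the sake of contradiction, that $\sigma^*$ is not a mixed strategy Nash equilibrium. Then there must be a player $i$ and a mixed strategy $\sigma_i$ such that $u_i(\sigma_i, \sigma^*_{-i}) > u_i(\sigma^*)$. The only way this may happen is that some *pure* strategy $s_{ij}$ is used *suboptimally* by $\sigma^*$, that is,

$$\sigma^*_{ij} > 0 \quad \text{and} \quad u_{ij}(\sigma^*) < M_i(\sigma^*).$$

Indeed, no mixed strategy can do better than $M_i(\sigma^*)$, so $u_i(\sigma^*) < M_i(\sigma^*)$. Since $u_i(\sigma^*) = \sum_j \sigma^*_{ij}\, u_{ij}(\sigma^*)$ is an average of the values $u_{ij}(\sigma^*)$, some $s_{ij}$ with positive weight must fall short of $M_i(\sigma^*)$.

Let $\delta = M_i(\sigma^*) - u_{ij}(\sigma^*) > 0$. By continuity of $u_{ij}$ and $M_i$, we can choose $k$ so that:

- $\sigma^{(k)}$ belongs to a subsequence converging to $\sigma^*$.
- $|u_{ij}(\sigma^{(k)}) - u_{ij}(\sigma^*)| < \delta / 3$.
- $|M_i(\sigma^{(k)}) - M_i(\sigma^*)| < \delta / 3$.
- $1/k < \delta / 3$.

Points 2 and 3 tell us that $u_{ij}(\sigma^{(k)})$ is smaller than $M_i(\sigma^{(k)}) - \delta/3$. This, together with point 4, yields $u_{ij}(\sigma^{(k)}) - M_i(\sigma^{(k)}) + 1/k < 0$, thus $\varphi_{ijk}(\sigma^{(k)}) = 0$. But $\sigma^{(k)}$ is a fixed point for $T_k$, so

$$\sigma^{(k)}_{ij} = \frac{\varphi_{ijk}(\sigma^{(k)})}{\Phi_{ik}(\sigma^{(k)})} = 0.$$

As $k$ may be taken arbitrarily large and $\sigma^{(k)}$ made arbitrarily close to $\sigma^*$, we must conclude that $\sigma^*_{ij} = 0$ too. This is a contradiction.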


To introduce this idea, together with other basic game-theoretic notions, we resort to some examples. Here goes the first one:

Alice and Bob are planning an evening at the cinema. Alice would like to watch the romantic movie, while Bob would like to watch the action movie. Neither of them much likes the other’s favorite movie. However, should they split up, the sadness of being alone would be so great that neither of them would enjoy his or her movie!

This is the kind of situation modeled by a *game in normal form*, where we have:

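- A set $N = \{1, \ldots, n\}$ of *players*.
- A set $S_i$ of *strategies* for each player $i \in N$.
- A collection of *utility functions* $u_i : S_1 \times \cdots \times S_n \to \mathbb{R}$, one for each player $i$. Each of them associates to each *strategic profile* $s = (s_1, \ldots, s_n)$ a real number $u_i(s)$, the *utility* player $i$ gets from the strategic profile $s$.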

In the case of Alice and Bob, this may be summarized with a table such as the following:

|  | Romantic | Action |
|---|---|---|
| Romantic |  |  |
| Action |  |  |

Such tables represent games in normal form between two players. The *rows* of the table are labeled with the strategies available to the *first* player, and the *columns* with the strategies available to the *second* player. The entries of the table indicate the values of the utility functions when the first player plays the corresponding row and the second player plays the corresponding column. When we want to emphasize the role of player $i$ in contrast to the others, we write $s$ as $(s_i, s_{-i})$. We then talk about the strategy $s_i$ of player $i$ *given* the strategic profile $s_{-i}$ of the other players.

Suppose that Alice is the first player and Bob is the second. Then the table tells us that, if they both choose the romantic movie, Alice will enjoy it a lot and Bob not very much. However, if Bob *defects* from this strategic profile and goes to watch the action movie, he will ultimately not enjoy it. He will be sad about not being together with Alice, which was the entire point of organizing the evening at the movies!

Let us consider another game (a rather serious one indeed) where the players are a lion and a gazelle. The lion wants to catch the gazelle; the gazelle wants to avoid being caught by the lion. To do this, they may choose between being on the move, or staying more or less in the same place. It turns out, from observation in the field, that the table for the lion-and-gazelle situation is similar to the one below:

|  | Move | Stay |
|---|---|---|
| Move |  |  |
| Stay |  |  |

We observe that, for the lion, the most profitable strategy is to move. Indeed, if the gazelle moves, then the lion gets a higher utility by moving than by staying. If the gazelle stays, then the lion again gets a higher utility by moving than by staying. A strategy such as this, which always gives the *best* possible result *independently* of the other players’ strategies, is called a *dominant strategy*. Such strategies are quite rare. Indeed, neither Alice nor Bob from the previous game had a dominant strategy, nor does the gazelle here: they can maximize their own profit only by choosing the *same* strategy as the other player.

So, what if we relax the requirement, and just demand that every player chooses the most favorable strategy, *given* the strategies of the other players? This is the basic intuition behind the concept of Nash equilibrium, formalized and studied by John Nash in his 1950 doctoral thesis.

**Definition 1.** A *Nash equilibrium* for a game in normal form is a strategic profile $s^*$ such that, for every player $i$ and every strategy $s_i$ feasible for player $i$, it is the case that $u_i(s_i, s^*_{-i}) \leq u_i(s^*)$.

The situation where both the lion and the gazelle are on the move is a Nash equilibrium, and it is the only Nash equilibrium in the corresponding game. (By definition, a strictly dominant strategy, that is, one which is strictly better than every alternative whatever the other players do, is played in every Nash equilibrium.) The situation where both Alice and Bob go to watch the romantic movie is a Nash equilibrium, and so is the one where they both go to watch the action movie.

So, does every game have a Nash equilibrium?

Actually, no.

Indeed, suppose that the predator and the prey are small insects, such as a dragonfly and a mosquito, instead of large mammals such as the lion and the gazelle. It then turns out, after careful observation, that the table for the predator-prey game looks more like the following:

|  | Move | Stay |
|---|---|---|
| Move |  |  |
| Stay |  |  |

In this situation, the dragonfly maximizes its utility if it does the *same* as the mosquito. In turn, however, the mosquito maximizes its own utility if it does the *opposite* of the dragonfly! In such a situation, no strategic profile can be a Nash equilibrium as defined above: whatever the profile, one of the two insects gains by changing its strategy.
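For two-player games given by tables, Definition 1 can be checked by brute force over all pure strategic profiles. In the sketch below, the payoff numbers are hypothetical, chosen only to reproduce the qualitative patterns described above: Alice and Bob want to be together but prefer different movies; the dragonfly wants to match the mosquito, and the mosquito wants to mismatch. The function name `pure_nash_equilibria` is ours.

```python
from itertools import product

def pure_nash_equilibria(A, B):
    """All pure strategic profiles (row, column) that are Nash equilibria of a bimatrix game.

    A[r][c] and B[r][c] are the utilities of the row and column player.
    """
    rows, cols = len(A), len(A[0])
    equilibria = []
    for r, c in product(range(rows), range(cols)):
        row_best = all(A[r][c] >= A[other][c] for other in range(rows))
        col_best = all(B[r][c] >= B[r][other] for other in range(cols))
        if row_best and col_best:
            equilibria.append((r, c))
    return equilibria

# Hypothetical payoffs; index 0 = first strategy, index 1 = second strategy.
ALICE = [[2, 0], [0, 1]]        # rows: Alice (romantic, action)
BOB = [[1, 0], [0, 2]]          # columns: Bob (romantic, action)
DRAGONFLY = [[1, -1], [-1, 1]]  # rows: dragonfly (move, stay)
MOSQUITO = [[-1, 1], [1, -1]]   # columns: mosquito (move, stay)

print(pure_nash_equilibria(ALICE, BOB))           # [(0, 0), (1, 1)]
print(pure_nash_equilibria(DRAGONFLY, MOSQUITO))  # []
```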

Where determinism fails, however, randomization may help.

**Definition 2.** A *mixed strategy* for the player $i$ in a game in normal form is a probability distribution $\sigma_i$ on the space $S_i$ of the strategies for player $i$. A *mixed strategy profile* is a collection $\sigma = (\sigma_1, \ldots, \sigma_n)$ of mixed strategies for each player.

For example, the dragonfly might decide to move with probability $p$, and stay still with probability $1 - p$; similarly, the mosquito might decide to move with probability $q$, and stay still with probability $1 - q$.

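With mixed strategies, the important value for player $i$ to take into account is the *expected utility given the strategic profile* $\sigma$. When pure strategies are finitely many and the players randomize independently, it reads

$$u_i(\sigma) = \sum_{s \in S_1 \times \cdots \times S_n} u_i(s)\, \sigma_1(s_1) \cdots \sigma_n(s_n),$$

which we may write $u_i(\sigma_i, \sigma_{-i})$ when we want to emphasize the role of player $i$.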
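The expected-utility formula can be computed directly by summing over all pure strategic profiles. The sketch assumes that a player's utility is given as a Python function of a profile of strategy indices, and that players randomize independently. The dragonfly's utility ($+1$ when it does the same as the mosquito, $-1$ otherwise) is a hypothetical example.

```python
from itertools import product
from math import prod

def expected_utility(u, sigmas):
    """Expected utility of a player under the mixed strategy profile sigmas.

    u: function mapping a pure strategic profile (a tuple of strategy indices,
       one per player) to that player's utility.
    sigmas: list of probability vectors; sigmas[i][j] is the probability that
       player i plays its j-th pure strategy. Players randomize independently.
    """
    total = 0.0
    for profile in product(*(range(len(s)) for s in sigmas)):
        weight = prod(s[j] for s, j in zip(sigmas, profile))
        total += weight * u(profile)
    return total

# Hypothetical dragonfly utility (strategy 0 = move, 1 = stay).
def dragonfly(profile):
    return 1 if profile[0] == profile[1] else -1

p, q = 0.8, 0.3
print(expected_utility(dragonfly, [[p, 1 - p], [q, 1 - q]]))  # about -0.24
```

With $p = 1/2$, the result is $0$ whatever $q$ is: in this example, the dragonfly's expected utility then no longer depends on what the mosquito does.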

Now, suppose that the dragonfly decides to set its own parameter $p$ so that its expected utility does not change if the mosquito decides to move or to stay: this corresponds to the dragonfly maximizing its expected utility, given the mixed strategy of the mosquito. Our table tells us that this corresponds to

which can be solved for $p$. In turn, if the mosquito sets its own parameter $q$ so that its own expected utility does not change if the dragonfly decides to move or to stay, then

which can be solved for $q$. Consider the situation where the dragonfly moves with probability $p$ and the mosquito moves with probability $q$, with the values just found. Neither of the two insects has any advantage in changing its own choice unilaterally, given the choice of the other.

**Definition 3.** A *mixed strategy Nash equilibrium* for a game in normal form is a mixed strategy profile $\sigma^*$ such that, for every player $i$ and every mixed strategy $\sigma_i$ feasible for player $i$, it is the case that $u_i(\sigma_i, \sigma^*_{-i}) \leq u_i(\sigma^*)$.
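For a finite two-player game, Definition 3 can be checked without ranging over all mixed deviations. A mixed deviation earns an average of the utilities of pure deviations, so it is enough to check that no pure strategy does better than the current expected utility. The sketch below assumes the same bimatrix representation as before, with hypothetical dragonfly/mosquito payoffs. The tolerance `tol` is our choice, made to absorb floating-point rounding.

```python
def pure_payoffs(A, B, p, q):
    """Expected utility of each pure strategy: player 1's rows against q, player 2's columns against p."""
    u1 = [sum(A[j][l] * q[l] for l in range(len(q))) for j in range(len(p))]
    u2 = [sum(p[j] * B[j][l] for j in range(len(p))) for l in range(len(q))]
    return u1, u2

def is_mixed_nash(A, B, p, q, tol=1e-9):
    u1, u2 = pure_payoffs(A, B, p, q)
    v1 = sum(pj * x for pj, x in zip(p, u1))  # u_1(sigma)
    v2 = sum(ql * x for ql, x in zip(q, u2))  # u_2(sigma)
    # Any mixed deviation earns an average of pure-deviation payoffs,
    # so it is enough to check that no pure strategy does better.
    return max(u1) <= v1 + tol and max(u2) <= v2 + tol

# Hypothetical payoffs: rows = dragonfly (move, stay), columns = mosquito (move, stay).
DRAGONFLY = [[1, -1], [-1, 1]]
MOSQUITO = [[-1, 1], [1, -1]]
print(is_mixed_nash(DRAGONFLY, MOSQUITO, [0.5, 0.5], [0.5, 0.5]))  # True
print(is_mixed_nash(DRAGONFLY, MOSQUITO, [1.0, 0.0], [0.5, 0.5]))  # False
```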

And here comes Nash’s great result:

**Nash’s theorem.** Every game in normal form that allows at most finitely many *pure* strategic profiles admits at least one Nash equilibrium, possibly in *mixed* strategies.

It is actually sufficient to prove Nash’s theorem (as he did in his doctoral thesis) when there are only finitely many players, each of whom has only finitely many pure strategies. Such a limitation is only apparent. If pure strategy profiles are finitely many, then all players have finitely many pure strategies, and at most finitely many of them have more than one; players with a single strategy can simply be ignored.

The idea of the proof, which we might go through in a future Theory Lunch talk, goes as follows:

- Identify the space of mixed strategic profiles with a compact and convex set $\Delta \subseteq \mathbb{R}^m$ for suitable $m$.
- For $k \geq 1$ define a family of continuous transformations $T_k : \Delta \to \Delta$.
- By the Brouwer fixed-point theorem, for every $k \geq 1$ there exists a mixed strategic profile $\sigma^{(k)}$ such that $T_k(\sigma^{(k)}) = \sigma^{(k)}$.
- As $\Delta$ is compact, the sequence $(\sigma^{(k)})_{k \geq 1}$ has a limit point $\sigma^*$.
- By supposing that $\sigma^*$ is not a mixed strategy Nash equilibrium we reach a contradiction.

We remark that Nash equilibria are not optimal solutions: they are, at most, *lesser evils for everyone given the circumstances*. To better explain this, we illustrate a classic problem in decision theory, called the *prisoner’s dilemma*. The police have arrested two people who are suspects in a bank robbery. However, the only evidence concerns carrying firearms without a license, a minor crime carrying a sentence of one year, compared to ten years for bank robbery. So, while interrogating each suspect, the police propose a bargain: if the suspect testifies against the other for bank robbery, the police will drop the charges for carrying firearms without a license. The table for the prisoner’s dilemma thus has the following form:

|  | Quiet | Speak |
|---|---|---|
| Quiet |  |  |
| Speak |  |  |

Then the situation where both suspects testify against each other is the only pure strategy Nash equilibrium: however, it is very far from being optimal…


I wrote a post on this on my blog on cellular automata.

Link: http://anotherblogonca.wordpress.com/2014/05/15/random-settings-in-cellular-automata-machines/


Consider the space $\ell^\infty$ of bounded sequences $x = (x_n)_{n \in \mathbb{N}}$ of real numbers, together with the supremum norm. We would like to define a notion of limit $\lim x_n$ which holds for *every* $x \in \ell^\infty$ and satisfies the well-known properties of the standard limit:

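- *Linearity:* $\lim (x_n + y_n) = \lim x_n + \lim y_n$.
- *Homogeneity:* $\lim (\lambda x_n) = \lambda \lim x_n$ for every $\lambda \in \mathbb{R}$.
- *Monotonicity:* if $x_n \leq y_n$ for every $n$, then $\lim x_n \leq \lim y_n$.
- *Nontriviality:* if $x_n = 1$ for every $n$, then $\lim x_n = 1$.
- *Consistency:* if the limit exists in the classical sense, then the two notions coincide.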

The consistency condition is reasonable also because it avoids trivial cases: if we fix $m \in \mathbb{N}$ and define the limit of the sequence $x$ as the value $x_m$, then the first four properties are satisfied.

Let us recall the classical definition of limit. We say that $x_n$ converges to $L$ if and only if, for every $\varepsilon > 0$, the set of values $n$ such that $|x_n - L| < \varepsilon$ is *cofinite*, *i.e.*, has a finite complement: the inequality $|x_n - L| \geq \varepsilon$ can be satisfied at most for finitely many values of $n$. The family $\mathcal{F}$ of cofinite subsets of $\mathbb{N}$ (in fact, of any set $X$) has the following properties:

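- *Upper closure:* if $A \in \mathcal{F}$ and $A \subseteq B \subseteq X$, then $B \in \mathcal{F}$.
- *Meet stability:* if $A, B \in \mathcal{F}$, then $A \cap B \in \mathcal{F}$.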

A family $\mathcal{F}$ of subsets of $X$ with the two properties above is called a *filter* on $X$. An immediate example is the *trivial filter* $\{X\}$; another example is the *improper filter* $\mathcal{P}(X)$. The family of cofinite subsets of $X$ is called the *Fréchet filter* on $X$. The Fréchet filter is not the improper one if and only if $X$ is infinite.

An *ultrafilter* on $X$ is a filter $\mathcal{U}$ on $X$ satisfying the following additional conditions:

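- *Properness:* $\emptyset \notin \mathcal{U}$.
- *Maximality:* for every $A \subseteq X$, either $A \in \mathcal{U}$ or $X \setminus A \in \mathcal{U}$.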

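For example, if $x \in X$, then $\mathcal{U}_x = \{A \subseteq X : x \in A\}$ is an ultrafilter on $X$, called the *principal ultrafilter* generated by $x$. Observe that $\bigcap \mathcal{U}_x = \{x\}$; if $\bigcap \mathcal{U} = \emptyset$, we say that $\mathcal{U}$ is *free*. These are, in fact, the only two options: the intersection of all the elements of an ultrafilter is either a singleton or empty.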

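**Lemma 1.** For a proper filter $\mathcal{U}$ on $X$ to be an ultrafilter, it is necessary and sufficient that it satisfies the following condition: for every $n \geq 1$ and nonempty $A_1, \ldots, A_n \subseteq X$, if $A_1 \cup \cdots \cup A_n \in \mathcal{U}$ then $A_i \in \mathcal{U}$ for at least one $i$.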

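*Proof:* It is sufficient to prove the thesis with $n = 2$. If $A \cup B \in \mathcal{U}$ with $A, B \notin \mathcal{U}$, then $\{C \subseteq X : A \cup C \in \mathcal{U}\}$ is a proper filter that properly contains $\mathcal{U}$ (it contains $B$, and it does not contain $\emptyset$ because $A \notin \mathcal{U}$). This is impossible for an ultrafilter: a filter properly containing $\mathcal{U}$ contains some $C \notin \mathcal{U}$, hence also $X \setminus C \in \mathcal{U}$, hence $\emptyset$. Conversely, if the condition is satisfied, for every $A$ which is neither $\emptyset$ nor $X$ we have $A \cup (X \setminus A) = X \in \mathcal{U}$, thus either $A \in \mathcal{U}$ or $X \setminus A \in \mathcal{U}$.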

**Theorem 1.** Every nonprincipal ultrafilter is free. In addition, an ultrafilter is free if and only if it extends the Fréchet filter. In particular, every ultrafilter over a finite set is principal.

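*Proof:* Let $\mathcal{U}$ be a nonprincipal ultrafilter. Let $x \in X$ and let $\mathcal{U}_x$ be the principal ultrafilter generated by $x$: then $\mathcal{U} \neq \mathcal{U}_x$, so either there exists $A \in \mathcal{U}$ such that $x \notin A$, or there exists $A \in \mathcal{U}_x$ such that $A \notin \mathcal{U}$. In the first case, $x \notin \bigcap \mathcal{U}$; in the second case, we consider $X \setminus A$, which belongs to $\mathcal{U}$ by maximality, and reduce to the first case. As $x$ is arbitrary, $\bigcap \mathcal{U} = \emptyset$, that is, $\mathcal{U}$ is free.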

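Now, for every $x \in X$ the set $X \setminus \{x\}$ belongs to the Fréchet filter but not to the principal ultrafilter $\mathcal{U}_x$: therefore, no principal ultrafilter extends the Fréchet filter. On the other hand, if $\mathcal{U}$ is an ultrafilter, $E \subseteq X$ is finite, and $X \setminus E \notin \mathcal{U}$, then $E \in \mathcal{U}$ by maximality, hence $\{x\} \in \mathcal{U}$ for some $x \in E$ because of Lemma 1, thus $\mathcal{U}$ cannot be a free ultrafilter.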

So it seems that free ultrafilters are the right thing to consider when trying to expand the concept of limit. There is an issue, though: we have not seen any single example of a free ultrafilter; in fact, we do not even (yet) know whether free ultrafilters do exist! The answer to this problem comes, in a shamelessly nonconstructive way, from the following

**Ultrafilter lemma.** Every proper filter can be extended to an ultrafilter.

The ultrafilter lemma, together with Theorem 1, implies the existence of free ultrafilters on every infinite set $X$, and in particular on $\mathbb{N}$: it suffices to extend the Fréchet filter, which is proper when $X$ is infinite. On the other hand, proving the ultrafilter lemma requires some form of the Axiom of Choice: here we use Zorn’s lemma. (The ultrafilter lemma is strictly weaker than the Axiom of Choice, but it cannot be proved in ZF alone.) Before giving such proof, we recall that a family of sets has the *finite intersection property* if every finite subfamily has a nonempty intersection: every proper filter has the finite intersection property.

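*Proof of the ultrafilter lemma.* Let $\mathcal{F}$ be a proper filter on $X$ and let $\mathbb{P}$ be the family of the collections of subsets of $X$ that extend $\mathcal{F}$ and have the finite intersection property, ordered by inclusion. Let $\mathbb{C}$ be a totally ordered subfamily of $\mathbb{P}$: then $\bigcup \mathbb{C}$ extends $\mathcal{F}$ and has the finite intersection property, because for any finitely many $A_1, \ldots, A_n \in \bigcup \mathbb{C}$ there exists, as $\mathbb{C}$ is totally ordered, some $\mathcal{C} \in \mathbb{C}$ such that $A_1, \ldots, A_n \in \mathcal{C}$.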

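By Zorn’s lemma, $\mathbb{P}$ has a maximal element $\mathcal{U}$, which surely satisfies $\mathcal{F} \subseteq \mathcal{U}$ and $\emptyset \notin \mathcal{U}$. If $A \in \mathcal{U}$ and $A \subseteq B \subseteq X$, then $\mathcal{U} \cup \{B\}$ still has the finite intersection property, therefore $B \in \mathcal{U}$ by maximality. If $A, B \in \mathcal{U}$, then $\mathcal{U} \cup \{A \cap B\}$ still has the finite intersection property, therefore $A \cap B \in \mathcal{U}$, again by maximality.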

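Suppose, for the sake of contradiction, that there exists $A \subseteq X$ such that $A \notin \mathcal{U}$ and $X \setminus A \notin \mathcal{U}$: then neither $\mathcal{U} \cup \{A\}$ nor $\mathcal{U} \cup \{X \setminus A\}$ has the finite intersection property, hence (as $\mathcal{U}$ is closed under finite intersections) there exist $B, C \in \mathcal{U}$ such that $A \cap B = \emptyset$ and $(X \setminus A) \cap C = \emptyset$. But $A \cap B = \emptyset$ means $B \subseteq X \setminus A$, and $(X \setminus A) \cap C = \emptyset$ means $C \subseteq A$: therefore,

$$B \cap C \subseteq (X \setminus A) \cap A = \emptyset,$$

against $\mathcal{U}$ having the finite intersection property.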

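We are now ready to expand the idea of limit. Let $(X, d)$ be a metric space and let $\mathcal{U}$ be an ultrafilter on $\mathbb{N}$: we say that $L \in X$ is the *ultralimit* of the sequence $\{x_n\}_{n \in \mathbb{N}}$ of elements of $X$ along $\mathcal{U}$ if for every $\varepsilon > 0$ the set

$$\{n \in \mathbb{N} : d(x_n, L) < \varepsilon\}$$

belongs to $\mathcal{U}$. (Observe how, in the standard definition of limit, the above set is required to belong to the Fréchet filter.) If this is the case, we write

$$L = \lim_{n \to \mathcal{U}} x_n .$$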
Ultralimits, if they exist, are unique and, for sequences of real numbers, satisfy our first four conditions. Moreover, the choice of the principal ultrafilter generated by $n_0$ corresponds to the trivial definition, where the limit of $\{x_n\}$ is $x_{n_0}$. So, what about free ultrafilters?

**Theorem 2.** Every bounded sequence of real numbers has an ultralimit along every free ultrafilter on $\mathbb{N}$.

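*Proof:* It is not restrictive to suppose $x_n \in [0, 1]$ for every $n$. Let $\mathcal{U}$ be an arbitrary, but fixed, free ultrafilter on $\mathbb{N}$. We will construct a sequence of closed intervals $A_k = [a_k, b_k]$, $k \geq 0$, such that $A_{k+1} \subseteq A_k$ and $b_k - a_k = 2^{-k}$ for every $k$. By the Cantor intersection theorem it will be $\bigcap_{k \geq 0} A_k = \{L\}$ for some $L \in [0, 1]$: we will then show that $L$ is the ultralimit of $\{x_n\}$ along $\mathcal{U}$.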

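Let $A_0 = [0, 1]$. Let $A_1$ be either $[0, 1/2]$ or $[1/2, 1]$, chosen according to the following criterion: $\{n \in \mathbb{N} : x_n \in A_1\} \in \mathcal{U}$. At least one of the two halves satisfies the criterion by Lemma 1, because the two corresponding sets of indices have union $\mathbb{N} \in \mathcal{U}$; if both halves satisfy it, then we just choose one once and for all. We iterate the procedure by always choosing as $A_{k+1}$ one of the two halves of $A_k$ such that $\{n \in \mathbb{N} : x_n \in A_{k+1}\} \in \mathcal{U}$, which is possible for the same reason, since $\{n \in \mathbb{N} : x_n \in A_k\} \in \mathcal{U}$.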

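Let $\{L\} = \bigcap_{k \geq 0} A_k$ and let $\varepsilon > 0$. Let $k$ be so large that $2^{-k} < \varepsilon$: then $A_k \subseteq (L - \varepsilon, L + \varepsilon)$, thus $\{n \in \mathbb{N} : x_n \in A_k\} \subseteq \{n \in \mathbb{N} : |x_n - L| < \varepsilon\}$. As the smaller set belongs to $\mathcal{U}$, so does the larger one.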

We have thus almost achieved our original target: a notion of limit which applies to every bounded sequence of real numbers. Such a notion will depend on the specific free ultrafilter we choose, but it is already very reassuring that such a notion exists at all! To complete our job we need one more check: we have to be sure that the definition is consistent with the classical one. And this is indeed what happens!
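A small worked example shows this dependence. Take $x_n = (-1)^n$, let $2\mathbb{N}$ denote the set of even numbers, and write $\lim_{n \to \mathcal{U}} x_n$ for the ultralimit along $\mathcal{U}$. By maximality, exactly one of $2\mathbb{N}$ and $\mathbb{N} \setminus 2\mathbb{N}$ belongs to a given ultrafilter $\mathcal{U}$; since $|x_n - 1| = 0$ for every even $n$ and $|x_n + 1| = 0$ for every odd $n$, upper closure gives

$$
\lim_{n \to \mathcal{U}} (-1)^n =
\begin{cases}
1 & \text{if } 2\mathbb{N} \in \mathcal{U}, \\
-1 & \text{if } \mathbb{N} \setminus 2\mathbb{N} \in \mathcal{U}.
\end{cases}
$$

Both cases occur for free ultrafilters: $2\mathbb{N}$ together with all cofinite subsets of $\mathbb{N}$ has the finite intersection property, and so does $\mathbb{N} \setminus 2\mathbb{N}$ together with them; by the ultrafilter lemma applied to the filter each family generates, either can be extended to an ultrafilter, which is free by Theorem 1.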

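**Theorem 3.** Let $\{x_n\}_{n \in \mathbb{N}}$ be a sequence of real numbers and let $L \in \mathbb{R}$. Then $\{x_n\}$ converges to $L$ in the classical sense if and only if $L$ is the ultralimit of $\{x_n\}$ along every free ultrafilter $\mathcal{U}$ on $\mathbb{N}$.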

To prove Theorem 3 we make use of an auxiliary result, which is of interest by itself.

**Lemma 2.** Let $\Phi$ be the family of collections of subsets of $X$ that have the finite intersection property, ordered by inclusion. The maximal elements of $\Phi$ are precisely the ultrafilters on $X$.

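*Proof:* Every ultrafilter $\mathcal{U}$ is maximal in $\Phi$: if $A \notin \mathcal{U}$, then $X \setminus A \in \mathcal{U}$, so $\mathcal{U} \cup \{A\}$ does not have the finite intersection property. If $\mathcal{U}$ is maximal in $\Phi$, then it is clearly proper and upper closed, and we can reason as in the proof of the ultrafilter lemma to show that it is actually an ultrafilter.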

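*Proof of Theorem 3:* Suppose $\{x_n\}$ does not converge to $L$ in the classical sense. Fix $\varepsilon > 0$ such that the set $S = \{n \in \mathbb{N} : |x_n - L| \geq \varepsilon\}$ is infinite. Then the family $\mathcal{G} = \{S\} \cup \{C \subseteq \mathbb{N} : \mathbb{N} \setminus C \text{ is finite}\}$ has the finite intersection property, because removing finitely many elements from the infinite set $S$ leaves a nonempty set. By Zorn’s lemma, reasoning as in the proof of the ultrafilter lemma, $\mathcal{G}$ is contained in a maximal collection $\mathcal{U}$ with the finite intersection property, which is an ultrafilter by Lemma 2; as $\mathcal{U}$ extends the Fréchet filter, it is free by Theorem 1. Then $S \in \mathcal{U}$, so $\{n \in \mathbb{N} : |x_n - L| < \varepsilon\} = \mathbb{N} \setminus S \notin \mathcal{U}$, and $L$ is not the ultralimit of $\{x_n\}$ along $\mathcal{U}$.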

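The converse implication follows from the classical definition of limit, together with the very notion of free ultrafilter: if $\{x_n\}$ converges to $L$, then for every $\varepsilon > 0$ the set $\{n \in \mathbb{N} : |x_n - L| < \varepsilon\}$ is cofinite, hence it belongs to every free ultrafilter by Theorem 1.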

Theorem 2 does hold for bounded sequences of real numbers, but it does not extend to arbitrary metric spaces: in general, a bounded sequence need not have an ultralimit. In fact, the following holds, which we state without proving.

**Theorem 4.** Let $(X, d)$ be a metric space. The following are equivalent.

- For some free ultrafilter $\mathcal{U}$ on $\mathbb{N}$, every sequence in $X$ has an ultralimit along $\mathcal{U}$.
- For every free ultrafilter $\mathcal{U}$ on $\mathbb{N}$, every sequence in $X$ has an ultralimit along $\mathcal{U}$.
- $X$ is compact.
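To see compactness at work, here is a worked example with $X = (0, 1)$ and the usual distance, a bounded but non-compact space. Take $x_n = 1/(n+2)$, which lies in $(0, 1)$ for every $n \in \mathbb{N}$. If some $L \in (0, 1)$ were the ultralimit of $\{x_n\}$ along a free ultrafilter $\mathcal{U}$, then, choosing $\varepsilon = L/2$ and using that $|x_n - L| < L/2$ forces $x_n > L/2$,

$$\{n \in \mathbb{N} : |x_n - L| < L/2\} \subseteq \{n \in \mathbb{N} : n + 2 < 2/L\},$$

which is finite. A finite set cannot belong to a free ultrafilter, because its complement is cofinite and a free ultrafilter extends the Fréchet filter (Theorem 1). So this bounded sequence has no ultralimit in $X$, although in $\mathbb{R}$ its ultralimit along every free ultrafilter is $0$, by Theorem 3.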

Ultrafilters are useful in many other contexts. For instance, they are used to construct *hyperreal numbers*, which in turn allow a rigorous definition of infinitesimals and a foundation of calculus based on them. But this might be the topic for another Theory Lunch talk.


In Agda, proofs and programs are the same thing, and so are types, sets and propositions. A set contains the values of that type or, equally, the proofs of that proposition.

Agda is a very expressive language, so very little is built in and most things can be defined in a library. We will define some of those things now to see how they work.

    module talk where

Natural numbers (0, 1, 2, 3, …) can be seen as being defined by either `zero` or the successor (+1) `suc` of another natural number.

    -- a natural number is either zero or the successor of a natural number
    data Nat : Set where
      zero : Nat
      suc  : Nat → Nat

Addition `_+_` takes two natural numbers and returns another. It can be defined by recursion (induction) on the first argument. If the first argument is `zero`, we return `n`; if it is the successor `suc m`, we make a recursive call `m + n` and apply the successor to the result. The recursive call (inductive hypothesis) is valid as `m` is structurally smaller than `suc m`.

    -- addition, by recursion on the first argument
    _+_ : Nat → Nat → Nat
    zero  + n = n
    suc m + n = suc (m + n)
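To see the two clauses of `_+_` at work, here is a hand unfolding of $2 + 1$, with $2$ written as `suc (suc zero)` and $1$ as `suc zero`; the first two steps use the `suc` clause, the last one the `zero` clause:

$$
\begin{aligned}
\mathsf{suc}\,(\mathsf{suc}\,\mathsf{zero}) + \mathsf{suc}\,\mathsf{zero}
&= \mathsf{suc}\,(\mathsf{suc}\,\mathsf{zero} + \mathsf{suc}\,\mathsf{zero}) \\
&= \mathsf{suc}\,(\mathsf{suc}\,(\mathsf{zero} + \mathsf{suc}\,\mathsf{zero})) \\
&= \mathsf{suc}\,(\mathsf{suc}\,(\mathsf{suc}\,\mathsf{zero}))
\end{aligned}
$$

Each step holds definitionally: Agda performs it by computation alone, peeling one `suc` off the first argument at a time.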

Having defined one function, we now want to prove something about it. One can think of this as an exercise in formal verification. I want to prove that `_+_` satisfies some equations. There is an equals sign in the definition above, but it is not the one I should use to state my equations. The `=` is the definitional equality symbol and denotes equations that the computer can see are true. I want to *prove* some things that the computer isn’t able to see for itself.

The notion we need is propositional equality, which is a relation we will define now. In another language one might expect a relation to have a type like `Nat → Nat → Bool`, which might return true if the natural numbers were equal. In Agda we would use `Set` instead of `Bool`, and the set would be inhabited if the relation holds. We can define propositional equality `_≅_` once and for all for any set `A`, which we write as an implicit parameter `{A : Set}`. Remarkably, this set has only one canonical inhabitant, `refl`, which inhabits only the types of equations between really identical (definitionally equal) values. This may seem strange, but one can prove (derive) the other properties, such as symmetry and transitivity.

    -- propositional equality
    data _≅_ {A : Set} : A → A → Set where
      refl : ∀{a} → a ≅ a

We could prove useful lemmas such as symmetry now, but for this post I will only need `cong`: functions preserve equality. Given any function `f : A → B` and two equal elements of `A`, the function should return equal results.

The lemma is very easy to prove: when we pattern match on the proof that the elements are equal, the only possible pattern is `refl`, which forces `a` and `a'` to be equal, replacing, say, `a'` with `a`. This reduces our task to showing that `f a ≅ f a`, which is easily proved using `refl`.

    -- lemma: functions preserve equality
    cong : {A B : Set} → (f : A → B) → {a a' : A} → a ≅ a' → f a ≅ f a'
    cong f refl = refl

The lemma is very useful for proving equations that have the same function (such as `suc`) on both sides: we can use it to reduce our task to proving that what is underneath the function on both sides is equal.
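The post mentions that symmetry and transitivity can be derived from `refl` alone. A minimal sketch, in the same style as `cong`: each proof pattern matches on `refl`, after which the goal becomes an instance of `refl` itself. The definition of `_≅_` is repeated so that the block type-checks on its own; the names `sym` and `trans` are our choice here (the Agda standard library uses the same names for its own propositional equality `_≡_`).

```agda
-- Propositional equality, repeated from the post so this block stands alone.
data _≅_ {A : Set} : A → A → Set where
  refl : ∀{a} → a ≅ a

-- Symmetry: matching on refl identifies a and a', so refl proves a' ≅ a.
sym : {A : Set} → {a a' : A} → a ≅ a' → a' ≅ a
sym refl = refl

-- Transitivity: matching on both proofs identifies a, a' and a''.
trans : {A : Set} → {a a' a'' : A} → a ≅ a' → a' ≅ a'' → a ≅ a''
trans refl refl = refl
```

As with `cong`, no reasoning about the elements themselves is needed: the only work is done by the pattern match on `refl`.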

One can define an algebraic structure like a monoid as a record (named tuple/struct/etc.) in Agda. It contains fields for the *data* and also for the laws. A monoid has a carrier set `M`, a distinguished unit element `e` and a binary operation `op`. The three laws state that `op` has `e` as its left and right unit and that it is associative. The monoid record `Mon` lives in `Set1` as it contains a `Set` (that is, a `Set₀`).

    -- Monoid
    record Mon : Set1 where
      -- data
      field
        M  : Set
        e  : M
        op : M → M → M
        -- laws
        lunit : ∀ m → op e m ≅ m
        runit : ∀ m → op m e ≅ m
        assoc : ∀ m n o → op (op m n) o ≅ op m (op n o)

The goal is to define a monoid where `M = Nat`, `e = zero` and `op = _+_`. Let’s go ahead and prove the laws. The first law, which would be `(zero + n) ≅ n`, doesn’t require proof, as it holds definitionally (it is the first line of the definition of `_+_`). The second law does require proof, as here `_+_` doesn’t compute: the first argument is the variable `n`.

    -- right unit law, by induction on n
    +runit : ∀ n → (n + zero) ≅ n
    +runit zero    = refl
    +runit (suc n) = cong suc (+runit n)

We prove the second law `+runit` by induction on `n`. When proving a property of a program, as we are doing here, it’s a good idea to follow the same pattern in the proof as in the program. Here `_+_` is defined by recursion on its first argument, so it makes sense to carry out the proof by induction on the first argument too. This makes things compute nicely. The first case is `zero + zero ≅ zero`, which Agda computes to `zero ≅ zero` by applying the definition of `_+_`. This is easily proved by `refl`. The second case computes to `suc (n + zero) ≅ suc n`. First we observe that there is a function `suc` on both sides, so we type `cong suc ?`. This reduces our problem to proving that `n + zero ≅ n`, which follows from the inductive hypothesis `+runit n`.

    -- associativity, by induction on the first argument m
    +assoc : ∀ m n o → ((m + n) + o) ≅ (m + (n + o))
    +assoc zero    n o = refl
    +assoc (suc m) n o = cong suc (+assoc m n o)

The proof for associativity proceeds analogously. We pattern match on the first argument `m`, which gives two cases. The first case computes to `n + o ≅ n + o` and the second computes to `suc ((m + n) + o) ≅ suc (m + (n + o))`. As before, the first case follows by reflexivity and the second by congruence of `suc` and the inductive hypothesis.

Having done all the hard work, we can now define a monoid for `Nat`, `_+_`, and `zero`:

    -- natural numbers with addition form a monoid
    NatMon : Mon
    NatMon = record
      { M     = Nat
      ; e     = zero
      ; op    = _+_
      ; lunit = λ _ → refl -- this one doesn't require proof
      ; runit = +runit
      ; assoc = +assoc
      }
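As a usage sketch, a monoid packaged as a record can be consumed generically: a lemma proved once for every `Mon` can be instantiated at `NatMon`, and the fields of `NatMon` compute. The block repeats the definitions of the post so that it type-checks on its own; the names `eIdem`, `natEIdem` and `onePlusOne` are hypothetical, introduced only for illustration.

```agda
-- Definitions repeated from the post, so that this block type-checks on its own.
data Nat : Set where
  zero : Nat
  suc  : Nat → Nat

_+_ : Nat → Nat → Nat
zero  + n = n
suc m + n = suc (m + n)

data _≅_ {A : Set} : A → A → Set where
  refl : ∀{a} → a ≅ a

cong : {A B : Set} → (f : A → B) → {a a' : A} → a ≅ a' → f a ≅ f a'
cong f refl = refl

record Mon : Set1 where
  field
    M     : Set
    e     : M
    op    : M → M → M
    lunit : ∀ m → op e m ≅ m
    runit : ∀ m → op m e ≅ m
    assoc : ∀ m n o → op (op m n) o ≅ op m (op n o)

+runit : ∀ n → (n + zero) ≅ n
+runit zero    = refl
+runit (suc n) = cong suc (+runit n)

+assoc : ∀ m n o → ((m + n) + o) ≅ (m + (n + o))
+assoc zero    n o = refl
+assoc (suc m) n o = cong suc (+assoc m n o)

NatMon : Mon
NatMon = record
  { M     = Nat
  ; e     = zero
  ; op    = _+_
  ; lunit = λ _ → refl
  ; runit = +runit
  ; assoc = +assoc
  }

-- Hypothetical generic lemma: in every monoid, the unit is idempotent.
-- It is the left unit law applied to e itself.
eIdem : (X : Mon) → Mon.op X (Mon.e X) (Mon.e X) ≅ Mon.e X
eIdem X = Mon.lunit X (Mon.e X)

-- Instantiating it at NatMon: the projections compute to _+_ and zero.
natEIdem : (zero + zero) ≅ zero
natEIdem = eIdem NatMon

-- The fields of NatMon compute, so concrete sums hold by refl.
onePlusOne : Mon.op NatMon (suc zero) (suc zero) ≅ suc (suc zero)
onePlusOne = refl
```

The generic lemma only mentions the fields of `Mon`, so it applies unchanged to any other monoid built with the same record.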
