## Monday, January 21, 2013

### One More Problem

Last time, I listed some problems which my current foundation idea doesn't solve.

Here's one more, and I'd refer to it as the problem.

First, some set-up.

Eliezer has been writing some good stuff about 2nd-order logic. He's touching on a problem which I've written about before, namely: if any proof-system can be interpreted via theories in 1st-order logic, why is it that we can discuss ideas (such as number theory) which seem not to be fully characterized by it? We think we mean something more when we talk about higher-order concepts, but how can we, if everything could be reduced to first-order talk? Let's call this the problem of meaning.

My original hope was (somehow) to solve the problem using a sufficiently powerful theory of truth. Tarski's undefinability theorem gives us a way to, given some logic, 'mechanically' construct one thing which the logic cannot express: the truth predicate in that logic. No logic can express its own semantics. Therefore (I must have been thinking), we can use 'truth' as a standard way of extending a logic, and hope that extending it in this way will automatically make it more powerful in every other way that is interesting to us. (There is fairly good support for this idea.) I hoped to solve the problem of meaning by solving the problem of truth.

Along the way, I learned about the arithmetic hierarchy and other hierarchies. These provide a characterisation of how difficult logical/mathematical problems are. Only the first couple of levels of the arithmetic hierarchy can be handled on computers. The rest are increasingly uncomputable, and the other hierarchies just get worse. (The arithmetic hierarchy deals just with problems in number theory, and the next-up, the analytic hierarchy, deals with real analysis.) Given that we cannot compute facts about the higher-up stuff, the question is, in what sense can we refer to it? I referred to this as a grounding problem: in what sense does our logic refer to that stuff, if it can't access the truths of those hierarchy levels? This is a big aspect of the problem of meaning.

A somewhat different question (and more relevant to this post) is: How can we define rational belief about these things?

Anyway, I eventually got a bit fed up trying to climb levels, and attempted to solve the problem more directly by just coming up with a complete theory of truth. I eventually learned that the sorts of approaches I was considering, which attempted to build up truth incrementally, would typically only get me predicative mathematics at best, which is not very much. Finally, recently, I specified my impredicative theory of truth, realising that impredicativity in itself suggests a design for putting a classical truth predicate within its own language, getting around Tarski's theorem (in a sense). This gives something much like 2nd-order logic.

So, having declared tentative success on Truth, let's return to the question of Meaning.

Interestingly, 2nd-order logic seems to contain every meaningful mathematical statement we've come up with so far... it can say anything which we would want to invent higher-order logic for. is logically complex enough that it doesn't fit in any hierarchies of difficulty, "past, present, or future". So, it makes a surprisingly good candidate for a logic of thought. (Higher-order logics may still be important, but are in some sense "merely convenient".)

There are basically two reasonable semantics for my logic. One: only the syntactically constructible predicates exist. This makes sense if we want the theory of truth to just be talking about the truth of stuff we can say. Two: all mathematically possible predicates exist. This turns my logic into a syntactic variant of 2nd-order logic.

Really, I'd like to find a way to keep the semantics usefully ambiguous. Many of the theorems for real numbers have analogous results for the computable real numbers. We might hope for some similar match between results for constructible predicates and the mathematically possible predicates. This would tell us that, to some extent, it doesn't matter which we pick.

My idea is that the predicate ambiguity represents extensible meaning. When we define the natural numbers, we say "The least set containing zero and closed under succession". The semantic question is, how do we interpret "set"? What sets exist? Answer #1 says that only sets we can define exist. Answer #2 says that we refer to a broader range of sets, which is intuitively maximal (contains all possible sets). What I'm saying is that we want to refer to "all sets we, or anyone else, might conceive of"; in other words, we don't just  want to include stuff we know how to define, but also things we might learn about later, or anything anyone might define.

Constructivists might object that this does not justify semantics #2 at all: even "all the predicates that might be defined" should be a countable set. I would counter that it is still uncountable in some sense: we solve Skolem's paradox by pointing out that "uncountable" is relative to a logical background. It simply means we can't put the predicates in one-to-one correspondence with the natural numbers using what we currently know how to say. Therefore constructivists should not be "fundamentally" unhappy with uncountability. Most constructivists I know will now argue that the powerset is "actually" countable, despite being apparently uncountable in this relative sense, but at this point they are actually relying on classical mathematics!

Yet, they should point out that I'm being disingenuous. I'm stating that the semantics of my 2nd-order quantifiers should be, in some sense, ambiguous, to allow for the broadest possible interpretation. It's true that, for any particular logic, the predicates definable in that logic will be uncountable within the logic. However, I'm claiming something more: that the quantifiers are ranging over predicates in some imaginary maximal logic (which very clearly does not exist). In doing so, I'm effectively claiming "true uncountability" instead of relative uncountability: the maximal notion of predicates would be uncountable in any logic. A constructivist might be OK so long as I'm honestly being ambiguous, but with the intent of being maximal? There is, arguably, no way of making sense of that (without some specific background theory defining "maximal"...).

Now would be a good time to point out that I'm only doing armchair philosophy here. There are many details of the discussion which would benefit greatly from a formal treatment. For example, I'm not explicitly defining what it means for my logic to claim that its own predicates are uncountable (and this is something which you should be suspicious about, since the theory of truth I offered doesn't directly allow us to ask such a question). I'd like to explore all this further, but it's just a hobby at this point, so I'm prioritising the fun part at the expense of hard work.

So, putting aside the justification, let's just admit that I want to talk about classical mathematics and go for the full 2nd-order semantics. The central question again:

How can we define rational belief about these things?

"Rational belief" could be interpreted in several ways; for the moment, let's formalise rational degrees of belief as probabilities. If you're a Bayesian, then, the question is how to define a prior probability distribution over 2nd-order logic.

I know how I want probabilities on 1st-order logic to work. How do we define a prior over logical beliefs? It makes sense to suppose that statements in 1st-order logic are being drawn at random to construct a theory. We update on evidence by requiring this theory to be consistent with everything we know. When we want to know the probability of a statement being true, we ask the probability that it occurs in such a theory.

It's possible to approximate this probability distribution arbitrarily well, because it is possible to detect inconsistency in 1st-order theories. The proof system corresponds exactly to the semantics, so all we need to do is look for proofs of inconsistencies and throw out inconsistent ones.

This approach doesn't obviously generalise. We can define randomly generated consistent theories for other systems, but we can't approximate these distributions by looking for proofs. Suppose we're dealing with first-order arithmetic. We want to know the probability of the Goldbach Conjecture. We consider randomly constructed theories, throwing out those which we can prove inconsistent. Unfortunately, there will be a finite probability that the statement "there exists an even number which is not the sum of two primes" gets generated. The probability of this statement will go down as we rule out theories which include this statement but imply falsifiable things; however, it may converge to a finite number greater than zero. The axioms of first-order arithmetic are incomplete, so there may be no proof of the conjecture, in which case randomly generated theories would have a positive probability of hypothesising a counterexample. This is referred to as an omega inconsistency.

Even for just the stuff in the arithmetic hierarchy, it is not possible to specify any convergent computation which will arrive at the truth in every case. (And again, 2nd-order logic goes well beyond the arithmetic hierarchy, and the analytic hierarchy as well.)

Yet, we can do better than just the 1st-order probability. For example, we can converge to the truth of the Goldbach conjecture just by checking examples. We could put the probability of the conjecture being false at 1/N, where N is the number of positive examples we have found so far. If we find a counterexample, of course, then we permanently put the probability of it being false to 1. This is very rough, but would be guaranteed to eventually get arbitrarily close to the correct answer, so long as we eventually check any particular number.

More realistically, we make a general rule: for an arbitrary universal statement about natural numbers, we would like to converge to probability 1 as we check examples and find no counterexample. It would make sense to converge slowly: we would expect the probability of finding a counterexample as we check more numbers to go down roughly like the probability of a Turing machine halting as we wait longer. (It's improbable that the first counterexample is extremely large, but not very improbable.)

A central problem for me is: how can we make that sort of behaviour come naturally out of a general definition of rational belief over an incomplete logic?

Just for the language of first-order arithmetic (and therefore the arithmetic hierarchy), it seems possible to define appropriate belief in terms of something like the case-checking I've described. This cannot arrive at the correct belief for everything, because nested quantifiers make it possible to quantify over undecidable properties (so that we cannot check cases); however, it seems reasonable to say that it's the best we can do. In cases where we cannot converge to the truth, we should simply ensure that we don't converge toward 1 or 0. (Note: I'm hiding a lot of work here. The case-checking is inadequate as stated; we would like to guarantee nice logical properties for the probability distribution, like if A->B, P(A)<=P(B). However, this seems "not so hard".)

However, impredicative 2nd-order quantification seems impossible to check "by cases" almost by definition: even assuming only definable predicates, we still have the circular truth conditions introduced by the impredicativity. This seems to make a solution unlikely.

Maybe there is a better way of thinking about it, though. Depending on how we take my idea that the space of predicates is supposed to be usefully ambiguous, we might prefer the Henkin semantics, which actually gets us back to the first-order case: we don't assume that only the definable sets exist, but those are the only ones we know to exist. The impredicative definitions are grounded in some (unknown) model, rather than deriving their truth values circularly from facts about themselves. This makes the semantics equivalent to first-order logic with extra axioms; we can handle it in the same way we handled probabilities over 1st-order beliefs.

This is too weak, of course. We wanted to be able to do reasonable probabilistic inference about the Goldbach conjecture. However, it may be a good place to start. How can we get the case-checking behaviour for arithmetic, from something similar to a Henkin semantics?

One approach would be to assert the existence of enumerated classes: classes generated by given constants and functions. For example, even before asserting any axioms about the natural numbers, we would have the class NAT: 0, s(NAT). (Since we haven't stated anything, it could be that S(0)=0, et cetera. Only the existence of the class is guaranteed.) These are unnecessary in crisp logic, having crisp 2nd-order equivalents. However, they define special probabilistic behaviour: enumerated checking.

These aren't needed in the syntax of the language; just the semantics. When we give the inductive definition for natural numbers (n such that: for all X, if X(0) and for all i, X(i)->X(s(i)), then X(n)), then NAT will simply be one instantiation of the 2nd-order quantification in the semantics. We will need to account for this in the probabilistic calculation: we can use randomly generated theories, as for first-order logic, but the probability of statements concerning NAT and other enumerated classes would follow special rules. Since the probability of the Goldbach conjecture stated with respect to NAT would become high, and it being true of NAT implies that it is true of the natural numbers as normally defined (because NAT contains zero and successors), the probability with respect to the natural numbers would have to be high as well. (We would use case-checking to find probabilities of omega-inconsistency, and then use that to reduce the significance of possibilities which appear omega-inconsistent.)

It's not exactly pretty, but at least it gets the job done. Making a special kind of class for specific probabilistic behaviour feels a bit awkward, raising the question of whether there are any other kinds of classes we're forgetting. It would be better if the result emerged more naturally from concerns of what probabilistic behaviours can be obtained; perhaps out of a universal distribution or something...