In Search of Logic: February 2016

In my post on intuitionistic intuitions, I discussed my apparent gradual slide into constructivism, including some reasons to be skeptical about classical notions of truth. After a conversation with Sam Eisenstat, I've become less skeptical once again - mainly out of renewed hope that the classical account of truth, the correspondence theory, can be made mathematically rigorous.

Long-time readers might notice that this is a much different notion of truth than I normally talk about on this blog. I usually talk about mathematical truth, which deals with how to add truth predicates to a logical language, and has a lot to do with self-reference. Here, I'm talking about empirical truth, which has to do with descriptions of the world derived from observation. The two are not totally unrelated, but how to relate them is a question I won't deal with today.

Pragmatism

At bare minimum, I think, I accept a pragmatist notion of truth: beliefs are useful, some more than others. A belief is judged in terms of what it allows us to predict and control. The pragmatist denies that beliefs are about the world. Thinking in terms of the world is merely convenient.

How do we judge usefulness, then? Usefulness can't just be about what we think is useful, or else we'll include any old mistaken belief. It seems as if we need to refer to usefulness in the actual world. Pragmatism employs a clever trick to get around this. Truth refers to the model that we would converge to upon further investigation:

The real, then, is that which, sooner or later, information and reasoning would finally result in, and which is therefore independent of the vagaries of me and you. Thus, the very origin of the conception of reality shows that this conception essentially involves the notion of a COMMUNITY, without definite limits, and capable of an indefinite increase of knowledge. (Peirce 1868, CP 5.311).

This is saying that truth is more than just social consensus; it's a kind of idealized social consensus which would eventually be reached. The truth isn't what we currently think we believe, but we are still the ultimate judge.

If we make a mathematical model out of this, we can get some quite interesting results. Machine learning is full of results like this: we connect possible beliefs with a loss function which tells us when we make an error and how much it costs us to make different kinds of errors, and then prove that a particular algorithm has bounded regret. Regret is the loss relative to some better model; bounded regret means the total loss will not be too much worse than that of the best model in some class of models.

The ideal model is one with minimum loss; this is the model which we would assent to after the fact, which we'd want to tell to our previous self if we could. Since we can't have this perfect belief, the principle of bounded regret is a way to keep the damage to an acceptably low level. This might not be exactly realistic in life (the harms of bad beliefs might not be bounded), but at least it's a useful principle to apply when thinking about thinking.

Bayesianism

The way I see it, Bayesian philosophy is essentially pragmatist. The main shift from bounded-regret type thinking to Bayesian thinking is that Bayesians are more picky about which loss function is employed: it should be a proper scoring rule, ideally the logarithmic scoring rule (which has strong ties to information theory).

Bayesianism has a stronger separation between knowledge and goals than pragmatism. Pragmatism says that the aim of knowledge is to predict and manipulate the world. Bayesianism says "Wait a minute... predict... and manipulate. Those two sound distinct. Let's solve those problems separately." Knowledge is about prediction only, and is scored exclusively on predictive accuracy. Bayesian decision theory distinguishes between the probability function, P(x), and the utility function, U(x), even though in the end everything gets mixed together in the expected value.

Perhaps the reason this separation is so useful is that the same knowledge can be useful toward many different goals. Even though it's easy to find knowledge which doesn't fit this pattern (which is more useful for some goals than others), the abstraction is useful enough to persist because before you've solved a problem, you can't predict which pieces of knowledge will or won't be useful -- so you need a usefulness-agnostic notion of knowledge to some extent.

I think many Bayesians would also go further, saying truth has to do with a map-territory distinction and so on. However, this concept doesn't connect very strongly with the core of Bayesian techniques. A pragmatic notion of truth ("all models are wrong, but some are useful") seems to be closer to both theory and practice.

Still, this is an extremely weak notion of "truth". There doesn't need to be any notion of an external world. As in the pragmatist view quoted earlier, "the real" is a sort of convergence of belief. Knowledge is about making predictions, so all belief is fundamentally about sense-data; the view can be very solipsistic.

External Things

If all we have access to is our direct sense-data, what does it mean to believe in external things? One simplistic definition is to say that our model contains additional variables beyond the sense-data. In statistics, these are called "hidden variables" or "latent variables": stuff we can't directly observe, but which we put in our model anyway. Why would we ever do this? Well, it turns out to be really useful for modeling purposes. Even if only sense-data is regarded as "real", almost any approach will define probabilities over a larger set of variables.

This kind of "belief in external objects" is practically inevitable. Take any kind of black-box probability distribution over sense-data. If you open it up, there must be some mechanics inside; unless it's just a big look-up table, we can interpret it as a model with additional variables.

The pragmatist says that these extra beliefs inside the black box are true to the extent that they are useful (as judged in hindsight). The realist (meaning, the person who believes in external things) responds that this notion of truth is insufficient.

Imagine that one of the galaxies in the sky is fake: a perfect illusion of a galaxy, hiding a large alien construct. Further, let's suppose that we can never get to that galaxy with the resources available to us. Whatever is really there behind the galaxy-illusion has the mass of a galaxy, and everything fits correctly within the pattern of surrounding galaxies. We have no reason to believe that the galaxy is fake, so we will say that there's a galaxy there. This belief will never change, no matter how much we investigate.

For the pragmatist, this is fine. The belief has no consequence for prediction or action. It's "true" in every respect that might ever be important. It still seems to be a false belief, though. What notion of truth might we be employing, which marks this belief false?

I think part of the problem is that from our perspective, now, we don't know which things we will have an opportunity to observe or not. We want to have correct beliefs for anything we might observe. Because we can't really quantify that, we want to have correct beliefs for everything.

This leads to a model in which "external things" are anything which could hypothetically be sense-data. Imagine we have a camera which we can aim at a variety of things. We're trying to predict what we see in the camera based on where we swing it. We could try to model the flow of images recorded by the camera directly. However, this is not likely to work well. Instead, we should built up a 3D map of the environment. This map can predict observations at a much larger variety of angles than we have ever observed. Some of these will be angles that we could never observe -- perhaps places too high for us to lift the camera to, or views inside small places we cannot fit the camera. We won't have enough data to construct a 3D model for all of those non-possible angles, either; but, it makes sense to talk (speculatively) about what would be there if we could see it.

This is much more than the "black box" model mentioned earlier, where we look inside and see that there are some extra variables doing something. Here, the model itself explicitly presents us with spurious "predictions" about things which will never be sense-data, as a natural result of attempting to model the situation.

I think "external things" are like that. We use models which provide explicit predictions for things, outside of any particular idea of how we might ever measure those things. Like the earlier argument about Bayesians separating P(x) from U(x), this is an argument from convenience: we don't know ahead of time which things will be observable or how, so it turns out to be very useful to construct models which are agnostic to this.

Correspondence

The correspondence theory of truth is the oldest, and still the most widely accepted. We're now in a position to outline and defend a correspondence theory, but first, I'd like to expand a bit more on the concerns I'm trying to address (which I described somewhat in intuitionistic intuitions).

The Problem to Be Solved

According to the correspondence theory, truth is like the relationship between a good map and the territory it describes: if you understand the scale of the map, the georgraphic area it is representing, and the meaning of the various symbols employed by the map, you can understand where things are and navigate the landscape. If you are checking the map for truth, you can travel to areas depicted on the map and check whether the map accurately records what's there.

The correspondence theory of truth says that beliefs are like maps, and reality is the territory being described. We believe in statements which describe things in the external world. These beliefs are true when the descriptions are accurate, and false when inaccurate. Seems simple, right?

I've been struggling with this for several reasons:

The Bayesian understanding of knowledge has no obvious need for map-territory correspondence. If "truth" lacks information-theoretic relevance, why would we talk about such a thing?
Given perfect knowledge of our human beliefs, and perfect knowledge of the external world, it's not clear that a single correct correspondence can be found to decide whether beliefs are true or false. A map which must be graciously interpreted is no map at all; take any random set of scribbles and you can find some interpretation in the landscape.
Even if we can account for that somehow, it's not clear what "territory" our map should be corresponding to. Although the universe contains many "internal" views (people observing the universe from the inside), it's not clear that there is any objective "external" view to compare things to. To understand such a view, we would have to imagine an entity sitting outside of the universe and observing it.

Point #1 was largely what I've been addressing in the essay up to this point. I propose that "truth" is a notion of hypothetical predictive accuracy, over a space of things which are in-principle observable, even if we know we cannot directly observe those things. We use truth as a convenient "higher standard" which implies good predictive accuracy. This ends up being useful because in practice we don't know beforehand what variables will be observable or closely connected with observation. The hypothesis of an external world has been validated again and again as a predictor of our actual observations.

In order to address point #2, we need a mathematically objective way of determining the correspondence by which to judge truth. In our story about maps and territories, a human played an important role. The human interprets the map: we understand the correspondence, because we can check whether the map is true. It's not possible for the correspondence to be contained in the map; no matter how much is written on the map to indicate scale, meaning of symbols, and so on, a human needs to understand the symbolic language in which such an explanation is written. This metaphor breaks down when we attempt to apply it to a human's beliefs. It seems that the human, or at least the human society at large, needs to contain everything necessary to interpret beliefs.

Point #3 will similarly be addressed by a formal theory.

Now for some formalism. I'll keep things fairly light.

Solution Sketch

Suppose we observe a sequence of bits. We call these the observable variables. We want to predict that sequence of bits as accurately as possible, by choosing from a set of possible models which make different predictions. However, these models may also predict hidden variables which we never observe, but which we hypothesize.

Definition. A model declares a (possibly infinite) collection of boolean-valued variables, which must include the observation bits. It also provides a set of functions which determine some variables from other variables, possibly probabilistically. These functions must compose into a complete model, IE, giving probabilities to all of the variables in the model. A model with only the observed variables is called a fully observable model; otherwise, a model is a hidden-variable model.

Note that because the global probability distribution is made of local functions which get put together, we've got more than just a probability distribution; we also have a causal structure. I'll explain why we need this later.

(I said I'd keep this light! More technically, this means that the sigma-algebra of the probability distribution can contain events which are distinct despite allowing the same set of possible observation-bit values; additionally, a directed acyclic graph is imposed on the variables, which determines their causal structure.)

Any hidden-variable model can be transformed into a fully observable model which makes the exact same predictions, by summing out the hidden variables and computing the probability of the next observation purely from the observations so far. Why, then, might an agent prefer the hidden-variable version? My answer is that adding hidden variables can be computationally more convenient for a number of reasons. Although we can always specify a probability distribution only on the history, there will usually be intermediate variables which could be useful to many predictions. These can be stored in hidden variables.

Consider again the example of a camera moving around a 3D environment. It's possible to try and predict the observations for a new angle as a pure function of the history. We would try to grab memories from related angles, and then construct a prediction on the fly from those; sometimes simple interpolation would give a good enough prediction, but in general, we need to build up a 3D model. It's better to keep the 3D models in memory and keep building them up as we receive more observations, rather than trying to create them from sensory memories on the fly.

Is this an adequate reason for believing in an external world? I'm implying that our ontology is a rather fragile result of our limited computational resources. If we did not have limited computing power, then we would never need to postulate hidden variables. Maybe this is philosophically inadequate. I think this might be close to our real reason for believing in an external world, though.

Now, how do we judge which model is true?

In order to say that a model is true, we need to compare it to something. Suppose some model R represents the real functional structure of the universe. Unlike other models, we require that R is fully deterministic, giving us a fixed value for every variable; I'll call this the true state, S. (It seems to me like a good idea to assume that reality is deterministic, but this isn't a necessary part of the model; the interested might modify this part of the definition.)

Loose Definition. A model M is meaningful to the extend that it "cuts reality at the joints", introducing hidden variables whose values correspond to clusters in the state-space of R. The beliefs about the hidden variables in M are true to the extend that they pick out the actual state S.

Observations:

We're addressing issue #3 by comparing a model to another model. This might at first seem like cheating; it looks like we're merely judging one opinion about the universe via some other opinion. However, the idea isn't that R is supplied by some external judge. Rather, R is representing the actual state of affairs. We don't ever have full access to R; we will never know what it is. All we can ever do is judge someone else's M by our probability distribution over possible R. That's what it looks like to reason about truth under uncertainty. We're describing external reality in terms of a hypothetical true model R; this doesn't mean reality "is" R, but it does assume reality is describable.
This is very mathematically imprecise. I don't know yet how to turn "cut reality at the joints" into math here. However, it seems like a tractable problem which might be addressed by tools from information theory like KL-divergence, mutual information, and so on.
Because R determines a single state S, the concept of "state-space" needs to rely on the causal structure of R. Correspondence between M and R needs to look something like "if I change variable x in R, and change variable y in M, we see the same cascade of consequences on other corresponding states in both models."
The notion of correspondence probably needs to be a generalization of program equivalence. This suggests truth is uncomputable even if we have access to R. (This is only a concern for infinite models, however.)
The correspondence between M and R is an interpretation of our mental states as descriptions of states of reality.
In order to be valid, the correspondence needs to match the observable variables in M with the observable variables in R. Other variables are matched based on the similarity of the causal structure, and the constraint imposed by exactly matching the observable variables.

To show that my loose definition has some hope of being formalized, here is a too-tight version:

Strict Definition. A model M is meaningful if there is a unique injection of the variables of M into those of R which preserves the causal structure: if x=f(y,z) in M, and the injection takes x,y,z to a,b,c, then we need a=g(b,c,...) in R. (The "..." means there can be more variables which a depends on.) Additionally, observable variables in M must be mapped onto the same observable variables in R. Furthermore, the truth of a belief is defined by the log-loss with respect to the actual state S.

This definition is too strict mainly because it doesn't let variables in our model M stand for clusters in R. This means references to aggregate objects such as tables and chairs aren't meaningful in this definition. However, it does capture the important observation that we can have a meaningful and true model which is incomplete: because the correspondence is an injection, there may be some variables in R which we have no concept of in M. This partly explains why our beliefs need to be probabilistic; if the variable "a" is really set by the deterministic function a=g(b,c,d,e) but we only model "b" and "c", our function x=f(y,z) can have apparently stochastic behavior.

Let's take stock of how well we've addressed the three concerns I listed before.

#1: Bayesianism doesn't need map-territory correspondence.

I think I've given a view that a pragmatist can accept, but which goes beyond pragmatism. Bayesian models will tend to talk about hidden variables anyway. This can be pragmatically justified. A pragmatist might say that the correspondence theory of truth is a feature of the world we find ourselves in; it could have been that we can model experience quite adequately without hidden variables, in which case the pragmatist would have rejected such a theory as confused. Since our sensory experience in fact requires us to postulate hidden variables to explain it very well, the correspondence theory of truth is natural to us.

#2: It's not clear that there will be a unique correspondence. A map which must be interpreted generously is not a good map.

I think I've only moderately succeeded in addressing this point. My strict definition requires that there is a unique injection. Realistically, this seems far too large a requirement to impose upon the model. Perhaps a more natural version of this would allow parts of a model to be meaningless, due to having non-unique interpretations. I think we need to go even further, though, allowing degrees of meaning based on how vague a concept is. Human concepts will be quite vague. So, I'm not confident that a more worked-out version of this theory would adequately address point #2.

#3: It's not clear that there IS an objective external view of reality to judge truth by.

Again, I think I've only addressed this point moderately well. I use the assumption that reality is describable to justify hypothesizing an R by which we judge the truth of M. Is there a unique R we can use? Most likely not. Can the choice of R change the judgement of truth in M? I don't have the formal tools to address such a question. In order for this to be a very good theory of truth, it seems like we need to be able to claim that some choices of R are right and some are wrong, and all the right ones are equivalent for the purpose of evaluating M.

In Search of Logic

Sunday, February 28, 2016

The Correspondence Theory