Aside: my previous post, this one, and perhaps the next one or two do not represent me "thinking aloud" as my posts usually do... instead, I'm summarising thoughts that I had over the semester, but felt too busy to blog about.
So. How well can we do with just a single layer of probability, over a standard logic? How can we achieve something similar to Solomonoff induction?
One idea is to imagine that the universe we are observing was constructed by randomly writing down sentences in logic to determine each fact, taking care never to write something which contradicts what's already been written. For example, we can use a length-based prior to choose each sentence at random from first-order logic. This prior obviously contains the Solomonoff prior ("is dominant over", as they say), since computable hypotheses can be written down as first-order theories; but it's not clear to me whether the relationship goes the other way ("is dominated by"), so that the two are roughly equivalent. I'm thinking the prior I'm specifying is "even more uncomputable", since we have to check theories for consistency.
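To make the generative story concrete, here's a minimal Python sketch of that process, with some loud caveats: `sample_sentence` draws raw strings rather than well-formed formulas, and `is_consistent` is a toy stand-in for a bounded consistency search (real consistency checking is undecidable, which is exactly the "even more uncomputable" part). All the names are mine, for illustration only.

```python
import random

def sample_sentence(alphabet, halt_prob=0.1):
    """Draw a string under a length-based prior: each extra symbol
    costs a constant factor of probability, so Pr(sentence) decays
    geometrically with length.  (A real version would sample
    well-formed formulas from a grammar, not raw strings.)"""
    chars = []
    while True:
        if chars and random.random() < halt_prob:
            return "".join(chars)
        chars.append(random.choice(alphabet))

def is_consistent(theory):
    """Toy stand-in for consistency checking: flag only syntactically
    explicit contradictions (a sentence alongside its direct negation).
    A real system would run a bounded proof search instead, and even
    then could only approximate the answer."""
    sentences = set(theory)
    return not any(("~" + s) in sentences for s in sentences)

def sample_universe_rules(alphabet, n_sentences):
    """Randomly write down the 'book of rules' for a universe:
    draw sentences one at a time, keeping each only if it doesn't
    contradict what has already been written."""
    theory = []
    while len(theory) < n_sentences:
        s = sample_sentence(alphabet)
        if is_consistent(theory + [s]):
            theory.append(s)
    return theory
```

With a negation symbol like "~" in the alphabet, `sample_universe_rules("pq~", 5)` produces theories in which no sentence directly negates an earlier one.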
This prior lends itself to an interesting approximation scheme, though (sketched in code after the list):
- Each sentence has a log-probability at least proportional to the negative of its length (the chance of it being written down directly under the length prior).
- To this, we can add some probability mass for each consistent sequence of sentences we find that implies the sentence in question. (To test for "consistency" we just search for a contradiction for some reasonable amount of time, perhaps trying harder later if it's important.)
- Normalise this probability by dividing by (1 - the probability of drawing an inconsistent sequence at random), which we estimate from the cumulative probability of the inconsistent sequences identified so far.
- This gives a rough lower bound for the probability. To get an upper bound, we apply the same technique to the negation of the sentence: one minus the negation's lower bound is an upper bound for the sentence itself.
- If A implies B, then B is at least as probable as A (i.e., it can inherit A's lower bound). Likewise, A is at most as probable as B (inheriting B's upper bound). In other words, we can use existing ideas about propagating probability intervals to work with these estimates if we like.
- To update on evidence, we use compatibility with that evidence as an extra criterion for the consistency of sequences of sentences. Intuitively, this throws out a bunch of possibilities (and increases the amount of normalisation by doing so, which increases the probability of the remaining possibilities).
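Here's a rough Python sketch of the interval bookkeeping above. To be clear about what's assumed: the support masses and the inconsistent mass are whatever a bounded consistency search has accumulated so far, and the function names are my own illustration, not an established algorithm.

```python
from typing import Iterable, Tuple

Interval = Tuple[float, float]  # (lower, upper) bounds on a probability

def lower_bound(base_prob: float,
                support_masses: Iterable[float],
                inconsistent_mass: float) -> float:
    """Rough lower bound on Pr(sentence): the length prior's baseline
    for the sentence itself, plus the prior mass of each consistent
    sequence found so far that implies it, renormalised by the mass
    known to be wasted on inconsistent sequences."""
    raw = base_prob + sum(support_masses)
    return raw / (1.0 - inconsistent_mass)

def interval(base_prob, support_masses,
             neg_base_prob, neg_support_masses,
             inconsistent_mass) -> Interval:
    """Bracket Pr(sentence): the lower bound as above, and an upper
    bound equal to one minus the same lower bound for the negation."""
    lo = lower_bound(base_prob, support_masses, inconsistent_mass)
    hi = 1.0 - lower_bound(neg_base_prob, neg_support_masses,
                           inconsistent_mass)
    return lo, hi

def propagate_implication(a: Interval, b: Interval) -> Tuple[Interval, Interval]:
    """Given that A implies B: B inherits A's lower bound, and
    A inherits B's upper bound."""
    (lo_a, hi_a), (lo_b, hi_b) = a, b
    return (lo_a, min(hi_a, hi_b)), (max(lo_a, lo_b), hi_b)

# Made-up numbers, just to show the shape of the computation:
a = interval(0.001, [0.010, 0.005], 0.001, [0.020], inconsistent_mass=0.3)
print(a)  # a very wide interval, roughly (0.023, 0.970)
```

Even with a few supporting sequences found, the interval stays wide until a lot of the prior mass has been accounted for, which is exactly the worry below.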
Unfortunately, these bounds would tend to be very wide, I think; the system would have to do a lot of thinking to narrow them down. A more generous approximation might instead find one model and stick with it until it's disproved or a better model is found, say. This would at least give the system something to work with.
Now that we have a little bit of foundation in place, I think we can more usefully consider what to do if we want higher-order probabilities. Just as we can randomly choose sentences of first-order logic, we can randomly choose sentences of any system of probabilistic models we like (such as BLP, BLog, et cetera). This allows us to use the inference algorithms, and so on, from our favourite formalism. As long as first-order logic is a special case of the formalism, the resulting prior will be just as universal.
For example, we could perfectly well use a prior based on random theories stated with the (objective-)probability-theoretic quantification which I was discussing in the previous posts. If we do this, we basically solve the puzzles I was running up against there: the inference from objective probabilities over classes to subjective probabilities over individuals is justified by the regularity assumption in the prior, i.e., by the idea that the objective probability might be an "essential feature" of the class... something that was written down in the book of rules for the universe before the individual cases were decided.
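In symbols (my notation, not anything established in those posts): suppose a randomly drawn theory $T$ asserts that the objective probability of property $F$ over class $C$ is $\theta$, fixed in the "book of rules" before any individual case is decided. Then for a particular individual $a \in C$,

$$\Pr(F(a) \mid T) = \theta, \qquad \Pr(F(a)) = \sum_{T} \Pr(T)\, \Pr(F(a) \mid T),$$

so the subjective probability for the individual is just the prior-weighted average of the objective probabilities that the candidate theories assign to its class.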