## Monday, August 20, 2012

### Truth and AI

I've written extensively in the past about the relationship between foundations of mathematics and AGI; both on this and my previous blog, and in numerous posts to the AGI mailing list. I claimed that numerous problems in the foundations of mathematics needed to be solved before we could create true AGI.

An AGI should, with enough processing power and training, be able to learn any concept which humans can learn. However, in order to learn a concept, it needs to be able to represent that concept in the first place. So, if we find that an AGI system's internal representation can't handle some concept, even in principle, then we should extend it.

I called this the "expressive completeness" requirement. Logicians have some bad news for such a requirement: Tarski showed that any sufficiently powerful logic system is incapable of expressing its own semantics. This means there is at least one concept that can't be expressed, in any knowledge representation: the meaning of that representation.

This is related to Goedel's second incompleteness theorem, which says that we can never prove our own logic sound; any logic which can say of itself "all the results I derive are true" must be wrong about that!

Intuitively, this seems to indicate that whatever logic humans use, we won't be able to figure it out. A logic system can only understand weaker logic systems. This would suggest that we are doomed to forever ponder weak theories of mind, which are unable to account for all of human reasoning.

As a result, my hope for some time was to "get around" these theorems, to solve the expressive completeness problem. (This is not quite as hopeless as it sounds: the specific statements of the theorems do contain loopholes. The problem is to decide which assumptions are not needed.)

However, two years ago, I decided that the problem didn't really need to be solved. Here is the message I sent to the AGI list:

In the past I've done some grumbling about "expressive completeness" of an AGI knowledge representation, specifically related to Tarski's undefinability theorem, which shows that there is always something missing no matter how expressively powerful one's language is. Perhaps you remember, or perhaps you weren't around for it, but basically, I argued that for any given AGI system the Tarski proof could show where it had a representational hole: a concept that it could not even express in its representation.

Today I retract the worry and give a broad, somewhat tentative "thumbs up" to OpenCog, NARS, Genifer, LIDA, and any similar systems (at least insofar as logical completeness is concerned).

I still think the theory of logical completeness is important, and can bear important fruits, but at the moment it looks like its main result is just to say what many of you knew all along-- a "full steam ahead" on existing systems. I recognize that it's a hard sell to claim that we should do all that work to get the already-obvious answer.

Beyond that point, AGI researchers won't care all that much and I'm more doing some (albeit strange) philosophical logic.

The sketch of the result goes like this.

Jumps up the Tarski hierarchy of languages are fairly easy to justify, due to the various benefits that more logical power offers. These include speed of reasoning and more concise notation of certain concepts. Most AGI systems will be able to see these benefits and, if not explicitly endorsing the move *in their original logic*, will move toward stronger logics implicitly at their procedural level.

Worst-case, the system could replace itself "from the outside" by taking action in the external environment to modify itself or create a successor...

(Ideally, of course, it would be nice to have a "stable" system which explicitly accepted improvements in its initial logic.)

In conclusion, the best style of AGI I can recommend to achieve completeness is what I think of as "Ben's zen AGI approach" (apologies to both Zen and Ben Goertzel for any misrepresentation): give up your attachment to individual algorithmic approaches, and just take the best algorithm for the job. Put these good algorithms together in a blackboard-like environment where each can do its job well where it applies, and after a while, general intelligence will emerge from their co-authorship.

Recently, I have been thinking about these issues again. I think it is time for a bit more on this topic.

As I mentioned, we have some very strong limitative results in logic. To give a version of these results for AGI, we can talk about learning. Wei Dai gives a form of this result, calling it the unformalizability of induction.

A Bayesian learning system has a space of possible models of the world, each with a specific weight, the prior probability. The system can converge to the correct model given enough evidence: as observations come in, the weights of different theories get adjusted, so that the theory which is predicting observations best gets the highest scores. These scores don't rise too fast, though, because there will always be very complex models that predict the data perfectly; simpler models have higher prior weight, and we want to find models with a good balance of simplicity and predictive accuracy to have the best chance of correctly predicting the future.
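
To make the weighting concrete, here is a minimal sketch of such a learner over a tiny, hand-picked model space. The three models, their complexities in bits, and the `2 ** -complexity` prior are all assumptions chosen for illustration; nothing here is meant as a definitive formulation.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Model:
    name: str
    complexity_bits: int                         # hypothetical description length
    prob_one: Callable[[Sequence[int]], float]   # P(next bit = 1 | history)


def posterior(models: List[Model], data: Sequence[int]) -> List[float]:
    """Return normalized posterior weights after observing `data`."""
    # Simplicity prior (an assumption): each extra bit of complexity halves the weight.
    weights = [2.0 ** -m.complexity_bits for m in models]
    for t, bit in enumerate(data):
        history = data[:t]
        for i, m in enumerate(models):
            p1 = m.prob_one(history)
            # Multiply in the probability this model gave to what was actually seen.
            weights[i] *= p1 if bit == 1 else 1.0 - p1
    total = sum(weights)
    return [w / total for w in weights]


MODELS = [
    Model("fair coin", 1, lambda h: 0.5),
    Model("mostly ones", 2, lambda h: 0.9),
    Model("alternating", 3, lambda h: 0.95 if (not h or h[-1] == 0) else 0.05),
]

if __name__ == "__main__":
    for data in ([], [1], [1, 0] * 10):
        weights = posterior(MODELS, data)
        summary = ", ".join(f"{m.name}: {w:.4f}" for m, w in zip(MODELS, weights))
        print(f"{len(data):2d} observations -> {summary}")
```

With no data, or with a single observation, the simplest model keeps the largest weight; after twenty alternating bits the more complex "alternating" model takes over, because its predictive accuracy outweighs its smaller prior.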

Unfortunately, the space of models is necessarily incomplete. There exists a model which is intuitively fairly simple, but which necessarily does not exist in our Bayesian learner: the possibility that an "evil demon" (to borrow Descartes' idea) is fooling the agent. The demon performs the computations of the Bayesian update itself, to find the agent's probability distribution over the next observation, and then shows the agent whichever observation is least probable according to that distribution.

It is impossible that the agent converges to this model; therefore it must not exist in the space of models being considered.
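
The demon is easy to simulate against a small Bayesian mixture. The sketch below assumes binary observations and a hypothetical four-model space with made-up priors. The demon computes the agent's predictive probability and always emits the less likely bit, so the agent never gives the observed bit more than probability 0.5.

```python
from typing import Callable, List, Sequence, Tuple

Predictor = Callable[[Sequence[int]], float]  # returns P(next bit = 1 | history)

# Hypothetical model space: (name, prior weight, predictor). Priors sum to 1.
MODELS: List[Tuple[str, float, Predictor]] = [
    ("fair coin", 0.4, lambda h: 0.5),
    ("mostly ones", 0.2, lambda h: 0.9),
    ("mostly zeros", 0.2, lambda h: 0.1),
    ("repeat last bit", 0.2,
     lambda h: 0.5 if not h else (0.9 if h[-1] == 1 else 0.1)),
]


def run_demon(steps: int) -> Tuple[List[float], List[float]]:
    """Return (probability the agent gave each observed bit, final weights)."""
    weights = [prior for _, prior, _ in MODELS]
    history: List[int] = []
    assigned: List[float] = []
    for _ in range(steps):
        preds = [predict(history) for _, _, predict in MODELS]
        # The agent's mixture prediction that the next bit is 1.
        p_one = sum(w * p for w, p in zip(weights, preds))
        # The demon picks whichever bit the agent finds less probable.
        bit = 0 if p_one >= 0.5 else 1
        assigned.append(p_one if bit == 1 else 1.0 - p_one)
        # Ordinary Bayesian update on the bit that was shown, then renormalize.
        weights = [w * (p if bit == 1 else 1.0 - p) for w, p in zip(weights, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
        history.append(bit)
    return assigned, weights


if __name__ == "__main__":
    assigned, weights = run_demon(200)
    print("largest probability given to an observed bit:", max(assigned))
    for (name, _, _), w in zip(MODELS, weights):
        print(f"{name}: {w:.4f}")
```

No model in the space can describe the demon, since the demon's output depends on the very mixture those models are part of.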

Shane Legg used this idea to show that there is no "elegant universal theory of induction".

Of course, the practical concern is not really over evil demons; that hypothesis is hardly worrying in itself. The demon is just a way of showing that the model space is always incomplete. In practice, far more mundane theories fall outside the model space.

- AI based on hidden Markov models cannot learn hierarchical patterns such as nested parentheses, and will never learn to count.
- AI based on context-free grammars cannot learn context-dependent patterns, and will never learn the general rule of subject-verb agreement.
- AI based on bounded computational resources cannot learn the true model of the world if it requires more computing power than is available (but it generally does!).

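
The first item in the list above can be illustrated without any learning at all. The sketch below compares a recognizer with an unbounded counter against a hypothetical finite-state recognizer that has only `max_depth + 1` counting states. It is an illustration of the pigeonhole argument under those assumptions, not a proof about HMM training: whatever fixed number of states is chosen, some nesting depth exceeds it.

```python
def balanced_with_counter(s: str) -> bool:
    """Recognize balanced parentheses using an unbounded counter."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def balanced_finite_state(s: str, max_depth: int) -> bool:
    """Finite-state recognizer: states 0..max_depth, plus an implicit reject sink."""
    state = 0
    for ch in s:
        if ch == "(":
            if state == max_depth:
                return False  # out of states: deeper nesting cannot be remembered
            state += 1
        elif ch == ")":
            if state == 0:
                return False
            state -= 1
    return state == 0


def nested(n: int) -> str:
    """Return n opening parentheses followed by n closing ones."""
    return "(" * n + ")" * n


if __name__ == "__main__":
    for n in (3, 4, 50):
        print(n, balanced_with_counter(nested(n)), balanced_finite_state(nested(n), 3))
```
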
We know that Bayesian reasoning is the best option when the true model is within the space of models. But what happens when it is not?

Luckily, we can still say that Bayesian updates will cause beliefs to converge to the models which have the least KL-divergence with reality. For example, in the evil demon case, beliefs will go towards maximum entropy; the agent will treat reality as random. Given the structure of the situation, this is the best strategy to minimize incorrect predictions.
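
A small worked example of this convergence, under loudly stated assumptions: "reality" is stood in for by a deterministic bit sequence with exactly 70% ones, and the model space holds only three Bernoulli models, none of which has rate 0.7. Log-weights are used to avoid floating-point underflow. The model the posterior settles on is the one with the smallest KL-divergence from the 0.7 rate.

```python
import math
from typing import List, Sequence


def kl_bernoulli(p: float, q: float) -> float:
    """KL-divergence KL(Bernoulli(p) || Bernoulli(q)) in nats."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def log_posterior(params: Sequence[float], log_prior: Sequence[float],
                  data: Sequence[int]) -> List[float]:
    """Bayesian update over Bernoulli models, carried out in log space."""
    log_w = list(log_prior)
    for bit in data:
        for i, q in enumerate(params):
            log_w[i] += math.log(q if bit == 1 else 1 - q)
    # Normalize with a manual log-sum-exp.
    m = max(log_w)
    log_z = m + math.log(sum(math.exp(w - m) for w in log_w))
    return [w - log_z for w in log_w]


TRUE_RATE = 0.7                   # assumed "reality"; not in the model space
MODEL_PARAMS = [0.1, 0.5, 0.9]    # hypothetical, deliberately misspecified
DATA = [1, 1, 1, 0, 1, 1, 0, 1, 1, 0] * 100   # exactly 70% ones
LOG_PRIOR = [math.log(1 / 3)] * 3

log_post = log_posterior(MODEL_PARAMS, LOG_PRIOR, DATA)
kl = [kl_bernoulli(TRUE_RATE, q) for q in MODEL_PARAMS]
best_by_kl = min(range(len(kl)), key=kl.__getitem__)
best_by_posterior = max(range(len(log_post)), key=log_post.__getitem__)

if __name__ == "__main__":
    for q, d, lp in zip(MODEL_PARAMS, kl, log_post):
        print(f"rate {q}: KL = {d:.4f} nats, log posterior = {lp:.2f}")
    print("same winner:", best_by_kl == best_by_posterior)
```

Here the winner is the fair-coin model, which is also the maximum-entropy model in this space.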

However, there is still more to worry about. In my email two years ago, I claimed that an AI system could see the benefits of a more inclusive model space, and modify itself to include what it might be missing. It would be good if something like this could be made more formal and proven.

For example, is it possible for an agent based on hidden Markov models to learn to count if we allow it a 'cognitive workspace' which it is expected to learn to use? We can imagine an RL agent which works by doing HMM learning and then HMM planning to maximize reward. We augment it by giving it a memory containing a stack of symbols, and actions to push symbols to memory and pop symbols out of memory. The symbol popped out of memory is presented as an observation to the system. Can the agent learn to use memory to its benefit? Can it now learn pushdown automata, rather than only learning models with finite state-space?
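
To pin down the setup, here is a minimal sketch of such a workspace, together with a hand-written policy that uses it to track nested parentheses. The class and function names are hypothetical, and the policy is written by hand rather than learned; it only shows what the agent would have to discover.

```python
from typing import List, Optional


class StackWorkspace:
    """External memory: push and pop are actions, and a pop yields an observation."""

    def __init__(self) -> None:
        self._stack: List[str] = []

    def push(self, symbol: str) -> None:
        self._stack.append(symbol)

    def pop(self) -> Optional[str]:
        """Return the popped symbol as an observation, or None if memory is empty."""
        return self._stack.pop() if self._stack else None


def handwritten_policy(tokens: str, workspace: StackWorkspace) -> bool:
    """Decide whether `tokens` is balanced, using only push and pop actions."""
    for tok in tokens:
        if tok == "(":
            workspace.push("(")       # remember an open parenthesis
        elif tok == ")":
            if workspace.pop() is None:
                return False          # nothing left to match against
    # Balanced only if memory is empty once the input ends.
    return workspace.pop() is None


if __name__ == "__main__":
    for s in ("(()())", "(()", ")(", "(" * 50 + ")" * 50):
        print(repr(s[:12]), handwritten_policy(s, StackWorkspace()))
```

Note that each push pays off only at its matching pop, which can come arbitrarily many steps later.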

I think the answer is no. Because the agent does explicit HMM planning, it will not perform an action unless the HMM predicts a benefit in the future. This makes it very difficult for the agent to learn when to push symbols, because it would need to be able to look ahead to the time when they are popped; but the whole point is that the HMM cannot do that.

This suggests that my conclusion in the old email was wrong: reasonable systems may reject obvious-seeming self-improvements, due to lack of imagination.

Taking this as an example, it seems like it may be a good idea to look for learning methods which answer "yes" to these kinds of questions.