If that's the case, I might as well manage my image... at least a little.
As such, it would be good for me to explain a bit about what I am doing and why I am doing it.
My major recurring rant has been referential completeness. The idea is that an intelligent machine should be able to represent any kind of pattern that a human can. This is motivated by the fact that it's impossible to learn a pattern that you can't represent in the first place.
This has led me to grapple with some difficult issues in logic and the foundations of mathematics, since Tarski's undefinability theorem shows that there is no referentially complete system (in a specific sense). Unsurprisingly, I have not come up with a general response to this problem (though I think Hartry Field has some interesting ideas). However, I do have one message which I'd like to get across to as many practitioners as possible:
Relational methods are needed for general intelligence.
(I'm sure I've written about this before, but I will aim to give a decent unified post on the subject here.)
Now, I'm specifically talking about probabilistic relational methods. I will not, however, be giving an argument for probability theory. (Perhaps I'll discuss that in another post.) Rather, I'll be concentrating on the question of whether relational methods are needed.
Many people are beginning to understand this point whether I preach it or not. However, a large amount of research still goes on at the level of propositional methods. This isn't a bad thing (better propositional methods are very useful), but people often don't examine the issue, and so end up believing that propositional methodology can lead to human-level intelligence.
So: what is the difference?
Propositional methods are so called by analogy to propositional logic. They deal with a fixed, finite number of variables, looking for predictive relationships such as correlations (or even causation) among them. Variables are "basic": they have no further breakdown. It is not possible to learn a correlational structure for some variables and then generalize it to other variables, because there is no information to suggest which variables might be analogous.
Relational methods use more information about the structure of a problem. The details vary from formalism to formalism, but variables result from instantiation. This might be the instantiation of a class of objects (in object-oriented thinking), or an instantiation of a predicate or relation (in first-order language). In either case, we see that variables are now "about something": we have entities which we are describing.
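To make the contrast concrete, here is a minimal sketch in Python (the domain, predicate names, and encoding are my own illustrative assumptions, not part of any particular formalism):

```python
# Propositional view: a fixed, flat set of variables with no internal structure.
# Nothing marks "alice_smokes" and "bob_smokes" as instances of the same property,
# so a pattern learned about one cannot transfer to the other.
propositional_variables = {
    "alice_smokes": True,
    "bob_smokes": False,
    "alice_bob_friends": True,
}

# Relational view: variables arise by instantiating predicates over entities.
# The structure (which predicate, which entities) is explicit, so anything
# learned about smokes(X) applies to every entity it can be instantiated with.
entities = ["alice", "bob", "carol"]

def smokes(person):
    """Ground atom: smokes(person)."""
    return ("smokes", person)

def friends(a, b):
    """Ground atom: friends(a, b)."""
    return ("friends", a, b)

ground_atoms = [smokes(p) for p in entities] + \
               [friends(a, b) for a in entities for b in entities if a != b]
```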
This distinction can be surprisingly difficult to get across at times. Advocates of propositional learning methods will typically say that the difference is "We are learning the relations, rather than giving them ahead of time". This confusion results from using the word "relation" to describe multiple things. Logically, we can split a domain into three distinct levels:
- Entities
- Entity Properties
- Probabilistic Generalizations about Properties
Entity properties include relationships of one entity to another. This is very different from a probabilistic relationship, no matter how complicated. Trying to form relations from correlations is a level-confusion.
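A toy rendering of the three levels, loosely in the flavor of a Markov-logic-style formalism (the entities, predicates, and weight here are invented for illustration):

```python
# Level 1: entities -- the things being described.
entities = ["alice", "bob"]

# Level 2: entity properties -- including relations of one entity to another.
# These are facts about particular entities, not statistical statements.
properties = {
    ("smokes", "alice"): True,
    ("friends", "alice", "bob"): True,
}

# Level 3: probabilistic generalizations about properties.
# A weighted, quantified rule: "friends of smokers tend to smoke."
# It quantifies over entities; it is not itself a property of any one entity.
generalizations = [
    ("friends(X, Y) and smokes(X) -> smokes(Y)", 0.8),
]
```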
This mistake is rarely made with crisp propositional logic (Boolean logic). The limitation there is very clear. Chess is an often-cited example. Stated in propositional logic, the rules of chess would run thousands of pages; however, they can be stated in first-order logic in just a few pages. Why? Because in first-order logic, we can generalize the rules based on the structure of the game. In propositional logic, we must re-state rules for each board location and each round of the game.
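To see the blow-up on a small scale, here is a toy sketch using only rook moves (the encoding is my own simplification, ignoring blocking pieces, turns, and everything else):

```python
# First-order style: one rule, quantified over squares, covers every rook move.
def rook_can_move(src, dst):
    """A single generalized rule: a rook moves along a rank or along a file."""
    (r1, c1), (r2, c2) = src, dst
    return (r1 == r2) != (c1 == c2)   # same rank xor same file; excludes staying put

# Propositional style: the "rules" are an enumeration of ground facts, one
# proposition per (source, destination) pair -- thousands of them for this
# single piece and this single rule.
squares = [(r, c) for r in range(8) for c in range(8)]
rook_move_propositions = {
    (src, dst): rook_can_move(src, dst)
    for src in squares for dst in squares if src != dst
}
print(len(rook_move_propositions))  # 4032 ground propositions for one rule
```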
For some reason, many people lose sight of the simple limitations of propositional methods when they begin to do more complicated things such as probabilistic reasoning.
Now, we can see one reason quite easily. I said that the rules of chess must be re-stated for each round of the game; i.e., they do not generalize over time. However, it is quite standard to generalize over time with propositional methods in machine intelligence. Only a fool would not do so! Indeed, if I point out any one dimension which an algorithm fails to generalize over, it's generally fairly easy to add that dimension to the system's generalization capability. As a result, it's easy to miss the general pattern: a propositional method needs to be fed, up front, the dimensions along which it's useful to generalize. It can't see the possible analogies by itself.
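One way to picture what "fed up front" means in practice: in a propositional setting, generalizing over time usually amounts to the designer tying parameters across time steps by hand. A small illustrative sketch (the names and the shape of the template are assumptions of mine):

```python
import numpy as np

T = 100             # time steps
num_features = 5

# The designer decides, in advance, that the same rule applies at every time
# step: one shared weight vector, reused at each t (parameter tying over time).
shared_weights = np.zeros(num_features)

def predict(observations):
    """observations: array of shape (T, num_features), one row per time step."""
    # The generalization over time is baked into the architecture. The learner
    # can never discover an analogy along a dimension the designer left out
    # (across board squares, objects, agents, ...).
    return observations @ shared_weights
```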
(One good way to identify the representational power of a system is to ask what kinds of analogies it is capable of making.)
Because the key idea of relational methods is structured regularity, it also seems easy for people to mistake various hierarchical methods for relational power. Deep belief networks, hierarchical temporal memory, and other similar methods identify more and more complicated sorts of patterns as we move up the hierarchy. (This is similar to the way cortical columns have been observed to identify increasingly abstract patterns as we move further from V1.) However, in the analogy to logic, this is like building up larger and larger propositional sentences. At no point does the system become capable of structured generalizations.
An astute reader may point out that a universal quantifier can be viewed as just a large conjunction, and an existential quantifier as a large disjunction. So long as the number of entities is finite, first-order logic really can be viewed as just propositional logic. This is true. However, learning in this form will take exponentially more data than learning via generalizations. Obviously, this approach forces us to separately re-learn every case of the general rule. We could only see it as creating a "generalization" after the fact; it is unable to use this generalization to speed learning.
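A quick sketch of that cost (the rule and entity counts are invented for illustration): grounding one quantified rule with k variables over n entities yields on the order of n^k separate propositional cases, each of which the propositional learner must see its own data for.

```python
from itertools import permutations

def ground_rule(entities):
    """Ground 'friends(X, Y) and smokes(X) -> smokes(Y)' into its propositional cases."""
    return [
        f"friends({x},{y}) and smokes({x}) -> smokes({y})"
        for x, y in permutations(entities, 2)
    ]

for n in (5, 50, 500):
    entities = [f"e{i}" for i in range(n)]
    print(n, len(ground_rule(entities)))   # 5 -> 20, 50 -> 2450, 500 -> 249500
```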
Along similar lines, the advocate of propositional methods may argue that humans must be essentially propositional, since we have a finite number of inputs and outputs. (This is related to arguing that the brain is obviously a finite-state machine.) Again, this kind of model does not seem to be a good account of the sorts of patterns people can actually learn to spot (and how quickly they can learn to spot them). Unfortunately, this would be a very long argument to spell out in detail. One example is the way people can perform on IQ tests. To my knowledge, the only way this performance can be replicated by computers is via analogical reasoning, which is a deeply relational approach.