Yet, in some ways at least, the field is headed in the opposite direction.
I've often discussed the Chomsky hierarchy, and how most techniques at present fall very low on it. I've often discussed hierarchies "above" the Chomsky hierarchy; hierarchies of logic & truth, problems of uncomputability and undefinability. Reaching for the highest expression of form, the most general notion of pattern.
Machine learning has made artificial intelligence increasingly practical. Yet, the most practical techniques are often the least expressively powerful. Machine learning flourished once it abandoned the symbolic obsession of GOFAI. Fernando Pereira famously said: "The older I get, the further down the Chomsky Hierarchy I go."
There's a good reason for this, too. Highly structured techniques like logic induction and genetic programming (both of which would go high in the hierarchy) don't scale well. Commercial machine learning is large-scale, and increasingly so. I mentioned this in connection with word2vec last time: "Using very shallow learning makes the technique faster, allowing it to be trained on (much!) larger amounts of data. This gives a higher-quality result."
The "structure" I'm referring to provides more prior bias, which means more generalization capability. This is very useful when we want to come to the correct conclusion using small amounts of data. However, with more data, we can cover more and more cases without needing to actually make the generalization. At some point, the generalization becomes irrelevant in practice.
Take XML data. You can't parse XML with regular expressions.1 Regular expressions are too low on the Chomsky hierarchy to form a proper model of what's going on. However, for the Large Text Compression Benchmark, which requires us to compress XML data, the leading technique is the PAQ compressor. Compression is equivalent to prediction, so the task amounts to making a predictive model of XML data. PAQ works by constructing a probabilistic model of the sequence of bits, similar to a PPM model. This is not even capable of representing regular expressions. Learning regular expressions is like learning hidden markov models. PPM allows us to learn fully observable markov models. PAQ learns huge markov models that get the job done.
The structure of XML requires a recursive generalization, to understand the nested expressions. Yet, PAQ does acceptably well, because the depth of the recursion is usually quite low.
You can always push a problem lower down on the hierarchy if you're willing to provide more data (often exponentially more), and accept that it will learn the common cases and can't generalize the patterns to the uncommon ones. In practice, it's been an acceptable loss.
Part of the reason for this is that the data just keeps flowing. The simpler techniques require exponentially more data... and that's how much we're producing. It's only getting worse:
|Has Big Data Made Anonymity Impossible? MIT Technology Review|
I can't help but agree. Yet, it seems the tide of history is against us. As the amount of data continues to increase, dumb techniques will achieve better and better results.
Will this trend turn around at some point?
Gary Marcus points out that some information just isn't available on the web. Yet, this is a diminishing reality. As more and more of our lives are online (and as the population rises), more and more will be available in the global brain.
Artificial intelligence is evolving into a specific role in that global brain: a role which requires only simple association-like intelligence, fueled by huge amounts of data. Humans provide the logically structured thoughts, the prior bias, the recursive generalizations; that's a niche which machines are not currently required to fill. At the present, this trend only seems to be increasing.
Should we give up structured AI?
I don't think so. We can forge a niche. We can climb the hierarchy. But it's not where the money is right now... and it may not be for some time.
1: Cthulhu will eat your face.