Friday, June 11, 2010

Goal Systems

Continues this.

It seems fairly clear that backwards-chaining should be seen as a result of goal-oriented behaviour, whereas forward-chaining should be more a result of the reactive reasoning layer. This thought arises from several considerations (including efficiency), but mainly from the idea that backtracking is simply an attempt to achieve a goal by trying different possible (sets of) subgoals in turn.

A simple production-rules-vs-logic-programming view would see the issue as two different ways of interpreting the conditional from propositional logic; A->B either means "if you have A, conclude B" (if you're a production rules person) or "if you want B, try A" (if you're a logic programming person). I'm denying that that is what's going on. A->B in the knowledge base means just what it means in the declarative logic, and doesn't by itself trigger any inferences. For inferences to be triggered, procedural knowledge must be present. Procedural knowledge can always be put in a series of if-then type statements which are executed by forward-chaining; both "if you have A, conclude B" and "if you want B, try A" are just examples of this.
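To make that concrete, here is a minimal sketch (everything here is hypothetical, not any particular system's API): a single forward-chaining loop over procedural rules, in which both readings of A->B are just two ordinary if-then rules executed by the same loop, while the declarative conditional itself triggers nothing.

```python
# Hypothetical sketch: one forward-chaining engine executing procedural rules.
# Both readings of the declarative fact A -> B are encoded as ordinary
# if-then rules; the conditional itself triggers no inference.

facts = {"A"}          # things we have
goals = {"B"}          # things we want
rules = []             # procedural knowledge: (condition, action) pairs

# Production-rules reading: "if you have A, conclude B".
rules.append((lambda: "A" in facts, lambda: facts.add("B")))

# Logic-programming reading: "if you want B, try A" (post A as a subgoal).
rules.append((lambda: "B" in goals, lambda: goals.add("A")))

changed = True
while changed:                       # fire rules to quiescence
    before = (len(facts), len(goals))
    for condition, action in rules:
        if condition():
            action()
    changed = (len(facts), len(goals)) != before

# facts is now {"A", "B"}; goals is now {"A", "B"}
```

Both behaviours fall out of the same forward-chaining execution mechanism; the difference lives entirely in the procedural rules, not in the logic.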

Now, for probabilistic-relational systems, goal-oriented reasoning is set within the framework of Bayesian utility theory. The mathematical theory is simple and elegant (although there have been refinements over time, such as causal decision theory, timeless decision theory, and updateless decision theory); but like pure probabilistic reasoning, it isn't immediately obvious how to efficiently implement/approximate it. Much of the relevant research for this comes from the domain of reinforcement learning; also, an algorithm that's generating a lot of research right now is Monte Carlo tree search. It seems reasonable to follow these threads of research for a goal-system implementation.

For me, it seems natural to replace the on/off subgoals & backtracking which appear in crisp systems with a more continuous goal structure. This is based on the analogy to the way the truth values become continuous in these systems, which may be a bit hasty. Still, here is my rough outline:

I've been considering for some time a system which allocates time to different reasoning processes based on a currency of "effort". Effort could take two forms: a longer-term effort would represent a percentage of processing time, while a shorter-term effort would represent an absolute amount of time. There would be long-ish cycles in which every goal that has long-term effort attached to it gets a corresponding amount of short-term effort assigned to it. A goal can then spawn subgoals and assign some of its effort to them.

This may or may not be a good idea. :) The shorter-term effort seems like a natural concept; it's been invented independently by two different people I've talked with (Ben Goertzel's group, and Russell Wallace). It seems useful for complex combinations of approximation algorithms that can be given more or less time to give better or worse answers, especially if they may employ each other or themselves in a loopy manner. Programs based on this sort of effort would be guaranteed to terminate in whatever amount of time they are given (though perhaps not with a very good answer).
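A rough sketch of the short-term-effort idea (all names and the particular anytime computation are my own hypothetical choices): each reasoning process receives a time budget, refines its answer until the budget runs out, and may delegate part of its remaining budget to a subgoal. Termination is guaranteed by the clock, not by convergence.

```python
import time

# Hypothetical sketch of "short-term effort": each process gets a time
# budget (seconds) and may hand part of it to a subgoal; the whole
# computation is guaranteed to stop when its budget is exhausted.

def refine(estimate, budget):
    """Anytime refinement: improve an estimate until the budget is spent.
    The refinement step here is Newton's method for sqrt(2), standing in
    for an arbitrary improvable computation."""
    deadline = time.monotonic() + budget
    while time.monotonic() < deadline:
        estimate = (estimate + 2.0 / estimate) / 2.0  # one refinement step
        remaining = deadline - time.monotonic()
        if remaining > budget / 2:
            # delegate half of the remaining effort to a "subgoal"
            estimate = refine(estimate, remaining / 2)
    return estimate

answer = refine(1.0, 0.01)   # 10 milliseconds of effort
```

Whatever the quality of the answer, `refine` returns by its deadline; giving it more effort can only improve the estimate.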

The longer-term type of effort is not so obviously useful; hard to say.

Wednesday, June 9, 2010

Procedural Logic

I've decided to start using the term "procedural logic" for what I called action logic in the previous post on this topic. This better reflects the desire to merge the procedural and declarative; also, it steers clear of logics already named "action logic."

To pick up roughly where I left off...

Tactical theorem provers such as HOL Light and Coq are the modern-day epitome of procedural logic, in some ways. They work by the simple lack-of-mechanism which I gave as the minimal procedural logic in the previous post: they are simply programmable theorem provers. "Tactics" (aka "strategies") are created, which are simply programs for guiding inference; tactics can use each other, so that in a weak sense they are like goals and subgoals. What's really nice about them is that they cannot be made to return an incorrect result, no matter how wild the tactic. What they lack compared to SOAR and PLANNER is:

  1. Reflexive inferences: SOAR automatically forward-chains for all rules in its KB. In a similar way, Prolog automatically backwards-chains for all programmed rules. PLANNER was designed to do both, with a declaration of which way a rule should be used upon its creation. (SOAR does something similar with a little work.) ACL2 probably has the most advanced facilities, from my cursory examination; there are many categories of possible reflexive inferences for different sorts of theorems, and the system has intelligent defaults which the user can override. However, the reflexive inferences are still clearly too simplistic sometimes; for example, the user can create an infinite loop just by proving both B = A and A = B... ACL2 will happily swap A for B indefinitely the next time either one appears in a statement.
  2. Truth Maintenance: Some facts are of the sort that can change. Under strict logic, we should index them by time to account for this, but it seems necessary for efficiency to ignore this and just change the facts directly. If that's the strategy, then we have a truth maintenance problem: we may need to update our entire KB to reflect the change, withdrawing any statements we proved based on the now-false information. In SOAR, this is reflected in the distinction between o-support and i-support: o-support represents that the rule is intended as making a change to the KB in firing, which cannot be undone except by another o-rule, whereas i-support represents that the rule's consequences should be undone immediately when the premises become false (thanks to an o-rule changing something). This is probably the most advanced form of this sort of distinction. PLANNER also addressed the issue, however. (That's why I called SOAR a "descendant of planner" in my previous post on this topic; since then, though, I've asked Paul Rosenbloom about it. He indicated that PLANNER did not actually influence SOAR in any significant way, so it looks like SOAR came up with the idea quite independently.)
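The rewrite-loop failure mode in #1 is easy to reproduce in a toy. Below is a deliberately naive rewriter (my own illustration, not ACL2's actual mechanism) that applies every known equality left-to-right; once both orientations of one equality are present, it swaps the two terms forever, so a step cap is needed just to observe the behaviour.

```python
# Toy illustration (not ACL2 itself): a naive rewriter that applies every
# known equality left-to-right. Having proved both B = A and A = B, it
# swaps the terms indefinitely; we cap the steps to expose the loop.

rewrites = [("B", "A"), ("A", "B")]   # both orientations of one equality

def simplify(term, max_steps=10):
    steps = 0
    while steps < max_steps:
        for lhs, rhs in rewrites:
            if term == lhs:
                term = rhs
                break
        else:
            return term, steps        # no rule applied: a genuine fixed point
        steps += 1
    return term, steps                # cap hit: the rewriter never settles

term, steps = simplify("A")
print(term, steps)   # → A 10 (the step cap is exhausted, not a fixed point)
```

A less simplistic reflexive-inference facility needs some loop protection, e.g. orienting equalities by a term ordering, rather than firing every rewrite unconditionally.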
So, the question becomes: what should a probabilistic relational system do about all this?

Paul Rosenbloom's probabilistic-relational rethink of SOAR (see his publication page) takes the position that #2 can emerge in a simple way from the sum-product belief propagation algorithm; basically, this amounts to re-estimating all the weights in working memory whenever a change is made by a rule in long-term memory. The system also deals with time in additional ways. As for #1, there is some divergence from SOAR's system, but nothing as sophisticated as ACL2. Paul Rosenbloom's proposal does not have a reason to focus on proofs via mathematical induction in the way ACL2 does, so it would not benefit from a translated version of ACL2's guidance system... perhaps it's reasonable to say that much of the sophisticated stuff in ACL2 is fairly special-purpose for its domain of application.

Having sum-product as a basic dynamic seems like a good idea, but at the same time, I'm thinking of a system which can reason about inference guidance even for that. For example, it seems reasonable to (A) weigh the sum-product algorithm's attention to specific areas of the belief network based on multiple factors, including which variables are currently undergoing the most change, and which variables are most critical for the goals; (B) augment the sum-product algorithm with more accurate algorithms in areas of the belief network where accurate estimation is critical. In the Rosenbloom system, (A) would have to be an addition to the basic level, but he has discussed (B) as something which might be implemented on top of the basic level using rules. The second situation is preferable, but at the same time, any "basic level" of reactive inference/decision will have some built-in assumptions. There is a way around this, however: dynamic replacement of inference strategies in the manner of Hutter's Algorithm. Hutter's algorithm is tremendously inefficient, so it would be nice to find a small-scale, incremental way of employing the same trick. (Also, Hutter's algorithm would need to add probabilistic reasoning about algorithm speed and correctness, to satisfy the desire for a fully probabilistic-relational system.)
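Idea (A) can be sketched with "residual" message scheduling: run loopy sum-product, but always spend the next unit of computation on the message that is currently changing the most. The tiny model below (a 3-node chain of binary variables with evidence on one end) and all names are hypothetical; it's a sketch of the scheduling idea, not of Rosenbloom's system.

```python
# Hedged sketch of idea (A): loopy sum-product where computation is focused
# on whichever message is currently changing the most (residual scheduling).
# Toy model: binary variables 0-1-2 in a chain, evidence on node 0,
# pairwise potentials favoring agreement between neighbors.

edges = [(0, 1), (1, 0), (1, 2), (2, 1)]               # directed messages
unary = {0: [0.9, 0.1], 1: [0.5, 0.5], 2: [0.5, 0.5]}  # evidence on node 0
pair = [[0.8, 0.2], [0.2, 0.8]]                        # agreement potential
msg = {e: [0.5, 0.5] for e in edges}                   # uniform init

def neighbors(i):
    return [s for (s, t) in edges if t == i]

def new_message(i, j):
    """Sum-product update for the message i -> j, normalized."""
    out = []
    for xj in (0, 1):
        total = 0.0
        for xi in (0, 1):
            prod = unary[i][xi] * pair[xi][xj]
            for k in neighbors(i):
                if k != j:
                    prod *= msg[(k, i)][xi]
            total += prod
        out.append(total)
    z = sum(out)
    return [v / z for v in out]

# Residual scheduling: update the message whose pending change is largest.
for _ in range(100):
    residuals = {(i, j): max(abs(a - b)
                             for a, b in zip(new_message(i, j), msg[(i, j)]))
                 for (i, j) in edges}
    (i, j), r = max(residuals.items(), key=lambda kv: kv[1])
    if r < 1e-6:
        break                         # quiescence: nothing is changing much
    msg[(i, j)] = new_message(i, j)

def belief(i):
    b = list(unary[i])
    for k in neighbors(i):
        b = [b[x] * msg[(k, i)][x] for x in (0, 1)]
    z = sum(b)
    return [v / z for v in b]
```

On this chain the evidence at node 0 propagates to node 2 (its belief leans toward state 0) after only a couple of message updates, because the scheduler ignores messages that aren't changing.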
To summarise that disorganised paragraph: I like the idea of using sum-product as the reactive-level reasoner, but it may not be what I'm after for procedural logic. As a disclaimer: worrying about abstract asymptotic universality stuff is fun, but it is not necessarily relevant to practical implementations. Sometimes I am more of a mathematician than a computer scientist. However, there are some recent indications that the theory of universal intelligence might actually guide practical implementations: a working, practically applicable, asymptotically universal approximation of AIXI, and a Levin-search-based automatic programmer that works on some toy problems.

In any case! The main point is, in probabilistic relational systems, truth maintenance gets replaced by some form of belief revision plus some way of dealing with time. (In OpenCog, I understand that part of the strategy will be to have the truth values of time-sensitive statements decay gradually as they become increasingly likely to be out of date.)
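The decay idea is simple enough to write down (the half-life parameter and function name here are my own illustrative choices, not OpenCog's actual mechanism): instead of retracting a time-sensitive statement, discount its confidence by how long ago it was last confirmed.

```python
# Hedged sketch of confidence decay (parameters hypothetical): the confidence
# attached to a time-sensitive statement decays exponentially with its age,
# so stale facts gradually stop dominating inference instead of being retracted.

def decayed_confidence(confidence, age_seconds, half_life=3600.0):
    """Confidence halves every `half_life` seconds since last confirmation."""
    return confidence * 0.5 ** (age_seconds / half_life)

c = decayed_confidence(0.9, 7200.0)   # a statement two half-lives old
```

Unlike crisp truth maintenance, nothing is ever withdrawn; belief revision happens continuously as confidences fade and get refreshed by new evidence.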

The other issue to address is how a goal system for a probabilistic-relational version should work, but I guess that's too big a topic for the moment... though honestly I started writing this post to talk about that :) Such is blogging.