A normal axiomatic logic pins down which inferences are permissible, but makes no comment on how those inferences are to be made. For real implementations, we've come up with a plethora of inference strategies -- however, the axiomatic system is always treated as separate from the inference strategies. My idea of action logic is to merge the two, hopefully in order to do nifty things like letting the system reason about its own inference strategies in a smooth way.

In other words, what we want is to nicely integrate logic and programming.

Several attempts to do this already exist. One example is Prolog. Prolog does not, however, satisfy the desire to allow deductive reasoning to determine the flow of inference; Prolog just performs a backtracking resolution algorithm on Horn clauses, paying some heed to meta-logical control-flow statements that can be inserted.

However, I think we can take some inspiration from a predecessor of Prolog, known as Planner. Planner worked through a backtracking mechanism similar to that of Prolog, but more expressive. It used a separate notion of "goal" and "fact" to drive reasoning: if a goal is present, then the system starts reasoning about how to achieve that goal. This can involve the creation of subgoals which drive more reasoning, the execution of programs, and performance of inference steps.

One *could* argue that Prolog makes this distinction, but it is much less explicit: since programs and statements are the same thing, the creation of subgoals always just corresponds to the conditions in an if-then. (In PLANNER one can specify arbitrary advice for what subgoal to create at a given step; it is not so strongly tied to the form of the logical assertions in the knowledge base.)

PLANNER also has the feature that facts can be removed from the knowledge base when they become false, i.e., the system does not need to index facts to time in order to model facts changing; this is important for efficiency, though it is less "logically pure" than the alternative.

This is a good overview of Planner for the interested. More recent systems such as SOAR, SCREAMER, ACL2, et cetera have many of the same features; I am really picking out PLANNER mainly for its historical significance.

In any case, I said that I was trying to *merge* declarative and procedural, yet I am praising PLANNER's explicit goal system over Prolog's implicit way of dealing with goals. Why? Because it is more powerful: we can provide more inference guidance in PLANNER than in Prolog.

In any case, I am not saying that PLANNER is a sufficient action logic, either! In fact, I'd say, let's question the basic mechanism which it shares with Prolog: if the system has goal G, and knows a statement of the form "S implies G," then it will automatically make a subgoal S. (For the curious: *both PLANNER and Prolog will select from several statements "X implies G", try that X, give up if that subgoal turns out infeasible, and try the next. This technique is applied recursively, creating sub-subgoals, et cetera. In Prolog this is the only mechanism at play; in PLANNER there are more conditions to be met for this to happen, there are other mechanisms at work, and the programmer can add arbitrary mechanisms.*)
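The shared goal-reduction mechanism just described can be sketched in a few lines. This is a minimal illustration, not how either system is actually implemented; the rule format and function names are made up for the example:

```python
# Sketch of backward chaining with backtracking: to prove G, either find it as
# a fact, or find a rule "body implies G" and recursively prove each subgoal
# in the body, moving on to the next candidate rule if one body fails.
# (A real system would also guard against cyclic rules.)

def prove(goal, facts, rules):
    """Return True if `goal` follows from `facts` via the given rules."""
    if goal in facts:
        return True
    for body, head in rules:
        # Try each rule whose head matches the goal; all() short-circuits,
        # which plays the role of giving up on an infeasible subgoal.
        if head == goal and all(prove(sub, facts, rules) for sub in body):
            return True
    return False

facts = {"rain"}
rules = [(("rain",), "wet_ground"), (("wet_ground",), "slippery")]
print(prove("slippery", facts, rules))  # True
```

As the post notes, in Prolog this loop is essentially the whole story, while PLANNER lets the programmer attach arbitrary advice to each step.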

What happens when we vary this mechanism? Well, there are a lot of options. We could create a reinforcement-type agent which attempted to learn what inferences to draw and what subgoals to create and when, getting rewarded whenever it meets a goal specified by the programmer. We could create a brute-force system which keeps proving theorems until it happens upon one of the form "P implies G" where P is a fully specified sequence of actions, rather than just a subgoal-- the system then does P. We could produce a system like Hutter's asymptotically optimal algorithm by creating a mechanism which searches for plans of action, replacing the current plan when the system finds a proof that switching to a different plan will result in a better outcome.

The weakest choice is to just require that if the goal "do X" is asserted, and X is a sequence of actions, the system does them immediately. This just means that the system is programmable. (X can be an arbitrary program because it can include branching goto type statements.) Any other choice of system can be modeled in this one by telling it to behave in that way.
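The weakest choice can be sketched concretely; everything here (the action vocabulary, the `("do", ...)` goal form) is hypothetical, chosen just to illustrate how a conditional goto makes action sequences into arbitrary programs:

```python
# Sketch: asserting a goal ("do", actions) simply executes the actions in
# order. Each action is an (op, arg) pair; "goto_if" is the branching
# goto-type statement that makes the action language Turing-style programmable.

def run(actions, state):
    """Execute a list of actions against a mutable state dictionary."""
    pc = 0  # program counter into the action list
    while pc < len(actions):
        op, arg = actions[pc]
        if op == "set":        # assert a fact / assign a value
            key, value = arg
            state[key] = value
        elif op == "goto_if":  # jump to `target` when `key` holds in state
            key, target = arg
            if state.get(key):
                pc = target
                continue
        pc += 1
    return state

state = run([("set", ("x", 1)),
             ("goto_if", ("x", 3)),   # skips the next action
             ("set", ("x", 99)),
             ("set", ("done", True))], {})
print(state)  # {'x': 1, 'done': True}
```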

Stronger choices will provide more built-in useful behavior, but will also have more built-in assumptions.

My desire is to create an abstract framework, not something that pins things down to a specific implementation. So, it seems like what I'm looking for is an abstract "should-ness" concerning what inferences to draw when, and what subgoals to make when.

Much of the logic discussed under the title of action logic in the Stanford Encyclopedia is somewhat relevant here, though the logics there are definitely not action logics in the sense I intend: they are just normal axiomatic systems, which must rely on outside inference procedures.

In general, my approach to AI is very much in the line of probabilistic relational methods: I advocate a move to probabilistic reasoning where just boolean reasoning is present, and to relational methods where just propositional methods are present. In this case, that means that what I see as the logical next step is to research probabilistic generalizations of PLANNER... it seems this would force many of the simplistic methods employed in PLANNER to be "broken open," so to speak, getting out the chaff and determining what can be salvaged and improved for the probabilistic setting.

What about game semantics and "general game playing" systems? Think of the chance of winning a subgame as an "action-truth value". But we need to include the computational effort in the formalization. For starters, think of "unknown environment" as an approximation to computational effort (for formalization's sake), since we need to compute the "belief environment" before it becomes "known". We are left with "undiscounted relational RL". The established approach (with loss-bound results for bandits and, more recently, for undiscounted RL) is to follow what we would interpret as "promising inferences" (high upper confidence bound on estimated probability of winning). In other words, we need to extend the UCT Monte Carlo tree search to relational structures.

Another approach to think about is how to formalize "spreading activation", in a way that would join the control properties (perhaps a la ACT-R), and the probabilistic properties a la undirected graphical models.

[continued] The "spreading activation" is more nebulous but some form of it seems necessary. Some form of "growing network" system that derives priors for new nodes from values of other nodes weighted by context distance.

Lukasz,

Very interesting. I blogged about upper confidence before, in relation to RL & bandits, but I am not familiar with the application to computational game theory. Any references would be appreciated.

I'm not sure I fully understand your suggestion of applying game semantics to the action logic problem, but it sounds like a very interesting idea, and one that jibes with some thoughts I've been having... I've been thinking that it would make sense to apply something like predatory evolutionary search to probabilistic inference, i.e., one subsystem learns to prove things with high confidence while another subsystem learns to refute those proofs. (Looking for high-confidence chains of reasoning is all well and good, but it could lead to systematic overconfidence, so it seems pertinent to try to refute the chains of reasoning.)

I think "we're on the same page". I don't think there is a "computational game theory" of the sort that I have in mind, yet, but I think "the time is ripe" (haven't googled it extensively though). For example, transfer of optimality bounds from the bandit case to the UCT case (as has been done for RL: http://jmlr.csail.mit.edu/papers/v11/jaksch10a.html), or "relational GT" analog of relational RL (basically, working harder on the GGP problem http://www.general-game-playing.de/literature.html).

My (first paragraph) suggestion is trivial, actually. Take game semantics for a logic, and replace quantification over two arbitrary strategies (the "existential" \exists and the "universal" \forall player) with a stochastic playing algorithm with known optimality guarantees. We can then talk about the probability that a sentence is true. Note that the game playing algorithms I know of derive some value estimates -- what we need is that they are at least (besides being game playing algorithms) randomized algorithms for solving the decision problem "is the game winnable by the existential player?". The "action logic" part follows from the game playing aspect of the algorithm -- i.e., it (1) computes a good (2) next move.

(1) it is stochastically determined what computations need to be made

(2) in case it is simpler to simulate external actions than to perform them, it is better not to use the computation in an online manner -- as in RL -- but to wait till we run out of computational resources (i.e. timeout) and then select the best move
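A toy rendering of the first-paragraph suggestion, with an illustrative domain and sentence (everything here is a stand-in; a real version would use a playing algorithm with optimality guarantees rather than uniformly random players): replace both quantifier-players with random play and read the empirical win frequency of the existential player as the "action-truth value":

```python
import random

# Game semantics for "exists x. forall y. x >= y" over a small domain:
# the existential player picks x, the universal player picks y, and the
# existential player wins iff the matrix x >= y holds. Here both players
# play uniformly at random, so the win frequency is a crude truth estimate.

DOMAIN = list(range(3))

def play_once(rng):
    x = rng.choice(DOMAIN)  # existential player's (random) move
    y = rng.choice(DOMAIN)  # universal player's (random) move
    return x >= y

def truth_frequency(n=10000, seed=0):
    rng = random.Random(seed)
    return sum(play_once(rng) for _ in range(n)) / n

print(truth_frequency())  # roughly 2/3, though the sentence is classically
                          # true: a smarter existential player would push
                          # this toward 1, which is the point of requiring
                          # optimality guarantees
```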

I forgot that you need a logic to reason about what action should be taken, and that one would argue that just picking the action to be taken is not what logic is about! But perhaps I wasn't completely off-topic.

Of course the computation of an algorithm depends on its input. The logic can then be defined to explicitly represent some constructs targeted at specific aspects of the algorithm (the reflective part of the logic as opposed to the "object-level" part which is concerned with modeling whatever else).

Lukasz,

This is a really interesting idea.

Refreshing what little I know about game semantics by reading the Wikipedia article, I do not see (yet) how it can be a really practical way of evaluating normal statements. We could treat a history of games as a probability, but that would be a very inefficient way of proving some statements -- specifically those which can easily be proven classically. (If it were just based on a success/failure count, it would either converge far too slowly to the correct answer for the simple cases, or converge too quickly for the difficult ones... I'd want it to settle on probability 1 with complete confidence after just one play for some obvious cases, but for difficult cases, either converge to 1 much more slowly or set a very low confidence on its frequency of 1.) Perhaps some sort of dual architecture could cure this, but the idea would have more appeal if it provided a unifying framework. Can you think of a way to cure this?

Maybe weigh the number of runs so far against an estimate of the state-space for the game associated with the statement? Or, perhaps, a more sophisticated estimate using the amount of "interesting" state-space remaining (based on a probabilistic generalization of alpha-beta pruning).

:P Ok, so I've said "this is interesting" 3 times now in different places. Sorry for the repetitive enthusiasm!

What do you imagine your "computational game theory" to consist of? That is, are you talking about a unifying framework for a bunch of game-theoretic algorithms, or something more?

I've never claimed a "computational game theory"... Just referring to your mention of that term. I'm not going to add to what I said already: I imagine results with bounds on regret of algorithms compared to the best fixed strategy, parametrized by properties of the game; and I imagine more formal results on GGP (or some other, perhaps simpler, formulation of games over relational structures). It doesn't mean that I plan to work on them.

[I should have said "properties of the game and computational effort"]

Ok. I was using the term just to refer to game playing algorithms like alpha-beta pruning, UCT, et cetera.

Reading up on UCT... I'm quite tempted to think of the step in which a random play to the end is employed as just a clever evaluation criterion for the board's state, as one would employ for leaf nodes in standard min-max search. Yet, at least in the glosses I'm reading, the values are not propagated upwards in the search tree using min-max... rather, the value of a node is just based on all the trials that have gone down that path. Perhaps I just haven't gone deep enough into the literature yet, since it seems like an obvious issue... probably it either doesn't work as well, or it works just fine and the summaries I'm reading just skip over it....

Using minimax to replace (by propagation) the statistics (instead of "accumulating" them) would lose information. Instead, UCT stays faithful to the probabilistic interpretation of the statistics and uses minimax to shape the tree. But you can (try to) improve the UCT algo by storing the minimax value (updated at each iteration) in addition to the (win/lose) statistics and incorporating it "heuristically" in the upper bound expression.

Ok... So, basically, the accumulation method results in generalization over the possible continuations of a state -- if a good number of bad continuations have been played, the system will assume that most continuations are bad, rather than keeping up its optimism.
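The accumulation point can be illustrated with a minimal node structure (names and layout illustrative, not any real UCT implementation): every rollout's result is averaged into each node along its path, with no min/max taken over siblings:

```python
# Sketch of UCT-style backpropagation: a node's value is the average result of
# every rollout that passed through it -- an implicit generalization over the
# node's continuations -- rather than a minimax value over child estimates.

class Node:
    def __init__(self):
        self.visits = 0
        self.wins = 0
        self.children = {}

    def value(self):
        return self.wins / self.visits if self.visits else 0.0

def backprop(path, result):
    # `path` is the list of nodes visited this rollout; each node just
    # accumulates the statistic, so a few bad continuations drag down the
    # node's value even if one sibling continuation is actually winning.
    for node in path:
        node.visits += 1
        node.wins += result

root, child = Node(), Node()
root.children["a"] = child
for result in (1, 0, 1):          # three rollouts through the same child
    backprop([root, child], result)
print(child.value())  # 0.666... (2 wins over 3 visits)
```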

I'm tempted to suggest the "proper Bayesian" way of treating this generalization, which would preserve min-max reasoning, but that would probably be too computationally costly...

Thinking about this, I keep imagining it as the interaction of two "agents," the Advocate and the Critic, who try to refute each other's proofs and prove what *they* want about the original statement. This leads me to a thought which doesn't actually correspond to how UCT works: I think of the advocate as not just trying to figure out the best route to refute the critic in the current "game," but also as trying to learn how to find effective proofs in general; similarly, it seems like the critic should be learning how to refute things.

(Actually, one would probably want them to both collaboratively learn the same policy, since both have to know how to prove what they want & refute what the other wants.)

This suggests a modification to UCT: rather than using a bandit algorithm for choosing at each node in the search tree, a full RL algorithm would be integrated, which would attempt to learn a policy for choosing next moves. (For the game of Go, UCT's big current application, this policy would have to rely on partial, relational, and mainly local notions of "state" and "action" rather than full-board, full-detail state/action descriptions.) I'm imagining that the RL choice would be used as a heuristic to specify some of the interesting moves to explore at each leaf... the policy's estimate of the goodness of a move should be quickly drowned out by the estimates based on actual continuations the system tries.
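One crude sketch of the "drowned out" behavior (the prior values, blending rule, and move names are all stand-ins, not a real learned policy or any published UCT variant): fold a policy prior into a UCB-style score with a weight that fades as actual visit counts accumulate:

```python
import math

# Blend a policy's prior estimate of a move with rollout statistics.
# `prior_weight` acts like a count of phantom visits, so the prior dominates
# an unvisited move but is drowned out as real continuations are tried.

def ucb_score(wins, visits, parent_visits, prior, prior_weight=2.0):
    mean = (wins + prior_weight * prior) / (visits + prior_weight)
    # Standard UCB-style exploration bonus for under-visited moves.
    exploration = math.sqrt(2 * math.log(parent_visits) / (visits + 1))
    return mean + exploration

policy_prior = {"move_a": 0.9, "move_b": 0.1}  # hypothetical learned values
stats = {"move_a": (1, 10), "move_b": (0, 1)}  # observed (wins, visits)

best = max(policy_prior,
           key=lambda m: ucb_score(*stats[m], 11, policy_prior[m]))
print(best)  # the barely-explored move wins on its exploration bonus here,
             # despite its low prior
```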

Another random modification comes to mind (based on the meandering thoughts in the blog post): it seems reasonable also for the system to set up and learn to achieve short-term subgoals, whose long-term benefits it can independently learn... again thinking about Go, subgoals that are usually (but not always) beneficial might be "capture this block," "make an eye for this block," "connect these blocks," et cetera. (These sorts of subgoals would relate back to the partial, local notions of "state" needed by the RL subsystem, too...)

UCT is originally an RL algorithm; see the original paper http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.102.1296 "Bandit based Monte-Carlo Planning" (2006). Perhaps the seminal work on UCT in Go is Sylvain Gelly's thesis (2007) http://www.lri.fr/~gelly/ where he tries using RL-learned policies with UCT.

Thanks for the references!
