<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-266928470727008321</id><updated>2012-01-29T16:49:46.366-08:00</updated><category term='education'/><category term='competence'/><category term='referential generality'/><category term='reflection'/><category term='impredicative'/><category term='democracy'/><category term='funk2'/><category term='statistical relational methods'/><category term='culture'/><category term='sociocracy'/><category term='impredicativity'/><category term='procedural logic'/><category term='relational methods'/><category term='dyna'/><category term='intuitionism'/><category term='FRP'/><category term='uniqueness types'/><category term='provability'/><category term='propositional methods'/><category term='io monad'/><category term='scientific method'/><category term='validity'/><category term='trigonometry'/><category term='2nd order logic'/><category term='probability'/><category term='clean'/><title type='text'>In Search of Logic</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://lo-tho.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://lo-tho.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>31</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-266928470727008321.post-2673389129170694391</id><published>2012-01-24T14:51:00.000-08:00</published><updated>2012-01-24T23:23:27.197-08:00</updated><title type='text'>Why Logic?</title><content type='html'>Continuing in my &lt;a href="http://www.johnnywander.com/comics/173"&gt;what are you doing and why are you doing it&lt;/a&gt; &lt;a href="http://lo-tho.blogspot.com/p/sequences.html"&gt;series&lt;/a&gt;, I'll try to explain why I'm interested in logic.&lt;br /&gt;&lt;br /&gt;Previously I talked about &lt;a href="http://lo-tho.blogspot.com/2011/11/why-relational-methods.html"&gt;why relational methods are needed&lt;/a&gt;. However, the argument was really incomplete. My reason for arguing that relational methods are needed was expressive power. As I said:&lt;br /&gt;&lt;br /&gt;&lt;blockquote class="tr_bq"&gt;&lt;span style="background-color: white; font-family: 'Trebuchet MS', Verdana, Arial, sans-serif; font-size: 13px; line-height: 18px; text-align: left;"&gt;My major recurring rant has been referential generality. The idea is that a general machine intelligence system should be able to represent any kind of pattern that a human can. This is motivated by the fact that it's impossible to learn a pattern that you can't represent in the first place.&lt;/span&gt;&lt;/blockquote&gt;&amp;nbsp;In other words, my original motivation was learning theory, but I realized that in order to learn any kind of pattern that humans can learn, a system needs to be able to represent any kind of pattern a human can. So, I went from learning theory to knowledge representation.&lt;br /&gt;&lt;br /&gt;However, there is already a very general and powerful theory which tells us what kind of patterns can be represented in general: the theory of computation. The space of patterns can be taken as the space of computer programs. &lt;a href="http://www.google.com/url?sa=t&amp;amp;rct=j&amp;amp;q=&amp;amp;esrc=s&amp;amp;source=web&amp;amp;cd=2&amp;amp;ved=0CEMQFjAB&amp;amp;url=http%3A%2F%2Fwww.scholarpedia.org%2Farticle%2FAlgorithmic_information_theory&amp;amp;ei=xyUfT6boNuiW2gWczaGjDw&amp;amp;usg=AFQjCNGzv3AU0M0ANwyqu2IFF2_nB-NUIA&amp;amp;sig2=ixsiO83mGo0Bfhr43NoP0g"&gt;Algorithmic information theory&lt;/a&gt; develops this line of thought. &lt;a href="http://roland.pri.ee/doktor/papers/vetta/disSol.pdf"&gt;Solomonoff Induction&lt;/a&gt; is a theory of learning based on this, and&lt;a href="http://www.idsia.ch/~juergen/optimalsearch.html"&gt; Levin Search&lt;/a&gt; is a general problem solving method which takes advantage of this idea.&lt;br /&gt;&lt;br /&gt;These ideas are all very good in my opinion (perhaps I'll write some posts explaining them a bit). Yet, I've chosen to focus on logic. Why?&lt;br /&gt;&lt;br /&gt;There is a long-standing argument between &lt;a href="http://en.wikipedia.org/wiki/Procedural_knowledge"&gt;procedural&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/wiki/Declarative_knowledge"&gt;declarative &lt;/a&gt;knowledge representation in the artificial intelligence field. Those on the procedural side believe that it is better to represent knowledge through computer programs which execute to do things, whereas those on the declarative side prefer logic and related formalisms.&lt;br /&gt;&lt;br /&gt;I think both are needed for different purposes. However, I am more interested in the declarative side. When declarative methods are applied, the results can be more understandable. Knowledge can be more easily carried over from one task to another. The system can be instructed declaratively, given constraints, and so on.&lt;br /&gt;&lt;br /&gt;All of those things are nice, but perhaps my "real" reason for preferring logic is that I want to capture human reasoning (at least, the successful/useful aspects of it). A procedural representation is somewhat opaque. I can't tell for sure what elements of human reasoning it captures. Most of the time, the answer is something like "If that behavior is useful, then it can learn to duplicate it in principle". That's not quite satisfying for me. For example, I am very interested in the foundations of mathematics. A procedural learner doesn't fundamentally care about mathematics. I can only say that if math is useful, it could learn to do it. With a logical representation, however, I can more explicitly see how the system could reason with specific mathematical constructs.&lt;br /&gt;&lt;br /&gt;Of course, procedural representations also have advantages. The behavior of the system can be easier to examine. The system can be instructed procedurally (which can be easier in some cases). Inference is typically more efficient, since inference is just execution. As I said, it is more appropriate in some cases.&lt;br /&gt;&lt;br /&gt;One response to this situation might be to argue that the two options are equivalent from a theoretical point of view, the differences being pragmatic. Logic can be thought of as a programming language, with the logic engine being the interpreter. Or, programs can be thought of as a sort of logic. An advocate of this view might point to &lt;a href="http://en.wikipedia.org/wiki/Prolog"&gt;Prolog&lt;/a&gt;, the logic programming language, or to the &lt;a href="http://en.wikipedia.org/wiki/Curry_Howard_isomorphism"&gt;Curry-Howard isomorphism&lt;/a&gt; with its "programs are proofs" motto.&lt;br /&gt;&lt;br /&gt;I believe this is misguided. &lt;a href="http://lmgtfy.com/?q=%22prolog+is+not+logic%22"&gt;Prolog is not logic&lt;/a&gt;. Programs are not proofs (or, rather, they are very boring proofs: the curry-howard isomorphism just says that writing a program proves that there exists a program of that type). Declarative and procedural knowledge have essentially different characters. One difference is that declarative knowledge justifies a wide variety of inferences. It does not tell you what order to make those inferences in. That's why the same knowledge can be used in many different ways. Another difference is that logical theories can be &lt;a href="http://www1.cuni.cz/~svejdar/papers/sv_ybk07.pdf"&gt;essentially incomplete&lt;/a&gt;, as Goedel proved. There is no analogous phenomenon for programs. &lt;i&gt;[A program can be syntactically incomplete (ie, unfinished, partly unwritten), but in that case there always exists a completion. A program can contain an infinite loop or otherwise fail to halt, in which case we call it a partial program; but in that case, there is no relevant notion of "completion" so we can't define essential incompleteness.]&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Therefore, I am interested in studying learning theory from the perspective of logic. It turns out that these learning systems are strictly more expressive, in ways related to the above observations, and without becoming uncomputable. (Logic can represent any computable theory which would be generated by procedural learning, but logic can also represent some theories which can't be precisely translated to procedural knowledge, despite inference on those theories being computable.) I wouldn't say this is a decisive victory for logic (it's not clear what practical advantage a logic-based learner would have over a procedural learner), but it's certainly interesting. I'm not done yet, though... I can explain some nice properties of human reasoning, but others still escape me.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/266928470727008321-2673389129170694391?l=lo-tho.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lo-tho.blogspot.com/feeds/2673389129170694391/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://lo-tho.blogspot.com/2012/01/why-logic.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/2673389129170694391'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/2673389129170694391'/><link rel='alternate' type='text/html' href='http://lo-tho.blogspot.com/2012/01/why-logic.html' title='Why Logic?'/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-266928470727008321.post-5591854473071394125</id><published>2011-11-16T18:25:00.000-08:00</published><updated>2011-11-16T18:25:01.662-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='propositional methods'/><category scheme='http://www.blogger.com/atom/ns#' term='relational methods'/><category scheme='http://www.blogger.com/atom/ns#' term='statistical relational methods'/><category scheme='http://www.blogger.com/atom/ns#' term='referential generality'/><title type='text'>Why Relational Methods?</title><content type='html'>It's been a little while since I've written anything here. While that's not unusual, it's also not ideal. I've actually been meaning to write &lt;i&gt;more&lt;/i&gt;, not less. Attending my first "real" conference, AGI-11, has shown me that I have a public image whether I mean to or not. When I'm not writing stuff here, I'm writing stuff on AI mailing lists (or, more recently, facebook groups). People who I respect know my name.&lt;br /&gt;&lt;br /&gt;If that's the case, I might as well &lt;i&gt;manage&lt;/i&gt; my image... at least a little.&lt;br /&gt;&lt;br /&gt;As such, it would be good for me to explain a bit about what I am doing and why I am doing it.&lt;br /&gt;&lt;br /&gt;My major recurring rant has been referential generality. The idea is that a general machine intelligence system should be able to represent any kind of pattern that a human can. This is motivated by the fact that it's impossible to learn a pattern that you can't represent in the first place.&lt;br /&gt;&lt;br /&gt;This has led me to grapple with some difficult issues in logic and the foundations of mathematics, since the &lt;a href="http://en.wikipedia.org/wiki/Tarski's_undefinability_theorem"&gt;undefinability theorem&lt;/a&gt; proves that there is no referentially complete system (in a specific sense). Unsurprisingly, I have not come up with a general response to this problem (though I think &lt;a href="http://as.nyu.edu/object/hartryfield.html"&gt;Hartry Field&lt;/a&gt; has some interesting ideas). However, I do have one message which I'd like to get across to as many practitioners as possible:&lt;br /&gt;&lt;br /&gt;&lt;i&gt;Relational methods are needed for general intelligence.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;(I'm sure I've written about this before, but I will aim to give a decent unified post on the subject here.)&lt;br /&gt;&lt;br /&gt;Now, I'm specifically talking about&lt;i&gt; probabilistic relational methods&lt;/i&gt;. I will not, however, be giving an argument for probability theory. (Perhaps I'll discuss that in another post.) Rather, I'll be concentrating on the question of whether relational methods are needed.&lt;br /&gt;&lt;br /&gt;Many people are beginning to understand this point whether I preach it or not. However, a large amount of research still goes on at the level of propositional methods. This isn't a bad thing (better propositional methods are very useful), but people often don't think about the issue and end up thinking that propositional methodology can lead to human-level intelligence.&lt;br /&gt;&lt;br /&gt;So: what is the difference?&lt;br /&gt;&lt;br /&gt;Propositional methods are so-called by analogy to propositional logic. Propositional methods deal with a fixed, finite number of variables, looking for predictive relationships like correlations (or even causation) between these. Variables are "basic"; they have no further break-down. It is not possible to learn a correlational structure for some variables and then generalize that to other variables, because there is no information to suggest which variables might be analogous.&lt;br /&gt;&lt;br /&gt;Relational methods use more information about the structure of a problem. The details vary from formalism to formalism, but variables result from &lt;i&gt;instantiation.&lt;/i&gt;&amp;nbsp;This might be the instantiation of a class of objects (in object-oriented thinking), or an instantiation of a predicate or relation (in first-order language). In either case, we see that variables are now "about something": we have &lt;i&gt;entities&lt;/i&gt; which we are &lt;i&gt;describing&lt;/i&gt;.&lt;br /&gt;&lt;br /&gt;This distinction can be surprisingly difficult to get across at times. Advocates of propositional learning methods will typically say that the difference is "We are learning the relations, rather than giving them ahead of time". This confusion results from using the word "relation" to describe multiple things. Logically, we can split a domain into very distinct levels:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Entities&lt;/li&gt;&lt;li&gt;Entity Properties&lt;/li&gt;&lt;li&gt;Probabilistic Generalizations about Properties&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;Entity properties include relationships of one entity to another. This is &lt;i&gt;very&lt;/i&gt; different from a probabilistic relationship, no matter how complicated. Trying to form relations from correlations is a level-confusion.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This mistake is rarely made with crisp propositional logic (Boolean logic). The limitation there is very clear. Chess is an &lt;a href="http://people.csail.mit.edu/milch/papers/nips05-tutorial/nips05-tutorial.pdf"&gt;often-cited example&lt;/a&gt;. Stated in propositional logic, the rules of chess would run thousands of pages; however, they can be stated in first-order logic in just a few pages. Why? Because in first-order logic, we can generalize the rules based on the structure of the game. In propositional logic, we must re-state rules for each board location and each round of the game.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;For some reason, many people lose sight of the simple limitations of propositional methods when they begin to do more complicated things such as probabilistic reasoning.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Now, we can see one reason quite easily. I said that the rules of chess must be re-stated for each round of the game; ie, they do not generalize over time. However, it is quite standard to generalize over time with propositional methods in machine intelligence. Only a fool would not do so! Indeed, if I point out &lt;i&gt;&lt;b&gt;any&lt;/b&gt;&lt;/i&gt; one dimension which an algorithm fails to generalize over, it's generally fairly easy to add that dimension to the system's generalization capability. As such, it's not hard to miss the general pattern: a propositional method needs to be fed the dimensions in which it's useful to generalize up front. It can't see the possible analogies by itself.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;(One good way to identify the representational power of a system is to ask what kinds of analogies it is capable of making.)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Because the key idea of relational methods is structured regularity, it also seems to be easy for people to confuse various hierarchical methods for relational power. Deep belief networks, hierarchical temporal memory, and other similar methods identify more and more complicated sorts of patterns as we move up the hierarchy. (This is similar to the way that cortical columns have been observed to identify increasingly abstract patterns as we get further from the v1.) However, in the analogy to logic, this is like building up larger and larger propositional sentences. At no point does the system become capable of structured generalizations.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;An astute reader may point out that a universal quantifier can be viewed as just a large conjunction, and an existential quantifier as a large disjunction. So long as the number of entities is finite, first-order logic &lt;i&gt;really can&lt;/i&gt; be viewed as just propositional logic. This is true. However, learning in this form will take exponentially more data than learning via generalizations. Obviously, this approach forces us to separately re-learn every case of the general rule. We could only see it as creating a "generalization" after the fact; it is unable to use this generalization to speed learning.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Along similar lines, the advocate of propositional methods may argue that humans must be essentially propositional, as we have a finite number of inputs and outputs. (This is related to arguing that the brain is obviously a finite-state machine.) Again, this kind of model does not seem to be a good account of the sorts of patterns people can actually learn to spot (and how quickly they can learn to spot them). Unfortunately, this would be a very long argument to try and spell out in detail. One example is the way people can perform on IQ tests. To my knowledge, the only way this performance can be replicated by computers is via analogical reasoning, which is a deeply relational approach.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/266928470727008321-5591854473071394125?l=lo-tho.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lo-tho.blogspot.com/feeds/5591854473071394125/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://lo-tho.blogspot.com/2011/11/why-relational-methods.html#comment-form' title='9 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/5591854473071394125'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/5591854473071394125'/><link rel='alternate' type='text/html' href='http://lo-tho.blogspot.com/2011/11/why-relational-methods.html' title='Why Relational Methods?'/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>9</thr:total></entry><entry><id>tag:blogger.com,1999:blog-266928470727008321.post-8695678855389309439</id><published>2011-09-04T14:15:00.000-07:00</published><updated>2011-09-04T14:16:53.992-07:00</updated><title type='text'>Useful Meta-Analysis</title><content type='html'>Continuing with procedural logic. I've created a &lt;a href="http://lo-tho.blogspot.com/p/sequences.html"&gt;sequences&lt;/a&gt; page now, with a summary of the procedural logic sequence.&lt;br /&gt;&lt;br /&gt;I recently finished reading the Sutton &amp;amp; Barto book on reinforcement learning. In the last part of that book, they discuss a general framework which is supposed to capture many of the different idea which have been presented. Specifically, it gives a view in which dynamic programming, Q-learning, SARSA, classical planning, monte carlo methods, and a few other things are all treated as value-propagation methods. The different propagation schemes differ, both in the equations of propagation and the scheduling of propagation, but the common structure is there.&lt;br /&gt;&lt;br /&gt;This generally fits with the Dyna-style systems I've been thinking about, though the details are fuzzy.&lt;br /&gt;&lt;br /&gt;Combined with the amount of thinking I've been doing about systems that use belief propagation as their major reasoning mechanism (which I should write more about...), this has me fairly deep into propagation-based reasoning systems.&lt;br /&gt;&lt;br /&gt;We've also been talking about dynamic programming in my advanced algorithms course, as it happens. We went over &lt;a href="http://en.wikipedia.org/wiki/Matrix_chain_multiplication"&gt;a common example based on matrix multiplication&lt;/a&gt; (which I'd seen before, but not thought about recently). This example is particularly interesting, because the dynamic programming is being used for &lt;i&gt;inference guidance&lt;/i&gt;, not for inference: we use dynamic programming to find the fastest value-propagation ordering, to great advantage. This leads to the idea of &lt;i&gt;using dynamic programming to determine the schedule for dynamic programming&lt;/i&gt;. This connects back to my original theme of procedural logic: logic which determines its own inference procedures.&lt;br /&gt;&lt;br /&gt;The issue with this is, we need some simpler inference guidance to get off the ground. If we wait for informed priority estimates before we do any inference, but rely on inference to get priority estimates, we'll go nowhere. (I referred to this as the reflection problem in the previous post in this sequence.) Furthermore, we have to balance between meta-inference and inference. It's tempting to give meta-inference a higher priority in general, because we want to do it first; but in a general setting, there will be too many possible meta-inferences, so we'd never get anything done if we gave them priority.&lt;br /&gt;&lt;br /&gt;Since many of the propagations are just propagating real numbers (particularly in belief propagation and reinforcement learning for &lt;i&gt;discrete&lt;/i&gt;&amp;nbsp;domains), a simpler idea is to guide inference simply by giving priority to propagations which are tracking larger changes. This is referred to as prioritized sweeping in Sutton &amp;amp; Barto, but surprisingly not dealt with very much. I also haven't seen much about the effect of such a scheme on the efficiency of belief propagation. It's probably out there, though.&lt;br /&gt;&lt;br /&gt;It would be great to test the effects of different (combinations of) prioritization methods to see what actually worked. Hopefully I get to that point.&lt;br /&gt;&lt;br /&gt;Today I had a discussion with Laurent Orseau about these ideas. He has been thinking about very similar things-- a system based on a priority queue and a hash table, like the Dyna implementation I've been slowly working on. (Actually, I'm not even worrying about a hash table at the moment, but building something that looks more like a Rete network's alpha memory; when I need to optimize it, I'll go for &lt;a href="http://en.wikipedia.org/wiki/Term_indexing"&gt;term indexing&lt;/a&gt;, thanks to some advice from Russel Wallace.) He has some ideas about inference control as well (perhaps more unified than mine), which fit in with the theme of using dynamic programming to determine the optimal inference. Perhaps we will make some progress.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/266928470727008321-8695678855389309439?l=lo-tho.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lo-tho.blogspot.com/feeds/8695678855389309439/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://lo-tho.blogspot.com/2011/09/useful-meta-analysis.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/8695678855389309439'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/8695678855389309439'/><link rel='alternate' type='text/html' href='http://lo-tho.blogspot.com/2011/09/useful-meta-analysis.html' title='Useful Meta-Analysis'/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-266928470727008321.post-4083952353264150376</id><published>2011-08-01T07:18:00.000-07:00</published><updated>2011-08-01T07:18:44.780-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='clean'/><category scheme='http://www.blogger.com/atom/ns#' term='uniqueness types'/><category scheme='http://www.blogger.com/atom/ns#' term='io monad'/><title type='text'>Corrected- IO Monad</title><content type='html'>&lt;span class="Apple-style-span" style="background-color: white;"&gt;This is a correction to the addendum of the &lt;a href="http://lo-tho.blogspot.com/2011/07/procedural-logic-answers.html"&gt;previous post&lt;/a&gt;, where I claimed that Clean's way of handling IO was flat-out better than Haskell's. A quote from William Tyson (to a mailing list):&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="background-color: white;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;span class="Apple-style-span" style="background-color: white; border-collapse: collapse; font-family: arial, sans-serif; font-size: 13px;"&gt;At first&amp;nbsp;&lt;span class="il" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; color: #222222;"&gt;Haskell&lt;/span&gt;&amp;nbsp;did try world passing. &amp;nbsp;Monads hide the world passing. &amp;nbsp;Compare the following echo procedures which reads a line and prints it character by character.&lt;br /&gt;&lt;br /&gt;==&amp;nbsp;&lt;span class="il" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; color: #222222;"&gt;Clean&lt;/span&gt;&amp;nbsp;Echo ==&lt;br /&gt;&lt;br /&gt;echo :: *World -&amp;gt; *World&lt;br /&gt;echo world&lt;br /&gt;| c = '\n' &amp;nbsp; &amp;nbsp; &amp;nbsp;= world2&lt;br /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;= echo world2&lt;br /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;where&lt;br /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;(c, world1) =: getChar world&lt;br /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;world2 =: putChar c world1&lt;br /&gt;&lt;br /&gt;==&amp;nbsp;&lt;span class="il" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; color: #222222;"&gt;Haskell&lt;/span&gt;&amp;nbsp;Echo ==&lt;br /&gt;&lt;br /&gt;echo :: IO ()&lt;br /&gt;echo = do&lt;br /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;c &amp;lt;- getChar&lt;br /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;putChar c&lt;br /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;unless (c == '\n') echo&lt;br /&gt;&lt;br /&gt;==&lt;br /&gt;&lt;br /&gt;In&amp;nbsp;&lt;span class="il" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; color: #222222;"&gt;Clean&lt;/span&gt;, each IO operation gives you a new world which you need to explicitly pass around. &amp;nbsp;Uniqueness typing ensures that you don't use the same world twice. &amp;nbsp;In&amp;nbsp;&lt;span class="il" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; color: #222222;"&gt;Haskell&lt;/span&gt;, the world is implicit.&lt;/span&gt;&lt;/blockquote&gt;In other words, the monad is making the syntax nicer, but basically the same thing is going on.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/266928470727008321-4083952353264150376?l=lo-tho.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lo-tho.blogspot.com/feeds/4083952353264150376/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://lo-tho.blogspot.com/2011/08/corrected-io-monad.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/4083952353264150376'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/4083952353264150376'/><link rel='alternate' type='text/html' href='http://lo-tho.blogspot.com/2011/08/corrected-io-monad.html' title='Corrected- IO Monad'/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-266928470727008321.post-6111523891326108981</id><published>2011-07-10T21:33:00.000-07:00</published><updated>2011-07-10T21:33:41.199-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='dyna'/><category scheme='http://www.blogger.com/atom/ns#' term='procedural logic'/><category scheme='http://www.blogger.com/atom/ns#' term='FRP'/><title type='text'>Procedural Logic Answers</title><content type='html'>After some discussion with Russel Wallace, I've come to a few conclusions about my procedural logic project.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;&lt;a href="http://lukepalmer.wordpress.com/2009/12/20/reflections-on-a-holy-grail-frp/"&gt;FRP&lt;/a&gt; provides an in-principle answer: there is no &lt;i&gt;essential&lt;/i&gt; problem with describing behavior-in-time in a logically pure way. You just assert some things about how a system behaves through time. There are certainly questions about the details, but the problem of describing actions with logic is not as nebulous as I was making it. It's computer science, not artificial intelligence.&lt;/li&gt;&lt;li&gt;The problem which &lt;i&gt;is&lt;/i&gt; nebulous is the "reflection problem" that I mentioned in the previous post. This is the problem of making a system which dynamically reasons about and improves its own implementation. It's possible to reduce many aspects of this to computer science as well, but ultimately it is an "AI kind of thing" to worry about.&lt;/li&gt;&lt;li&gt;The main technique for making #2 more tractable is to "stage" the reflection, separating an action and reflection on that action. For example, the kind of dynamic priority assessment I was imagining (allowing reasoning about the priority of a task between the time when the task is added to the queue and its execution) is just not a great idea. It's better to reason about the priority of rules.&lt;/li&gt;&lt;/ol&gt;&lt;div&gt;When it's possible to reduce something to a computer science problem rather than an AI problem, the results become more generally applicable. I mean two things by this. First, it's a computer science problem when it doesn't depend on any standpoint concerning what intelligence is, best approaches to achieving AI, et cetera. It cuts out philosophical stuff and just focuses on a practical problem, with no need to justify itself via claims about building a thinking machine within 20 years. Second, it generally falls within the polynomial-time asymptotic complexity class, rather than exponential-time. Exponential time problems are roughly where the need for methods we refer to as "AI" start, because we need to rely on heuristics for approximate problem-solving. Obviously there is an advantage if we can manage to work on poly-time problems rather than exp-time problems.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Whereas FRP-like solutions to the procedural logic question involve programs which only refer to their external behavior, and so can be implemented in whatever way is convenient, reflective procedural logic would refer to its implementation details. It's possible that it could still maintain a certain amount of purity, but it's difficult to say.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I'm working on a bare-bones implementation of Dyna, with the goal of seeing how much reflection I can conveniently add. I also have a few other goals for the project (but these are not high priorities):&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;Integrate functional and logical programming, as with &lt;a href="http://en.wikipedia.org/wiki/Curry_(programming_language)"&gt;Curry&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/wiki/Mercury_(programming_language)"&gt;Mercury&lt;/a&gt;&lt;/li&gt;&lt;li&gt;Integrate the &lt;a href="http://www-staff.it.uts.edu.au/~cbj/patterns/"&gt;pattern calculus&lt;/a&gt;, a generalization of lambda calculus for pattern matching&lt;/li&gt;&lt;li&gt;Use a type system modeled after the one &lt;a href="http://en.wikipedia.org/wiki/Qi_(programming_language)"&gt;Qi&lt;/a&gt; uses&lt;/li&gt;&lt;li&gt;Try FRP&lt;/li&gt;&lt;li&gt;Try uniqueness types&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;We'll see how far I get. :)&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/266928470727008321-6111523891326108981?l=lo-tho.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lo-tho.blogspot.com/feeds/6111523891326108981/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://lo-tho.blogspot.com/2011/07/procedural-logic-answers.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/6111523891326108981'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/6111523891326108981'/><link rel='alternate' type='text/html' href='http://lo-tho.blogspot.com/2011/07/procedural-logic-answers.html' title='Procedural Logic Answers'/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-266928470727008321.post-2407183424468889630</id><published>2011-07-04T00:09:00.000-07:00</published><updated>2011-07-04T11:09:23.395-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='reflection'/><category scheme='http://www.blogger.com/atom/ns#' term='dyna'/><category scheme='http://www.blogger.com/atom/ns#' term='procedural logic'/><category scheme='http://www.blogger.com/atom/ns#' term='funk2'/><title type='text'>Procedural Logic Once Again</title><content type='html'>&lt;div&gt;Follow up to &lt;a href="http://lo-tho.blogspot.com/2010/06/procedural-logic.html"&gt;Procedural Logic&lt;/a&gt;.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;This is one of those things I've been meaning to write about for a while now.&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Sometimes early this year, I had the thought that some form of "agenda" or priority queue is lurking inside a very large number of AI algorithms. What happens if we push the idea of making priority queues the heart of AI systems? Can I make a generally useful AI toolkit by creating an optimized priority queue with some generally useful built-in features? (This would be similar to the way &lt;a href="http://sitemaker.umich.edu/soar/home"&gt;SOAR&lt;/a&gt; is "just" a highly optimized rule engine with useful extra features.) This was to be my answer to the "Procedural Logic" question: a principled way to handle thinking-through-time (and the acting/thinking trade-off) by making use of scheduling at the basic level.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;If I was writing this post back then, I would have had a big argument justifying this idea. The main thrust of the argument would be that priorities can be thought of as a procedural form of expected utilities; IE, they should be an immediate estimate of the value of performing that action. The special features of the system would center around improving that estimate, through methods like reinforcement learning. Via an analogy to the copycat system, I was referring to each task as a "codelet".&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;When I told this idea to Russel Wallace, he immediately spotted the problem (though it took me a while to appreciate the insight). Such a system will run into the "problem of reflection" (representing and reasoning about one's own representations and reasoning) head-on, with no direct solution. We can't easily use the task system to think about what priorities should be assigned to tasks without risking adding infinitely many tasks to the queue (if we try to have an "estimate the priority of this" task for each task).&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;We &lt;i&gt;can&lt;/i&gt;&amp;nbsp;get around that if we are careful, but we have to make compromises. Even if we settle on a good trade-off, the credit assignment problem is far from easy; we can't just apply standard RL. To get a good idea of what is going on, it seems as if we have to keep a record of dependencies (what tasks pushed what other tasks into the queue) within the system. This is something SOAR does in order to implement its chunking mechanism, so it's possible.&amp;nbsp;Chunking-like behavior was another feature I was considering: if several codelets often run in sequence, compile their actions into one codelet so that it runs faster. This would actually be very similar to JIT compilation of bytecode, so my hope was to use algorithms which had been developed for that.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;A system more heavily built upon the concept of activity tracing is&amp;nbsp;&lt;a href="http://funk2.org/about/"&gt;Funk2&lt;/a&gt;. The goal of Funk2 is exactly the one I mentioned: solving the credit assignment problem by tracing the cause-and-effect relationships within the code.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;So, perhaps there is some hope for the idea. At every corner, though, just when Solution X looks like it will make the idea work, I start wanting to make "Smart X"; ie, X which learns over time thanks to making more general use of the system, and in particular, a version of X which makes use of the task queue. This is, of course, Bad Thinking. There really does need to be a bottom somewhere, even if as a programmer I'm used to the power of recursive decomposition of problems. This takes some of the magic out of the idea, forcing concrete design decisions which may not be as general-purpose as my original vision.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;So, all told, I reached a point where the idea was set aside.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Then I found&amp;nbsp;&lt;a href="http://cs.jhu.edu/~jason/tmp/32969-datalog20-paper.pdf"&gt;Dyna&lt;/a&gt;. The basic idea of Dyna sounded at first outlandish: an AI programming language which manages to &lt;a href="http://www.cs.jhu.edu/~jason/papers/eisner.nipsw08.pdf"&gt;make no assumptions&lt;/a&gt; about the style of AI being implemented (ie, it "takes no sides" concerning the different debates going on within AI) yet makes it possible to implement many standard AI algorithms in just a few lines. Furthermore, the concept of the language is very simple! It's not obvious at all that this should be possible; usually, you've got to make assumptions to gain anything. Yet, there it is: simple, (mostly) intuitive descriptions of textbook algorithms, often in less than 10 short lines of code. And, as if it was designed just to make me happy, it's got a priority queue at the core of its implementation.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The basic idea behind Dyna, in case you don't want to just dive into the links, is basically just to take rule-based programming / logic programming and completely scrap the idea that the rules have to be passing anything like truth values. A "rule" is just an expression which determines how some sort of value should be propagated. The value can be anything you like; it can be a true/false, or a probability, but it can also be a neural-network activation or an instance count or even a complicated data structure. In this way, it is very similar to probabilistic programming languages like &lt;a href="https://lirias.kuleuven.be/bitstream/123456789/147433/1/05-book-srl-chapter.pdf"&gt;BLP&lt;/a&gt;, but drops the "probabilistic" part. (Dyna also relies fundamentally on a concept of aggregation which is very similar to the concept of "combination" in BLP, and also similar to "summary" in the summary-product alg and yet also captures the idea of "aggregate CPD" in statistical-relational modeling... but I'll let you read the papers.)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Dyna, however, is far from a "procedural logic": the philosophy of Dyna is to strictly separate the abstract program specification from the implementation level. The Dyna programmer specifies the logical (or "equational") relationships between different values, but it's up to the Dyna compiler to decide how to implement these effectively.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;As with any "Solution X", the obvious thing to want next is "smart X"; and, indeed, the papers refer repeatedly to the idea that Dyna will eventually use machine learning to figure out what implementation strategies work best for which kinds of programs. My reaction to this is that it seems a tad hopeful to think that ML can figure out what the human programmer has difficulty with. It seems safer to just say that we should buckle down and look for good rules for the compiler to use to decide how to implement particular pieces of code, perhaps tuning the occasional threshhold via experimentation...&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;...but what I &lt;i&gt;&lt;b&gt;really&lt;/b&gt;&lt;/i&gt;&amp;nbsp;want to say is that Dyna should not have that strict divide between specification and implementation at all; instead, it should be build from the beginning with the reflection capabilities necessary for it to control its own execution strategies via run-time priority manipulation, and so on. This is only bolstered by the fact that Dyna was originally designed with dynamic programming in mind, and RL is a dynamic programming problem!&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;But we know where all of this leads. Right into the reflection problem. [Update: Ah, actually, they do plan on allowing this sort of thing! They call it "voodoo computation".]&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;So, all told, I've made very little progress on the whole "procedural logic" idea. However, I do think Dyna is very cool; it brings together a lot of ideas of textbook AI in a small and elegant space, perhaps revealing that there was more to them than we thought when we were in undegrad AI class. :)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;PS:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;One more semi-relevant idea. &lt;a href="http://en.wikipedia.org/wiki/Clean_(programming_language)"&gt;Clean&lt;/a&gt;&amp;nbsp;is a programming language which is very similar to Haskell, slightly older, slightly less elegant, but with a much better way of thinking about IO (it seems). It allows side effects without compromising referential transparency by introducing a &lt;a href="http://en.wikipedia.org/wiki/Uniqueness_type"&gt;uniqueness type&lt;/a&gt; which allows only one reference to values which may change due to side-effects (so that we can't be surprised when the value changes). This seems, well, cleaner than the IO monad (though I'm no expert on IO in Haskell); a good way of interfacing the logical with the procedural, but no solution to generating nanosecond-level goal-oriented behavior.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;(Dyna is actually a "pure" language, no side-effects allowed... though values can change over time in several ways!)&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/266928470727008321-2407183424468889630?l=lo-tho.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lo-tho.blogspot.com/feeds/2407183424468889630/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://lo-tho.blogspot.com/2011/07/procedural-logic-once-again.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/2407183424468889630'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/2407183424468889630'/><link rel='alternate' type='text/html' href='http://lo-tho.blogspot.com/2011/07/procedural-logic-once-again.html' title='Procedural Logic Once Again'/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-266928470727008321.post-4993873044771525934</id><published>2011-06-11T11:49:00.000-07:00</published><updated>2011-06-11T11:49:04.676-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='scientific method'/><category scheme='http://www.blogger.com/atom/ns#' term='culture'/><category scheme='http://www.blogger.com/atom/ns#' term='education'/><category scheme='http://www.blogger.com/atom/ns#' term='democracy'/><category scheme='http://www.blogger.com/atom/ns#' term='competence'/><category scheme='http://www.blogger.com/atom/ns#' term='sociocracy'/><title type='text'></title><content type='html'>I recently encountered &lt;a href="http://squid314.livejournal.com/297579.html"&gt;this article&lt;/a&gt; (shared by &lt;a href="https://profiles.google.com/gwern0/about"&gt;Gwern&lt;/a&gt;). The following is a riff off of that, which appeared partially as a buzz comment as well.&lt;br /&gt;&lt;br /&gt;Education is very important.&lt;br /&gt;&lt;br /&gt;Culture is very important.&lt;br /&gt;&lt;br /&gt;The culture of not admitting mistakes which is described is quite different from here, but at the same time, an uncomfortably familiar facet of human nature. Look how bad it can be!&lt;br /&gt;&lt;br /&gt;In America, it is more often simply keeping silent rather than admitting something which might make you sound stupid. That is also a mistake, _especially_ when it's a matter of being afraid to ask stupid questions. We've all been there, right? No one wants to sound uninformed. The problem is that the best way to _get_ informed is to ask these questions.&lt;br /&gt;&lt;br /&gt;While we may understand this on a conceptual level, it is hard to get that "small amount of courage" needed to ask the critical questions immediately when we have the chance. Perhaps it is because it's an immediate cost for a delayed reward, and we are not very good at thinking of the delayed reward. Here is a trick which may help. So you are worried about making a good impression, right? By keeping silent when you don't understand, you don't make any impression on that issue. By asking questions, you give the impression that you are really interested. If you ask enough questions to really understand, you make the impression of being able to pick things up quickly. By practicing the art of asking good questions, you stand a better chance of making an impression as a good critical thinker.&lt;br /&gt;&lt;br /&gt;What we need is a culture which encourages admitting mistakes and inadequacies as soon as possible. Specifically, we need to encourage the idea that the best way to *be* that competent, informed person is to seek feedback and correct yourself whenever you fall short. This also gets into the space of what we do when we argue. We bring in all the support we can to make our side right. Instead, we should be looking at the evidence to choose the right side. That is discussed in &lt;a href="http://youarenotsosmart.com/2011/06/10/the-backfire-effect/"&gt;this article&lt;/a&gt;, also shared recently by Gwern. (It's also discussed extensively on Less Wrong.) How do we engineer a culture in which real discussion, from which both sides can learn, is encouraged? Here are &lt;a href="http://www.sociocracy.info/applying-sociocrac-as-an-individual/"&gt;a few tips&lt;/a&gt; (this time shared by Luke Palmer). These come from a system of government called Sociocracy.&lt;br /&gt;&lt;br /&gt;I'm not buying into the whole of Sociocracy (and neither is Luke), but these tips for how to run small groups seem immediately useful. The basic idea is that our win/lose argument mindset is partly due to the pervasive culture of democracy. What do we do to resolve a dispute in a small group? We vote, and the winning side is considered correct. Sociocracy suggests an alternative: ask for everyone's objections, and don't make a decision until all the objections have been dealt with. This builds a better decision. The phrase "science is not a democracy" comes to mind. Science instead relies on the merit of arguments to convince individuals of the correctness of views. This is the idea behind sociocracy. (I do feel voting is still a good way of making decisions when unified decisions are needed and consensus can't be had or would be too expensive, though.)&lt;br /&gt;&lt;br /&gt;So how do we build a more effective culture? By seeking out good ideas like this, and sharing them. (Ultimately, Luke's idea of &lt;a href="http://lukepalmer.wordpress.com/2011/05/14/sociocracy-game/"&gt;experimenting&lt;/a&gt; with new cultures to see what works is also good.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/266928470727008321-4993873044771525934?l=lo-tho.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lo-tho.blogspot.com/feeds/4993873044771525934/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://lo-tho.blogspot.com/2011/06/i-recently-encountered-this-article.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/4993873044771525934'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/4993873044771525934'/><link rel='alternate' type='text/html' href='http://lo-tho.blogspot.com/2011/06/i-recently-encountered-this-article.html' title=''/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-266928470727008321.post-1827697501082695137</id><published>2011-05-08T22:19:00.000-07:00</published><updated>2011-05-08T22:19:55.313-07:00</updated><title type='text'>Current Probability Ideas</title><content type='html'>Follow-up to &lt;a href="http://lo-tho.blogspot.com/2011/02/probabilities-over-subjective-worlds.html"&gt;this&lt;/a&gt; post (and the series preceding it). I think that while my distinction between subjective and objective probability makes sense, it's not the right direction for answering the questions I'm asking-- it overcomplicates the issues. That change of view is based on &lt;a href="http://www.springerlink.com/content/b46877525x984034/"&gt;a&lt;/a&gt; &lt;a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.86.9151&amp;amp;rep=rep1&amp;amp;type=pdf"&gt;few&lt;/a&gt; papers, which I still need to read in more detail. (IE, this post does not summarise insights from those papers, it summarises my reaction from discovering their existence.)&lt;br /&gt;&lt;br /&gt;Aside: my previous post, this one, and perhaps the next one or two do not represent me "thinking aloud" as my posts usually do... instead, I'm summarising thoughts that I had over the semester, but felt too busy to blog about.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;So. How well can we do with just a single layer of probability, over a standard logic? How can we achieve something similar to &lt;a href="http://www.scholarpedia.org/article/Algorithmic_information_theory"&gt;Solomonoff induction&lt;/a&gt;?&lt;br /&gt;&lt;br /&gt;One idea is to imagine that the universe we are observing was constructed by randomly writing down sentences in logic to determine each fact, only taking care not to ever write something which contradicts what we've already written. For example, we can&amp;nbsp; use a length-based prior to choose a sentence at random from first-order logic.&amp;nbsp; This prior obviously &lt;i&gt;contains&lt;/i&gt; the Solomonoff prior ("is dominant over" as they say), but it's not clear to me whether the relationship goes the other way ("is dominated by") so that they are roughly equivalent. I'm thinking the prior I'm specifying is "even more uncomputable," since we have to check for consistency of theories.&lt;br /&gt;&lt;br /&gt;This prior lends itself to an interesting approximation scheme, though.&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Each sentence has at least log-probability inversely proportional to its length&lt;/li&gt;&lt;li&gt;To this, we can add some probability mass for each consistent sequence of sentences which we find that implies the sentence. (To test for "consistent" we just try to find a contradiction for some reasonable amount of time, perhaps trying harder later if it's important.)&lt;/li&gt;&lt;li&gt;Normalise this probability by dividing by (1 - probability we encounter an inconsistent sequence drawing randomly), which is estimated based on the cumulative probability of the inconsistent sequences we've identified so far.&lt;/li&gt;&lt;li&gt;This gives a rough lower bound for the probability. To get an upper bound, we apply the same technique to the negation of the sentence.&lt;/li&gt;&lt;li&gt;If A implies B, then B is at least as probable as A (ie, it can inherit A's lower bound). Likewise, A is at most as probable as B (inheriting B's upper bound). In other words, we can use existing ideas about propagating probability intervals to work with these estimates if we like.&lt;/li&gt;&lt;li&gt;To update on evidence, we use compatibility with that evidence as an extra criteria for consistency of sequences of statements. Intuitively, this will throw out a bunch of possibilities (and increase the amount of normalisation by doing that, which increases the probability of the remaining possibilities).&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;Unfortunately, these bounds would tend to be very wide, I think. The system would have to do a lot of thinking to narrow them down. More generous approximations might find one model and stick to it until it's disproved or a better model is found, say. This would at least give the system something to work with.&lt;br /&gt;&lt;br /&gt;Now that we have a little bit of foundation in place, I think we can more usefully consider what to do if we want higher-order probabilities. Just as we can randomly choose sentences of first-order logic, we can randomly choose sentences of any system of probabilistic models we like (such as &lt;a href="https://lirias.kuleuven.be/bitstream/.../1/05-book-srl-chapter.pdf"&gt;BLP&lt;/a&gt;, &lt;a href="http://people.csail.mit.edu/milch/blog/index.html"&gt;BLog&lt;/a&gt;, etc). This allows us to use the inference algorithms, et cetera from our favourite formalism. As long as first-order logic is a special case of the formalism, it will be just as universal.&lt;br /&gt;&lt;br /&gt;For example, we could perfectly well use a prior based on random theories stated with the (objective-)probability-theoretic quantification which I was discussing in the previous posts. If we do this, we basically solve the puzzles I was running up against there: the inference from objective probabilities over classes to subjective probabilities over individuals is justified by the regularity assumption in the prior, IE, by the idea that the objective probability might be an "essential feature" of the class... something that was written down in the book of rules for the universe &lt;i&gt;before&lt;/i&gt; the individual cases were decided.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/266928470727008321-1827697501082695137?l=lo-tho.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lo-tho.blogspot.com/feeds/1827697501082695137/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://lo-tho.blogspot.com/2011/05/current-probability-ideas.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/1827697501082695137'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/1827697501082695137'/><link rel='alternate' type='text/html' href='http://lo-tho.blogspot.com/2011/05/current-probability-ideas.html' title='Current Probability Ideas'/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-266928470727008321.post-7547265020069155537</id><published>2011-05-07T10:57:00.000-07:00</published><updated>2011-05-07T10:59:47.638-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='impredicativity'/><category scheme='http://www.blogger.com/atom/ns#' term='validity'/><category scheme='http://www.blogger.com/atom/ns#' term='impredicative'/><category scheme='http://www.blogger.com/atom/ns#' term='provability'/><category scheme='http://www.blogger.com/atom/ns#' term='2nd order logic'/><category scheme='http://www.blogger.com/atom/ns#' term='intuitionism'/><title type='text'>Impredicativity</title><content type='html'>Lukasz Stafiniak sent me &lt;a href="http://www.cse.chalmers.se/%7Ecoquand/NARA.pdf"&gt;this&lt;/a&gt; about impredicativity. I had seen the concern before, but never seen why it was worth any worry. Impredicative comprehension principles are a big convenience for normal mathematics, and don't lead to any paradoxes that we know of.&lt;br /&gt;&lt;br /&gt;However, looking at it again, I realised that the basic complaint could be read as "impredicative comprehension is not valid in Kripke's theory of truth." This is more concerning to me. In fact, the concern can be stated just in terms of the Tarski hierarchy: 2nd-order logic does not exist at any stage in the Tarski hierarchy. I was foolishly thinking that there was a direct correspondence between stages in the hierarchy and orders of logic. There is, if we use predicative comprehension! But, impredicative comprehension is "too powerful" to exist at any stage of any recursive hierarchy.&lt;br /&gt;&lt;br /&gt;This causes trouble for me, because impredicative comprehension is necessary to my understanding of the natural numbers (and other important things).&lt;br /&gt;&lt;br /&gt;Fortunately, the article suggests a solution: impredicative quantifications should be understood at the level of provability. IE, we aren't stating that all instances are true; we are stating something stronger, that they are all provable. (But we use infinitary logic, so that this is not as much of a restriction.) This is a very intuitionistic suggestion.&lt;br /&gt;&lt;br /&gt;Unfortunately, we can see that this really just introduces another metalanguage (the language of provability). Hence, comprehension interpreted this way does not include all possible predicates after all (even if we extend the provability interpretation to the point where it covers all cases of 2nd-order comprehension): it skips the predicates which can be defined in terms of provability.&lt;br /&gt;&lt;br /&gt;Building a self-referential theory of logical implication which follows the rules we want is hard. The obvious way of doing it just leads directly to Curry's paradox. I still have not finished reading Hartry Field's "Saving Truth from Paradox", but it includes an account of implication which may be helpful. &lt;a href="http://m-phi.blogspot.com/2011/04/validity-and-truth-preservation.html"&gt;These&lt;/a&gt; &lt;a href="http://m-phi.blogspot.com/2011/03/on-adding-validity-predicate-to-pa.html"&gt;posts&lt;/a&gt; also contain some notes which may be helpful.&lt;br /&gt;&lt;br /&gt;I should have known for about a year now that the immediate challenge is to built up to 2nd-order logic, not to build up to set theory. However, this made it hit home: &lt;i&gt;I can't rely on 2nd order logic to be there for me when I need it&lt;/i&gt;. I don't yet know what 2nd-order logic is, or where it comes from.&lt;br /&gt;&lt;br /&gt;Still, the fact remains that 2nd-order logic is a fairly natural, very useful, and almost certainly consistent set of axioms. It seems probable that aliens would stumble upon it, etc. It does not just poof out of existence. Even just as a syntactic entity rather than one with a semantics, it's still important!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/266928470727008321-7547265020069155537?l=lo-tho.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lo-tho.blogspot.com/feeds/7547265020069155537/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://lo-tho.blogspot.com/2011/05/impredicativity.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/7547265020069155537'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/7547265020069155537'/><link rel='alternate' type='text/html' href='http://lo-tho.blogspot.com/2011/05/impredicativity.html' title='Impredicativity'/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-266928470727008321.post-8121073325641290170</id><published>2011-02-04T09:08:00.000-08:00</published><updated>2011-02-04T09:08:05.545-08:00</updated><title type='text'>Probabilities over Subjective Worlds</title><content type='html'>This is the promised follow-up to &lt;a href="http://lo-tho.blogspot.com/2011/01/subjective-probability.html"&gt;this post&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;What sense can we make of the standard treatment of subjective higher-order probabilities? Beyond the formalism, how are we to visualise this stuff?&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;I wanted a possible-worlds type semantics, but I decided to construct my own rather than looking at existing ones.&lt;br /&gt;&lt;br /&gt;In normal possible-world semantics, we've got a bunch of worlds which hold complete valuations-- assignments of every statement to true or false. Then, we often talk about an "accessibility" relation between worlds: world A is only accessible from world B under certain conditions. The accessibility relation controls our notion of "possible" and "necessary": from our perspective at a world W, something is considered "possible" if it is true in some world accessible from W, and "necessary" if it is true in all accessible worlds.&lt;br /&gt;&lt;br /&gt;Examples of useful accessibility relations include ones based on knowledge (ie, the accessible worlds are the ones consistent with what I know about the world I'm in) and based on time (ie, the accessible worlds are potential futures).&lt;br /&gt;&lt;br /&gt;One way of getting probabilities out of this is to make the accessibility relation continuous-valued; rather than a world being accessible or not, a world has a degree of accessibility.&lt;br /&gt;&lt;br /&gt;So, we start with just non-probabilistic facts in each world; then 1st-order probabilistic facts get values based on the immediately accessible worlds. Then 2nd-order facts, 3rd-order, et cetera. (This can go up to infinite orders if the language is expressive enough to mention them.)&lt;br /&gt;&lt;br /&gt;To get coherent probabilities, we need to have some restrictions on the weights of the accessibility relations; I won't go into detail here. &lt;br /&gt;&lt;br /&gt;Unfortunately, this really only gets us a 1st-order distribution: since all our facts are completely determined in each world, all our 1st-order probabilities will be completely determined in the first propagation step, so all higher-order probabilities will be 1 or 0.&lt;br /&gt;&lt;br /&gt;To get an interesting higher-order distribution, we can give up the idea that possible worlds always hold complete valuations. Instead, a possible world contains just a partial picture: an assignment of a few facts to true or false. This allows the probabilities to be only partially determined, allowing nontrivial probabilities at each stage.&lt;br /&gt;&lt;br /&gt;This goes along with an idea called "situation theory" which I know only a little about. As I understand it, the idea is that when talking about possibility, we talk about "situations" rather than worlds: partial assignments to true/false rather than complete ones.&lt;br /&gt;&lt;br /&gt;It's probably better to think of these as "possible states of knowledge" rather than possible worlds or even possible situations, given the probabilistic interpretation. We move between worlds when we gain knowledge. Some of the worlds *will* be fully specified, and one of these will correspond to the "actual world"; however, we will never get there in terms of our state of knowledge.&lt;br /&gt;&lt;br /&gt;Now, since this construction gives us a self-referential probability predicate, it is interesting to think of it as a sort of generalised theory of self-referential truth. P(S)=1 plays the role of True(S), but does so imperfectly: it is possible for the probability of a statement to be 1 but for the statement to later turn out to be false. For example, if we have an even distribution over the possible probabilities of some statement A, then P(P(A)=r)=0 for all real numbers r. This means P(P(A) not equal r)=1 for all r. Yet, we may eventually transition into a world in which r takes on a specific value-- so we can't think of all statements "P(A) not equal r" as being true. This makes our theory of (pseudo-)truth one in which the inference A =&amp;gt; True(A) is justified, but not the inference True(A) =&amp;gt; A.&lt;br /&gt;&lt;br /&gt;The propagation method gives us a "least-fixed-point" type probability distribution. However, there might also be other useful fixed points. For example, we may want some self-referential sentences to come out with probability values.&lt;br /&gt;&lt;br /&gt;In any fixed point, there will be probabilistic statements which do not get any truth value. First off, any statement which has a (non-integer) probabilistic belief associated with it can't also have a truth value. There will also be statements, though, which can't be consistently assigned any level of belief.&lt;br /&gt;&lt;br /&gt;Overall, this probably doesn't offer an especially appealing theory of truth. It would be interesting to know more about precisely how much it can give us, though.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/266928470727008321-8121073325641290170?l=lo-tho.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lo-tho.blogspot.com/feeds/8121073325641290170/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://lo-tho.blogspot.com/2011/02/probabilities-over-subjective-worlds.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/8121073325641290170'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/8121073325641290170'/><link rel='alternate' type='text/html' href='http://lo-tho.blogspot.com/2011/02/probabilities-over-subjective-worlds.html' title='Probabilities over Subjective Worlds'/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-266928470727008321.post-1807555623122281472</id><published>2011-01-28T20:14:00.000-08:00</published><updated>2011-01-28T20:14:53.009-08:00</updated><title type='text'>Some Consequences of Incompleteness</title><content type='html'>&lt;a href="https://docs.google.com/viewer?a=v&amp;amp;pid=explorer&amp;amp;chrome=true&amp;amp;srcid=0B1pyu1tDAnNTNTM0OGZmYmItODFhMi00YTZiLTlkMTItZjMxOTY4NGVkNDQ0&amp;amp;hl=en"&gt;I wrote an essay&lt;/a&gt; about some issues in logic a little while ago.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/266928470727008321-1807555623122281472?l=lo-tho.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lo-tho.blogspot.com/feeds/1807555623122281472/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://lo-tho.blogspot.com/2011/01/some-consequences-of-incompleteness.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/1807555623122281472'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/1807555623122281472'/><link rel='alternate' type='text/html' href='http://lo-tho.blogspot.com/2011/01/some-consequences-of-incompleteness.html' title='Some Consequences of Incompleteness'/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-266928470727008321.post-4514423524273752082</id><published>2011-01-21T19:28:00.000-08:00</published><updated>2011-01-21T19:28:02.027-08:00</updated><title type='text'>Subjective Probability</title><content type='html'>The system I described in the previous post doesn't directly lend itself to "subjective" higher-order probabilities. Specifically, we can't just have a degree of belief in an arbitrary statement. We need to choose a variable randomly in order to have a probability-- a completely specific statement (with no variables) can't have a probability other than 1 or 0. Probabilities are reduced to a sort of counting -- specifically, what's known as a "measure" -- rather than being a way of representing uncertainty.&lt;br /&gt;&lt;br /&gt;There are some ways we can try to interpret subjective probabilities as measures. We can introduce "possible worlds" (or "possible situations"), which we treat as being chosen randomly-- so if we're uncertain about whether it will rain or snow tomorrow, it's because in some possible worlds it rains, and in some it snows. The probability of each is obtained by counting the possibilities. However, prompted by Lukasz's comment on my previous post, I took a much closer look at how higher-order subjective probability measures should be represented. I found myself re-inventing the standard higher-order probability theory which I complained about in the first place. This &lt;i&gt;can&lt;/i&gt; be represented within my framework, but it isn't clear that it &lt;i&gt;should&lt;/i&gt; be. Each theory can stand on its own, and is more convenient for different purposes.&lt;br /&gt;&lt;br /&gt;One problem with my proposed system is that it does not directly endorse the idea of generalizing from examples. Having knowledge of more examples will give us more knowledge of the probability involved, but only because the probability is defined by literally counting examples. If we have 100 buttons (each measured equally) and we know only that each button is either red or blue, seeing 98 of those buttons will tell us the probability to within a range of .02, but that tells us nothing about the last two buttons! 98 blue buttons are no indication that the last two are blue, unless we add a possible-world framework so that we can measure probabilities of the last two colors as a function of randomly chosen worlds.&lt;br /&gt;&lt;br /&gt;Frequentism is a bit better here: probabilities are not just arbitrary measures, but rather, are limiting frequencies. What this means is that the probability of an event is defined as the ratio one would get after an infinite number of experiments. It seems more justifiable to use a limiting frequency as a generalization; if we have a limiting frequency, then we know that if we run experiments long enough, we'll get close to that ratio. This in-the-long-run statement is potentially quite weak with respect the the &lt;i&gt;immediate&lt;/i&gt; future, but at least it tells us &lt;i&gt;something&lt;/i&gt; about the future!&lt;br /&gt;&lt;br /&gt;There are some problems with limiting frequencies, however. One problem&amp;nbsp; is that there is no mathematical guarantee that limiting frequencies  exist! Mathematically speaking, the limiting frequency will typically  depend on such things as the order in which we perform the experiments;  some orderings will have different limits, and some will have none at  all (ie, the ratio varies up and down infinitely). We have to assume that the actual order in which we perform experiments will have a limit. Another problem is how we might get knowledge of limiting frequencies. A limiting frequency is not something we can directly use as a degree of belief concerning another limiting frequency-- limiting frequencies require something called a &lt;i&gt;reference class &lt;/i&gt;(meaning that a probability for a specific event is only defined when we think of that event in the context of a specific sequence of random experiments). Furthermore, a ratio from a &lt;i&gt;finite&lt;/i&gt; series of experiments does not &lt;i&gt;necessarily&lt;/i&gt; tell us anything about the ratio after an &lt;i&gt;infinite&lt;/i&gt; number of experiments; we need additional assumptions to try and make this connection.&lt;br /&gt;&lt;br /&gt;This brings in the idea that we have to &lt;i&gt;start&lt;/i&gt; with some probabilities in order to get more (effectively, we need a Bayesian prior). Taken to its extreme, we get the theory of subjective Bayesian probabilities; all probabilities are interpreted as personal degrees of belief. New information provides evidence for hypotheses by Bayes' Law, so that we can generalize from examples by performing bayesian updates on our probabilistic models of the world.&lt;br /&gt;&lt;br /&gt;Really, all three of these options are valid applications of probability theory. One does not have to be a purist-- the different types of probability can be mixed. In particular, it seems useful to take Bayesian-style degrees of belief as a way of estimating the two other kinds of probabilities.&lt;br /&gt;&lt;br /&gt;However, that sort of inclusiveness does not settle the issue of how probabilities should be used for particular applications. For my curiosities about higher-order probabilities, the system I presented in the previous post offers a somewhat nice, tight connection between higher-order probabilities and first-order logic (probabilities being a generalization of universal quantifiers). This may be appealing for certain probabilistic logic issues. On the other hand, the Bayesian theory of higher order probabilities has a nicer way of allowing a degree of belief for any statement... (though, higher-order belief distributions require us to just use &lt;i&gt;expected&lt;/i&gt; probabilities when making bets, as I complained in the previous post).&lt;br /&gt;&lt;br /&gt;I'll end this here... next time, I hope to talk a bit about the structure of possible-world semantics for purely subjective probabilities, and what it provides in the way of a theory of truth and foundation for mathematics.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/266928470727008321-4514423524273752082?l=lo-tho.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lo-tho.blogspot.com/feeds/4514423524273752082/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://lo-tho.blogspot.com/2011/01/subjective-probability.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/4514423524273752082'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/4514423524273752082'/><link rel='alternate' type='text/html' href='http://lo-tho.blogspot.com/2011/01/subjective-probability.html' title='Subjective Probability'/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-266928470727008321.post-3754899037036683858</id><published>2010-12-19T18:06:00.000-08:00</published><updated>2011-05-08T21:55:49.465-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='probability'/><title type='text'>Foundations of Higher-Order Probability</title><content type='html'>I think the foundations of higher-order probability theory may be in a bit of a mess.&lt;br /&gt;&lt;br /&gt;Whereas first-order probabilities express uncertainty about the world, a higher-order probability expresses uncertainty about what value a first-order probability has. Or, if you ask a Frequentist instead of a Bayesian: first-order probabilities give limiting frequencies of results from experiments, whereas higher-order probabilities give limiting frequencies of getting certain limiting frequencies.&lt;br /&gt;&lt;br /&gt;Uncertainty about uncertainty could be argued to defeat the whole point of having a plausible degree of belief... why can't it be collapsed into a straight degree of belief?&lt;br /&gt;&lt;br /&gt;&lt;a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.102.4940&amp;amp;rep=rep1&amp;amp;type=pdf"&gt;This paper&lt;/a&gt; tries to base higher-order probability on the idea that there is some "ideal expert" who knows the "true probability" which we are uncertain about. However, the paper puts it up to the agent to decide what constitutes a "fully informed ideal expert." If the ideal expert knows which events actually happen or not, then the first-order probabilities will all be 0 or 1, so the higher-order probabilities take on the role that 1st-order probabilities usually take on. In the opposite extreme, the "ideal expert" is just the agent itself, so that the higher-order probabilities are all 1 or 0 and the first-order probabilities are known. (All this is specifically laid out by the author.) This seems to put higher-order probability on a very shaky footing; at least to my ear.&lt;br /&gt;&lt;br /&gt;Consider the following situation concerning an event A:&lt;br /&gt;&lt;br /&gt;P(P(A)=1/3)=1/2&lt;br /&gt;P(P(A)=2/3)=1/2&lt;br /&gt;&lt;br /&gt;In English, we do not know if the probability of A is 1/3 or 2/3-- we assign 50% probability to each.&lt;br /&gt;&lt;br /&gt;Now, two plausible principles of higher-order reasoning are as follows.&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;i&gt;Expectation principle&lt;/i&gt;: We can find our first-order belief in event A by a weighted averaging over the possibilities; ie, P(A)=E(P(A)), where E is the expectation operator (which takes the average value, weighted by probability). This is necessary to convert higher-order distributions into numbers we can use to place bets, etc.&lt;/li&gt;&lt;li&gt;&lt;i&gt;Lifting principal&lt;/i&gt;: If we know X, we know P(X)=1. &lt;a href="http://www.jstor.org/pss/186197"&gt;This paper&lt;/a&gt; argues for a related rule: P(X)=p if and only if P(P(X)=p)=1. Either rule is sufficient for the argument that follows; specifically, I only need that P(X)=p implies P(P(X)=p)=1. (This is a special case of my "lifting" by instantiation of X to P(X)=p, and a weakening of their coherence-based principle since it takes just one direction of the "if and only if".) [It is interesting to note the similarity of these rules to proposed rules for the truth predicate, and in fact that is what started my investigation of this; however, I won't give significant space to that now.]&lt;/li&gt;&lt;/ul&gt;Granting both of these rules,&amp;nbsp; it is not hard to see that any non-trivial higher-order distribution will be contradictory. Returning to the example at hand, we apply the two rules in succession:&lt;br /&gt;&lt;br /&gt;P(P(A)=1/3)=1/2&lt;br /&gt;P(P(A)=2/3)=1/2&lt;br /&gt;-&amp;gt;&lt;br /&gt;P(A)=1/2 via expectation,&lt;br /&gt;-&amp;gt;&lt;br /&gt;P(P(A)=1/2)=1 via lifting.&lt;br /&gt;&lt;br /&gt;But this contradicts the original distribution; we have&lt;br /&gt;&lt;br /&gt;P(P(A)=1/3)=1/2&lt;br /&gt;P(P(A)=2/3)=1/2&lt;br /&gt;P(P(A)=1/2)=1,&lt;br /&gt;&lt;br /&gt;which sums to 3/2, violating the laws of probability.&lt;br /&gt;&lt;br /&gt;For higher-order probability to be non-trivial, then, it &lt;i&gt;seems&lt;/i&gt; we&amp;nbsp; must reject one or both of my given principles. If this is granted, I think the best view is that the expectation principle should be abandoned; after all, &lt;a href="http://www.jstor.org/pss/186197"&gt;that paper I cited&lt;/a&gt; gives a pretty good argument for a form of lifting principle. One way of looking at the expectation principle is that it flattens the higher-order distribution to a first-order one, trivialising it.&lt;br /&gt;&lt;br /&gt;However, I want to preserve both principles. Both seem totally reasonable to me, and I've found a solution for keeping a form of each without any problem. I will argue that the problem here is one of equivocation; the notation being used is not specific enough. (The argument I use comes from my brother.)&lt;br /&gt;&lt;br /&gt;For concreteness, suppose the event 'A' I mentioned is flipping a coin and getting heads. One scenario which fits with the probabilities I gave is as follows. We have two unfair coins in a bag; one gives heads 2/3 of the time, and the other gives heads 1/3 of the time. If we want a fair chance of heads, we can just draw at random and flip; half the time we will get the coin bias towards heads, but half the time we will get the one bias against. Since the bias is equal in each direction, we will get heads 50% of the time. This illustrates the expectation principle in action, yet it does not trivialise the higher-order characterisation of the situation: if we draw one coin out of the bag and keep it out, we can try and determine whether it is the 1/3 coin or the 2/3 coin through experimentation. This involves a Bayesian update on the higher-order distribution I gave.&lt;br /&gt;&lt;br /&gt;The intuition, then, is that the "flattened" probability (1/2) and the "non-flat" probabilities (1/3 and 2/3) are the limiting frequencies of very different experiments. There is no contradiction in the two distributions because they are talking about different things. 1/2 is the limiting frequency of drawing, flipping, and re-drawing; 1/3 or 2/3 is what we get if we draw a coin and don't re-draw as we continue to flip (and we will find that we get each of &lt;i&gt;those&lt;/i&gt; 1/2 of the time).&lt;br /&gt;&lt;br /&gt;So, how do we formalise that? One way, which I particularly like, is to let the probability operator bind variables.&lt;br /&gt;&lt;br /&gt;I'll use the notation P[x](A)=e to represent P binding variable x (where A is some statement with x as a free variable). Intuitively, the meaning is that if we choose x randomly, the statement A has probability e of being true. In addition, we can talk about choosing multiple variables simultaneously; this will be given notation like P[x,y,z](A)=e.&lt;br /&gt;&lt;br /&gt;In this notation, the two principles become the following:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;i&gt;Expectation principle:&lt;/i&gt; If we have a higher-order distribution represented by a collection of statements P[x](P[y](A)=a)=b, P[x](P[y](A)=c)=d, et cetera, then we can form the first-order distribution P[x,y](A)=E[x](P[y](A)), where E[x](P[y](A)) is the expected value of P[y](A), choosing x randomly.&lt;/li&gt;&lt;li&gt;&lt;i&gt;Lifting principle:&lt;/i&gt; If we know some statement A (possibly with free variables), we can conclude P[x](A)=1, for any variable. (Together with the expectation principle, this implies the related if-and-only-if statement from the paper I cited.)&lt;/li&gt;&lt;/ul&gt;With these rules, the argument from earlier has a different conclusion: &lt;br /&gt;&lt;br /&gt;&lt;br /&gt;P[x](P[y](A)=1/3)=1/2&lt;br /&gt;P[x](P[y](A)=2/3)=1/2&lt;br /&gt;-&amp;gt;&lt;br /&gt;P[x,y](A)=1/2 via expectation,&lt;br /&gt;-&amp;gt;&lt;br /&gt;P[z](P[x,y](A)=1/2)=1 via lifting.&lt;br /&gt;&lt;br /&gt;This makes my earlier point about equivocation clear: the concluding probability is not about P[y](A) at all, but rather, is about P[x,y](A).&lt;br /&gt;&lt;br /&gt;There are many details to fill out here; we'd like convenient notation for probability densities,&amp;nbsp; a proper analysis of what subjective probabilities are like in this system, and several other things. However, I feel these are better left to a paper rather than a blog post. :)&lt;br /&gt;&lt;br /&gt;PS-- if anyone has references on similar systems, that'd be great!&lt;br /&gt;PPS-- &lt;a href="http://www.springerlink.com/content/b46877525x984034/"&gt;This paper&lt;/a&gt; talks about similar issues. :D&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/266928470727008321-3754899037036683858?l=lo-tho.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lo-tho.blogspot.com/feeds/3754899037036683858/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://lo-tho.blogspot.com/2010/12/foundations-of-higher-order-probability.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/3754899037036683858'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/3754899037036683858'/><link rel='alternate' type='text/html' href='http://lo-tho.blogspot.com/2010/12/foundations-of-higher-order-probability.html' title='Foundations of Higher-Order Probability'/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-266928470727008321.post-4910236864578516067</id><published>2010-12-05T21:17:00.000-08:00</published><updated>2010-12-05T21:17:27.063-08:00</updated><title type='text'>Three Dreams</title><content type='html'>(1) I wish I had time to build self-education software. This is a wish carried over from my brother, who came up with the idea; a system which hands you exactly the practice problem you need at the moment, to challenge you, keep your interest, but not give you something you can't do. Ideally, the problem sets would be created and shared in a wiki-lije online community (which is the main difference from current high-end education software!) The broadness enabled by this could revolutionise education, job training, and certification. (Since the program estimates your skill level to determine what problem to give you, sitting potential employees down for a session on the machine would give valuable information about their abilities.)&lt;br /&gt;&lt;br /&gt;(2) I would also like to have the time to build a powerful "open work" software in the spirit of mechanical turk, love machine, rentacoder, and others. The hope would be to emphasise love machine's social aspect, but mechanical turk's "all are welcome" aspect. Ideally, the environment would also feel like stack overflow-- you can build reputation by answering questions for free, so that you are more likely to get paid in the future. There are a lot of issues to be dealt with here! The nature of contracts, the details of any "reputation" system... it's complicated. However, it would be great for the efficiency of the intellectual labour market!&lt;br /&gt;&lt;br /&gt;(3) I want to build a Bayesian reasoning system capable of acting as mankind's store of knowledge, so that all arguments for and against certain points can be kept track of and weighed properly against the evidence. This is the least realistic of the three dreams, but if it could replace/augment Wikipedia, it would help to end debates like global warming (or, rather, force the debate to proceed entirely from empirical evidence and in a logically correct manner). There are lots of big problems with this one, including the problem of verifying sources of empirical data.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/266928470727008321-4910236864578516067?l=lo-tho.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lo-tho.blogspot.com/feeds/4910236864578516067/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://lo-tho.blogspot.com/2010/12/three-dreams.html#comment-form' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/4910236864578516067'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/4910236864578516067'/><link rel='alternate' type='text/html' href='http://lo-tho.blogspot.com/2010/12/three-dreams.html' title='Three Dreams'/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-266928470727008321.post-7436987547219861150</id><published>2010-11-15T11:38:00.000-08:00</published><updated>2010-11-16T09:02:43.694-08:00</updated><title type='text'>Bayesian Statistics</title><content type='html'>I'm in a statistics class this semester. On the first day, the professor gave an argument for Bayesianism: frequentist probabilities are only defined when events can be viewed as members of a series of experiments with a limiting frequency, but we also sometimes want to talk about the probability of events which can't be framed in that way. For example, Obama being re-elected is a singular event, so we would have difficulty framing it as one of a sequence of experiments. Bayesianism extends the notion of probability to such cases.&lt;br /&gt;&lt;br /&gt;Since then, of course, I've been bringing up a few Bayesian points in class when relevant. On his end, the prof goes so far as to point out that Bayes Law is not strictly necessary whenever he uses it, and work through the problem a second time avoiding Bayes Law.&lt;br /&gt;&lt;br /&gt;This makes me want to write about a few things.&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Higher Moments&lt;/li&gt;&lt;li&gt;The square and the second moment&lt;/li&gt;&lt;li&gt;Least-square fitting&lt;/li&gt;&lt;li&gt;p-testing and the sort of refutation which Bayesianism is capable of&lt;/li&gt;&lt;li&gt;covariance vs mutual information, variance vs entropy&lt;/li&gt;&lt;li&gt;Information theory (&amp;amp; coding theory) as a foundation for subjective probability&lt;/li&gt;&lt;/ul&gt;However, I have too little time to write about these things. :p If I take the time later, I will turn these bullet points into links. &lt;br /&gt;&lt;ul&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/266928470727008321-7436987547219861150?l=lo-tho.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lo-tho.blogspot.com/feeds/7436987547219861150/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://lo-tho.blogspot.com/2010/11/bayesian-statistics.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/7436987547219861150'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/7436987547219861150'/><link rel='alternate' type='text/html' href='http://lo-tho.blogspot.com/2010/11/bayesian-statistics.html' title='Bayesian Statistics'/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-266928470727008321.post-1094365224612145644</id><published>2010-09-01T21:22:00.000-07:00</published><updated>2010-09-01T21:22:22.787-07:00</updated><title type='text'>Holes in Weak Inferentialism</title><content type='html'>If something like my &lt;a href="http://lo-tho.blogspot.com/2010/08/logical-systems.html"&gt;weak inferentialism&lt;/a&gt; is to be taken seriously, it should be able to account for common structures of mathematical reasoning. In particular, let's take two examples: mathematical induction (used for reasoning about the natural numbers and other discrete structures) and the continuum (used for reasoning about real numbers and other continuous structures).&lt;br /&gt;&lt;br /&gt;My favorite way of thinking about mathematical induction is as a "nothing more" operator-- the principle of mathematical induction essentially asserts that there aren't any numbers except the ones whose existence is assured by the other axioms.&lt;br /&gt;&lt;br /&gt;Viewed as a second-order axiom, this is the only axiom of number theory whose logical consequences are not computably enumerable. If we view it instead as a first order infinite axiom schema, the consequences are computably enumerable, but are an incomplete characterization of the natural numbers. (This is as a result of applying the classical semantics to 1st-order and 2nd-order logics.)&lt;br /&gt;&lt;br /&gt;In a weak inferentialist system like the one I described, it's quite possible to list the consequences of the first-order axiom schema or other stronger versions which approach the 2nd-order axiom.&amp;nbsp; However, the full 2nd-order axiom causes some trouble. The result depends on how paradoxes are resolved. In my description, I said that Kripke's theory of truth would provide a way of avoiding paradoxes. Kripke's theory, however, leaves open some questions.&lt;br /&gt;&lt;br /&gt;Basically, Kripke says that some sentences must have definite truth values, and some cannot, but leaves some sentences between the two which we are allowed to assign values or not based on preference. The &lt;i&gt;least fixed-point&lt;/i&gt; is the theory resulting from leaving all these middle sentences undefined as well; it's generally the preferred theory. However, there are others, such as those based on supervaluation. (Note: This is not an entirely complete way of spelling out what's going on here. There are actually two different things which Kripke leaves open: the fixed-point and the valuation scheme. When I say "least fixed point" I'll actually mean the least fixed point with a Kleene evaluation scheme. However, I won't go into those details here.)&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;The full inferentialist 2nd-order induction axiom would say that for any predicate, we can conclude that if it is true of 0 and is true of &lt;i&gt;n+1&lt;/i&gt; whenever it is true of &lt;i&gt;n&lt;/i&gt;, it is true of all numbers.&lt;br /&gt;&lt;br /&gt;According to the least-fixed-point, this assertion should always be undefined. This is because the quantification over all predicates necessarily includes undefined predicates, for which the whole statement will come out undefined.&lt;br /&gt;&lt;br /&gt;I &lt;i&gt;believe&lt;/i&gt; the supervaluation version will instead allow the assertion to be well-defined. As I understand it, supervaluation says that a statement is true if it is true for any consistent assignment of truth values to the undefined sentences. This means that, conceptually, for the instances of induction which are handed ill-defined predicates, we ask "what would be true of well-defined extensions of this predicate?"&lt;br /&gt;&lt;br /&gt;This is hopeful, but at the same time it seems unfortunate that the properties of the system would depend so much on which version of Kripke's theory is used. The minimal fixed-point seems (at least to some) like the nicest one, so relying on a different version needs some explanation. Can the choice be motivated in a way more strongly tied to the thesis of weak inferentialism? I'll leave that question for later.&lt;br /&gt;&lt;br /&gt;Now, for the continuum. By continuum, I mean to refer to any entity with the cardinality of the powerset of the natural numbers. That powerset (ie, the set of all sets of natural numbers) is one such entity; the real numbers are another. These objects have the essential feature that not every element can be described-- there are far more elements than there can be formulas in any symbolic language. It is a larger sort of infinity.&lt;br /&gt;&lt;br /&gt;The &lt;i&gt;main&lt;/i&gt; question here is, does this sort of system allow for a classical continuum, or is it more like a constructive continuum (in which only the describable elements exist)?&lt;br /&gt;&lt;br /&gt;The answer is, again, dependent on the version of Kripke's truth that we choose. Supervaluation seems to do just what classical mathematics wants: "all possible valuations" will include an uncountable number of possibilities if the setup is right. This will cause universal generalizations about sets if natural numbers (such as the induction axiom!) to take the correct truth values. I could be wrong here, though-- I am not sufficiently familiar with supervaluation.&lt;br /&gt;&lt;br /&gt;Least fixed point will again seem to fail us, refusing to make many generalizations which are taken to be sensible in classical mathematics.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/266928470727008321-1094365224612145644?l=lo-tho.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lo-tho.blogspot.com/feeds/1094365224612145644/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://lo-tho.blogspot.com/2010/09/holes-in-weak-inferentialism.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/1094365224612145644'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/1094365224612145644'/><link rel='alternate' type='text/html' href='http://lo-tho.blogspot.com/2010/09/holes-in-weak-inferentialism.html' title='Holes in Weak Inferentialism'/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-266928470727008321.post-294622525418837657</id><published>2010-08-04T16:23:00.000-07:00</published><updated>2010-12-27T14:03:22.213-08:00</updated><title type='text'>Logical Systems</title><content type='html'>This is a rehash of some ideas from my &lt;a href="http://dragonlogic-ai.blogspot.com/"&gt;older blog&lt;/a&gt;. Basically, I'm trying to state the problem anew, in light of reading Jon Cogburn's paper &lt;a href="http://www.projectbraintrust.com/cogburn/abstracts.html#anchor22275"&gt;Are Turing Machines Platonists&lt;/a&gt;? (Unfortunately not available online, but do email him if you want a copy, he is a nice fellow.)&lt;br /&gt;&lt;br /&gt;Inferentialism is, roughly speaking, the view that the only thing that is important to our understanding of a statement is the way that statements interacts with the surrounding web of statements in our belief system. This is made precise by saying that we understand a statement precisely when we could recognize a proof or disproof of it.&lt;br /&gt;&lt;br /&gt;Computationalism is the view that a mind can be represented as a computer program, that is, there is no fundamentally non-computable stuff going on up there: if a computer was fast enough, it could compute the proper outputs to the nerves based on the microsecond-to-microsecond inputs received.&lt;br /&gt;&lt;br /&gt;Together, these two views entail that we cannot fully understand math in its present form. Goedel's semantic incompleteness theorem shows that for any computer program, there will exist statements in basic number theory which that computer program can recognize no proof or disproof of. Perhaps one might respond that this seems acceptable, as it becomes very difficult to understand complicated mathematical statements, and it doesn't seem implausible that we have some limit corresponding to Goedel-style incompleteness. However, in general our ability to understand the meaning of mathematical statements does not seem to correspond that well to our ability to prove or disprove them (or understand proofs provided by others). The continuum hypothesis, for example, seems understandable; yet it is known not to have a proof or disproof in any widely accepted set of axioms. It seems implausible (to me, at least) that it's understandability comes from its being provable or disprovable in our mental logic. The halting problem provides numerous other examples which I would claim were understandable yet not amenable to proof or disproof.&lt;br /&gt;&lt;br /&gt;This means we've got to either give up inferentialism, computationalism, or classical mathematics. Very roughly speaking: people who give up classical mathematics are some variety of intuitionist or constructivist; people who give up computationalism are some variety of dualist or hypercomputationalist (like Roger Penrose).Now, I don't disagree in principle with restructuring math from the bottom up, but it seems desirable for a foundational program to capture as much as possible of the way mathematicians intuitively reason, so I'm hesitant to be a constructivist (though I may yet be converted). Similarly, I don't have anything against the possibility that some processes yet-unknown to physicists are endowing minds with special non-computational behaviors (and since I'm not a constructivist, I even believe that such behaviors can be well-defined and deterministic!). However, I don't &lt;i&gt;know&lt;/i&gt; that this is the case, and neuroscience seems to suggest that &lt;i&gt;much&lt;/i&gt; of neural processing can be accounted for in a computational way. Furthermore, as an artificial intelligence researcher, I &lt;i&gt;hope&lt;/i&gt; that the essence of "mind" can be captured computationally. I fall in the third category, wanting to give up inferentialism.&lt;br /&gt;&lt;br /&gt;Giving up inferentialism is not to be done lightly. It's a highly plausible assertion, &lt;i&gt;especially&lt;/i&gt; for the computationalist: what else should matter about a statement then the computational interactions it has with other statements? &lt;br /&gt;&lt;br /&gt;The solution I find plausible I'll call &lt;i&gt;weak inferentialism&lt;/i&gt;: we understand a statement if we can compute a &lt;i&gt;defining set of inferences&lt;/i&gt;. The statement's meaning is precisely that defining set; it is merited when all inferences in that set are merited, and (optionally?) false when one of them is false. (Should falseness be taken as a basic concept?) This does not mean that we can compute all of the statement's consequences, though. For example, the defining set of a universal statement P: "For all x, S is true of x" will be all the statements "S is true of A", "S is true of B", ... It's possible that another statement, Q, has a list of consequences which is some subset of P's list. In this case, Q would be a consequence not in the list for P. In some sense, however, Q does not add anything to the list: it just summarizes a portion of it. The weak inferentialist argues that this allows us to understand P without necessarily knowing that Q follows from it.&lt;br /&gt;&lt;br /&gt;(There may be some interesting connections between these two types of inferentialism and the "Principle of Harmony" from proof theory, which states that the introduction rules and elimination rules for symbols should precisely mirror each other. This basically corresponds to a connection between the inferences from which we can conclude a statement and the inferences we can make from that statement. This principle may have to be spelled out differently for the two types of inferentialism. I don't know enough about the principle of harmony to make a well-considered connection, though.)&lt;br /&gt;&lt;br /&gt;Now, the question: what foundational logics do the two different inferentialist pictures recommend? In particular, if we're also computationalist?&lt;br /&gt;&lt;br /&gt;Strong inferentialism will only care about notions of logical consequence which have complete proof theories, like first-order logic. An inferentialist will only care about what structures of reasoning can be implemented in the logic. In particular, it seems natural to consider a logic as a programming language. Think of it like this: we have some basic domain of discourse we wish to talk about (such as the actual world), and we have the logic which allows us to make assertions which will cause some statements about the domain of discourse to entail other such statements. The logic is nothing more than a means for expressing these entailment relationships between the domain-level facts.&lt;br /&gt;&lt;br /&gt;Ignoring computational efficiency and one or two other practical matters, it seems that little about the logic matters once we've determined that it is Turing complete. Classical first-order logic, intuitionistic first-order logic, and a host of others will all do equally well.&lt;br /&gt;&lt;br /&gt;Interestingly, though, more powerful logics can be motivated even in this minimalistic worldview (if we bring practical matters like speed back into the picture). &lt;a href="http://plato.stanford.edu/entries/goedel/#SpeUpThe"&gt;Goedel showed&lt;/a&gt; that mathematically more powerful logics (in a specific sense) will always have the following property: there will be some inferences which can be made in an astronomical number of inference steps in the less-powerful logic, but which the more-powerful logic proves in just a few steps. This theorem only holds for arbitrarily large domains of discourse, though, so it is an empirical question whether the phenomenon occurs in practical situations. The paper "A curious inference" by George Boolos and "Some More Curious Inferences" by Jeffrey Ketland discuss the issue (taking the affirmative).&lt;br /&gt;&lt;br /&gt;Happily, the notion of "more powerful" here coincides at least to an extent with the more typical notions, which seems to mean that we can justify a good amount of normal math via this sort of reasoning, despite the fact that strong inferentialism will deny that math its standard meaning. However, I don't know the precise connection here, and I won't try to explore (in this blog post) precisely what of mathematics could be justified in that way.&lt;br /&gt;&lt;br /&gt;In any case: what sort of view of logic does &lt;i&gt;weak&lt;/i&gt; inferentialism suggest? Well, based on the idea of the defining set of consequences, we could say that a (non-basic) statement is a computer program for listing its own consequences. The "most expressive" logic will be one which uses a Turing-complete notation to do this. The key difference between this system and the previous is that we may not be able to infer a statement even if we can infer all of its defining consequences: we cannot implement the truth conditions computationally. Hence, we still have a (highly abstract, irrelevant of speed issues) concept of a more powerful logic: a more powerful logic will know more about which statements follow from which others. This is equivalent to knowing more about the halting problem (since we could check for implication A-&amp;gt;B by making a program that halts when B implies something A does not, but keeps looping otherwise).&lt;br /&gt;&lt;br /&gt;Fortunately, the extra information can &lt;i&gt;always&lt;/i&gt; be expressed in the same logic! We never need to add more expressiveness, only more knowledge. The work done by any additional symbols can evidently be done without them, because the notation is Turing-complete.&lt;br /&gt;&lt;br /&gt;The weak inferentialist logic includes the Liar sentence, ie, the sentence whose defining consequence is just its own negation. This can be dealt with via Kripke's fixed-point valuation: we enforce the constraint that a statement is considered true exactly when its defining inferences are merited, but we don't require that a sentence is either true or false. The inference "The Liar sentence is false" is neither right nor wrong; it remains undefined, since there is nothing for it to take its truth or falsehood from. The Liar sentence is ungrounded.&lt;br /&gt;&lt;br /&gt;Exactly how this works will depend on whether we want falsehood as a basic concept, which I left open at the beginning. If we don't take it as basic, then falsehood might be defined as the property of implying everything.&amp;nbsp; The Liar paradox then becomes something very reminiscent of the Curry paradox: "This sentence implies everything." What the fixed-point construction tells us is that we can't in general use hypothetical reasoning to see if inferences are indeed justified: if we want to know whether sentence X, which asserts just X|-Y, is true, then it appears we should assume X and see if we can derive Y. (Read X|-Y as "from X&amp;nbsp; we can infer Y".) If we assume X, then we know X|-Y, but combining those two, we know Y. This hypothetical reasoning proves X|-Y, so we know X (un-hypothetically). But this entails Y, which might be "everything"! Logicians have provided weaker forms of hypothetical reasoning which conform to the fixed-point construction in order to avoid this problem. (Specifically, we can only make hypothetical assumptions which we already know are grounded.)&lt;br /&gt;&lt;br /&gt;It's interesting that this means the sentence which just claims that inferring Y is justified is radically different from the sentence which claims that inferring Y from itself is justified, despite the fact that once we believe either, they justify the same inferences. The two even have the same conditions for being true: if we know Y, we can conclude both (since A|-B is true when B is inferable, that is, |-B implies A|-B, regardless of A). However, when Y is false, then the statement "infer Y" is false, but "From this statement, infer Y" is ungrounded and thus considered undefined.&lt;br /&gt;&lt;br /&gt;The final thing to note is that, although no further expressiveness appears to be justified by weak inferentialism, the system described cannot fully express the concept of "groundedness" I've been using. (It can only mark it true, never false; but I've noted that statements are &lt;b&gt;&lt;i&gt;un&lt;/i&gt;&lt;/b&gt;grounded more than once in this discussion.) Hence, we have an immediate example of a concept which &lt;i&gt;appears&lt;/i&gt; to be mathematically well-defined, but which weak inferentialism does not seem to be able to account for. Yet, what is lost? After all, these supposed statements can't even be given a computable list of defining inferences they justify! Is it useful to state that something is ungrounded? (I think the more basic notion called into question here is negation itself: is it always meaningful to know that something is not the case? Negation has no defining set of inferences!)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/266928470727008321-294622525418837657?l=lo-tho.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lo-tho.blogspot.com/feeds/294622525418837657/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://lo-tho.blogspot.com/2010/08/logical-systems.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/294622525418837657'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/294622525418837657'/><link rel='alternate' type='text/html' href='http://lo-tho.blogspot.com/2010/08/logical-systems.html' title='Logical Systems'/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-266928470727008321.post-4244866532146623788</id><published>2010-07-06T13:07:00.000-07:00</published><updated>2011-03-31T20:11:21.317-07:00</updated><title type='text'>Concerning Monetary Systems</title><content type='html'>It all started with a math-geek desire to come up with an "imaginary" dollar-- a unit of currency&amp;nbsp; equal to the square root of debt. Unfortunately, googling "imaginary dollar" just gets a load of people saying that money is already imaginary. :) So, I eventually googled "hypercomplex dollar" (the hypercomplex numbers go beyond imaginary numbers, introducing things like hyperbolic multiplication, et cetera). I got this:&lt;br /&gt;&lt;br /&gt;http://www.halfbakery.com/lr/idea/Hypercomplex_20money&lt;br /&gt;&lt;br /&gt;Sadly, this and the other proposals it links to are not &lt;i&gt;really&lt;/i&gt; attempts to construct imaginary money: they just use the fact that the imaginary numbers (and hypercomplex numbers) are orthogonal to real money; they don't attempt to give an answer to what the square root of a monetary amount might be.&lt;br /&gt;&lt;br /&gt;Anyway, these proposals referenced the LETS money system, which is interesting...&lt;br /&gt;&lt;br /&gt;http://www.gmlets.u-net.com/home.html&lt;br /&gt;&lt;br /&gt;The basic idea: everyone starts out with 0 money, but is free to spend anyway. This puts them in the negative, but negative is referred to as "commitment" rather than debt, and is not supposed to be a bad thing. There is always as much negative as positive, so being in the negative is normal and does not cause a problem (since you're still free to spend even more). The justification: scarcity of money causes a lot of problems. Scarcity of dollars stops local economies, even if there is really enough goods and labor to go around. This is why more US dollars are printed then go out of service each year: to fight the scarcity of money. Unfortunately, that leads to inflation, which has its own problems... LETS tries to fix this by letting anyone issue money whenever they like, creating "commitment" in doing so. This means there doesn't have to be money around in order for people to exchange commitment for goods and labor, so that money scarcity can never be a problem. &lt;br /&gt;&lt;br /&gt;This basically relies on individuals to honor their commitment. I think this is a big part of why LETS is supposed to remain a &lt;i&gt;local&lt;/i&gt; currency: it relies on the goodness of people, so if it got too big it would run into troublemakers.&lt;br /&gt;&lt;br /&gt;My take on the future of money is that we are headed for a credit-based economy (in which credit is more important than raw money). We're already almost there! However, there are some serious flaws with the current credit economy... credit cards are a Bad Plan. Personally, I think things like &lt;a href="http://www.kickstarter.com/"&gt;kickstarter&lt;/a&gt; are a better sign of what's coming... people investing in people. The problem with banks and credit cards is that all the investing is being done by a few monolithic entities, whereas the "little guy" just gets the debt. The little guy needs to be able to &lt;i&gt;give&lt;/i&gt; credit as readily as &lt;i&gt;take&lt;/i&gt; it.&lt;br /&gt;&lt;br /&gt;The idea of "debt" is owing money to one person. The LETS idea of "commitment" is to instead owe to a &lt;i&gt;community&lt;/i&gt;. The problem is that this is too easy to take advantage of it becomes too mainstream (ie, if people don't feel a personal commitment to the whole community). The currency would just inflate.&lt;br /&gt;&lt;br /&gt;To address this, I propose an idea of &lt;i&gt;financial backers&lt;/i&gt;. Individuals back each other rather than being backed by a big bank. If a group of people back each other with no reservation, then they should automatically&amp;nbsp; form something like a LETS network: they can spend as many credits with each other as they want, making "commitment" with the group as they do so. When they get money, either within the group or from outside, the commitment gets payed off.&lt;br /&gt;&lt;br /&gt;More commonly, people could offer each other limited lines of credit. The amount I offer to you would depend on a combination of how much I trust you, how much I can afford personally, and how much credit I have from other sources.&lt;br /&gt;&lt;br /&gt;Obviously more details need to be worked out here. One principle from LETS is that no interest should be charged on commitments. It's not obvious whether that's preferable in the system I'm thinking about. It promotes a positive and friendly outlook (as does the option of offering an unlimited credit line to people you trust, rather than worrying about placing a $ cap). However, it also seems to make sense to pay people back for their hospitality... perhaps mutual credit agreements are better than interest for that purpose, though.&lt;br /&gt;&lt;br /&gt;One conspicuous advantage of this scheme (in contrast to the currently mainstream big-bank approach!) is that the distributed nature of the system prevents large failures. An uncareful collective of individuals might fall together, but the impact would be limited.&lt;br /&gt;&lt;br /&gt;EDIT: This proposal &lt;a href="http://ripple-project.org/"&gt;has already been thought of and is being implemented&lt;/a&gt;! If you extend me a little credit in ripplePay I'll extend you a little back. :) I may write a post soon on the differences between what I was imagining and what these people are doing.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/266928470727008321-4244866532146623788?l=lo-tho.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lo-tho.blogspot.com/feeds/4244866532146623788/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://lo-tho.blogspot.com/2010/07/concerning-monetary-systems.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/4244866532146623788'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/4244866532146623788'/><link rel='alternate' type='text/html' href='http://lo-tho.blogspot.com/2010/07/concerning-monetary-systems.html' title='Concerning Monetary Systems'/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-266928470727008321.post-7347818357670833851</id><published>2010-06-11T20:58:00.000-07:00</published><updated>2010-06-11T20:58:54.600-07:00</updated><title type='text'>Goal Systems</title><content type='html'>Continues &lt;a href="http://lo-tho.blogspot.com/2010/06/procedural-logic.html"&gt;this&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;It  seems fairly clear that backwards-chaining should be seen as a result  of goal-oriented behaviour, whereas forward-chaining should be more a  result of the reactive reasoning layer. This thought arises from several  considerations (including efficiency), but mainly from the idea that  backtracking is simply an attempt to achieve a goal by trying different  possible (sets of) subgoals in turn.&lt;br /&gt;&lt;br /&gt;A simple  production-rules-vs-logic-programming view would see the issue as two  different ways of interpreting the conditional from propositional logic;  A-&amp;gt;B either means "if you have A, conclude B" (if you're a  production rules person) or "if you want B, try A" (if you're a logic  programming person). I'm denying that that is what's going on. A-&amp;gt;B  in the knowledge base means just what it means in the declarative logic,  and doesn't by itself trigger any inferences. For inferences to be  triggered, procedural knowledge must be present. Procedural knowledge  can always be put in a series of if-then type statements which are  executed by forward-chaining; both "if you have A, conclude B" and "if  you want B, try A" are just examples of this.&lt;br /&gt;&lt;br /&gt;Now,  for probabilistic-relational systems, goal-oriented reasoning is set  within the framework of bayesian utility theory. The mathematical theory  is simple and elegant (although there have been refinements over time,  such as causal decision theory, timeless decision theory, updateless  decision theory); but like pure probabilistic reasoning, it isn't  immediately obviously how to efficiently implement/approximate. Much of  the relevant research for this comes from the domain of reinforcement  learning; also, an algorithm that's generating a lot of research right  now is monte carlo tree search. It seems reasonable to follow these  threads of research for a goal-system implementation.&lt;br /&gt;&lt;br /&gt;For  me, it seems natural to replace the on/off subgoals &amp;amp; backtracking  which appear in crisp systems with a more continuous goal structure.  This is based on the analogy to the way the truth values become  continuous in these systems, which my be a bit hasty. Still, here is my  rough outline:&lt;br /&gt;&lt;br /&gt;I've been considering for some time a system which allocates time to different reasoning processes based on a currency of "effort". Effort could take two forms: a longer-term effort would represent percentage of processing time, while a shorter-term effort would represent amounts of time. There would be long-ish cycles in which every goal that has long-term effort attached to it gets a corresponding amount of short-term effort assigned to it. A goal can then cause subgoals to be spawned which it assigns some of its effort to.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;This may or may not be a good idea. :) The shorter-term effort seems like a natural concept; it's been invented independently by two different people I've talked with (Ben Goertzel's group, and Russel Wallace). It seems useful for complex combinations of approximation algorithms that can be given more or less time to give better or worse answers, especially if they may employ each other or themselves in a loopy manner. Programs based on this sort of effort would be guaranteed to terminate in whatever amount of time they are given (though, perhaps not with a very good answer.)&lt;br /&gt;&lt;br /&gt;The longer-term type of effort is not so obviously useful; hard to say.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/266928470727008321-7347818357670833851?l=lo-tho.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lo-tho.blogspot.com/feeds/7347818357670833851/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://lo-tho.blogspot.com/2010/06/goal-systems.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/7347818357670833851'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/7347818357670833851'/><link rel='alternate' type='text/html' href='http://lo-tho.blogspot.com/2010/06/goal-systems.html' title='Goal Systems'/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-266928470727008321.post-8016745202423332389</id><published>2010-06-09T08:28:00.000-07:00</published><updated>2010-06-09T08:28:35.734-07:00</updated><title type='text'>Procedural Logic</title><content type='html'>I've decided to start using the term "procedural logic" for what I called action logic in the &lt;a href="http://lo-tho.blogspot.com/2010/05/action-logic.html"&gt;previous post on this topic&lt;/a&gt;. This better reflects the desire to merge the procedural and declarative; also, it steers clear of logics already named "action logic."&lt;br /&gt;&lt;br /&gt;To pick up roughly where I left off...&lt;br /&gt;&lt;br /&gt;Tactical theorem provers such as HOL Light and Coq are the modern-day epitome of procedural logic, in some ways. They work by the simple lack-of-mechanism which I gave as the minimal procedural logic in the previous post: they are simply programmable theorem provers. "Tactics" (aka "strategies") are created, which are simply programs for guiding inference; tactics can use each other, so that in a weak sense they are like goals and subgoals. What's really nice about them is that they cannot be made to return an incorrect result, no matter how wild the tactic. What they lack compared to SOAR and PLANNER is:&lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Reflexive inferences: SOAR automatically forward-chains for all rules in its KB. In a similar way, Prolog automatically backwards-chains for all programmed rules. PLANNER was designed to do both, with a declaration of which way a rule should be used upon its creation. (SOAR does something similar with a little work.) ACL2 probably has the most advanced facilities, from my cursory examination; there are many categories of possible reflexive inferences for different sorts of theorems, and the system has intelligent defaults which the user can override. However, the reflexive inferences are still clearly too simplistic sometimes; for example, the user can create an infinite loop just by proving both B = A and A = B... ACL2 will happily swap A for B indefinitely the next time either one appears in a statement.&lt;/li&gt;&lt;li&gt;Truth Maintenance: Some facts are of the sort that can change. Under strict logic, we should index them by time to account for this, but it seems necessary for efficiency to ignore this and just change the facts directly. If that's the strategy, then we have a truth maintenance problem: we may need to update our entire KB to reflect the change, withdrawing any statements we proved based on the now-false information. In SOAR, this is reflected in the distinction between o-support and i-support: o-support represents that the rule is intended as making a change to the KB in firing, which cannot be undone except by another o-rule, whereas i-support represents that the rule's consequences should be undone immediately when the premises become false (thanks to an o-rule changing something). This is probably the most advanced form of this sort of distinction. PLANNER also addressed the issue, however. (That's why I called SOAR a "descendant of planner" in my previous post on this topic; since then, though, I've asked Paul Rosenbloom about it. He indicated that PLANNER did not actually influence SOAR in any significant way, so it looks like SOAR came up with the idea quite independently.)&lt;/li&gt;&lt;/ol&gt;So, the question becomes: what should a probabilistic relational system do about all this?&lt;br /&gt;&lt;br /&gt;Paul Rosenbloom's probabilistic-relational rethink of SOAR (see his &lt;a href="http://pollux.usc.edu/%7Erosenblo/pubs.html"&gt;publication&lt;/a&gt; page) takes the position that #2 can emerge in a simple way from the sum-product belief propagation algorithm; basically, this amounts to re-estimating all the weights in working memory whenever a change is made by a rule in long-term memory. The system also deals with time in additional ways. As for #1, there is some divergence from SOAR's system, but nothing as sophisticated as ACL2. Paul Rosenbloom's proposal does not have a reason to focus on proofs via mathematical induction in the way ACL2 does, so it would not benefit from a translated version of ACL2's guidance system... perhaps it's reasonable to say that much of the sophisticated stuff in ACL2 is fairly special-purpose for its domain of application.&lt;br /&gt;&lt;br /&gt;Having sum-product as a basic dynamic seems like a good idea, but at the same time, I'm thinking of a system which can reason about inference guidance even for that. For example, it seems reasonable to (A) weigh the product-sum algorithm's attention to specific areas of the belief network based on multiple factors, including which variables are currently undergoing the most change, and which variables are most critical for the goals; (B) augment the product-sum algorithm with more accurate algorithms in areas of the belief network where accurate estimation is critical. In the Rosenbloom system, (A) would have to be an addition to the basic level, but he has discussed (B) as something which might be implemented on top of the basic level using rules. The second situation is preferable, but at the same time, any "basic level" of reactive inference/decision will have some built-in assumptions. There is a way around this, however: dynamic replacement of inference strategies in the manner of &lt;a href="http://arxiv.org/pdf/cs/0206022"&gt;Hutter's Algorithm&lt;/a&gt;. Hutter's algorithm is tremendously inefficient, so it would be nice to find a small-scale, incremental way of employing the same trick. (Also, Hutter's algorithm would need to add &lt;i&gt;probabilistic&lt;/i&gt; reasoning about algorithm speed and correctness, to satisfy the desire for a fully probabilistic-relational system.)&lt;br /&gt;&lt;ul&gt;&lt;/ul&gt;To summarise that disorganised paragraph: I like the idea of using sum-product as the reactive-level reasoner, but it may not be what I'm after for procedural logic. As a disclaimer: worrying about abstract asymptotic universality stuff is fun, but it is not necessarily relevant to practical implementations. Sometimes I am more of a mathematician than a computer scientist. However, there are some recent indications that the theory of universal intelligence might actually guide practical implementations: a working, practically applicable, asymptotically universal &lt;a href="http://users.cecs.anu.edu.au/%7Ekee/mcaixi_ctw.pdf"&gt;approximation of AIXI&lt;/a&gt;, and a Levin-search-based &lt;a href="http://www.atlantis-press.com/php/download_paper.php?id=1925"&gt;automatic programmer&lt;/a&gt; that works on some toy problems.&lt;br /&gt;&lt;br /&gt;In any case! The main point is, in probabilistic relational systems, truth maintenance gets replaced by some form of belief revision plus some way of dealing with time. (In &lt;a href="http://www.opencog.org/wiki/The_Open_Cognition_Project"&gt;OpenCog&lt;/a&gt;, I understand that part of the strategy will be to have the truth values of time-sensitive statements decay gradually as they become more probably out-of-date.)&lt;br /&gt;&lt;br /&gt;The other issue to address is how a goal system for a probabilistic-relational version should work, but, I guess that's too big a topic for the moment... though honestly I started writing this post &lt;i&gt;to&lt;/i&gt; talk about that :) Such is blogging.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/266928470727008321-8016745202423332389?l=lo-tho.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lo-tho.blogspot.com/feeds/8016745202423332389/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://lo-tho.blogspot.com/2010/06/procedural-logic.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/8016745202423332389'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/8016745202423332389'/><link rel='alternate' type='text/html' href='http://lo-tho.blogspot.com/2010/06/procedural-logic.html' title='Procedural Logic'/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-266928470727008321.post-7025169592789702861</id><published>2010-05-18T20:11:00.000-07:00</published><updated>2010-06-06T11:48:37.221-07:00</updated><title type='text'>SOAR workshop so far</title><content type='html'>Daniel and I have just come back from the pre-workshop dinner. Our ride (another attendee) had a delayed flight, and we had to call a taxi; we ended up being about 45 minutes late for dinner. Daniel and I ended up getting seats at a separate table, so we didn't talk to anyone right away; eventually, though, we got some good discussions.&lt;br /&gt;&lt;br /&gt;My primary thesis has always been the importance of expressive power in AI. This has been a reaction to the overwhelming majority of AI research being done with propositional methods.&amp;nbsp; However, this method tends to fall on deaf ears for several reasons.&lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;If I don't phrase myself carefully, it sounds like I'm arguing the anti-AI position that many people abuse Gödel's theorem to argue. This is thanks, for example, to my use of words like "uncomputability" (and if I invoke Gödel's theorem, of course, that just makes it worse). If I say "system X can't represent uncomputable mathematical concepts, but humans can," it really makes people think that I mean "computers can't handle uncomputable mathematical concepts, but humans can." I don't mean that. Rather, I want to soundly answer the anti-AI position via a formal theory of which systems are capable of climbing the Tarski hierarchy on their own (and how they do it).&lt;/li&gt;&lt;li&gt;John Laird and the other fellow in the front seat when he was giving me a ride back to the hotel (sorry, don't remember the name! EDIT: Paul Rosenbloom) pointed out that worrying about expressive completeness is far less common than trying to find less-expressive subsets of existing systems which are more efficiently manipulated. Daniel put it another way: one shouldn't just worry about what a system can do &lt;i&gt;in principle&lt;/i&gt; but about what it can do in reasonable amounts of time. Systems which are "more expressive" might be able to do far less in practice. My response to this is just to say that I prefer systems which use heuristics to stay in the space of efficient reasoning, rather than reduced expressiveness... this is a weak-sounding point, though I do think it is the right response...&lt;/li&gt;&lt;li&gt;If I'm already talking to someone who believes that expressive completeness is an important issue, then the remainder of my pet theory consists of ways of generalizing from a propositional system to more expressive models. This typically becomes a situation of singing to the quire, because the person will already be past propositional models and into at least first-order. All I have left to argue is the seemingly reversed position that most reasoning should actually take place at the sub-first-order level (say, 1/2 n-tuple markov models, 1/4 context-free, 1/8 context-sensitive, 1/16 first-order, 1/32 2nd-order...). This is related to the response for the 2nd point.&lt;/li&gt;&lt;li&gt;My project is really quite an abstract one, because at present my best guess is that so long as a system is doing probabilistic relational learning (ie, at least 1st order), and so long as it approximates Solomonoff induction in its ability to discover hidden relations in data, then it satisfies the requirements to climb the Tarski hierarchy. Because of this, I can't argue that there is a pitfall of expressive &lt;i&gt;in&lt;/i&gt;completeness waiting to trap anyone who isn't careful. So, it seems I should only be arguing for those basic requirements rather than arguing that a general theory telling us how to climb the Tarski hierarchy must be found. If I can argue for those points on separate grounds, a general theory saying that those are sufficient is more a nice factoid for those worried about the foundations of mathematics than a guiding principle for AI. (Though, there's still the possibility that the required structure for a system is significantly more specific than Solomonoff induction, in which case the theory may have definite value.)&lt;/li&gt;&lt;/ol&gt;So, should I give up what has been my primary thesis for 4 years? The banner "Expressive completeness is the central problem!" may not be the best way of organizing and presenting my ideas at present... in any case, it's clear that I need to think about my arguments more carefully.&lt;br /&gt;&lt;ol&gt;&lt;/ol&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/266928470727008321-7025169592789702861?l=lo-tho.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lo-tho.blogspot.com/feeds/7025169592789702861/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://lo-tho.blogspot.com/2010/05/soar-workshop-so-far.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/7025169592789702861'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/7025169592789702861'/><link rel='alternate' type='text/html' href='http://lo-tho.blogspot.com/2010/05/soar-workshop-so-far.html' title='SOAR workshop so far'/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-266928470727008321.post-619208365029595282</id><published>2010-05-07T23:10:00.000-07:00</published><updated>2010-05-07T23:10:43.740-07:00</updated><title type='text'>Action Logic</title><content type='html'>Continuing a thought in &lt;a href="http://lo-tho.blogspot.com/2010/04/more-on-self-modifying-logic.html"&gt;this&lt;/a&gt; post.&lt;br /&gt;&lt;br /&gt;A normal axiomatic logic pins down which inferences are permissible, but makes no comment on how these inferences are to be made. For real implementations, we've come up with a plethora of inference strategies-- however, the axiomatic system is always treated as separate from the inference strategies. My idea of action logic is to merge the two, hopefully in order to do nifty things like letting the system reason about its own inference strategies in a smooth way.&lt;br /&gt;&lt;br /&gt;In other words, what we want is to nicely integrate logic and programming. &lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Several attempts to do this already exist. One example is Prolog. Prolog does not, however, satisfy the desire to allow deductive reasoning to determine flow of inference; Prolog just performs a backtracking resolution algorithm on horn clauses, paying some heed to meta-logical control flow statements that can be inserted.&lt;br /&gt;&lt;br /&gt;However, I think we can take some inspiration from a predecessor of Prolog, known as Planner. Planner worked through a backtracking mechanism similar to that of Prolog, but more expressive. It used a separate notion of "goal" and "fact" to drive reasoning: if a goal is present, then the system starts reasoning about how to achieve that goal. This can involve the creation of subgoals which drive more reasoning, the execution of programs, and performance of inference steps. &lt;br /&gt;&lt;br /&gt;One &lt;i&gt;could&lt;/i&gt; argue that Prolog makes this distinction, but it is much less explicit: since programs and statements are the same thing, the creation of subgoals always just corresponds to the conditions in an if-then. (In PLANNER one can specify arbitrary advice for what subgoal to create at a given step; it is not so strongly tied to the form of the logical assertions in the knowledge base.)&lt;br /&gt;&lt;br /&gt;PLANNER also has the feature that facts can be removed from the knowledge base when they become false, ie, the system does not need to index facts to time in order to model facts changing; this is important for efficiency, though it is less "logically pure" than the alternative.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://hdl.handle.net/1721.1/5833"&gt;This&lt;/a&gt; is a good overview of Planner for the interested. More recent systems such as &lt;a href="http://sitemaker.umich.edu/soar/home"&gt;SOAR&lt;/a&gt;, &lt;a href="http://www.cliki.net/Screamer"&gt;SCREAMER&lt;/a&gt;, &lt;a href="http://userweb.cs.utexas.edu/%7Emoore/acl2/"&gt;ACL2&lt;/a&gt;, et cetera have many of the same features; I am really picking out PLANNER mainly for its historical significance.&lt;br /&gt;&lt;br /&gt;In any case. I said that I was trying to &lt;i&gt;merge&lt;/i&gt; declarative and procedural, yet I am praising PLANNER's explicit goal system over Prolog's implicit way of dealing with goals. Why? Because it is more powerful: we can provide more inference guidance in PLANNER than in Prolog.&lt;br /&gt;&lt;br /&gt;In any case, I am not saying that Planner is a sufficient action logic, either! In fact, I'd say, let's question the basic mechanism which it shares with Prolog: if the system has goal G, and knows a statement of the form "S implies G," then it will automatically make a subgoal S. (For the curious: &lt;i&gt;Both PLANNER and Prolog will select out of several statements "X implies G", try that X, give up if that subgoal turns out infeasible, and try the next. This technique is applied recursively, creating sub-sub goals, et cetera. In Prolog this is the only mechanism at play; in PLANNER there are more conditions to be met for this to happen, there are other mechanisms at work, and the programmer can add arbitrary mechanisms.&lt;/i&gt;)&lt;br /&gt;&lt;br /&gt;What happens when we vary this mechanism? Well, there are a lot of options. We could create a reinforcement-type agent which attempted to learn what inferences to draw and what subgoals to create and when, getting rewarded whenever it meets a goal specified by the programmer. We could create a brute-force system which keeps proving theorems until it happens upon one of the form "P implies G" where P is a fully specified sequence of actions, rather than just a subgoal-- the system then does P. We could produce a system like &lt;a href="ftp://ftp.idsia.ch/pub/techrep/IDSIA-16-00.ps.gz"&gt;Hutter's asymptotically optimal algorithm&lt;/a&gt; by creating a mechanism which searches for plans of action, replacing the current plan when the system finds a proof that switching to a different plan will result in a better outcome.&lt;br /&gt;&lt;br /&gt;The weakest choice is to just require that if the goal "do X" is asserted, and X is a sequence of actions, the system does them immediately. This just means that the system is programmable. (X can be an arbitrary program because it can include branching goto type statements.) Any other choice of system can be modeled in this one by telling it to behave in that way.&lt;br /&gt;&lt;br /&gt;Stronger choices will provide more built-in useful behavior, but will also have more built-in assumptions.&lt;br /&gt;&lt;br /&gt;My desire is to create an abstract framework, not something that pins things down to a specific implementation. So, it seems like what I'm looking for is an abstract "should-ness" concerning what inferences to draw when, and what subgoals to make when.&lt;br /&gt;&lt;br /&gt;Much of the logic discussed under the title of action logic &lt;a href="http://plato.stanford.edu/entries/logic-action/"&gt;in the Stanford  Encyclopedia&lt;/a&gt; is somewhat relevant here, though the logics there are definitely not action logics in the sense I intend: they are just normal axiomatic systems, which must rely on outside inference procedures.&lt;br /&gt;&lt;br /&gt;In general, my approach to AI is very much in the line of probabilistic relational methods: I advocate a move to probabilistic reasoning where just boolean reasoning is present, and to relational methods where just propositional methods are present. In this case, that means that what I see as the logical next step is to research probabilistic generalizations of PLANNER... it seems this would force many of the simplistic methods employed in PLANNER to be "broken open," so to speak, getting out the chaff and determining what can be salvaged and improved for the probabilistic setting.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/266928470727008321-619208365029595282?l=lo-tho.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lo-tho.blogspot.com/feeds/619208365029595282/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://lo-tho.blogspot.com/2010/05/action-logic.html#comment-form' title='18 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/619208365029595282'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/619208365029595282'/><link rel='alternate' type='text/html' href='http://lo-tho.blogspot.com/2010/05/action-logic.html' title='Action Logic'/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>18</thr:total></entry><entry><id>tag:blogger.com,1999:blog-266928470727008321.post-3711190307331889926</id><published>2010-04-14T10:46:00.000-07:00</published><updated>2010-04-14T10:46:50.361-07:00</updated><title type='text'>Postmodernism</title><content type='html'>I've been thinking for a while about why postmodernism might be a tempting intellectual position. There are certain logically sound logical arguments that support particular assertions associated with postmodernism, as well as certain practical issues which might make it tempting. I will try to illustrate my position against postmodernism by giving the best arguments I can muster &lt;i&gt;for &lt;/i&gt;it-- the intention is to show why we can get this far, but no further.&lt;br /&gt;&lt;br /&gt;I won't be too surprised if someone reads this and says "but postmodernism &lt;i&gt;doesn't&lt;/i&gt; try to go any further than that with the argument,"or further, "that's not what postmodernism is at all;" these arguments are based on my impression of postmodernist thought, not based on a formal education in the subject.&lt;br /&gt;&lt;br /&gt;Since this thing would be tl/dr (or worse, tl/dw) if I did it as one post, I'll just post an outline for now; the points below will become links as I write individual posts. (I may also come back and add/delete/edit points, of course.)&lt;br /&gt;&lt;br /&gt;&lt;i&gt;Practical Reasons &lt;/i&gt;&lt;br /&gt;&lt;br /&gt;-making room for others to doubt your beliefs&lt;br /&gt;-the intellectual proliferation of hypotheticals&lt;br /&gt;-Flexibility of language and definitions; anti-prescriptivism&lt;br /&gt;-disagreements are often based on matters of language (differing definitions) rather than matters of fact&lt;br /&gt;-such things as tables, chairs, etc don't exist (strictly speaking)&lt;br /&gt;-Scientific theories are approximations&lt;br /&gt;&lt;br /&gt;&lt;i&gt;Logical Reasons&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;-loeb's theorem&lt;br /&gt;-algorithmic complexity is relative&lt;br /&gt;-We can always "interpret" talk in any logical system or system of axioms as just hypothetical first-order talk (by throwing out the naturalness constraint on interpretations)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/266928470727008321-3711190307331889926?l=lo-tho.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lo-tho.blogspot.com/feeds/3711190307331889926/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://lo-tho.blogspot.com/2010/04/postmodernism.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/3711190307331889926'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/3711190307331889926'/><link rel='alternate' type='text/html' href='http://lo-tho.blogspot.com/2010/04/postmodernism.html' title='Postmodernism'/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-266928470727008321.post-4566255075500433048</id><published>2010-04-12T10:52:00.000-07:00</published><updated>2010-04-12T10:52:34.350-07:00</updated><title type='text'>More on Self-Modifying Logic</title><content type='html'>This is a follow-up to &lt;a href="http://lo-tho.blogspot.com/2010/03/self-modifying-logic.html"&gt;this post&lt;/a&gt;, and will simply attempt to fill in some details.&lt;br /&gt;&lt;br /&gt;In order to properly refer to a system as "self modifying logic," one would want the logic to modify itself, not merely to be a system of logic which gets modified by an overseer-type system... of course. I did not make it clear in the previous post how this would be accomplished.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;It seems to me that a prerequisite would be what one might call an &lt;i&gt;action logic&lt;/i&gt;: a logic which isn't just a way of representing and manipulating declarative knowledge, but which is also capable of making and executing decisions, including decisions about its own reasoning procedures.&lt;br /&gt;&lt;br /&gt;Lacking a full-fledged action logic, I can only offer a first approximation (loosely inspired by Markus Hutter's &lt;a href="http://www.hutter1.net/ai/pfastprg.htm"&gt;algorithm for all well-defined problems&lt;/a&gt;): We have a set of axioms describing what kind of outcomes are better and worse, and how our current action policy affect them. We have some standing&amp;nbsp; we have a reasoning system which uses some declarative logic and is proving things in an exhaustive manner (Hutter uses &lt;a href="http://www.idsia.ch/%7Ejuergen/mljssalevin/node4.html"&gt;Levin search&lt;/a&gt;). We keep track of the proofs of action-policy values, and at all times use the best policies currently known of.&lt;br /&gt;&lt;br /&gt;And, of course, the theorem proving method (initially an exhaustive search) should ideally be a part of the policy so that it can be modified. (This becomes more like &lt;a href="http://www.idsia.ch/%7Ejuergen/goedelmachine.html"&gt;Godel machines&lt;/a&gt;, which I'm sure I link to often enough.)&lt;br /&gt;&lt;br /&gt;Now, to make this usable, we've got to be selecting policies based on &lt;i&gt;estimated value&lt;/i&gt; rather than provable values; thus the logic should include probabilistic reasoning about the policies based on limited reasoning and limited testing of the methods. (ie, the probabilistic reasoning should be broad enough to admit such spot-tests as &lt;i&gt;evidence&lt;/i&gt;, something that requires a solution to the problem of &lt;a href="http://lesswrong.com/lw/179/counterfactual_mugging_and_logical_uncertainty/"&gt;logical uncertainty&lt;/a&gt;.)&lt;br /&gt;&lt;br /&gt;So, obviously, many details need filled in here.&lt;br /&gt;&lt;br /&gt;In any case, one of the available actions will be to &lt;i&gt;change the logic itself&lt;/i&gt;, substituting a more useful one.&lt;br /&gt;&lt;br /&gt;The utility of such changes should in principle be derived from the basic utility function (that is, our preference ordering on possible outcomes); however, for the discussion in the previous post, for analysis of possible behaviors of such systems, and possibly for practical implementations, a notion of goodness that applies directly to the logic itself is helpful.&lt;br /&gt;&lt;br /&gt;So, questions:&lt;br /&gt;&lt;br /&gt;--What might a full-featured "action logic" look like?&lt;br /&gt;--Under what conditions will the system tend to add new mathematical truths to the set of axioms? Under what conditions will the system tend to add truth predicates, or in any case, metalanguages?&lt;br /&gt;--Under what conditions will the system satisfy the backwards-compatibility requirement that I mentioned last time?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/266928470727008321-4566255075500433048?l=lo-tho.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lo-tho.blogspot.com/feeds/4566255075500433048/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://lo-tho.blogspot.com/2010/04/more-on-self-modifying-logic.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/4566255075500433048'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/4566255075500433048'/><link rel='alternate' type='text/html' href='http://lo-tho.blogspot.com/2010/04/more-on-self-modifying-logic.html' title='More on Self-Modifying Logic'/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-266928470727008321.post-8507507396737723835</id><published>2010-04-01T19:11:00.000-07:00</published><updated>2010-04-10T13:11:24.745-07:00</updated><title type='text'>More on Curiosity</title><content type='html'>Follow up to &lt;a href="http://lo-tho.blogspot.com/2010/03/formalized-curiosity.html"&gt;this&lt;/a&gt;. &lt;br /&gt;&lt;br /&gt;I took a look at &lt;a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.149.6826&amp;amp;rep=rep1&amp;amp;type=pdf"&gt;this paper&lt;/a&gt; on curiosity, which includes a good review on more recent work than what I had read about when I wrote the previous post. One nice insight that has been made is that it is useful to split up the value function based on &lt;i&gt;actual&lt;/i&gt; reward from the value function based on "exploration bonus". These can then added together to make the final value. One can still think of the exploration bonus in terms of optimism, but another way to think of it is that the system is really just trying to calculate the benefit of exploring a particular option (that is, the &lt;i&gt;learning&lt;/i&gt; benefit), and adding that to the direct benefit of choosing the route.&lt;br /&gt;&lt;br /&gt;In this account, the confidence-interval method mentioned in the last post is seen as a method of estimating the learning benifit of a state as the distance between the most probable average utility and the top of the X%-confidence range for the average utility.&lt;br /&gt;&lt;br /&gt;A related estimate might be the expected information gain...&lt;br /&gt;&lt;br /&gt;It's not yet clear to me how to make an estimate that approximates the &lt;i&gt;true&lt;/i&gt; benefit in the limmit.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/266928470727008321-8507507396737723835?l=lo-tho.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lo-tho.blogspot.com/feeds/8507507396737723835/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://lo-tho.blogspot.com/2010/04/more-on-curiosity.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/8507507396737723835'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/8507507396737723835'/><link rel='alternate' type='text/html' href='http://lo-tho.blogspot.com/2010/04/more-on-curiosity.html' title='More on Curiosity'/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-266928470727008321.post-1342729653343475312</id><published>2010-03-31T17:44:00.000-07:00</published><updated>2010-03-31T17:50:10.782-07:00</updated><title type='text'>Formalized Curiosity</title><content type='html'>I've been reading up on reinforcement learning lately. An interesting question, perhaps the &lt;i&gt;first&lt;/i&gt; question, is how to balance exploration with exploitation.&lt;br /&gt;&lt;br /&gt;The basic idea is this: if we are faced with a number of options which have unknown probabilistic payoffs, then we can either choose the option that has had the best average payoff so far (ie, &lt;i&gt;exploit&lt;/i&gt; our current knowledge), or choose a different option in order to gain more evidence about how good that option is (ie, &lt;i&gt;explore&lt;/i&gt; the avaliable options).&lt;br /&gt;&lt;br /&gt;In principle, we can compute the optimal choice at a given point in a straightforward way, but this requires looking at the entire future (up to some cutoff we choose) and is highly inefficient. It seems to be critical to actual implementation that we approximate the utility of exploration in a more local way, making exploration an essential element of our decision procedure rather than a derived property that comes from planning far enough ahead. &lt;br /&gt;&lt;br /&gt;Unfortunately, the simplistic strategy of choosing to explore totally at random some fraction of the time, and exploit the rest of the time, seems all-too-common. This fails to take into accound factors such as the current risk of exploration, weighing exploration to favor more promising options, et cetera.&lt;br /&gt;&lt;br /&gt;An interesting class of algorithms for improving upon this is &lt;i&gt;optimistic&lt;/i&gt; greedy methods: act as if you're exploiting, but use optimistic estimates. It looks like there are several ways of fleshing this out; perhaps the simplest is to come up with an interval within which the utility of an option falls with probability X (say, 95% confidence). Our "actual estimate" for the utility might be in the middle of the range, but we can act as if we think the top value is the right one in order to encourage exploration: if an option has not been explored very well, it will have a broad range, so that the top of the range may be higher than the actual current-best-bet even if the middle is pretty far below.&lt;br /&gt;&lt;br /&gt;This &lt;i&gt;can&lt;/i&gt; lead to too little exploration: a string of bad luck associated with an option which is in actuality good can push even the high estimate of the payoff to below the actual payoff of some other option, so that it will never be explored again. This becomes less and less likely the more optimistic we are (ie, the wider our con interval), but it's still a disadvantage.&lt;br /&gt;&lt;br /&gt;One might ask, what intervals should we pick? Well, it seems like it depends how long we are going to be choosing between the particular set of options. If we are only choosing one more time, our intervals should be of width 0-- we should exploit rather than explore. Similarly, if we only have a few more times to choose, we should choose with relatively narrow intervals, doing relatively little exploration; if we have a great many left, we should have very broad intervals, exploring a great deal. (Perhaps I should try to work out the optimal intervals here.)&lt;br /&gt;&lt;br /&gt;So, if we have a known finite number of moves to make, we should move our interval range down to 0 via some formula.&lt;br /&gt;&lt;br /&gt;The interval width can be though of as the "curiosity" of the system at a particular point in time. If we're using a 95% confidence interval, then we are saying that for each option we don't try, we want to be at least 95% certain that it is worse than the option we do try: any less confident and we'd prefer to experiment.&lt;br /&gt;&lt;br /&gt;What if we have an infinite number of moves, and want a strategy which will be guaranteed to converge to the optimal strategy at infinity? (Of course, what we really care about is &lt;i&gt;how quickly&lt;/i&gt; we converge, but let's first worry about converging at all.) I said earlier that, with a fixed confidence, this would be impossible: we can always converge to the wrong thing. However, the higher the confidence we require, the less probable this is. So, we can gradually increase our confidence requirement over time to guarantee convergence. This can be thought of as increasing our epistemic standardsover time: if we've only been doing something for a few minutes, we're ok with only being 75% certain that it's better than the alternatives; but if we've been following a strategy for millions of years, we want to be &lt;i&gt;damn sure&lt;/i&gt; by that point that it's the strategy we shoul dactually have been following.&lt;br /&gt;&lt;br /&gt;There are several paradoxical things going on here:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;While for finite-time problems we want to &lt;i&gt;decrease&lt;/i&gt; our curiosity over time, for infinite-time cases we want to &lt;i&gt;increase&lt;/i&gt; it; the infinite-time case does not appear to be the limit of longer and longer finite-time cases.&lt;/li&gt;&lt;li&gt;Higher epistemic standards (ie, requiring greater confidence) correspond to &lt;i&gt;more&lt;/i&gt; optimism, not less, even though one might think of optimism as a sort of epistemic &lt;i&gt;dis&lt;/i&gt;honesty (pretending the expected value is higher than it is). It's a consequence of differing definitions, not a real contradiction, but I think it's curious.&lt;/li&gt;&lt;/ul&gt;&lt;a href="http://www.cs.washington.edu/research/jair/volume4/kaelbling96a-html/node14.html"&gt;source&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Edit-- Questions:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;What are the formulas for optimal curiosity levels&amp;nbsp; in the finite and infinite versions?&lt;/li&gt;&lt;li&gt;Can we make sure that a system using these approximate exploration strategies will still approximate the true optimal strategy as it is given more computation time with which to plan ahead?&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/266928470727008321-1342729653343475312?l=lo-tho.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lo-tho.blogspot.com/feeds/1342729653343475312/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://lo-tho.blogspot.com/2010/03/formalized-curiosity.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/1342729653343475312'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/1342729653343475312'/><link rel='alternate' type='text/html' href='http://lo-tho.blogspot.com/2010/03/formalized-curiosity.html' title='Formalized Curiosity'/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-266928470727008321.post-5492064416861380494</id><published>2010-03-30T21:30:00.000-07:00</published><updated>2010-03-30T21:30:16.266-07:00</updated><title type='text'>Self-Modifying Logic</title><content type='html'>This post is inspired by the &lt;a href="http://multiverseaccordingtoben.blogspot.com/2010/03/golem-eats-chinese-parent-toward-agi.html"&gt;GOLEM architecture&lt;/a&gt; that Ben Goertzel recently wrote a post about. In discussing the architecture on the Singularity list, we came up with the idea of allowing an artificial intelligence to alter its own utility function computation in a controlled way: it evaluates potential new utility calculations by comparing the outputs to the outputs of the original utility function, looking for computations that are essentially the same but faster.&lt;br /&gt;&lt;br /&gt;Now there are some fun (and potentially important) questions about how to deal with uncertainty in the original utility function, but I won't go into these here.&lt;br /&gt;&lt;br /&gt;The discussion made me think about the following sort of system:&lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Start with a set of axioms and rules of inference to start with, and if you like, also a set of statements about the world (perhaps sensory data).&lt;/li&gt;&lt;li&gt;Look for new logics which can derive what the old can derive, but possibly more.&lt;/li&gt;&lt;li&gt;Judge these based on some criteria; in particular, shortness of derivations and simplicity of the new logics both seem sensible.&lt;/li&gt;&lt;/ol&gt;This potentially solves my problem of coming up with a system that iteratively learns higher and higher levels of the Tarski hierarchy. If the system keeps augmenting itself with a new truth predicate, then it will keep increasing the efficiency with which it can derive the truths from the initial system (by a result of Goedel for the type-theoretic hierarchy which if I'm not mistaken will hold similarly for the Tarski hierarchy; see his &lt;i&gt;On the Length of Proofs&lt;/i&gt;). This does &lt;i&gt;not&lt;/i&gt; show that the Tarski hierarchy is the &lt;i&gt;best&lt;/i&gt; way of increasing the power of the system, but I am perfectly OK with that.... what I'd like, however, would be some guarantee that (some canonical axiomatization of) each level of the Tarski hierarchy can at least eventually be &lt;i&gt;interpreted&lt;/i&gt; (ie, as we keep adding further extensions, we interpret more of the Tarski hierarchy, without bound). I do not know how to show this, if it's true.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/266928470727008321-5492064416861380494?l=lo-tho.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lo-tho.blogspot.com/feeds/5492064416861380494/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://lo-tho.blogspot.com/2010/03/self-modifying-logic.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/5492064416861380494'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/5492064416861380494'/><link rel='alternate' type='text/html' href='http://lo-tho.blogspot.com/2010/03/self-modifying-logic.html' title='Self-Modifying Logic'/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-266928470727008321.post-2585073106047945944</id><published>2010-03-22T16:53:00.000-07:00</published><updated>2010-03-23T20:30:25.487-07:00</updated><title type='text'>A Few More Thoughts on Guidance</title><content type='html'>&amp;nbsp;So: if we want to make good inferences, is it enough to look for a programming language which describes good inferences succinctly, and then proceed to carry out actions which have succinct descriptions in that language?&lt;br /&gt;&lt;br /&gt;Specifically, can this strategy be arranged such that it is (in some sense) equivalent with the strategy of searching for the inference algorithm that's got the lowest expected time (or, lowest average time on randomly generated test problems typical of what's been seen so far)?&lt;br /&gt;&lt;br /&gt;If we've got a bunch of trial runs of different inference algorithms, then the absolute best language in the sense that I'm looking for would be one that could state &lt;i&gt;just&lt;/i&gt; the algorithm with the best average time. Less-good languages would be judged by how good the inference algorithms they could state would be. This should make clear part of the divergence from the simple concept of compression: we're not trying to compress all the data, just the "best" data. We've got a range of goodness for our data samples; we want to have very short descriptions for the best cases, but very long ones for the worst cases.&lt;br /&gt;&lt;br /&gt;Another difference is that we don't care in the same way about the simplicity of the result. It might be a good idea to favour simpler languages when we only know how good they are on some data, since shorter ones will probably generalise their behaviour more cleanly to more data. Yet, if we could calculate the actual average runtime on truly average data, we would no longer consider shortness to be a concern; not unless we ran a risk of exhausting the memory. With compression, this is not the case.&lt;br /&gt;&lt;br /&gt;One might still base an effective algorithm on the loose metaphor: perhaps implement the Levin-like search for good inference strategies, and then some compression-like meta-search for strategy programming languages that tend to result in good strategies. However, the basis in compression doesn't appear to be totally rigorous.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/266928470727008321-2585073106047945944?l=lo-tho.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lo-tho.blogspot.com/feeds/2585073106047945944/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://lo-tho.blogspot.com/2010/03/few-more-thoughts-on-guidance.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/2585073106047945944'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/2585073106047945944'/><link rel='alternate' type='text/html' href='http://lo-tho.blogspot.com/2010/03/few-more-thoughts-on-guidance.html' title='A Few More Thoughts on Guidance'/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-266928470727008321.post-755046735302346659</id><published>2010-03-22T10:52:00.000-07:00</published><updated>2010-03-23T20:32:20.573-07:00</updated><title type='text'>Compressive Inference Guidance</title><content type='html'>I just did a post on inference guidance, but I've got a lot more thoughts (which may or may not be worth anything).&lt;br /&gt;&lt;br /&gt;The first one is a curiosity I've had for a while: a good way of predicting is to compress sensory information. Can compressing good inference methods lead to a similarly good method of inference?&lt;br /&gt;&lt;br /&gt;The previous post indicated at least two ways in which compression can be relevant: first, the Levin-like search gives more priority to shorter inference guidance methods, so although it's not actually trying to compress anything, the result will tend to be compressive of the solution (and of its proof). One might expect that the more compressive a solution, the better it will generalize. However, note that what we're worried about is speed, not correctness--- an inference method cannot harm the correctness of a result, only the speed with which it is inferred. (It can delay the inference perpetually, though.) So we want the &lt;i&gt;speed &lt;/i&gt;to generalize. A program that is more compressive of the same data tends to take longer to execute.&lt;br /&gt;&lt;br /&gt;Still, I'd expect &lt;i&gt;some &lt;/i&gt;connection between compressiveness and the generalization of speed. Wouldn't you?&lt;br /&gt;&lt;br /&gt;The second definite connection is that it's useful to compress the problems the system has solved so far, in order to anticipate the coming problems and fine-tune to them. One way of thinking about this is that we look for a good &lt;i&gt;language &lt;/i&gt;to describe the problems so far. Then, to generate new problems, we generate random descriptions in that language.&lt;br /&gt;&lt;br /&gt;It would be nice, to apply this to the search for strategies-- rather than looking for a good strategy, look for a good programming language, in which randomly-generated strategies tend to be good! Remember the best strategies, and use them to continue revising the language and improving it. The question is, does this scheme have normative value? Will it systematically approximate the optimal behavior?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/266928470727008321-755046735302346659?l=lo-tho.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lo-tho.blogspot.com/feeds/755046735302346659/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://lo-tho.blogspot.com/2010/03/compressive-inference-guidance.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/755046735302346659'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/755046735302346659'/><link rel='alternate' type='text/html' href='http://lo-tho.blogspot.com/2010/03/compressive-inference-guidance.html' title='Compressive Inference Guidance'/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-266928470727008321.post-6544443338938630117</id><published>2010-03-22T09:55:00.000-07:00</published><updated>2010-03-22T09:55:20.484-07:00</updated><title type='text'>Guidance of Inference</title><content type='html'>When I first switched to this new blog, I thought that the point was going to be to take it more seriously, writing in a more organised fashion and being more careful to only say things I know to be true.&lt;br /&gt;&lt;br /&gt;The truth is, though, that just makes me not post. A more informal blog would do much better. One advantage of the change is that the blog's title reflects a much broader range of possible topics, such as the previous post that was just some playing with trigonometry.&lt;br /&gt;&lt;br /&gt;So, here are some disorganised thoughts about inference guidance for AI.&lt;br /&gt;&lt;br /&gt;What I want is a normative theory-- something that says "This is how it should be done, in principle." IE, a normative answer. Practical inference systems are of interest, but a guiding principle that tells us what sorts of practical implementations will tend to be good is very important.&lt;br /&gt;&lt;br /&gt;This is somewhat related to my wondering about (normative) systems for handling mathematical uncertainty; the best way to guide inference may also involve, or be involved in, the best way to handle uncertainty about the potential result of an inference.&lt;br /&gt;&lt;br /&gt;The main example of inference I'm thinking of is optimisation or constraint satisfaction.&lt;br /&gt;&lt;br /&gt;Levin search is asymptotically optimal, but does not use the problem description in any way (like genetic programming). Jurgen's "Optimal Ordered Problem Solver" ("Oops," an unfortunate acronym) is optimal in a stronger sense, but still does not use problem descriptions. Hutter's asymptotically optimal algorithm for well-defined problems uses problem descriptions. So, several areas for improvement present themselves. Something that is optimal in Jurgen's sense ("bias-optimal") but which uses problem descriptions rather than searching blindly would be nice. Also, bias-optimality is a strong sort of optimality, but does not capture everything Oops is meant to do: Oops is meant to learn from its experience (hence "ordered problem" solver). Jurgen asserts that it is optimal in a certain sense, but I think there is room for improvement.&lt;br /&gt;&lt;br /&gt;One way of using problem descriptions would be to just hand them to Oops as if it could understand them, and judge its output on that assumption. It would be forced to learn to use the problem descriptions. However, this would be quite inefficient.&lt;br /&gt;&lt;br /&gt;A more interesting way would be to use Levin search at the inference-guidance level. Execute all possible inference-guidance programs in parallel, with the execution time they are given weighted by their simplicity. Solutions would no longer have to be checked, since results would automatically be correct; a proof of correctness would automatically come with the solution.&lt;br /&gt;&lt;br /&gt;Oops could be modified in the same way. (Oops can be thought of as just an efficient implementation of Levin search with added features for learning from previous success.)&lt;br /&gt;&lt;br /&gt;Now, Oops does several things to learn from experience. (I'll use the language of inference guidance, making the assumption that Oops has been modified to work via inference guidance rather than direct solution generation.)&lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;It attempts to apply the so-far-sucessful inference guidance program to the new problem, searching extensions of the program if it's not yet a complete program (ie, if the new situation causes the execution to reach the end of the code where before it didn't); half the attention is directed to this, while the other half searches for a fresh solution (to &lt;i&gt;all&lt;/i&gt; problems, not just to the new one).&lt;/li&gt;&lt;li&gt;Inference guidance programs are also allowed to provide search orderings for continuations, so that it's possible that a partial inference guidance program represents a good search heuristic for inference guidance programs; this is particularly useful in the situation mentioned above, when a program turns out to be incomplete for a new example.&lt;/li&gt;&lt;li&gt;New programs are allowed to copy and modify old ones.&lt;/li&gt;&lt;/ol&gt;To me this appears haphazard and hacky, though it has its good points. What can we do for a more normatively forceful strategy of learning?&lt;br /&gt;&lt;br /&gt;Oops takes execution time into account just by the fact that strategies which are faster will tend to be be found more quickly than slower strategies. This is because if it optimised for execution time directly, then it would quickly overmatch: once it found a solution at all, the fastest-executing program would simply be the one that spit out that answer when given that input. The situation may improve with the amount of experience of the system (since after a large number of instances, a lookup table may no longer be the fastest way of computing the answers). Still, this seems wrong; we want the system to use extra cycles to look for inference strategies that are optimized to be quick on probable questions, not just on previously-experienced questions.&lt;br /&gt;&lt;br /&gt;It seems reasonable, then, to search for a probability distribution which would generate the questions observed so far. Using the current-best estimate, the system should look for the quickest solution not just to the current known problems, but to the expected future problems as well.&lt;br /&gt;&lt;br /&gt;This could be done in several ways. Perhaps potential future instances are generated at random from the distribution, and inference methods are tested against them. Perhaps a strategy more like Hutter's asymptotically optimal one is used: the system might try to &lt;i&gt;prove&lt;/i&gt; which strategies will be better. In any case, the objective will be clear: optimize speed for expected problems.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/266928470727008321-6544443338938630117?l=lo-tho.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lo-tho.blogspot.com/feeds/6544443338938630117/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://lo-tho.blogspot.com/2010/03/guidance-of-inference.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/6544443338938630117'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/6544443338938630117'/><link rel='alternate' type='text/html' href='http://lo-tho.blogspot.com/2010/03/guidance-of-inference.html' title='Guidance of Inference'/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-266928470727008321.post-3926618822298613242</id><published>2010-02-15T22:02:00.000-08:00</published><updated>2010-02-15T22:53:47.774-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='trigonometry'/><title type='text'>Defining Sine</title><content type='html'>I don't know if this is a sighn of things to come, but the first post is not entirely on-topic. :)&lt;br /&gt;&amp;nbsp;I suppose I should link back to my older blog before I begin:&lt;br /&gt;&lt;br /&gt;http://dragonlogic-ai.blogspot.com/&lt;br /&gt;&lt;br /&gt;&amp;nbsp;The move roughly indicates me realizing that I now know too much logic &amp;amp; math to be able to write to a general audience without a fair amount of exposition, and also enough that I want to correct much of what my previous self said.&lt;br /&gt;&amp;nbsp;Anyway.&lt;br /&gt;&amp;nbsp;It's bothered me for some time that the functions sine and cosine are given, in a sense, without definition (in textbooks, that is). We didn't even get a method for calculating these, as with the algebraic functions we'd learned.&amp;nbsp; I know, of course, that they are defined in terms of angles and the unit circle. Angles, however, are not given an algebraic definition either... fortunately, the unit circle is defined:&lt;br /&gt;&lt;br /&gt;x&lt;sup&gt;2&lt;/sup&gt; + y&lt;sup&gt;2&lt;/sup&gt; = 1&lt;br /&gt;&lt;br /&gt;It struck me recently that the derivative of sine&lt;sup&gt;-1&lt;/sup&gt;, which is 1/√(1-x&lt;sup&gt;2&lt;/sup&gt;), looks like it is somehow gotten from the above; specifically, from the equivalent:&lt;br /&gt;&lt;br /&gt;y = ±√(1-x&lt;sup&gt;2&lt;/sup&gt;)&lt;br /&gt;&lt;br /&gt;Rather than look for a proof of this derivative, I decided to see if I could figure it out for myself.&lt;br /&gt;&amp;nbsp;Now, if &lt;i&gt;I&lt;/i&gt; had been given the task of finding the derivative of sine&lt;sup&gt;-1&lt;/sup&gt;, I would have taken the implicit derivative, which gives you the derivative of &lt;i&gt;any&lt;/i&gt; inverse function in terms of the derivative of the original function:&lt;br /&gt;&lt;br /&gt;f(f&lt;sup&gt;-1&lt;/sup&gt;(x)) = x&lt;br /&gt;f(f&lt;sup&gt;-1&lt;/sup&gt;(x))&lt;b&gt;'&lt;/b&gt; = x&lt;b&gt;'&lt;/b&gt;&lt;br /&gt;f&lt;b&gt;'&lt;/b&gt;(f&lt;sup&gt;-1&lt;/sup&gt;(x))f&lt;sup&gt;-1&lt;/sup&gt;&lt;b&gt;'&lt;/b&gt;(x) = 1&lt;br /&gt;f&lt;sup&gt;-1&lt;/sup&gt;&lt;b&gt;'&lt;/b&gt;(x) = 1/ f&lt;b&gt;'&lt;/b&gt;(f&lt;sup&gt;-1&lt;/sup&gt;(x))&lt;br /&gt;&lt;br /&gt;&amp;nbsp;So, for any f, you can find the inverse's derivative if you know the inverse and the derivative.&lt;br /&gt;&amp;nbsp;In particular,&lt;br /&gt;&lt;br /&gt;sine&lt;sup&gt;-1&lt;/sup&gt;&lt;b&gt;'&lt;/b&gt;(x) = 1/cos(sin&lt;sup&gt;-1&lt;/sup&gt;(x))&lt;br /&gt;&lt;br /&gt;Of course, this is not nearly as useful as the derivative given in my textbook (for finding integrals of algebraic functions, that is).&lt;br /&gt;&amp;nbsp;So, the question I ask myself is: &lt;i&gt;How can we define &lt;/i&gt;&lt;i&gt;sine&lt;sup&gt;-1&lt;/sup&gt; &lt;/i&gt;&lt;i&gt;algebraically?&lt;/i&gt;&lt;br /&gt;&amp;nbsp;Intuition: The "angle" is a measurement of the length along the unit circle that must be traveled, starting at (1,0) and going counter-clockwise,&amp;nbsp; to get to a particular &lt;i&gt;x&lt;/i&gt;-location.&lt;br /&gt;&amp;nbsp;From the 2nd form for the equation for the unit circle, it's fairly simple to get a parametric equation system:&lt;br /&gt;&lt;br /&gt;y = ±√t&lt;br /&gt;x = ±√(1-t)&lt;br /&gt;&lt;br /&gt;In order to keep the distance we're moving steady as we increase &lt;i&gt;t&lt;/i&gt;, we want to stick the derivatives of &lt;i&gt;x&lt;/i&gt; and &lt;i&gt;y&lt;/i&gt; in &lt;i&gt;t&lt;/i&gt; into the distance formula:&lt;br /&gt;&lt;br /&gt;&amp;nbsp;√(¼t&lt;sup&gt;-1&lt;/sup&gt; + ¼(1-t)&lt;sup&gt;-1&lt;/sup&gt;)&lt;br /&gt;&lt;br /&gt;Now, the integral of this from t=0 to t=&lt;i&gt;t&lt;sub&gt;1&lt;/sub&gt;&lt;/i&gt; for &lt;i&gt;t&lt;sub&gt;1&lt;/sub&gt;&lt;/i&gt; between 0 and 1 is what we're interested in. This is a measure of the distance we've traveled on the unit circle as t has changed. Denote this integral as a function, D(&lt;i&gt;t&lt;sub&gt;1&lt;/sub&gt;&lt;/i&gt;).&lt;br /&gt;&amp;nbsp;The inverse, D&lt;sup&gt;-1&lt;/sup&gt;(), gives the &lt;i&gt;t&lt;sub&gt;1&lt;/sub&gt;&lt;/i&gt; that we arrive at if we travel a given distance. From this, we can calculate the x-location from the original parametric equation&lt;sub&gt;&lt;i&gt;s: &lt;/i&gt;&lt;/sub&gt;x = ±√(1-&lt;i&gt;t&lt;sub&gt;1&lt;/sub&gt;&lt;/i&gt;)! So, here we have it:&lt;br /&gt;&lt;br /&gt;sin&lt;sup&gt;-1&lt;/sup&gt;(x) = ±√(1-D&lt;sup&gt;-1&lt;/sup&gt;(x))&lt;br /&gt;&lt;br /&gt;We can then proceed to define sin as the inverse of sin&lt;sup&gt;-1&lt;/sup&gt;.&lt;br /&gt;&lt;br /&gt;If anyone can find any mistakes in this, thanks in advance. Rightly speaking, to get an actual purely algebraic definition, I need to go and find that integral. Perhaps another day. :)&lt;br /&gt;&lt;br /&gt;Edit-- Fixed some errors thanks to Daniel Demski.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/266928470727008321-3926618822298613242?l=lo-tho.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://lo-tho.blogspot.com/feeds/3926618822298613242/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://lo-tho.blogspot.com/2010/02/defining-sine.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/3926618822298613242'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/266928470727008321/posts/default/3926618822298613242'/><link rel='alternate' type='text/html' href='http://lo-tho.blogspot.com/2010/02/defining-sine.html' title='Defining Sine'/><author><name>Abram Demski</name><uri>http://www.blogger.com/profile/16505965907380398166</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='24' src='http://1.bp.blogspot.com/-MLqH2-GUpN8/TsRyEhc0PZI/AAAAAAAAAEY/JN8b14qrN54/s220/mini_hood.jpg'/></author><thr:total>0</thr:total></entry></feed>
