Saturday, December 19, 2015

Levels and Levels

A system of levels related to my idea of epistemic/intellectual trust:
  1. Becoming defensive if your idea is attacked. A few questions might be fine, but very many feels like an interrogation. Objections to ideas are taken personally, especially if they occur repeatedly. This is the level most people are at, especially about identity issues like religion. Intellectuals can hurt people without realizing it when they try to engage people on such issues. The defensiveness is often a rational response to an environment in which criticism very often is an attack, and arguments like this are used as dominance moves.
  2. Competitive intellectualism. Like at level 1, debates are battles and arguments are soldiers. However, at level 2 this becomes a friendly competition rather than a problem. Intelligent objections to your ideas are expected and welcome; you may even take trollish positions in order to fish for them. Still, you're trying to win. Pseudo-legalistic concepts like burden of proof may be employed. Contrarianism is encouraged; the more outrageous the belief you can successfully defend, the better. At this level of discourse, scientific thought may be conflated with skepticism. The endpoint of this style of intellectualism can be universal skepticism as a result.
  3. Intellectual honesty. Sorting out possibilities. Exploring all sides of an issue. This can temporarily look a lot like level 2, because taking a devil's-advocate position can be very useful. However, you never want to convey stronger evidence than exists. The very idea of arguing one side and only one side, as in level 2, is crazy -- it would defeat the point. The goal is to understand the other person's thinking, get your own thoughts across, and then try to take both forward by thinking about the issue together. You don't "win" and "lose"; all participants in the discussion are trying to come up with arguments that constrain the set of possibilities, while listing more options within that and evaluating the quality of different options. If a participant in the discussion appears to be giving a one-sided argument for an extended period, it's because they think they have a point which hasn't been understood and they're still trying to convey it properly.

This is more nuanced than the two-level view I articulated previously, but it's still bound to be very simplistic compared to the reality. Discussions will mix these levels, and there are things happening in discussions which aren't best understood in terms of these levels (such as storytelling, jokes...). People will tend to be at different levels for different sets of beliefs, and with different people, and so on. Politics will almost always be at level 1 or 2, while it hardly even makes sense to talk about mathematics at anything but level 3. Higher levels are in some sense better than lower levels, but this should not be taken too far. Each level is an appropriate response to a different situation, and problems occur if you're not appropriately adapting your level of response to the situation. Admitting the weakness of your argument is a kind of countersignaling which can help shift from level 2 to level 3, but which can be ineffective or backfire if the conversation is stuck at level 2 or 1.

Here's an almost unrelated system of levels:
  1. Relying on personal experience and to a lesser extent anecdotal evidence, as opposed to statistics and controlled studies. (This is usually looked down upon by those with a scientific mindset, but again I'll be arguing that these levels shouldn't be taken as a scale from worse to better.) This is a human bias, since a personal example or story from a friend (or friend of friend) will tend to stick out in memory more vividly than numbers will. Practitioners of this approach to evidence can often be heard saying things like "you can prove anything with statistics" (which is, of course, largely true!).
  2. Relying on science, but only at the level it's conveyed in popular media. This is often really, really misleading. What the science says is often misunderstood, misconstrued, or ignored.
  3. Single study syndrome. Beware the man of one study. The habit/tactic of taking the conclusion of one scientific study as the truth. While looking at the actual studies is better than listening to the popular media, this replicates the same mistake that those who write the popular media articles are usually making. It ignores the fact that studies are often not replicated, and can show conflicting results. Another, perhaps even more important reason why single study syndrome is dangerous is that you can fish for a study to back up almost any view you like. You can do this without even realizing it; if you google terms related to what you are thinking, it will often result in information confirming those things. To overcome this, you've typically got to search for both sides of the argument. But what do you do when you find confirming evidence on both sides?
  4. Surveying science. Looking for many studies and meta-studies. This is, in some sense, the end of the line; unless you're going to break out the old lab coat and start doing science yourself, the best you can do is become acquainted with the literature and make an overall judgement from the disparate opinions there. Unfortunately, this can still be very misleading. A meta-analysis is not just a matter of finding all the relevant studies and adding up what's on one side vs the other, although this much effort is already quite a lot. Often the experiments in different studies are testing for different things. Determining which statistics are comparable will tend to be difficult, and usually you'll end up making somewhat crude comparisons. Even when studies are easily comparable, due to publication bias, a simple tally can look like overwhelming evidence where in fact there is only chance (HT @grognor for reference). And when an effect is real, it can be due to operational definitions whose relation to real life is difficult to pin down; for example, do cognitive biases which are known to exist in a laboratory setting carry over to real-world decision-making?
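
To make the publication-bias point in level 4 concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration: the true effect is exactly zero, every study that comes out significantly positive gets published, and any other result is published with a small probability (the hypothetical parameter `publish_null_prob`). The function names are mine as well.

```python
import random
import statistics


def run_study(rng, n=30, true_effect=0.0):
    """Simulate one study and return the z-score of its sample mean.

    Assumes unit-variance Gaussian noise with known sigma = 1.
    """
    sample = [rng.gauss(true_effect, 1.0) for _ in range(n)]
    return statistics.fmean(sample) * (n ** 0.5)


def simulate_literature(num_studies=2000, publish_null_prob=0.02, seed=0):
    """Return (published_positive, published_other) for a null effect.

    Assumed publication filter: significantly positive results are always
    published; everything else is published with probability
    publish_null_prob.
    """
    rng = random.Random(seed)
    published_positive = 0
    published_other = 0
    for _ in range(num_studies):
        z = run_study(rng)
        if z > 1.645:  # one-sided 5% significance threshold
            published_positive += 1
        elif rng.random() < publish_null_prob:
            published_other += 1
    return published_positive, published_other


if __name__ == "__main__":
    positive, other = simulate_literature()
    print("published, significantly positive:", positive)
    print("published, anything else:", other)
```

Under these assumptions, only about one study in twenty is significantly positive, yet the published tally should lean clearly toward the positive side even though there is nothing there but chance. How lopsided it looks depends entirely on the assumed filter.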

Due to the troublesome nature of scientific evidence, the dialog at level 4 can sound an awful lot like level 1 at times. However, keep in mind that level 4 takes a whole lot more effort than level 1. We can put arbitrary amounts of effort into fact-checking any individual belief. While it's easy to criticize the lower levels and say that everyone should be at level 4 all the time (and they should learn to do meta-studies right, darnit!), it's practically impossible to put that amount of effort in all the time. When you do, you are often confronted with a disconcerting labyrinth of arguments and refutations on both sides, so that any conclusion you may come to is tempered by the knowledge that many people have been very wrong about this same thing (for surprising reasons).

While you'll almost certainly find more errors in your thinking if you go down that rabbit hole, at some point you've got to stop. For some kinds of beliefs, the calculated point of stopping is quite early; hence, we're justified in staying at level 1 for many (most?) things. It may be easy to underestimate the amount of investigation we need to do, since long-term consequences of wrong beliefs are unpredictable (it's easier to think about only short-term needs) and it's much easier to see the evidence currently in favor of our position than the possible refutations which we've yet to encounter. Nonetheless, only so much effort is justified.

Sunday, November 15, 2015

Intuitionistic Intuitions

I've written quite a few blog posts about the nature of truth over the years. There's been a decent gap, though. This is partly because my standards have increased and I don't wish to continue the ramblings of past-me, and partly because I've moved on to other things. However, over this time I've noticed a rather large shift taking place in my beliefs about these things; specifically my reaction to intuitionist/constructivist arguments.

My feeling in former days was that classical logic is clear and obvious, while intuitionistic logic is obscure. I had little sympathy for the arguments in favor of intuitionism which I encountered. I recall my feeling: "Everything, all of these arguments given, can be understood in terms of classical logic -- or else not at all." My understanding of the meaning of intuitionistic logic was limited to the provability interpretation, which translates intuitionistic statements into classical statements. I could see the theoretical elegance and appeal of the principle of harmony and constructivism as long as the domain was pure mathematics, but as soon as we use logic to talk about the real world, the arguments seemed to fall apart; and surely the point (even when dealing with pure math) is to eventually make useful talk about the world? I wanted to say: all these principles are wonderful, but on top of all of this, wouldn't you like to add the Law of Excluded Middle? Surely it can be said that any meaningful statement is either true, or false?

My thinking, as I say, has shifted. However, I find myself in the puzzling position of not being able to point to a specific belief which has changed. Rather, the same old arguments for intuitionistic logic merely seem much more clear and understandable from my new perspective. The purpose of this post, then, is to attempt to articulate my new view on the meaning of intuitionistic logic.

The slow shift in my underlying beliefs was punctuated by at least two distinct realizations, so I'll attempt to articulate those.

Language Is Incomplete

In certain cases it's quite difficult to distinguish what "level" you're speaking about with natural language. Perhaps the largest example of this is that there isn't the same kind of use/mention distinction which is firmly made in formal logic. It's hard to know exactly when we're just arguing semantics (arguing about the meaning of words) vs arguing real issues. If I say "liberals don't necessarily advocate individual freedom" am I making a claim about the definition of the word liberal, or an empirical claim about the habits of actual liberals? It's unclear out of context, and can even be unclear in context.

My first realization was that the ambiguity of language allows for two possible views about what kind of statements are usually being made:

  1. Words have meanings which can be fuzzy at times, but this doesn't matter too much. In the context of a conversation, we attempt to agree on a useful definition of the word for the discussion we're having; if the definition is unclear, we probably need to sort that out before proceeding. Hence, the normal, expected case is that words have concrete meanings referring to actual things.
  2. Words are social constructions whose meanings are partial at the best of times. Even in pure mathematics, we see this: systems of axioms are typically incomplete, leaving wiggle room for further axioms to be added, potentially ad infinitum. If we don't pin down the topic of discourse precisely in math, how can we think that's the case in typical real-world cases? Therefore, the normal, expected case is that we're dealing with only incompletely-specified notions. Because our statements must be understood in this context, they have to be interpreted as mostly talking about these constructions rather than talking about the real world as such.

This is undoubtedly a false dichotomy, but helped me see why one might begin to advocate intuitionistic logic. I might think that there is always a fact of the matter about purely physical items such as atoms and gluons, but when we discuss tables and chairs, such entities are sufficiently ill-defined that we're not justified in acting as if there is always a physical yes-or-no sitting behind our statements. Instead, when I say "the chair is next to the table" the claim is better understood as indicating that understood conditions for warranted assertibility have been met. Likewise, if I say "the chair is not next to the table" it indicates that conditions warranting denial have been met. There need not be a sufficiently precise notion available so that we would say the chair "is either next to the table or not" -- there well may be cases when we would not assent to either judgement.

After thinking of it this way, I was seeing it as a matter of convention -- a tricky semantic issue somewhat related to use/mention confusion.

Anti-Realism Is Just Rejection of Map/Territory Distinctions

Anti-realism is a position which some (most?) intuitionists take. Now, on the one hand, this sort of made sense to me: my confusion about intuitionism was largely along the lines "but things are really true or false!", so it made a certain kind of sense for the intuitionist reply to be "No, there is no real!". The intuitionists seemed to retreat entirely into language. Truth is merely proof; and proof in turn is assertibility under agreed-upon conventions. (These views are not necessarily what intuitionists would say exactly, but it's the impression I somehow got of them. I don't have sources for those things.)

If you're retreating this far, how do you know anything? Isn't the point to operate in the real world, somewhere down the line?

At some point, I read this Facebook post by Brienne, which got me thinking:

One of the benefits of studying constructivism is that no matter how hopelessly confused you feel, when you take a break to wonder about a classical thing, the answer is SO OBVIOUS. It's like you want to transfer just the pink glass marbles from this cup of water to that cup of water using chopsticks, and then someone asks whether pink marbles are even possible to distinguish from blue marbles in the first place, and it occurs to you to just dump out all the water and sort through them with your fingers, so you immediately hand them a pink marble and a blue marble. Or maybe it's more like catching Vaseline-coated eels with your bare hands, vs. catching regular eels with your bare hands. Because catching eels with your bare hands is difficult simpliciter. Yes, make them electric, and that's exactly what it's like to study intuitionism. Intuitionism is like catching vaseline-coated electric eels with your bare hands.
Posted by Brienne Yudkowsky on Friday, September 25, 2015

I believe she simply meant that constructivism is hard and classical logic is easy by comparison. (For the level of detail in this blog post, constructivism and intuitionism are the same.) However, the image with the marbles stuck with me. Some time later, I had the sudden thought that a marble is a constructive proof of a marble. The intuitionists are not "retreating entirely into language" as I previously thought. Rather, almost the opposite: they are rejecting a strict brain/body separation, with logic happening only in the brain. Logic becomes more physical.

Rationalism generally makes a big deal of the map/territory distinction. The idea is that just as a map describes a territory, our beliefs describe the world. Just as a map must be constructed by looking at the territory if it's to be accurate, our beliefs must be constructed by looking at the world. The correspondence theory of truth holds that a statement or belief is true or false based on a correspondence to the world, much as a map is a projected model of the territory it depicts, and is judged correct or incorrect with respect to this projection. This is the meaning of the map.

In classical logic, this translates to model theory. Logical sentences correspond to a model via an interpretation; this determines their truth values as either true or false. How can we understand intuitionistic logic in these terms? The standard answer is Kripke semantics, but while that's a fine formal tool, I never found it helped me understand the meaning of intuitionistic statements. Kripke semantics is a many-world interpretation; the anti-realist position seemed closer to no-world-at-all. I now see I was mistaken.

Anti-realism is not rejection of the territory. Anti-realism is rejection of the map-territory correspondence. In the case of a literal map, such a correspondence makes sense because there is a map-reader who interprets the map. In the case of our beliefs, however, we are the only interpreter. A map cannot also contain its own map-territory correspondence; that is fundamentally outside of the map. A map will often include a legend, which helps us interpret the symbols on the map; but the legend itself cannot be explained with a legend, and so on ad infinitum. The chain must bottom out somehow, with some kind of semantics other than the map-territory kind.

The anti-realist provides this with constructive semantics. This is not based on correspondence. The meaning of a sentence rests instead in what we can do with it. In computer programming terms, meaning is more like a pointer: we uncover the reference by a physical operation of finding the location in memory which we've been pointing to, and accessing its contents. If we claim that 3*7=21, we can check the truth of this statement with a concrete operation. "3" is not understood as a reference to a mysterious abstract entity known as a "number"; 3 is the number. (Or if the classicist insists that 3 is only a numeral, then we accept this, but insist that it is clear numerals exist but unclear that numbers exist.)

A proof worked out on a sheet of paper is a proof; it does not merely mean the proof. It is not a set of squiggles with a correspondence to a proof.

How does epistemology work in this kind of context? How do we come to know things? Well...

Bayesianism Needs No Map

The epistemology of machine learning has been pragmatist all along: all models are wrong; some are useful. A map-territory model of knowledge plays a major role in the way we think of modeling, but in practice? There is no measurement of map-territory correspondence. What matters is goodness-of-fit and generalization error. In other words, a model is judged by the predictions it makes, not by the mechanisms which lead to those predictions. We tend to expect models which make better predictions to have internal models close to what's going on in the external world, but there is no precise notion of what this means, and none is required. The theorems of statistical learning theory and Bayesian epistemology (of which I am aware) do not make use of a map-territory concept, and the concept is not missed.

It's interesting that formal Bayesian epistemology relies so little on a map/territory distinction. The modern rationalist movement tends to advocate both rather strongly. While Bayesianism is compatible with map-territory thinking, it doesn't need it or really encourage it. This realization was rather surprising to me.

Sunday, June 14, 2015

Associated vs Relevant

Also cross-posted to LessWrong.

The List of Nuances (which is actually more of a list of fine distinctions - a fine distinction which only occurred to its authors after the writing of it) has one glaring omission, which is the distinction between associated and relevant. A List of Nuances is largely a set of reminders that we aren't omniscient, but it also serves the purpose of listing actual subtleties and calling for readers to note the subtleties rather than allowing themselves to fall into associationism, applying broad cognitive clusters where fine distinctions are available. The distinction between associated and relevant is critical to this activity.

An association can be anything related to a subject. To be relevant is a higher standard: it means that there is an articulated argument connecting to a question on the table, such that the new statement may well push the question one way or the other (perhaps after checking other relevant facts). This is close to the concept of value of information.

Whether something is relevant or merely associated can become confused when epistemic defensiveness comes into play. From A List of Nuances:
10. What You Mean vs. What You Think You Mean
  1. Very often, people will say something and then that thing will be refuted. The common response to this is to claim you meant something slightly different, which is more easily defended.
    1. We often do this without noticing, making it dangerous for thinking. It is an automatic response generated by our brains, not a conscious decision to defend ourselves from being discredited. You do this far more often than you notice. The brain fills in a false memory of what you meant without asking for permission.

As mentioned in Epistemic Trust, a common reason for this is when someone says something associated to the topic at hand, which turns out not to be relevant.

There is no shame in saying associated things. In a free-ranging discussion, the conversation often moves forward from topic to topic by free-association. All of the harm here comes from claiming that something is relevant when it is merely associated. Because this is often a result of knee-jerk self-defense, it is critical to repeat: there is no shame in saying something merely associated with the topic at hand!

It is quite important, however, to spot the difference. Association-based thinking is one of the signs of a death spiral, as a large associated memeplex reinforces itself to the point where it seems like a single, simple idea. A way to detect this trap is to try to write down the idea in list form and evaluate the different parts. If you can't explicitly articulate the unseen connection you feel between all the ideas in the memeplex, it may not exist.

Drawing on associations is a powerful tool for creating a good story (although, see item #3 here for a counterpoint). Repeating themes can create a strong feeling of relevance, which may be good for convincing people of a memeplex. Furthermore, association is a wonderful exploratory tool. However, it can turn into an enemy of articulated argument; for this reason, it is important to tread carefully (especially in one's own mind).

Wednesday, June 10, 2015

Epistemic Trust: Clarification

Cross-posted to LessWrong Discussion.

A while ago, I wrote about epistemic trust. The thrust of my argument was that rational argument is often more a function of the group dynamic, as opposed to how rational the individuals in the group are. I assigned meaning to several terms, in order to explain this:

Intellectual honesty: being up-front not just about what you believe, but also why you believe it, what your motivations are in saying it, and the degree to which you have evidence for it.

Intellectual-Honesty Culture: The norm of intellectual honesty. Calling out mistakes and immediately admitting them; feeling comfortable with giving and receiving criticism.

Face Culture: Norms associated with status which work contrary to intellectual honesty. Agreement as social currency; disagreement as attack. A need to save face when one's statements turn out to be incorrect or irrelevant; the need to make everyone feel included by praising contributions and excusing mistakes.

Intellectual trust: the expectation that others in the discussion have common intellectual goals; that criticism is an attempt to help, rather than an attack. The kind of trust required to take other people's comments at face value rather than being overly concerned with ulterior motives, especially ideological motives. I hypothesized that this is caused largely by ideological common ground, and that this is the main way of achieving intellectual-honesty culture.

There are several important points which I did not successfully make last time.
  • Sometimes it's necessary to play at face culture. The skills which go along with face-culture are important. It is generally a good idea to try to make everyone feel included and to praise contributions even if they turn out to be incorrect. It's important to make sure that you do not offend people with criticism. Many people feel that they are under attack when engaged in critical discussion. Wanting to change this is not an excuse for ignoring it.
  • Face culture is not the error. Being unable to play the right culture at the right time is the error. In my personal experience, I've seen that some people are unable to give up face-culture habits in more academic settings where intellectual honesty is the norm. This causes great strife and heated arguments! There is no gain in playing for face when you're in the midst of an honesty culture, unless you can do it very well and subtly. You gain a lot more face by admitting your mistakes. On the other hand, there's no honor in playing for honesty when face-culture is dominant. This also tends to cause more trouble than it's worth.
  • It's a cultural thing, but it's not just a cultural thing. Some people have personalities much better suited to one culture or the other, while other people are able to switch freely between them. I expect that groups move further toward intellectual honesty as a result of establishing intellectual trust, but that is not the only factor. Try to estimate the preferences of the individuals you're dealing with (while keeping in mind that people may surprise you later on).

Wednesday, June 3, 2015

Simultaneous Overconfidence and Underconfidence

Follow-up to this and this. Prep for this meetup. Cross-posted to LessWrong.

Eliezer talked about cognitive bias, statistical bias, and inductive bias in a series of posts, only the first of which made it directly into the LessWrong sequences as currently organized (unless I've missed them!). Inductive bias helps us leap to the right conclusion from the evidence, if it captures good prior assumptions. Statistical bias can be good or bad, depending in part on the bias-variance trade-off. Cognitive bias refers only to obstacles which prevent us from thinking well.

Unfortunately, as we shall see, psychologists can be quite inconsistent about how cognitive bias is defined. This created a paradox in the history of cognitive bias research. One well-researched and highly experimentally validated effect was conservatism, the tendency to give estimates too middling, or probabilities too near 50%. This relates especially to integration of information: when given evidence relating to a situation, people tend not to take it fully into account, as if they are stuck with their prior. Another highly-validated effect was overconfidence, relating especially to calibration: when people give high subjective probabilities like 99%, they are typically wrong with much higher frequency than the stated 1%.

In real-life situations, these two contradict: there is no clean distinction between information integration tasks and calibration tasks. A person's subjective probability is always, in some sense, the integration of the information they've been exposed to. In practice, then, when should we expect other people to be under- or overconfident?

Simultaneous Overconfidence and Underconfidence

The conflict was resolved in an excellent paper by Ido Erev et al. which showed that it's the result of how psychologists did their statistics. Essentially, one group of psychologists defined bias one way, and the other defined it another way. The results are not really contradictory; they are measuring different things. In fact, you can find underconfidence or overconfidence in the same data sets by applying the different statistical techniques; it has little or nothing to do with the differences between information integration tasks and probability calibration tasks. Here's my rough drawing of the phenomenon (apologies for my hand-drawn illustrations):

Overconfidence here refers to probabilities which are more extreme than they should be, here illustrated as being further from 50%. (This baseline makes sense when choosing from two options, but won't always be the right baseline to think about.) Underconfident subjective probabilities are associated with more extreme objective probabilities, which is why the slope tilts up in the figure. The overconfident line similarly tilts down, indicating that the subjective probabilities are associated with less-extreme objective probabilities. Unfortunately, if you don't know how the lines are computed, this means less than you might think. Ido Erev et al. show that these two regression lines can be derived from just one data-set. I found the paper easy and fun to read, but I'll explain the phenomenon in a different way here by relating it to the concept of statistical bias and tails coming apart.

The Tails Come Apart

Everyone who has read Why the Tails Come Apart will likely recognize this image:

The idea is that even if X and Y are highly correlated, the most extreme X values and the most extreme Y values will differ. I've labelled the difference the "curse" after the optimizer's curse: if you optimize a criterion which is merely correlated with the thing you actually want, you can expect to be disappointed.

Applying the idea to calibration, we can say that the most extreme subjective beliefs are almost certainly not the most extreme on the objective scale. That is: a person's most confident beliefs are almost certainly overconfident. A belief is not likely to have worked its way up to the highest peak of confidence by merit alone. It's far more likely that some merit but also some error in reasoning combined to yield high confidence.
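
A small simulation sketch of the tails coming apart. The setup is an assumption for illustration: X and Y are standard Gaussians with correlation `rho`, and we look at the points which score highest on X. All names are hypothetical.

```python
import random


def correlated_pairs(n, rho, rng):
    """Draw n (x, y) pairs of standard Gaussians with correlation rho."""
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = rho * x + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        pairs.append((x, y))
    return pairs


def tails_report(n=10000, rho=0.9, top=100, seed=1):
    """Select the `top` points by X; return their mean X and mean Y,
    plus whether the single best-X point is also the single best-Y point."""
    rng = random.Random(seed)
    pairs = correlated_pairs(n, rho, rng)
    top_by_x = sorted(pairs, key=lambda p: p[0], reverse=True)[:top]
    mean_x = sum(p[0] for p in top_by_x) / top
    mean_y = sum(p[1] for p in top_by_x) / top
    best_x = max(pairs, key=lambda p: p[0])
    best_y = max(pairs, key=lambda p: p[1])
    return mean_x, mean_y, best_x is best_y


if __name__ == "__main__":
    mean_x, mean_y, same_point = tails_report()
    print("mean X of the top-X points:", round(mean_x, 3))
    print("mean Y of the same points: ", round(mean_y, 3))
    print("the 'curse' (gap):         ", round(mean_x - mean_y, 3))
    print("best X is also best Y:     ", same_point)
```

Reading X as subjective confidence and Y as actual support, the gap between the two means is the "curse": the points selected for being extreme on X should be noticeably less extreme on Y, even at a correlation of 0.9.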

In what follows, I'll describe a "soft version" which shows the tails coming apart gradually, rather than only talking about the most extreme points.

Statistical Bias

Statistical bias is defined through the notion of an estimator. We have some quantity we want to know, X, and we use an estimator to guess what it might be. The estimator will be some calculation which gives us our estimate, which I will write as X^. An estimator is derived from noisy information, such as a sample drawn at random from a larger population. The difference between the estimator and the true value, X^-X, would ideally be zero; however, this is unrealistic. We expect estimators to have error, but systematic error is referred to as bias.

Given a particular value for X, the bias is defined as the expected value of X^ - X, written E_X(X^ - X), where the subscript indicates that the expectation is taken with X held fixed. An unbiased estimator is an estimator such that E_X(X^ - X) = 0 for any value of X we choose.

Due to the bias-variance trade-off, unbiased estimators are not the best way to minimize error in general. However, statisticians still love unbiased estimators. It's a nice property to have, and in situations where it works, it has a more objective feel than estimators which use bias to further reduce error.
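
As a sketch of both the definition and the trade-off, take the unknown X to be the variance of a Gaussian and compare two standard estimators from Python's `statistics` module: `variance` (divides by n-1, unbiased) and `pvariance` (divides by n, biased low). The sample size, the true value, and the helper name `estimate_bias_and_mse` are assumptions for illustration.

```python
import random
import statistics


def estimate_bias_and_mse(estimator, true_sigma=2.0, n=5, trials=20000, seed=2):
    """Approximate E_X(X^ - X) and the mean squared error by simulation.

    X is held fixed at true_sigma**2; each trial draws a fresh sample of
    size n and applies the estimator to get one value of X^.
    """
    rng = random.Random(seed)
    true_x = true_sigma ** 2
    errors = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, true_sigma) for _ in range(n)]
        errors.append(estimator(sample) - true_x)
    bias = statistics.fmean(errors)
    mse = statistics.fmean([e * e for e in errors])
    return bias, mse


if __name__ == "__main__":
    for name, estimator in [("divide by n-1", statistics.variance),
                            ("divide by n  ", statistics.pvariance)]:
        bias, mse = estimate_bias_and_mse(estimator)
        print(name, "bias:", round(bias, 3), "MSE:", round(mse, 3))
```

With these settings the n-1 estimator should show a bias near zero, while the divide-by-n estimator should be biased downward but have the smaller mean squared error. That is the sense in which accepting some bias can reduce overall error.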

Notice, the definition of bias is taking fixed X; that is, it's fixing the quantity which we don't know. Given a fixed X, the unbiased estimator's average value will equal X. This is a picture of bias which can only be evaluated "from the outside"; that is, from a perspective in which we can fix the unknown X.

A more inside-view of statistical estimation is to consider a fixed body of evidence, and make the estimator equal the average unknown. This is exactly inverse to unbiased estimation:

In the image, we want to estimate unknown Y from observed X. The two variables are correlated, just like in the earlier "tails come apart" scenario. The average-Y estimator tilts down because good estimates tend to be conservative: because I only have partial information about Y, I want to take into account what I see from X but also pull toward the average value of Y to be safe. On the other hand, unbiased estimators tend to be overconfident: the effect of X is exaggerated. For a fixed Y, the average Y^ is supposed to equal Y. However, for fixed Y, the X we will get will lean toward the mean X (just as for a fixed X, we observed that the average Y leans toward the mean Y). Therefore, in order for Y^ to be high enough, it needs to pull up sharply: middling values of X need to give more extreme Y^ estimates.
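
The two lines can be sketched numerically. Assume, purely for illustration, that X and Y are unit-variance Gaussians with correlation `rho`. Then the average Y for a fixed X is rho*X, and the average X for a fixed Y is rho*Y, so the linear estimator Y^ = X/rho is unbiased in the fixed-Y sense. The code below recovers both slopes from simulated data with ordinary least squares; the function names are hypothetical.

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def two_lines(rho=0.6, n=20000, seed=3):
    """Return (average-Y slope, unbiased-estimator slope) as functions of X."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        y = rng.gauss(0.0, 1.0)
        x = rho * y + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(y)
    # Fixed X, average the unknown Y: the conservative line.
    average_y_slope = ols_slope(xs, ys)
    # Fixed Y, average the observed X, then invert so that the average
    # estimate equals Y: the unbiased line.
    unbiased_slope = 1.0 / ols_slope(ys, xs)
    return average_y_slope, unbiased_slope


if __name__ == "__main__":
    conservative, unbiased = two_lines()
    print("average-Y estimator slope:", round(conservative, 3))
    print("unbiased estimator slope: ", round(unbiased, 3))
```

Under these assumptions the first slope should come out near rho (shallower than 1, pulling toward the mean) and the second near 1/rho (steeper than 1, exaggerating the effect of X), from the very same data.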

If we superimpose this on top of the tails-come-apart image, we see that this is something like a generalization:

Wrapping It All Up

The punchline is that these two different regression lines are exactly what yields simultaneous underconfidence and overconfidence. The studies in conservatism were taking the objective probability as the independent variable, and graphing people's subjective probabilities as a function of that. The natural next step is to take the average subjective probability per fixed objective probability. This will tend to show underconfidence due to the statistics of the situation.

The studies on calibration, on the other hand, took the subjective probabilities as the independent variable, graphing average correct as a function of that. This will tend to show overconfidence, even with the same data as shows underconfidence in the other analysis.

From an individual's standpoint, the overconfidence is the real phenomenon. Errors in judgement tend to make us overconfident rather than underconfident because errors make the tails come apart: if you select our most confident beliefs, it's a good bet that they have only mediocre support from evidence, even if, generally speaking, our level of belief is highly correlated with how well-supported a claim is. Because the tails come apart gradually, we can expect that the higher our confidence, the larger the gap between that confidence and the level of factual support for that belief.

This is not a fixed fact of human cognition pre-ordained by statistics, however. It's merely what happens due to random error. Not all studies show systematic overconfidence, and in a given study, not all subjects will display overconfidence. Random errors in judgement will tend to create overconfidence as a result of the statistical phenomena described above, but systematic correction is still an option.

Sunday, March 29, 2015

Good Bias, Bad Bias

I had a conceptual disagreement with a couple of friends, and I'm trying to spell out what I meant here in order to continue the discussion.

Statistical bias is defined in terms of estimators. Suppose there's a hidden value, Theta, and you observe data X whose probability distribution is dependent on Theta, with known P(X|Theta). An estimator is a function of the data which gives you a hopefully-plausible value of Theta.

An unbiased estimator is an estimator which has the property that, given a particular value of Theta, the expected value of the estimator (expectation in P(X|Theta)) is exactly Theta. In other words: our estimate may be higher or lower than Theta due to the stochastic relationship between X and Theta, but it hits Theta on average. (In order for averaging to make sense, we're assuming Theta is a real number, here.)

The Bayesian view is that we have a prior on Theta, which injects useful bias in our judgments. A Bayesian making statistical estimators wants to minimize loss. Loss can mean different things in different situations; for example, if we're estimating whether a car is going to hit us, the damage done by wrongly thinking we are safe is much larger than the damage done by wrongly thinking we're not. However, if we don't have any specific idea about real-world consequences, it may be reasonable to assume a squared-error loss so that we are trying to get our estimated Theta to match the average value of Theta.

Even so, the Bayesian choice of estimator will not be unbiased, because Bayesians will want to minimize the expected loss accounting for the prior, which means looking at the expectation in P(X|Theta)*P(Theta). In fact, we can just look at P(Theta|X). If we're minimizing squared error, then our estimator would be the average Theta in P(Theta|X), which is proportional to P(X|Theta)P(Theta).

Essentially, we want to weight our average by the prior over Theta because we decrease our overall expected loss by accepting a lot of statistical bias for values of Theta which are less probable according to our prior.

So, a certain amount of statistical bias is perfectly rational.
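Here is a small simulation sketch of that trade. The setup is assumed for illustration, not taken from anywhere: Theta is drawn from a normal prior, X is Theta plus normal noise, and the hypothetical `prior_sd` and `noise_sd` values are arbitrary. Under those assumptions the posterior mean is X scaled by prior_sd^2 / (prior_sd^2 + noise_sd^2), and the unbiased estimator is X itself.

```python
import random
import statistics

def compare_estimators(prior_sd=1.0, noise_sd=1.0, trials=100000, seed=2):
    """Mean squared error of the unbiased estimator (X) and the posterior mean,
    averaged over Theta drawn from the prior (assumed normal-normal model)."""
    rng = random.Random(seed)
    shrink = prior_sd ** 2 / (prior_sd ** 2 + noise_sd ** 2)
    sq_err_unbiased, sq_err_bayes = [], []
    for _ in range(trials):
        theta = rng.gauss(0.0, prior_sd)   # hidden value, drawn from the prior
        x = rng.gauss(theta, noise_sd)     # observed data
        sq_err_unbiased.append((x - theta) ** 2)
        sq_err_bayes.append((shrink * x - theta) ** 2)
    return statistics.fmean(sq_err_unbiased), statistics.fmean(sq_err_bayes)

def bias_at(theta, prior_sd=1.0, noise_sd=1.0, trials=100000, seed=2):
    """Statistical bias of the posterior-mean estimator when Theta is held fixed."""
    rng = random.Random(seed)
    shrink = prior_sd ** 2 / (prior_sd ** 2 + noise_sd ** 2)
    return statistics.fmean(shrink * rng.gauss(theta, noise_sd) - theta
                            for _ in range(trials))

mse_unbiased, mse_bayes = compare_estimators()
print(round(mse_unbiased, 2), round(mse_bayes, 2), round(bias_at(3.0), 2))
```

The posterior mean is badly biased at an improbable Theta such as 3, yet its squared error averaged over the prior is lower than that of the unbiased estimator.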

Bad bias, to a Bayesian, refers to situations when we can predictably improve our estimates in a systematic way.

One of the limitations of the paper reviewed last time was that it didn't address good vs bad bias. Bias, in that paper, was more or less indistinguishable from bias in the statistical sense. Detangling things we can improve from things which we want would require a deeper analysis of the mathematical model, and of the data.

Saturday, March 28, 2015

A Paper on Bias

I've been reading some of the cognitive bias literature recently.

First, I dove into Toward a Synthesis of Cognitive Biases, by Martin Hilbert: a work which claims to explain how eight different biases observed in the literature are an inevitable result of noise in the information-processing channels in the brain.

The paper starts out with what it calls the conservatism bias. (The author complains that the literature is inconsistent about naming biases, both giving one bias multiple names and using one name for multiple biases. Conservatism is what is used for this paper, but this may not be standard terminology. What's important is the mathematical idea.)

The idea behind conservatism is that when shown evidence, people tend to update their probabilities more conservatively than would be predicted by probability theory. It's as if they didn't observe all the evidence, or aren't taking the evidence fully into account. A well-known study showed that subjects were overly conservative in assigning probabilities to gender based on height; an earlier study had found that the problem is more extreme when subjects are asked to aggregate information, guessing the gender of a random selection of same-sex individuals from height. Many studies were done to confirm this bias. A large body of evidence accumulated which indicated that subjects irrationally avoided extreme probabilities, preferring to report middling values.

The author construed conservatism very broadly. Another example given was: if you quickly flash a set of points on a screen and ask subjects to estimate their number, then subjects will tend to over-estimate the number of a small set of points, and under-estimate the number of a large set of points.

The hypothesis put forward in Toward a Synthesis is that conservatism is a result of random error in the information-processing channels which take in evidence. If all red blocks are heavy and all blue blocks are light, but you occasionally mix up red and blue, you will conclude that most red blocks are heavy and most blue blocks are light. If you are trying to integrate some quantity of information, but some of it is mis-remembered, small probabilities will become larger and large will become smaller.
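The mix-up argument can be written as a one-line formula. This is my own minimal sketch, assuming each yes/no observation is flipped independently with some hypothetical `flip_rate`; it is not the paper's model, just the simplest noisy channel with the same qualitative behavior.

```python
def perceived_probability(p, flip_rate):
    """Apparent frequency of an event after each binary observation is
    flipped with probability flip_rate (assumed symmetric noise)."""
    return p * (1 - flip_rate) + (1 - p) * flip_rate

# Hypothetical 10% mix-up rate applied to a range of true probabilities.
for p in (0.0, 0.05, 0.5, 0.95, 1.0):
    print(p, round(perceived_probability(p, 0.1), 3))
```

With a 10% mix-up rate, "all red blocks are heavy" (probability 1.0) turns into "most are" (0.9), small probabilities go up, large ones go down, and 0.5 is left alone.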

One thing that bothered me about this paper was that it did not directly contrast processing-error conservatism with the rational conservatism which can result from quantifying uncertainty. My estimate of the number of points on a screen should tend toward the mean if I only saw them briefly; this bias will increase my overall accuracy rate. It seems that previous studies established that people were over-conservative compared to the rational amount, but I didn't take the time to dig up those analyses.

All eight biases explained in Toward a Synthesis were effectively consequences of conservatism in different ways.

  • Illusory correlation: Two rare events X and Y which are independent appear correlated as a result of their probabilities being inflated by conservatism bias. I found this to be the most interesting application. The standard example of illusory correlation is stereotyping of minority groups. The race is X, and some rare trait is Y. What was found was that stereotyping could be induced in subjects by showing them artificial data in which the traits were entirely independent of the races. Y could be either a positive or a negative trait; illusory correlation occurs either way. The effect that conservatism has on the judgements will depend on how you ask the subject about the data, which is interesting, but illusory correlation emerges regardless. Essentially, because all the frequencies are smaller within the minority group, the conservatism bias operates more strongly; the trait Y is inflated so much that it's seen as being about 50-50 in that group, whereas the judgement about its frequency in the majority group is much more realistic.
  • Self-Other Placement: People with low skill tend to overestimate their abilities, and people with high skill tend to underestimate theirs; this is known as the Dunning-Kruger effect. This is a straightforward case of conservatism. Self-other placement refers to the further effect that people tend to be even more conservative about estimating other people's abilities, which paradoxically means that people of high ability tend to over-estimate the probability that they are better than a specific other person, despite the Dunning-Kruger effect; and similarly, people of low ability tend to over-estimate the probability that they are worse as compared with specific individuals, despite over-estimating their ability overall. The article explains this as a result of having less information about others, and hence, being more conservative. (I'm not sure how this fits with the previously-mentioned result that people get more conservative as they have more evidence.)
  • Sub-Additivity: This bias is a class of inconsistent probability judgements. The estimated probability of an event will be higher if we ask for the probability of a set of sub-events, rather than merely asking for the overall probability. From Wikipedia: For instance, subjects in one experiment judged the probability of death from cancer in the United States was 18%, the probability from heart attack was 22%, and the probability of death from "other natural causes" was 33%. Other participants judged the probability of death from a natural cause was 58%. Natural causes are made up of precisely cancer, heart attack, and "other natural causes," however, the sum of the latter three probabilities was 73%, and not 58%. According to Tversky and Koehler (1994) this kind of result is observed consistently. The bias is explained with conservatism again. The smaller probabilities are inflated more by the conservatism bias than the larger probability is, which makes their sum much more inflated than the original event.
  • Hard-Easy Bias: People tend to overestimate the difficulty of easy tasks, and underestimate the difficulty of hard ones. This is straightforward conservatism, although the paper framed it in a somewhat more complex model (it was the 8th bias covered in the paper, but I'm putting it out of order in this blog post).
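The sub-additivity explanation in the list above is easy to check with arithmetic. The sketch below uses a toy inflation rule (each yes/no observation flipped with a hypothetical 10% chance) and made-up true probabilities; neither comes from the paper or from the Wikipedia example.

```python
def inflate(p, flip_rate=0.1):
    """Toy conservatism: apparent probability when each binary observation
    is flipped with probability flip_rate (assumed, for illustration)."""
    return p * (1 - flip_rate) + (1 - p) * flip_rate

# Hypothetical true probabilities of three disjoint sub-events.
parts = [0.10, 0.15, 0.20]
whole = sum(parts)

judged_parts = sum(inflate(p) for p in parts)  # ask about each sub-event, then add
judged_whole = inflate(whole)                  # ask about the overall event once

print(round(judged_parts, 2), round(judged_whole, 2))
```

Each sub-event picks up its own dose of inflation, so the summed judgements overshoot the single judgement of the whole even though the true probabilities add up exactly.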

That's 5 biases down, and 3 to go. The article has explained conservatism as a mistake made by a noisy information-processor, and explains 4 other biases as consequences of conservatism. So far so good.

Here's where things start to get... weird.

Simultaneous Overestimation and Underestimation

Bias 5 is termed exaggerated expectation in the paper. This is a relatively short section which reviews a bias dual to conservatism. Conservatism looks at the statistical relationship from the evidence, to the estimate formed in the brain. If there is noise in the information channel connecting the two, then conservatism is a statistical near-certainty.

Similarly, we can turn the relationship around. The conservatism bias was based on looking at P(estimate|evidence). We can turn it around with Bayes' Law, to examine P(evidence|estimate). If there is noise in one direction, there is noise in the other direction. This has a surprising implication: the evidence will be conservative with respect to the estimate, by essentially the same argument which says that the estimate will tend to be conservative with respect to the evidence. This implies that (under statistical assumptions spelled out in the paper), our estimates will tend to be more extreme than the data. This is the exaggerated expectation effect.

If you're like me, at this point you're saying what???

The whole idea of conservatism was that the estimates tend to be less extreme than the data! Now "by the same argument" we are concluding the opposite?

The section refers to a paper about this, so before moving further I took a look at that reference. The paper is Simultaneous Over- and Under- Confidence: the Role of Error in Judgement Process by Erev et al. It's a very good paper, and I recommend taking a look at it.

Simultaneous Over- and Under- Confidence reviews two separate strains of literature in psychology. A large body of studies in the 1960s found systematic and reliable underestimation of probabilities. This revision-of-opinion literature concluded that it was difficult to take the full evidence into account to change your beliefs. Later, many studies on calibration found systematic overestimation of probabilities: when subjects are asked to give probabilities for their beliefs, the probabilities are typically higher than their frequency of being correct.

What is going on? How can both of these be true?

One possible answer is that the experimental conditions are different. Revision-of-opinion tests give a subject evidence, and then test how well the subject has integrated the evidence to form a belief. Calibration tests are more like trivia sessions; the subject is asked an array of questions, and assigns a probability to each answer they give. Perhaps humans are stubborn but boastful: slow to revise their beliefs, but quick to over-estimate the accuracy of those beliefs. Perhaps this is true. It's difficult to test this against the data, though, because we can't always distinguish between calibration tests and revision-of-opinion tests. All question-answering involves drawing on world knowledge combined with specific knowledge given in the question to arrive at an answer. In any case, a much more fundamental answer is available.

The Erev paper points out that revision-of-opinion experiments used different data analysis. Erev re-analysed the data for studies on both sides, and found that the statistical techniques used by revision-of-opinion researchers found underconfidence, while the techniques of calibration researchers found overconfidence, in the same data-set!

Both techniques compared the objective probability, OP, with the subject's reported probability, SP. OP is the empirical frequency, while SP is whatever the subject writes down to represent their degree of belief. However, revision-of-opinion studies started with a desired OP for each situation and calculated the average SP for a given OP. Calibration literature instead starts with the numbers written down by the subjects, and then asks how often they were correct; so, they're computing the average OP for a given SP.

When we look at data and try to find functions from X to Y like that, we're creating statistical estimators. A very general principle is that estimators tend to be regressive: my Y estimate will tend to be closer to the Y average than the actual Y. Now, in the first case, scientists were using X=OP and Y=SP; lo and behold, they found it to be regressive. In later decades, they took X=SP and Y=OP, and found that to be regressive! From a statistical perspective, this is plain and ordinary business as usual. The problem is that one case was termed under-confidence and the other over-confidence, and they appeared from those names to be contrary to one another.
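The same thing can be reproduced with a toy data set. The sketch below is my own construction, not Erev's model: each item has an objective probability OP from a small hypothetical grid, the subject reports SP as OP plus Gaussian noise, and reports are clipped to the range 0.5 to 1.0. The two analyses are then run on the very same pairs.

```python
import random
import statistics

OP_LEVELS = [0.55, 0.65, 0.75, 0.85, 0.95]  # hypothetical objective probabilities

def simulate_reports(n=50000, noise_sd=0.15, seed=3):
    """Return (OP, SP) pairs where SP is a noisy report of OP, clipped to [0.5, 1.0]."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        op = rng.choice(OP_LEVELS)
        sp = min(1.0, max(0.5, op + rng.gauss(0.0, noise_sd)))
        pairs.append((op, sp))
    return pairs

pairs = simulate_reports()

# Revision-of-opinion style: fix OP, average the reported SP.
mean_sp_at_high_op = statistics.fmean(sp for op, sp in pairs if op == 0.95)

# Calibration style: fix SP (the most confident reports), average the OP behind them.
mean_op_at_high_sp = statistics.fmean(op for op, sp in pairs if sp >= 0.95)

print(round(mean_sp_at_high_op, 3), round(mean_op_at_high_sp, 3))
```

When OP is 0.95 the average report falls short of 0.95, which looks like underconfidence; among reports of 0.95 or higher the average OP also falls short of 0.95, which looks like overconfidence. Same pairs, two regressions.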

This is exactly what the Toward a Synthesis paper was trying to get across with the reversed channel, P(estimate|evidence) vs P(evidence|estimate).

Does this mean that the two biases are mere statistical artifacts, and humans are actually fairly good information systems whose beliefs are neither under- nor over- confident? No, not really. The statistical phenomena are real: humans are both under- and over-confident in these situations. What Toward a Synthesis and Simultaneous Over- and Under- Confidence are trying to say is that these are not mutually inconsistent, and can be accounted for by noise in the information-processing system of the brain.

Both papers propose a model which accounts for overconfidence as the result of noise during the creation of an estimate, although they are put in different terms. The next section of Toward a Synthesis is about overconfidence bias specifically (which it sees as a special case of exaggerated expectations, as I understand them; the 7th bias to be examined in the paper, for those keeping count). The model shows that even with accurate memories (and therefore the theoretical ability to reconstruct accurate frequencies), an overconfidence bias should be observed (under statistical conditions outlined in the paper). Similarly, Simultaneous Over- and Under- Confidence constructs a model in which people have perfectly accurate probabilities in their heads, and the noise occurs when they put pen to paper: their explicit reflection on their belief adds noise which results in an observed overconfidence.

Both models also imply underconfidence. This means that in situations where you expect perfectly rational agents to reach 80% confidence in a belief, you'd expect rational agents with noisy reporting of the sort postulated to give estimates averaging lower (say, 75%). This is the apparent underconfidence. On the other hand, if you are ignorant of the empirical frequency and one of these agents tells you that it is 80%, then it is you who is best advised to revise the number down to 75%.

This is made worse by the fact that human memories and judgement are actually fallible, not perfect, and subject to the same effects. Information is subject to bias-inducing-noise at each step of the way, from first observation, through interpretation and storage in the brain, modification by various reasoning processes, and final transmission to other humans. In fact, most information we consume is subject to distortion before we even touch it (as I discussed in my previous post). I was a bit disappointed when the Toward a Synthesis paper dismissed the relevance of this, stating flatly "false input does not make us irrational".

Overall, I find Toward a Synthesis of Cognitive Biases a frustrating read and recommend the shorter, clearer Simultaneous Over- and Under- Confidence as a way to get most of the good ideas with fewer of the questionable ones. However, that's for people who already read this blog post and so have the general idea that these effects can actually explain a lot of biases. By itself, Simultaneous Over- and Under- Confidence is one step away from dismissing these effects as mere statistical artifacts. I was left with the impression that Erev doesn't even fully dismiss the model where our internal probabilities are perfectly calibrated and it's only the error in conscious reporting that's causing over- and under- estimation to be observed.

Both papers come off as quite critical of the state of the research, and I walk away from these with a bitter taste in my mouth: is this the best we've got? The extent of the statistical confusion observed by Erev is saddening, and although it was cited in Toward a Synthesis, I didn't get the feeling that it was sharply understood (another reason I recommend the Erev paper instead). Toward a Synthesis also discusses a lot of confusion about the names and definitions of biases as used by different researchers, which is not quite as problematic, but also causes trouble.

A lot of analysis is still needed to clear up the issues raised by these two papers. One problem which strikes me is the use of averaging to aggregate data, which has to do with the statistical phenomenon of simultaneous over- and under- confidence. Averaging isn't really the right thing to do to a set of probabilities to see whether it has a tendency to be over or under a mark. What we really want to know, I take it, is whether there is some adjustment which we can do after-the-fact to systematically improve estimates. Averaging tells us whether we can improve a square-loss comparison, but that's not the notion of error we are interested in; it seems better to use a proper scoring rule.
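As one concrete instance of a proper scoring rule, here is a sketch using the logarithmic score. The choice of the log score and the example numbers are mine; the property being illustrated is that, if the true frequency is 80%, no reported probability has a better expected score than 0.8 itself.

```python
import math

def log_loss(forecasts, outcomes):
    """Average negative log probability assigned to what actually happened (lower is better)."""
    total = 0.0
    for f, happened in zip(forecasts, outcomes):
        total -= math.log(f if happened else 1.0 - f)
    return total / len(forecasts)

def expected_log_loss(report, true_p):
    """Expected log loss of always reporting `report` when the event occurs with probability true_p."""
    return -(true_p * math.log(report) + (1.0 - true_p) * math.log(1.0 - report))

# Hypothetical: the event really happens 80% of the time; try a few reports.
for report in (0.6, 0.75, 0.8, 0.9):
    print(report, round(expected_log_loss(report, 0.8), 4))
```

An after-the-fact adjustment rule (say, "revise every 80% down to 75%") can then be judged by whether it lowers the score on held-out questions, rather than by comparing averages.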

Finally, to keep the reader from thinking that this is the only theory trying to account for a broad range of biases: go read this paper too! It's good, I promise.

Monday, March 16, 2015

The Ordinary Web of Lies

One of the basic lessons in empiricism is that you need to consider how the data came to you in order to use it as evidence for or against a hypothesis. Perhaps you have a set of one thousand survey responses, answering questions about income, education level, and age. You want to draw conclusions about the correlations of these variables in the United States. Before you do so, you need to ask how the data was collected. Did you get these from telephone surveys? Did you walk around your neighborhood and knock on people's doors? Perhaps you posted the survey on Amazon's Mechanical Turk? These different possibilities give you samples from very different populations.

When we obtain data in a way that does not evenly sample from the population we are trying to study, this is called selection bias. If not accounted for, selection effects can cause you to draw just about any conclusion, regardless of the truth.
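A toy version of the survey example shows how little it takes. Everything here is invented for illustration: the income distribution is an arbitrary log-normal, and the response rates (above-median earners three times as likely to answer) are hypothetical.

```python
import random
import statistics

rng = random.Random(4)

# Hypothetical population of incomes (arbitrary log-normal parameters).
population = [rng.lognormvariate(10.5, 0.6) for _ in range(100000)]
median_income = statistics.median(population)

# An even sample: every person is equally likely to be included.
even_sample = rng.sample(population, 2000)

# A filtered sample: above-median earners respond 3x as often (assumed rates).
filtered_sample = [income for income in population
                   if rng.random() < (0.03 if income > median_income else 0.01)]

print(round(statistics.fmean(population)),
      round(statistics.fmean(even_sample)),
      round(statistics.fmean(filtered_sample)))
```

Nothing in the filtered sample is false; every response is a real member of the population. The average is still well off, and it would be off in the other direction if the response rates were reversed.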

In modern society, we consume a very large amount of information. Practically all of that information is highly filtered. Most of this filtering is designed to nudge your beliefs in specific directions. Even when the original authors engage in intellectual honesty, we usually see something as a result of a large, complex filter imposed by society (for example, social media). Even when scientists are perfectly unbiased, journalists can choose to cite only the studies which support their perspective.

I have cultivated what I think is a healthy fear of selection effects. I would like to convey to the reader a visceral sense of danger, because it's so easy to be trapped in a web of false beliefs based on selection effects.

A Case Study

Consider this article, Miracles of the Koran: Chemical Elements Indicated in the Koran. A Muslim roommate showed this to me when I voiced skepticism about the miraculous nature of the Koran. He suggested that there could be no ordinary explanation of such coincidences. (Similar patterns have been found in the Bible, a phenomenon which has been named the Bible Code.) I decided to attempt an honest analysis of the data to see what it led to.

Take a look at these coincidences. On their own, they are startling, right? When I first looked at these, I had the feeling that they were rather surprising and difficult to explain. I felt confused.

Then I started to visualize the person who had written this website. I supposed that they were (from their own perspective) making a perfectly honest attempt to record patterns in the Koran. They simply checked each possibility they thought of, and recorded what patterns they found.

There are 110 elements on the periodic table. The article discusses the placement (within a particular Sura, the Iron Sura) of Arabic letters which correspond (roughly) to the element abbreviations used on the Periodic Table. For example, the first coincidence noted is that the first occurrence of the Arabic equivalent of "Rn" is 86 letters from the beginning of the verse, and the atomic number of the element Rn is 86. The article notes similar coincidences with atomic weight (as opposed to atomic number), the number of letters from the end of the verse (rather than the beginning), the number of words (rather than number of letters), and several other variations.

Notice that simply looking at the number of characters from the beginning and the end, we double the chances of corresponding to the atomic number. Similarly, looking for atomic weights as well as atomic numbers doubles the chances. Each extra degree of freedom we allow multiplies the chances in this way.
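The multiplication can be sketched numerically. The numbers below are placeholders, not my original estimates (which I no longer have): `p_single` is a hypothetical chance that one particular way of counting hits the right number for one element, and `n_variations` is the number of ways of counting that the pattern-hunter is willing to try, treated as independent.

```python
def chance_of_some_match(p_single, n_variations):
    """Probability that at least one of n_variations independent checks matches."""
    return 1 - (1 - p_single) ** n_variations

def expected_matches(p_single, n_variations, n_elements=110):
    """Expected number of elements showing at least one 'coincidence' by chance."""
    return n_elements * chance_of_some_match(p_single, n_variations)

# Hypothetical 1% chance per check; vary how many kinds of pattern are allowed.
for n_variations in (1, 2, 4, 8):
    print(n_variations,
          round(chance_of_some_match(0.01, n_variations), 4),
          round(expected_matches(0.01, n_variations), 2))
```

For small chances the probability of some match grows almost in proportion to the number of variations allowed, so a handful of degrees of freedom is enough to turn "one lucky element" into a list of them.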

I couldn't easily account for all the possible variations the article's author might have looked for. However, I could restrict myself to one class of patterns and see how much the data looked like chance.

Even restricting myself to one particular class of patterns, I did not know enough of the statistics of the Arabic language to come up with a real Bayesian analysis of the data. I made some very, very rough assumptions which I didn't write down and no longer recall. I estimated the number of elements which would follow the pattern by chance, and my estimate came very close to the number which the article actually listed.

I have to admit, whatever my analysis was, it was probably quite biased as well. It's likely that I added assumptions in a way which was likely to get me the answer I wanted, although I felt I was not doing that. Even supposing that I didn't, I did stop doing math once the numbers looked like chance, satisfied with the answer. This in itself creates a bias. I could certainly have examined some of my assumptions more closely to make a better estimate, but the numbers said what I wanted, so I stopped questioning.

Nonetheless, I do think that the startling coincidences are entirely explained by the strong selection effect produced by someone combing the Koran for patterns. Innocently reporting patterns which fit your theory, with no intention to mislead, can produce startling arguments which appear at first glance to very strongly support your point. The most effective, convincing versions of these startling arguments will get shared widely on the internet and other media (so long as there is social incentive to spread the argument).

If you're not accounting for selection bias, then trying to respond to arguments with rational consideration makes you easy to manipulate. Your brain can be reprogrammed simply by showing it the most convincing arguments in one direction and not the other.

Everything is Selection Bias

Selection processes filter everything we see. We see successful products and not unsuccessful ones. We hear about famous people, which greatly biases our perception of how to get rich. We filter our friends quite a bit, perhaps in ways we don't even realize, and then often we trick ourselves into wrong conclusions about typical people based on the people we've chosen as friends.

No matter what data you're looking at, it was sampled from some distribution. It's somewhat arbitrary to think that selecting from university students is biased, but that selecting evenly from Americans is not. Indeed, university professors have far more incentive to understand the psychology of the student population! What matters is being aware of the selection process which got you the data, and accounting for that when trying to draw conclusions.

Even biological evolution can be seen as a selection effect. Selective pressure takes a tiny minority of the genes, and puts those genes into the whole population. This is a kind of self-fulfilling selection effect, weirder than simple selection bias. It's as if the rock stars in one generation become the common folk of the next.

The intuition I'm trying to get across is: selection effects are something between a physical force and an agent. Like an agent, selection effects optimize for particular outcomes. Like a physical force, selection effects operate automatically, everywhere, without requiring a guiding hand to steer them. This makes them a dangerous creature.

Social Constructs

Social reality is a labyrinth of mirrors reflecting each other. All the light ultimately comes from outside the maze, but the mirrors can distort it any way they like. The ordinary web of lies is my personal term for this. Many people will think of religion, but it goes far beyond this. When society decides a particular group is the enemy, they become the enemy. When society deems words or concepts uncouth, they are uncouth. I call these lies, but it's not what we ordinarily mean by dishonest. It's terrifyingly easy to distort reality. Even one person, alone, will tend to pick and choose observations in a self-serving way. When we get together in groups, we have to play the game: selecting facts to use as social affirmations or condemnations, selecting arguments to create consensus... it's all quite normal.

This all has to do with the concept of hyperstition (see Lemurian Time War) and hyperreality. Hyperstition refers to superstition which makes itself real. Hyperreality refers to our inability to distinguish certain fictions from reality, and the way in which our fictional, constructed world tends to take primacy over the physical world. Umberto Eco illustrates this nicely in his book Foucault's Pendulum, which warns of the deadly danger in these effects.

The webcomic The Accidental Space Spy explores alien cultures as a way of illustrating evolutionary psychology. One of the races, the Twolesy, has evolved strong belief in magic wizards. These wizards command the towns. Whoever doubts the power of a wizard is killed. Being that it has been this way for many generations, the Twolesy readily hallucinate magic. Whatever the wizards claim they can do, the Twolesy hallucinate happening. Whatever other Twolesy claim is happening, they hallucinate as well. Twolesy who do not hallucinate will not be able to play along with the social system very effectively, and are likely to be killed.

Similarly with humans. Our social system relies on certain niceties. Practically anything, no matter how not about signaling it is, becomes a subject for signaling. Those who are better at filtering information to their advantage have been chosen by natural selection for generations. We need not consciously know what we're doing -- it seems to work best when we fool ourselves as well as everyone else. And yes, this goes so far as to allow us to believe in magic. There are mentalists who know how to fool our perceptions and consciously develop strategies to do so, but equally well, there are Wiccans and the like who have similar success by embedding themselves in the ordinary web of lies.

Something which surprised me a bit is that when you try to start describing rationality techniques, people will often object to the very idea of truth-oriented dialog. Truth-seeking is not the first thing on people's minds in everyday conversation, and when you raise it to their awareness, it's not obvious that it should be. Other things are more important.

Imagine a friend has experienced a major loss. Which is better: frank discussion of the mistakes they made, or telling them that it's not really their fault and anyway everything will work out for the best in the end? In American culture at least, it can be rude to let on that you think it might be their fault. You can't honestly speculate about that, because they're likely to get their feelings hurt. Only if you're reasonably sure you have a point, and if your relationship is close enough that they will not take offense, could you say something like that. Making your friend feel better is often more important. By convincing them that you don't think it's their fault, you strengthen the friendship by signalling to them that you trust them. (In Persian culture, I'm given to understand, it's the opposite way: everyone should criticize each other all the time, because you want to make your friends think that you know better than them.)

When the stakes are high, other things easily become more important than the truth.

Notice the consequences, though: the mistakes with high consequences are exactly the ones you want to thoroughly debug. What's important is not whether it's your fault or not; what matters is whether there are different actions you should take to forestall disaster, next time a similar situation arises.

What, then? Bad poetry to finish it off?

Beware, beware, the web of lies;
the filters twist the truth, and eyes
are fool'd too well; designed to see
what'ere the social construct be!

We the master, we the tool,
that spin the thread and carve the spool
weave the web and watch us die!