## Sunday, March 29, 2015

I had a conceptual disagreement with a couple of friends, and I'm trying to spell out what I meant here in order to continue the discussion.

The statistical definition of bias is defined in terms of estimators. Suppose there's a hidden value, Theta, and you observe data X whose probability distribution is dependent on Theta, with known P(X|Theta). An estimator is a function of the data which gives you a hopefully-plausible value of Theta.

An unbiased estimator is an estimator which has the property that, given a particular value of Theta, the expected value of the estimator (expectation in P(X|Theta)) is exactly Theta. In other words: our estimate may be higher or lower than Theta due to the stochastic relationship between X and Theta, but it hits Theta on average. (In order for averaging to make sense, we're assuming Theta is a real number, here.)

The Bayesian view is that we have a prior on Theta, which injects useful bias in our judgments. A Bayesian making statistical estimators wants to minimize loss. Loss can mean different things in different situations; for example, if we're estimating whether a car is going hit us, the damage done by wrongly thinking we are safe is much larger than the damage done by wrongly thinking we're not. However, if we don't have any specific idea about real-world consequences, it may be reasonable to assume a squared-error loss so that we are trying to get our estimated Theta to match the average value of Theta.

Even so, the Bayesian choice of estimator will not be unbiased, because Bayesians will want to minimize the expected loss accounting for the prior, which means looking at the expectation in P(X|Theta)*P(Theta). In fact, we can just look at P(Theta|X). If we're minimizing squared error, then our estimator would be the average Theta in P(Theta|X), which is proportional to P(X|Theta)P(Theta).

Essentially, we want to weight our average by the prior over Theta because we decrease our overall expected loss by accepting a lot of statistical bias for values of Theta which are less probable according to our prior.

So, a certain amount of statistical bias is perfectly rational.

Bad bias, to a Bayesian, refers to situations when we can predictably improve our estimates in a systematic way.

One of the limitations of the paper reviewed last time was that it didn't address good vs bad bias. Bias, in that paper, was more or less indistinguishable from bias in the statistical sense. Detangling things we can improve from things which we want would require a deeper analysis of the mathematical model, and of the data.