Thursday, November 27, 2014

P-values and Chaos Worlds

In First Aid for P-Values, Black Belt Bayesian discusses how a Bayesian can interpret the p-value to get some information. He references an article which argues that this can shift the frame of the discussion in a useful way, improving the nature of the statistical arguments without significantly changing the methodology. It emphasizes the role of evidence in shifting beliefs progressively, as opposed to proof/disproof.

While this does seem like a useful tool, it still leaves us with the problems of null hypothesis testing. One problem is that the null hypothesis is sometimes not very plausible. Arguing from a point of total randomness is an odd thing to do. What would we expect to see if the world was a chaotic place with no patterns? Hm, reality doesn't match that? Ok, well, our hypothesis is better than maximum entropy. Good!

Scott Alexander makes this error in a post which he explicitly predicted he'd regret writing. (Epistemic Warning: This is, perhaps, among the smaller problems with the post. A larger problem is that it makes readers think in simplistic tribes. Another possible problem is that it risks the same error it calls out. There's a reason he said he'd regret it.) He's discussing how strongly our friends and acquaintances are filtered in terms of beliefs:

And I don’t have a single one of those people in my social circle. It’s not because I’m deliberately avoiding them; I’m pretty live-and-let-live politically, I wouldn’t ostracize someone just for some weird beliefs. And yet, even though I probably know about a hundred fifty people, I am pretty confident that not one of them is creationist. Odds of this happening by chance? 1/2^150 = 1/10^45 = approximately the chance of picking a particular atom if you are randomly selecting among all the atoms on Earth.
He goes on to use this number a couple more times as an indication of the strength of the filtering:

I inhabit the same geographical area as scores and scores of conservatives. But without meaning to, I have created an outrageously strong bubble, a 10^45 bubble. Conservatives are all around me, yet I am about as likely to have a serious encounter with one as I am a Tibetan lama.
And:
A disproportionate number of my friends are Jewish, because I meet them at psychiatry conferences or something – we self-segregate not based on explicit religion but on implicit tribal characteristics. So in the same way, political tribes self-segregate to an impressive extent – a 1/10^45 extent, I will never tire of hammering in – based on their implicit tribal characteristics. 
The problem is that the hypothesis he's comparing against is a world-of-chaos-and-fire. The number makes the strength of the filter sound incredible, almost physically implausible. But that's just what you get when you use a bad model! Note that this "strength" would keep getting more extreme as we examined more data (just as a p-value gets more extreme with more data, unless the null hypothesis is actually true).
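To make this concrete, here's a minimal sketch. The 95% figure is my illustrative assumption, not anything from Scott's post: the point is just that even an ordinary, non-astronomical filter makes 150 non-conservative friends unsurprising, while the pure-chance null makes any real-world data look astronomically unlikely.

```python
# Compare the probability of observing 150 non-conservative friends under
# the "chaos world" null (each friend is a fair coin flip) versus under a
# modest but realistic filter (assumed: 95% of encounters are filtered).

n = 150  # number of friends, as in the quoted post

p_null = 0.5 ** n    # chaos-world null: each friend conservative with p = 0.5
p_filter = 0.95 ** n # modest filter: each friend non-conservative with p = 0.95

print(f"chaos-world null: {p_null:.1e}")    # ~7.0e-46, the 1/10^45 figure
print(f"modest filter:    {p_filter:.1e}")  # ~4.5e-4, small but not astronomical
```

The astronomical number tells you the null is bad, not that the filter is astronomically strong.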

It's not as though there is a baseline world where everything is completely random, with an extra physical force on top which puts things into nonrandom configurations. (Except, perhaps, in the sense that everything is heading toward thermodynamic equilibrium.) We do not form associations with people randomly. It would be much more meaningful to compare possibly-realistic models and the levels of friend filtering they imply.

I'm not trying to call out Slate Star Codex here. That particular post happened to be an epistemic landmine, yes, but this mistake is easy to make and fairly common. What's interesting to me is the difference between which arguments feel meaningful and which actually are.

6 comments:

  1. > Note that the "strength" would keep getting more extreme as we examine more data (just as a p-value gets extreme with more data, unless the null hypothesis is actually true).

Well, yeah. Wouldn't we expect our belief that the parameter is 0 (or anything close to 0) to decrease as Scott keeps getting new non-conservative friends? The likelihood of that draw of friends is very small for a ~0 strength of filter.

    I guess I'm not seeing what the big problem is here.

    ReplyDelete
  2. What I'm saying is that the strength of a filter isn't measured very well by the plausibility of a model where association is random.

A better notion of "strength" would measure the power of the filter. It could be something like the ratio of the political-party frequency in the general population to the political-party frequency in the population one associates with.

    ReplyDelete
  3. > It could be something like a ratio of the political-party frequency in the population to the political-party frequency in population associated with.

Isn't that basically isomorphic? After all, his number is calculated similarly, isn't it? (something like 0.5^friends) What's the difference between calculating a ratio based on the population fraction and a likelihood based on the population fraction?

    ReplyDelete
  4. I don't see how.

    The mistake is very similar to mixing up statistical significance and effect size.

    The probability of the observed associations under the assumption of random association has little to do with the strength of the filter.

    The filter effect he's describing does seem surprisingly strong, but the surprise is not qualitatively *astronomical*, and neither is the filter strength. Such a number is basically coming out of a meaningless computation (meaningless with respect to the flow of the argument).

    ReplyDelete
  5. Let me try that again: imagine it as a binomial problem with a uniform prior; so as Scott gains friends, and each friend is liberal and not conservative, the unlikeliness of the bias parameter being 50% (equal odds of liberal or conservative) increases, and the estimated bias increases. How is this wrong? Are you arguing that no matter how many friends Scott obtains, all liberal, our beliefs about the bias should not change? How is noting the extremely low confidence put on 50% after updating ~150 times wrong?

    ReplyDelete
  6. This sounds perfectly fine. This approach is much better in that it measures strength in a realistic and intuitively plausible way.

    ReplyDelete