Monday, February 11, 2013

Median Expectation & Robust Utility

I've been talking recently about robust statistics, and the consequences of replacing means with medians. However, I've only looked at this in a fairly limited way, asking about one particular distribution (the bell curve). Mean values are everywhere in statistics; perhaps to a greater degree than you realize, because we often refer to the mean value as the "expected value". It's a simple alias for the same thing, but that may be easy to forget when we are taking expectations everywhere.

In some sense, the "expectation" seems to be a more basic concept than the "mean". We could think of the mean as simply one way of formalizing the intuitive notion of expected value. What happens if we choose a different formalization? What if we choose the median?

The post on altering the bell curve is (more or less) an exploration of what happens to some of classical statistics if we do this. What happens to Bayesian theory?

The foundations of Bayesian statistics are really not touched at all by this. A Bayesian does not rely as heavily on "statistics" in the way a frequentist statistician does. A statistic is a number derived from a dataset which gives some sort of partial summary. We can look at mean, variance, and higher moments; correlations; and so on. We distinguish between the sample statistic (the number derived from the data at hand) and the population statistic (the "true" statistic which we could compute if we had all the examples, ever, of the phenomenon we are looking at). We want to estimate the population statistics, so we talk about estimators; these are numbers derived from the data which are supposed to be similar to the true values. Unbiased estimators are an important concept: ways of estimating population statistics whose expected values are exactly the population statistics.

These concepts are not exactly discarded by Bayesians, since they may be useful approximations. However, to a Bayesian, a distribution is a more central object. A statistic may be a misleading partial summary. The mean (/mode/median) is sort of meaningless when a distribution is multimodal. Correlation does not imply... much of anything (because it assumes a linear model!). Bayesian statistics still has distribution parameters, which are directly related to population statistics, but frequentist "estimators" are not fundamental because they only provide point estimates. Fundamentally, it makes more sense to keep a distribution over the possibilities, assigning some probability to each option.

However, there is one area of Bayesian thought where expected value makes a great deal of difference: Bayesian utility theory. The basic law of utility theory is that we choose actions so as to maximize expected value. Changing the definition of "expected" would change everything! The current idea is that in order to judge between different actions (or plans, policies, designs, et cetera) we look at the average utility achieved with each option, according to our probability distribution over the possible results. What if we computed the median utility rather than the average? Let's call this "robust utility theory".

From the usual perspective, robust utility would perform worse: to the extent that we take different actions, we would get a lower average utility. This begs the question of whether we care about average utility or median utility, though. If we are happy to maximize median utility, then we can similarly say that the average-utility maximizers are performing poorly by our standards.

At first, it might not be obvious that the median is well-defined for this purpose. The median value coming from a probability distribution is defined to be the median in the limit of infinite independent samples from that distribution, though. Each case will contribute instances in proportion to its probability. What we end up doing is lining up all the possible consequences of our choice in order of utility, with a "width" determined by the probability of each, and taking the utility value of whatever consequence ends up in the middle. So long as we are willing to break ties somehow (as is usually needed with the median), it is actually well-defined more often than the mean! We avoid problems with infinite expected value. (Suppose I charge you to play a game where I start with a $1 pot, and start flipping a coin. I triple the pot every time I get heads. Tails ends the game, and I give you the pot. Money is all you care about. How much should you be willing to pay to play?)
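
The "line up the consequences" construction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `median_utility`, `mean_utility` and `coin_game` are made up for this sketch, a gamble is represented as a list of (utility, probability) pairs with positive probabilities summing to 1, and the coin game is capped at a fixed number of heads (a cap the game as described does not have) so that the list is finite.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def median_utility(gamble, tie="lower"):
    """Line up outcomes by utility, each with width equal to its
    probability, and return the utility found at the midpoint.
    `tie` picks a side when the midpoint falls exactly on a boundary."""
    ordered = sorted(gamble)  # worst outcome first
    cumulative = Fraction(0)
    for i, (utility, prob) in enumerate(ordered):
        cumulative += prob
        if cumulative > HALF:
            return utility
        if cumulative == HALF:
            # The midpoint sits exactly between this outcome and the next.
            return utility if tie == "lower" else ordered[i + 1][0]
    raise ValueError("probabilities must sum to 1")

def mean_utility(gamble):
    """Ordinary expected value: probability-weighted average."""
    return sum(utility * prob for utility, prob in gamble)

def coin_game(max_heads):
    """Pot starts at 1 and triples on each head; tails pays out the pot.
    Assumption for this sketch: play is forced to stop after `max_heads`
    heads, so the leftover probability goes to the largest pot."""
    outcomes = [(3 ** k, Fraction(1, 2 ** (k + 1))) for k in range(max_heads)]
    outcomes.append((3 ** max_heads, Fraction(1, 2 ** max_heads)))
    return outcomes

for cap in (10, 20, 40):
    game = coin_game(cap)
    print(cap,
          median_utility(game, "lower"),
          median_utility(game, "upper"),
          float(mean_utility(game)))
```

In this sketch the mean keeps growing as the cap is raised, while the median does not move. The midpoint lands exactly on the boundary between the $1 and $3 payouts (tails on the first flip has probability exactly one half), so the tie-breaking rule decides between those two values.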

Since the median is more robust than the mean, we also avoid problems dealing with small-probability but disproportionately high-utility events. The typical example is Pascal's Mugging. Pascal walks up to you and says that if you don't give him your wallet, God will torture you forever in hell. Before you object, he says: "Wait, wait. I know what you are thinking. My story doesn't sound very plausible. But I've just invented probability theory, and let me tell you something! You have to evaluate the expected value of an action by considering the average payoff. You multiply the probability of each case by its utility. If I'm right, then you could have an infinitely large negative payoff by ignoring me. That means that no matter how small the probability of my story, so long as it is above zero, you should give me your wallet just in case!"

A Robust Utility Theorist avoids this conclusion, because small-probability events have a correspondingly small effect on the end result, no matter how high a utility we assign.
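
A toy version of the mugging makes the contrast concrete. The numbers here are invented for illustration: a large finite penalty stands in for "infinitely large negative payoff", the probability of Pascal's story is set to one in a billion, and losing the wallet is scored as -100. The helper names are likewise hypothetical.

```python
from fractions import Fraction

def mean_utility(gamble):
    """Probability-weighted average of the utilities."""
    return sum(utility * prob for utility, prob in gamble)

def median_utility(gamble):
    """Lowest utility at which cumulative probability reaches one half."""
    cumulative = Fraction(0)
    for utility, prob in sorted(gamble):
        cumulative += prob
        if cumulative >= Fraction(1, 2):
            return utility

# Assumed numbers, not taken from the story itself.
p_story = Fraction(1, 10 ** 9)   # chance Pascal is telling the truth
hell = -10 ** 100                # finite stand-in for an unbounded loss
wallet = -100                    # cost of handing over the wallet

options = {
    "keep wallet": [(hell, p_story), (0, 1 - p_story)],
    "hand it over": [(wallet, Fraction(1))],
}

mean_choice = max(options, key=lambda name: mean_utility(options[name]))
median_choice = max(options, key=lambda name: median_utility(options[name]))
print("mean maximizer:", mean_choice)
print("median maximizer:", median_choice)
```

With these numbers the mean maximizer hands over the wallet and the median maximizer keeps it. Making the penalty larger only strengthens the mean maximizer's conclusion; the median does not change unless the story's probability reaches one half.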

Now, a lot of nice results (such as the representation theorem) have been derived for average utilities over the years. Naturally, taking a median utility might do all kinds of violence to these basic ideas in utility theory. I'm not sure how it would all play out. It's interesting to think about, though.


  1. VNM-utility is what happens when we assume an agent's preferences are consistent in certain ways and then ask "Are those preferences encodable as maximizing the expectation of something? Does the operation Prefer(distribution_1 over outcomes, distribution_2 over outcomes) factor in a nice way like that?"

    It seems backwards to say "hey how about we take that encoding and apply a different inverse and see what comes out?". Not a bad question, interesting, but a bit backwards... I say discard utility and start from scratch.

    If I really wanted to figure out what I might mean by "maximize median utility", I'd say something like "can I encode an interestingly large set of preference functions using a median somehow?"

    What median? The median of the probability distribution over outcomes? That's just Maximum Likelihood. The median of the product of some function of an outcome and the probability of that outcome? There's a question, I guess... Suppose we try to encode an agent's preferences as a decision algorithm that reads "make the i-th decision, choosing i by argmax{median(probability(outcome_j|decision_i)*f(outcome_j|decision_i) over all outcome_j)}".

    My intuition is that this doesn't work... first, I'll have to make outcomes really granular or else decisions that shift probabilities slightly will have no effect on the median at all. Second, it feels like since the median operation isn't linear in any good way, I won't be able to decompose the preferences and tease out the function f...

  2. You're right, it is backwards in one view of things. I don't think that's the only way of seeing it, though: in "Thinking about Acting", Pollock argues that the VNM view is backwards.

    I don't understand the "what median?" paragraph. The computation I am referring to is: median utility := the highest utility which we have at least a 50% chance of equaling/exceeding. (I.e., the median is the utility such that we have an equal chance of going above it or below it, with some way of breaking ties when this is not sufficiently well-defined.)

    The question that VNM utility theory answers is: under what conditions can a set of gambling preferences be interpreted as mean-utility maximization (fixed utilities on deterministic outcomes & gambles' values interpolated from these via the probabilities)?

    In the VNM formulation, I'm breaking continuity and independence. Now that I understand there are alternatives, I don't find these axioms to be particularly compelling. So, it seems as if median utility encodes a possible set of preferences on gambles.

    I doubt median utility captures as broad/useful a set of preferences as mean utility. It captures a different set, though.

  3. We could think of mean-max and median-max as two different systems of moving from preferences over deterministic outcomes to preferences over gambles. Some immediate facts about median maximization:

    1) Only the ordering of preferences matters; we don't need to assign numeric utilities to outcomes anymore. (Since I already mentioned Pollock, I should mention that he argued against this; he thought numeric utilities were necessary for efficiency. Indeed, finding the median is more computationally difficult.)

    2) If we are choosing between two gambles, we use the preference ordering of the median outcomes. 50% is always a tip-over point: if some outcome is more than 50% probable, the gamble will simply take on that value. This is not true about mixing two gambles together: there is nothing nice we can say based on the median value of the two gambles and the knowledge that one has probability over 50%.
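
Both facts in the last comment can be checked with a small sketch. Everything here is hypothetical scaffolding: outcomes are plain labels with a preference ranking and no numeric utilities, a gamble is a dict from label to probability, and the specific probabilities are chosen only to produce a counterexample.

```python
from fractions import Fraction as F

def median_outcome(gamble, ranking):
    """Walk the outcomes from worst to best and return the first one at
    which cumulative probability reaches one half. Only the ordering in
    `ranking` is used; no numeric utilities are needed."""
    cumulative = F(0)
    for outcome in ranking:
        cumulative += gamble.get(outcome, F(0))
        if cumulative >= F(1, 2):
            return outcome

def mix(weight_a, gamble_a, gamble_b):
    """Play gamble_a with probability weight_a, otherwise gamble_b."""
    outcomes = set(gamble_a) | set(gamble_b)
    return {o: weight_a * gamble_a.get(o, F(0))
               + (1 - weight_a) * gamble_b.get(o, F(0))
            for o in outcomes}

ranking = ["bad", "ok", "good"]  # worst to best

# Tip-over: an outcome with more than 50% probability is the median.
mostly_ok = {"bad": F(1, 5), "ok": F(3, 5), "good": F(1, 5)}
print(median_outcome(mostly_ok, ranking))

# Mixing: gamble `a` has median "ok" and gets 60% of the weight,
# yet the median of the mixture follows the other gamble.
a = {"bad": F(45, 100), "ok": F(10, 100), "good": F(45, 100)}
surely_good = {"good": F(1)}
surely_bad = {"bad": F(1)}
print(median_outcome(a, ranking))
print(median_outcome(mix(F(3, 5), a, surely_good), ranking))
print(median_outcome(mix(F(3, 5), a, surely_bad), ranking))
```

In this example the 60%-weighted gamble has median "ok", but the mixtures come out "good" and "bad" respectively, so knowing the two medians and the mixing weight is not enough to determine the median of the mixture.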