Maximum Likelihood vs. Expected Value
April 5, 2011

Suppose you pulled a slot machine 5 times, and won 1 time. What's your best estimate for the true underlying payout probability $p$?

This seemingly innocent question is actually quite involved. You might think the answer is $\frac{1}{5}$, but it's actually $\frac{2}{7}$.

(Note: we're making the usual assumption of a uniform prior for $p$, which just means "before we saw any data, we thought any value of $p$ was equally likely.")

Maximum Likelihood Estimate

The maximum likelihood estimate (MLE) of a parameter is the value that makes the observed data most likely. With the uniform prior, the posterior probability density function (PDF) of the parameter is proportional to the likelihood, so the MLE is simply the highest point on that curve (i.e. the mode).

For a binomial, the MLE of $p$ is indeed $\frac{k}{n}$ (where you saw $k$ "successes" out of $n$ attempts), which is $\frac{1}{5}$ in this case.
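(If you want to see where that comes from, here's a quick sketch of the standard derivation. Write down the binomial likelihood $$L(p) = \binom{n}{k}\, p^k (1-p)^{n-k}$$ and maximize it by setting $\frac{d}{dp} \log L(p) = \frac{k}{p} - \frac{n-k}{1-p} = 0$, which gives $p = \frac{k}{n}$.)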

Expected Value

However, the MLE is different from the expected value of $p$. If you're looking at the PDF of a parameter, the expected value is the mean of that curve, not the mode. For a binomial, the expected value turns out to be $\frac{k+1}{n+2}$. (This is also known as Laplace's Rule of Succession.)

For the data above ($k=1$, $n=5$) you get $\frac{2}{7}$. Yes, this seems strange, but it's a better estimate of $p$ than $\frac{1}{5}$.

Nitty Gritty Math Details

The posterior distribution for a binomial parameter $p$ takes the shape of a beta distribution, which is a function $Beta(x; \alpha, \beta)$. When the data is $k$ successes out of $n$ attempts, the parameters to the function are $\alpha=k+1$ and $\beta=(n-k)+1$.
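To sketch why: with the uniform prior, Bayes' rule says the posterior is proportional to the likelihood, $$P(p \mid \text{data}) \propto P(\text{data} \mid p) \propto p^k (1-p)^{n-k}$$ and since $Beta(x; \alpha, \beta) \propto x^{\alpha-1}(1-x)^{\beta-1}$, matching exponents gives $\alpha = k+1$ and $\beta = (n-k)+1$.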

The function $Beta(x; \alpha, \beta)$ has the following properties: $$mean = \frac{\alpha}{\alpha + \beta} = \frac{k+1}{n+2}$$ and $$mode = \frac{\alpha - 1}{\alpha + \beta - 2} = \frac{k}{n}$$

Lo and behold, when you plug in $k=1$ and $n=5$ you get $\frac{2}{7}$ for the mean, and $\frac{1}{5}$ for the mode.
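If you'd rather let a computer check those two numbers, here's a minimal sanity check using SciPy (this assumes you have scipy installed; it's not part of the simulation below):

from scipy.stats import beta

k, n = 1, 5
a, b = k + 1, (n - k) + 1          # posterior is Beta(2, 5) for 1 win out of 5

posterior = beta(a, b)
print("mean:", posterior.mean())        # 0.2857... = 2/7
print("mode:", (a - 1) / (a + b - 2))   # 0.2 = 1/5, via the mode formula above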

Head-to-head Matchup

But don't take my word for it. You can simulate it yourself with the Python code below:
import random

def flip_coin(p_heads):
    if random.random() < p_heads:
        return 'H'
    else:
        return 'T'

def compute_errors():
    # pick a true underlying 'p' uniformly, and generate data
    p_heads = random.random()
    NUM_COINS = 5
    coins = [flip_coin(p_heads) for _ in range(NUM_COINS)]

    # come up with estimates of 'p'
    mle_p = coins.count('H') / NUM_COINS
    laplace_p = (coins.count('H') + 1) / (NUM_COINS + 2)

    # return their errors
    return (abs(mle_p - p_heads), abs(laplace_p - p_heads))

win_count = {'mle': 0, 'laplace': 0}
mle_errors = []
laplace_errors = []
for _ in range(1000000):
    (mle_error, laplace_error) = compute_errors()
    if mle_error <= laplace_error:
        win_count["mle"] += 1
    else:
        win_count["laplace"] += 1

    mle_errors.append(mle_error)
    laplace_errors.append(laplace_error)

print("win_counts:", win_count)
print("average mle_error:", sum(mle_errors) / len(mle_errors))
print("average laplace_error:", sum(laplace_errors) / len(laplace_errors))

When I run this, I get

win_counts: {'laplace': 569967, 'mle': 430033}
average mle_error: 0.142
average laplace_error: 0.123

This means the formula $\frac{k+1}{n+2}$ was closer 57% of the time, and had a smaller average error than $\frac{k}{n}$.

WTF?!

If you're having a hard time coming to grips with this reality, let me try to explain some intuition behind why $\frac{k}{n}$ is a bad estimator.

The basic problem is that $\frac{k}{n}$ tends to "believe" extreme data too easily. If we see 0 wins out of 5 attempts, $\frac{k}{n}$ will conclude that the exact value of $p$ must be 0. Of course, this is extremely unlikely. It's more likely that the true value of $p$ is $> 0$, and that 0-out-of-5 was just an unlucky streak. (Similarly for when the data is 5-out-of-5 wins.)
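To make that concrete, here's a small variation on the simulation above (my own sketch, reusing the same setup): it keeps only the runs that happened to come up 0-for-5 and measures how far each estimate lands from the true $p$.

import random

# Condition on the extreme outcome "0 heads out of 5" and compare estimators.
NUM_COINS = 5
mle_errors = []
laplace_errors = []
while len(mle_errors) < 100000:
    p_heads = random.random()                  # true p, drawn uniformly
    heads = sum(random.random() < p_heads for _ in range(NUM_COINS))
    if heads != 0:
        continue                               # keep only the 0-for-5 runs
    mle_errors.append(abs(0.0 - p_heads))        # k/n estimates p = 0
    laplace_errors.append(abs(1.0/7 - p_heads))  # (k+1)/(n+2) estimates p = 1/7

print("average mle error (given 0-of-5):", sum(mle_errors) / len(mle_errors))
print("average laplace error (given 0-of-5):", sum(laplace_errors) / len(laplace_errors))

Conditioned on a 0-for-5 streak, guessing exactly 0 is almost always too low, while $\frac{1}{7}$ hedges toward the middle, so its average error ends up smaller.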
