## The Kelly Criterion

I recently watched Breaking Bad and there is a scene (minor spoiler) where Walter White is trying to learn enough blackjack to convince people that he had found a system of counting cards, as an excuse for all the money coming in from the meth business. At one point he talks through his logic for bidding a particular way on his hand using a whole lot of technobabble, including the phrase “according to the Kelly Criterion.” I'd never heard of it before, so after a bit of googling and some algebraic manipulation, I present it as simply as I can.

There was an experiment done where 61 participants were given $25 each and they were asked to place bets on a coin that they knew was weighted so that it showed heads with a probability of 0.6. There was a cap on the winnings of $250 and they only had half an hour to play, which meant that they had around 300 flips at most. The question is, how would you bid to try to maximise your winnings?

As always with these things, the humans did badly even with the odds on their side. 17 of the participants managed to go bust and 18 of the 61 bet their entire initial stake on one flip. Even worse, 40 of the participants actually bid on tails at least once. Awful.

There were some successes though: the average take-home pot was $91, and 13 people took home the maximum $250. But what is optimal? Let's look at some of the extremes. Betting most of your pot could lose you everything on one unlucky tails, so we should try to mitigate the variance by spreading our bets over multiple flips. On the other hand, bidding 1 cent at a time would be very safe, but we could easily run out of time before we've made much money, and even going up to a few hundred cents a flip still wouldn't use enough of the pot to maximise our winnings. Clearly the actual answer lies somewhere in between.
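The trade-off between the two extremes can be sketched with a small simulation. The rules (the $25 stake, $250 cap, 300 flips and the 0.6 coin) come from the experiment above; the three bet-sizing rules and the `simulate` helper are my own illustration:

```python
import random

def simulate(bet_rule, flips=300, pot=25.0, cap=250.0, p=0.6, seed=1):
    """Play the weighted-coin game with a given bet-sizing rule."""
    rng = random.Random(seed)
    for _ in range(flips):
        if pot <= 0 or pot >= cap:
            break  # bust, or hit the winnings cap
        bet = min(bet_rule(pot), pot)
        pot += bet if rng.random() < p else -bet
    return min(pot, cap)

all_in = simulate(lambda pot: pot)          # stake everything each flip
tiny = simulate(lambda pot: 0.01)           # bet 1 cent each flip
fraction = simulate(lambda pot: 0.2 * pot)  # bet 20% of the pot each flip
```

The all-in bettor either busts or hits the cap within a few flips, the 1-cent bettor ends within a few dollars of where they started, and only the fractional bettor can grow steadily without risking everything on a single tails.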

The other thing to realise is that our answer should scale. As our pot grows, it gives us a bigger buffer against accidentally going broke. The natural solution is to always bet some fraction of our pot, so that when we win we bet higher and when we lose we bet lower, but always with the same balance between safety and high growth.

The answer turns out to be that you should bid (2p-1)w, where p is the probability that you win and w is your wealth at the time of the bet. You can think of 2p-1 as the fraction of our wealth we are bidding. Trying this with the numbers from the game above we get 2p-1 = 2*0.6-1 = 0.2, which means we want to bid 20% of the pot at each flip.
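One way to check that 20% is the sweet spot is to compare the expected log-growth per flip for different bet fractions. The log-growth framing is a standard way of phrasing Kelly's result rather than something from the experiment itself, but it gives a quick numerical sanity check:

```python
import math

def log_growth(f, p=0.6):
    """Expected log-growth per flip when betting a fraction f of the pot."""
    return p * math.log(1 + f) + (1 - p) * math.log(1 - f)

kelly = 2 * 0.6 - 1  # 0.2
# the Kelly fraction should beat nearby under- and over-bets
for f in (0.05, 0.1, 0.3, 0.5):
    assert log_growth(kelly) > log_growth(f)
```

Notably, over-betting at 50% has negative expected log-growth: in the long run it loses money even with the coin in your favour.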

Let's try to prove that this answer is optimal. If you bid (2p-1)w once and win, you will have an extra 2(2p-1)w (your stake back plus the same again) on top of the rest of your pot, which is (1-(2p-1))w = (2-2p)w. Putting that together we have 2(2p-1)w + (2-2p)w = 2pw. If you lose, you have just the remainder of your pot: (2-2p)w = 2(1-p)w. Pleasingly, if you increase the number of times you bid in a row, it doesn't matter in which order the wins and losses come. If you bet n times and win k of them, then you will end up with 2^{n}p^{k}(1-p)^{n-k}w. Each win gives you an extra power of p and each loss gives you an extra power of 1-p, with both giving you a factor of 2.
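That order-independence is easy to verify numerically. A short sketch, using the 0.6 coin from the game above:

```python
from itertools import permutations

def final_wealth(outcomes, p=0.6, w=1.0):
    """Apply win/lose multipliers for a sequence of Kelly bets (fraction 2p-1)."""
    f = 2 * p - 1
    for won in outcomes:
        w *= (1 + f) if won else (1 - f)
    return w

# every ordering of 2 wins and 1 loss over 3 flips gives the same result,
# matching the closed form 2^n * p^k * (1-p)^(n-k) * w
n, k = 3, 2
closed_form = 2**n * 0.6**k * 0.4**(n - k)
for order in set(permutations([True, True, False])):
    assert abs(final_wealth(order) - closed_form) < 1e-12
```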

Now for the calculus to prove this is optimal. Instead of bidding 2p-1 times your wealth, let's perturb it and bid 2p-1+h times your wealth. The idea is to show that the growth rate is fastest when h=0. The previous formula doesn't change very much: we still get a bracket for each win and a bracket for each loss, but with added h's. Your wealth after n plays and k wins is (2p+h)^{k}[2(1-p)-h]^{n-k}w.

We are trying to find the maximum of something as we change h, so let's take the derivative with respect to h and set it to zero. After a bit of chain rule and product rule manipulation we get k(2p+h)^{k-1}[2(1-p)-h]^{n-k}w - (n-k)(2p+h)^{k}[2(1-p)-h]^{n-k-1}w = 0. Dividing through by the common factors leaves the rather nicer equation k[2(1-p)-h] = (n-k)(2p+h), which we can solve to get h = 2(k/n - p). But think about what those quantities actually mean. k/n is the long-run fraction of wins over the number of plays, and by the law of large numbers its limit as n tends to infinity is the probability of winning, p. Therefore, as n tends to infinity, h tends to 0. Thus 2p-1 was the optimum fraction of w to bet.
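We can sanity-check the h=0 conclusion numerically. In the limit where k/n tends to p, the per-flip growth factor becomes (2p+h)^{p}[2(1-p)-h]^{1-p} (this substituted form is my rewriting of the wealth formula); scanning over h should peak at zero:

```python
def growth_factor(h, p=0.6):
    """Per-flip growth in the long run, after substituting k/n = p."""
    return (2 * p + h) ** p * (2 * (1 - p) - h) ** (1 - p)

# scan perturbations h in [-0.19, 0.79], keeping both brackets positive
best = max((i / 100 for i in range(-19, 80)), key=growth_factor)
assert best == 0  # the growth rate peaks at h = 0
```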

All of this can be generalised further, although the proof, while along the same lines as ours, gets a lot messier. We were making a bet at odds of 1 to 1, which means that if you bid £1 and win then you receive £1 plus your £1 stake back. A bid at odds of b to 1 means that you would win £b plus your £1 stake back.

The general amount you should bid, for a wealth of £w, a probability p of winning and odds of b to 1, is (p(b+1)-1)w/b. Notice that if b < (1-p)/p then the recommended amount to bid is negative, i.e. you should take the other end of the bet (you should be the house), because the odds are against you. This formula is often used in assessing options on the stock market, although variations which decrease the chance of going bankrupt (at the cost of some growth rate) are more popular at the moment.
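The general formula is one line of code. A small sketch (the second set of odds is made up for illustration):

```python
def kelly_fraction(p, b):
    """Fraction of wealth to bet with win probability p at odds of b to 1."""
    return (p * (b + 1) - 1) / b

# the even-odds coin game from above: b = 1, p = 0.6 -> bet 20% of the pot
assert abs(kelly_fraction(0.6, 1) - 0.2) < 1e-12

# when b < (1-p)/p the fraction goes negative: take the other side of the bet
assert kelly_fraction(0.5, 0.8) < 0
```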