Extended Kelly Criterion

Handling multiple bets and payoff uncertainties

November 24, 2025 · John Peach



Introduction

Niko Tosa and a couple of his friends walked into the Ritz Club in London one night, sat down at the roulette table, and broke the bank. By watching the relative positions of the ball and wheel, they accurately predicted the winning pocket, and they did it without any of the “cheating” methods used by the Eudaemons, who built computers into their shoes to track the ball and wheel. Tosa won by shifting the expected value of his bets in his favor. He was not only adept at predicting the outcome; he must also have studied his probability theory very carefully.

Security personnel at the Ritz reviewed the tapes of Tosa’s play, but never found evidence of any predictive software, so perhaps he was just an exceptional athlete who could sense where the ball was going. This article won’t show you how to gain an edge at roulette or any other casino game, but we’ll show you how to optimally manage your money.

Figure 1. Niko Tosa and his roulette strategy.

Besides predicting the outcome of each spin, Tosa likely kept careful track of how much he was betting each game. Bet too little, and he wouldn’t grow his stake at the optimum rate; bet too much, and he risked losing it all.

We’re not going to give you a method to pick stock market winners or encourage you to gamble, but we’ll show you how to invest wisely by managing your money. The mathematics in this article is dense, so if you’re more interested in experimenting with the outcome, go to the Github repository and download the Julia code. See the section Code for this article below for the link.

Quick Start Guide

For Mathematical Readers: Work through each section sequentially—the complexity builds deliberately.

For Experimenters: Jump to the “Experiments to Try” section, download the code, and start playing. Return to the theory when you want to understand why something works.

For Practitioners: Focus on “Discrete Case: Multiple Simultaneous Bets” and “Transaction Fees and Costs”—these have immediate practical applications.

A Review of the Kelly Criterion

In The Kelly Criterion, we showed how to optimize betting by wagering exactly the right amount of money based on the probability of winning, the expected returns, and the size of your stake at the time of the bet. Recall that if the probability of winning is $p$, the fraction of your initial capital $S_0$ that you bet is $0 \leq b \leq 1$, and you win $w$ units per unit bet, then your wealth after $n$ bets will be

$$S_n = S_0 (1 - b + bw)^{pn} (1 - b)^{(1-p)n}$$

where the first term represents wins and the second losses. Letting $R_n = \frac{S_n}{S_0}$ and normalizing to a single bet (taking the $n$-th root), the expected rate of return per bet is

$$R = (1 - b + bw)^p (1 - b)^{1-p}$$

and if we calculate the derivative with respect to $b$ and set $\frac{dR}{db} = 0$, we find that the optimal bet fraction is

$$b = \frac{pw - 1}{w - 1}.$$
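
As a concrete illustration, here is a minimal Julia sketch of this formula. The helper name kelly_fraction is illustrative, not part of the article's code; it assumes $w > 1$ and clamps a negative edge to zero (don't bet).

# Hypothetical helper (not from the article's repository): classic single-bet Kelly fraction.
# p is the probability of winning, w the payout per unit bet; assumes w > 1.
function kelly_fraction(p, w)
    b = (p * w - 1) / (w - 1)   # from setting dR/db = 0 above
    return clamp(b, 0.0, 1.0)   # never bet when the edge p*w - 1 is negative
end

kelly_fraction(0.3, 4.0)        # ≈ 0.0667, matching the example later in the article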

Suppose we want to make two bets at the same time, with expected win probabilities $p_1$ and $p_2$, and returns $w_1$ and $w_2$ on bet fractions $b_1$ and $b_2$. How should the bets be allocated to optimize the return?

Since $p_1$ and $p_2$ are probabilities, $0 \leq p_1, p_2 \leq 1$, and similarly $0 \leq b_1, b_2 \leq 1$ because they represent fractions of the total current cash on hand available to bet. A further constraint is that $b_1 + b_2 < 1$, since we can’t bet more than the total on hand.

A final constraint is that the expectation must be greater than one for each bet, so $p_1 w_1 > 1$ and $p_2 w_2 > 1$.

Let $B = 1 - b_1 - b_2$, which is the fraction of the capital on hand after making the two bets, and $R_1 = b_1 w_1$, $R_2 = b_2 w_2$, which are the amounts returned on each bet for winning outcomes. Next, let

$$\begin{aligned} S_{00}(b_1,b_2) &= (B)^{(1-p_1)(1-p_2)} \qquad \qquad \text{both lose} \\ S_{01}(b_1,b_2) &= (B+R_2)^{(1-p_1)p_2} \qquad \text{$b_1$ loses, $b_2$ wins}\\ S_{10}(b_1,b_2) &= (B+R_1)^{p_1(1-p_2)} \qquad \text{$b_1$ wins, $b_2$ loses} \\ S_{11}(b_1,b_2) &= (B+R_1+R_2)^{p_1 p_2} \qquad \text{both win} \end{aligned}$$

Letting $S(b_1,b_2) = S_{00}(b_1,b_2) \cdot S_{01}(b_1,b_2) \cdot S_{10}(b_1,b_2) \cdot S_{11}(b_1,b_2)$, we need to find $b_1$ such that

$$\frac{\partial S(b_1,b_2)}{\partial b_1} = 0.$$

Since

$$\frac{\partial S}{\partial b_1} = S \frac{\partial}{\partial b_1} \log(S)$$

the value of $b_1$ that optimizes $S$ reduces to a function of a sum of terms that is cubic in $b_1$, but the coefficients of the polynomial are too complicated for Mathematica to find a closed-form solution.

Figure 2. Coefficients of the third-degree polynomial in $b_1$.

Adding a third bet would generate a quartic polynomial, and beyond that, Abel showed that polynomials of degree five or higher have no general closed-form solutions (see The Sum of the Sum of Some Numbers). Still, the maximum value of $S$ exists, as seen in the surface plot:

Figure 3. Surface plot of $S$ for $(p_1, w_1) = (0.4, 3)$, $(p_2, w_2) = (0.5, 5)$.
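
Since no closed form exists, a coarse grid search over $\log S$ already locates the maximum shown in the surface plot. The sketch below uses only base Julia and is not taken from the article's repository; the parameter values are the ones quoted in the figure caption.

# Minimal sketch: evaluate log S(b1, b2) on a grid and locate its maximum numerically.
function logS(b1, b2, p1, w1, p2, w2)
    B  = 1 - b1 - b2                 # cash held back
    R1 = b1 * w1                     # returned if bet 1 wins
    R2 = b2 * w2                     # returned if bet 2 wins
    (1 - p1) * (1 - p2) * log(B) + (1 - p1) * p2 * log(B + R2) +
        p1 * (1 - p2) * log(B + R1) + p1 * p2 * log(B + R1 + R2)
end

function grid_search(p1, w1, p2, w2; grid = 0.0:0.001:0.5)
    best, arg = -Inf, (0.0, 0.0)
    for b1 in grid, b2 in grid
        b1 + b2 < 1 || continue      # can't bet more than you have
        v = logS(b1, b2, p1, w1, p2, w2)
        v > best && ((best, arg) = (v, (b1, b2)))
    end
    return arg, best
end

grid_search(0.4, 3.0, 0.5, 5.0)      # parameters used in the surface plot above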

We also need to consider cases where the payoff $w$ follows a continuous probability distribution. This would be the case if you invest in the stock market: the price of a stock could, in principle, rise indefinitely, or fall to zero if the company goes bankrupt. Conversely, if you short a stock, there is no effective ceiling on the losses you could incur.

For continuous probability distributions, the normal distribution is often used, but others are possible and may better represent the circumstances. We also need to consider other costs such as brokerage or exchange fees, bid-ask spreads, taxes on short-term gains, slippage, and even travel costs to casinos. These extra fees reduce the optimal betting amount and require a more conservative strategy. Many have advocated for a fractional Kelly approach where the amount bet is reduced to a percentage of the calculated optimum.

In this article, we’ll extend the Kelly Criterion in four steps:

  1. Discrete case: multiple simultaneous bets
  2. Discrete case: uncertain probabilities
  3. Continuous case: uncertain returns
  4. Continuous case: uncertain probabilities and returns

Along the way, we’ll also account for transaction fees and costs.

Discrete Case: Multiple Simultaneous Bets

Let’s first consider the case where the payoffs for two bets are known, and we can estimate the probabilities of winning each independently. Even though there may not be closed-form solutions for this problem, we can still find a very close numerical approximation to the optimal values of the bet fractions $b_1$ and $b_2$. In the surface plot of $S(b_1,b_2)$, there is a maximum, and excellent numerical methods have been developed to find it. In fact, in many cases, the amount bet needs to be rounded to the nearest integer multiple of some minimum allowable amount. For example, you couldn’t bet a fraction of a chip at a roulette table, or buy fractions of a stock.

Let’s generalize the equations to allow for an arbitrary number $n$ of simultaneous bets. Then

$$B = 1 - \sum_{k=1}^n b_k$$

is the fraction of the capital remaining after the bets, and $R_k = b_k w_k$ is the return amount if bet $k$ pays. Each bet is binary: it either wins with probability $p_k$ or loses with probability $1-p_k$, so there are $2^n$ possible combinations of win/loss outcomes for the $n$ bets. Let $p$ be the vector of win probabilities, $w$ the vector of payoffs per unit bet, and $b$ the vector of bet fractions. Now, if we define $R = [b_1 w_1, b_2 w_2, \ldots, b_n w_n] = b \odot w$ and the index vector $I$ to be the binary representation of an integer between $0$ and $2^n - 1$, then

$$S_I = (B + R I^T)^{\Pi\left(p \odot I + (1-p) \odot \neg I\right)}$$

where $\odot$ is the Hadamard, or element-wise, product, which multiplies corresponding elements of two vectors: for example, $[1,2,3] \odot [4,5,6] = [4,10,18]$. Note that this is different from matrix multiplication or dot products. The vector $\neg I$ denotes the element-wise complement of $I$, replacing each $1$ with $0$ and vice versa.

Suppose $n = 3$, so the index vector $I$ takes values from $0$ to $2^3 - 1 = 7$, expressed in base $2$ as $I = \{000, 001, 010, 011, 100, 101, 110, 111\}$. Using this representation, we can construct each of the terms of $S$. For example, suppose we want to generate $S_{I_5} = S_{101}$. In this case

$$R I_5^T = \begin{bmatrix} b_1 w_1 & b_2 w_2 & b_3 w_3 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} = b_1 w_1 + b_3 w_3.$$

The exponent becomes

$$\begin{aligned} & \Pi \left( \begin{bmatrix} p_1 & p_2 & p_3 \end{bmatrix} \odot \begin{bmatrix} 1 & 0 & 1 \end{bmatrix} + \begin{bmatrix} (1-p_1) & (1-p_2) & (1-p_3) \end{bmatrix} \odot \begin{bmatrix} 0 & 1 & 0 \end{bmatrix} \right) \\ &= \Pi \left( \begin{bmatrix} p_1 & 0 & p_3 \end{bmatrix} + \begin{bmatrix} 0 & (1-p_2) & 0 \end{bmatrix} \right) \\ &= \Pi \begin{bmatrix} p_1 & (1-p_2) & p_3 \end{bmatrix} \\ &= p_1(1-p_2)p_3. \end{aligned}$$

Thus,

$$S_{I_5} = (B + b_1 w_1 + b_3 w_3)^{p_1(1-p_2)p_3}$$

representing the case when the first and third bets paid, but the second lost. By constructing the function $S$ this way, we can store copies for various values of $n$, and then optimize using the particular estimates of $p$ and $w$ at the moment we want to place a bet.
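
The sketch below shows the idea in a few lines of base Julia (it is not the repository's kelly_disc.jl): sum the log-wealth of each of the $2^n$ outcomes weighted by its probability.

# Minimal sketch: expected log-wealth for n simultaneous binary bets, summing over all
# 2^n win/lose outcomes exactly as described above.
function expected_log_wealth(b, p, w)
    n, B = length(b), 1 - sum(b)                 # B is the cash fraction held back
    total = 0.0
    for idx in 0:(2^n - 1)
        I = digits(idx, base = 2, pad = n)       # binary index vector for this outcome
        wealth = B + sum(b .* w .* I)            # B + R Iᵀ with R = b ⊙ w
        prob   = prod(p .* I .+ (1 .- p) .* (1 .- I))
        total += prob * log(wealth)
    end
    return total
end

expected_log_wealth([0.0645, 0.0992], [0.3, 0.25], [4.0, 6.0])   # ≈ 0.0285, cf. the output below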

The Julia module kelly_disc.jl contains functions for optimizing the discrete case. Suppose you have two opportunities where $p_1 = 0.3$, $w_1 = 4$, and $p_2 = 0.25$, $w_2 = 6$. If they were played independently, the optimal bet fractions would be

$$\begin{aligned} b_1 &= \frac{p_1 w_1 - 1}{w_1 - 1} = \frac{0.3 \cdot 4 - 1}{4 - 1} \approx 0.0667 \\ b_2 &= \frac{p_2 w_2 - 1}{w_2 - 1} = \frac{0.25 \cdot 6 - 1}{6 - 1} = 0.1. \end{aligned}$$

When playing both bets simultaneously, the betting fractions change slightly:


============================================================
DISCRETE KELLY OPTIMIZATION RESULTS
============================================================

Problem Setup:
  Number of bets: 2
  Transaction fee: 0.00%

Bet Parameters:
  Bet 1: p=0.3000, w=4.0000, E[p·w]=1.2000
  Bet 2: p=0.2500, w=6.0000, E[p·w]=1.5000

Optimal Bet Fractions:
  b_1 = 0.064500 (6.45%)
  b_2 = 0.099154 (9.92%)

Total bet: 0.163655 (16.37%)
Cash held: 0.836345 (83.63%)

Max E[log(wealth)]: 0.028532
Converged: true  Iterations: 3  Method: lbfgs
============================================================

The long-term growth rate is $2.85\%$ per bet, so if you play $100$ similar bets, your initial stake will grow by a factor of about $(1 + 0.0285)^{100} \approx 16.6$.

The third input parameter to DiscreteKelly is the transaction cost, which we set to zero in the example above, but should be considered in an actual betting situation. Suppose you’re playing roulette (see Roulette Physics) and you can reliably predict where the ball will fall within an octant. It might become obvious that you’re playing several numbers that are adjacent on the wheel, so to cover this, you could throw a few chips randomly onto other numbers, which would be a transaction cost. You might also include travel expenses or taxes as a transaction cost, but this would be amortized over your stay at the casino.

The DiscreteKelly function should be restricted to a handful of simultaneous bets because the number of terms in the objective doubles with each new bet. If you have $n$ discrete bets, the number of terms will be $2^n$, since every combination of win/lose outcomes must be included. For example, with $10$ bets the equation has $2^{10} = 1024$ terms.

Discrete Case: Uncertain Probabilities

Unlike the previous case, where we assumed a fixed probability of winning and a known payoff, Niko Tosa didn’t know the probability exactly; he knew only a distribution. The game is still binomial: you win with probability $p$ and lose with probability $1-p$, but your knowledge of $p$ itself is uncertain. If you use a fixed value of $p$, you are assuming more information about the outcome than you actually have.

If you assume that the distribution is a beta distribution (a flexible probability distribution defined on the interval $[0,1]$, controlled by two shape parameters $\alpha$ and $\beta$; when both equal 1 it reduces to the uniform distribution), then you can begin to estimate the parameters by collecting your win/loss results. The beta distribution is defined on the interval $[0,1]$ by

$$\frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)}$$

where

$$B(\alpha,\beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)}$$

and Γ\Gamma is the Gamma function.

Figure 4. Beta distribution PDF.

Start with an initial guess for the probability, $p \sim \text{Beta}(\alpha_0, \beta_0)$, called the prior. With no knowledge of how well you will be able to predict the outcome, set $\alpha_0 = \frac{1}{38}$, $\beta_0 = \frac{37}{38}$, which gives equal likelihood to every pocket on an American wheel (use 37 pockets for a European wheel). Now begin collecting data, counting the number of wins $k$ and the total number of attempts $n$, which will improve your estimate of the true distribution

$$p \,|\, \text{data} \sim \text{Beta}(\alpha_0 + k, \beta_0 + n - k)$$

where $p \,|\, \text{data}$ means “the probability given the collected data”. The empirically derived probability $p_{\text{eff}}$ is the mean of the distribution,

$$p_{\text{eff}} = \mathbb{E}[p \,|\, \text{data}] = \frac{\alpha_0 + k}{\alpha_0 + \beta_0 + n}.$$
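
A minimal sketch of this update using Distributions.jl; the win/attempt counts are hypothetical, not measured data:

using Distributions

α0, β0 = 1/38, 37/38               # prior matching a uniform guess over 38 pockets
k, n   = 9, 200                    # hypothetical tally: 9 correct predictions in 200 spins
posterior = Beta(α0 + k, β0 + n - k)
p_eff = mean(posterior)            # (α0 + k)/(α0 + β0 + n) ≈ 0.045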

An even better way to estimate the probability distribution would be to record how far off your guess is from the winning pocket. Build a vector of distances between the estimated target pocket and the true pocket, $q = [q_{-3}, q_{-2}, q_{-1}, q_0, q_1, q_2, q_3]$, where $q_{-i}$ represents $i$ pockets early, $q_0$ is exactly right, and $q_i$ is $i$ pockets late. Now you can use the Dirichlet distribution, a multivariate generalization of the Beta distribution that models probabilities over multiple categories summing to 1 (here, the likelihood of the ball landing in different groups of pockets), to develop a more complete model of the error pattern. As in the Beta case, start with an estimated prior $q \sim \text{Dirichlet}(\alpha_0)$ and collect counts $c_k$ of each offset value to estimate the posterior

$$\begin{aligned} q \,|\, \text{data} &\sim \text{Dirichlet}(\alpha_{0,k} + c_k) \\ q_0 \,|\, \text{data} &\sim \text{Beta}\left(\alpha_{0,0} + c_0,\; \sum_{k \neq 0}(\alpha_{0,k} + c_k) \right) \end{aligned}$$

where the second line is the marginal distribution for the center (exactly right) category.

Using this method provides a better estimate of the probabilities over a range of adjacent pockets. For the Dirichlet prior, the best initial estimate is $\alpha_{0,k} = 1/38$ for all $k$.
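
The corresponding Dirichlet update is just as short. A minimal sketch with Distributions.jl; the offset counts are invented for illustration:

using Distributions

α0 = fill(1/38, 7)                 # weak prior over the seven offsets q₋₃ … q₃
c  = [2, 5, 11, 18, 12, 6, 3]      # hypothetical counts of each observed offset
posterior = Dirichlet(α0 .+ c)
q_eff = mean(posterior)            # posterior mean probability of each offset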

We can compare how uncertain probabilities affect the betting fractions relative to the earlier case, where we knew the probabilities of winning exactly. Let $\alpha_i = s p_i$ and $\beta_i = s(1-p_i)$, $i = 1, 2$, for some small value of $s$, such as $s = 10$, so the resulting probabilities are very uncertain. Keep the payoffs the same as before,

using Distributions    # provides the Beta distribution

p1, p2 = 0.3, 0.25     # win probabilities from the earlier example
s = 10.0               # weak prior, much more uncertainty about p
α1, β1 = s*p1, s*(1-p1)
α2, β2 = s*p2, s*(1-p2)
p_dists = [Beta(α1, β1), Beta(α2, β2)]

and then run the multi-Bayes example,


[0.0645002348517128, 0.09915443802025861]  total bet = 0.1636546728719714
Multi-bet Bayesian Kelly result:
  b_1 = 0.064500 (6.45%)
  b_2 = 0.099154 (9.92%)
  Cash held = 0.836345 (83.63%)
  Total bet = 0.163655 (16.37%)
  Max E[log(wealth)] = 0.028532
  Converged: true  Iterations: 8  Method: lbfgs_bayes_multi

which gives the same result as the discrete case. For a single bet with probability $p$ and log-utility (using the logarithm of wealth captures the diminishing marginal value of money: doubling your wealth from \$1M to \$2M doesn’t feel as significant as doubling from \$10K to \$20K), the conditional expected log-wealth is

$$g(p;b) = p \log W_{\text{win}}(b) + (1-p) \log W_{\text{lose}}(b).$$

If $P$ is random with some distribution (e.g., Beta), the Bayesian expected log-wealth is

$$\mathbb{E}_P[g(P;b)] = \mathbb{E}_P[P \log W_{\text{win}} + (1-P) \log W_{\text{lose}}] = \mathbb{E}[P] \log W_{\text{win}} + (1-\mathbb{E}[P]) \log W_{\text{lose}},$$

so the dependence on the distribution of $P$ vanishes except through its mean, $p_{\text{eff}} = \mathbb{E}[P]$. If you wanted to include a variance term, you could define the objective function as

$$\max_{b} \left\{ \mathbb{E}_P[g(P;b)] - \lambda \, \text{Var}_P[g(P;b)] \right\}$$

where $\lambda$ is a measure of your risk tolerance. Alternatively, you could run Monte Carlo simulations (computational techniques that use repeated random sampling to obtain numerical results, named after the famous casino and useful when analytical solutions are hard to find) with data collected in situ to optimize $\lambda$, or use the method described in Distributional Robust Kelly Gambling by Qingyun Sun and Stephen Boyd.
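
A minimal Monte Carlo sketch of the variance-penalized objective for a single bet, using Distributions.jl and a grid search; the prior strength, the value λ = 0.5, and the function names are illustrative, not taken from the repository:

using Distributions, Statistics

function penalized_objective(b, ps, w, λ)
    g(p) = p * log(1 - b + b * w) + (1 - p) * log(1 - b)   # conditional log-growth g(p; b)
    gs = g.(ps)
    return mean(gs) - λ * var(gs)
end

ps = rand(Beta(10 * 0.3, 10 * 0.7), 10_000)   # samples from a weak prior centered on p = 0.3
bs = 0.0:0.001:0.2
b_star = bs[argmax([penalized_objective(b, ps, 4.0, 0.5) for b in bs])]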

Continuous Case: Uncertain Returns

What if we don’t know the exact return, but only its probability distribution? Instead of a fixed return $w$, the payoff follows a probability density function $p(w)$. In the example of Figure 5, the density peaks at $w = 1.75$ with a value of about $0.8$, takes a value of about $0.25$ at $w = 1$, and falls to zero for $w < 0$ or $w > 3.5$.

Figure 5. Payoff returns as a continuous distribution.

For a single bet with a continuous probability distribution, the wealth after betting is

$$R = 1 - b + bw.$$

The objective function is,

$$g(b) = \mathbb{E}[\log R] = \int_{-\infty}^{\infty} p(w) \log(1 - b + bw) \, dw$$

and we want to maximize $g(b)$. This is similar to Claude Shannon’s information entropy discussed in The Kelly Criterion,

$$H(X) = -\sum_{i=1}^n P(x_i) \log_2 P(x_i)$$

where $P(x_i)$ is the probability of receiving message $x_i$, and $X$ is the vector of messages, $X = [x_1, x_2, \ldots, x_n]$.

For multiple simultaneous bets with bet fractions $b_1, b_2, \ldots, b_n$ and payoffs $w_1, w_2, \ldots, w_n$, the wealth $W$ after one session is

$$W = 1 - \sum_{i=1}^n b_i + \sum_{i=1}^n b_i w_i.$$

The objective function for multiple bets becomes

$$G(b_1, b_2, \ldots, b_n) = \mathbb{E}[\log W] = \mathbb{E}\left[\log\left(1 - \sum_{i=1}^n b_i + \sum_{i=1}^n b_i w_i\right)\right]$$

with the constraints

$$b_i \geq 0, \quad \sum_{i=1}^n b_i \leq 1.$$

Multiple Continuous Kelly

Just as in the discrete case, we can extend the continuous case to include multiple simultaneous bets. If the bets are known to be independent, as might be the case if day-trading in unrelated industries, then we could invest in multiple trades simultaneously to improve the overall probability of success.

In the example above with two bets, $(p_1, w_1) = (0.3, 4.0)$ and $(p_2, w_2) = (0.25, 6.0)$, we found that the optimal fractions were $(b_1, b_2) = (0.0645, 0.0992)$. When the payoffs are continuous, the input parameters depend on the expected payoff values and their probability distribution. For the discrete case, the expected value and variance are

$$\begin{aligned} \mathbb{E}[R] &= pw \\ \text{Var}(R) &= p(1-p)w^2 \end{aligned}$$

where $R$ is the gross return.

For the first bet,

$$\begin{aligned} \mu_{R_1} &= \mathbb{E}[R_1] = p_1 w_1 = 0.3 \cdot 4 = 1.2 \\ \sigma_{R_1}^2 &= \text{Var}(R_1) = p_1(1-p_1)w_1^2 = 0.3 (1 - 0.3) \cdot 4^2 = 3.36 \end{aligned}$$

and for the second bet $\mu_{R_2} = 1.5$ and $\text{Var}(R_2) = 6.75$.

For a continuous distribution, the log-normal works well because it is bounded below by zero, so the payoff is never negative.

Figure 6. Log-normal distributions.

The expected value of $R$ for a log-normal distribution is $\mathbb{E}[R] = e^{\mu + \frac{1}{2}\sigma^2}$, and the variance is $\text{Var}(R) = \left(e^{\sigma^2} - 1\right)e^{2\mu + \sigma^2}$. Then

$$\phi = 1 + \frac{\sigma_R^2}{\mu_R^2} = e^{\sigma^2} \;\Rightarrow\; \sigma^2 = \ln\left(1 + \frac{\sigma_R^2}{\mu_R^2}\right)$$

and

$$\mu = \ln(\mu_R) - \frac{1}{2}\sigma^2.$$

(See the Appendix for details on the variable $\phi$.) For the first bet, $\mu_{R_1} = 1.2$ and $\sigma_{R_1}^2 = 3.36$, so

$$\begin{aligned} \phi_1 &= 1 + \frac{3.36}{1.2^2} \approx 3.333 \\ \sigma_1^2 &= \ln(\phi_1) \approx 1.20397 \\ \mu_1 &= \ln(1.2) - \tfrac{1}{2}\sigma_1^2 \approx 0.18232 - 0.60199 \approx -0.41966 \end{aligned}$$

and for the second bet $\mu_{R_2} = 1.5$ and $\sigma_{R_2}^2 = 6.75$, so $\sigma_2^2 \approx 1.38629$ and $\mu_2 \approx -0.28768$.
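
A minimal sketch of this moment matching; the helper name is illustrative, not from the repository:

# Match a log-normal to the binary bet's gross-return mean p·w and variance p(1-p)·w².
function lognormal_params(p, w)
    μR, σR2 = p * w, p * (1 - p) * w^2
    σ2 = log(1 + σR2 / μR^2)          # from φ = 1 + σ_R²/μ_R² = e^{σ²}
    μ  = log(μR) - σ2 / 2
    return μ, σ2
end

lognormal_params(0.3, 4.0)    # ≈ (-0.41966, 1.20397), the first bet above
lognormal_params(0.25, 6.0)   # ≈ (-0.28768, 1.38629), the second bet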


===============================================================
Number of bets: 2
Transaction fee: 0.00%
Correlation: Independent

Bet parameters:
  Bet 1: LogNormal{Float64}, μ=1.2000, σ=1.8330
  Bet 2: LogNormal{Float64}, μ=1.5000, σ=2.5981

Optimal bet fractions:
  b_1 = 0.029763 (2.98%)
  b_2 = 0.037038 (3.70%)

Total bet: 0.066801 (6.68%)
Cash held: 0.933199 (93.32%)

Max E[log(wealth)]: 0.019828
Converged: true  Iterations: 0  Method: mc_fixed
===============================================================

Notice that the bet fractions for the continuous distribution are much smaller (2.98% and 3.70%) than for the discrete case (6.45% and 9.92%). In the discrete case, the payoff is certain and only the probability of winning is uncertain, while in the continuous case, the payoff might be any value greater than zero.

The log-normal distribution puts much of its probability mass near zero, with a long, thin tail extending toward infinity, so for a given bet size the expected log-growth is lower even though the mean payoff is matched. The continuous Kelly reduces the bet size to reflect this additional payoff risk.
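
You can check this with a quick Monte Carlo grid search under the assumptions of this section (independent log-normal payoffs with the parameters derived above). This sketch is not the repository's method, so expect small differences from sampling noise:

using Distributions, Statistics, Random

Random.seed!(1)
W = hcat(rand(LogNormal(-0.41966, sqrt(1.20397)), 20_000),    # payoff samples for bet 1
         rand(LogNormal(-0.28768, sqrt(1.38629)), 20_000))    # payoff samples for bet 2

G(b) = mean(log.(1 .- sum(b) .+ W * b))       # Monte Carlo estimate of E[log wealth]

grid = 0.0:0.005:0.15
best = argmax([G([b1, b2]) for b1 in grid, b2 in grid])
(grid[best[1]], grid[best[2]])                # close to the (2.98%, 3.70%) reported above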

Continuous Case: Uncertain Probabilities and Returns

In this final extension of the Kelly Criterion, we consider the case where both the probability of winning and the payoff are drawn from probability distributions. The probability of winning is modeled with a Beta distribution and the payoff with a univariate return distribution. This is implemented in the module kelly_general_bayes.jl, which extends the concepts developed in the previous models. This version lets you place multiple simultaneous bets, allocated optimally to grow your total capital at the maximum rate, and, as in the previous models, lets you include fees or betting costs.

Traditional Kelly models assume you know the probability of winning and the payoff exactly, but we can now turn both into distributions that better reflect how you would approach an investment or betting opportunity. By modeling distributions, we have a rigorous basis for adjusting the betting fractions. For example, if we use the probabilities and payoffs from the first example above,

# Fixed means and payoffs:
p1_mean, p2_mean = 0.30, 0.25
gross_w1, gross_w2 = 4.0, 6.0

and compare a strong prior, $s_A = 1000$, to a weak prior, $s_B = 2$, in two scenarios, we see that the weak prior reduces the optimal total bet by over 15%:

Scenario A (strong prior): b = [0.06565483237670092, 0.0990407051985005]  total bet = 0.16469553757520142
Scenario B (weak prior):  b = [0.044006266789879726, 0.09526125397014709]  total bet = 0.13926752076002683
Fraction reduction in total bet = 15.44%

In most cases, the reduction would likely be even greater, since the expected values are high in this example: $\mathbb{E}[p_1 w_1] = 0.3 \cdot 4.0 = 1.2$ and $\mathbb{E}[p_2 w_2] = 0.25 \cdot 6.0 = 1.5$. With strong conviction (a tight probability prior) and near-known payoff distributions, the model recovers something very close to the classic Kelly fractions, but with weak conviction (diffuse probability priors) or large payoff variance, the optimal fractions naturally shrink, reflecting risk from estimation error or payoff tail risk.

With multiple simultaneous bets, you can allocate capital not just by edge but also by uncertainty, return variability, and covariance if enabled. Here’s how to use this model:

  1. Define your model beliefs

    • Choose for each bet $i$: a prior distribution for $p_i$ (e.g., Beta(α, β)) and a win and loss return distribution (e.g., LogNormal(μ, σ)).

    • Optionally, a correlation matrix if bet returns are dependent.

  2. Instantiate the model

k = GeneralBayesianKelly(p_dists      = [ … ],
                         return_dists = [ … ],
                         loss_dists   = [ … ],
                         fee          = f,
                         correlation  = Corr)
  3. Optimize for stake fractions
res = optimize_gen_bayes(k; method=:lbfgs, n_samples=…, rng=…)

The result res.b gives the vector of optimal bet fractions, res.cash the un-bet cash fraction, and res.objective the achieved expected log-wealth.

  4. Inspect and plot. Use plot_objective_gen_bayes to explore how expected log-wealth varies with each $b_i$, or compare different prior strengths and payoff variances to study their effect on allocation.

The kelly_general_bayes.jl module lets you model real-world conditions much more accurately than the original single-bet, fixed-probability, fixed-payoff model.

You can also run Monte Carlo simulations,

include("kelly_general_bayes.jl")
using .KellyGeneralBayes
using Random

# Base model
(k, res_kelly) = run_examples_general()

# Make some alternative strategies to compare:
# e.g., half-Kelly and a "conservative" tweak
res_half = GenBayesKellyResult(0.5 .* res_kelly.b,
                               1.0 - k.fee - sum(0.5 .* res_kelly.b),
                               res_kelly.objective, true, 0, :manual_half)

res_zero = GenBayesKellyResult([0.0, 0.0], 1.0 - k.fee, 0.0, true, 0, :cash_only)

fig = compare_strategies_monte_carlo(
    k,
    ["Full Kelly" => res_kelly,
     "Half Kelly" => res_half];
    n_iters=3000,
    n_horizon=200,
    bands=((0.10,0.90),(0.25,0.75)),
    bins=40,
    rng=MersenneTwister(123)
)

display(fig)

which shows the difference between a conservative half-Kelly and the full model:

Figure 7. Wealth trajectories.

Transaction Fees and Costs

You need to consider various costs that might reduce your overall winnings. While playing roulette, you might spread a few “dummy” bets around the table to hide the fact that your system consistently selects a few winning pockets. For day trading, there are brokerage fees and taxes to consider, among others. In the code, we simply subtract the fee from the available capital, $B = 1 - \sum b_i - \text{fee}$, to account for these losses. Other costs to consider include brokerage or exchange fees, bid-ask spreads, taxes on short-term gains, slippage, and travel expenses.

You should also consider the possibility of naturally occurring long strings of losses. Imagine playing a game involving a coin toss in which heads is a win and tails is a loss. The law of large numbers says that over many tosses, the number of heads will approximately equal the number of tails, but there will always be long runs of tails that could happen at any time. If the run is long enough, no matter how large your initial capital, you will go bust. This is known as gambler’s ruin, and we need to account for the possibility of such an occurrence.
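
A quick base-Julia sketch makes the point: even with a favorable coin, long losing runs are routine, which is why many practitioners scale back to fractional Kelly.

using Random

function longest_losing_streak(p, n; rng = MersenneTwister(42))
    streak = longest = 0
    for _ in 1:n
        if rand(rng) < p
            streak = 0                        # a win resets the run of losses
        else
            streak += 1
            longest = max(longest, streak)
        end
    end
    return longest
end

longest_losing_streak(0.55, 10_000)           # typically a run of 10 or more straight losses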

⚠️ Common Pitfalls When Using Kelly

  1. Overestimating your edge - The #1 error. Kelly assumes you know probabilities accurately.
  2. Ignoring correlation - Betting on multiple correlated outcomes isn’t really diversification.
  3. Forgetting transaction costs - Even small fees dramatically reduce optimal bet sizes.
  4. Not rebalancing - Kelly fractions need updating as your wealth changes, preferably after each transaction.
  5. Using arithmetic means - Kelly requires geometric thinking about compound growth (see the sketch below).
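
A two-line illustration of the last pitfall: a bet that returns +50% or -40% with equal probability looks profitable by its arithmetic mean but shrinks wealth when compounded.

factors = [1.5, 0.6]                      # wealth multipliers for a win and a loss
arithmetic_mean = sum(factors) / 2        # 1.05  -> looks like +5% per bet
geometric_mean  = prod(factors)^(1/2)     # ≈ 0.9487 -> about -5% per bet when compounded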

Possible Applications

Several investment or betting opportunities could benefit from the extended Kelly Criterion framework:

Quantitative trading strategies / algorithmic strategies

Many algorithmic trading or systematic strategies involve repeated opportunities with an estimated edge, uncertain payoffs, and transaction costs, which is exactly the setting this extended framework handles.

Sports betting or wagering markets

In sports betting, you may estimate true win probabilities of teams or players (based on models/data) and compare them with bookmaker odds. There is inherent uncertainty in your probability estimates, and payoffs are known (or nearly known) when you win, but you can treat them as distributions if you include, e.g., parlay possibilities.

Options or derivatives trades with modelled edge

In options trading, you may have a view that an option is mispriced (so you estimate a probability of a favorable outcome), and you have an estimate of payoff distribution (the option payoff is a known structure, but the underlying volatility or tail risk is uncertain).

Summary

These extensions to the original Kelly Criterion bring you much closer to realistic betting/investing opportunities by modeling uncertain probabilities of winning and uncertain payoff amounts, optimizing over multiple simultaneous bets, and including transaction fees. The models check that your total bet fraction is never greater than one, and each bet has an expected value greater than one.

These methods only optimize the betting fractions, so you still need to accurately estimate the chances of winning and how much you expect to make on each investment. Many investments could benefit from applying the extended Kelly Criterion, but be sure to thoroughly understand each and be careful when developing the distributions used.

Where to Go From Here

The mathematics we’ve covered transforms the simple Kelly Criterion into a practical tool for real-world decision-making. But understanding the theory is just the beginning—the real value comes from experimentation and application.

Start conservative, then explore: Begin with the discrete case using small, hypothetical portfolios. Watch how the optimal fractions change as you adjust probabilities and payoffs. Notice how uncertainty in your estimates naturally reduces the recommended bet sizes—this is the model protecting you from overconfidence.

Build intuition through simulation: Run Monte Carlo simulations comparing full Kelly to half-Kelly strategies. You’ll see that while full Kelly maximizes long-term growth, it also experiences dramatic drawdowns. This visceral understanding of the volatility-growth tradeoff is something equations alone can’t teach.

Test with real data: Apply these models to historical data—stock prices, sports betting odds, or even board game probabilities. The gap between theoretical optimal betting and practical constraints will become immediately apparent. Transaction costs matter. Estimation error matters. The frequency of rebalancing matters.

Remember the limits: These models optimize how much to bet, not what to bet on. No amount of mathematical sophistication can turn a bad edge into a good one. The Kelly Criterion’s most important lesson might be knowing when not to bet at all.

The Julia code accompanying this article gives you a laboratory for exploring these ideas. Whether you’re managing an investment portfolio, analyzing strategic decisions, or just curious about optimal resource allocation, the extended Kelly framework offers a rigorous foundation for thinking about risk, uncertainty, and growth.

Start experimenting. Start small. And most importantly, start learning from what the models tell you about your own assumptions.

Experiments to Try

Ready to dig deeper? Here are hands-on experiments to build your intuition:

Beginner Experiments

  1. The Overconfidence Test

    • Set up two bets with p₁=0.3, w₁=4 and p₂=0.25, w₂=6
    • Compare optimal fractions using strong prior (s=1000) vs weak prior (s=10)
    • Question: How much do your bet sizes shrink when you’re less certain?
  2. The Correlation Explorer

    • Create two bets with identical expected values
    • Run optimizations with correlation = 0, 0.5, and 0.9
    • Question: How does correlation between bets affect diversification benefits?
  3. The Transaction Cost Reality Check

    • Take any optimal betting strategy
    • Gradually increase the fee parameter from 0% to 5%
    • Question: At what fee level does the strategy become unprofitable?

Intermediate Experiments

  1. Discrete vs Continuous Comparison

    • Set up the same scenario using both discrete and continuous models
    • Compare the optimal fractions (you’ll see continuous gives smaller bets)
    • Question: Why does uncertainty in payoff reduce bet size more than uncertainty in probability?
  2. The Gambler’s Ruin Simulator

    • Start with $1,000 and a favorable bet (p=0.55, w=2)
    • Use full Kelly fractions for 1,000 rounds
    • Run 100 simulations and plot the distribution of outcomes
    • Question: How many simulations went bust despite positive expected value?
  3. Half-Kelly vs Full Kelly Battle

    • Run 10,000 iterations comparing both strategies
    • Track: median wealth, maximum drawdown, probability of 50% loss
    • Question: Is the extra volatility of full Kelly worth the higher growth rate?

Advanced Experiments

  1. Multi-Asset Portfolio Optimization

    • Create 5 independent betting opportunities with varying risk/reward
    • Optimize simultaneously vs optimizing each independently
    • Question: How much does simultaneous optimization improve expected growth?
  2. Bayesian Learning Simulation

    • Start with a weak prior (Beta(1,1)) for win probability
    • Simulate 100 bets, updating your belief distribution after each
    • Watch optimal fractions evolve as your estimates improve
    • Question: How many observations until your bets stabilize?
  3. Stress Testing with Fat Tails

    • Replace log-normal return distribution with Student’s t-distribution (df=3)
    • Compare optimal fractions to the log-normal case
    • Question: How should you adjust betting when extreme outcomes are more likely?
  4. The Rebalancing Frequency Study

    • Set up a two-bet continuous strategy
    • Compare: rebalance every bet vs every 10 bets vs never rebalance
    • Question: How much growth do you sacrifice by not rebalancing?

Real-World Application Challenge

  1. Your Own Portfolio Analysis
    • Take 3-5 investments you’re considering (or currently hold)
    • Estimate probability distributions for each using historical data
    • Run the general Bayesian Kelly optimization
    • Compare recommended fractions to standard “equal weight” or “market cap weight”
    • Question: How different are Kelly-optimal weights from conventional wisdom?

Debugging Your Intuition

  1. The Impossible Bet Exercise
    • Try to optimize a bet with p=0.4, w=2 (expected value < 1)
    • Watch the optimizer recommend b=0
    • Try p=0.5, w=2.1 (barely favorable)
    • Question: How close to break-even must you be before Kelly says “don’t bet”?

Documentation and Sharing

For each experiment, record the setup, your assumptions, and the results so you can compare runs and share what you learn.

The best way to learn Kelly optimization isn’t through equations—it’s through breaking the models, stress-testing assumptions, and discovering edge cases. Start experimenting today.

Glossary

Code for this article

The complete Julia implementation is available at Extended Kelly Criterion — Julia Implementation. Each section of the article has a corresponding module in the repository.

This repository provides a full suite of four standalone Kelly-optimization models, progressing from classical fixed-probability betting to general Bayesian optimization with uncertain probabilities and uncertain return distributions.

Each module is self-contained: optimization routines, Monte Carlo tools, plotting utilities, and examples are implemented within each .jl file—no external utility files are required.

Software

References

Image credits


Appendix

What is $\phi$?

In the equation

$$\phi = 1 + \frac{\sigma_R^2}{\mu_R^2} = e^{\sigma^2},$$

the symbol $\phi$ is just a temporary variable used to simplify the algebra. It represents the multiplicative ratio

$$\phi = \frac{\mathbb{E}[R^2]}{\mathbb{E}[R]^2}.$$

For a log-normal distribution, this ratio always equals $e^{\sigma^2}$.

Derivation from log-normal moments

Let $R$ be log-normal:

$$R = e^X, \qquad X \sim N(\mu, \sigma^2).$$

1. Compute the first and second moments

$$\begin{aligned} \mathbb{E}[R] &= e^{\mu + \frac{1}{2}\sigma^2} \\ \mathbb{E}[R^2] &= e^{2\mu + 2\sigma^2} \end{aligned}$$

2. Form the ratio

$$\frac{\mathbb{E}[R^2]}{\mathbb{E}[R]^2} = \frac{e^{2\mu + 2\sigma^2}}{e^{2(\mu + \sigma^2/2)}}.$$

Simplify the exponent by taking the difference:

$$(2\mu + 2\sigma^2) - (2\mu + \sigma^2) = \sigma^2.$$

Thus

$$\frac{\mathbb{E}[R^2]}{\mathbb{E}[R]^2} = e^{\sigma^2}.$$

So we define this ratio:

$$\phi := \frac{\mathbb{E}[R^2]}{\mathbb{E}[R]^2} = e^{\sigma^2}.$$

Connect $\phi$ to mean and variance

Variance identity:

$$\mathrm{Var}(R) = \mathbb{E}[R^2] - \mathbb{E}[R]^2.$$

Solve for $\mathbb{E}[R^2]$:

$$\mathbb{E}[R^2] = \mathrm{Var}(R) + \mathbb{E}[R]^2.$$

Divide by $\mathbb{E}[R]^2$:

$$\frac{\mathbb{E}[R^2]}{\mathbb{E}[R]^2} = \frac{\mathrm{Var}(R)}{\mathbb{E}[R]^2} + 1.$$

But the left side is $\phi$. Therefore:

$$\phi = 1 + \frac{\sigma_R^2}{\mu_R^2}.$$

So we have both expressions:

$$\phi = 1 + \frac{\sigma_R^2}{\mu_R^2} = \frac{\mathbb{E}[R^2]}{\mathbb{E}[R]^2} = e^{\sigma^2}.$$
