Poisson Distribution
Overview
The Poisson distribution is a discrete probability distribution that models the probability of a given number of events occurring in a fixed interval, given a known constant mean rate (λ). It is named after French mathematician Siméon Denis Poisson (1781–1840). In sports prediction, it is the foundational model for goal counts in football, with λ interpreted as a team's expected goals in the match. It fits because goals are relatively rare events that occur largely independently at a roughly constant average rate.
The core assumption is that goals scored by each team follow a Poisson process: events are independent, the average rate is constant, and two goals cannot occur at exactly the same instant. In football these assumptions hold only approximately. The dixon-coles-correction addresses the key violation, which is dependence between the two teams' scores in low-scoring matches (a team winning 1-0 plays defensively, suppressing the opponent's goal probability).
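A quick way to see how these assumptions produce the Poisson PMF is to simulate them. This sketch assumes, purely for illustration, a team with λ = 1.5 and a match divided into 90 one-minute slots, each with the same independent chance λ/90 of a goal (so at most one goal per minute). The resulting goal count is binomial, and with many small-probability slots the binomial is closely approximated by a Poisson distribution with mean λ.

```python
import numpy as np
from scipy.stats import poisson

# Illustrative assumptions: expected goals per match and one-minute slots.
lam = 1.5
minutes = 90
p_goal_per_minute = lam / minutes  # constant rate, at most one goal per slot

rng = np.random.default_rng(seed=0)
n_matches = 50_000

# Each row is a simulated match; True marks a minute in which a goal was scored.
# Slots are drawn independently, matching the independence assumption.
goal_in_minute = rng.random((n_matches, minutes)) < p_goal_per_minute
goals_per_match = goal_in_minute.sum(axis=1)

# Compare simulated goal-count frequencies with the Poisson PMF.
for k in range(6):
    simulated = np.mean(goals_per_match == k)
    print(f"k={k}: simulated {simulated:.3f}, Poisson {poisson.pmf(k, lam):.3f}")
```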
Why It Matters
Poisson is the workhorse of football prediction models. Given estimated expected goals (λ) for each team, the Poisson PMF gives the probability of every possible scoreline. Summing those probabilities yields win/draw/loss probabilities that can be compared with bookmaker odds to find bets with positive expected value (see expected-value-ev). The model is simple and interpretable, and, critically, its parameters (attack and defense strengths) can be estimated from historical match data via maximum likelihood.
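A minimal sketch of that comparison, assuming decimal odds and a 1-unit stake: a bet at decimal odds o that the model says wins with probability p has expected profit p × o − 1, and the model's fair odds are 1/p. The probabilities and odds below are hypothetical placeholders, not real market prices, and the function names are illustrative.

```python
def expected_value(p_model: float, decimal_odds: float) -> float:
    """Expected profit per 1-unit stake: win (odds - 1) with prob p, lose 1 otherwise."""
    return p_model * (decimal_odds - 1) - (1 - p_model)  # equals p * odds - 1

def fair_odds(p_model: float) -> float:
    """Decimal odds at which a bet on this outcome has zero expected value."""
    return 1 / p_model

# Hypothetical model probabilities and bookmaker decimal odds for one match.
model = {"home": 0.45, "draw": 0.25, "away": 0.30}
bookmaker = {"home": 2.40, "draw": 3.60, "away": 3.10}

for outcome, p in model.items():
    ev = expected_value(p, bookmaker[outcome])
    print(f"{outcome}: fair odds {fair_odds(p):.2f}, "
          f"bookmaker {bookmaker[outcome]:.2f}, EV {ev:+.3f}")
```

Bookmaker odds include a margin, so a positive EV only indicates value if the model's probabilities are better calibrated than the market's.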
Without Poisson, building a probabilistic sports model would require considerably more complex approaches. With it, a Poisson regression (a generalized linear model with a log link), fitted in a few lines of code, produces team strength estimates that feed directly into a probability distribution over match outcomes.
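A minimal sketch of that estimation step, assuming the multiplicative model from Key Formula written on the log scale (log λ = baseline + log attack + log defense) and fitted by maximum likelihood with `scipy.optimize.minimize`. The four teams and all scores are hypothetical. Team 0 serves as a reference so the parameters are identifiable, and the strengths are then re-centred so an average team has attack = defense = 1; separate home and away baselines absorb home advantage. A statistics package's Poisson GLM would fit the same model.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical double round-robin: (home_team, away_team, home_goals, away_goals).
teams = ["A", "B", "C", "D"]
matches = [
    (0, 1, 2, 1), (0, 2, 3, 0), (0, 3, 1, 1),
    (1, 0, 1, 2), (1, 2, 2, 2), (1, 3, 0, 1),
    (2, 0, 1, 3), (2, 1, 1, 0), (2, 3, 2, 1),
    (3, 0, 0, 2), (3, 1, 2, 2), (3, 2, 1, 1),
]
n = len(teams)
# beta = [home intercept, away intercept, log-attack of teams 1..n-1, log-defense of teams 1..n-1].
# Team 0 is the reference (log-attack = log-defense = 0).
n_params = 2 + 2 * (n - 1)

def design_row(is_home, attacker, defender):
    """Row x such that log(lambda) = x @ beta for goals by `attacker` against `defender`."""
    row = np.zeros(n_params)
    row[0 if is_home else 1] = 1.0
    if attacker > 0:
        row[1 + attacker] = 1.0
    if defender > 0:
        row[n + defender] = 1.0
    return row

rows, goals = [], []
for home, away, home_goals, away_goals in matches:
    rows.append(design_row(True, home, away))   # home team attacking
    goals.append(home_goals)
    rows.append(design_row(False, away, home))  # away team attacking
    goals.append(away_goals)
X = np.array(rows)
y = np.array(goals, dtype=float)

def neg_log_likelihood(beta):
    eta = X @ beta                        # log(lambda) for each observation
    return np.sum(np.exp(eta) - y * eta)  # Poisson NLL without the constant log(k!) term

def gradient(beta):
    return X.T @ (np.exp(X @ beta) - y)

result = minimize(neg_log_likelihood, np.zeros(n_params), jac=gradient, method="BFGS")
beta = result.x

# Re-centre so the (geometric) average team has attack = defense = 1;
# the baselines absorb the shift, so the fitted lambdas are unchanged.
log_att = np.concatenate([[0.0], beta[2:n + 1]])
log_def = np.concatenate([[0.0], beta[n + 1:]])
home_base = np.exp(beta[0] + log_att.mean() + log_def.mean())
away_base = np.exp(beta[1] + log_att.mean() + log_def.mean())
attack = np.exp(log_att - log_att.mean())
defense = np.exp(log_def - log_def.mean())

print(f"home baseline {home_base:.2f}, away baseline {away_base:.2f}")
for name, att, dfn in zip(teams, attack, defense):
    print(f"{name}: attack {att:.2f}, defense {dfn:.2f}")
```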
Key Formula
Poisson PMF — probability of exactly k goals:
$$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$$
For a football match, treating home and away goals as two independent Poisson variables:
$$P(\text{HomeGoals} = h, \text{AwayGoals} = a) = \frac{\lambda_{home}^h e^{-\lambda_{home}}}{h!} \times \frac{\lambda_{away}^a e^{-\lambda_{away}}}{a!}$$
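The two formulas above translate directly into code. This standard-library sketch (function names are illustrative) evaluates the PMF and the independent-scoreline product, using the λ values from the worked example below.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson variable with mean lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

def scoreline_prob(h: int, a: int, lam_home: float, lam_away: float) -> float:
    """P(HomeGoals = h, AwayGoals = a), assuming the two goal counts are independent."""
    return poisson_pmf(h, lam_home) * poisson_pmf(a, lam_away)

# Probability of a 2-1 home win with lambda_home = 1.84 and lambda_away = 1.62.
print(round(scoreline_prob(2, 1, 1.84, 1.62), 3))  # 0.086
```

For large k, `scipy.stats.poisson.pmf` or a log-space computation is numerically safer than raw powers and factorials.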
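Expected goals from attack/defense strength:
$$\lambda_{home} = \overline{goals_{home}} \times attack_{home} \times defense_{away}$$
$$\lambda_{away} = \overline{goals_{away}} \times attack_{away} \times defense_{home}$$
Here λ is a team's expected goals in this particular match, the overlined term is the league-average number of goals per match scored by home (or away) teams, attack is the team's attacking strength relative to the league average, and defense is the opponent's defensive strength relative to the league average, expressed as a multiplier on goals conceded (a value below 1 means the opponent concedes fewer goals than average).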
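A direct translation of the expected-goals formula, with a hypothetical helper name. Strengths are multipliers relative to the league average, so the two calls reproduce the λ values used in the worked example below.

```python
def expected_goals(league_avg: float, attack: float, opp_defense: float) -> float:
    """lambda = league-average goals x own attack strength x opponent's defensive strength.

    Strengths are multipliers relative to the league average (1.0 = average);
    opp_defense above 1.0 means the opponent concedes more than average.
    """
    return league_avg * attack * opp_defense

lam_city = expected_goals(1.55, attack=1.35, opp_defense=0.88)     # vs Arsenal's defense
lam_arsenal = expected_goals(1.55, attack=0.95, opp_defense=1.10)  # vs Man City's defense
print(f"{lam_city:.2f} {lam_arsenal:.2f}")  # 1.84 1.62
```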
Worked Example
Man City (home: attack 1.35, defense 1.10) vs Arsenal (away: attack 0.95, defense 0.88), with a league average of 1.55 goals per match used as the baseline for both teams:
$$\lambda_{ManCity} = 1.55 \times 1.35 \times 0.88 = 1.84$$
$$\lambda_{Arsenal} = 1.55 \times 0.95 \times 1.10 = 1.62$$
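Selected scoreline probabilities, where P_City(k) uses λ = 1.84 and P_Arsenal(k) uses λ = 1.62:
| Scoreline | Calculation | Probability |
|---|---|---|
| Man City 1-0 | P_City(1) × P_Arsenal(0) = 0.292 × 0.198 | 0.058 |
| Man City 2-1 | P_City(2) × P_Arsenal(1) = 0.269 × 0.321 | 0.086 |
| Man City 2-0 | P_City(2) × P_Arsenal(0) = 0.269 × 0.198 | 0.053 |
| Draw 1-1 | P_City(1) × P_Arsenal(1) = 0.292 × 0.321 | 0.094 |
Summing over all scorelines: P(Man City win) ≈ 0.43, P(Draw) ≈ 0.22, P(Arsenal win) ≈ 0.34.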
Code Snippet
```python
import numpy as np
from scipy.stats import poisson

def scoreline_matrix(lambda_home, lambda_away, max_goals=6):
    """Generate the scoreline probability matrix for a match.

    Entry [h, a] is P(home scores h) * P(away scores a), treating the two
    goal counts as independent Poisson variables. Scorelines above
    max_goals are truncated, so the matrix sums to slightly less than 1.
    """
    goals = np.arange(max_goals + 1)
    home_pmf = poisson.pmf(goals, lambda_home)
    away_pmf = poisson.pmf(goals, lambda_away)
    return np.outer(home_pmf, away_pmf)

def win_draw_loss(matrix):
    """Extract home-win/draw/away-win probabilities from a scoreline matrix."""
    home_win = np.tril(matrix, k=-1).sum()  # below the diagonal: h > a
    draw = np.trace(matrix)                 # the diagonal: h == a
    away_win = np.triu(matrix, k=1).sum()   # above the diagonal: a > h
    return home_win, draw, away_win

# Example: Man City λ=1.84, Arsenal λ=1.62
m = scoreline_matrix(1.84, 1.62)
hw, d, aw = win_draw_loss(m)
print(f"Home win: {hw:.3f}, Draw: {d:.3f}, Away win: {aw:.3f}")
# Home win: 0.431, Draw: 0.223, Away win: 0.342
```
Pitfalls
- Draw underestimation: Standard Poisson systematically under-predicts 0-0 and 1-1 draws because the two teams' scores are not fully independent. Apply the dixon-coles-correction for betting applications.
- Overdispersion: Real football data has variance > mean (overdispersion). Use negative binomial regression if this is significant.
- Small samples: Team attack/defense parameters estimated from few matches are noisy. Use Bayesian priors or shrinkage toward league averages.
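- Home advantage: Must be modeled explicitly (typically worth 0.25–0.35 goals per match). Either use separate home and away league averages, as the λ formula above does with its home-team average, or add an explicit home-advantage term before computing λ.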
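The draw-underestimation pitfall can be illustrated with the low-score adjustment used in the Dixon-Coles model. In this sketch the independent-Poisson probabilities of 0-0, 1-0, 0-1 and 1-1 are multiplied by a factor τ that depends on a dependence parameter ρ; all other scorelines are untouched, and the adjustment leaves the total probability unchanged. The value ρ = -0.1 is hypothetical (in practice ρ is estimated together with the team strengths), and the function names are illustrative.

```python
import numpy as np
from scipy.stats import poisson

def dixon_coles_tau(h, a, lam_home, lam_away, rho):
    """Low-score adjustment factor; equals 1 except for 0-0, 0-1, 1-0 and 1-1."""
    if h == 0 and a == 0:
        return 1 - lam_home * lam_away * rho
    if h == 0 and a == 1:
        return 1 + lam_home * rho
    if h == 1 and a == 0:
        return 1 + lam_away * rho
    if h == 1 and a == 1:
        return 1 - rho
    return 1.0

def adjusted_scoreline_matrix(lam_home, lam_away, rho, max_goals=6):
    """Independent-Poisson scoreline matrix with the four low scores reweighted by tau."""
    goals = np.arange(max_goals + 1)
    matrix = np.outer(poisson.pmf(goals, lam_home), poisson.pmf(goals, lam_away))
    for h in (0, 1):
        for a in (0, 1):
            matrix[h, a] *= dixon_coles_tau(h, a, lam_home, lam_away, rho)
    return matrix

base = adjusted_scoreline_matrix(1.84, 1.62, rho=0.0)  # rho = 0 recovers plain Poisson
dc = adjusted_scoreline_matrix(1.84, 1.62, rho=-0.1)   # hypothetical rho
print(f"P(0-0): {base[0, 0]:.4f} -> {dc[0, 0]:.4f}")
print(f"P(1-1): {base[1, 1]:.4f} -> {dc[1, 1]:.4f}")
print(f"P(draw): {np.trace(base):.3f} -> {np.trace(dc):.3f}")
```

With a negative ρ, probability mass moves from 1-0 and 0-1 onto 0-0 and 1-1, which raises the overall draw probability.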
See Also
- dixon-coles-correction — addresses the independence violation for low-scoring draws
- expected-goals-xg — xG provides better λ estimates than simple historical averages
- elo-rating-system — alternative team strength model
- bayesian-inference-sports — Bayesian approach to estimating λ from limited data