Poisson-ELO Ensemble

Overview

The Poisson-ELO ensemble combines two complementary models, Poisson regression for goal-scoring rates and an ELO rating system for team strength, by taking a weighted average of their match predictions. The intuition is that Poisson models capture the stochastic nature of football scoring (goals are rare, approximately independent events, which fits a Poisson process), while ELO captures systematic strength differences and form dynamics. Blending lets each model offset the other's weaknesses.

The Poisson model estimates expected goals (λ) for each team, incorporating home advantage and attack/defense strength parameters. The ELO model produces pairwise win probabilities from rating differences. The ensemble combines their 1X2 probability outputs with weights determined by backtesting performance.

Why It Matters

The ensemble matters because:
1. Complementarity: Poisson captures goal-scoring mechanics; ELO captures opponent-adjusted strength
2. Reduced overfitting: Each model alone may overfit to specific patterns; blending reduces this
3. Better calibration: Each model's raw probabilities are typically overconfident; ensembling improves calibration
4. Adaptive weighting: The optimal blend weight varies by league and can be tuned per competition

Architecture

Step 1: Poisson layer
- Estimate λ_home and λ_away using historical match data
- Parameters: attack and defense strength for each team, plus a home advantage term (λ_home uses home attack and away defense; λ_away uses away attack and home defense)
- Output: P(Home), P(Draw), P(Away) via Poisson score distribution
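
The Poisson layer above can be sketched as follows. This is a minimal illustration, not the definitive model: the multiplicative form in `expected_goals`, the `max_goals` truncation, and every numeric input are assumptions made for the example, and the function names are hypothetical.

```python
import math


def expected_goals(league_avg, attack, defense, home_adv=1.0):
    """Assumed multiplicative rate: league average scaled by the scoring
    side's attack, the conceding side's defense, and a home factor."""
    return league_avg * attack * defense * home_adv


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def poisson_1x2(lam_home, lam_away, max_goals=10):
    """1X2 probabilities from independent Poisson goal counts.

    The score grid is truncated at max_goals per team (an assumption of
    this sketch) and renormalized to absorb the truncated tail.
    """
    home = draw = away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, lam_home) * poisson_pmf(a, lam_away)
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
    total = home + draw + away
    return {"home": home / total, "draw": draw / total, "away": away / total}


# Hypothetical strengths, chosen only to exercise the functions.
lam_home = expected_goals(1.35, attack=1.2, defense=0.9, home_adv=1.15)
lam_away = expected_goals(1.35, attack=1.0, defense=0.8)
probs = poisson_1x2(lam_home, lam_away)
print(probs)
```

Summing over an explicit score grid keeps the sketch dependency-free and makes it easy to adjust individual score probabilities later.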

Step 2: ELO layer
- Maintain team ratings updated after each match
- Compute expected score from rating difference
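- Convert to 1X2 probabilities via a calibrated draw model (the expected score counts a draw as half a win, so it is not itself a win probability)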
- Output: P(Home), P(Draw), P(Away)
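
A minimal sketch of the ELO layer, under stated assumptions: the K-factor of 20 and the fixed draw probability are hypothetical choices for illustration (in practice the draw rate would be calibrated, for example as a function of the rating gap), and the function names are not from any library. The conversion relies on the definition of the expected score as P(home) + 0.5 × P(draw).

```python
def elo_expected_home(r_home, r_away, home_adv=65.0):
    """Expected score of the home side (win = 1, draw = 0.5, loss = 0)."""
    diff = r_home - r_away + home_adv
    return 1.0 / (1.0 + 10 ** (-diff / 400.0))


def elo_update(r_home, r_away, score_home, k=20.0, home_adv=65.0):
    """Return updated (home, away) ratings after one match.

    score_home is 1.0, 0.5, or 0.0; k=20 is an assumed K-factor.
    """
    delta = k * (score_home - elo_expected_home(r_home, r_away, home_adv))
    return r_home + delta, r_away - delta


def elo_to_1x2(expected_home, p_draw):
    """Split an expected score into 1X2 probabilities for a given draw rate.

    Uses expected_home = P(home) + 0.5 * P(draw). The draw rate is capped
    so that neither win probability goes negative.
    """
    p_draw = min(p_draw, 2 * expected_home, 2 * (1 - expected_home))
    p_home = expected_home - 0.5 * p_draw
    p_away = 1.0 - p_home - p_draw
    return {"home": p_home, "draw": p_draw, "away": p_away}


# Hypothetical ratings and draw rate, for illustration only.
e_home = elo_expected_home(1600, 1500)
p_elo = elo_to_1x2(e_home, p_draw=0.25)
print(e_home, p_elo)
```

The update is zero-sum: whatever the home side gains, the away side loses, so the rating pool stays constant.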

Step 3: Weighted ensemble

P_ensemble(Home) = w × P_poisson(Home) + (1−w) × P_elo(Home)
P_ensemble(Draw) = w × P_poisson(Draw) + (1−w) × P_elo(Draw)
P_ensemble(Away) = w × P_poisson(Away) + (1−w) × P_elo(Away)

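The weight w is tuned via walk-forward validation (typically 0.4–0.7).
When both inputs sum to 1.0, so does the blend; normalizing the outputs guards against rounding or uncalibrated inputs that do not.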
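
To make the walk-forward tuning of w concrete, here is one possible sketch. It assumes the matches are stored chronologically as `(p_poisson, p_elo, result)` tuples; the expanding-window scheme, the fold sizes, the 21-point grid, and the toy data are all assumptions of this example rather than a prescribed procedure.

```python
import math

OUTCOMES = ("home", "draw", "away")


def blend(p_poisson, p_elo, w):
    """Convex blend of two 1X2 distributions, renormalized to sum to 1."""
    mixed = {k: w * p_poisson[k] + (1 - w) * p_elo[k] for k in OUTCOMES}
    total = sum(mixed.values())
    return {k: v / total for k, v in mixed.items()}


def log_loss(rows, w, eps=1e-15):
    """Mean negative log-likelihood of the observed results under weight w."""
    losses = [-math.log(max(blend(pp, pe, w)[res], eps)) for pp, pe, res in rows]
    return sum(losses) / len(losses)


def best_weight(rows, grid=21):
    """Grid search over w in [0, 1] for the lowest log loss on rows."""
    candidates = [i / (grid - 1) for i in range(grid)]
    return min(candidates, key=lambda w: log_loss(rows, w))


def walk_forward(rows, initial=4, step=2):
    """Expanding window: tune w on all earlier matches, score the next step.

    Returns a list of (w, out_of_sample_log_loss), one entry per fold.
    """
    results = []
    start = initial
    while start < len(rows):
        train, test = rows[:start], rows[start:start + step]
        w = best_weight(train)
        results.append((w, log_loss(test, w)))
        start += step
    return results


# Toy data (hypothetical): the Poisson model is always closer to the result.
pp = {"home": 0.60, "draw": 0.25, "away": 0.15}
pe = {"home": 0.40, "draw": 0.30, "away": 0.30}
rows = [(pp, pe, "home")] * 8
folds = walk_forward(rows)
print(folds)
```

Because each fold's weight is chosen using only earlier matches, the out-of-sample losses give a more honest picture of the blend than a single in-sample fit.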

Worked Example

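Man City (ELO 1850) vs Arsenal (ELO 1820), home advantage = 65

ELO layer:
E_home = 1 / (1 + 10^(−(1850 − 1820 + 65)/400)) = 1 / (1 + 10^(−0.2375)) = 0.633
E_home is an expected score (win = 1, draw = 0.5), so E_home = P(home) + 0.5 × P(draw). With a calibrated draw probability of 0.24:
P_elo = {home: 0.633 − 0.5×0.24 = 0.513, draw: 0.240, away: 1 − 0.513 − 0.240 = 0.247}

Poisson layer (λ_home=1.84, λ_away=1.62), summing independent Poisson score probabilities:
P_poisson = {home: 0.434, draw: 0.223, away: 0.343}

Ensemble (w=0.6):
P = {home: 0.6×0.434 + 0.4×0.513 = 0.466, draw: 0.6×0.223 + 0.4×0.240 = 0.230, away: 0.6×0.343 + 0.4×0.247 = 0.305}
The blended values sum to 1 before rounding to three decimals.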

Code Snippet

import numpy as np

OUTCOMES = ("home", "draw", "away")


def poisson_elo_ensemble(home_team, away_team, poisson_probs, elo_probs, weight=0.6):
    """Combine Poisson and ELO 1X2 probability outputs.

    home_team and away_team only identify the fixture for callers;
    they are not used in the blend itself.
    """
    ensemble = {
        k: weight * poisson_probs[k] + (1 - weight) * elo_probs[k] for k in OUTCOMES
    }
    # Renormalize in case either input does not sum exactly to 1.
    total = sum(ensemble.values())
    return {k: v / total for k, v in ensemble.items()}


def optimize_weight(validation_matches, poisson_model, elo_model):
    """Grid-search the blend weight that minimizes log loss on one validation set.

    For walk-forward tuning, call this once per chronological validation window.
    """
    # Predict once per match: model outputs do not depend on w.
    cached = [
        (home, away, poisson_model.predict(home, away), elo_model.predict(home, away), result)
        for home, away, result in validation_matches
    ]
    best_weight, best_score = 0.5, float("inf")
    for w in np.linspace(0.0, 1.0, 21):  # 0.00, 0.05, ..., with the endpoint exactly 1.0
        logloss_sum = 0.0
        for home, away, p_poisson, p_elo, result in cached:
            p_ensemble = poisson_elo_ensemble(home, away, p_poisson, p_elo, w)
            # Any result other than "home" or "draw" is treated as "away".
            outcome = result if result in ("home", "draw") else "away"
            # Clip to avoid log(0) when the blend gives the outcome zero probability.
            logloss_sum -= np.log(max(p_ensemble[outcome], 1e-15))
        if logloss_sum < best_score:
            best_score, best_weight = logloss_sum, float(w)
    return best_weight

Pitfalls

Weight optimization on small samples: World Cup (64 matches) is too small to reliably optimize ensemble weights. Borrow weights from league-level optimization.
Both models need calibration: Raw outputs from Poisson and ELO are typically overconfident. Calibrate before ensembling.
Dixon-Coles enhancement: Apply the Dixon-Coles (DC) correction to the joint probabilities of the low scores (0-0, 1-0, 0-1, 1-1), since an independent Poisson model underestimates low-scoring draws.
Stale ELO: ELO ratings need updating after every match. Ratings that lag behind recent results no longer reflect current team strength and degrade predictions.
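
The Dixon-Coles pitfall above can be illustrated with a short sketch. It applies the standard τ adjustment to the four low-score cells of an independent Poisson grid. The value `rho=-0.1` is a hypothetical placeholder (the parameter is normally estimated from data), and the `max_goals` truncation and function names are assumptions of this example.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def dc_tau(h, a, lam_home, lam_away, rho):
    """Dixon-Coles adjustment factor for the score (h, a)."""
    if h == 0 and a == 0:
        return 1 - lam_home * lam_away * rho
    if h == 0 and a == 1:
        return 1 + lam_home * rho
    if h == 1 and a == 0:
        return 1 + lam_away * rho
    if h == 1 and a == 1:
        return 1 - rho
    return 1.0


def dixon_coles_1x2(lam_home, lam_away, rho=-0.1, max_goals=10):
    """1X2 probabilities from a Poisson grid with the low-score correction.

    rho=-0.1 is an assumed illustrative value; rho=0 recovers the
    independent Poisson model.
    """
    home = draw = away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, lam_home) * poisson_pmf(a, lam_away)
            p *= dc_tau(h, a, lam_home, lam_away, rho)
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
    total = home + draw + away
    return {"home": home / total, "draw": draw / total, "away": away / total}


# Hypothetical scoring rates, for illustration only.
plain = dixon_coles_1x2(1.5, 1.1, rho=0.0)
corrected = dixon_coles_1x2(1.5, 1.1, rho=-0.1)
print(plain["draw"], corrected["draw"])
```

With a negative rho the 0-0 and 1-1 cells gain probability at the expense of 1-0 and 0-1, which raises the draw probability while leaving the total unchanged.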