Bayesian Inference in Sports Prediction

Overview

Bayesian inference is a method of statistical inference in which probabilities are updated as new evidence becomes available. In sports prediction, it lets modelers start with prior beliefs about team strength (from historical data, Elo ratings, etc.) and update those beliefs as new match results arrive.

The core mechanism is Bayes' theorem: P(θ|data) ∝ P(data|θ) × P(θ). Starting with a prior distribution over team strength, observing match results (data) produces a posterior distribution that better reflects true team quality. This is particularly powerful for sports betting because: (a) rich historical data exists to form priors, (b) match results arrive sequentially enabling continuous updating, and (c) uncertainty quantification is naturally incorporated.

Why It Matters

Bayesian approaches are foundational to modern sports prediction because:
1. Small sample handling: The World Cup group stage has only 3 matches per team, so priors keep parameter estimates from swinging wildly on small samples
2. Uncertainty quantification: Posterior distributions give credible intervals, not just point estimates, which is essential for Kelly staking
3. Hierarchical models: Share information across teams (e.g., by region) to improve estimates for teams with few matches
4. Natural combination: Multiple information sources (Elo, xG, head-to-head) combine via Bayes' theorem into a single posterior

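The key advantage over frequentist point estimation is that Bayesian methods formally incorporate prior information, which matters most when data is limited: with few matches the prior dominates the estimate, and as results accumulate the data takes over.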
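The small-sample point can be illustrated with a minimal sketch. The match results and the prior below are hypothetical values chosen for illustration, and `posterior_mean_rate` is a helper name introduced here, not part of any library.

```python
def posterior_mean_rate(goals, alpha, beta):
    """Posterior mean of a Poisson rate under a Gamma(alpha, beta) prior (beta is a rate)."""
    return (alpha + sum(goals)) / (beta + len(goals))

# Hypothetical group stage: three matches, one of them a 7-goal outlier.
goals = [7, 0, 1]
sample_mean = sum(goals) / len(goals)  # raw estimate from only 3 matches

# Hypothetical prior worth 10 pseudo-matches at 1.4 goals per match.
alpha, beta = 14.0, 10.0
prior_mean = alpha / beta
shrunk = posterior_mean_rate(goals, alpha, beta)

print(f"prior mean: {prior_mean:.2f}")
print(f"sample mean: {sample_mean:.2f}")
print(f"posterior mean: {shrunk:.2f}")
```

The posterior mean falls between the prior mean and the raw sample mean, so a single unusual result moves the estimate far less than it moves the raw average.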

Key Formula

Bayes' theorem:

$$P(\theta \mid \text{data}) = \frac{P(\text{data} \mid \theta) \times P(\theta)}{P(\text{data})}$$

Where P(θ) = prior, P(data|θ) = likelihood, P(data) = marginal likelihood (normalizing constant), P(θ|data) = posterior.
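As a rough illustration of how the four terms fit together, the sketch below evaluates Bayes' theorem numerically on a grid of candidate scoring rates. The flat prior, the grid range, and the three match results are assumptions made for this example only.

```python
import numpy as np
from scipy.stats import poisson

# Grid of candidate scoring rates (goals per match); the range is an assumption.
lam = np.linspace(0.01, 6.0, 600)
dx = lam[1] - lam[0]

# Hypothetical flat prior over the grid, normalized to integrate to 1.
prior = np.ones_like(lam)
prior /= prior.sum() * dx

# Hypothetical data: goals scored in three matches.
goals = np.array([2, 1, 3])

# Likelihood P(data | lambda): product of Poisson pmfs at each grid point.
likelihood = poisson.pmf(goals[:, None], lam[None, :]).prod(axis=0)

# Posterior = likelihood * prior / P(data), where P(data) is the normalizing constant.
unnormalized = likelihood * prior
evidence = unnormalized.sum() * dx
posterior = unnormalized / evidence

posterior_mean = (lam * posterior).sum() * dx
print(f"posterior mean rate: {posterior_mean:.2f}")
```

Grid approximation only scales to one or two parameters, but it makes the role of the normalizing constant explicit.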

For Poisson team strength (conjugate Gamma prior):

If prior: λ ~ Gamma(α, β), with α the shape and β the rate parameter, and data: goals ~ Poisson(λ) in each of n matches, then posterior: λ | data ~ Gamma(α + Σgoals, β + n)
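The update rule follows from multiplying the Poisson likelihood by the Gamma prior and collecting powers of λ. As a short derivation, writing x_i for the goals scored in match i (notation introduced here):

$$
\begin{aligned}
P(\lambda \mid x_1, \dots, x_n) &\propto \prod_{i=1}^{n} \frac{\lambda^{x_i} e^{-\lambda}}{x_i!} \times \frac{\beta^{\alpha}}{\Gamma(\alpha)} \lambda^{\alpha - 1} e^{-\beta \lambda} \\
&\propto \lambda^{\alpha + \sum_i x_i - 1} \, e^{-(\beta + n)\lambda}
\end{aligned}
$$

The last line is the kernel of a Gamma(α + Σx_i, β + n) density, so α can be read as prior pseudo-goals and β as prior pseudo-matches.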

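Posterior predictive: Integrate the Poisson likelihood over all possible λ values, weighted by the posterior density, to get the distribution of goals in a future match. For the Gamma-Poisson pair this integral has a closed form: a negative binomial distribution.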

Worked Example

Prior: Brazil's goal-scoring rate λ_Brazil ~ Gamma(α=10, β=1) (mean=10, variance=10)

Observed data: Brazil's last 5 matches: 3, 2, 4, 1, 2 goals (total=12)

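Posterior: λ_Brazil | data ~ Gamma(α'=10+12=22, β'=1+5=6) → mean=22/6=3.67

The posterior mean is a weighted average of the prior mean (10, weight 1/6) and the sample mean (12/5=2.4, weight 5/6). Because the prior mean is far above the observed rate, even this weak prior, worth a single pseudo-match, pulls the estimate well above 2.4.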

Prediction: Use the posterior predictive distribution, which averages the Poisson probability of each scoreline over all λ values weighted by the Gamma posterior.
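The worked example can be reproduced with a short sketch using `scipy.stats`. It relies on the standard result that a Gamma(α', β') mixture of Poisson distributions is negative binomial with size α' and success probability β'/(β'+1); the variable names are illustrative.

```python
from scipy.stats import gamma, nbinom

# Prior and data from the worked example.
alpha_prior, beta_prior = 10.0, 1.0
goals = [3, 2, 4, 1, 2]

# Conjugate update: add total goals to alpha, number of matches to beta.
alpha_post = alpha_prior + sum(goals)  # 22
beta_post = beta_prior + len(goals)    # 6

# scipy's gamma takes a shape and a scale, where scale = 1 / rate.
posterior = gamma(a=alpha_post, scale=1.0 / beta_post)
post_mean = posterior.mean()
ci_low, ci_high = posterior.ppf([0.025, 0.975])  # central 95% credible interval

# Posterior predictive for goals in the next match (negative binomial).
predictive = nbinom(alpha_post, beta_post / (beta_post + 1.0))

print(f"posterior mean: {post_mean:.2f}")
print(f"95% credible interval: ({ci_low:.2f}, {ci_high:.2f})")
for k in range(5):
    print(f"P(goals = {k}) = {predictive.pmf(k):.3f}")
```

The predictive distribution has the same mean as the posterior (22/6) but a larger variance than a Poisson with that mean, because it also carries the remaining uncertainty about λ.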

Code Snippet

import numpy as np
from scipy.stats import norm

def bayesian_team_strength(observed_results, prior_mean=0, prior_var=400**2):
    """
    Normal-normal conjugate update for team strength from observed match results.
    observed_results: list of (opponent_rating, margin) tuples
    prior_mean: prior strength estimate (e.g., Elo rating)
    prior_var: prior variance
    Note: each match is treated as a noisy observation of prior_mean + margin;
    opponent_rating is not used by this simplified update.
    """
    like_var = 300**2  # assumed margin variance per game
    n = len(observed_results)
    prior_precision = 1 / prior_var
    like_precision = n / like_var  # total precision of n observations

    # Each observation is the prior mean shifted by the match margin.
    obs_sum = sum(prior_mean + m for _, m in observed_results)

    post_precision = prior_precision + like_precision
    # Precision-weighted average of prior and data; obs_sum / like_var
    # avoids dividing by n, so an empty result list returns the prior.
    post_mean = (prior_mean * prior_precision + obs_sum / like_var) / post_precision
    post_var = 1 / post_precision
    return post_mean, post_var

def posterior_win_prob(home_strength, home_var, away_strength, away_var):
    """Compute P(home strength > away strength) from posterior distributions."""
    # The difference of two independent normals is normal with summed variances.
    diff_mean = home_strength - away_strength
    diff_var = home_var + away_var
    return norm.cdf(diff_mean / np.sqrt(diff_var))

# Example
home_mean, home_var = bayesian_team_strength([(1800, 1), (1750, 2)], prior_mean=1700, prior_var=400**2)
away_mean, away_var = bayesian_team_strength([(1700, -1), (1720, 0)], prior_mean=1700, prior_var=400**2)
prob = posterior_win_prob(home_mean, home_var, away_mean, away_var)
print(f"P(home win): {prob:.1%}")

Pitfalls

Prior sensitivity: Results can depend heavily on prior choice. Use informative priors from historical data and test sensitivity.
Computational cost: MCMC methods (PyMC, Stan) for complex models add significant computational overhead.
Conjugate prior availability: Not every likelihood has a conjugate prior, so non-conjugate models require numerical methods.
Calibration needed: Bayesian posterior estimates can still be miscalibrated, so always check them with calibration plots and the Brier score.
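A calibration check can be sketched in a few lines of NumPy. The forecasts and outcomes below are hypothetical, and `brier_score` and `calibration_table` are helper names introduced for this example rather than functions from an existing library.

```python
import numpy as np

def brier_score(probs, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    return float(np.mean((probs - outcomes) ** 2))

def calibration_table(probs, outcomes, n_bins=5):
    """Return (mean forecast, observed frequency, count) for each non-empty bin."""
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Interior edges map each forecast to a bin index in 0..n_bins-1.
    idx = np.digitize(probs, edges[1:-1])
    rows = []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            rows.append((float(probs[mask].mean()),
                         float(outcomes[mask].mean()),
                         int(mask.sum())))
    return rows

# Hypothetical win-probability forecasts and realized results (1 = win).
probs = [0.92, 0.81, 0.67, 0.33, 0.18, 0.55]
outcomes = [1, 1, 0, 0, 0, 1]

score = brier_score(probs, outcomes)
table = calibration_table(probs, outcomes)
print(f"Brier score: {score:.4f}")
for mean_forecast, observed, count in table:
    print(f"forecast {mean_forecast:.2f} -> observed {observed:.2f} (n={count})")
```

In a well-calibrated model the mean forecast and observed frequency in each bin are close; a real check needs far more than six forecasts for the bin frequencies to be meaningful.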