Dixon–Coles Python Walkthrough (dashee87)

Summary

The dashee87 blog post "Predicting Football Results With Statistical Modelling: Dixon-Coles and Time-Weighting" provides a complete, readable Python implementation of the Dixon–Coles model from first principles. The author builds up from a basic independent Poisson model to the full DC correction with time-weighting, using real football data and matplotlib visualizations. It is one of the clearest practical implementations available for understanding how DC works in code.

The post explicitly references Maher (1982) as the foundation and extends the model with time-weighting, which Dixon and Coles proposed but which is often omitted in simplified implementations. The author demonstrates the effect of the rho parameter on low-scoring draws and shows how time-weighting affects parameter estimates.
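
As a rough illustration of how rho shifts probability among the low-scoring results, the sketch below builds a scoreline probability matrix with and without the Dixon–Coles correction. The helper name dc_score_matrix and the values chosen for lam, mu and rho are illustrative assumptions, not taken from the post.

```python
import numpy as np
from scipy.stats import poisson

def dc_score_matrix(lam, mu, rho, max_goals=10):
    """Return P(home=h, away=a) for h, a in 0..max_goals (hypothetical helper)."""
    goals = np.arange(max_goals + 1)
    # Independent Poisson baseline: outer product of the two marginals
    probs = np.outer(poisson.pmf(goals, lam), poisson.pmf(goals, mu))
    # The Dixon-Coles tau adjustment touches only the four low-scoring cells
    probs[0, 0] *= 1 - lam * mu * rho
    probs[0, 1] *= 1 + lam * rho
    probs[1, 0] *= 1 + mu * rho
    probs[1, 1] *= 1 - rho
    return probs

# Illustrative values only; a negative rho raises 0-0 and 1-1
independent = dc_score_matrix(1.4, 1.1, rho=0.0)
corrected = dc_score_matrix(1.4, 1.1, rho=-0.1)
print(f"P(0-0): {independent[0, 0]:.4f} -> {corrected[0, 0]:.4f}")
print(f"P(1-1): {independent[1, 1]:.4f} -> {corrected[1, 1]:.4f}")
```

The four adjustments cancel, so the corrected matrix sums to the same total as the independent one; rho only moves probability between 0-0, 0-1, 1-0 and 1-1.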

Key Concepts

  • Building from Maher (1982): The post starts with the basic independent Poisson model and shows exactly where DC adds its correction
  • Time-weighting implementation: Full implementation of the exponential decay weighting: w = exp(-δ × days_since_match)
  • Maximum likelihood in code: Shows the negative log-likelihood function and scipy.optimize.minimize usage
  • rho estimation: Shows how rho is estimated alongside attack/defense parameters via MLE
  • Visualization: Shows the effect of time-weighting on team strength estimates and the effect of rho on low-scoring draw probabilities
  • Practical considerations: Discusses choosing δ (decay parameter), handling sparse data, and convergence issues in optimization
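
The decay weighting above can be sketched in a few lines. The helper names time_weights and delta_from_half_life, the example dates, and the one-year half-life are assumptions made for illustration; the post's own choice of δ is not reproduced here.

```python
import numpy as np
from datetime import date

def time_weights(match_dates, reference_date, delta):
    """Exponential-decay weights w = exp(-delta * days_since_match)."""
    days = np.array([(reference_date - d).days for d in match_dates], dtype=float)
    return np.exp(-delta * days)

def delta_from_half_life(half_life_days):
    """Decay rate at which a match's weight halves after half_life_days."""
    return np.log(2) / half_life_days

# Hypothetical match dates, weighted relative to the most recent one
matches = [date(2023, 6, 1), date(2024, 1, 1), date(2024, 6, 1)]
delta = delta_from_half_life(365)
weights = time_weights(matches, date(2024, 6, 1), delta)
print(weights)
```

Expressing δ through a half-life is one way to make the parameter easier to reason about: a larger δ (shorter half-life) tracks recent form more closely but leaves fewer effective matches per team.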

Key Code Patterns

from scipy.optimize import minimize
from scipy.stats import poisson
import numpy as np

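def dc_log_likelihood(params, home_team, away_team, home_goals, away_goals,
                      match_days, max_day, n_teams, delta):
    """Negative time-weighted log-likelihood for the Dixon-Coles model.

    params = [attack (n_teams), defense (n_teams), home advantage, rho].
    home_team / away_team hold the integer team index for each match.
    """
    alpha = params[:n_teams]              # attack parameters
    beta = params[n_teams:2*n_teams]      # defense parameters
    gamma = params[2*n_teams]             # home advantage
    rho = params[2*n_teams + 1]           # low-score dependence, estimated with the rest

    ll = 0.0
    for i in range(len(home_goals)):
        h, a = home_goals[i], away_goals[i]
        lam = np.exp(alpha[home_team[i]] - beta[away_team[i]] + gamma)  # expected home goals
        mu = np.exp(alpha[away_team[i]] - beta[home_team[i]])           # expected away goals

        # Time weight: matches further from max_day count less
        days_elapsed = (max_day - match_days[i]).days
        weight = np.exp(-delta * days_elapsed)

        # Base Poisson (independent home and away goals)
        p_base = poisson.pmf(h, lam) * poisson.pmf(a, mu)

        # DC correction tau(h, a): only 0-0, 0-1, 1-0 and 1-1 are adjusted
        if h == 0 and a == 0:
            tau = 1 - lam * mu * rho
        elif h == 0 and a == 1:
            tau = 1 + lam * rho
        elif h == 1 and a == 0:
            tau = 1 + mu * rho
        elif h == 1 and a == 1:
            tau = 1 - rho
        else:
            tau = 1.0

        # Floor the probability so log() stays finite if tau <= 0 during the search
        ll += weight * np.log(max(p_base * tau, 1e-300))

    return -ll

# Optimization: init_params has 2*n_teams + 2 entries, with rho last
result = minimize(dc_log_likelihood, init_params, args=(...),
                  method='L-BFGS-B',
                  bounds=[(0.1, 5)] * (2*n_teams + 1) + [(-0.5, 0.5)])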
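
The per-match loop in the listing above can also be written with NumPy array operations, which is usually much faster on a full season of data. The following is a minimal sketch under stated assumptions: the function name dc_nll_vectorized, the toy match data, the δ value of 0.002, and the (-3, 3) bounds on the log-scale parameters are all made up for illustration, and the correction factor is the τ function from Dixon and Coles (1997) rather than code copied from the post.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import poisson

def dc_nll_vectorized(params, home_team, away_team, home_goals, away_goals,
                      days_elapsed, n_teams, delta):
    """Vectorized negative time-weighted log-likelihood (hypothetical helper)."""
    alpha = params[:n_teams]               # attack
    beta = params[n_teams:2 * n_teams]     # defense
    gamma = params[2 * n_teams]            # home advantage
    rho = params[2 * n_teams + 1]          # low-score dependence

    lam = np.exp(alpha[home_team] - beta[away_team] + gamma)
    mu = np.exp(alpha[away_team] - beta[home_team])

    h, a = home_goals, away_goals
    tau = np.ones_like(lam)
    tau = np.where((h == 0) & (a == 0), 1 - lam * mu * rho, tau)
    tau = np.where((h == 0) & (a == 1), 1 + lam * rho, tau)
    tau = np.where((h == 1) & (a == 0), 1 + mu * rho, tau)
    tau = np.where((h == 1) & (a == 1), 1 - rho, tau)

    weights = np.exp(-delta * days_elapsed)
    # Clip tau so the log stays finite if the optimizer wanders into tau <= 0
    log_p = (poisson.logpmf(h, lam) + poisson.logpmf(a, mu)
             + np.log(np.clip(tau, 1e-12, None)))
    return -np.sum(weights * log_p)

# Made-up toy data: 3 teams, 6 matches (team indices, scores, match age in days)
n_teams = 3
home_team = np.array([0, 1, 2, 0, 1, 2])
away_team = np.array([1, 2, 0, 2, 0, 1])
home_goals = np.array([2, 1, 0, 1, 3, 1])
away_goals = np.array([0, 1, 0, 1, 1, 2])
days_elapsed = np.array([300.0, 250.0, 200.0, 100.0, 50.0, 0.0])
args = (home_team, away_team, home_goals, away_goals, days_elapsed, n_teams, 0.002)

init_params = np.zeros(2 * n_teams + 2)
bounds = [(-3, 3)] * (2 * n_teams + 1) + [(-0.5, 0.5)]  # assumed bounds, rho last
result = minimize(dc_nll_vectorized, init_params, args=args,
                  method="L-BFGS-B", bounds=bounds)
print(result.fun, result.x[-1])
```

With only six invented matches the fitted values carry no meaning; the point is the shape of the call. Note also that attack and defense enter only through their difference, so they are identified only up to a common shift unless an extra constraint is imposed.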

Notes

  • This is the best practical Python implementation reference for the Dixon–Coles model — the existing dixon-coles-correction.md note's code is similar but this blog provides more narrative context
  • The time-weighting implementation is particularly valuable for World Cup modeling where recent form matters
  • The post demonstrates the effect of rho empirically: without rho, 0-0 draws are under-predicted by ~15%
  • Author references both Maher (1982) and Dixon–Coles (1997) as foundations — good for understanding the academic lineage
  • This source adds a worked example with real data visualization that the academic paper doesn't provide