Sports Models KB
Compiled knowledge base — 62 articles
Sources
API-Football
API-Football (operated by api-sports.io) is one of the most comprehensive football data APIs available, covering over 1,200 competitions worldwide. It provides fixtures, match results, player and team…
Arpad Elo — The Rating of Chessplayers, Past & Present
Arpad Elo's 1978 book "The Rating of Chessplayers, Past & Present" is the original authoritative source on the Elo rating system, written by its inventor. The book provides the complete mathematical d…
Backtesting Framework
A backtesting framework systematically evaluates a sports betting model by simulating how it would have performed on historical data. The framework must mimic real deployment conditions: using only in…
Bayesian Inference in Sports Prediction
Bayesian inference is a method of statistical inference where probabilities are updated as new evidence becomes available. In sports prediction, it allows modelers to start with prior beliefs about te…
Bradley–Terry Model
The Bradley–Terry model is a probability model for predicting the outcome of pairwise comparisons between items, teams, or objects. Given two items i and j with positive strength parameters p_i and p_…
Brier (1950) — Original Paper Reference
Glenn W. Brier's 1950 paper "Verification of a Forecast Expressed in Terms of Probability" (Monthly Weather Review, Vol. 78, No. 1, pp. 1–3) introduced the Brier score as a proper scoring rule for eva…
Brier Score
The Brier score is a strictly proper scoring rule for measuring the accuracy of probabilistic predictions. For binary outcomes, it is the mean squared error between predicted probabilities and actual …
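As a minimal sketch of the binary case described above (the function name `brier_score` is illustrative, not taken from any article in this knowledge base), the score can be computed as the mean squared difference between each predicted probability and its 0/1 outcome:

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


# Lower is better: 0.0 is a perfect forecast, and always predicting 0.5 scores 0.25.
perfect = brier_score([1.0, 0.0], [1, 0])
coin_flip = brier_score([0.5, 0.5], [1, 0])
```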
Building Reproducible xG Models from StatsBomb Open Data
This academic paper (available on ResearchGate) presents a reproducible xG modeling pipeline using StatsBomb's open data, comparing logistic regression and mixed-effects models for xG prediction. The …
Calibration Plots
Calibration plots (also called reliability diagrams or calibration curves) are visual tools for checking whether a model's predicted probabilities match actual outcome frequencies. A perfectly calibra…
Closing Line Value (CLV)
Closing line value (CLV) is the difference between the odds a bettor locked in when placing a bet and the final odds the sportsbook offered right before the game started. It is one of the most importa…
De-vigging (Removing the Vigorish)
De-vigging (also called "removing the vig" or "de-minting") is the process of converting bookmaker odds with built-in margin into "fair" odds that reflect the true implied probabilities. The sportsboo…
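One common way to do this is proportional (multiplicative) normalization: convert each decimal price to an implied probability, then rescale so the probabilities sum to 1. The sketch below assumes decimal odds and uses a hypothetical helper name; other de-vigging methods exist and can give different fair prices.

```python
def devig_proportional(decimal_odds):
    """Remove the bookmaker margin by rescaling implied probabilities to sum to 1."""
    implied = [1.0 / price for price in decimal_odds]
    overround = sum(implied)  # greater than 1 when the book carries a margin
    return [p / overround for p in implied]


# A two-way market priced 1.91 / 1.91 has a margin of about 4.7%;
# after normalization each side is a fair 50%.
fair_probs = devig_proportional([1.91, 1.91])
```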
Dixon & Coles (1997) — Original Paper
Mark J. Dixon and Stuart G. Coles' 1997 paper "Modelling Association Football Scores and Inefficiencies in the Football Betting Market" (Journal of the Royal Statistical Society: Series C, Vol. 46, No…
Dixon–Coles Correction
The Dixon–Coles model is a statistical framework for modeling association football (soccer) match outcomes. It was introduced by Mark Dixon and Stuart Coles in their 1997 paper "Modelling Association Foo…
Dixon–Coles Python Walkthrough (dashee87)
The dashee87 blog post "Predicting Football Results With Statistical Modelling: Dixon-Coles and Time-Weighting" provides a complete, readable Python implementation of the Dixon–Coles model from first …
Elo Rating System
The Elo rating system is a method for calculating the relative skill levels of players in zero-sum two-player games. It was invented by Hungarian-American chess master and physics professor Arpad Elo …
Expected Goals (xG)
Expected goals (xG) is a statistical metric in association football that assigns a probability to each shot resulting in a goal. By summing these probabilities across a match, season, or set of shots,…
Expected Value (EV)
Expected value (EV) in sports betting is the average amount a bettor expects to win or lose per unit wagered if they placed the same bet many times. It is the fundamental mathematical concept behind p…
FIFA Elo Ranking Methodology
The FIFA World Ranking system uses an Elo-based methodology (officially called the "SUM method") for international football, adapted from the classic Elo system with several important modifications: n…
FiveThirtyEight NBA Elo Methodology
FiveThirtyEight published detailed methodology documents for their sports prediction models, including the NBA Elo system. Their approach adapts the classic Elo formula with sport-specific modificatio…
Football-Data.co.uk
Football-Data.co.uk is a free resource providing computer-ready historical football results, match statistics, and betting odds data for 22+ European football divisions across 25+ seasons, dating back…
Forecasting: Principles and Practice — Time Series Cross-Validation
"Forecasting: Principles and Practice" (3rd edition, Hyndman & Athanasopoulos) is the authoritative open-source textbook on forecasting, and its chapter on time series cross-validation is the definiti…
Glicko-2 Original Paper (Mark Glickman)
Mark Glickman's original Glicko-2 paper (available at glicko.net) provides the complete mathematical specification of the Glicko-2 rating system, including the iterative algorithm for estimating the r…
Glicko-2 Rating System
The Glicko rating system was invented by Mark Glickman in 1995 as an improvement on the Elo rating system. Its principal innovation is the introduction of a "rating deviation" (RD) — a measure of a pl…
Half-Kelly
Half-Kelly is the practice of betting 50% of the mathematically optimal Kelly fraction. It is the industry standard for sports betting applications because it strikes a balance between growth maximiza…
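A minimal sketch of the sizing rule, assuming decimal odds d and an estimated win probability p, for which the full Kelly fraction of bankroll is (p*d - 1) / (d - 1). The function names are illustrative, and the 0.5 multiplier is the half-Kelly scaling described above.

```python
def kelly_fraction(win_prob, decimal_odds):
    """Full Kelly stake as a fraction of bankroll; negative means no edge."""
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must be greater than 1")
    return (win_prob * decimal_odds - 1.0) / (decimal_odds - 1.0)


def half_kelly_stake(win_prob, decimal_odds, bankroll):
    """Stake half of the Kelly fraction, never betting when the edge is negative."""
    fraction = max(kelly_fraction(win_prob, decimal_odds), 0.0)
    return 0.5 * fraction * bankroll


# 55% win probability at even money (2.00): full Kelly is 10% of bankroll,
# so half-Kelly stakes 5%.
stake = half_kelly_stake(0.55, 2.0, bankroll=1000.0)
```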
Kaggle Walk-Forward Validation Notebook
This Kaggle notebook provides a practical Python implementation of walk-forward validation for time series, demonstrating the technique on financial data with expanding and rolling windows. The notebo…
Kelly (1956) — A New Interpretation of Information Rate
John Larry Kelly Jr.'s 1956 paper "A New Interpretation of Information Rate" (Bell System Technical Journal, Vol. 35, pp. 917–926), available via Princeton's website, is the original publication intro…
Kelly Criterion
The Kelly criterion (or Kelly bet/Kelly strategy) is a formula for sizing a sequence of bets to maximize the long-term expected value of the logarithm of wealth (equivalently, to maximize the long-ter…
Kelly Criterion Blog (Stanford)
This Stanford blog post provides one of the clearest accessible explanations of the Kelly criterion, walking through the original Kelly (1956) paper's key insights with illuminating examples. The author …
Line Movement & Steam Moves
Line movement refers to how sportsbook odds change over time from when they open until the market closes (game time). These changes reflect new information (injuries, weather, lineup changes), betting…
Log-Loss (Cross-Entropy)
Log-loss (also called cross-entropy loss or logarithmic loss) is a scoring rule that measures the quality of probabilistic predictions. For binary outcomes, it is the negative average of log-probabili…
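For the binary case, a short sketch of the calculation follows. The function name and the clipping constant `eps` are assumptions added for illustration; clipping only guards against taking the logarithm of zero.

```python
import math


def binary_log_loss(probs, outcomes, eps=1e-15):
    """Negative mean log-probability assigned to the outcomes that occurred."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equal length")
    total = 0.0
    for p, o in zip(probs, outcomes):
        p = min(max(p, eps), 1.0 - eps)  # keep p strictly inside (0, 1)
        total += o * math.log(p) + (1 - o) * math.log(1.0 - p)
    return -total / len(probs)


# Always predicting 0.5 scores ln(2); confident wrong predictions are punished heavily.
baseline = binary_log_loss([0.5, 0.5], [1, 0])
```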
Maher (1982) — Modelling Association Football Scores
M.J. Maher's 1982 paper "Modelling Association Football Scores" (Statistica Neerlandica, Vol. 36, pp. 109–118) is the foundational academic work establishing the independent Poisson model for football…
Market Efficiency (Sharp vs. Soft Bookmakers)
Market efficiency in sports betting refers to how accurately and quickly bookmaker odds reflect the true probabilities of outcomes. An efficient market prices odds such that the sum of implied probabi…
Massey Ratings
Massey ratings are a method of ranking sports teams based on their performance history, invented by Kenneth Massey. Unlike Elo, which updates incrementally after each game, Massey ratings solve a syste…
ML-KULeuven soccer_xg — Open-Source xG Implementation
The soccer_xg GitHub repository by KU Leuven's ML group is an open-source Python package for training and analyzing expected goals (xG) models using the SPADL (Soccer Player Action Description Language) event stream…
Multi-Sport Prediction Stack
A multi-sport prediction stack is a unified modeling architecture that shares infrastructure, feature engineering, and calibration logic across football (soccer), NFL, NBA, and other sports. Rather th…
OddsJam
OddsJam is a sports betting data platform built specifically for serious sports bettors and model builders. Unlike broad sports media APIs, OddsJam is focused on odds comparison, market efficiency ana…
Overfitting in Sports Models
Overfitting in sports prediction models occurs when the model learns the noise and specific details of the training data to such an extent that it negatively impacts performance on new, unseen data. I…
Pinnacle API
Pinnacle (formerly Pinnacle Sports) is a sharp bookmaker known for offering the most efficient odds in the industry, particularly for football, American sports, and tennis. Their business model is hig…
Pinnacle CLV — Official Educational Article
Pinnacle's official betting education article on Closing Line Value (CLV) is the authoritative reference for why CLV is the gold standard metric for validating sports betting model quality. The articl…
Pinnacle No-Vig Price (NVP) — De-vigging Guide
The Pinnacle Odds Dropper's guide on de-vigging Pinnacle odds is a practical, worked-example guide to removing the vigorish from Pinnacle's sharp odds to get fair probabilities. Pinnacle is the sharpe…
Poisson Distribution
The Poisson distribution is a discrete probability distribution that models the probability of a given number of events occurring in a fixed interval of time, given a known constant mean rate and inde…
Poisson-Elo Ensemble
The Poisson-Elo ensemble combines two complementary modeling approaches (Poisson regression for goal-scoring rates and Elo rating systems for team strength) via a weighted average to produce match p…
Polymarket
Polymarket is a decentralized prediction market built on the Polygon blockchain where users trade shares in the outcomes of events. Each market has a binary outcome (e.g., "Team A wins the World Cup: …
scikit-learn Probability Calibration Documentation
scikit-learn's official documentation on probability calibration is the authoritative reference for implementing calibration plots and post-hoc calibration methods in Python. It covers: (1) the calibr…
scipy.stats.poisson — Official Documentation
scipy.stats.poisson is the standard Python implementation of the Poisson distribution and underpins many Poisson-based sports prediction models. It provides pmf(k, mu) fo…
Sportradar
Sportradar is a professional-grade sports data company that provides official statistics, live game data, player tracking, and betting odds to media companies, sportsbooks, leagues, and data-driven bu…
The Odds API
The Odds API is a sports betting odds aggregation API that provides real-time and historical odds from over 80 bookmakers across major sports. It is one of the most widely used data sources for sports…
Unabated CLV Calculator & Education
Unabated is a sports betting analytics platform that provides CLV calculators and educational content emphasizing CLV as the primary validation metric for betting models. Their key message: CLV is a f…
Upwork Intake: Sports Prediction MVP
This is the primary project intake document describing a World Cup sports betting prediction system commissioned via Upwork. The client wants an automated value-bet identification system combining a P…
Value Bet Identification
A value bet is a wager where the bettor's estimated probability of an outcome exceeds the bookmaker's implied probability, creating positive expected value (+EV). The identification of value bets is t…
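The comparison above can be sketched as follows, assuming decimal odds and a raw (not de-vigged) implied probability of 1 / odds. The function names and the `min_edge` threshold are hypothetical, added only for illustration.

```python
def implied_probability(decimal_odds):
    """Bookmaker's implied probability for a decimal price (margin included)."""
    return 1.0 / decimal_odds


def expected_value(model_prob, decimal_odds):
    """Expected profit per unit staked: win (odds - 1) with prob p, lose 1 otherwise."""
    return model_prob * decimal_odds - 1.0


def is_value_bet(model_prob, decimal_odds, min_edge=0.0):
    """True when the model's probability beats the implied probability by min_edge."""
    return model_prob - implied_probability(decimal_odds) > min_edge


# A model probability of 50% against a price of 2.20 (implied about 45.5%)
# gives an expected value of about +10% per unit staked.
ev = expected_value(0.50, 2.20)
```

In practice a positive `min_edge` is often used so that small estimation errors in the model probability do not trigger bets.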
Walk-Forward Validation
Walk-forward validation (also called walk-forward optimization or rolling forward) is the gold standard for validating sports betting models. Unlike k-fold cross-validation which randomly splits data,…
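A minimal sketch of an expanding-window split, assuming the matches are already sorted chronologically. The generator name and its parameters are illustrative; each fold trains only on rows that precede its test block, so no future information leaks into training.

```python
def expanding_window_splits(n_samples, initial_train, test_size):
    """Yield (train_indices, test_indices) pairs in chronological order."""
    start = initial_train
    while start + test_size <= n_samples:
        train_idx = list(range(0, start))                # everything seen so far
        test_idx = list(range(start, start + test_size))  # the next unseen block
        yield train_idx, test_idx
        start += test_size                               # grow the training window


# 10 matches, first 4 used for the initial fit, evaluated 3 at a time.
splits = list(expanding_window_splits(n_samples=10, initial_train=4, test_size=3))
```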
Why CLV Is the Sharps' Preferred Metric (Pinnacle Odds Dropper)
The Pinnacle Odds Dropper's article on Closing Line Value explains why professional bettors ("sharps") use CLV as their primary validation metric instead of ROI. The article's central thesis: the clos…
Why Most AI Betting Models Fail in Football
This article from PerformanceOdds analyzes why most AI/machine learning betting models fail in football prediction, with a primary focus on overfitting as the root cause. The author argues that the co…
Concepts
Bayesian Inference in Sports Prediction
2026-06-09 · "concept" · internal:kb-synthesis
Bayesian inference is a method of statistical inference where probabilities are updated as new evidence becomes available. In sports prediction, it allows modelers to start with prior beliefs about te…
Bradley–Terry Model
2026-06-09 · "concept" · internal:kb-synthesis
The Bradley–Terry model is a probability model for predicting the outcome of pairwise comparisons between items, teams, or objects. Given two items i and j with positive strength parameters p_i and p_…
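In its standard form, the model gives the probability that item i beats item j as p_i / (p_i + p_j). A short sketch of that formula, with an illustrative function name:

```python
def bradley_terry_win_prob(strength_i, strength_j):
    """Probability that i beats j under the standard Bradley-Terry form."""
    if strength_i <= 0 or strength_j <= 0:
        raise ValueError("strength parameters must be positive")
    return strength_i / (strength_i + strength_j)


# A team twice as strong as its opponent is expected to win two times in three.
prob = bradley_terry_win_prob(2.0, 1.0)
```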
Concepts Index
bayesian-inference-sports — Bayesian Inference in Sports Prediction: method of statistical inference updating probabilities as new evidence becomes available; foundational to modern sports prediction …
Dixon–Coles Correction
2026-06-09 · "concept" · internal:kb-synthesis
The Dixon–Coles model is a statistical framework for modeling association football match outcomes, introduced by Mark Dixon and Stuart Coles in their 1997 paper "Modelling Association Football Scores and…
Elo Rating System
2026-06-09 · "concept" · internal:kb-synthesis
The Elo rating system is a method for calculating the relative skill levels of players in zero-sum two-player games. It was invented by Hungarian-American chess master Arpad Elo in 1959–1960, replacin…
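A minimal sketch of the classic update, using the standard logistic expected-score formula on a 400-point scale. The function names and the K-factor of 20 are assumptions chosen for illustration; real systems tune K and often add sport-specific terms such as home advantage.

```python
def elo_expected(rating_a, rating_b):
    """Expected score of A against B (1 = win, 0.5 = draw, 0 = loss)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a, rating_b, score_a, k=20.0):
    """Return both updated ratings after a game where A scored score_a."""
    delta = k * (score_a - elo_expected(rating_a, rating_b))
    return rating_a + delta, rating_b - delta  # zero-sum: B loses what A gains


# Two equally rated players: the winner gains K/2 points and the loser drops K/2.
new_a, new_b = elo_update(1500.0, 1500.0, score_a=1.0)
```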
Expected Goals (xG)
2026-06-09 · "concept" · internal:kb-synthesis
Expected goals (xG) is a statistical metric in association football that assigns a probability to each shot resulting in a goal. By summing these probabilities across a match, season, or set of shots,…
Glicko-2 Rating System
2026-06-09 · "concept" · internal:kb-synthesis
The Glicko rating system was invented by Mark Glickman in 1995 as an improvement on the Elo rating system. Its principal innovation is the introduction of a "rating deviation" (RD) — a measure of a …
Massey Ratings
2026-06-09 · "concept" · internal:kb-synthesis
Massey ratings are a method of ranking sports teams based on their performance history, invented by Kenneth Massey. Unlike Elo, which updates incrementally after each game, Massey ratings…
Poisson Distribution
2026-06-09 · "concept" · internal:kb-synthesis
The Poisson distribution is a discrete probability distribution that models the probability of a given number of events occurring in a fixed interval, given a known constant mean rate (λ). It is named…
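As an illustration, the probability mass function P(k) = λ^k e^(-λ) / k! can be evaluated with the standard library alone. The function name and the example rate of 1.5 goals per match are assumptions made for this sketch.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k events when the mean rate is lam."""
    if k < 0 or lam < 0:
        raise ValueError("k and lam must be non-negative")
    return math.exp(-lam) * lam ** k / math.factorial(k)


# Probability that a team averaging 1.5 goals per match scores 0, 1, 2, ... goals.
goal_probs = [poisson_pmf(k, 1.5) for k in range(11)]
```

The same values are available from `scipy.stats.poisson.pmf(k, mu)` when SciPy is installed.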