Football-Data.co.uk
Overview
Football-Data.co.uk is a free resource providing computer-ready historical football results, match statistics, and betting odds for 22+ European football divisions across 30+ seasons, dating back to 1993/94. It is one of the most widely used free datasets for building and backtesting football prediction models. Data is available as CSV files organized by league and season, downloadable directly from the website.
All data is free — no API key, no registration, no rate limits. The dataset is widely cited in academic and hobbyist sports modeling work. The main limitation is that data is historical and batch-download only — there is no real-time or API access.
Data Coverage
| Data Type | Coverage | Historical Depth |
|---|---|---|
| Match results (FT/HT) | 22+ divisions | From 1993/94 (~31 seasons) |
| Match statistics | Major leagues | From 2000/01 (~24 seasons) |
| Betting odds | 10+ bookmakers | From 2000/01 (~24 seasons) |
| Closing odds | Pinnacle + market avg | From 2005/06 (~19 seasons) |
| Asian handicap odds | Major leagues | From 2005/06 |
Leagues covered: Premier League, Championship, League 1/2, Scottish Premiership, Bundesliga 1/2, La Liga 1/2, Serie A/B, Ligue 1/2, Eredivisie, Primeira Liga, Jupiler League, and more.
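The start seasons in the table can be encoded as a small lookup so that a backtest does not ask for data types that did not exist yet. This is only a sketch: the dictionary keys and the `available_data` helper are hypothetical names, and the years are copied from the table above rather than checked against each league's files.

```python
# First season (by starting year) for each data type, copied from the table above.
# Coverage also varies by league, which this sketch ignores.
FIRST_SEASON_START = {
    "results": 1993,         # 1993/94
    "match_stats": 2000,     # 2000/01
    "betting_odds": 2000,    # 2000/01
    "closing_odds": 2005,    # 2005/06
    "asian_handicap": 2005,  # 2005/06
}


def available_data(season_start_year):
    """Return the data types the table lists for a season starting in the given year."""
    return sorted(
        name
        for name, first in FIRST_SEASON_START.items()
        if season_start_year >= first
    )


print(available_data(2003))
```

A season starting in 2003 falls after the 2000/01 rows but before the 2005/06 rows, so only results, match statistics, and betting odds are returned.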
File Format
CSV files follow a consistent naming convention: E0.csv (Premier League), E1.csv (Championship), D1.csv (Bundesliga), etc.
Results file columns: Date, HomeTeam, AwayTeam, FTHG, FTAG, FTR, HTHG, HTAG, HTR
Odds file columns: Date, HomeTeam, AwayTeam, B365H, B365D, B365A (Bet365 odds)
Closing odds columns insert a 'C' before the outcome letter (H, D, A): B365CH, B365CD, B365CA
Pinnacle closing odds: PSCH, PSCD, PSCA
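Picking closing-odds columns out of a header by name alone is error-prone, because some bookmaker codes themselves end in C. The sketch below assumes closing 1X2 columns are named prefix + "C" + outcome letter, and that the matching pre-match column (prefix + outcome letter) sits in the same file. The function name and the sample header are illustrative, not taken from a specific file.

```python
def group_closing_columns(columns):
    """Map bookmaker prefix -> {outcome letter: closing column name}.

    Assumes closing 1X2 columns are named <prefix>C<outcome> and that the
    matching pre-match column <prefix><outcome> is present in the same header.
    """
    present = set(columns)
    grouped = {}
    for col in columns:
        prefix, marker, outcome = col[:-2], col[-2:-1], col[-1:]
        if marker != "C" or outcome not in ("H", "D", "A"):
            continue
        # Require the pre-match twin so that a pre-match column such as "VCH"
        # (bookmaker code ending in C) is not mistaken for closing odds.
        if prefix + outcome not in present:
            continue
        grouped.setdefault(prefix, {})[outcome] = col
    return grouped


# Illustrative header, not copied from a specific file.
header = [
    "Date", "HomeTeam", "AwayTeam", "FTHG", "FTAG", "FTR",
    "B365H", "B365D", "B365A", "VCH", "VCD", "VCA",
    "B365CH", "B365CD", "B365CA", "VCCH", "VCCD", "VCCA",
]
print(group_closing_columns(header))
```

Requiring the pre-match twin is a design choice: it trades a little strictness for not having to maintain a list of bookmaker codes.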
Python Integration
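```python
import pandas as pd

# Season files live under /mmz4281/<season>/<league>.csv
BASE_URL = "https://www.football-data.co.uk/mmz4281"


def load_football_data(seasons, league="E0"):
    """Load results and odds for one league across several seasons."""
    dfs = []
    for s in seasons:
        # Season codes are four digits, e.g. "2324" for 2023/24
        url = f"{BASE_URL}/{s}/{league}.csv"
        df = pd.read_csv(url)
        df["season"] = s
        dfs.append(df)
    return pd.concat(dfs, ignore_index=True)


# Load multiple seasons
seasons = ["2425", "2324", "2223"]
df = load_football_data(seasons, "E0")

# Closing 1X2 odds: bookmaker prefix + "C" + outcome letter (H/D/A).
# Requiring the pre-match twin (prefix + outcome) skips columns like "VCH".
close_cols = [
    c for c in df.columns
    if c[-2:] in ("CH", "CD", "CA") and c[:-2] + c[-1] in df.columns
]
print(f"Closing odds columns: {close_cols}")
# e.g. ['B365CH', 'B365CD', 'B365CA', 'PSCH', 'PSCD', 'PSCA', ...]
```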
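The season strings passed to `load_football_data` are four-digit codes made of the last two digits of the starting and ending years. Writing them by hand gets tedious over a long backtest window, so a small generator helps. The helper names here are hypothetical.

```python
def season_code(start_year):
    """Return the four-digit season code, e.g. 2023 -> "2324"."""
    return f"{start_year % 100:02d}{(start_year + 1) % 100:02d}"


def season_codes(first_start_year, last_start_year):
    """Return codes for every season from first to last start year, inclusive."""
    return [season_code(y) for y in range(first_start_year, last_start_year + 1)]


print(season_codes(2022, 2024))  # ['2223', '2324', '2425']
```

The modulo arithmetic also covers the century rollover: the 1999/2000 season becomes "9900".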
Notes
- The closing odds (columns with a C before the outcome letter, e.g. B365CH) are the most valuable data for backtesting: they represent the market consensus just before kickoff
- Pinnacle closing odds (PSCH, PSCD, PSCA) are the best benchmark for CLV validation
- Note: As of mid-2025, Pinnacle's public API has become unreliable. Use Bet365 or market average closing odds as a proxy for sharp closing lines
- No shot-level xG data — match stats (shots, shots on target) can approximate xG at team level
- Data is updated weekly during the season — not real-time
- No API: download files manually or fetch the CSV URLs from a script
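To make the closing-odds notes concrete, the sketch below turns decimal closing odds into margin-free probabilities and computes closing line value (CLV) for a single bet. It uses simple proportional normalisation and one common CLV definition (price taken divided by closing price, minus one). Other conventions exist, and the function names and odds values are made up for illustration.

```python
def implied_probabilities(home, draw, away):
    """Convert decimal 1X2 odds to probabilities that sum to 1.

    Uses proportional normalisation to strip the bookmaker margin; other
    de-margining methods exist and give slightly different numbers.
    """
    raw = [1.0 / home, 1.0 / draw, 1.0 / away]
    overround = sum(raw)  # exceeds 1 by the bookmaker margin
    return [p / overround for p in raw]


def closing_line_value(odds_taken, closing_odds):
    """Relative CLV: positive when the price taken beat the closing price."""
    return odds_taken / closing_odds - 1.0


# Made-up closing odds for home / draw / away.
probs = implied_probabilities(2.10, 3.40, 3.60)
print([round(p, 3) for p in probs])

# A bet placed at 2.20 that closed at 2.10 beat the closing line.
print(round(closing_line_value(2.20, 2.10), 4))
```

In a backtest, the same two functions can be applied row by row to whichever closing columns are chosen as the benchmark.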