Football-Data.co.uk

Overview

Football-Data.co.uk is a free resource providing computer-ready historical football results, match statistics, and betting odds for 22+ European divisions, with results going back to the 1993/94 season (more than 30 seasons). It is widely used as a starting dataset for building and backtesting football prediction models. Data is published as CSV files, one per league and season, downloadable directly from the website.

All data is free: no API key, no registration, no rate limits. The dataset is widely cited in academic and hobbyist sports modeling work. The main limitation is that data is historical and batch-download only; there is no real-time feed and no query API.

Data Coverage

Data Type               Coverage                Historical Depth
Match results (FT/HT)   22+ divisions           From 1993/94 (~31 seasons)
Match statistics        Major leagues           From 2000/01 (~24 seasons)
Betting odds            10+ bookmakers          From 2000/01 (~24 seasons)
Closing odds            Pinnacle + market avg   From 2005/06 (~19 seasons)
Asian handicap odds     Major leagues           From 2005/06

Leagues covered: Premier League, Championship, League 1/2, Scottish Premiership, Bundesliga 1/2, La Liga 1/2, Serie A/B, Ligue 1/2, Eredivisie, Primeira Liga, Jupiler League, and more.

File Format

CSV files follow a consistent naming convention: E0.csv (Premier League), E1.csv (Championship), D1.csv (Bundesliga), etc.
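
The naming convention can be turned into download URLs with a small helper. The sketch below is illustrative only: the function names are hypothetical, and the `mmz4281` folder in the path is an assumption based on the site's download links, so check it against the site before relying on it.

```python
def season_code(start_year: int) -> str:
    """Return the four-digit season code, e.g. 2023 -> '2324', 1999 -> '9900'."""
    return f"{start_year % 100:02d}{(start_year + 1) % 100:02d}"


def csv_url(start_year: int, division: str = "E0") -> str:
    """Build a download URL for one division and season.

    The path layout (base folder, then season code, then division file)
    is an assumption; verify it against the links on the website.
    """
    base = "https://www.football-data.co.uk/mmz4281"  # assumed folder name
    return f"{base}/{season_code(start_year)}/{division}.csv"


print(csv_url(2023, "E0"))
# https://www.football-data.co.uk/mmz4281/2324/E0.csv
```

Keeping the season code separate from the URL makes it easy to loop over a range of start years and several division codes.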

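Each CSV contains results and odds side by side, one row per match.

Results columns: Date, HomeTeam, AwayTeam, FTHG, FTAG, FTR, HTHG, HTAG, HTR (FT/HT = full time/half time, HG/AG = home/away goals, R = result coded H, D, or A)

Odds columns: B365H, B365D, B365A (Bet365 home/draw/away decimal odds), with equivalent columns for other bookmakers
Closing odds columns insert a 'C' after the bookmaker prefix rather than appending it: B365CH, B365CD, B365CA
Pinnacle closing odds: PSCH, PSCD, PSCA
Column sets vary by season; the site's notes.txt is the authoritative key.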
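
As a rough illustration of how the odds columns are typically used, the sketch below parses two made-up rows in the column layout described above and converts the home/draw/away odds into implied probabilities. The sample rows are invented, and proportional normalisation is only one of several ways to remove the bookmaker margin.

```python
import csv
import io

# Two made-up rows in the documented column layout (not real matches or odds).
SAMPLE = """Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR,B365H,B365D,B365A
16/08/2024,Alpha,Beta,2,1,H,2.00,3.50,4.00
17/08/2024,Gamma,Delta,0,0,D,1.50,4.20,7.00
"""


def implied_probabilities(home, draw, away):
    """Convert decimal 1X2 odds to probabilities that sum to 1.

    Each inverse odd is divided by their total (the overround).
    Returns (probabilities, overround).
    """
    raw = [1.0 / home, 1.0 / draw, 1.0 / away]
    overround = sum(raw)
    return [p / overround for p in raw], overround


for row in csv.DictReader(io.StringIO(SAMPLE)):
    odds = [float(row[c]) for c in ("B365H", "B365D", "B365A")]
    probs, overround = implied_probabilities(*odds)
    print(row["HomeTeam"], [round(p, 3) for p in probs], round(overround, 3))
```

An overround above 1 reflects the bookmaker's margin; the normalised probabilities are what a model's predictions would usually be compared against.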

Python Integration

import pandas as pd

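def load_football_data(seasons, league='E0'):
    """Load results and odds for one division across several seasons.

    seasons: season codes such as "2324" for 2023/24.
    league:  division code such as "E0" (Premier League).
    """
    dfs = []
    for s in seasons:
        # Files sit in a per-season folder, e.g. .../mmz4281/2324/E0.csv
        # (layout taken from the site's download links; verify if a request fails).
        url = f"https://www.football-data.co.uk/mmz4281/{s}/{league}.csv"
        df = pd.read_csv(url)
        df["season"] = s
        dfs.append(df)
    return pd.concat(dfs, ignore_index=True)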

# Load multiple seasons
seasons = ["2425", "2324", "2223"]
df = load_football_data(seasons, 'E0')

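# Closing 1X2 odds: a 'C' follows the bookmaker prefix (B365CH, PSCH, AvgCH, ...).
# Matching on a trailing "C" would pick up HC and AC (corners) instead.
candidates = [f"{book}C{o}" for book in ("B365", "PS", "Max", "Avg") for o in "HDA"]
close_cols = [c for c in candidates if c in df.columns]
print(f"Closing odds columns: {close_cols}")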
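
Once the rows are loaded, a backtest can be as simple as settling a fixed stake on every match. The following is a minimal sketch, not a full framework: `flat_stake_roi` and the `pick` callback are hypothetical names, the rows are made up, and the closing-odds column names (B365CH, B365CD, B365CA) are assumptions that should be checked against the file header.

```python
def flat_stake_roi(matches, pick, odds_cols, stake=1.0):
    """Return (profit, roi) for betting `stake` on every selected match.

    matches:   iterable of dict-like rows (e.g. df.to_dict("records"))
    pick:      function row -> "H", "D" or "A", or None to skip the match
    odds_cols: mapping from outcome letter to the odds column used to settle
    """
    profit, staked = 0.0, 0.0
    for row in matches:
        choice = pick(row)
        if choice is None:
            continue
        staked += stake
        if row["FTR"] == choice:
            # Winning bet: decimal odds include the returned stake.
            profit += stake * (float(row[odds_cols[choice]]) - 1.0)
        else:
            profit -= stake
    roi = profit / staked if staked else 0.0
    return profit, roi


# Made-up rows, not real results or odds.
rows = [
    {"FTR": "H", "B365CH": 2.10, "B365CD": 3.40, "B365CA": 3.60},
    {"FTR": "A", "B365CH": 1.80, "B365CD": 3.60, "B365CA": 4.50},
]
cols = {"H": "B365CH", "D": "B365CD", "A": "B365CA"}
profit, roi = flat_stake_roi(rows, lambda r: "H", cols)
print(f"profit={profit:.2f} roi={roi:.1%}")
# profit=0.10 roi=5.0%
```

Real files contain matches with missing odds, so a `pick` function used on actual data would need to return None for those rows.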

Notes

  • The closing odds (columns with a 'C' after the bookmaker prefix, e.g. B365CH) are the most valuable part of the dataset for backtesting, because they represent the market consensus just before kickoff
  • Pinnacle closing odds (PSCH, PSCD, PSCA) are the usual benchmark for closing line value (CLV) validation
  • As of mid-2025, Pinnacle's public API has become unreliable. Use Bet365 or market average closing odds as a proxy for sharp closing lines
  • No shot-level xG data; match stats (shots, shots on target) can only roughly approximate xG at team level
  • Data is updated weekly during the season, not in real time
  • No API: download the CSV files manually or fetch them by URL from a script
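
The CLV check mentioned in the notes can be sketched as follows. This is one common definition (ratio of the odds taken to the closing odds, minus one), with an optional variant that first removes the bookmaker margin proportionally; the function names are hypothetical and the numbers are invented.

```python
def clv(bet_odds: float, closing_odds: float) -> float:
    """Closing line value as a fraction; positive means the bet beat the close."""
    return bet_odds / closing_odds - 1.0


def clv_no_vig(bet_odds, closing_h, closing_d, closing_a, outcome):
    """CLV against the fair closing price, margin removed proportionally."""
    closing = {"H": closing_h, "D": closing_d, "A": closing_a}
    overround = sum(1.0 / o for o in closing.values())
    # Fair odds = 1 / (raw probability / overround) = closing odds * overround.
    fair_odds = closing[outcome] * overround
    return bet_odds / fair_odds - 1.0


print(round(clv(2.20, 2.00), 3))  # 0.1
print(round(clv_no_vig(2.20, 2.00, 3.50, 4.00, "H"), 3))
```

The margin-free version is always lower than the raw figure, since the fair closing price is longer than the quoted one.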