Massey Ratings

Overview

Massey ratings are a method of ranking sports teams based on their performance history, invented by Kenneth Massey. Unlike elo-rating-system, which updates incrementally after each game, the Massey method solves a single system of linear equations over all available game results at once to produce team ratings.

The Massey method forms the basis for many computer ranking systems and was part of the Bowl Championship Series (BCS) formula in college football. The website masseyratings.com provides computer ratings for virtually every sport.

Massey himself notes that computer ratings are designed to measure past performance, not necessarily to predict future outcomes. This is a critical distinction for betting applications.

Why It Matters

Massey ratings provide a simple, robust baseline for team strength:
1. Uses all data simultaneously: Less sensitive to recent form swings than Elo
2. Includes margin of victory: Uses actual score differences, not just win/loss
3. No tuning parameters: Unlike Elo's K-factor, the basic method has no hyperparameters to tune
4. Easy to compute: Just one linear solve, no iteration required

For betting models, Massey ratings can serve as a team strength input, but they should be combined with Elo or a Poisson model for prediction, because the linear-system approach is designed for ranking, not probability estimation.

Key Formula

Game equation:

$$R_i - R_j = M_{ij}$$

Where R_i = rating of team i, R_j = rating of team j, M_ij = margin of victory of team i over team j (negative if team i lost).

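System of equations (matrix form), with one row of X per game (+1 in team i's column, -1 in team j's column, 0 elsewhere), r the vector of ratings, and m the vector of margins: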

$$X \cdot r = m$$

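Least squares solution:

$$r = (X^T X)^{-1} X^T m$$

Normalization constraint: every row of X sums to zero, so X^T X is singular and the inverse above does not exist until the ratings are pinned down. Usually this is done by adding Σr_i = 0 (ratings sum to zero) as an extra equation, or by fixing one team's rating. The formula then applies to the constrained system, provided every team is linked to every other through the schedule.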
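The singularity behind the constraint can be checked numerically. The following is a minimal sketch, assuming three unnamed teams and illustrative margins; the names M, p, M_fixed and p_fixed are chosen here and do not come from the text above. It replaces the last normal equation with a row of ones, which is one common way to impose Σr_i = 0.

```python
import numpy as np

# Three teams (columns) and three games (rows): +1 marks the first team
# of each game and -1 the second, as in R_i - R_j = M_ij.
X = np.array([
    [1.0, -1.0, 0.0],
    [1.0, 0.0, -1.0],
    [0.0, 1.0, -1.0],
])
m = np.array([1.0, -2.0, 2.0])  # illustrative margins, one per game

M = X.T @ X  # diagonal: games played; off-diagonal: minus games between the pair
p = X.T @ m  # net margin accumulated by each team

# Every row of M sums to zero, so M has rank n - 1 and cannot be inverted.
rank_M = np.linalg.matrix_rank(M)

# Impose sum(r) = 0 by overwriting the last equation with a row of ones.
M_fixed = M.copy()
p_fixed = p.copy()
M_fixed[-1, :] = 1.0
p_fixed[-1] = 0.0

r = np.linalg.solve(M_fixed, p_fixed)
print(rank_M, r)
```

Overwriting one equation loses no information here, because the rows of M are linearly dependent: any one of them can be recovered from the others.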

Worked Example

Three teams with results:
- Brazil beat Argentina 2-1 (margin +1)
- Brazil lost to Germany 1-3 (margin -2)
- Argentina beat Germany 2-0 (margin +2)

System:
- Brazil - Argentina = 1
- Brazil - Germany = -2
- Argentina - Germany = 2
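
These three equations cannot all hold at once: the first and third together imply Brazil - Germany = 3, while the second says -2. The least-squares ratings therefore come from the normal equations X^T X r = X^T m, in which each team's row reads (games played × own rating) minus the opponents' ratings = net goal difference. Writing B, A, G for the three ratings:

$$
\begin{aligned}
2B - A - G &= 1 - 2 = -1 \\
2A - B - G &= -1 + 2 = 1 \\
2G - B - A &= 2 - 2 = 0
\end{aligned}
$$

Adding the constraint B + A + G = 0 lets each row be simplified (for example, A + G = -B in the first row):

$$3B = -1, \qquad 3A = 1, \qquad 3G = 0$$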

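Solving in the least-squares sense, with ratings constrained to sum to zero, gives:
- Brazil ≈ -0.33
- Argentina ≈ +0.33
- Germany = 0.00

Interpretation: Argentina rates highest and Brazil lowest. Brazil's one-goal win over Argentina is outweighed by its two-goal loss to Germany (net goal difference -1), while Germany's net goal difference is zero.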

Code Snippet

import numpy as np

def massey_ratings(games):
    """
    games: list of (home_team, away_team, home_score, away_score)
    Returns: dict of {team: rating}
    """
    teams = sorted(set([g[0] for g in games] + [g[1] for g in games]))
    team_idx = {t: i for i, t in enumerate(teams)}
    n_teams = len(teams)
    n_games = len(games)

    X = np.zeros((n_games, n_teams))
    m = np.zeros(n_games)

    for i, (home, away, hs, as_) in enumerate(games):
        X[i, team_idx[home]] = 1
        X[i, team_idx[away]] = -1
        m[i] = hs - as_

    # Add constraint: sum of ratings = 0
    X_aug = np.vstack([X, np.ones(n_teams)])
    m_aug = np.append(m, 0)

    r, _, _, _ = np.linalg.lstsq(X_aug, m_aug, rcond=None)
    return {t: r[team_idx[t]] for t in teams}

# Example
games = [('Brazil', 'Argentina', 2, 1), ('Brazil', 'Germany', 1, 3), ('Argentina', 'Germany', 2, 0)]
ratings = massey_ratings(games)
print(ratings)
# Approximately (keys in sorted order): {'Argentina': 0.333, 'Brazil': -0.333, 'Germany': 0.0}

Pitfalls

  • Designed for ranking, not prediction: Massey himself explicitly states the ratings measure past performance, not predictive ability.
  • Margin of victory noise: In football, goal margins are noisy. A 2-1 win and a 5-0 win are each a single win, yet the second contributes a margin five times larger. Consider clipping blowouts.
  • No uncertainty measure: Unlike glicko-2, Massey provides no confidence interval on ratings.
  • No form tracking: Ratings weight all historical games equally, so recent changes in form are not captured.
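
The blowout-clipping suggestion in the list above can be sketched as follows. This is a minimal illustration rather than part of Massey's method: the function name massey_ratings_clipped and the cap parameter are hypothetical, and the default of 3 goals is an arbitrary value that would need to be chosen per sport.

```python
import numpy as np

def massey_ratings_clipped(games, cap=3):
    """Massey ratings with each margin clipped to [-cap, cap].

    games: list of (home_team, away_team, home_score, away_score)
    cap: hypothetical blowout limit; 3 is an arbitrary illustrative value
    """
    teams = sorted({t for g in games for t in g[:2]})
    idx = {t: i for i, t in enumerate(teams)}

    # One row per game, plus a final row of ones enforcing sum(ratings) = 0.
    X = np.zeros((len(games) + 1, len(teams)))
    m = np.zeros(len(games) + 1)
    for k, (home, away, hs, as_) in enumerate(games):
        X[k, idx[home]] = 1
        X[k, idx[away]] = -1
        m[k] = np.clip(hs - as_, -cap, cap)  # limit the influence of blowouts
    X[-1, :] = 1

    r, *_ = np.linalg.lstsq(X, m, rcond=None)
    return dict(zip(teams, r))

# Toy data with one blowout (5-0).
games = [("A", "B", 5, 0), ("B", "C", 1, 0), ("A", "C", 2, 1)]
clipped = massey_ratings_clipped(games, cap=3)
unclipped = massey_ratings_clipped(games, cap=100)  # cap too large to bind
print(clipped)
print(unclipped)
```

With the 5-0 result capped at three goals, team A's rating in this toy data drops from 2.0 to about 1.33, and team B's rises from about -1.33 to about -0.67.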

See Also