Glicko-2 Rating System

Overview

The Glicko rating system was invented by Mark Glickman in 1995 as an improvement on the elo-rating-system; Glicko-2 is his later extension of it. Glicko's principal innovation is the "rating deviation" (RD), a measure of how accurately a player's rating is known. A player's RD is the standard deviation of the estimate of their true strength. Glicko-2 adds a rating volatility (σ) that measures the expected fluctuation in a player's rating, based on how erratic their performances are.

The RD is central to the system's value: after a game, the rating change is smaller when the player's RD is already low (rating is accurate) and when the opponent's RD is high (opponent's true rating is uncertain, so less information is gained). RD also increases over time during inactivity, reflecting growing uncertainty about a team's current strength.

Glicko-2 is used by Chess.com, Lichess, Dota 2, Counter-Strike 2, Guild Wars 2, and other competitive games. It is in the public domain.

Why It Matters

For sports betting models, Glicko-2's uncertainty quantification is valuable because:
1. Small-sample teams: A national team with only 3 World Cup matches has a high RD, so the model should trust its rating less than that of a team with 50 matches
2. Form dynamics: A team that has been playing erratically (high σ) has ratings that change more dramatically with new results
3. Inactivity handling: International breaks create gaps where team strength changes (injuries, new coach) but no matches occur — RD grows to reflect this

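Where elo-rating-system applies the same update weight to every game regardless of confidence, Glicko-2 scales each update by the uncertainty of both ratings: results against opponents with uncertain ratings count for less, and a player's own uncertain rating moves faster.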

Key Formula

RD increase over inactivity:

$$RD' = \min(RD_{max}, \sqrt{RD_{old}^2 + c^2 \times t})$$

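Where c is a constant (≈30–50), t = rating periods since last game, RD_max ≈ 350 for new/uncertain players. This is the original Glicko form; on the Glicko-2 scale the volatility plays the role of c, and the pre-period deviation becomes φ* = √(φ² + σ²) for each period.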
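
The growth of RD during inactivity can be sketched in a few lines. The function name `inflate_rd` and the default c = 34.6 are assumptions chosen for illustration (the value sits inside the 30–50 range quoted above); a real deployment would tune c to its own rating-period length.

```python
import math


def inflate_rd(rd_old, periods_inactive, c=34.6, rd_max=350.0):
    """RD' = min(RD_max, sqrt(RD_old^2 + c^2 * t)) on the original Glicko scale."""
    return min(rd_max, math.sqrt(rd_old ** 2 + c ** 2 * periods_inactive))


# A well-established rating (RD = 50) drifting back toward full uncertainty.
for t in (0, 1, 5, 20, 100, 200):
    print(f"t={t:3d}  RD={inflate_rd(50.0, t):.1f}")
```

The cap matters: without `rd_max`, a long-inactive team would end up more uncertain than a brand-new one.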

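Rating update (original Glicko scale):

$$r' = r + \frac{Q}{1/RD^2 + 1/d^2} \sum_{i=1}^{m} g(RD_i) \times (s_i - E_i)$$

Where Q = ln(10)/400 ≈ 0.00576, g(RD_i) = 1/√(1 + 3×Q²×RD_i²/π²), E_i = 1/(1 + 10^(-g(RD_i)×(r-r_i)/400)), and 1/d² = Q² × Σ g(RD_i)² × E_i × (1 - E_i). The new deviation is RD' = √(1/(1/RD² + 1/d²)).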
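
As a check on the rating update, here is a minimal sketch of one rating period on the original Glicko scale. It follows Glickman's published Glicko formulas, with Q = ln(10)/400 and the expected score damped by g(RD_i). The function names are illustrative assumptions, not a library API.

```python
import math

Q = math.log(10) / 400  # ln(10)/400, about 0.0057565


def g(rd):
    """Weight in (0, 1] that shrinks as the opponent's RD grows."""
    return 1 / math.sqrt(1 + 3 * Q ** 2 * rd ** 2 / math.pi ** 2)


def expected(r, r_i, rd_i):
    """Expected score against opponent i, damped by g(RD_i)."""
    return 1 / (1 + 10 ** (-g(rd_i) * (r - r_i) / 400))


def glicko_update(r, rd, games):
    """One rating period; games is a list of (r_i, RD_i, s_i) tuples."""
    if not games:
        return r, rd  # RD inflation for idle periods is handled separately
    inv_d2 = 0.0  # 1/d^2 = Q^2 * sum of g^2 * E * (1 - E)
    total = 0.0   # sum of g * (s - E)
    for r_i, rd_i, s_i in games:
        e_i = expected(r, r_i, rd_i)
        inv_d2 += Q ** 2 * g(rd_i) ** 2 * e_i * (1 - e_i)
        total += g(rd_i) * (s_i - e_i)
    precision = 1 / rd ** 2 + inv_d2
    return r + Q / precision * total, math.sqrt(1 / precision)


# Glickman's published example: a 1500/200 player with one win and two losses.
games = [(1400, 30, 1.0), (1550, 100, 0.0), (1700, 300, 0.0)]
new_r, new_rd = glicko_update(1500, 200, games)
print(f"rating={new_r:.1f}  RD={new_rd:.1f}")
```

The result should land close to the values Glickman reports for this example (a rating of about 1464 and an RD of about 151.4), which makes it a convenient regression test for any reimplementation.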

Scale conversion (Glicko to Glicko-2):

$$r_{g2} = (r - 1500) / 173.7178, \quad RD_{g2} = RD / 173.7178$$
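
A small round-trip sketch of the scale conversion follows. The helper names are assumptions for illustration. The constant 173.7178 is approximately 400/ln(10), which is why base-10 exponents on the Glicko scale become natural exponentials on the Glicko-2 scale.

```python
import math

SCALE = 173.7178  # approximately 400 / ln(10)


def to_glicko2(r, rd):
    """Glicko scale (r, RD) -> Glicko-2 scale (mu, phi)."""
    return (r - 1500) / SCALE, rd / SCALE


def from_glicko2(mu, phi):
    """Glicko-2 scale (mu, phi) -> Glicko scale (r, RD)."""
    return SCALE * mu + 1500, SCALE * phi


mu, phi = to_glicko2(1500, 350)
print(mu, round(phi, 4))                      # a new player sits at mu = 0
print(from_glicko2(*to_glicko2(1620, 75)))    # round trip back to (1620, 75)
print(round(400 / math.log(10), 4))           # compare with SCALE
```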

Worked Example

Team A: rating=1500, RD=200, σ=0.06 (consistent player)
Team B: rating=1500, RD=350, σ=0.10 (erratic player, uncertain)

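Both play against a 1600-rated opponent and win. The opponent's RD is assumed to be 200, matching the code example below, and σ is held at its prior value, which barely moves after a single game.

Shared quantities (same opponent, so identical for both teams):
- g(RD_opponent) ≈ 0.84 on the Glicko-2 scale; g depends only on the opponent's RD
- Expected score E ≈ 0.38

Team A's update (low RD, low σ):
- Rating change is moderate (about +99); RD decreases only slightly (200 → 181)

Team B's update (high RD, high σ):
- Rating change is larger (about +219); RD decreases more substantially (350 → 270)

After 1 win: Team A≈1599, RD≈181; Team B≈1719, RD≈270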

Code Snippet

import math

SCALE = 173.7178  # Glicko <-> Glicko-2 scale factor (about 400 / ln 10)


class Glicko2:
    def __init__(self, rating=1500, rd=350, volatility=0.06, tau=0.5):
        self.rating = rating
        self.rd = rd
        self.volatility = volatility
        self.tau = tau  # limits volatility change; unused in this simplified update

    def scale(self):
        """Convert rating and RD to the Glicko-2 scale (mu, phi)."""
        mu = (self.rating - 1500) / SCALE
        phi = self.rd / SCALE
        return mu, phi

    @staticmethod
    def g(phi_j):
        """Down-weighting factor for an opponent with deviation phi_j."""
        return 1 / math.sqrt(1 + 3 * phi_j ** 2 / math.pi ** 2)

    def expected_score(self, mu, mu_j, phi_j):
        """Expected score against an opponent (mu_j, phi_j)."""
        return 1 / (1 + math.exp(-self.g(phi_j) * (mu - mu_j)))

    def update(self, opponents_ratings, opponents_rds, scores):
        """Update after a rating period with m games.

        Simplification: the volatility is held fixed. The full algorithm
        re-estimates it iteratively before computing phi_star.
        """
        mu, phi = self.scale()
        # Pre-period deviation, inflated by the volatility.
        phi_star = math.sqrt(phi ** 2 + self.volatility ** 2)
        if not scores:
            # No games this period: only the deviation grows.
            self.rd = SCALE * phi_star
            return self.rating, self.rd
        v_inv = delta_sum = 0.0
        for r_j, rd_j, s in zip(opponents_ratings, opponents_rds, scores):
            mu_j = (r_j - 1500) / SCALE
            phi_j = rd_j / SCALE
            g = self.g(phi_j)
            E = self.expected_score(mu, mu_j, phi_j)
            v_inv += g ** 2 * E * (1 - E)   # accumulates 1/v
            delta_sum += g * (s - E)
        phi_new = 1 / math.sqrt(1 / phi_star ** 2 + v_inv)
        mu_new = mu + phi_new ** 2 * delta_sum
        # Convert back to the familiar rating scale.
        self.rating = SCALE * mu_new + 1500
        self.rd = SCALE * phi_new
        return self.rating, self.rd

# Example
player = Glicko2(rating=1500, rd=200, volatility=0.06)
new_rating, new_rd = player.update([1600], [200], [1.0])
print(f"New rating: {new_rating:.1f}, New RD: {new_rd:.1f}")
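
The class above does not re-estimate σ. The volatility step could look like the sketch below, which follows the iterative root-finding procedure (an Illinois-style regula falsi) from Glickman's Glicko-2 description. The function name and argument layout are assumptions; all inputs are on the Glicko-2 scale, with v the estimated variance and delta the estimated improvement for the period.

```python
import math


def update_volatility(phi, sigma, v, delta, tau=0.5, eps=1e-6):
    """Return the new volatility sigma' for one rating period."""
    a = math.log(sigma ** 2)

    def f(x):
        ex = math.exp(x)
        num = ex * (delta ** 2 - phi ** 2 - v - ex)
        den = 2 * (phi ** 2 + v + ex) ** 2
        return num / den - (x - a) / tau ** 2

    # Bracket the root of f between A and B.
    A = a
    if delta ** 2 > phi ** 2 + v:
        B = math.log(delta ** 2 - phi ** 2 - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        B = a - k * tau

    f_a, f_b = f(A), f(B)
    while abs(B - A) > eps:
        C = A + (A - B) * f_a / (f_b - f_a)
        f_c = f(C)
        if f_c * f_b <= 0:
            A, f_a = B, f_b
        else:
            f_a /= 2  # Illinois modification: halve the stale endpoint
        B, f_b = C, f_c
    return math.exp(A / 2)


# Inputs taken from Glickman's worked example (phi = 200/173.7178).
print(round(update_volatility(1.1513, 0.06, 1.7785, -0.4834), 5))
```

With results this close to expectation the volatility barely moves, which is why holding σ fixed is a tolerable shortcut for a single period but not across many periods.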

Pitfalls

  • Volatility estimation is complex: The iterative algorithm for finding σ is the most involved part of the implementation. Use established libraries (python-glicko2) for production.
  • RD initialization matters: New players start with RD=350 (high uncertainty). Using smaller initial RD can cause overconfidence.
  • Rating period concept: Glicko-2 updates in "periods": all games within the same period are processed together, using each player's rating from the start of that period. This requires a batch-processing approach that differs from Elo's game-by-game updates.
  • Scale confusion: Glicko-2's internal scale (μ, φ) differs from the Elo-style scale that Glicko ratings are reported on (173.7178 factor). Don't mix values between scales without converting.
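
The rating-period pitfall can be illustrated with a small batching sketch. Everything here is hypothetical scaffolding: `group_by_period`, `run_periods`, and the toy `update_fn` are illustrative names, and the toy rule only exists to make the data flow visible. The point is that every player in a period is updated against a snapshot of pre-period values, never against ratings already changed within that period.

```python
from collections import defaultdict


def group_by_period(games):
    """Map period -> player -> [(opponent, score)], recording both sides."""
    periods = defaultdict(lambda: defaultdict(list))
    for period, player, opponent, score in games:
        periods[period][player].append((opponent, score))
        periods[period][opponent].append((player, 1.0 - score))
    return periods


def run_periods(games, ratings, update_fn):
    """Apply update_fn once per active player per period, in period order."""
    periods = group_by_period(games)
    for period in sorted(periods):
        snapshot = dict(ratings)  # everyone is rated against pre-period values
        for player, results in periods[period].items():
            opponents = [(snapshot[opp], score) for opp, score in results]
            ratings[player] = update_fn(snapshot[player], opponents)
        # Players with no games this period are left untouched here;
        # a real system would inflate their RD instead.
    return ratings


# Toy update rule (not a rating formula): add the value of each beaten opponent.
def toy_update(own, results):
    return own + sum(opp * score for opp, score in results)


games = [(1, "A", "B", 1.0), (1, "A", "C", 0.0), (2, "B", "C", 1.0)]
print(run_periods(games, {"A": 1, "B": 10, "C": 100}, toy_update))
```

In practice `update_fn` would wrap a Glicko-2 period update and the stored values would be (rating, RD, σ) triples rather than bare numbers.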

See Also