Glicko-2 Rating System
Overview
The Glicko rating system was invented by Mark Glickman in 1995 as an improvement on the elo-rating-system; Glicko-2 is his later extension of it. Glicko's principal innovation is the "rating deviation" (RD), a measure of how accurately a rating is known: a player's RD is one standard deviation of the estimate of their true strength. Glicko-2 adds a rating volatility (σ) that measures the degree of expected rating fluctuation, based on how erratic a player's performances are.
The RD is central to the system's value: after a game, the rating change is smaller when the player's RD is already low (rating is accurate) and when the opponent's RD is high (opponent's true rating is uncertain, so less information is gained). RD also increases over time during inactivity, reflecting growing uncertainty about a team's current strength.
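The effect of an opponent's RD can be made concrete with the attenuation factor g(RD) that Glicko applies to each result. The following is a minimal sketch, assuming the standard constant Q = ln(10)/400; the function name `g` and the sample RD values are illustrative choices, not taken from any particular library.

```python
import math

Q = math.log(10) / 400  # Glicko scale constant, ln(10)/400


def g(rd: float) -> float:
    """Weight given to a result: close to 1 for a well-known opponent, lower as RD grows."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q**2 * rd**2 / math.pi**2)


# The more uncertain the opponent's rating, the less a result against them counts.
for rd in (30, 150, 350):
    print(f"opponent RD={rd:>3}: g={g(rd):.3f}")
```

A result against an opponent with RD=350 counts for roughly two thirds as much as the same result against an opponent whose rating is almost certain.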
Glicko or Glicko-2 variants are reported to be used by Chess.com, Lichess, Dota 2, Counter-Strike 2, Guild Wars 2, and other competitive games. The algorithm is in the public domain.
Why It Matters
For sports betting models, Glicko-2's uncertainty quantification is valuable because:
1. Small-sample teams: A national team with only 3 World Cup matches has a high RD, so its rating should be trusted less than that of a team with 50 matches
2. Form dynamics: A team that has been playing erratically (high σ) has a rating that moves more sharply with each new result
3. Inactivity handling: International breaks create gaps in which team strength changes (injuries, a new coach) but no matches occur; RD grows to reflect this
Where elo-rating-system treats every game the same regardless of how well the ratings are known, Glicko-2 lets uncertain ratings move faster and gives less weight to results against opponents whose ratings are uncertain.
Key Formula
RD increase over inactivity:
$$RD' = \min(RD_{max}, \sqrt{RD_{old}^2 + c^2 \times t})$$
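Where c is a constant (≈30–50) that sets how quickly uncertainty grows, t is the number of rating periods since the last game, and RD_max ≈ 350 is the RD assigned to new or very uncertain players. This is the original Glicko form; Glicko-2 instead inflates the scaled deviation once per period using the volatility, φ* = √(φ² + σ²).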
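The inactivity formula can be sketched in a few lines. The function name `inflate_rd` and the default c = 35 are assumptions picked from the range quoted above; a real deployment would fit c to its own data.

```python
import math


def inflate_rd(rd: float, periods_inactive: int, c: float = 35.0, rd_max: float = 350.0) -> float:
    """RD after `periods_inactive` rating periods without a game, capped at rd_max."""
    return min(rd_max, math.sqrt(rd**2 + c**2 * periods_inactive))


# A well-established rating (RD=50) drifts back toward maximum uncertainty over time.
for t in (0, 1, 10, 100):
    print(f"t={t:>3}: RD={inflate_rd(50.0, t):.1f}")
```

With these assumed values, an RD of 50 reaches the 350 cap after about 100 idle periods.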
Rating update:
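$$r' = r + \frac{Q}{1/RD^2 + 1/d^2} \sum_{i=1}^{m} g(RD_i) \times (s_i - E_i)$$
Where Q = ln(10)/400 ≈ 0.00576, g(RD_i) = 1/√(1 + 3×Q²×RD_i²/π²), E_i = 1/(1 + 10^(-g(RD_i)×(r-r_i)/400)), and 1/d² = Q² × Σ g(RD_i)² × E_i × (1 - E_i). The new deviation is RD' = √(1/(1/RD² + 1/d²)). These are the original Glicko-scale formulas; Glicko-2 applies the same structure after the scale conversion below.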
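As a check on the rating update, here is a minimal sketch of one rating period on the original Glicko scale, taking Q = ln(10)/400 and the usual 1/d² term in the step size. The function names (`g`, `expected`, `glicko_update`) and the three-game input are illustrative assumptions.

```python
import math

Q = math.log(10) / 400  # about 0.0057565


def g(rd: float) -> float:
    """Attenuation for an opponent whose rating deviation is rd."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q**2 * rd**2 / math.pi**2)


def expected(r: float, r_i: float, rd_i: float) -> float:
    """Expected score against opponent i, discounted by that opponent's uncertainty."""
    return 1.0 / (1.0 + 10 ** (-g(rd_i) * (r - r_i) / 400))


def glicko_update(r: float, rd: float, games):
    """games: list of (opponent_rating, opponent_rd, score) from one rating period."""
    if not games:
        return r, rd
    d2_inv = Q**2 * sum(
        g(rd_i) ** 2 * expected(r, r_i, rd_i) * (1 - expected(r, r_i, rd_i))
        for r_i, rd_i, _ in games
    )
    precision = 1.0 / rd**2 + d2_inv  # combined precision of prior rating and new evidence
    surprise = sum(g(rd_i) * (s - expected(r, r_i, rd_i)) for r_i, rd_i, s in games)
    return r + (Q / precision) * surprise, math.sqrt(1.0 / precision)


# One win and two losses for a 1500-rated player with RD=200.
r_new, rd_new = glicko_update(1500, 200, [(1400, 30, 1), (1550, 100, 0), (1700, 300, 0)])
print(f"r'={r_new:.0f}, RD'={rd_new:.0f}")
```

For this input the rating drops to roughly 1464 and the RD shrinks to roughly 151, since three games add information regardless of their outcome.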
Scale conversion (Glicko to Glicko-2):
$$r_{g2} = (r - 1500) / 173.7178, \quad RD_{g2} = RD / 173.7178$$
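The conversion is a plain linear rescaling, so it is easy to wrap in two helper functions. The names `to_glicko2` and `from_glicko2` are hypothetical; the constant 173.7178 is 400/ln(10).

```python
SCALE = 173.7178  # 400 / ln(10)


def to_glicko2(rating: float, rd: float):
    """Convert a 1500-centered rating and RD to the Glicko-2 internal scale (mu, phi)."""
    return (rating - 1500) / SCALE, rd / SCALE


def from_glicko2(mu: float, phi: float):
    """Convert (mu, phi) back to the familiar 1500-centered scale."""
    return SCALE * mu + 1500, SCALE * phi


mu, phi = to_glicko2(1500, 350)
print(f"mu={mu:.4f}, phi={phi:.4f}")
```

A new player at 1500 with RD=350 therefore starts at μ = 0 with φ just above 2.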
Worked Example
Team A: rating=1500, RD=200, σ=0.06 (consistent, fairly well-known rating)
Team B: rating=1500, RD=350, σ=0.10 (erratic, highly uncertain rating)
Both play one game against a 1600-rated opponent and win. The opponent's RD is not part of the setup, so RD=200 is assumed here, and σ is held fixed for simplicity.
Quantities shared by both teams (they depend only on the opponent's RD and the rating gap):
- g(RD_opponent) ≈ 0.84 and expected score E ≈ 0.38
Team A's update (lower RD, lower σ):
- Rating change is moderate; RD decreases modestly
Team B's update (higher RD, higher σ):
- Rating change is larger because the prior rating carries little weight; RD decreases more substantially
After 1 win: Team A ≈ 1599, RD ≈ 181; Team B ≈ 1719, RD ≈ 270
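The two updates can be recomputed with a short script. This is a sketch under stated assumptions: the opponent's RD is taken to be 200 (the example does not specify it), volatility is held fixed rather than re-estimated, and the helper name `one_game_update` is hypothetical.

```python
import math

SCALE = 173.7178  # 400 / ln(10)


def one_game_update(rating, rd, sigma, opp_rating, opp_rd, score):
    """Single-game Glicko-2 update with volatility held fixed (a simplification)."""
    mu, phi = (rating - 1500) / SCALE, rd / SCALE
    mu_j, phi_j = (opp_rating - 1500) / SCALE, opp_rd / SCALE
    g = 1.0 / math.sqrt(1.0 + 3.0 * phi_j**2 / math.pi**2)  # depends on the opponent only
    e = 1.0 / (1.0 + math.exp(-g * (mu - mu_j)))             # expected score
    v = 1.0 / (g**2 * e * (1.0 - e))                         # estimated variance from this game
    phi_star_sq = phi**2 + sigma**2                          # pre-update uncertainty
    phi_new = 1.0 / math.sqrt(1.0 / phi_star_sq + 1.0 / v)
    mu_new = mu + phi_new**2 * g * (score - e)
    return SCALE * mu_new + 1500, SCALE * phi_new


team_a = one_game_update(1500, 200, 0.06, 1600, 200, 1.0)
team_b = one_game_update(1500, 350, 0.10, 1600, 200, 1.0)
print(f"Team A: rating={team_a[0]:.0f}, RD={team_a[1]:.0f}")
print(f"Team B: rating={team_b[0]:.0f}, RD={team_b[1]:.0f}")
```

Because g and E are identical for both teams, the whole difference in the size of the update comes from each team's own RD and σ.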
Code Snippet
```python
import math

SCALE = 173.7178  # 400 / ln(10): converts between the Glicko and Glicko-2 scales


class Glicko2:
    def __init__(self, rating=1500, rd=350, volatility=0.06, tau=0.5):
        self.rating = rating
        self.rd = rd
        self.volatility = volatility
        self.tau = tau  # constrains volatility change; unused in this simplified update

    def scale(self):
        """Return (mu, phi) on the Glicko-2 internal scale."""
        mu = (self.rating - 1500) / SCALE
        phi = self.rd / SCALE
        return mu, phi

    @staticmethod
    def g(phi_j):
        """Attenuate a result by the opponent's rating uncertainty."""
        return 1 / math.sqrt(1 + 3 * phi_j ** 2 / math.pi ** 2)

    def expected_score(self, mu, mu_j, phi_j):
        """Expected score against an opponent with rating mu_j and deviation phi_j."""
        return 1 / (1 + math.exp(-self.g(phi_j) * (mu - mu_j)))

    def update(self, opponents_ratings, opponents_rds, scores):
        """Update after a rating period with m games.

        Simplification: volatility is held fixed; the iterative sigma update is omitted.
        """
        mu, phi = self.scale()
        # Uncertainty grows by the volatility before the period's results are applied
        phi_star = math.sqrt(phi ** 2 + self.volatility ** 2)
        v_inv = delta_sum = 0.0
        for r_j, rd_j, s in zip(opponents_ratings, opponents_rds, scores):
            mu_j = (r_j - 1500) / SCALE
            phi_j = rd_j / SCALE
            g = self.g(phi_j)
            E = self.expected_score(mu, mu_j, phi_j)
            v_inv += g ** 2 * E * (1 - E)  # accumulates 1/v
            delta_sum += g * (s - E)
        if v_inv == 0:  # no games this period: only the RD grows
            self.rd = SCALE * phi_star
            return self.rating, self.rd
        phi_new = 1 / math.sqrt(1 / phi_star ** 2 + v_inv)
        mu_new = mu + phi_new ** 2 * delta_sum
        self.rating = SCALE * mu_new + 1500
        self.rd = SCALE * phi_new
        return self.rating, self.rd


# Example
player = Glicko2(rating=1500, rd=200, volatility=0.06)
new_rating, new_rd = player.update([1600], [200], [1.0])
print(f"New rating: {new_rating:.1f}, New RD: {new_rd:.1f}")
```
Pitfalls
- Volatility estimation is complex: The iterative algorithm for finding σ is the most involved part of the implementation. Use established libraries (python-glicko2) for production.
- RD initialization matters: New players start with RD=350 (high uncertainty). Using smaller initial RD can cause overconfidence.
- Rating period concept: Glicko-2 updates in "periods": all games within the same period are processed together against the pre-period ratings. This requires a batch-processing approach that differs from Elo's game-by-game updates.
- Scale confusion: Glicko-2 computes internally on a different scale from Elo and the original Glicko (centered at 0, divided by 173.7178). Convert back to the 1500-based scale before reporting ratings or comparing them across systems.
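To give a sense of what the volatility step in the first pitfall involves, here is a hedged sketch of the iterative root-finding procedure (an Illinois-style regula falsi) as described in Glickman's Glicko-2 write-up. The function name `update_volatility` and its argument order are assumptions for illustration; inputs are on the Glicko-2 scale, with `v` the estimated variance and `delta` the estimated improvement for the period. Treat it as a study aid, not a replacement for a tested library.

```python
import math


def update_volatility(phi, sigma, v, delta, tau=0.5, eps=1e-6):
    """Return the new volatility sigma' for one rating period."""
    a = math.log(sigma**2)

    def f(x):
        ex = math.exp(x)
        num = ex * (delta**2 - phi**2 - v - ex)
        den = 2.0 * (phi**2 + v + ex) ** 2
        return num / den - (x - a) / tau**2

    # Bracket the root of f between A and B
    A = a
    if delta**2 > phi**2 + v:
        B = math.log(delta**2 - phi**2 - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        B = a - k * tau

    fA, fB = f(A), f(B)
    while abs(B - A) > eps:
        C = A + (A - B) * fA / (fB - fA)  # secant step
        fC = f(C)
        if fC * fB <= 0:
            A, fA = B, fB
        else:
            fA /= 2.0  # Illinois modification keeps the bracket shrinking
        B, fB = C, fC
    return math.exp(A / 2.0)


# Results close to expectation leave volatility nearly unchanged;
# a large surprise (big |delta|) pushes it up.
calm = update_volatility(phi=1.1513, sigma=0.06, v=1.7785, delta=-0.4834)
shock = update_volatility(phi=1.1513, sigma=0.06, v=1.7785, delta=3.0)
print(f"calm: {calm:.5f}, shock: {shock:.5f}")
```

The parameter τ limits how far σ can move in one period; smaller values make the volatility stickier.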
See Also
- elo-rating-system — the base system Glicko-2 improves upon
- bradley-terry-model — theoretical foundation of both ELO and Glicko
- bayesian-inference-sports — Glicko-2's Bayesian update framework