Backtesting Framework
Overview
A backtesting framework systematically evaluates a sports betting model by simulating how it would have performed on historical data. The framework must mimic real deployment conditions: using only information available at prediction time, tracking realistic transaction costs (vig), computing proper Kelly stakes, and measuring not just ROI but also CLV, calibration, and risk metrics.
The client's spec requires walk-forward validation across 2010, 2014, 2018, and 2022 World Cups — training on all prior data before each tournament and generating predictions before each match.
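The walk-forward scheme can be sketched roughly as below. This is a minimal illustration, not the client's implementation: the `matches` DataFrame, its `date` and `competition` columns, the `'World Cup <year>'` labels, and the function name `walk_forward_splits` are all assumptions made for the example.

```python
import pandas as pd

WORLD_CUPS = [2010, 2014, 2018, 2022]


def walk_forward_splits(matches, years=WORLD_CUPS):
    """Yield (year, train, test), where train ends before the tournament starts.

    Assumed schema (hypothetical): a datetime 'date' column and a
    'competition' column labelled like 'World Cup 2010'.
    """
    for year in years:
        is_test = matches['competition'] == f'World Cup {year}'
        test = matches[is_test].sort_values('date')
        if test.empty:
            continue
        start = test['date'].min()
        # Only matches played strictly before the first tournament match.
        train = matches[matches['date'] < start]
        yield year, train, test
```

Each split trains on everything before the tournament's first match. Refitting between matches inside a tournament, if desired, would need an additional inner loop over match dates.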
Why It Matters
A proper backtesting framework is critical because:
1. Without it, you can't know if the model works: Backtesting on held-out historical data is the main way to estimate real-world performance before risking actual money.
2. It prevents look-ahead bias: A proper framework only uses data available at prediction time.
3. It measures what matters: ROI alone is insufficient — CLV, calibration, and risk metrics are equally important.
4. It detects overfitting: Performance that looks good in-sample but degrades out-of-sample indicates overfitting.
Key Metrics
| Metric | Formula | Target |
|---|---|---|
| ROI on Staked | total_pnl / total_staked | > 0% (positive edge) |
| Hit Rate | wins / total_bets | Varies by odds |
| CLV Mean | mean((bet_odds − close_odds) / close_odds) | > 0% |
| Brier Score | mean((pred_prob − result)²) | < 0.20 |
| Max Drawdown | max((running_peak − bankroll) / running_peak) | < 30% |
| Sharpe Ratio | mean(pnl/stake) / std(pnl/stake) × √N, where N = return periods per year (252 for daily returns) | > 1.0 |
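The two risk metrics in the table are the easiest to get subtly wrong, so here is a small sketch of one way to compute them. The function names are hypothetical. The Sharpe helper returns the unannualized per-bet ratio; the √252 factor in the table assumes one return per trading day, so any annualization factor is left to the caller.

```python
import numpy as np


def max_drawdown(bankroll_path):
    """Largest decline from a running peak, as a fraction of that peak."""
    path = np.asarray(bankroll_path, dtype=float)
    running_peak = np.maximum.accumulate(path)
    return float(np.max((running_peak - path) / running_peak))


def per_bet_sharpe(pnl, stake):
    """Mean over sample standard deviation of per-bet returns (not annualized)."""
    returns = np.asarray(pnl, dtype=float) / np.asarray(stake, dtype=float)
    return float(returns.mean() / returns.std(ddof=1))


# Peak of 11000 followed by a trough of 8800: (11000 - 8800) / 11000
print(max_drawdown([10000, 11000, 8800, 9900, 12000]))  # 0.2
```

Drawdown is measured against the running peak rather than the overall maximum, so a decline that happens before the final high still counts.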
Architecture
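```python
import pandas as pd

OUTCOMES = ['home', 'draw', 'away']

def devig_multiplicative(odds):
    # Normalize implied probabilities (1 / decimal odds) so they sum to 1.
    implied = {o: 1 / odds[o] for o in odds}
    total = sum(implied.values())
    return {o: q / total for o, q in implied.items()}

class SportsBettingBacktester:
    def __init__(self, bankroll=10000, kelly_fraction=0.5, min_edge=0.03):
        self.bankroll = bankroll
        self.kelly_fraction = kelly_fraction
        self.min_edge = min_edge
        self.history = []

    def get_odds(self, match):
        # Assumed schema: decimal odds in columns home_odds, draw_odds, away_odds.
        return {o: match[f'{o}_odds'] for o in OUTCOMES}

    def settle_bet(self, match, bet_on):
        # Assumed schema: match['result'] is 'home', 'draw', or 'away'.
        return int(match['result'] == bet_on)  # 1 = bet won, 0 = bet lost

    def update_bankroll(self, won, stake, odds):
        pnl = stake * (odds - 1) if won else -stake
        self.bankroll += pnl
        return pnl

    def run_tournament(self, model, fixtures_df, closing_odds_df):
        # closing_odds_df is assumed to be indexed by match id, one column per outcome.
        for _, match in fixtures_df.iterrows():
            pred = model.predict(match)
            bookie_odds = self.get_odds(match)
            fair_probs = devig_multiplicative(bookie_odds)  # market view, logged only
            # Expected value per unit staked at the quoted (vig-inclusive) odds.
            evs = {o: pred[f'{o}_prob'] * bookie_odds[o] - 1 for o in OUTCOMES}
            best_bet = max(evs, key=evs.get)
            if evs[best_bet] < self.min_edge:
                continue
            # Full Kelly fraction f* = (p*b - (1 - p)) / b, with b the net odds.
            b = bookie_odds[best_bet] - 1
            p = pred[f'{best_bet}_prob']
            f_full = max((p * b - (1 - p)) / b, 0)
            stake = self.bankroll * f_full * self.kelly_fraction
            won = self.settle_bet(match, best_bet)  # settle first so result and pnl get logged
            pnl = self.update_bankroll(won, stake, bookie_odds[best_bet])
            self.history.append({
                'match': match['id'], 'bet_on': best_bet, 'odds': bookie_odds[best_bet],
                'model_prob': p, 'market_prob': fair_probs[best_bet], 'stake': stake,
                'close_odds': closing_odds_df.loc[match['id'], best_bet], 'result': won, 'pnl': pnl})
        return self.compute_metrics()

    def compute_metrics(self):
        df = pd.DataFrame(self.history)
        if df.empty:
            return {'n_bets': 0}
        return {
            'n_bets': len(df),
            'roi_on_staked': df['pnl'].sum() / df['stake'].sum(),
            'clv_mean': ((df['odds'] - df['close_odds']) / df['close_odds']).mean(),
            'brier_score': ((df['model_prob'] - df['result'])**2).mean(),
        }
```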
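The staking step in `run_tournament` can be checked by hand with a standalone version of the same formula. The helper name `kelly_stake` is hypothetical and exists only for this worked example; the arithmetic mirrors the `b`, `p`, and `f_full` lines in the listing above.

```python
def kelly_stake(bankroll, p, decimal_odds, kelly_fraction=0.5):
    """Fractional Kelly stake for a single bet at decimal odds."""
    b = decimal_odds - 1                       # net odds: profit per unit staked on a win
    f_full = max((p * b - (1 - p)) / b, 0.0)   # full Kelly, floored at zero
    return bankroll * f_full * kelly_fraction


# p = 0.50 at odds 2.20: EV = 0.5 * 2.2 - 1 = 0.10, full Kelly = 0.10 / 1.2
print(round(kelly_stake(10000, 0.50, 2.20), 2))  # 416.67

# A negative-edge bet (p = 0.40 at odds 2.00) is floored to a zero stake.
print(kelly_stake(10000, 0.40, 2.00))  # 0.0
```

With the default `kelly_fraction=0.5`, a 10% expected-value bet at odds of 2.20 risks a little over 4% of the bankroll.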
Pitfalls
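- CLV tracking is essential: A strategy can show a profit over a small sample while consistently taking worse prices than the closing line. Positive CLV is evidence of genuine predictive edge rather than luck.
- Kelly staking is path-dependent: Stakes scale with the current bankroll, so early losses shrink later stakes and the final bankroll depends on the order of results. This is realistic, but it means strategies should be compared on stake-normalized metrics such as ROI on staked rather than on final bankroll.
- Transaction costs: Always account for vig. Use de-vigged odds to read the market's implied probabilities, and the actual quoted odds to compute expected value and P&L.
- Sample size: 64 matches per World Cup is small, and fewer still clear the minimum-edge filter. Aggregate across the 4 tournaments (256 matches) for more reliable metrics.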
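One way to make the sample-size concern concrete is to put an interval around ROI instead of reporting a single number. The sketch below uses a percentile bootstrap over bets. It is an illustration only: the function name is hypothetical, and resampling treats bets as independent, which is an approximation when Kelly stakes depend on the bankroll path.

```python
import numpy as np


def bootstrap_roi_interval(pnl, stake, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for ROI on staked (sum(pnl) / sum(stake))."""
    pnl = np.asarray(pnl, dtype=float)
    stake = np.asarray(stake, dtype=float)
    rng = np.random.default_rng(seed)
    # Resample bets with replacement, keeping each bet's pnl and stake paired.
    idx = rng.integers(0, len(pnl), size=(n_resamples, len(pnl)))
    rois = pnl[idx].sum(axis=1) / stake[idx].sum(axis=1)
    lower, upper = np.quantile(rois, [alpha / 2, 1 - alpha / 2])
    return float(lower), float(upper)
```

A wide interval that straddles zero suggests the backtest has too few bets to distinguish a real edge from noise.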
See Also
- walk-forward-validation — core validation methodology
- closing-line-value — CLV is the primary backtest metric
- brier-score — probability calibration metric
- kelly-criterion — staking strategy to simulate
- overfitting-sports-models — backtesting detects overfitting