Backtesting Framework¶

Overview¶

A backtesting framework evaluates a sports betting model by simulating how it would have performed on historical data. The simulation must mimic real deployment conditions: it uses only information available at prediction time, accounts for realistic transaction costs (the bookmaker's margin, or vig), sizes stakes with the Kelly criterion, and measures not just return on investment (ROI) but also closing line value (CLV), calibration, and risk metrics.

The client's spec requires walk-forward validation across the 2010, 2014, 2018, and 2022 World Cups: the model is trained on all data prior to each tournament and generates its prediction before each match.
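
The walk-forward split described above can be sketched as follows. This is a minimal illustration, not the client's implementation: the function name `walk_forward_splits`, the `date` and `id` keys, and the toy fixture list are all assumptions made for the example.

```python
from datetime import date

def walk_forward_splits(matches, tournaments):
    """Return (year, train, test) tuples, one per tournament.

    Training data is every match strictly before the tournament's first day,
    so no information from the tournament itself leaks into the model.
    """
    splits = []
    for year in sorted(tournaments):
        start, end = tournaments[year]
        train = [m for m in matches if m["date"] < start]
        test = [m for m in matches if start <= m["date"] <= end]
        splits.append((year, train, test))
    return splits

# Toy fixture list, made up for illustration only
matches = [
    {"id": 1, "date": date(2009, 10, 10)},
    {"id": 2, "date": date(2010, 6, 11)},
    {"id": 3, "date": date(2010, 7, 11)},
    {"id": 4, "date": date(2013, 9, 6)},
    {"id": 5, "date": date(2014, 6, 12)},
]
tournaments = {
    2010: (date(2010, 6, 11), date(2010, 7, 11)),
    2014: (date(2014, 6, 12), date(2014, 7, 13)),
}
splits = walk_forward_splits(matches, tournaments)
```

Selecting the test set by date window is a simplification; a real fixture table would more likely carry an explicit competition flag.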

Why It Matters¶

A proper backtesting framework is critical because:
1. Without it, you can't know if the model works: Backtesting is the only way to estimate real-world performance before risking actual money.
2. It prevents look-ahead bias: A proper framework only uses data available at prediction time.
3. It measures what matters: ROI alone is insufficient — CLV, calibration, and risk metrics are equally important.
4. It detects overfitting: Performance that looks good in-sample but degrades out-of-sample indicates overfitting.

Key Metrics¶

Metric	Formula	Target
ROI on Staked	total_pnl / total_staked	> 0% (positive edge)
Hit Rate	wins / total_bets	Varies by odds
CLV Mean	mean((bet_odds − close_odds) / close_odds)	> 0%
Brier Score	mean((pred_prob − result)²)	< 0.20
Max Drawdown	max over time of (running_peak − bankroll) / running_peak	< 30%
Sharpe Ratio	mean(pnl/stake) / std(pnl/stake) × √252 (the √252 factor assumes one return observation per trading day)	> 1.0
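
Several of the formulas in the table can be sketched in plain Python. The function names and the two-bet sample below are assumptions for illustration; the dictionary keys mirror those used in the backtester's history records.

```python
from statistics import mean

def roi_on_staked(bets):
    """Total profit divided by total amount staked."""
    return sum(b["pnl"] for b in bets) / sum(b["stake"] for b in bets)

def clv_mean(bets):
    """Average relative improvement of the bet price over the closing price."""
    return mean((b["odds"] - b["close_odds"]) / b["close_odds"] for b in bets)

def brier_score(bets):
    """Mean squared error between the model probability and the 0/1 result."""
    return mean((b["model_prob"] - b["result"]) ** 2 for b in bets)

def max_drawdown(bankroll_curve):
    """Largest decline from a running peak, as a fraction of that peak."""
    peak = bankroll_curve[0]
    worst = 0.0
    for value in bankroll_curve:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst

# Made-up sample: one winning bet that beat the close, one losing bet that did not
bets = [
    {"odds": 2.10, "close_odds": 2.00, "model_prob": 0.55, "result": 1,
     "stake": 100.0, "pnl": 110.0},
    {"odds": 3.00, "close_odds": 3.20, "model_prob": 0.40, "result": 0,
     "stake": 50.0, "pnl": -50.0},
]
curve = [100, 120, 90, 130, 104]
```

Note that `max_drawdown` tracks the running peak rather than the global one, so a drawdown that occurs before the bankroll reaches its all-time high is still counted.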

Architecture¶

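import pandas as pd

OUTCOMES = ['home', 'draw', 'away']

def devig_multiplicative(odds):
    # Scale implied probabilities (1 / decimal odds) so they sum to 1.
    implied = {o: 1 / odds[o] for o in OUTCOMES}
    total = sum(implied.values())
    return {o: q / total for o, q in implied.items()}

class SportsBettingBacktester:
    def __init__(self, bankroll=10000, kelly_fraction=0.5, min_edge=0.03):
        self.bankroll = bankroll
        self.kelly_fraction = kelly_fraction
        self.min_edge = min_edge
        self.history = []

    # Assumption: each fixture row carries decimal odds in 'home_odds',
    # 'draw_odds', 'away_odds' and the realised result in 'outcome'.
    # Adapt these three helpers to the actual data schema.
    def get_odds(self, match):
        return {o: match[f'{o}_odds'] for o in OUTCOMES}

    def settle_bet(self, match, bet_on):
        return int(match['outcome'] == bet_on)  # 1 if the bet won, else 0

    def update_bankroll(self, result, stake, odds):
        pnl = stake * (odds - 1) if result else -stake
        self.bankroll += pnl
        return pnl

    def run_tournament(self, model, fixtures_df, closing_odds_df):
        for _, match in fixtures_df.iterrows():
            pred = model.predict(match)  # must rely on pre-match information only
            bookie_odds = self.get_odds(match)
            fair_probs = devig_multiplicative(bookie_odds)
            # Expected value per unit staked, priced at the odds actually offered
            evs = {o: pred[f'{o}_prob'] * bookie_odds[o] - 1 for o in OUTCOMES}
            best_bet = max(evs, key=evs.get)
            if evs[best_bet] < self.min_edge:
                continue
            b = bookie_odds[best_bet] - 1  # net odds
            p = pred[f'{best_bet}_prob']
            f_full = max((p * b - (1 - p)) / b, 0)  # full Kelly fraction
            stake = self.bankroll * f_full * self.kelly_fraction
            result = self.settle_bet(match, best_bet)
            pnl = self.update_bankroll(result, stake, bookie_odds[best_bet])
            # Record the bet after settlement so 'result' and 'pnl' are available
            self.history.append({
                'match': match['id'], 'bet_on': best_bet, 'odds': bookie_odds[best_bet],
                'model_prob': p, 'market_prob': fair_probs[best_bet], 'stake': stake,
                'close_odds': closing_odds_df.loc[match['id'], best_bet],
                'result': result, 'pnl': pnl,
            })
        return self.compute_metrics()

    def compute_metrics(self):
        df = pd.DataFrame(self.history)
        if df.empty:  # no bet cleared the min_edge threshold
            return {'n_bets': 0, 'roi_on_staked': None, 'clv_mean': None, 'brier_score': None}
        return {
            'n_bets': len(df),
            'roi_on_staked': df['pnl'].sum() / df['stake'].sum(),
            'clv_mean': ((df['odds'] - df['close_odds']) / df['close_odds']).mean(),
            'brier_score': ((df['model_prob'] - df['result'])**2).mean()
        }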
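
The pricing and staking arithmetic inside `run_tournament` can be checked by hand on a single match. The sketch below is a standalone illustration under assumed numbers: the odds, the model probability of 0.55, and the helper name `kelly_fraction_full` are made up for the example, and `devig_multiplicative` is written out here as one plausible reading of the helper the listing calls.

```python
def devig_multiplicative(odds):
    """Scale implied probabilities (1 / decimal odds) so they sum to 1."""
    implied = {k: 1.0 / v for k, v in odds.items()}
    overround = sum(implied.values())
    return {k: q / overround for k, q in implied.items()}

def kelly_fraction_full(p, decimal_odds):
    """Full Kelly fraction for win probability p, floored at zero."""
    b = decimal_odds - 1.0  # net odds
    return max((p * b - (1.0 - p)) / b, 0.0)

odds = {"home": 2.00, "draw": 3.50, "away": 4.00}  # assumed bookmaker prices
fair = devig_multiplicative(odds)                   # market view with the vig removed

p_home = 0.55                                       # assumed model probability
ev = p_home * odds["home"] - 1                      # roughly 0.10 per unit staked
f_full = kelly_fraction_full(p_home, odds["home"])  # roughly 0.10 of bankroll
stake = 10000 * f_full * 0.5                        # half Kelly on a 10000 bankroll
```

With these numbers the implied probabilities sum to about 1.036, so the de-vigged home probability is about 0.483, the model's edge clears the default `min_edge` of 0.03, and the half-Kelly stake comes to roughly 500.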

Pitfalls¶

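CLV tracking is essential: A strategy can show a profit over a small sample by luck while consistently taking worse prices than the closing line. Positive CLV is evidence of genuine predictive edge.
Kelly staking is path-dependent: Early losses reduce later stake sizes, so individual stakes and the total amount staked depend on the sequence of outcomes. Backtest results are realistic but not directly comparable across strategies.
Transaction costs: Always account for the vig. Use de-vigged odds to read the market's implied probabilities, and use the odds actually offered for expected value and P&L.
Sample size: 64 matches per World Cup is small, and the min_edge filter leaves even fewer bets. Aggregate across all four tournaments for more reliable metrics, and treat the results as noisy even then.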
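
To put the sample-size point in numbers, the sketch below computes the standard error of an observed hit rate. It is a rough guide only: it assumes independent bets with the same win probability, which real betting records will not satisfy exactly, and the function name is an assumption for the example.

```python
import math

def hit_rate_standard_error(hit_rate, n_bets):
    """Binomial standard error of a hit rate estimated from n_bets bets."""
    return math.sqrt(hit_rate * (1 - hit_rate) / n_bets)

se_one_tournament = hit_rate_standard_error(0.5, 64)     # at most 64 bets
se_four_tournaments = hit_rate_standard_error(0.5, 256)  # at most 256 bets
```

At a 50% hit rate the standard error is 6.25 percentage points for 64 bets and about 3.1 for 256, so quadrupling the sample only halves the uncertainty.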
