Walk-Forward Validation¶
Overview¶
Walk-forward validation (also called walk-forward optimization or rolling forward) is widely treated as the gold standard for validating sports betting models. Unlike shuffled k-fold cross-validation, which assigns rows to folds at random, walk-forward validation respects temporal order: the model is trained on historical data, then tested on later data that was not available at training time.
The process: train on period 1 → test on period 2 → expand training window to include period 2 → test on period 3 → repeat. This mimics real deployment where today's model is built on all historical data and used to predict tomorrow's games.
Walk-forward validation prevents look-ahead bias (using future information to make predictions) and gives an honest estimate of how the model would have performed in real-time.
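The train/test/expand loop described above can be sketched in a few lines. This is only an illustration under simple assumptions: `expanding_folds` is a hypothetical helper, and the period labels stand in for whatever date ranges the real data uses.

```python
def expanding_folds(periods):
    """Yield (train_periods, test_period) pairs in chronological order.

    `periods` must already be sorted oldest-first. The first period is only
    ever used for training, so n periods produce n - 1 folds.
    """
    for i in range(1, len(periods)):
        # Everything before position i is history; position i is the test period.
        yield periods[:i], periods[i]


folds = list(expanding_folds(["period 1", "period 2", "period 3", "period 4"]))
for train, test in folds:
    print(f"train on {train} -> test on {test}")
```

Each fold's training list is a strict prefix of the timeline, so no test period is ever seen before it is predicted.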
Why It Matters¶
Walk-forward validation is critical because:
1. Mimics real deployment: The model only sees past data when predicting future matches — exactly what happens in production.
2. Prevents look-ahead bias: Shuffled cross-validation would train on matches played after the ones being tested, giving misleadingly good results.
3. Detects overfitting: If performance degrades in later walk-forward periods, the model may be overfitting to historical patterns, or the patterns themselves may have shifted (see Non-stationarity under Pitfalls).
4. Required by the spec: The client's spec explicitly requires walk-forward validation across 2010, 2014, 2018, and 2022 World Cups.
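To make point 2 concrete, the sketch below counts folds in which at least one training row is later than the earliest test row. It is a toy illustration, not part of the project code: the 20 integer indices stand in for matches in chronological order, and `count_leaky_folds` is a hypothetical helper.

```python
import random


def count_leaky_folds(folds):
    """Count folds where some training index is later than the earliest test index."""
    return sum(1 for train, test in folds if max(train) > min(test))


n_matches, k = 20, 4  # indices 0..19 are in chronological order
indices = list(range(n_matches))
fold_size = n_matches // k

# Shuffled k-fold: test folds are random subsets, so training rows can postdate test rows.
shuffled = indices[:]
random.Random(0).shuffle(shuffled)
kfold = []
for i in range(k):
    test = shuffled[i * fold_size:(i + 1) * fold_size]
    train = [j for j in indices if j not in test]
    kfold.append((train, test))

# Walk-forward: each test block is preceded only by earlier rows.
walk_forward = [
    (indices[:i * fold_size], indices[i * fold_size:(i + 1) * fold_size])
    for i in range(1, k)
]

print("shuffled k-fold leaky folds:", count_leaky_folds(kfold))
print("walk-forward leaky folds:", count_leaky_folds(walk_forward))
```

The walk-forward count is zero by construction. With shuffled folds, the latest match can sit in only one test fold, so every other fold trains on it while testing on earlier matches.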
Key Concepts¶
- Expanding window: Training set grows over time (all history to T1, all history to T2, etc.). Most common for betting models.
- Rolling window: Training set stays fixed size (last N games), slides forward. Better when team characteristics change significantly.
- Purged cross-validation: Removes observations in a buffer zone between the training and test sets to prevent information leakage.
- Look-ahead bias: Using information that wouldn't have been available at prediction time. Walk-forward splits prevent this at the train/test level, provided the features themselves are built only from pre-match data.
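The expanding window, rolling window, and purge buffer described above differ only in which training rows are kept. A minimal sketch, assuming matches are sorted by date: `split_fold`, `window_games`, and `purge_days` are hypothetical names, and the dates are made up for illustration.

```python
from datetime import date, timedelta


def split_fold(match_dates, test_start, test_end, window_games=None, purge_days=0):
    """Return (train_idx, test_idx) for one fold; match_dates must be sorted ascending."""
    # Training stops purge_days before the test window opens.
    train_cutoff = test_start - timedelta(days=purge_days)
    train_idx = [i for i, d in enumerate(match_dates) if d < train_cutoff]
    if window_games is not None:
        # Rolling window: keep only the most recent N training matches.
        train_idx = train_idx[-window_games:]
    test_idx = [i for i, d in enumerate(match_dates) if test_start <= d <= test_end]
    return train_idx, test_idx


# One hypothetical match per day for 30 days.
dates = [date(2021, 1, 1) + timedelta(days=i) for i in range(30)]
start, end = date(2021, 1, 21), date(2021, 1, 30)

expanding = split_fold(dates, start, end)                # all history before the test window
rolling = split_fold(dates, start, end, window_games=5)  # last 5 matches only
purged = split_fold(dates, start, end, purge_days=3)     # 3-day buffer before the test window
```

Leaving `window_games` as `None` gives the expanding window; setting it trades more history for more recent history.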
Process¶
Period 1 (2010 WC) → train on 2006 up to the start of the 2010 WC → test on 2010 WC
Period 2 (2014 WC) → train on 2006 up to the start of the 2014 WC → test on 2014 WC
Period 3 (2018 WC) → train on 2006 up to the start of the 2018 WC → test on 2018 WC
Period 4 (2022 WC) → train on 2006 up to the start of the 2022 WC → test on 2022 WC
For each period:
1. Train model on all data up to tournament start
2. Generate predictions for all matches
3. Compare to actual outcomes
4. Record metrics: ROI, CLV, Brier score, hit rate
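The four metrics in step 4 could be computed per fold along these lines. This is a minimal sketch rather than the project's implementation: `fold_metrics` and the dict keys are hypothetical, it assumes flat 1-unit stakes at decimal odds, and it uses one common CLV definition (odds taken divided by closing odds, minus 1). The sample bets are invented.

```python
def fold_metrics(bets):
    """Summarise one fold.

    Each bet is a dict with keys: pred_prob (model probability that the bet wins),
    outcome (1 = bet won, 0 = bet lost), odds (decimal odds taken),
    closing_odds (decimal closing odds).
    """
    n = len(bets)
    # Flat staking: a win returns odds - 1 units of profit, a loss costs 1 unit.
    profit = sum((b["odds"] - 1.0) if b["outcome"] == 1 else -1.0 for b in bets)
    return {
        "n_bets": n,
        "roi": profit / n,
        "hit_rate": sum(b["outcome"] for b in bets) / n,
        "brier_score": sum((b["pred_prob"] - b["outcome"]) ** 2 for b in bets) / n,
        "clv_mean": sum(b["odds"] / b["closing_odds"] - 1.0 for b in bets) / n,
    }


sample_bets = [
    {"pred_prob": 0.6, "outcome": 1, "odds": 2.0, "closing_odds": 1.9},
    {"pred_prob": 0.5, "outcome": 0, "odds": 2.2, "closing_odds": 2.2},
    {"pred_prob": 0.7, "outcome": 1, "odds": 1.5, "closing_odds": 1.6},
    {"pred_prob": 0.4, "outcome": 0, "odds": 3.0, "closing_odds": 2.8},
]
metrics = fold_metrics(sample_bets)
print(metrics)
```

A positive `clv_mean` means the odds taken were, on average, better than the closing odds; lower Brier scores indicate better-calibrated probabilities.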
Code Snippet¶
import pandas as pd


def walk_forward_validate(df, train_end, test_start, test_end, features, target, model_class, params=None):
    """Single walk-forward validation fold."""
    # Train on everything up to and including train_end; test on the tournament window.
    # Assumes df['date'] holds date-only values (no time-of-day component).
    train_df = df[df['date'] <= train_end]
    test_df = df[(df['date'] >= test_start) & (df['date'] <= test_end)]
    # Skip folds that are too small to fit or evaluate.
    if len(train_df) < 50 or len(test_df) < 5:
        return None
    X_train, y_train = train_df[features], train_df[target]
    X_test = test_df[features]
    # params=None avoids sharing one mutable default dict between calls.
    model = model_class(**(params or {}))
    model.fit(X_train, y_train)
    predictions = test_df.copy()
    # Column 1 of predict_proba is the positive-class probability (binary target assumed).
    predictions['pred_prob'] = model.predict_proba(X_test)[:, 1]
    return predictions


def run_walk_forward(df, tournaments, features, target, model_class, params=None):
    """Run walk-forward across multiple tournaments."""
    all_results = []
    for train_end, test_start, test_end in tournaments:
        result = walk_forward_validate(df, train_end, test_start, test_end, features, target, model_class, params)
        if result is not None:
            all_results.append(result)
    if not all_results:
        raise ValueError("no fold had enough data to train and test")
    combined = pd.concat(all_results)
    return {
        'total_bets': len(combined),
        # Score against the target column passed in, not a hard-coded name.
        'brier_score': ((combined['pred_prob'] - combined[target]) ** 2).mean(),
        'clv_mean': combined['clv'].mean() if 'clv' in combined.columns else None,
    }


# (train_end, test_start, test_end); test_start is the date of each opening match.
world_cup_tournaments = [
    ('2010-06-10', '2010-06-11', '2010-07-11'),
    ('2014-06-11', '2014-06-12', '2014-07-13'),
    ('2018-06-13', '2018-06-14', '2018-07-15'),
    ('2022-11-19', '2022-11-20', '2022-12-18'),
]
Pitfalls¶
- Small per-fold samples: 64 World Cup matches per tournament is a small test set. Aggregate across all 4 tournaments (256 matches) for statistical power.
- Expanding window may accumulate stale data: Team strength signals from 2006 may not apply to 2022. Consider decay weighting.
- Non-stationarity: Football team strength changes over time. A model that works in 2010 may not work in 2022.
- The key metric is CLV: Does the model beat the closing line consistently across all four World Cups?
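One way to apply the decay weighting suggested for stale data is an exponential weight on match age. The sketch below is an assumption about how that could look, not the project's method: `decay_weights` is a hypothetical helper and the two-year half-life is an arbitrary starting value to tune.

```python
from datetime import date


def decay_weights(match_dates, as_of, half_life_days=730.0):
    """Exponential time-decay weights: a match half_life_days old gets weight 0.5."""
    weights = []
    for d in match_dates:
        age_days = (as_of - d).days
        if age_days < 0:
            # A match dated after the training cutoff would be look-ahead.
            raise ValueError("match date is after the as_of date")
        weights.append(0.5 ** (age_days / half_life_days))
    return weights


# as_of is the training cutoff; the first match is exactly 730 days older.
weights = decay_weights([date(2020, 1, 2), date(2022, 1, 1)], as_of=date(2022, 1, 1))
print(weights)
```

Many scikit-learn estimators accept such weights through the `sample_weight` argument of `fit`; whether the chosen `model_class` does is something to check before relying on it.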
See Also¶
- backtesting-framework — walk-forward validation is a core component
- overfitting-sports-models — walk-forward detects overfitting
- brier-score — primary validation metric per fold
- closing-line-value — CLV is the key walk-forward metric