Forecasting: Principles and Practice — Time Series Cross-Validation

Summary

"Forecasting: Principles and Practice" (3rd edition, Hyndman & Athanasopoulos) is the authoritative open-source textbook on forecasting, and its chapter on time series cross-validation is the definitive reference for understanding why walk-forward validation is the correct method for temporal data. The chapter explains that standard k-fold cross-validation is inappropriate for time series because it randomly mixes past and future observations, creating look-ahead bias.

This note covers: (1) the correct way to do cross-validation for time series, (2) the difference between expanding and rolling windows, (3) evaluation metrics averaged across folds, and (4) purged cross-validation (a gap between train and test to prevent information leakage), a concept more commonly attributed to López de Prado's financial machine learning work than to this textbook.

Key Concepts

  • Why k-fold fails for time series: With random splits, some folds train on observations that come after the test observations, which is the opposite of real deployment, where only the past is available
  • Expanding window: Training set grows over time (all history to T1, all history to T2, etc.). Most common for stable systems.
  • Rolling window: Fixed-size training set slides forward. Better when underlying process changes over time (non-stationarity).
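  • tsCV() (time series CV): Function in R's forecast package that computes forecast errors from a rolling forecast origin, so temporal order is respected
  • stretch_tsibble(): Function in the tsibble package (part of R's tidyverts) that builds the successively longer training sets used for walk-forward validation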
  • Evaluation: Metrics are averaged across all folds — the mean out-of-sample performance is the key metric
  • Purged cross-validation: Adds a buffer/gap between training and test sets to prevent leakage from near-future data influencing training features
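
The gap idea in the last bullet can be sketched in a few lines of Python. This is a minimal illustration, not the textbook's or any library's implementation: the function name gapped_splits and its parameters (initial, horizon, gap, step) are hypothetical, and observations are identified by 0-based integer positions in time order.

```python
from typing import Iterator, List, Tuple


def gapped_splits(
    n_obs: int, initial: int, horizon: int, gap: int = 0, step: int = 1
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs that respect temporal order.

    Hypothetical helper: the training set is an expanding window starting at
    position 0, and `gap` positions are skipped between the end of training
    and the start of testing.
    """
    train_end = initial  # exclusive end of the training window
    while train_end + gap + horizon <= n_obs:
        train_idx = list(range(0, train_end))
        test_start = train_end + gap  # skip the purged observations
        test_idx = list(range(test_start, test_start + horizon))
        yield train_idx, test_idx
        train_end += step  # grow the training window for the next fold


splits = list(gapped_splits(n_obs=10, initial=4, horizon=2, gap=1, step=2))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

With gap=0 this reduces to plain expanding-window validation. Every training index is earlier than every test index, which is the property random k-fold splits do not guarantee.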

Walk-Forward Algorithm

For each fold:
  1. Train on all data up to time t
  2. Forecast h steps ahead (h = horizon)
  3. Compare forecast to actual outcomes
  4. Record metric
  5. Advance the forecast origin (t to t+1, or by a larger step) and repeat

Metrics = mean(metric across all folds)
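
The loop above can be sketched in Python as follows. This is an illustration under stated assumptions: naive_forecast (repeat the last observed value) stands in for whatever model is actually being validated, mean absolute error is an arbitrary choice of metric, and the series values are made up for the example.

```python
from statistics import mean
from typing import Callable, List, Sequence


def naive_forecast(history: Sequence[float], horizon: int) -> List[float]:
    """Placeholder model (assumption): repeat the last observed value."""
    return [history[-1]] * horizon


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error between two equal-length sequences."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def walk_forward(
    series: Sequence[float],
    initial: int,
    horizon: int,
    forecaster: Callable[[Sequence[float], int], List[float]] = naive_forecast,
    step: int = 1,
) -> List[float]:
    """Return one score per fold, using an expanding training window."""
    scores = []
    t = initial  # number of observations in the first training set
    while t + horizon <= len(series):
        train = series[:t]                       # 1. train on data up to t
        forecast = forecaster(train, horizon)    # 2. forecast h steps ahead
        actual = series[t:t + horizon]           # 3. compare to outcomes
        scores.append(mae(actual, forecast))     # 4. record metric
        t += step                                # 5. move the origin forward
    return scores


series = [10.0, 12.0, 11.0, 13.0, 14.0, 13.0, 15.0]  # made-up values
fold_scores = walk_forward(series, initial=3, horizon=2)
overall = mean(fold_scores)
print(fold_scores, overall)
```

Keeping the per-fold scores, rather than only their mean, makes it possible to inspect how performance varies from fold to fold.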

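Formulas for Expanding vs. Rolling

Expanding window:
$$\text{Training}_t = \{y_1, y_2, \ldots, y_t\}$$
$$\text{Test}_t = \{y_{t+1}, y_{t+2}, \ldots, y_{t+h}\}$$

Rolling window:
$$\text{Training}_t = \{y_{t-m+1}, y_{t-m+2}, \ldots, y_t\}$$
$$\text{Test}_t = \{y_{t+1}, y_{t+2}, \ldots, y_{t+h}\}$$

Where m = window size (the number of observations in each training set) and h = forecast horizon.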
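
As a concrete check of the two definitions, the sketch below returns the 1-based time indices in each set. It takes m to be the number of training observations, so the rolling window runs from t-m+1 to t. The function names expanding_window and rolling_window are hypothetical.

```python
from typing import List, Tuple


def expanding_window(t: int, h: int) -> Tuple[List[int], List[int]]:
    """Training = {1, ..., t}, test = {t+1, ..., t+h} (1-based indices)."""
    train = list(range(1, t + 1))
    test = list(range(t + 1, t + h + 1))
    return train, test


def rolling_window(t: int, h: int, m: int) -> Tuple[List[int], List[int]]:
    """Training = the last m indices up to t, test = {t+1, ..., t+h}."""
    if m > t:
        raise ValueError("window size m cannot exceed the forecast origin t")
    train = list(range(t - m + 1, t + 1))
    test = list(range(t + 1, t + h + 1))
    return train, test


exp_train, exp_test = expanding_window(t=5, h=2)
roll_train, roll_test = rolling_window(t=5, h=2, m=3)
print("expanding:", exp_train, exp_test)
print("rolling:  ", roll_train, roll_test)
```

Both schemes share the same test set for a given origin t. They differ only in how much history the training set keeps.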

Notes

  • This is the canonical textbook reference for time series cross-validation — the existing walk-forward-validation.md note covers sports betting specifically, but this source provides the theoretical foundation
  • Key insight for sports betting: k-fold cross-validation is unsuitable for temporal prediction problems because it breaks temporal order; this is the methodological justification for the client's walk-forward requirement
  • The expanding window approach suits World Cup modeling: team strength accumulates over years and older data is still informative
  • The purged cross-validation concept (adding a gap between train and test) is relevant for betting models where near-term data leakage is a concern
  • The textbook's evaluation framework (mean metric across all folds) is what the World Cup model's backtesting framework should report, ideally alongside the standard deviation across folds to show fold-to-fold variability
  • For the World Cup model: each tournament is one fold, and the expanding window means training on all previous World Cups plus inter-tournament friendlies/qualifiers
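
The tournament-as-fold idea in the last note can be sketched as follows. This is only an illustration: the match records are placeholders rather than real fixtures from any dataset, and the function name tournament_folds and the competition labels are hypothetical.

```python
from datetime import date
from typing import Iterator, List, Sequence, Tuple

Match = Tuple[date, str]  # (match date, competition label)

# Placeholder records for illustration only, not a real dataset.
matches: List[Match] = [
    (date(2013, 9, 6), "qualifier"),
    (date(2014, 6, 12), "world_cup_2014"),
    (date(2016, 3, 24), "friendly"),
    (date(2017, 10, 5), "qualifier"),
    (date(2018, 6, 14), "world_cup_2018"),
    (date(2021, 11, 11), "qualifier"),
    (date(2022, 11, 20), "world_cup_2022"),
]


def tournament_folds(
    matches: Sequence[Match], tournaments: Sequence[str]
) -> Iterator[Tuple[str, List[Match], List[Match]]]:
    """Yield (tournament, train, test) with an expanding training window.

    The test set is every match labelled with the tournament; the training
    set is every match played strictly before that tournament's first match.
    """
    for name in tournaments:
        test = [m for m in matches if m[1] == name]
        start = min(d for d, _ in test)
        train = [m for m in matches if m[0] < start]
        yield name, train, test


folds = list(
    tournament_folds(
        matches, ["world_cup_2014", "world_cup_2018", "world_cup_2022"]
    )
)
train_sizes = [len(train) for _, train, _ in folds]
print(train_sizes)
```

Each later fold's training set includes the earlier tournaments themselves plus the qualifiers and friendlies played in between, which is what the expanding window means in this setting.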