Overfitting in Sports Models

Overview

Overfitting in sports prediction models occurs when a model learns the noise and idiosyncrasies of its training data so closely that its performance on new, unseen data suffers. In sports betting, overfitting is particularly dangerous for three reasons. (a) Historical data has a low signal-to-noise ratio, because goals are rare and outcomes carry a large random component. (b) Sample sizes are small: a 32-team World Cup has only 64 matches. (c) The market is adversarial, so any pattern that can be identified is likely to be priced in by bookmakers or arbitraged away.

Common overfitting patterns include too many features relative to training samples, fitting to the conditions of a specific tournament, look-ahead bias, excessive model complexity, and tuning hyperparameters on the test set.
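To make "too many features relative to training samples" concrete, here is a small simulation. It is an illustration, not taken from the original text. Fifty random features that carry no information are fit by ordinary least squares to 64 simulated goal counts, roughly one World Cup's worth of matches. The goal rate of 1.3, the sample sizes, and the seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
n_train, n_test, n_features = 64, 64, 50  # one World Cup of matches, 50 noise features

def r2(y, y_hat):
    # Coefficient of determination relative to predicting the sample mean
    return 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

# Goals are pure noise here: no feature has any relationship to the target
X_train = rng.normal(size=(n_train, n_features))
X_test = rng.normal(size=(n_test, n_features))
y_train = rng.poisson(1.3, size=n_train).astype(float)
y_test = rng.poisson(1.3, size=n_test).astype(float)

# Ordinary least squares with an intercept column
A_train = np.column_stack([np.ones(n_train), X_train])
A_test = np.column_stack([np.ones(n_test), X_test])
coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)

train_r2 = r2(y_train, A_train @ coef)
test_r2 = r2(y_test, A_test @ coef)
print(f"train R^2 = {train_r2:.2f}, test R^2 = {test_r2:.2f}")
```

The in-sample R² comes out high only because each coefficient absorbs some of the noise. On fresh matches, the same coefficients do worse than simply predicting the average (negative R²).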

Why It Matters

Overfitting is one of the main reasons sports betting models that look profitable in backtests fail in practice:
1. Football has high noise: goals are rare events with significant randomness, so a model that fits noise will fail on new data.
2. Tournaments are small: a 64-match World Cup sample is tiny for statistical purposes, and complex models will overfit it.
3. The market is adversarial: if a pattern exists in historical data, bookmakers have likely already priced it. An edge that appears in a backtest but disappears in production is a classic sign of overfitting.
4. Look-ahead bias is subtle: it is easy to accidentally build training features from information that was not available before kick-off.
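Look-ahead bias often enters through "recent form" features. Here is a minimal leakage-free sketch in plain Python. It assumes matches arrive as `(team, goals_scored)` rows that are already sorted by kick-off time. `pre_match_form` and the sample rows are hypothetical.

```python
from collections import defaultdict, deque

def pre_match_form(matches, window=5):
    """For each (team, goals) row in chronological order, return the team's
    mean goals over its previous `window` matches, excluding the current one.
    Returns None when the team has no prior matches."""
    history = defaultdict(lambda: deque(maxlen=window))
    features = []
    for team, goals in matches:
        past = history[team]
        features.append(sum(past) / len(past) if past else None)
        past.append(goals)  # update only AFTER the feature has been recorded
    return features

matches = [("BRA", 2), ("ARG", 1), ("BRA", 0), ("BRA", 3), ("ARG", 2)]
print(pre_match_form(matches, window=2))  # [None, None, 2.0, 1.0, 1.0]
```

The important design choice is ordering. Each feature is recorded before the current result is appended, so no row can see its own score or any later match.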

Prevention Strategies

Sample-to-parameter ratio:
- Rule of thumb: the number of free parameters should be below N/20, where N is the training sample size
- For ~2,000 international matches, that means fewer than 100 parameters. A Poisson model with attack and defense strengths for 50 teams has ≈ 100 parameters, which is right at the limit
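A short helper can check the arithmetic behind the 50-team example. The parametrisation below is an assumption about how the Poisson model is set up: one attack and one defense strength per team, a single sum-to-zero constraint, and one shared home-advantage term. Other parametrisations shift the count by one or two.

```python
def poisson_free_params(n_teams, home_advantage=True):
    # One attack and one defense strength per team, minus one
    # identifiability constraint (e.g. attack strengths summing to zero),
    # plus an optional shared home-advantage term.
    return 2 * n_teams - 1 + (1 if home_advantage else 0)

def parameter_budget(n_matches, ratio=20):
    # Rule of thumb from the text: free parameters < N / ratio
    return n_matches / ratio

n_params = poisson_free_params(50)
budget = parameter_budget(2000)
print(n_params, budget, n_params < budget)  # 100 100.0 False
```

With these assumptions, the model lands exactly on the budget and so fails the strict inequality. Dropping teams with very few matches, or pooling them, is one way to get back under the limit.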

Regularization:

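```python
from sklearn.linear_model import PoissonRegressor

def regularized_poisson_regression(X, y, alpha=1.0):
    # L2-penalised Poisson GLM (log link) fit directly to goal counts;
    # larger alpha shrinks the coefficients harder toward zero.
    # (A Ridge fit on log-transformed goals is not a Poisson model and
    # handles the many zero-goal observations poorly.)
    model = PoissonRegressor(alpha=alpha, max_iter=1000)
    model.fit(X, y)
    return model
```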
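The penalty strength `alpha` is itself a hyperparameter, so it should be chosen on past data only and never on the test period. This is a minimal sketch using scikit-learn's `PoissonRegressor` with walk-forward `TimeSeriesSplit` folds. It assumes the rows are sorted chronologically. The synthetic data and the alpha grid are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

rng = np.random.default_rng(0)
n, p = 400, 10
X = rng.normal(size=(n, p))                   # hypothetical match features, rows in time order
y = rng.poisson(np.exp(0.2 + 0.3 * X[:, 0]))  # only feature 0 carries real signal

grid = [0.001, 0.01, 0.1, 1.0, 10.0]
search = GridSearchCV(
    PoissonRegressor(max_iter=1000),
    param_grid={"alpha": grid},
    cv=TimeSeriesSplit(n_splits=5),            # each fold trains only on earlier rows
    scoring="neg_mean_poisson_deviance",
)
search.fit(X, y)
print(search.best_params_)
```

A held-out test period that comes after all of these folds should be scored only once, with the selected `alpha`.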

Feature importance stability:

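```python
import numpy as np
from sklearn.base import clone

def feature_stability(model, X, y, n_bootstrap=100, seed=0):
    # Works with estimators exposing feature_importances_ (e.g. tree ensembles)
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(seed)
    importances = []
    for _ in range(n_bootstrap):
        idx = rng.choice(len(X), size=len(X), replace=True)  # bootstrap resample
        fitted = clone(model).fit(X[idx], y[idx])            # fresh, unfitted copy each round
        importances.append(fitted.feature_importances_)
    importances = np.array(importances)
    # Coefficient of variation per feature: high CV = unstable/overfit feature
    return importances.std(axis=0) / (importances.mean(axis=0) + 1e-10)
```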
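Bootstrap stability asks whether a feature's importance survives resampling of the training data. A complementary check, not described in the original text, is permutation importance on a later hold-out period. It asks whether shuffling a feature actually hurts out-of-sample predictions. The sketch below uses scikit-learn's `permutation_importance`. The synthetic data, the 80/20 chronological split, and the random-forest model are all assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(1)
n = 600
X = rng.normal(size=(n, 5))             # hypothetical features, rows in time order
y = 2.0 * X[:, 0] + rng.normal(size=n)  # only feature 0 is real signal

split = int(0.8 * n)                    # train on earlier rows, evaluate on later ones
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X[:split], y[:split])

result = permutation_importance(
    model, X[split:], y[split:], n_repeats=20, random_state=0
)
for i, (m, s) in enumerate(zip(result.importances_mean, result.importances_std)):
    print(f"feature {i}: {m:.3f} +/- {s:.3f}")
```

Features whose hold-out importance is near zero, or within a standard deviation of zero, are candidates for removal even if they rank highly in-sample.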

Common Overfitting Patterns

- Too many Elo K-factors: fitting an individual K-factor per team from limited data
- Dixon-Coles rho overfitting: estimating the low-score dependence parameter from small samples
- xG model with too many features: using 50+ shot features when a handful would likely suffice
- Rolling window too short: adapting too quickly to recent form, so noise is mistaken for a change in team strength
- Cross-validation on temporal data: random k-fold splitting lets the model train on matches played after the ones it is tested on, introducing look-ahead bias
- Hyperparameter tuning on the test set: selecting the model that performs best on the test period, which leaves no honest estimate of out-of-sample performance
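The temporal cross-validation pitfall can be checked mechanically. The sketch below assumes rows are sorted by kick-off date, and `leaks_future` is a hypothetical helper. It tests whether any fold trains on a match played after one of its test matches, comparing shuffled `KFold` against scikit-learn's `TimeSeriesSplit`.

```python
import numpy as np
from sklearn.model_selection import KFold, TimeSeriesSplit

n_matches = 100             # rows assumed sorted by kick-off date
idx = np.arange(n_matches)

def leaks_future(splitter):
    # True if any fold trains on a match played after one of its test matches
    return any(train.max() > test.min() for train, test in splitter.split(idx))

print(leaks_future(KFold(n_splits=5, shuffle=True, random_state=0)))  # True
print(leaks_future(TimeSeriesSplit(n_splits=5)))                      # False
```

`TimeSeriesSplit` always places the test block after its training block. Running the same check on any custom splitter is a cheap guard against look-ahead bias.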

Pitfalls