Multi-Sport Prediction Stack
Overview
A multi-sport prediction stack is a unified modeling architecture that shares infrastructure, feature engineering, and calibration logic across football (soccer), NFL, NBA, and other sports. Rather than building a separate silo per sport, the stack pairs a shared "core" (feature engineering pipeline, base model, calibration layer) with sport-specific "heads" (feature sets, prediction models, and calibration parameters).
The core insight is that many modeling decisions are sport-agnostic: Elo-style rating updates, Bayesian calibration, probability calibration, and EV calculation all work the same way regardless of the sport. What differs is the feature set (xG in football vs. yards per play in the NFL), the outcome space (1X2 in football vs. point spreads in the NFL), and the market structure.
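As an illustration of a sport-agnostic component, the standard Elo update can be sketched as below. The function names, the K-factor of 20, and the 400-point scale are illustrative assumptions rather than values taken from this stack; in practice K and any home-advantage offset would be tuned per sport.

```python
def elo_expected(rating_a, rating_b, scale=400.0):
    """Expected score of A against B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def elo_update(rating_a, rating_b, score_a, k=20.0):
    """Return updated (rating_a, rating_b) after one game.

    score_a is 1.0 for an A win, 0.5 for a draw, 0.0 for an A loss.
    k=20 is an assumed default, not a value from this document.
    """
    expected_a = elo_expected(rating_a, rating_b)
    delta = k * (score_a - expected_a)
    # The update is zero-sum: what A gains, B loses.
    return rating_a + delta, rating_b - delta


# The same call serves a football match, an NFL game, or an NBA game.
new_home, new_away = elo_update(1500.0, 1500.0, score_a=1.0)
print(new_home, new_away)  # 1510.0 1490.0
```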
Why It Matters
A unified stack matters because:
1. Reduced maintenance: One codebase for all sports rather than N separate systems
2. Cross-sport learning: Team-quality signals that behave similarly across sports (form, rest, home advantage) and are learned in basketball may inform football predictions
3. Consistent risk management: Portfolio-level Kelly sizing across all sports
4. Calibration sharing: High-data sports (such as the NBA, with 1,230 regular-season games per season versus 272 in the NFL) can bootstrap calibration for low-data sports (international football)
Architecture
Layer 1: Core Infrastructure
- Data ingestion: Unified connector that normalizes match data, odds data, and statistics into a common schema
- Feature store: Precomputed features shared across sports (team form over last N games, H2H history, rest days, home/away split)
- Model registry: Central registry of model versions with metadata
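A minimal sketch of what the common schema and two shared features might look like follows. `MatchRecord`, `rest_days`, and `recent_form` are hypothetical names introduced for illustration; the actual schema of the stack is not specified here. Both features only look at games strictly before the prediction date, so the game being predicted cannot leak into its own features.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MatchRecord:
    """Hypothetical common schema: one row per completed game, any sport."""
    sport: str
    played_on: date
    home_team: str
    away_team: str
    home_score: int
    away_score: int


def _games_before(history, team, as_of):
    """Games involving the team played strictly before as_of, oldest first."""
    games = [m for m in history
             if team in (m.home_team, m.away_team) and m.played_on < as_of]
    return sorted(games, key=lambda m: m.played_on)


def rest_days(history, team, as_of):
    """Days since the team's most recent game before as_of, or None."""
    games = _games_before(history, team, as_of)
    return (as_of - games[-1].played_on).days if games else None


def recent_form(history, team, as_of, n=5):
    """Win rate over the team's last n games (draws count as 0.5)."""
    games = _games_before(history, team, as_of)[-n:]
    if not games:
        return None
    points = 0.0
    for m in games:
        margin = m.home_score - m.away_score
        if team == m.away_team:
            margin = -margin  # view the margin from the team's side
        points += 1.0 if margin > 0 else 0.5 if margin == 0 else 0.0
    return points / len(games)


history = [
    MatchRecord("nba", date(2024, 1, 1), "A", "B", 110, 100),
    MatchRecord("nba", date(2024, 1, 3), "C", "A", 95, 99),
    MatchRecord("nba", date(2024, 1, 4), "A", "D", 90, 101),
]
print(rest_days(history, "A", date(2024, 1, 5)))
print(recent_form(history, "A", date(2024, 1, 5)))
```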
Layer 2: Sport-Specific Feature Engineering
| Sport | Key Features |
|---|---|
| Football | xG, xGA, shot counts, recent form, Elo/Glicko-2 rating |
| NFL | DVOA, yards per play, turnover differential, EPA per play, rest days |
| NBA | Four factors (eFG%, TOV%, ORB%, FT Rate), pace, net rating, B2B fatigue |
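As one example of a sport-specific feature, the NBA four factors in the table can be computed from box-score totals roughly as follows. The function and argument names are assumptions; the formulas are the commonly used definitions, with free-throw rate taken here as FTM/FGA (some sources use FTA/FGA instead).

```python
def four_factors(fgm, fg3m, fga, ftm, fta, tov, orb, opp_drb):
    """Return the four factors for one team from box-score totals."""
    return {
        # Effective FG%: a made three counts as 1.5 made field goals.
        "efg_pct": (fgm + 0.5 * fg3m) / fga,
        # Turnovers per estimated possession ending (0.44 weights free throws).
        "tov_pct": tov / (fga + 0.44 * fta + tov),
        # Share of available offensive rebounds the team collected.
        "orb_pct": orb / (orb + opp_drb),
        # Free throws made per field-goal attempt.
        "ft_rate": ftm / fga,
    }


factors = four_factors(fgm=40, fg3m=10, fga=80, ftm=15, fta=20,
                       tov=12, orb=10, opp_drb=30)
print(factors)
```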
Layer 3: Sport-Specific Model Heads
- Football: Poisson-based goal-scoring model
- NFL: Linear regression for point spreads
- NBA: Logistic regression for moneylines
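To make the football head concrete, here is a minimal sketch of turning two expected-goal rates into 1X2 probabilities. It assumes home and away goals are independent Poisson variables, which is a simplification; the function names and the 10-goal truncation are illustrative choices, not details from this stack.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with mean lam."""
    return exp(-lam) * lam ** k / factorial(k)


def match_probabilities(home_rate, away_rate, max_goals=10):
    """1X2 probabilities from expected goals, assuming independent scoring."""
    home = draw = away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_rate) * poisson_pmf(a, away_rate)
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
    # Renormalize to absorb the small mass lost by truncating at max_goals.
    total = home + draw + away
    return {"home": home / total, "draw": draw / total, "away": away / total}


probs = match_probabilities(home_rate=1.5, away_rate=1.1)
print(probs)
```

The returned dictionary uses outcome names as keys, so it can be passed to an EV calculation that looks up odds by the same keys.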
Layer 4: Shared Calibration Layer
- All sport outputs pass through isotonic regression or Platt scaling
- Calibrated probabilities enable sport-agnostic EV calculation
- Cross-sport Kelly sizing: portfolio-level risk management
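The isotonic option in the calibration layer can be sketched with the pool-adjacent-violators algorithm in plain Python. This is an illustrative implementation under simple assumptions (binary outcomes, step-function lookup, no special handling of tied raw probabilities); a production stack would more likely rely on a library implementation such as scikit-learn's `IsotonicRegression`. The names `fit_isotonic` and `calibrate` are hypothetical.

```python
def fit_isotonic(raw_probs, outcomes):
    """Fit a non-decreasing map from raw probability to observed hit rate."""
    # Each block is [sum of outcomes, count, largest raw prob in the block].
    blocks = []
    for x, y in sorted(zip(raw_probs, outcomes)):
        blocks.append([float(y), 1, x])
        # Merge backwards while the previous block's mean exceeds this one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper
    return [(upper, total / count) for total, count, upper in blocks]


def calibrate(model, raw_prob):
    """Return the fitted value of the first block that covers raw_prob."""
    for upper, value in model:
        if raw_prob <= upper:
            return value
    return model[-1][1]  # above the largest raw prob seen in training


raw = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
hit = [0, 0, 1, 0, 0, 1, 1, 0, 1]
model = fit_isotonic(raw, hit)
print(model)
print(calibrate(model, 0.45))
```

Because the fitted map takes any raw probability and returns a calibrated one, the same two functions can sit behind every sport's head, with one fitted `model` per sport or a pooled one.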
Code Structure
```python
# FeatureStore, Calibrator, and the three sport models are assumed to be
# defined elsewhere in the stack; they are not shown here.


class MultiSportStack:
    def __init__(self):
        self.core = FeatureStore()
        self.calibrator = Calibrator()
        # One prediction head per sport, keyed by sport name.
        self.heads = {
            "football": FootballModel(),
            "nfl": NFLModel(),
            "nba": NBAModel(),
        }

    def predict(self, sport, home_team, away_team, bookmaker_odds):
        features = self.core.build(sport, home_team, away_team)
        raw_probs = self.heads[sport].predict(features)
        calibrated = self.calibrator.calibrate(raw_probs)
        ev = self.compute_ev(calibrated, bookmaker_odds)
        return {"probabilities": calibrated, "ev": ev}

    def compute_ev(self, probabilities, odds):
        """Sport-agnostic EV per unit staked; odds must be decimal."""
        evs = {}
        for outcome, prob in probabilities.items():
            decimal_odds = odds[outcome]
            evs[outcome] = prob * decimal_odds - 1
        return evs

    def kelly_sizing(self, ev, decimal_odds, bankroll, kelly_fraction=0.5):
        """Cross-sport Kelly position sizing (fractional Kelly stake)."""
        if ev <= 0 or decimal_odds <= 1:
            return 0
        # Full Kelly stakes edge / net odds of the bankroll, where
        # edge = p * d - 1 (the EV above) and net odds = d - 1.
        return kelly_fraction * ev / (decimal_odds - 1) * bankroll
```
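The `kelly_sizing` method sizes one bet at a time. A portfolio-level version might size every open bet across sports and then cap total exposure, as in the sketch below. The proportional scale-down is a simple heuristic, not the exact optimum for simultaneous bets, and the names, the 10% exposure cap, and the example bets are all assumptions made for illustration.

```python
def full_kelly(prob, decimal_odds):
    """Full-Kelly bankroll fraction for one binary bet: (p*d - 1) / (d - 1)."""
    if decimal_odds <= 1.0:
        return 0.0
    edge = prob * decimal_odds - 1.0
    return max(edge / (decimal_odds - 1.0), 0.0)


def portfolio_stakes(bets, bankroll, kelly_multiplier=0.5, max_exposure=0.10):
    """Fractional-Kelly stakes, scaled so the total staked across all
    sports never exceeds max_exposure of the bankroll."""
    fractions = {name: kelly_multiplier * full_kelly(p, d)
                 for name, (p, d) in bets.items()}
    total = sum(fractions.values())
    # Shrink every stake by the same factor if the cap would be breached.
    scale = min(1.0, max_exposure / total) if total > 0 else 0.0
    return {name: f * scale * bankroll for name, f in fractions.items()}


# Hypothetical open bets: name -> (calibrated probability, decimal odds).
bets = {
    "football: home win": (0.50, 2.20),
    "nfl: home -3.5": (0.55, 1.91),
    "nba: away ML": (0.40, 2.30),  # negative EV, so it gets no stake
}
stakes = portfolio_stakes(bets, bankroll=1000.0)
print(stakes)
```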
Pitfalls
- Sport-specific heads over-sharing parameters: If heads share too many parameters, each head ends up fitting patterns that belong to another sport and generalizes poorly on its own. Keep feature and model layers separate.
- Cross-sport information leakage: A football model should not train on NFL data; only the calibration layer may use cross-sport data for regularization.
- Market-specific odds normalization: Odds quoted in American, decimal, or implied-probability form must be converted to one common format (decimal, for the EV formula above) before they are compared with calibrated probabilities.
- Small-data sports: International tournaments with few matches should borrow calibration parameters from high-data sports.
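The odds-normalization pitfall can be illustrated with a small conversion sketch. The function names are assumptions. The final helper, which strips the bookmaker margin by proportional normalization, is an optional extra step that this page does not prescribe; it is shown only because fair implied probabilities are often the quantity compared against a model.

```python
def american_to_decimal(american):
    """Convert American (moneyline) odds to decimal odds."""
    if american == 0:
        raise ValueError("American odds cannot be zero")
    if american > 0:
        return 1.0 + american / 100.0
    return 1.0 + 100.0 / abs(american)


def implied_to_decimal(implied_prob):
    """Convert an implied probability in (0, 1] to decimal odds."""
    return 1.0 / implied_prob


def remove_margin(decimal_odds):
    """Optional: rescale implied probabilities so they sum to 1."""
    implied = {k: 1.0 / d for k, d in decimal_odds.items()}
    overround = sum(implied.values())
    return {k: p / overround for k, p in implied.items()}


market = {"home": american_to_decimal(-110), "away": american_to_decimal(-110)}
fair = remove_margin(market)
print(market, fair)
```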
See Also
- poisson-distribution — football model head
- elo-rating-system — shared feature across sports
- bayesian-inference-sports — Bayesian calibration layer
- expected-value-ev — sport-agnostic EV calculation