Multi-Sport Prediction Stack

Overview

A multi-sport prediction stack is a unified modeling architecture that shares infrastructure, feature engineering, and calibration logic across football (soccer), NFL, NBA, and other sports. Rather than building a separate silo per sport, the stack pairs a shared "core" (feature engineering pipeline, base model, calibration layer) with sport-specific "heads" (feature sets, prediction models, and calibration parameters).

The core insight is that many modeling decisions are sport-agnostic: Elo-style rating updates, probability calibration, and EV calculation all work the same way regardless of the sport. What differs is the feature set (xG in football vs. yards per play in NFL), the outcome space (1X2 in football vs. point spreads in NFL), and the market structure.
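
As an illustration of a sport-agnostic component, the sketch below shows a standard Elo rating update. The function names, the K-factor of 20, and the 400-point scale are illustrative assumptions, not values taken from this stack; in practice K is usually tuned per sport.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 20.0):
    """Return the updated (rating_a, rating_b) after one game.

    score_a is 1.0 for a win by A, 0.5 for a draw, and 0.0 for a loss.
    The K-factor default is an assumption; tune it per sport.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # The update is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta
```

The same two functions apply to a football match, an NFL game, or an NBA game; only the choice of K and any home-advantage offset would differ.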

Why It Matters

A unified stack matters because:
1. Reduced maintenance: One codebase for all sports rather than N separate systems
2. Cross-sport learning: Team-quality signals that prove predictive in one sport (e.g. recent form or rest days in basketball) may inform feature design for another (e.g. football)
3. Consistent risk management: Portfolio-level Kelly sizing across all sports
4. Calibration sharing: High-data sports (e.g. NBA, with 82 regular-season games per team) can bootstrap calibration for low-data sports (e.g. international football)

Architecture

Layer 1: Core Infrastructure
- Data ingestion: Unified connector that normalizes match data, odds data, and statistics into a common schema
- Feature store: Precomputed features shared across sports (team form over last N games, H2H history, rest days, home/away split)
- Model registry: Central registry of model versions with metadata
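
A minimal sketch of what the common schema might look like, assuming a single record type per fixture. `MatchRecord`, `normalize_football`, and every field and payload key below are hypothetical names chosen for illustration; the real connector and provider formats are not specified here.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Optional


@dataclass(frozen=True)
class MatchRecord:
    """One fixture in the common schema (all field names are hypothetical)."""
    sport: str
    home_team: str
    away_team: str
    start_time: datetime
    home_score: Optional[int] = None  # None until the game has been played
    away_score: Optional[int] = None
    decimal_odds: Dict[str, float] = field(default_factory=dict)


def normalize_football(raw: dict) -> MatchRecord:
    """Map a hypothetical football provider payload onto the common schema."""
    return MatchRecord(
        sport="football",
        home_team=raw["homeTeam"],
        away_team=raw["awayTeam"],
        start_time=datetime.fromisoformat(raw["utcDate"]),
        home_score=raw.get("homeGoals"),
        away_score=raw.get("awayGoals"),
        decimal_odds=dict(raw.get("odds", {})),
    )
```

Each sport would get its own small normalizer, so everything downstream of ingestion only ever sees `MatchRecord` objects.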

Layer 2: Sport-Specific Feature Engineering

Sport	Key Features
Football	xG, xGA, shot counts, recent form, Elo/Glicko-2 rating
NFL	DVOA, yards per play, turnover differential, EPA per play, rest days
NBA	Four factors (eFG%, TOV%, ORB%, FT Rate), pace, net rating, B2B fatigue
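
As one example of a sport-specific feature builder, the NBA four factors can be computed from box-score totals roughly as follows. This is a sketch using the commonly cited definitions; the function name and argument names are assumptions, and FT Rate is taken here as made free throws per field-goal attempt (some sources use attempts instead).

```python
def four_factors(fgm, fg3m, fga, ftm, fta, tov, orb, opp_drb):
    """Compute the four factors for one team from box-score totals.

    fgm/fga: field goals made/attempted, fg3m: three-pointers made,
    ftm/fta: free throws made/attempted, tov: turnovers,
    orb: offensive rebounds, opp_drb: opponent defensive rebounds.
    """
    return {
        # Effective FG% gives three-pointers 1.5x the weight of two-pointers.
        "efg_pct": (fgm + 0.5 * fg3m) / fga,
        # Turnovers per estimated possession (0.44 is the usual FTA weight).
        "tov_pct": tov / (fga + 0.44 * fta + tov),
        # Share of available offensive rebounds the team collected.
        "orb_pct": orb / (orb + opp_drb),
        # Free throws made per field-goal attempt.
        "ft_rate": ftm / fga,
    }
```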

Layer 3: Sport-Specific Model Heads
- Football: Poisson-based goal-scoring model
- NFL: Linear regression for point spreads
- NBA: Logistic regression for moneylines
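
To make the football head concrete, here is a minimal sketch of how a Poisson goal model turns expected goals into 1X2 probabilities. It assumes home and away goals are independent Poisson variables, which is a simplification; the function names and the `max_goals` truncation are illustrative choices.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals when the expected count is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def match_outcome_probs(home_xg: float, away_xg: float, max_goals: int = 10):
    """1X2 probabilities from independent Poisson goal counts."""
    home = draw = away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_xg) * poisson_pmf(a, away_xg)
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
    # Renormalize to absorb the small mass lost by truncating at max_goals.
    total = home + draw + away
    return {"home": home / total, "draw": draw / total, "away": away / total}
```

The NFL and NBA heads would expose the same kind of outcome-to-probability dictionary, which is what lets the calibration and EV steps stay sport-agnostic.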

Layer 4: Shared Calibration Layer
- All sport outputs pass through isotonic regression or Platt scaling
- Calibrated probabilities enable sport-agnostic EV calculation
- Cross-sport Kelly sizing: portfolio-level risk management
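
The calibration step can be sketched without external dependencies using the pool-adjacent-violators algorithm, which is the standard way to fit an isotonic regression. The function names below are assumptions for illustration, ties in the raw probabilities get no special treatment, and a production system would more likely rely on an existing library implementation.

```python
from bisect import bisect_left


def fit_isotonic(raw_probs, outcomes):
    """Fit a non-decreasing step function with pool adjacent violators.

    raw_probs are uncalibrated model probabilities; outcomes are 0/1 results.
    Returns (uppers, values): block i covers raw probabilities up to
    uppers[i] and maps them to the calibrated value values[i].
    """
    blocks = []  # each block is [sum_of_outcomes, count, max_raw_prob]
    for x, y in sorted(zip(raw_probs, outcomes)):
        blocks.append([float(y), 1, x])
        # Merge backwards while the previous block's mean exceeds this one's.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            s, n, upper = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
            blocks[-1][2] = upper
    uppers = [b[2] for b in blocks]
    values = [b[0] / b[1] for b in blocks]
    return uppers, values


def calibrate(p, uppers, values):
    """Map a raw probability to the calibrated value of its block."""
    i = bisect_left(uppers, p)
    return values[min(i, len(values) - 1)]
```

Note that calibrating each outcome of a multi-way market separately can leave the probabilities summing to something other than 1, so a renormalization step may be needed afterwards.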

Code Structure

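# FeatureStore, Calibrator, and the per-sport model classes are assumed to be
# defined elsewhere in the project.
class MultiSportStack:
    """Shared core (features, calibration) with one prediction head per sport."""
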
    def __init__(self):
        self.core = FeatureStore()
        self.calibrator = Calibrator()
        self.heads = {
            "football": FootballModel(),
            "nfl": NFLModel(),
            "nba": NBAModel(),
        }

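    def predict(self, sport, home_team, away_team, bookmaker_odds):
        """Return calibrated probabilities and EV per outcome for one fixture.

        bookmaker_odds maps each outcome name to its decimal odds and must use
        the same outcome keys as the sport's prediction head.
        """
        # Shared features -> sport-specific head -> shared calibration -> EV.
        features = self.core.build(sport, home_team, away_team)
        raw_probs = self.heads[sport].predict(features)
        calibrated = self.calibrator.calibrate(raw_probs)
        ev = self.compute_ev(calibrated, bookmaker_odds)
        return {"probabilities": calibrated, "ev": ev}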

    def compute_ev(self, probabilities, odds):
        """Sport-agnostic EV calculation."""
        evs = {}
        for outcome, prob in probabilities.items():
            decimal_odds = odds[outcome]
            evs[outcome] = prob * decimal_odds - 1
        return evs

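    def kelly_sizing(self, ev, decimal_odds, bankroll, kelly_fraction=0.5):
        """Cross-sport Kelly position sizing.

        ev is the expected value per unit stake (prob * decimal_odds - 1).
        The full Kelly fraction is ev / (decimal_odds - 1); kelly_fraction
        scales it down (0.5 = half Kelly).
        """
        # No bet without a positive edge or with odds that cannot pay out.
        if ev <= 0 or decimal_odds <= 1:
            return 0
        return kelly_fraction * ev / (decimal_odds - 1) * bankroll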
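
For a bet with win probability p at decimal odds d, the full Kelly fraction of bankroll is (p·d − 1) / (d − 1). The sketch below applies that per bet and then caps total exposure across sports by scaling all stakes proportionally. The proportional cap is a simple heuristic, not a true simultaneous-Kelly optimization, and the function names, the `max_exposure` parameter, and its default of 0.25 are assumptions for illustration.

```python
def kelly_fraction(prob: float, decimal_odds: float) -> float:
    """Full Kelly fraction of bankroll for a single bet (0 if no edge)."""
    edge = prob * decimal_odds - 1.0  # expected value per unit stake
    if edge <= 0 or decimal_odds <= 1.0:
        return 0.0
    return edge / (decimal_odds - 1.0)


def size_portfolio(bets, bankroll, kelly_multiplier=0.5, max_exposure=0.25):
    """Stake per bet, with total exposure capped at max_exposure of bankroll.

    bets maps a bet id to (calibrated_probability, decimal_odds).
    """
    fractions = {
        bet_id: kelly_multiplier * kelly_fraction(p, d)
        for bet_id, (p, d) in bets.items()
    }
    total = sum(fractions.values())
    # Shrink every stake by the same factor if the cap would be exceeded.
    scale = min(1.0, max_exposure / total) if total > 0 else 0.0
    return {bet_id: f * scale * bankroll for bet_id, f in fractions.items()}
```

For example, `size_portfolio({"nfl": (0.55, 2.0), "nba": (0.60, 2.0)}, bankroll=1000)` stakes roughly 50 and 100 at half Kelly, since the combined 15% exposure is under the cap.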

Pitfalls