Upwork Intake: Sports Prediction MVP

Summary

This is the primary project intake document for a World Cup sports betting prediction system commissioned via Upwork. The client wants an automated value-bet identification system that combines a Poisson expected-goals model with live odds feeds to surface positive expected-value (EV) opportunities, delivered via Telegram with a natural-language interpretation layer built on the Claude API.
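
The value-bet step described above can be sketched roughly as follows. This is a minimal illustration, not the client's specification: it assumes decimal odds, a simple proportional de-vig (other de-vig methods exist), and the half-Kelly convention the intake lists among its features. The function names and the odds and probabilities are hypothetical.

```python
def devig(decimal_odds):
    """Proportional de-vig: scale implied probabilities so they sum to 1."""
    implied = [1.0 / o for o in decimal_odds]
    overround = sum(implied)  # greater than 1 when the bookmaker has a margin
    return [p / overround for p in implied]


def expected_value(model_prob, decimal_odds):
    """Expected profit per unit staked at the offered decimal odds."""
    return model_prob * decimal_odds - 1.0


def half_kelly_fraction(model_prob, decimal_odds):
    """Half of the Kelly fraction of bankroll, floored at zero (no bet)."""
    b = decimal_odds - 1.0  # net odds received on a win
    full_kelly = (model_prob * b - (1.0 - model_prob)) / b
    return max(0.0, 0.5 * full_kelly)


# Hypothetical home/draw/away market and model output (illustrative numbers only).
odds = [2.10, 3.40, 3.80]
model_probs = [0.50, 0.26, 0.24]

fair_probs = devig(odds)
for label, p, fair, o in zip(["home", "draw", "away"], model_probs, fair_probs, odds):
    ev = expected_value(p, o)
    stake = half_kelly_fraction(p, o)
    print(f"{label}: fair={fair:.3f} model={p:.3f} EV={ev:+.3f} stake={stake:.2%}")
```

Only outcomes where the model probability exceeds the de-vigged market probability by enough to give a positive EV receive a non-zero stake; the rest are floored at zero.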

The system is designed for the FIFA World Cup (with future NFL/NBA expansion). Key features include: a Poisson xG prediction model, The Odds API integration, Polymarket integration, walk-forward backtesting across the 2010–2022 World Cups, a half-Kelly staking calculator, and a daily automated pipeline running on macOS (Mac Mini).
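
One way to picture the walk-forward backtest is the sketch below. It assumes each match record carries a date and an optional World Cup year; the field names ("date", "world_cup"), the function name, and the toy records are hypothetical and not taken from the intake or from the listed datasets.

```python
from datetime import date


def walk_forward_folds(matches, test_years=(2010, 2014, 2018, 2022)):
    """Yield (year, train, test) where train strictly precedes the test tournament."""
    ordered = sorted(matches, key=lambda m: m["date"])
    for year in test_years:
        test = [m for m in ordered if m.get("world_cup") == year]
        if not test:
            continue  # no matches for this tournament in the data
        cutoff = test[0]["date"]  # first match of the test tournament
        train = [m for m in ordered if m["date"] < cutoff]
        yield year, train, test


# Toy records with dates only (no real results), to show the fold boundaries.
matches = [
    {"date": date(2009, 9, 5), "world_cup": None},
    {"date": date(2010, 6, 11), "world_cup": 2010},
    {"date": date(2010, 6, 12), "world_cup": 2010},
    {"date": date(2012, 10, 16), "world_cup": None},
    {"date": date(2014, 6, 12), "world_cup": 2014},
]

folds = list(walk_forward_folds(matches))
for year, train, test in folds:
    print(year, "train:", len(train), "test:", len(test))
```

Each fold refits on everything before the tournament and scores only that tournament, so the training window grows over time and never includes matches played after the test period begins.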

The intake demonstrates strong domain knowledge on the client's part: it specifically mentions Dixon-Coles correction, CLV (Closing Line Value), and walk-forward validation. This signals a technically sophisticated client who tests for real understanding.

Key Concepts

  • Poisson Expected Goals Model: Core prediction engine, trained on historical international match CSVs; outputs win/draw/loss probabilities and full scoreline matrices
  • Dixon-Coles Correction: Adjusts Poisson for low-scoring football patterns; specifically corrects the 0-0, 1-0, 0-1, and 1-1 scorelines, which independent Poisson mis-estimates (low-scoring draws such as 0-0 and 1-1 are typically under-predicted, while 1-0 and 0-1 are over-predicted)
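  • Walk-Forward Validation: Chronological train/test split rather than k-fold CV: train on one period, then test on the next, so no future matches leak into training; the client specifically tests for this
  • Closing Line Value (CLV): Difference between the odds at which a bet was placed and the closing odds (the last price available before kickoff); consistently beating the closing line is a key metric for validating model quality
  • Half-Kelly Staking: Recommended stake sizes as a fraction of bankroll; full Kelly is aggressive and sensitive to errors in the estimated probabilities, so staking half the Kelly fraction is standard practice for managing volatility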
  • De-vigging: Removing bookmaker overround from odds to get "fair" implied probabilities for EV calculation
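
The Poisson scoreline matrix and the Dixon-Coles adjustment from the list above can be sketched as follows. This is a minimal illustration using the low-score correction factor from the original Dixon-Coles formulation; the goal rates (1.6 and 1.1), the dependence parameter rho (-0.1), and the function names are hypothetical, and in practice the rates and rho would be fitted to historical matches.

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k goals under a Poisson distribution."""
    return math.exp(-rate) * rate**k / math.factorial(k)


def dc_tau(x, y, lam, mu, rho):
    """Dixon-Coles factor for home goals x and away goals y; 1.0 outside low scores."""
    if x == 0 and y == 0:
        return 1.0 - lam * mu * rho
    if x == 0 and y == 1:
        return 1.0 + lam * rho
    if x == 1 and y == 0:
        return 1.0 + mu * rho
    if x == 1 and y == 1:
        return 1.0 - rho
    return 1.0


def score_matrix(lam, mu, rho=0.0, max_goals=10):
    """matrix[x][y] = P(home scores x, away scores y), truncated and renormalised."""
    raw = [
        [dc_tau(x, y, lam, mu, rho) * poisson_pmf(x, lam) * poisson_pmf(y, mu)
         for y in range(max_goals + 1)]
        for x in range(max_goals + 1)
    ]
    total = sum(map(sum, raw))
    return [[p / total for p in row] for row in raw]


def outcome_probs(matrix):
    """Collapse the scoreline matrix into (home win, draw, away win)."""
    home = sum(p for x, row in enumerate(matrix) for y, p in enumerate(row) if x > y)
    draw = sum(row[x] for x, row in enumerate(matrix))
    return home, draw, 1.0 - home - draw


# Hypothetical expected goals for the home and away side.
independent = score_matrix(1.6, 1.1, rho=0.0)
adjusted = score_matrix(1.6, 1.1, rho=-0.1)
print("independent 1X2:", [round(p, 3) for p in outcome_probs(independent)])
print("Dixon-Coles 1X2:", [round(p, 3) for p in outcome_probs(adjusted)])
```

With a negative rho the correction moves probability from 1-0 and 0-1 into 0-0 and 1-1, which raises the draw probability relative to the independent model while leaving all other scorelines unchanged.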

Data Sources Listed

Source                                    Type                      Cost
github.com/jfjelstul/worldcup             Historical World Cup CSV  Free
github.com/martj42/international_results  International match CSV   Free
The Odds API                              Live bookmaker odds       Free tier / $100+/mo
API-Football                              Fixtures, statistics      ~$30/mo
Polymarket                                Prediction market prices  Free (public API)

Technical Stack

Python 3.11+ with: pandas, numpy, scipy, scikit-learn, XGBoost, requests, python-dotenv, schedule, matplotlib, Anthropic Claude API, Telegram Bot API.

Compliance Note

The intake flags significant legal exposure around sports betting predictions. Regulations vary across jurisdictions (US, UK, Australia, EU). Framing the output as "informational data analysis" rather than "betting tips" reduces but does not eliminate that risk. Absent legal review, the system should be restricted to personal use.

Notes

The intake is comprehensive and well structured. It explicitly identifies the recommended extension stack: Poisson + Dixon-Coles correction → Elo ratings with home/away splits → xG model → ensemble. Elo offers a very high return on effort (low implementation cost, meaningful accuracy gain). The client demonstrates knowledge of concepts that many developers lack, which suggests serious intent.
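
The rating step of that extension stack could look roughly like the sketch below. It assumes that "home/away splits" means keeping a separate rating per team for home and away matches, which the intake does not define; the K-factor (20), starting rating (1500), function names, and team names are all hypothetical.

```python
def expected_score(rating_home, rating_away):
    """Standard Elo expectation for the home side (1 = win, 0.5 = draw, 0 = loss)."""
    return 1.0 / (1.0 + 10 ** ((rating_away - rating_home) / 400.0))


def update(ratings, home, away, result, k=20.0, start=1500.0):
    """Update the home team's home rating and the away team's away rating in place.

    result is from the home side's view: 1.0 home win, 0.5 draw, 0.0 away win.
    """
    r_home = ratings.get((home, "home"), start)
    r_away = ratings.get((away, "away"), start)
    delta = k * (result - expected_score(r_home, r_away))
    ratings[(home, "home")] = r_home + delta
    ratings[(away, "away")] = r_away - delta  # zero-sum: away side loses what home gains
    return ratings


# Hypothetical teams; a home win between two unrated sides.
ratings = {}
update(ratings, "Team A", "Team B", result=1.0)
print(ratings)
```

The rating difference between two sides can then be fed into the goal-rate model as a feature, which is where the low implementation cost noted above comes from.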