scipy.stats.poisson — Official Documentation

Summary

scipy.stats.poisson is SciPy's Poisson distribution object (a discrete random variable, not a module) and a common building block for Poisson-based sports prediction models. It provides pmf(k, mu) for the probability mass function and cdf(k, mu) for the cumulative distribution function, along with sampling, moments, and quantile-based intervals.

The official documentation is the authoritative reference for the API contract of the Poisson distribution in SciPy. Key notes: mu is the shape parameter (μ = λ = mean = variance), and pmf(k, mu) = exp(-μ) × μ^k / k! for integer k ≥ 0. For shifted distributions, use the loc parameter.
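As a quick sanity check of the formula above, the following minimal sketch compares a hand-written pmf with poisson.pmf. The helper name manual_pmf and the value μ = 1.5 are illustrative assumptions, not part of SciPy.

```python
import math
from scipy.stats import poisson

def manual_pmf(k, mu):
    """Closed-form Poisson pmf exp(-mu) * mu**k / k! (hypothetical helper)."""
    return math.exp(-mu) * mu**k / math.factorial(k)

mu = 1.5  # illustrative value
for k in range(5):
    # The two columns should agree to floating-point precision
    print(k, manual_pmf(k, mu), poisson.pmf(k, mu))

# Mean and variance both equal mu
print(poisson.mean(mu), poisson.var(mu))
```

The closed form is fine for small k and μ; for large values the factorial and power terms overflow, which is one reason to prefer the library call.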

Key Functions

  • poisson.pmf(k, mu): P(X = k) for Poisson(μ). Core function for computing goal probabilities.
  • poisson.cdf(k, mu): P(X ≤ k) — useful for cumulative probability calculations
  • poisson.sf(k, mu): P(X > k) = 1 - cdf — survival function
  • poisson.rvs(mu, size): Random variate generation for simulation
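  • poisson.interval(confidence, mu): Central (equal-tailed) interval containing the given fraction of the distribution's probability. This is a range of likely counts for a known μ, not a confidence interval for an estimated mean. Older SciPy releases named the first argument alpha.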
  • poisson.mean(mu), poisson.var(mu), poisson.std(mu): Moments (mean = var = μ by definition)
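The relationships between these functions can be sketched as follows. The value μ = 1.5 is illustrative, and the confidence level is passed positionally so the call does not depend on the argument's name.

```python
import numpy as np
from scipy.stats import poisson

mu = 1.5  # illustrative value
k = np.arange(0, 6)

pmf = poisson.pmf(k, mu)
cdf = poisson.cdf(k, mu)  # P(X <= k)
sf = poisson.sf(k, mu)    # P(X > k)

# cdf is the running sum of pmf, and cdf + sf = 1 at every k
print(np.allclose(np.cumsum(pmf), cdf))
print(np.allclose(cdf + sf, 1.0))

# interval() returns the pair of quantiles that cut off equal tails
low, high = poisson.interval(0.95, mu)
q_low, q_high = poisson.ppf([0.025, 0.975], mu)
print((low, high), (q_low, q_high))
```

Because the interval is built from quantiles of the distribution itself, it describes which counts are likely for a given μ. It says nothing about the uncertainty in μ when μ has been estimated from data.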

Key Concepts

  • Shape parameter mu: Both the mean and variance of the distribution. For football: expected goals (xG) for a team in a match.
  • loc parameter: Allows shifting the distribution. poisson.pmf(k, mu, loc) is equivalent to poisson.pmf(k - loc, mu). Useful for cases where the minimum goal count is shifted.
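  • Numerical stability: For large μ (e.g., μ > 100), a normal approximation N(μ, μ) is an option, but SciPy handles large μ well in most cases. When probabilities are very small, poisson.logpmf and poisson.logcdf work in log space and avoid underflow.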
  • Broadcasting: pmf accepts array inputs for k, returning an array of probabilities — ideal for vectorized scoreline matrix computation.
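The loc, log-space, and broadcasting points above can be illustrated with a short sketch. All numeric values here (μ = 1.5, loc = 1, μ = 1000) are illustrative assumptions.

```python
import numpy as np
from scipy.stats import poisson

k = np.arange(0, 6)
mu = 1.5

# loc shifts the support: pmf(k, mu, loc) matches pmf(k - loc, mu)
shifted = poisson.pmf(k, mu, loc=1)
unshifted = poisson.pmf(k - 1, mu)
print(np.allclose(shifted, unshifted))

# Very small probabilities underflow in linear space but not in log space
big_mu = 1000.0
p_zero = poisson.pmf(0, big_mu)         # exp(-1000) is below the float64 range
log_p_zero = poisson.logpmf(0, big_mu)  # log-probability stays finite
print(p_zero, log_p_zero)

# Broadcasting: a column of k values against a row of mu values
mus = np.array([0.8, 1.5, 2.2])
table = poisson.pmf(k[:, None], mus[None, :])  # shape (6, 3)
print(table.shape)
```

Each column of table is the pmf for one μ, so a whole set of teams or fixtures can be evaluated in a single call.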

Code Examples

from scipy.stats import poisson
import numpy as np

# Single goal probability
p_one_goal = poisson.pmf(1, mu=1.5)  # P(X=1) for λ=1.5

# Full distribution
mu = 1.5
goals = np.arange(0, 8)
probs = poisson.pmf(goals, mu)
# [e^-1.5, e^-1.5*1.5/1, e^-1.5*1.5^2/2!, ...]

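# Scoreline matrix (vectorized); assumes home and away goals are independent
def scoreline_matrix(lambda_home, lambda_away, max_goals=6):
    h = np.arange(max_goals + 1)[:, None]  # home goals as a column vector
    a = np.arange(max_goals + 1)[None, :]  # away goals as a row vector
    # Entry [i, j] = P(home = i) * P(away = j); total is slightly under 1 (truncated at max_goals)
    return poisson.pmf(h, lambda_home) * poisson.pmf(a, lambda_away)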

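# Upper-tail probability P(X >= 1)
p_at_least_one = poisson.sf(0, mu=1.5)  # P(X > 0) = 1 - P(X = 0)

# Central 95% interval of the goal count for a known λ (equal-tailed quantiles);
# this is not a confidence interval for an estimate of λ
low, high = poisson.interval(0.95, mu=1.5)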
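One way the scoreline matrix might be used is to collapse it into home win, draw, and away win probabilities. The sketch below repeats scoreline_matrix so it runs on its own. The helper outcome_probabilities and the rates 1.5 and 1.1 are illustrative assumptions, and the model treats home and away goals as independent.

```python
import numpy as np
from scipy.stats import poisson

def scoreline_matrix(lambda_home, lambda_away, max_goals=6):
    h = np.arange(max_goals + 1)[:, None]  # home goals as a column vector
    a = np.arange(max_goals + 1)[None, :]  # away goals as a row vector
    return poisson.pmf(h, lambda_home) * poisson.pmf(a, lambda_away)

def outcome_probabilities(matrix):
    """Sum scoreline cells by result (hypothetical helper).

    Rows index home goals and columns index away goals, so cells below
    the diagonal are home wins and cells above it are away wins.
    """
    home_win = np.tril(matrix, k=-1).sum()
    draw = np.trace(matrix)
    away_win = np.triu(matrix, k=1).sum()
    return home_win, draw, away_win

m = scoreline_matrix(1.5, 1.1, max_goals=10)  # illustrative rates
home, draw, away = outcome_probabilities(m)
total = home + draw + away  # slightly under 1 because scores above max_goals are dropped
print(home, draw, away, total)
```

If the three probabilities need to sum to exactly 1, dividing each by total is a simple option; raising max_goals shrinks the truncated mass instead.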

Notes

  • The existing poisson-distribution.md note already includes scipy code; this source adds the official documentation reference and confirms the API contract
  • Key detail: poisson.pmf accepts array k values — the vectorized scoreline matrix in the existing note relies on this broadcasting behavior
  • For production sports models: prefer scipy.stats.poisson over hand-rolled implementations, which are prone to overflow in μ^k and k! and to underflow for large k or μ
  • The loc parameter is rarely needed for football xG models but useful for specialty applications