Log-Loss (Cross-Entropy)

Overview

Log-loss (also called cross-entropy loss or logarithmic loss) is a scoring rule that measures the quality of probabilistic predictions. For binary outcomes, it is the negative average of the log-probabilities assigned to the actual outcomes. It heavily penalizes confident wrong predictions: assigning 99% to an event that does not occur scores far worse than assigning 51%.

Log-loss is the cost function used in logistic regression and equals the average negative log-likelihood of the data under a Bernoulli model. Lower log-loss is better; 0 is a perfect score, reached only when every actual outcome was assigned probability 1.

In sports betting models, log-loss is used both as a training objective (to optimize probabilistic classifiers) and as a validation metric. Unlike the Brier score, whose quadratic penalty is bounded at 1 per prediction, the log-loss penalty is logarithmic and grows without bound as the probability assigned to the actual outcome approaches 0.
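To make the difference in penalty shape concrete, the following minimal sketch tabulates the per-prediction contribution of both scores as the probability assigned to the actual outcome shrinks. The helper names `log_penalty` and `brier_penalty` are illustrative, not part of any library, and the script uses only the standard library.

```python
import math

def log_penalty(p_actual):
    """Log-loss contribution when probability p_actual was given to the outcome that occurred."""
    return -math.log(p_actual)

def brier_penalty(p_actual):
    """Binary Brier contribution for the same prediction."""
    return (1.0 - p_actual) ** 2

# The Brier term can never exceed 1; the log term keeps growing as p_actual -> 0.
for p in (0.5, 0.1, 0.01, 0.001):
    print(f"p={p:<6} log-loss term={log_penalty(p):.3f}  Brier term={brier_penalty(p):.3f}")
```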

Why It Matters

Log-loss matters because:
1. Training objective: Log-loss is the standard cost function for logistic regression and neural-network classifiers; minimizing it rewards accurate probabilities rather than just correct labels.
2. Overconfidence penalty: A prediction of 0.001 for an event that occurs contributes −ln(0.001) ≈ 6.91 to the loss, compared with (1 − 0.001)² ≈ 0.998 under Brier, whose per-prediction penalty can never exceed 1.
3. Multi-class support: Categorical cross-entropy handles 1X2 soccer predictions naturally.
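4. Strictly proper: Like Brier, log-loss is a strictly proper scoring rule: the expected score is minimized only by reporting the true probability, so there is no incentive to shade forecasts.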
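The "strictly proper" property can be checked numerically. The sketch below assumes a hypothetical event with true probability 0.30 and searches a grid of candidate forecasts for the one with the lowest expected score; the function names and the grid resolution are illustrative choices.

```python
import math

def expected_log_loss(forecast, true_prob):
    """Expected log-loss of a fixed forecast when the event occurs with true_prob."""
    return -(true_prob * math.log(forecast) + (1 - true_prob) * math.log(1 - forecast))

def expected_brier(forecast, true_prob):
    """Expected Brier score of a fixed forecast when the event occurs with true_prob."""
    return true_prob * (1 - forecast) ** 2 + (1 - true_prob) * forecast ** 2

true_prob = 0.30  # hypothetical true probability, chosen for illustration
grid = [i / 100 for i in range(1, 100)]  # candidate forecasts 0.01 .. 0.99

best_for_log_loss = min(grid, key=lambda f: expected_log_loss(f, true_prob))
best_for_brier = min(grid, key=lambda f: expected_brier(f, true_prob))

# Both searches should land on the true probability itself.
print(best_for_log_loss, best_for_brier)
```

Under both rules, any forecast other than the true probability has a strictly worse expected score, which is what makes the rules safe to use for comparing models.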

Key Formula

Binary log-loss:

$$LL = -\frac{1}{N} \sum_{i=1}^{N} [y_i \log(p_i) + (1 - y_i) \log(1 - p_i)]$$

Multi-class log-loss (categorical, for 1X2):

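$$LL = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} y_{ic} \log(p_{ic})$$

For soccer 1X2, C = 3 classes (home win, draw, away win) and y is one-hot encoded, so for each match only the probability assigned to the realized outcome contributes to the sum.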
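Because y is one-hot, the inner sum reduces to a single lookup per match. A minimal standard-library sketch of that reduction follows; the forecasts and results are illustrative values, and `multiclass_log_loss` is a hypothetical helper that assumes no forecast is exactly 0 (so no clipping is applied).

```python
import math

def multiclass_log_loss(prob_rows, outcome_indices):
    """Mean of -log(probability assigned to the realized class)."""
    total = 0.0
    for probs, idx in zip(prob_rows, outcome_indices):
        # One-hot y zeroes out every class except the realized one.
        total += -math.log(probs[idx])
    return total / len(prob_rows)

# Illustrative 1X2 forecasts: columns are (home win, draw, away win).
forecasts = [
    [0.55, 0.25, 0.20],
    [0.30, 0.35, 0.35],
    [0.70, 0.20, 0.10],
]
results = [0, 2, 0]  # 0 = home win, 1 = draw, 2 = away win

print(f"Multi-class log-loss: {multiclass_log_loss(forecasts, results):.4f}")
```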

Normalized log-loss:

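$$NLL = \frac{LL}{LL_{climatology}}$$

Here $LL_{climatology}$ is the log-loss of a reference forecast that always predicts the historical base rate of each outcome. Values below 1 mean the model beats that baseline.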
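One way to compute the normalized score for binary outcomes is sketched below. It assumes the climatological forecast is the base rate of the evaluated sample itself; in practice the base rate would more often come from historical data. `binary_log_loss` and `normalized_log_loss` are illustrative names.

```python
import math

def binary_log_loss(probs, outcomes, eps=1e-15):
    """Average binary log-loss with probabilities clipped to [eps, 1 - eps]."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(probs)

def normalized_log_loss(probs, outcomes):
    """Model log-loss divided by the log-loss of a constant base-rate forecast."""
    # Assumption: climatology = in-sample base rate. If every outcome is the
    # same, the baseline is close to 0 and the ratio is not meaningful.
    base_rate = sum(outcomes) / len(outcomes)
    baseline = binary_log_loss([base_rate] * len(outcomes), outcomes)
    return binary_log_loss(probs, outcomes) / baseline

nll = normalized_log_loss([0.60, 0.70, 0.30, 0.80, 0.50], [1, 1, 0, 1, 0])
print(f"Normalized log-loss: {nll:.3f}")
```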

Worked Example

Binary predictions: [0.60, 0.70, 0.30, 0.80, 0.50]
Outcomes: [1, 1, 0, 1, 0]

$$LL = -\frac{1}{5}[\ln(0.6) + \ln(0.7) + \ln(0.7) + \ln(0.8) + \ln(0.5)] = -\frac{1}{5}(-2.140) = 0.428$$

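Comparison: the Brier score for the same predictions is 0.126. The two scores are on different scales, so the raw values are not directly comparable: for any single prediction −ln(p) ≥ (1 − p)², so log-loss is never smaller than Brier. In both scores the largest single contribution comes from the 0.50 prediction for a 0 outcome (0.693 in log-loss, 0.25 in Brier).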
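The arithmetic can be checked with a short standard-library script that lists each prediction's contribution to both scores before averaging. The variable names are illustrative.

```python
import math

predictions = [0.60, 0.70, 0.30, 0.80, 0.50]
outcomes = [1, 1, 0, 1, 0]

log_terms = []
brier_terms = []
for p, y in zip(predictions, outcomes):
    # Probability the forecast assigned to what actually happened.
    p_actual = p if y == 1 else 1.0 - p
    log_terms.append(-math.log(p_actual))
    brier_terms.append((p - y) ** 2)

log_loss_value = sum(log_terms) / len(log_terms)
brier_value = sum(brier_terms) / len(brier_terms)

for p, y, lt, bt in zip(predictions, outcomes, log_terms, brier_terms):
    print(f"p={p:.2f} y={y}  log term={lt:.3f}  Brier term={bt:.3f}")
print(f"Log-loss: {log_loss_value:.3f}  Brier: {brier_value:.3f}")
```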

Code Snippet

import numpy as np

def log_loss_binary(predicted_probs, actual_outcomes, eps=1e-15):
    """Calculate binary log-loss (mean negative log-likelihood)."""
    # Convert inputs so plain lists work, then clip so log(0) never occurs.
    p = np.clip(np.asarray(predicted_probs, dtype=float), eps, 1 - eps)
    y = np.asarray(actual_outcomes, dtype=float)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def log_loss_multiclass(predicted_probs_matrix, actual_outcomes_onehot, eps=1e-15):
    """Calculate multi-class log-loss for 1X2 soccer predictions."""
    p = np.clip(np.asarray(predicted_probs_matrix, dtype=float), eps, 1 - eps)
    # Renormalize so each row still sums to 1 after clipping.
    p = p / p.sum(axis=1, keepdims=True)
    y = np.asarray(actual_outcomes_onehot, dtype=float)
    return -np.mean(np.sum(y * np.log(p), axis=1))

# Example: soccer 1X2
predictions = np.array([[0.55, 0.25, 0.20], [0.30, 0.35, 0.35], [0.70, 0.20, 0.10]])
outcomes = [1, 3, 1]  # home win, away win, home win
y_onehot = np.zeros((3, 3))
for i, outcome in enumerate(outcomes):
    y_onehot[i, outcome - 1] = 1
ll = log_loss_multiclass(predictions, y_onehot)
print(f"Multi-class log-loss: {ll:.4f}")

Pitfalls

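More volatile than Brier: With small samples (for example, 64 World Cup matches), log-loss is less reliable as a standalone metric because a single confident miss can move the average substantially.
Overconfidence sensitive: Log-loss heavily penalizes confident wrong predictions. In high-variance sports, a handful of upsets can dominate the average and make an otherwise well-calibrated model look poor.
Use for training, Brier for evaluation: Log-loss is ideal for model training because it suits gradient-based optimization; the Brier score and closing line value (CLV) are better suited to final evaluation.
Clipping required: log(0) is undefined, so a predicted probability of exactly 0 for an outcome that occurs yields an infinite loss. Always clip predictions away from 0 and 1.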
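The effect of clipping, and of the choice of `eps`, can be illustrated with the standard library alone. In the sketch below, `clip_probability` is a hypothetical helper. Note that `math.log(0.0)` raises `ValueError`, whereas NumPy's `np.log(0)` returns `-inf` with a runtime warning.

```python
import math

def clip_probability(p, eps=1e-15):
    """Keep p inside [eps, 1 - eps] so its logarithm stays finite."""
    return min(max(p, eps), 1.0 - eps)

# Unclipped: a forecast of exactly 0 for an event that occurs has no finite score.
try:
    math.log(0.0)
except ValueError as exc:
    print("math.log(0.0) raised:", exc)

# Clipped: eps sets the maximum penalty a single prediction can incur,
# so it is effectively a modelling choice rather than a numerical detail.
for eps in (1e-15, 1e-6, 1e-3):
    worst_case = -math.log(clip_probability(0.0, eps))
    print(f"eps={eps:g}  worst-case penalty per prediction={worst_case:.2f}")
```

With the default `eps` of 1e-15, one such miss among 100 predictions still adds roughly 0.35 to the average, so scores that include clipped predictions should be compared only when the same `eps` was used.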