Log-Loss (Cross-Entropy)¶
Overview¶
Log-loss (also called cross-entropy loss or logarithmic loss) is a scoring rule that measures the quality of probabilistic predictions. For binary outcomes, it is the negative average of the log-probabilities assigned to the actual outcomes. It heavily penalizes confident wrong predictions: predicting 99% for an event that does not occur scores far worse than predicting 51%.
Log-loss is the cost function used in logistic regression and is equivalent to the negative log-likelihood of the data under a Bernoulli model. Lower log-loss is better, with 0 being perfect.
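The link to the Bernoulli likelihood can be written out in one step. This is a short sketch that assumes the outcomes are independent and reuses the symbols of the binary formula in this article ($y_i$ for the outcome, $p_i$ for the predicted probability); $\mathcal{L}$ is introduced here only as a name for the likelihood:

$$\mathcal{L} = \prod_{i=1}^{N} p_i^{y_i} (1 - p_i)^{1 - y_i} \quad\Longrightarrow\quad -\frac{1}{N} \log \mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} [y_i \log(p_i) + (1 - y_i) \log(1 - p_i)] = LL$$

Log-loss is therefore the negative log-likelihood divided by N, so minimizing one minimizes the other.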
In sports betting models, log-loss is used both as a training objective (to optimize probabilistic classifiers) and as a validation metric. Unlike brier-score, whose quadratic penalty is bounded at 1 per prediction, the log-loss penalty is logarithmic and grows without bound as the probability assigned to the actual outcome approaches zero.
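To make the difference in penalty shape concrete, the following minimal sketch prints the per-prediction penalty under both rules as the probability given to the actual outcome shrinks. The helper names `log_penalty` and `brier_penalty` are illustrative, not part of any library.

```python
import math


def log_penalty(p_actual):
    """Log-loss contribution when p_actual was assigned to what actually happened."""
    return -math.log(p_actual)


def brier_penalty(p_actual):
    """Binary Brier contribution for the same prediction."""
    return (1.0 - p_actual) ** 2


# Probability assigned to the outcome that occurred, from a coin flip down to 1 in 1000.
for p in (0.5, 0.1, 0.01, 0.001):
    print(f"p={p:<6} log-loss={log_penalty(p):.3f}  brier={brier_penalty(p):.3f}")
```

The Brier penalty levels off just below 1, while the log-loss penalty keeps rising by about 2.3 (that is, ln 10) for every tenfold drop in probability.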
Why It Matters¶
Log-loss matters because:
1. Training objective: Log-loss is the standard cost function for logistic regression and for neural-network classifiers. Minimizing it is maximum-likelihood estimation, which rewards well-calibrated probabilities.
2. Overconfidence penalty: A prediction of 0.001 for an event that occurs contributes −ln(0.001) ≈ 6.9 to the loss, whereas the Brier contribution, (1 − 0.001)² ≈ 0.998, can never exceed 1.
3. Multi-class support: Categorical cross-entropy handles 1X2 soccer predictions naturally.
4. Strictly proper: Like Brier, log-loss is a strictly proper scoring rule.
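"Strictly proper" means the expected score is best only when the forecast equals the true probability, so there is nothing to gain by shading a forecast. The sketch below checks this numerically on a grid for both rules. The true probability `q = 0.30` and the function names are hypothetical, chosen only for illustration.

```python
import numpy as np


def expected_log_loss(p, q):
    """Expected binary log-loss of forecast p when the event occurs with probability q."""
    return -(q * np.log(p) + (1 - q) * np.log(1 - p))


def expected_brier(p, q):
    """Expected binary Brier score of forecast p under the same assumption."""
    return q * (1 - p) ** 2 + (1 - q) * p ** 2


q = 0.30  # hypothetical true probability of the event
grid = np.linspace(0.01, 0.99, 99)  # candidate forecasts 0.01, 0.02, ..., 0.99

best_ll = grid[np.argmin(expected_log_loss(grid, q))]
best_bs = grid[np.argmin(expected_brier(grid, q))]
print(f"forecast minimizing expected log-loss: {best_ll:.2f}")
print(f"forecast minimizing expected Brier:    {best_bs:.2f}")
```

Both minima land on the grid point equal to `q`, which is the behaviour a strictly proper rule should show.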
Key Formula¶
Binary log-loss:
$$LL = -\frac{1}{N} \sum_{i=1}^{N} [y_i \log(p_i) + (1 - y_i) \log(1 - p_i)]$$
Multi-class log-loss (categorical, for 1X2):
$$LL = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{3} y_{ic} \log(p_{ic})$$
For soccer 1X2 there are C = 3 classes (home win, draw, away win), which is the upper limit of the inner sum, and y is one-hot encoded.
Normalized log-loss:
$$NLL = \frac{LL}{LL_{climatology}}$$
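The article does not define $LL_{climatology}$. The sketch below assumes the usual forecasting sense of the term: the log-loss of a constant forecast equal to the observed base rate of the outcomes. Under that assumption, a value below 1 means the model beats the base-rate baseline. The function names are illustrative.

```python
import numpy as np


def binary_log_loss(p, y, eps=1e-15):
    """Mean binary log-loss with clipping to avoid log(0)."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    y = np.asarray(y, dtype=float)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))


def normalized_log_loss(p, y):
    """Model log-loss divided by the log-loss of a base-rate forecast.

    Assumption: 'climatology' is the constant forecast equal to the observed
    frequency of the event. If all outcomes are identical the baseline loss is
    close to zero and the ratio is not meaningful.
    """
    y = np.asarray(y, dtype=float)
    climatology = np.full_like(y, y.mean())
    return binary_log_loss(p, y) / binary_log_loss(climatology, y)


probs = [0.60, 0.70, 0.30, 0.80, 0.50]
outcomes = [1, 1, 0, 1, 0]
nll = normalized_log_loss(probs, outcomes)
print(f"Normalized log-loss: {nll:.3f}")
```

With only five observations the base rate is itself noisy, so in practice the baseline is often estimated from a longer history than the evaluation sample.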
Worked Example¶
Binary predictions: [0.60, 0.70, 0.30, 0.80, 0.50]
Outcomes: [1, 1, 0, 1, 0]
$$LL = -\frac{1}{5}[\ln(0.6) + \ln(0.7) + \ln(0.7) + \ln(0.8) + \ln(0.5)] = -\frac{1}{5}(-2.140) = 0.428$$
Comparison: the Brier score for the same predictions is 0.126. The two scores are on different scales, so the larger log-loss value is not in itself a sign of worse performance. In both, the 0.50 prediction for a 0 outcome is the largest single contribution: −ln(0.5) = 0.693 to the log-loss sum and 0.25 to the Brier sum.
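The arithmetic in this example can be checked term by term. The following sketch recomputes both scores from the five predictions and outcomes listed above; the variable names are illustrative.

```python
import numpy as np

probs = np.array([0.60, 0.70, 0.30, 0.80, 0.50])
outcomes = np.array([1, 1, 0, 1, 0])

# Probability the forecast assigned to what actually happened in each case.
p_actual = np.where(outcomes == 1, probs, 1 - probs)

log_terms = -np.log(p_actual)          # per-prediction log-loss contributions
brier_terms = (probs - outcomes) ** 2  # per-prediction Brier contributions

for p, y, lt, bt in zip(probs, outcomes, log_terms, brier_terms):
    print(f"p={p:.2f} y={y}  log term={lt:.3f}  Brier term={bt:.3f}")

print(f"log-loss = {log_terms.mean():.3f}, Brier = {brier_terms.mean():.3f}")
```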
Code Snippet¶
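import numpy as np
from sklearn.metrics import log_loss


def log_loss_binary(predicted_probs, actual_outcomes, eps=1e-15):
    """Calculate binary log-loss."""
    # Convert to arrays so plain Python lists work as well as NumPy arrays.
    y = np.asarray(actual_outcomes, dtype=float)
    # Clip so that log(0) never occurs.
    p = np.clip(np.asarray(predicted_probs, dtype=float), eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))


def log_loss_multiclass(predicted_probs_matrix, actual_outcomes_onehot, eps=1e-15):
    """Calculate multi-class log-loss for 1X2 soccer predictions."""
    y = np.asarray(actual_outcomes_onehot, dtype=float)
    p = np.clip(np.asarray(predicted_probs_matrix, dtype=float), eps, 1 - eps)
    # Renormalize so each row sums to 1 again after clipping.
    p = p / p.sum(axis=1, keepdims=True)
    return -np.mean(np.sum(y * np.log(p), axis=1))


# Example: soccer 1X2
predictions = np.array([[0.55, 0.25, 0.20], [0.30, 0.35, 0.35], [0.70, 0.20, 0.10]])
outcomes = [1, 3, 1]  # home win, away win, home win
y_onehot = np.zeros((3, 3))
for i, outcome in enumerate(outcomes):
    y_onehot[i, outcome - 1] = 1  # outcomes are 1-based, columns are 0-based
ll = log_loss_multiclass(predictions, y_onehot)
print(f"Multi-class log-loss: {ll:.4f}")
# Cross-check with scikit-learn, which takes class labels rather than one-hot rows.
print(f"sklearn log_loss: {log_loss(y_onehot.argmax(axis=1), predictions, labels=[0, 1, 2]):.4f}")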
Pitfalls¶
- More volatile than Brier: With small samples (for example, a 64-match World Cup), a handful of heavily penalized predictions can dominate the average, so log-loss is less reliable as a standalone metric.
- Overconfidence sensitive: Log-loss heavily penalizes confident wrong predictions. In high-variance sports, a few upsets against confident forecasts can make a sound model's log-loss look poor.
- Use for training, Brier for evaluation: Log-loss is well suited to model training (it is smooth and works with gradient-based optimization); Brier score and CLV (closing line value) are better for final evaluation.
- Clipping required: log(0) is undefined, so always clip predictions away from 0 and 1.
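The clipping point can be seen directly. In this sketch, `binary_log_loss` is a hypothetical helper whose `eps` argument can be switched off to show what happens without clipping. NumPy warnings are suppressed only so the unclipped results can be printed.

```python
import numpy as np


def binary_log_loss(p, y, eps=None):
    """Mean binary log-loss; clips to [eps, 1 - eps] only when eps is given."""
    p = np.asarray(p, dtype=float)
    y = np.asarray(y, dtype=float)
    if eps is not None:
        p = np.clip(p, eps, 1 - eps)
    # Silence divide-by-zero / invalid-value warnings for the unclipped demo.
    with np.errstate(divide="ignore", invalid="ignore"):
        return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))


probs = [0.0, 0.8]  # the first forecast gives zero probability to an event that happens
outcomes = [1, 1]

unclipped = binary_log_loss(probs, outcomes)
clipped = binary_log_loss(probs, outcomes, eps=1e-15)
print(f"without clipping: {unclipped}")
print(f"with eps=1e-15:   {clipped:.3f}")

# A certain and correct forecast also breaks without clipping: 0 * log(0) is NaN.
certain_correct = binary_log_loss([1.0], [1])
print(f"p=1.0, y=1 without clipping: {certain_correct}")
```

With `eps=1e-15`, a single zero-probability miss still costs about 34.5 before averaging, so the choice of `eps` influences the reported score and is worth keeping fixed when comparing models.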
See Also¶
- brier-score — Brier is more appropriate for evaluation; log-loss for training
- calibration-plots — complement log-loss with visual calibration check
- bayesian-inference-sports — Bayesian models can improve log-loss