Playground Series S5E10

Predicting Road Accident Risk

A regression challenge predicting accident risk scores based on driver behavior, vehicle characteristics, weather conditions, and road features to improve road safety analytics.

Competition Rank: 313 / 4,082
Percentile: Top 8%
Evaluation Metric: RMSE
Best Score: See Notebook

Problem Overview

This Playground Series competition focused on predicting road accident risk, a crucial application for insurance companies, autonomous vehicle systems, and traffic safety departments. The dataset included diverse features such as driver demographics, driving patterns, environmental conditions, and historical accident data. The goal was to build a robust regression model that could accurately predict risk scores across various scenarios.

Technical Approach

Key Insight

Weather conditions and time-of-day interactions proved to be highly predictive. Creating features that captured the combined effect of poor weather during rush hours significantly improved model performance. Additionally, historical accident frequency per driver was the single most important feature, emphasizing the value of longitudinal data in risk assessment.
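As a rough illustration of the interaction features described above, here is a minimal pandas sketch. The column names (`weather`, `time_of_day`) and the category values treated as "poor weather" and "rush hour" are assumptions made for the example, not the competition's confirmed schema or the exact features used in the notebook.

```python
import pandas as pd

# Hypothetical category values; the real dataset's labels may differ.
BAD_WEATHER = {"rainy", "foggy"}
RUSH_HOURS = {"morning", "evening"}


def add_weather_time_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with weather x time-of-day interaction features."""
    out = df.copy()
    bad_weather = out["weather"].isin(BAD_WEATHER)
    rush_hour = out["time_of_day"].isin(RUSH_HOURS)
    # Binary flag: 1 only when poor weather and rush hour coincide.
    out["bad_weather_rush_hour"] = (bad_weather & rush_hour).astype(int)
    # Combined categorical, so tree models can split on each specific pairing.
    out["weather_x_time"] = (
        out["weather"].astype(str) + "_" + out["time_of_day"].astype(str)
    )
    return out


# Toy rows for demonstration only (not competition data).
sample = pd.DataFrame({
    "weather": ["rainy", "clear", "foggy", "rainy"],
    "time_of_day": ["morning", "evening", "afternoon", "afternoon"],
})
features = add_weather_time_features(sample)
```

The binary flag gives the model a direct signal for the combined condition, while the concatenated category leaves it free to learn a separate effect for every weather and time pairing.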

Technology Stack

Python, Pandas, NumPy, Scikit-learn, XGBoost, LightGBM, CatBoost, Optuna, SHAP, Matplotlib, Seaborn

Lessons Learned

This competition reinforced my understanding of regression tasks with skewed target distributions. Managing the long tail of high-risk cases required a careful validation strategy and custom loss functions. The experience also highlighted the importance of ensemble diversity: combining models with different strengths (CatBoost's categorical handling, XGBoost's regularization, LightGBM's speed) produced more robust predictions than any single model.
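The ensemble idea can be sketched with a simple weighted average scored by RMSE. This is a minimal sketch, not the blending scheme from the notebook: the three prediction vectors below are synthetic stand-ins for out-of-fold predictions from CatBoost, XGBoost, and LightGBM, and the equal weights are an assumption.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error, the competition metric."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def blend(preds, weights):
    """Weighted average of per-model prediction vectors."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so the weights sum to 1
    return np.average(np.vstack(preds), axis=0, weights=weights)


# Synthetic stand-ins: a shared target plus independent noise per "model".
rng = np.random.default_rng(0)
y_true = rng.uniform(0.0, 1.0, size=1000)
model_preds = [y_true + rng.normal(0.0, 0.05, size=1000) for _ in range(3)]

single_rmses = [rmse(y_true, p) for p in model_preds]
blended_rmse = rmse(y_true, blend(model_preds, weights=[1, 1, 1]))
```

Because the simulated errors are independent, averaging cancels part of the noise and the blend scores a lower RMSE than any single vector. Real models have correlated errors, so the gain is smaller in practice and depends on how different the models actually are.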

From a DevOps perspective, I also experimented with containerizing the training pipeline using Docker to make it reproducible and easier to deploy in production environments, a critical skill for taking ML models from notebooks to real-world systems.