Playground Series S5E11

Predicting Loan Payback

A binary classification challenge predicting whether loan applicants will successfully repay their loans based on financial and demographic features.

Competition Rank: 172 / 3,724
Percentile: Top 4%
Evaluation Metric: ROC-AUC
Achievement: 🏆 Silver

Problem Overview

This competition focused on predicting loan repayment likelihood, a critical task in financial risk assessment. The challenge required analyzing applicant data including credit history, employment status, income levels, and loan characteristics to determine default risk. This type of prediction is essential for banks and lending institutions to make informed lending decisions.

Technical Approach

Feature Engineering: Created interaction features between income and loan amount, debt-to-income ratios, and credit utilization metrics
Missing Value Strategy: Imputed missing numerical features with KNN imputation and iterative (model-based) imputation
Model Architecture: Ensemble of gradient boosting models (XGBoost, LightGBM, CatBoost) with carefully tuned hyperparameters
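Cross-Validation: Used stratified K-fold validation so that every fold preserves the overall ratio of repaid to unpaid loans, giving more stable ROC-AUC estimates
Threshold Optimization: Tuned classification thresholds for converting predicted probabilities into repay/default decisions. ROC-AUC depends only on how the predictions rank applicants, not on any threshold, so this step affects hard-label decisions rather than the competition metric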
Model Stacking: Combined multiple base models using logistic regression as meta-learner

Key Insight

The most impactful features were credit history length, debt-to-income ratio, and payment history. However, creating derived features that captured the interaction between loan amount and applicant income significantly boosted model performance, highlighting the importance of domain knowledge in feature engineering.
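
As an illustration of the kind of derived features described above, here is a small pandas sketch. The column names (annual_income, loan_amount, total_debt, credit_used, credit_limit) and the helper add_ratio_features are hypothetical and chosen for this example; they are not taken from the competition dataset.

```python
import numpy as np
import pandas as pd


def add_ratio_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with ratio features added (hypothetical column names)."""
    out = df.copy()
    # Treat zero denominators as missing so the ratios become NaN instead of inf.
    income = out["annual_income"].replace(0, np.nan)
    limit = out["credit_limit"].replace(0, np.nan)
    out["loan_to_income"] = out["loan_amount"] / income
    out["debt_to_income"] = out["total_debt"] / income
    out["credit_utilization"] = out["credit_used"] / limit
    return out


# Three made-up applicants; the second has zero income and zero credit limit.
applicants = pd.DataFrame(
    {
        "annual_income": [60000.0, 0.0, 45000.0],
        "loan_amount": [15000.0, 5000.0, 9000.0],
        "total_debt": [12000.0, 3000.0, 22500.0],
        "credit_used": [2000.0, 500.0, 0.0],
        "credit_limit": [8000.0, 0.0, 5000.0],
    }
)

features = add_ratio_features(applicants)
print(features[["loan_to_income", "debt_to_income", "credit_utilization"]])
```

Ratios like these express the loan relative to the applicant's capacity to repay, which tree models would otherwise have to approximate with many splits on the raw columns.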

Technology Stack

Python, Pandas, NumPy, Scikit-learn, XGBoost, LightGBM, CatBoost, Optuna, SHAP, Matplotlib, Seaborn

Lessons Learned

This competition reinforced the importance of understanding the business context behind the data. In financial prediction tasks, interpretability is as important as accuracy: stakeholders need to understand why a model makes certain predictions. Using SHAP values to explain model decisions not only guided feature engineering but also showed which factors most influence the predicted probability of repayment.
