Rainfall Prediction
Top 0.2%. Feature engineering, K-Folds, and ensemble methods. Discovered that simpler algorithms (KNN) can outperform complex ensembles when properly optimized. Rank 5 / 2,529.
Rob Kras
Building data infrastructure with Databricks and Azure.
I'm a Data Engineer with a passion for building scalable data infrastructure and turning raw data into valuable insights. My expertise lies in Databricks, Azure, and the entire data ecosystem—from pipeline design to deployment.
With a background in Computer Science and a specialization in Data Science & AI, I combine academic rigor with practical experience. I've competed in Kaggle competitions and apply those same analytical skills to real-world business problems.
Currently, I work at a leading financial institution, focusing on data engineering, MLOps, and cloud infrastructure.
Rabobank
Building and maintaining scalable data pipelines and ML infrastructure in production environments. Specializing in MLOps practices, containerization, CI/CD automation, and cloud-based data engineering.
Leiden University
Investigated cross-modal sound symbolism in vision-language models. Combined computational linguistics, computer vision, and cognitive science. Grade: 8/10.
A selection of projects showcasing my expertise in data engineering, machine learning, and cloud infrastructure.
Feature engineering, K-Folds, and ensemble methods. Discovered that simpler algorithms (KNN) can outperform complex ensembles when properly optimized. Rank 5 / 2,529.
Regression techniques with domain knowledge and SHAP for feature importance. Achieved rank 37 / 3,935. Discovered and exploited data leakage for a near-perfect score.
Binary classification for financial risk assessment. Feature engineering combining credit metrics, debt-to-income ratios, and payment history. Ensemble of XGBoost, LightGBM, and CatBoost. Rank 172 / 3,724.
Predicting song tempo from audio features. Combined signal processing with machine learning, leveraging music theory for feature engineering. Rank 131 / 2,581.
Ensemble methods predicting accident risk. Created temporal and weather interaction features. Containerized training pipeline using Docker. Rank 313 / 4,082.
Customer response prediction in 7 days. YAML configuration management for rapid prototyping. Rank 576 / 3,367.