Hello, I'm Rob Kras

Data Engineer

Building scalable data infrastructure with Databricks and Azure. Specializing in data pipelines, MLOps, and turning data into actionable insights.

About Me

I'm a Data Engineer with a passion for building scalable data infrastructure and turning raw data into valuable insights. My expertise lies in Databricks and Azure and spans the full data lifecycle, from pipeline design to deployment.

With a background in Computer Science and a specialization in Data Science & AI, I combine academic rigor with practical experience. I compete on Kaggle and apply the same analytical skills to real-world business problems.

Currently, I work at a leading financial institution, focusing on data engineering, MLOps, and cloud infrastructure.

Education

MSc Computer Science, Leiden University (specialization: Data Science & AI)
BSc Computer Science, Vrije Universiteit Amsterdam (minor: Data Science)

Technical Skills

Data & Analytics

Databricks, Apache Spark, SQL, Python, Pandas

ML & AI

Machine Learning, MLflow, XGBoost, LightGBM, PyTorch, SHAP

Cloud & DevOps

Azure, Terraform, Docker, CI/CD, IaC

Languages

Python, SQL, Scala, Bash, C/C++

Experience

Current

Data Engineer

Financial Services

Building and maintaining scalable data pipelines and ML infrastructure in production environments. Specializing in MLOps practices, containerization, CI/CD automation, and cloud-based data engineering.

Databricks, Azure, Infrastructure as Code, CI/CD, Apache Spark

2024-2025

Research: Multimodal Sound Symbolism

Leiden University

Investigated cross-modal sound symbolism in vision-language models. Combined computational linguistics, computer vision, and cognitive science. Grade: 8/10.

Featured Projects

A selection of projects showcasing my expertise in data engineering, machine learning, and cloud infrastructure.

Rainfall Prediction

Top 0.2%

Feature engineering, K-fold cross-validation, and ensemble methods. Found that a simpler algorithm (KNN), once properly tuned, outperformed more complex ensembles. Rank 5 / 2,529.

KNN, Ensemble, K-Fold CV, Feature Engineering

House Prices Prediction

Top 1%

Regression techniques combined with domain knowledge, using SHAP for feature importance. Discovered and exploited a data leak to reach a near-perfect score. Rank 37 / 3,935.

Regression, SHAP, XGBoost, Data Leakage

Loan Payback Prediction

Top 4%

Binary classification for financial risk assessment. Feature engineering combining credit metrics, debt-to-income ratios, and payment history. Ensemble of XGBoost, LightGBM, and CatBoost. Rank 172 / 3,724.

Classification, Ensemble, SHAP, XGBoost, LightGBM

Music BPM Prediction

Top 5%

Predicting song tempo from audio features. Combined signal processing with machine learning, leveraging music theory for feature engineering. Rank 131 / 2,581.

Audio ML, Regression, Signal Processing, Gradient Boosting

Road Accident Risk

Top 8%

Ensemble models for predicting road accident risk. Created temporal and weather interaction features and containerized the training pipeline with Docker. Rank 313 / 4,082.

Ensemble, Optuna, Docker, Feature Engineering

Bank Marketing

Top 17%

Customer response prediction in 7 days. YAML configuration management for rapid prototyping. Rank 576 / 3,367.

Classification, Optuna, YAML Config