Building scalable data infrastructure and deploying machine learning systems at Rabobank

I specialize in MLOps, data pipelines, and turning experimental models into production-ready solutions. With a track record in competitive machine learning and hands-on experience with infrastructure as code, I work primarily across data science, data engineering, and DevOps.

Rob Kras

Education

MSc Computer Science 2024-2025

Leiden University

Specialization: Data Science & AI

Thesis: Cross-Modal Sound Symbolism in Vision-Language Models

BSc Computer Science 2020-2023

Vrije Universiteit Amsterdam

Minor: Data Science

Tech Stack

Languages: Python, SQL, Scala, Bash, C/C++
ML & Data: XGBoost, LightGBM, PyTorch, Apache Spark, Pandas
DevOps & Cloud: Git, CI/CD, Linux, AWS, GCP

Data & DevOps Engineer

Rabobank

Current

Building and maintaining scalable data pipelines and ML infrastructure in production environments. Specializing in MLOps practices, containerization, CI/CD automation, and cloud-based data engineering. Main duties include application maintenance, production deployment through infrastructure as code, creating data pipelines, automated testing, and monitoring systems.

Python, Apache Spark, SQL, CI/CD, Cloud

Research: Multimodal Sound Symbolism

Leiden University

2024-2025

Investigated cross-modal sound symbolism in vision-language models. The research combined computational linguistics, computer vision, and cognitive science to analyze how multimodal AI models process and represent sound-meaning associations across cultures. Achieved a grade of 8/10.

View on GitHub ↗

A collection of competitive machine learning projects showcasing technical evolution from experimentation to deployment.

Road Accident Risk Prediction

Top 8%

Ensemble methods predicting accident risk. Created temporal and weather interaction features. Containerized training pipeline using Docker for reproducibility. Rank 313 / 4,082.

Ensemble, Optuna, Docker

Bank Marketing Classification

Top 17%

Customer response prediction, built in 7 days. YAML configuration management for rapid prototyping. Rank 576 / 3,367.

Classification, Optuna

Podcast Listening Time

Top 16%

Time series regression with temporal features and advanced stacking. Rank 536 / 3,310.

Time Series, Stacking

Optimal Fertilizer Prediction

Top 28%

Multi-label classification with MAP@3 optimization and agricultural domain knowledge. Rank 732 / 2,650.

Multi-label, MAP@3
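For readers unfamiliar with the metric, MAP@3 (mean average precision at 3) can be sketched in a few lines. This is a minimal illustration and not the competition code: it assumes each row has exactly one true label and up to three ranked predictions, and the function names and fertilizer labels are hypothetical.

```python
from typing import Hashable, Sequence


def average_precision_at_k(actual: Hashable,
                           predicted: Sequence[Hashable],
                           k: int = 3) -> float:
    """Score one row: 1/rank of the first correct guess within the top k, else 0."""
    for rank, label in enumerate(predicted[:k], start=1):
        if label == actual:
            return 1.0 / rank
    return 0.0


def map_at_k(actuals: Sequence[Hashable],
             predictions: Sequence[Sequence[Hashable]],
             k: int = 3) -> float:
    """Mean of the per-row scores over the whole dataset."""
    if len(actuals) != len(predictions):
        raise ValueError("actuals and predictions must have the same length")
    if not actuals:
        return 0.0
    total = sum(average_precision_at_k(a, p, k)
                for a, p in zip(actuals, predictions))
    return total / len(actuals)


# Hypothetical labels, for illustration only.
actuals = ["urea", "dap", "npk"]
predictions = [
    ["urea", "dap", "npk"],   # correct at rank 1 -> 1.0
    ["urea", "dap", "npk"],   # correct at rank 2 -> 0.5
    ["urea", "dap", "mop"],   # not in the top 3 -> 0.0
]
score = map_at_k(actuals, predictions)  # (1.0 + 0.5 + 0.0) / 3 = 0.5
```

Because a correct label at rank 2 or 3 still earns partial credit, optimizing for this metric rewards well-ordered class probabilities rather than only the single top prediction.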

Titanic Survival Prediction

First Kaggle experience. Feature engineering fundamentals and XGBoost. Rank 2,331 / 15,346.

XGBoost, Feature Engineering

Spaceship Titanic

Lessons on overfitting and feature correlation. Rank 613 / 1,816.

Classification, Feature Selection

Credit Card Fraud Detection

Imbalanced data with SMOTE. Learned to apply oversampling inside each K-Fold training split only, so validation folds stay free of synthetic samples.

Imbalanced Data, SMOTE
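The lesson about resampling within K-Folds can be sketched with the standard library alone. This is an assumption-laden illustration, not the project code: plain random duplication stands in for SMOTE (a real pipeline would use a SMOTE implementation at the same step), and all function names are hypothetical. The point it shows is the ordering: split first, then oversample the training part of each fold only.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Shuffle row indices and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def oversample_minority(X, y, seed=0):
    """Stand-in for SMOTE: duplicate random rows of smaller classes until balanced."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        pool = [row for row, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def resampled_folds(X, y, k=5, seed=0):
    """Yield (X_train, y_train, X_val, y_val) with oversampling on train only."""
    folds = k_fold_indices(len(y), k, seed)
    for i, val_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        # Resample AFTER the split, so no copy of a validation row leaks into training.
        X_tr, y_tr = oversample_minority([X[j] for j in train_idx],
                                         [y[j] for j in train_idx], seed)
        yield X_tr, y_tr, [X[j] for j in val_idx], [y[j] for j in val_idx]


# Hypothetical toy data: 4 positives among 20 rows.
X = [[i] for i in range(20)]
y = [1 if i < 4 else 0 for i in range(20)]
splits = list(resampled_folds(X, y, k=5))
```

Oversampling before splitting would place near-copies of validation rows in the training set and inflate the cross-validation score, which is the pitfall this ordering avoids.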

Personality Type Prediction

Limited data with advanced oversampling and Bayesian optimization. Rank 1,379 / 4,067.

SMOTE, Bayesian Opt

Open Source

ML Utilities Library

Reusable machine learning components and helper functions. Collection of battle-tested utilities for data science workflows.

View on GitHub ↗