rubsj - Overview

Ruby Jha

Engineering Manager · Applied AI · Cloud

I've spent 20+ years leading engineering teams at State Street, Centene, and EY, with teams of up to 12 engineers across the US, India, and Poland. The products I've built serve 40+ enterprise customers, have driven $250K/mo in cost savings, and operate under real regulatory scrutiny, where a bad deployment means financial loss.

Now I'm bringing that same discipline to AI. I'm building a series of production-grade AI systems that cover RAG pipelines, embedding fine-tuning, and multi-agent orchestration. Every project has evaluation frameworks, architecture decision records, and metrics I'd actually trust in a code review. The goal is to lead AI engineering teams with the same rigor I bring to building the systems myself.

🌐 rubyjha.dev · 💼 LinkedIn

🤖 AI/ML Portfolio

These aren't API wrappers. Each project solves a real engineering problem with measurable outcomes, reproducible from committed code.

👉 Full Portfolio Overview →

✅ Completed

#	Project	What I Proved	Key Result	Stack
P1	Synthetic Data Pipeline	Self-correcting generation with 5-layer validation	36 failures → 0 · 81.7% inter-rater agreement	Python · Pydantic · OpenAI · Instructor
P2	RAG Evaluation Framework	16-config grid search. Reranking was the single biggest lift	Recall@5 0.625 → 0.747 (+19.5%) · 557 tests	Python · FAISS · LangChain · RAGAS · Cohere
P3	Contrastive Embedding Fine-Tuning	LoRA hit 96.2% of full fine-tune performance with 0.32% of the parameters	Spearman -0.22 → +0.85 · AUC-ROC 0.993	Python · Sentence-Transformers · PEFT/LoRA
P4	AI Resume Coach	Template choice is statistically significant for scoring	Chi² = 32.74 (p<0.001) · 532 tests · 99% coverage	Python · OpenAI · ChromaDB · FastAPI
P5	ShopTalk Knowledge Agent	First-principles RAG (no LangChain). Heading-aware chunking dominated 46 configs	NDCG@5 0.896 · Judge 4.77/5.0 · 627 tests	Python · FAISS · LiteLLM · Cohere · Ollama

🔨 In Progress

#	Project	What It Does	Stack
P6	Digital Writing Clone	Multi-agent writing style clone with CrewAI	Python · CrewAI · OpenAI · Sentence-Transformers

🗓️ Up Next: P7–P9 covering feedback intelligence, Jira automation, and DevOps root-cause analysis. See the full roadmap.

📝 Latest Blog Posts

How I Calibrated an LLM Judge That Approved Everything – my first LLM judge had a 0% failure rate, which meant it was useless.
Building 9 AI Projects (While Working Full-Time) – the portfolio, the progression, and what I've learned so far.

👉 More on rubyjha.dev/blog →

🛠️ Skills

Leadership: People Management · Hiring & Team Building · Performance & Promotions · Executive Communication · Technical Strategy

Technical: Python · Java · TypeScript · OpenAI API · LangChain · CrewAI · FastAPI · ChromaDB · Azure · Docker · Kubernetes · React · Spring Boot

I build AI systems and the teams that ship them.
rubyjha.dev · LinkedIn · AI Portfolio