Ruby Jha
Engineering Manager · Applied AI · Cloud
I've spent 20+ years leading engineering teams at State Street, Centene, and EY, with teams of up to 12 engineers across the US, India, and Poland. The products I've built serve 40+ enterprise customers, have driven $250K/mo in cost savings, and operate under real regulatory scrutiny, where a bad deployment means financial loss.
Now I'm bringing that same discipline to AI. I'm building a series of production-grade AI systems that cover RAG pipelines, embedding fine-tuning, and multi-agent orchestration. Every project has evaluation frameworks, architecture decision records, and metrics I'd actually trust in a code review. The goal is to lead AI engineering teams with the same rigor I bring to building the systems myself.
🌐 rubyjha.dev · 💼 LinkedIn
🤖 AI/ML Portfolio
These aren't API wrappers. Each project solves a real engineering problem with measurable outcomes, reproducible from committed code.
✅ Completed
| # | Project | What I Proved | Key Result | Stack |
|---|---|---|---|---|
| P1 | Synthetic Data Pipeline | Self-correcting generation with 5-layer validation | 36 failures → 0 · 81.7% inter-rater agreement | Python · Pydantic · OpenAI · Instructor |
| P2 | RAG Evaluation Framework | 16-config grid search. Reranking was the single biggest lift | Recall@5 0.625 → 0.747 (+19.5%) · 557 tests | Python · FAISS · LangChain · RAGAS · Cohere |
| P3 | Contrastive Embedding Fine-Tuning | LoRA reached 96.2% of full fine-tune performance with 0.32% of the parameters | Spearman -0.22 → +0.85 · AUC-ROC 0.993 | Python · Sentence-Transformers · PEFT/LoRA |
| P4 | AI Resume Coach | Template choice is statistically significant for scoring | Chi² = 32.74 (p<0.001) · 532 tests · 99% coverage | Python · OpenAI · ChromaDB · FastAPI |
| P5 | ShopTalk Knowledge Agent | First-principles RAG (no LangChain). Heading-aware chunking dominated 46 configs | NDCG@5 0.896 · Judge 4.77/5.0 · 627 tests | Python · FAISS · LiteLLM · Cohere · Ollama |
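For readers less familiar with the retrieval metrics in the table, here is a minimal sketch of how Recall@k and NDCG@k are commonly defined, assuming binary relevance labels. The function names and the toy document IDs are illustrative assumptions, not code from the project repositories; only the two Recall@5 figures at the end come from the P2 row.

```python
import math

def recall_at_k(retrieved, relevant, k=5):
    """Fraction of the relevant items that appear in the top-k retrieved list."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)

def ndcg_at_k(retrieved, relevant, k=5):
    """NDCG@k with binary relevance: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 2)  # rank is 0-based, so position 1 gets log2(2)
        for rank, doc_id in enumerate(retrieved[:k])
        if doc_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical toy query: 3 relevant documents, 2 of them retrieved in the top 5.
retrieved = ["d3", "d7", "d1", "d9", "d4"]
relevant = {"d1", "d3", "d8"}

recall = recall_at_k(retrieved, relevant)
ndcg = ndcg_at_k(retrieved, relevant)

# Relative lift as reported in the P2 row: (after - before) / before.
relative_lift = (0.747 - 0.625) / 0.625

print(f"Recall@5 = {recall:.3f}, NDCG@5 = {ndcg:.3f}, lift = {relative_lift:.1%}")
```

The actual projects may use graded relevance or library implementations (for example from RAGAS); this sketch only shows the shape of the calculation behind the reported numbers.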
🔨 In Progress
| # | Project | What It Does | Stack |
|---|---|---|---|
| P6 | Digital Writing Clone | Multi-agent writing style clone with CrewAI | Python · CrewAI · OpenAI · Sentence-Transformers |
🗓️ Up Next: P7–P9 covering feedback intelligence, Jira automation, and DevOps root-cause analysis. See the full roadmap.
📝 Latest Blog Posts
- How I Calibrated an LLM Judge That Approved Everything – my first LLM judge had a 0% failure rate, which meant it was useless.
- Building 9 AI Projects (While Working Full-Time) – the portfolio, the progression, and what I've learned so far.
🛠️ Skills
Leadership: People Management · Hiring & Team Building · Performance & Promotions · Executive Communication · Technical Strategy
Technical: Python · Java · TypeScript · OpenAI API · LangChain · CrewAI · FastAPI · ChromaDB · Azure · Docker · Kubernetes · React · Spring Boot
I build AI systems and the teams that ship them.
rubyjha.dev · LinkedIn · AI Portfolio