Abhishek Gupta (AbhiOnGithub)

Multi-Cloud Distributed Systems, Enterprise Inference Service.


Hey there 👋 I'm Abhishek

Obsessed with making models go brrr — from training to real-time inference at scale

⚡ About Me

  • 🔥 I live and breathe AI Inference — optimizing models to run faster, cheaper, and at massive scale
  • 🧠 Deep in the NVIDIA inference stack: TensorRT, Triton Inference Server, CUDA, TensorRT-LLM, and NIM
  • 🚀 Passionate about squeezing every last TFLOP out of GPUs — from A100s to H100s to Blackwell
  • 🏗️ Building and scaling inference pipelines that serve millions of requests with minimal latency
  • 🌐 Background in cloud-native architecture across AWS, Azure, and GCP — now laser-focused on GPU-accelerated inference infrastructure
  • 🤝 Open to collaborating on open-source inference tooling, model optimization, and high-performance serving systems

🛠️ Inference & AI Stack:

TensorRT, Triton, CUDA, TensorRT-LLM, NIM, Python, C++, Go, Rust, PyTorch, vLLM, Docker, Kubernetes

☁️ Cloud & Infra:

AWS, Azure, GCP, Docker, Kubernetes, Git


  • 💬 Ask me about GPU-accelerated inference, model optimization, batching strategies (see the toy batching sketch after this list), and scaling LLM serving
  • 👯 Looking to collaborate on inference engines, model compilers, and open-source AI infrastructure
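
A toy sketch of the batching idea referenced above, assuming nothing about any real serving stack (the batch size, queue-delay budget, and fake_model are all made up): requests accumulate on a queue and are flushed as a single batch once the batch is full or a small wait deadline expires.

```python
# Toy dynamic batching loop: collect requests until the batch is full or the
# oldest request has waited MAX_WAIT_S, then run one batched "model" call.
# Everything here is illustrative; real servers (Triton, vLLM) do far more.
import asyncio
import time

MAX_BATCH = 8
MAX_WAIT_S = 0.005  # 5 ms queue-delay budget

async def fake_model(batch):
    # Stand-in for one batched GPU forward pass.
    await asyncio.sleep(0.01)
    return [f"output for {req}" for req in batch]

async def batcher(queue: asyncio.Queue):
    while True:
        req, fut = await queue.get()           # block until the first request
        batch, futures = [req], [fut]
        deadline = time.monotonic() + MAX_WAIT_S
        while len(batch) < MAX_BATCH:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                req, fut = await asyncio.wait_for(queue.get(), remaining)
            except asyncio.TimeoutError:
                break
            batch.append(req)
            futures.append(fut)
        results = await fake_model(batch)      # one call for the whole batch
        for fut, res in zip(futures, results):
            fut.set_result(res)

async def infer(queue, request):
    fut = asyncio.get_running_loop().create_future()
    await queue.put((request, fut))
    return await fut

async def main():
    queue = asyncio.Queue()
    worker = asyncio.create_task(batcher(queue))
    outs = await asyncio.gather(*(infer(queue, f"req-{i}") for i in range(20)))
    print(outs[:3])
    worker.cancel()

asyncio.run(main())
```
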
⚡ What I'm focused on in 2025–2026
- Optimizing LLM inference — KV-cache management, speculative decoding, continuous batching
- TensorRT-LLM and TensorRT for maximum throughput on NVIDIA GPUs
- Triton Inference Server — model ensembles, dynamic batching, multi-GPU serving
- NVIDIA NIM microservices for production-grade AI deployment
- CUDA kernel optimization and custom inference operators
- vLLM, SGLang, and other open-source LLM serving frameworks (see the vLLM sketch after this list)
- Multi-node inference on H100 / Blackwell clusters with NVLink & NVSwitch
- Quantization (FP8, INT4, AWQ, GPTQ) for efficient model deployment
- Go, Rust, and C++ for high-performance inference infrastructure
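
To make the batching bullets above concrete, a minimal vLLM offline-generation sketch (the model id, prompts, and sampling values are placeholders): the engine handles continuous batching, scheduling, and paged KV-cache management internally.

```python
# Minimal vLLM offline-batching sketch; the model id and sampling values are
# placeholders, not recommendations.
from vllm import LLM, SamplingParams

prompts = [
    "Explain speculative decoding in one sentence.",
    "Why does continuous batching improve GPU utilization?",
]
sampling = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=128)

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # placeholder model id
outputs = llm.generate(prompts, sampling)

for out in outputs:
    print(out.prompt)
    print(out.outputs[0].text)
```
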
🧠 Technologies I know
- Inference: TensorRT, TensorRT-LLM, Triton Inference Server, NVIDIA NIM, vLLM, ONNX Runtime (see the Triton client sketch after this list)
- GPU/Compute: CUDA, cuDNN, NCCL, NVLink, Multi-Instance GPU (MIG)
- ML Frameworks: PyTorch, JAX, ONNX
- Cloud: AWS (SageMaker, EKS, EC2 P/G instances), Azure (AKS, NC/ND VMs), GCP (GKE, A3/A2 VMs)
- Containers & Orchestration: Docker, Kubernetes, Helm, NVIDIA GPU Operator
- Languages: Python, C++, Go, Rust, C#, Java
- IaC: Terraform, Pulumi, AWS CloudFormation, Azure ARM
- Monitoring: Prometheus, Grafana, Splunk, Elastic Stack
- Streaming: Apache Kafka, Apache Flink, Spark Streaming
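
For the Triton side, a minimal HTTP client sketch; the model name and tensor names ("my_model", "INPUT0", "OUTPUT0") are hypothetical and have to match whatever model configuration the server actually loads.

```python
# Minimal Triton Inference Server HTTP client sketch (hypothetical model and
# tensor names); assumes a server is listening on localhost:8000.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

data = np.random.rand(1, 16).astype(np.float32)
inputs = [httpclient.InferInput("INPUT0", list(data.shape), "FP32")]
inputs[0].set_data_from_numpy(data)
outputs = [httpclient.InferRequestedOutput("OUTPUT0")]

result = client.infer(model_name="my_model", inputs=inputs, outputs=outputs)
print(result.as_numpy("OUTPUT0"))
```
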
📚 Previously
- Cloud-native architecture and distributed systems across AWS & Azure
- Serverless and modular monolithic architectures
- Full-stack development with C#/.NET, Java/Spring Boot, React
- Go microservices (GORM, Fiber, Chi, Mux)
- Dapr (Distributed Application Runtime)
- Cross-platform development with Xamarin/MAUI

Pinned repositories

  1. Code used in ML.NET blog posts (C#, 2 stars, 3 forks)
  2. A high-throughput and memory-efficient inference and serving engine for LLMs (Python, 75.6k stars, 15.3k forks)
  3. A datacenter-scale distributed inference serving framework (Rust, 6.5k stars, 995 forks)