Deep Dives

In-depth technical essays on LLM systems, AI infrastructure, agentic AI, and production ML. Each article builds your understanding from first principles to practical implementation.

Read on Substack

Coming Soon

Subscribe to be notified when new articles are published.

15 min read · Coming Soon

Understanding LLM Inference at Scale

A comprehensive guide to prefill, decode, KV cache, batching strategies, and GPU memory optimization for production LLM serving.

LLM · Inference · Infrastructure

20 min read · Coming Soon

Building Agentic AI Systems

Architecture patterns for autonomous AI agents, from planning and tool use to memory systems and multi-agent coordination.

Agentic AI · Architecture · LLM

18 min read · Coming Soon

RAG Systems: Beyond the Basics

Advanced retrieval-augmented generation patterns, including hybrid search, reranking, chunking strategies, and evaluation frameworks.

RAG · Search · LLM

22 min read · Coming Soon

ML Infrastructure for Production

Building reliable ML pipelines: feature stores, model serving, monitoring, and the infrastructure that powers modern AI applications.

MLOps · Infrastructure · Production