-
Deploying self-hosted LLMs in Prod

A practical walkthrough of deploying a self-hosted LLM on AWS EKS: dedicated GPU node groups, taints and tolerations for workload isolation, separate inference and application services, model-weight caching via volumes, and sizing guidance for VRAM, concurrency, and warm GPU capacity.
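The taint-and-toleration isolation that summary mentions can be sketched as a minimal pod spec. This is an illustrative assumption, not taken from the post: the `node-group: gpu` label, the pod name, and the `vllm/vllm-openai` image are placeholders, and the `nvidia.com/gpu` taint key is simply a common convention for GPU node pools.

```yaml
# Sketch: a pod that tolerates a GPU node group's taint, so only
# inference workloads (not app pods) land on the expensive nodes.
# All names here are illustrative assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: llm-inference
spec:
  tolerations:
    - key: "nvidia.com/gpu"        # matches the taint applied to the GPU node group
      operator: "Exists"
      effect: "NoSchedule"
  nodeSelector:
    node-group: gpu                # assumed label on the dedicated GPU nodes
  containers:
    - name: inference
      image: vllm/vllm-openai:latest   # example inference server image
      resources:
        limits:
          nvidia.com/gpu: 1        # request one GPU via the device plugin
```

App pods without the toleration are repelled by the `NoSchedule` taint, which is what keeps the GPU nodes reserved for inference.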
-
Mobile App Agentic Patterns

I’ve been building mobile apps that leverage LLMs for almost four years, and the hardest lessons didn’t come from prompts—they came from the phone itself. Mobile copilots feel inevitable: we already live in our messaging apps, and text is the most natural UI we have. But building copilots on a device that’s both highly privileged…
-
Reference Architecture: An Agentic CLI Application

While writing the previous post in this series, I hit a very unglamorous problem: my laptop disk was full. Not “almost full”. Full enough that everything started to feel brittle. I did what I always do: open a couple of folders, run a few du commands, check caches, look for the usual suspects, delete stuff,…
