This is the first post on this blog, so it’s mostly a note to myself about why it exists.
I spend my days on Kubernetes-native AI/LLM infrastructure — the platform layer between “a model that works in a notebook” and “a service that stays up under real traffic.” A lot of what matters there isn’t written down anywhere obvious: it lives in incident retros, Slack threads, and the scar tissue of having run the thing in production.
This blog is where I’ll write that down.
What to expect
- Long-form, hands-on posts. I’d rather go deep on one problem than survey ten.
- The operational side. Scheduling and autoscaling GPU workloads, serving and routing inference, observability, cost — the parts that decide whether a system holds up.
- Honest write-ups. Including the approaches that didn’t work and why.
Why a blog at all
Writing forces me to actually understand a thing well enough to explain it. If a post happens to save someone else a few hours, that’s a bonus.
If something here was useful — or wrong — I’d like to know. My links are in the header.
More soon.