● Open to opportunities
Focused on Kubernetes, scalable systems, and building reliable infrastructure. Currently working toward CNCF's Kubestronaut certification.
Problem
Under load, applications became unresponsive; no HPA was configured initially.
Even after scaling out, traffic didn't distribute as expected, and new pods (~1.5-2 min startup on Fargate) came online too late to absorb the load.
What I did
Introduced HPA and redesigned load testing to cover scaling behavior.
Identified that CPU/memory signals lag behind real traffic patterns.
Outcome
HPA improved stability markedly (~80%), but sudden spikes still broke the system.
Realization: pre-scaling and event-driven scaling were required.
Learning
Autoscaling is reactive, not instantaneous.
For bursty or async workloads, CPU/memory are weak signals of demand.
Scaling decisions must align with how the workload behaves, not just what metrics are available.
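The reactive-scaling gap described above can be illustrated with a minimal HPA manifest; the names, replica counts, and CPU target here are hypothetical, not the actual production values:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa                # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api                  # hypothetical Deployment
  minReplicas: 3               # warm capacity matters when pods take ~1.5-2 min to start on Fargate
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60   # a lagging signal: utilization rises after traffic has already spiked
```

Even a well-tuned target like this only reacts after the metric crosses its threshold, which is why pre-scaling (a higher minReplicas) ends up being part of the fix.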
Problem
Dozens of microservices each had their own CI pipeline. Any change meant updating every repo, and missed updates were discovered late.
What I did
Initially tried scripting updates across repos (didn't scale).
Moved to shared pipeline libraries and reduced flows into ~3–4 standard patterns.
Outcome
Pipeline changes became centralized. New services only needed a minimal import - no pipeline setup/testing overhead.
Learning
Automating repeated tasks is not the same as eliminating duplication.
Where duplication exists, abstract first, then automate - not the other way around.
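The "minimal import" pattern can be sketched as a shared-template include; this assumes a GitLab-style CI setup, and the project path, ref, and template names are hypothetical:

```yaml
# .gitlab-ci.yml in a service repo: the whole pipeline is one import
include:
  - project: platform/pipeline-library   # hypothetical shared repo
    ref: v2                              # pinned so library changes roll out deliberately
    file: templates/service-default.yml  # one of the ~3-4 standard patterns

variables:
  SERVICE_NAME: payments-api             # hypothetical service-specific override
```

A pipeline change now happens once in the library; services pick it up by bumping `ref`, or automatically if they track a branch.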
Problem
Ephemeral CI runners (Docker-in-Docker) needed frequent updates under strict security constraints.
Switching to Alpine reduced vulnerabilities but introduced compatibility issues (musl libc, unofficial Node.js support).
What I did
Built an internal CLI that generates and updates runner images with the required dependencies (Node, Helm, etc.) selected via arguments.
Outcome
Removed manual image maintenance and made updates repeatable under security constraints.
Learning
When constraints stack (security, compatibility, tooling), pipelines become brittle.
Encapsulating complexity in a dedicated tool is often more stable than pushing it into CI logic.
Problem
AI workloads (file uploads, async processing) didn't scale well with HPA.
CPU/memory signals didn't reflect actual pressure.
What I did
Analyzed scaling gaps during load tests and explored alternatives beyond HPA/VPA.
Outcome
Identified that event-driven scaling (e.g., KEDA) fits the workload better, even though not implemented at the time.
Learning
Choosing the wrong scaling signal is worse than no scaling.
The real problem wasn't tuning HPA - it was assuming the problem fit HPA at all.
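Event-driven scaling of the kind described can be sketched with a KEDA ScaledObject that scales on queue depth instead of CPU/memory; the queue URL, names, and threshold here are hypothetical, since this wasn't implemented at the time:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: upload-worker-scaler       # hypothetical name
spec:
  scaleTargetRef:
    name: upload-worker            # hypothetical Deployment processing async uploads
  minReplicaCount: 1
  maxReplicaCount: 20
  triggers:
    - type: aws-sqs-queue          # scale on backlog, a direct signal of demand
      metadata:
        queueURL: https://sqs.eu-west-1.amazonaws.com/000000000000/uploads
        queueLength: "5"           # target messages per replica
        awsRegion: eu-west-1
```

Queue length tracks demand directly, so replicas are added when work arrives rather than after CPU utilization catches up.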
Problem
Manual onboarding across tools took 2–3 days per request.
What I did
Automated onboarding using tool APIs into a self-service flow.
Outcome
Reduced setup time to under 5 minutes.
Learning
Most developer friction isn't technical - it's process latency.
Understanding why containers change how we think about processes and systems.
Read →
Using dig to break down DNS resolution step by step and understand what's actually happening.
Read →
Building a mental model of Git instead of memorizing commands.
Read →
Open to discussions around DevOps, Kubernetes, and platform engineering.