Building Workload-Aware Infrastructure at Snaptrude
Two different scaling problems
Snaptrude runs across multiple services — a React frontend, a Node.js API layer, and a Django backend. As the platform grew, two distinct scaling problems emerged that needed completely different solutions.
The first was traffic-driven. More concurrent users meant more requests, and the system needed to handle spikes without degrading. The second was workload-driven. Certain enterprise customers consistently triggered compute-heavy operations that a standard instance couldn't handle cleanly. The mistake would have been solving both with the same tool.
Horizontal scaling and the operational layer around it
Kubernetes HPA handles the mechanics of horizontal scaling — define thresholds, let the cluster add and remove instances. That part is table stakes. The real engineering work is everything around it that makes scaling safe in production.
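As a rough illustration of what "define thresholds" means in practice, the Kubernetes documentation describes the autoscaler's core calculation as a ratio between the observed metric and the target. The sketch below reproduces that ratio in plain Python. The function name, the replica bounds, and the sample numbers are illustrative assumptions, not Snaptrude's actual settings, and the real controller also applies stabilization windows and pod-readiness rules that are omitted here.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Replica count from the documented HPA ratio: ceil(current * observed / target)."""
    ratio = current_metric / target_metric
    # Close enough to the target: leave the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    wanted = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds (illustrative defaults).
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    # 4 instances averaging 90% CPU against a 60% target.
    print(desired_replicas(4, current_metric=90, target_metric=60))
```

The tolerance band is what keeps the cluster from adding and removing instances on every small fluctuation around the target.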
The first problem is scale-down killing in-flight requests. When load drops and Kubernetes decides to terminate an instance, any requests currently being processed on that instance die mid-flight. I implemented graceful shutdown handling on each service: on receiving a termination signal (SIGTERM), the service stops accepting new requests but waits for active requests to complete before exiting. Kubernetes gives a configurable grace period (terminationGracePeriodSeconds, 30 seconds by default) before force-killing the process. Getting this window right meant no user-visible errors during scale-down events.
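The drain logic can be sketched independently of any web framework. The class below is a minimal, hypothetical version: the name GracefulShutdown and its methods are invented for illustration and are not the services' actual code. A real service would call try_start_request and finish_request from its request-handling path and pick a drain timeout shorter than the pod's grace period.

```python
import signal
import threading


class GracefulShutdown:
    """Tracks in-flight requests and drains them after a termination signal."""

    def __init__(self):
        self._cond = threading.Condition()
        self._active = 0
        self._stopping = False

    def install(self):
        # Kubernetes sends SIGTERM first and SIGKILL once the grace period ends.
        # Must be called from the main thread.
        signal.signal(signal.SIGTERM, lambda signum, frame: self.begin_shutdown())

    def begin_shutdown(self):
        # A plain attribute write, so it is safe to run inside a signal handler.
        self._stopping = True

    def try_start_request(self):
        """Return False once shutdown has begun so the caller can refuse the request."""
        with self._cond:
            if self._stopping:
                return False
            self._active += 1
            return True

    def finish_request(self):
        with self._cond:
            self._active -= 1
            self._cond.notify_all()

    def wait_for_drain(self, timeout_seconds):
        """Block until no requests are active; return False if the timeout expires first."""
        with self._cond:
            return self._cond.wait_for(lambda: self._active == 0,
                                       timeout=timeout_seconds)
```

If wait_for_drain returns False, the process is about to be force-killed anyway, so the timeout is best treated as a signal to log the abandoned requests rather than as an error to retry.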
The second problem is new instances taking traffic before they're ready. A freshly started service needs time to warm up — load configuration, establish database connections, prime any in-memory state. Without readiness probes, Kubernetes routes traffic to new instances the moment the container starts, which produces errors during that warm-up window. I added readiness endpoints on each service that return healthy only after the service has completed its initialization sequence. Kubernetes checks this before routing any traffic.
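A readiness endpoint of this kind reduces to a small piece of state: a set of initialization steps that must all finish before the probe reports success. The sketch below assumes three hypothetical step names ("config", "database", "cache") and a hypothetical ReadinessState class. Wiring probe() into an actual HTTP route is framework-specific and left out. Kubernetes HTTP probes treat any status from 200 to 399 as success, so 503 keeps the instance out of rotation.

```python
class ReadinessState:
    """Reports ready only after every required initialization step has completed."""

    def __init__(self, required_steps):
        self._pending = set(required_steps)

    def mark_done(self, step):
        # Called by the startup code as each step finishes.
        self._pending.discard(step)

    def probe(self):
        """Return (HTTP status, body) for the readiness endpoint."""
        if self._pending:
            return 503, {"ready": False, "pending": sorted(self._pending)}
        return 200, {"ready": True, "pending": []}


if __name__ == "__main__":
    state = ReadinessState(["config", "database", "cache"])
    state.mark_done("config")
    print(state.probe())  # still 503: database and cache are pending
    state.mark_done("database")
    state.mark_done("cache")
    print(state.probe())  # 200 once everything is initialized
```

Returning the list of pending steps in the body is optional, but it makes a stuck startup much easier to diagnose from the probe response alone.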
The third problem is knowing when something is wrong. I instrumented each service with Prometheus metrics: request rate, latency percentiles, error rates, and active connection counts. Alerting rules fire when error rates climb past a threshold or when p95 latency spikes, independent of whether the auto-scaler has responded yet. This meant operational issues were surfaced through alerting rather than discovered through user complaints. The logs around scale events (when instances were added, when they were removed, which thresholds triggered the decision) made it straightforward to audit the system's behavior and tune thresholds over time.
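The logic behind those two alert conditions is simple enough to show in plain Python. This is not how Prometheus evaluates rules (in practice the percentile would come from histogram buckets via a PromQL expression such as histogram_quantile); it is only a sketch of the decision being made. The threshold values, the function names, and the alert labels are all assumptions chosen for illustration.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list; pct is an integer from 1 to 100."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def evaluate_alerts(latencies_ms, error_count, request_count,
                    p95_limit_ms=800, error_rate_limit=0.02):
    """Return the names of the alert conditions that currently hold."""
    alerts = []
    error_rate = error_count / request_count if request_count else 0.0
    if error_rate > error_rate_limit:
        alerts.append("high_error_rate")
    if latencies_ms and percentile(latencies_ms, 95) > p95_limit_ms:
        alerts.append("high_p95_latency")
    return alerts


if __name__ == "__main__":
    # 10% of requests are slow and 5% fail: both conditions hold.
    window = [100] * 90 + [2000] * 10
    print(evaluate_alerts(window, error_count=5, request_count=100))
```

Note that neither condition looks at replica counts or CPU, which is what keeps the alerts independent of whatever the autoscaler is doing at that moment.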
Proactive vertical scaling for enterprise customers
Enterprise customers at Snaptrude have a consistent usage pattern. They import large architectural projects from external tools — Revit, SketchUp — into the platform and use Snaptrude as a collaborative workspace on top of those projects. These imports are heavy. A large Revit model can contain tens of thousands of geometry objects, complex material definitions, and object types that don't map directly to Snaptrude's internal representation and have to be processed and converted manually.
The naive approach is reactive scaling — wait for CPU and memory to spike, then scale the instance up. The problem is that reactive scaling responds after the user is already experiencing degradation. By the time the new resource allocation kicks in, the import is already struggling.
The better approach came from recognizing that enterprise import load is predictable by customer identity, not by runtime metrics. We know who these customers are. We know what they do when they open the application. So we moved the scaling trigger earlier — to session initialization.
I built a middleware layer that intercepts each client session at startup and checks the account tier. When an enterprise account is detected, the middleware immediately triggers a resource scale-up for that customer's workload before any import has been initiated. The system doesn't wait for the heavy operation to start. It allocates the resources as soon as the customer opens the application, on the assumption — which holds consistently — that a heavy import is coming.
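In outline, the middleware is a tier check followed by a call into whatever system allocates resources. The sketch below is a simplified stand-in, not the production code: Session, ScaleOnSessionStart, the "enterprise" tier label, and the request_scale_up hook are all hypothetical names, and the per-account deduplication is an added assumption to avoid sending repeated requests when one customer opens several sessions.

```python
from dataclasses import dataclass, field
from typing import Callable, Set

ENTERPRISE_TIER = "enterprise"  # hypothetical tier label


@dataclass
class Session:
    account_id: str
    tier: str


@dataclass
class ScaleOnSessionStart:
    # Hypothetical hook into the system that actually allocates resources.
    request_scale_up: Callable[[str], None]
    scaled_accounts: Set[str] = field(default_factory=set)

    def handle(self, session: Session) -> bool:
        """Return True if this session triggered a scale-up request."""
        if session.tier != ENTERPRISE_TIER:
            return False
        if session.account_id in self.scaled_accounts:
            return False  # already scaled for this account (assumed behavior)
        self.request_scale_up(session.account_id)
        self.scaled_accounts.add(session.account_id)
        return True


if __name__ == "__main__":
    requested = []
    middleware = ScaleOnSessionStart(request_scale_up=requested.append)
    middleware.handle(Session("acme-architects", "enterprise"))
    middleware.handle(Session("solo-user", "free"))
    print(requested)
```

The key property is that nothing in handle() reads CPU or memory: the decision depends only on who the customer is, which is what lets it run before the import starts.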
This turned out to be a meaningful UX improvement. Imports that previously hit resource ceilings and slowed to a crawl mid-process now run with adequate resources from the start. The failure mode we were seeing — imports that partially completed before running out of memory — went away.
Docker orchestration across the stack
With React, Node.js, and Django running as separate services, local development was messy. Getting a new engineer set up meant installing multiple runtimes, configuring environment variables correctly for each service, getting the network topology right between them, and inevitably debugging something that worked on one machine but not another.
I containerized the full stack and wrote a compose configuration that wires all services together — correct networking, dependency ordering, shared volumes, environment injection. A new engineer runs one command and has the entire platform running locally in minutes.
The same container definitions are the production artifacts. Dev, staging, and production environments run identical images with environment-specific configuration layered on top. This removed a whole category of deployment bugs where something worked locally but behaved differently in production due to environment differences. What you run locally is structurally the same thing that runs in production.
Maintaining separate environment configurations — with their own resource limits, replica counts, and service dependencies — made promoting builds from staging to production a configuration swap rather than a manual process with room for error.
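The layering idea can be shown as a recursive merge of a shared base with per-environment overrides. Everything in this sketch is illustrative: the image tag, resource values, and replica counts are invented, and the real configuration lives in deployment manifests rather than Python dictionaries.

```python
import copy


def layer_config(base, overrides):
    """Return a new config with overrides applied on top of base, merging nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = layer_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical values: the image is shared, only the environment layer differs.
BASE = {"image": "api:1.4.2", "replicas": 1,
        "resources": {"cpu": "250m", "memory": "256Mi"}}
STAGING = {"replicas": 2}
PRODUCTION = {"replicas": 6, "resources": {"memory": "1Gi"}}

if __name__ == "__main__":
    print(layer_config(BASE, STAGING))
    print(layer_config(BASE, PRODUCTION))
```

Because the image field comes only from the base, promoting a build means choosing a different override layer while the artifact itself stays the same.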
Takeaways
The most useful framing I found for infrastructure work is that it's really about predictability. Horizontal scaling works because stateless services are predictable: any instance can handle any request, so each added instance adds a roughly proportional amount of capacity. Vertical scaling for enterprise customers works because their behavior is predictable: we know who they are and what they'll do, so we can get ahead of the load instead of reacting to it.
Infrastructure that reacts is always one step behind. The places where I could move the decision earlier — closer to the cause rather than the symptom — produced more reliable outcomes than the places where I was tuning reactive thresholds.

