Graceful deploys on Kubernetes
runApp runs a fixed shutdown sequence on SIGTERM: flip /ready to 500, wait for the load balancer to drain, refuse new connections, finish in-flight requests, run plugin shutdown hooks in reverse boot order, exit. A hard timeout backstops a hung process.
This recipe shows the moving parts and the matching Kubernetes YAML.
What runApp does
Inside runApp(app, { shutdown, health }):
| Step | Sequence on SIGTERM |
|---|---|
| 1 | /ready flips to 500 (Lightship). Load balancer removes the pod from rotation. |
| 2 | Wait drainDelay ms (default 10_000). Gives the LB time to actually catch up. |
| 3 | Stop accepting new connections (http-terminator). |
| 4 | Drain in-flight requests, bounded by drainTimeout (default 30_000). |
| 5 | Run plugin + provider shutdown hooks in reverse boot order (DBs, Redis, queues). |
| 6 | Lightship's operational server shuts down. |
| 7 | process.exit(0) — or SIGKILL if hardTimeout (default 45_000) is exceeded. |
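The table can be read as a worst-case timeline. The sketch below is illustrative only and is not Nwire's implementation; it assumes that every offset, including hardTimeout, is measured from the moment SIGTERM arrives. The names `ShutdownConfig`, `TimelineEvent`, and `worstCaseTimeline` are hypothetical.

```typescript
// Illustrative sketch, not Nwire's implementation.
// Assumption: all offsets are measured from the arrival of SIGTERM.
interface ShutdownConfig {
  drainDelay: number; // ms to wait after /ready starts failing
  drainTimeout: number; // ms allowed for in-flight requests to finish
  hardTimeout: number; // ms before the process is force-killed
}

interface TimelineEvent {
  atMs: number; // offset from SIGTERM
  event: string;
}

// Latest point at which each step can start if every phase uses its full budget.
function worstCaseTimeline(cfg: ShutdownConfig): TimelineEvent[] {
  const stopAccepting = cfg.drainDelay;
  const drainDeadline = cfg.drainDelay + cfg.drainTimeout;
  return [
    { atMs: 0, event: "/ready starts returning 500" },
    { atMs: stopAccepting, event: "stop accepting new connections" },
    { atMs: drainDeadline, event: "drain deadline reached; shutdown hooks run" },
    { atMs: cfg.hardTimeout, event: "forced exit if still running" },
  ];
}

const defaults: ShutdownConfig = {
  drainDelay: 10_000,
  drainTimeout: 30_000,
  hardTimeout: 45_000,
};

for (const { atMs, event } of worstCaseTimeline(defaults)) {
  console.log(`${String(atMs / 1000).padStart(3)}s  ${event}`);
}
```

Under that assumption, the defaults leave at most 5s between the drain deadline (40s) and the forced exit (45s) for shutdown hooks, which is worth keeping in mind if a hook flushes a queue or closes a slow connection pool.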
App side
import { runApp, defineCheck } from "@nwire/http";
import { app } from "./app.ts";
// prisma and redis below are assumed to be the app's existing client
// instances; their imports are omitted here.
await runApp(app, {
  shutdown: {
    drainDelay: 10_000, // LB grace
    drainTimeout: 30_000, // in-flight grace
    hardTimeout: 45_000, // SIGKILL after this
  },
  health: {
    port: 9_400,
    livenessPath: "/live",
    readinessPath: "/ready",
    checks: [
      defineCheck("db", () => prisma.$queryRaw`SELECT 1`),
      defineCheck("redis", () => redis.ping()),
    ],
  },
});
The boot banner confirms what's running:
▸ nwire app "my-app" ready in 23ms
http http://localhost:3000
docs http://localhost:3000/docs
openapi http://localhost:3000/openapi.json
inspect http://localhost:3000/_nwire/manifest
health http://localhost:9400/live · /ready
Cluster side
The lifecycle assumes a few things about how the cluster is configured. Match these in your Deployment and Service.
Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  # apps/v1 requires a selector; it must match the template labels.
  selector:
    matchLabels:
      app: my-app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: my-app # also matched by the Service selector
    spec:
      # Give the pod enough time to drain. Must exceed
      # drainDelay + drainTimeout from the app's runApp() config.
      terminationGracePeriodSeconds: 60
      containers:
        - name: app
          image: my-org/my-app:latest
          ports:
            - name: http
              containerPort: 3000
            - name: health
              containerPort: 9400
          # Lifecycle hook keeps the pod alive long enough for the
          # endpoints controller to remove it from Service routing.
          # Same number as drainDelay (10s).
          lifecycle:
            preStop:
              exec:
                command: ["sleep", "10"]
          livenessProbe:
            httpGet:
              path: /live
              port: health
            periodSeconds: 10
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /ready
              port: health
            periodSeconds: 5
            failureThreshold: 2
          # Don't kill on startup if it's slow.
          startupProbe:
            httpGet:
              path: /live
              port: health
            periodSeconds: 5
            failureThreshold: 30
Service
The Service targetPort points at the app port, not the health port. Health probes hit the pod directly through kubelet.
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  selector:
    app: my-app
  ports:
    - name: http
      port: 80
      targetPort: http
How the timeouts compose
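terminationGracePeriodSeconds on the pod is the upper bound. Make sure it's bigger than the sum:
terminationGracePeriodSeconds ≥ drainDelay + drainTimeout + small buffer
For the defaults above (10s + 30s = 40s), 60s gives a 20s safety margin before kubelet sends SIGKILL. The preStop sleep also counts against the grace period, because kubelet sends SIGTERM only after the hook finishes, so with the 10s hook from the Deployment the effective margin is closer to 10s.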
hardTimeout (default 45s) is the app's upper bound — if the process is still alive at that point, the app force-exits itself. Set it to less than terminationGracePeriodSeconds so the app always wins the race against kubelet's SIGKILL.
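These constraints are easy to check mechanically, for example in a CI step that reads both the app config and the manifest. The sketch below is a hypothetical helper, not part of Nwire. It assumes that hardTimeout is measured from SIGTERM, and it relies on the Kubernetes behavior that the grace period covers the preStop hook as well as the time after SIGTERM. The names `DeployTimings` and `checkTimings` are made up for illustration.

```typescript
// Hypothetical helper, not part of Nwire.
// Assumption: hardTimeout is measured from SIGTERM, and SIGTERM is delivered
// only after the preStop hook has finished.
interface DeployTimings {
  preStopSeconds: number; // sleep duration in the preStop hook
  drainDelayMs: number;
  drainTimeoutMs: number;
  hardTimeoutMs: number;
  terminationGracePeriodSeconds: number;
}

// Returns a list of human-readable problems; an empty list means the timings compose.
function checkTimings(t: DeployTimings): string[] {
  const problems: string[] = [];
  const graceMs = t.terminationGracePeriodSeconds * 1000;
  const preStopMs = t.preStopSeconds * 1000;
  const drainMs = t.drainDelayMs + t.drainTimeoutMs;

  if (t.hardTimeoutMs < drainMs) {
    problems.push("hardTimeout is shorter than drainDelay + drainTimeout");
  }
  if (preStopMs + drainMs >= graceMs) {
    problems.push("preStop + drainDelay + drainTimeout does not fit in the grace period");
  }
  if (preStopMs + t.hardTimeoutMs >= graceMs) {
    problems.push("kubelet's SIGKILL can arrive before the app's own hardTimeout");
  }
  return problems;
}

// The defaults from this recipe, with the 10s preStop sleep from the Deployment.
const problems = checkTimings({
  preStopSeconds: 10,
  drainDelayMs: 10_000,
  drainTimeoutMs: 30_000,
  hardTimeoutMs: 45_000,
  terminationGracePeriodSeconds: 60,
});
console.log(problems.length === 0 ? "timings OK" : problems.join("\n"));
```

Returning a list rather than throwing on the first violation keeps the output useful when several settings drift at once.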
Recommended timings by traffic profile
| Profile | drainDelay | drainTimeout | hardTimeout | terminationGracePeriodSeconds |
|---|---|---|---|---|
| Internal API, low traffic | 2s | 10s | 15s | 20s |
| Public API, moderate | 10s | 30s | 45s | 60s |
| Long-running ops (uploads) | 15s | 120s | 150s | 180s |
Anything longer and you probably want a queue worker instead of an HTTP handler.
Probes — liveness vs readiness
- Liveness = "is the process alive?" Cheap; fails only when the runtime is broken. Fails → kubelet restarts the container.
- Readiness = "should new traffic come here?" Includes DB/Redis checks. Fails → Kubernetes removes the pod from the Service endpoints; the pod stays alive.
Nwire's defineCheck runs on the readiness path. Don't put expensive checks (slow DB queries, external API pings) on liveness — they'll cause restart loops.
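The split can be sketched in a few lines. This is not how Nwire or Lightship implement their endpoints; it is a minimal model that assumes readiness answers 500 when any check fails and that a check which rejects or times out counts as failed. The names `CheckResult`, `livenessStatus`, `readinessStatus`, and `runChecks` are hypothetical.

```typescript
// Minimal model of the liveness/readiness split; not Nwire's implementation.
type CheckResult = { name: string; ok: boolean };

// Liveness runs no dependency checks, so a slow database can never cause a restart.
function livenessStatus(): number {
  return 200;
}

// Readiness is 200 only if every dependency check passed (assumed: 500 otherwise).
function readinessStatus(results: CheckResult[]): number {
  return results.every((r) => r.ok) ? 200 : 500;
}

// Run named async checks; a rejection or a timeout counts as a failure.
async function runChecks(
  checks: Record<string, () => Promise<unknown>>,
  timeoutMs = 2_000,
): Promise<CheckResult[]> {
  return Promise.all(
    Object.entries(checks).map(async ([name, check]) => {
      let timer: ReturnType<typeof setTimeout> | undefined;
      const timeout = new Promise<never>((_, reject) => {
        timer = setTimeout(() => reject(new Error(`${name} timed out`)), timeoutMs);
      });
      try {
        await Promise.race([check(), timeout]);
        return { name, ok: true };
      } catch {
        return { name, ok: false };
      } finally {
        clearTimeout(timer); // avoid keeping the event loop alive
      }
    }),
  );
}

// Usage: one passing and one failing check.
runChecks({
  db: async () => "ok",
  redis: async () => {
    throw new Error("connection refused");
  },
}).then((results) => {
  console.log("live:", livenessStatus(), "ready:", readinessStatus(results));
});
```

Bounding each check with a timeout matters because the readiness probe itself has a timeout; a check that hangs would otherwise look like a failed probe with no indication of which dependency caused it.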
Recipe: blue/green-style deploy
For a zero-downtime deploy that brings up the whole new fleet at once instead of one pod at a time, set:
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 100%
    maxUnavailable: 0
This starts a full set of new pods up front and removes an old pod only once a new one is ready to take its place. Combined with Nwire's /ready gating, traffic only shifts when the new pods are actually ready (DB connected, JWKS fetched, etc.).
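A surge of 100% means the cluster briefly runs twice the usual number of pods, so it is worth checking capacity before relying on it. The sketch below computes the bounds for a given strategy, following the Kubernetes rule that a percentage maxSurge rounds up and a percentage maxUnavailable rounds down. The function names are hypothetical.

```typescript
// maxSurge and maxUnavailable accept an absolute number or a percentage string.
type IntOrPercent = number | string;

// Convert "25%" style values to a pod count using the given rounding rule.
function resolveCount(
  value: IntOrPercent,
  replicas: number,
  round: (n: number) => number,
): number {
  if (typeof value === "number") return value;
  const percent = Number(value.slice(0, -1)); // strip the trailing "%"
  return round((replicas * percent) / 100);
}

// Peak pod count and minimum available pods during a rollout.
function rolloutBounds(
  replicas: number,
  maxSurge: IntOrPercent,
  maxUnavailable: IntOrPercent,
): { peakPods: number; minAvailable: number } {
  return {
    peakPods: replicas + resolveCount(maxSurge, replicas, Math.ceil),
    minAvailable: replicas - resolveCount(maxUnavailable, replicas, Math.floor),
  };
}

// The blue/green-style strategy versus the one-at-a-time strategy, for 3 replicas.
console.log(rolloutBounds(3, "100%", 0));
console.log(rolloutBounds(3, 1, 0));
```

For three replicas, the 100% surge peaks at six pods while the one-at-a-time rollout peaks at four; both keep all three pods available throughout.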