Operate rollouts
On Ultron Infra, every app ships via an Argo Rollouts canary gated on a
Prometheus success-rate metric. On day 2 you mostly watch the rollout, and
occasionally promote or abort it by hand. Substitute the workload you’re
operating for <app>; the rollout and its namespace are both named after the app.
Watch a rollout

    kubectl argo rollouts get rollout <app> -n <app> --watch

This prints the live step, traffic weight, ReplicaSet revisions, pod readiness, and the background AnalysisRun status. Leave it running through a deploy.
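
For unattended use, such as a CI job, a blocking check can stand in for the interactive watch. The sketch below wraps `kubectl argo rollouts status`, which waits until the rollout is healthy, is degraded, or the timeout expires. The function name `wait_for_rollout` and the 10-minute default timeout are assumptions for illustration, not part of the platform.

```shell
# wait_for_rollout APP [TIMEOUT]
# Hypothetical helper: blocks until the rollout settles and returns
# kubectl's exit status (non-zero if the rollout did not become healthy).
wait_for_rollout() {
  # $1 = app name (also used as the namespace)
  # $2 = optional timeout; 10m is an assumed default, tune it to your canary
  kubectl argo rollouts status "$1" -n "$1" --timeout "${2:-10m}"
}

# Usage (hypothetical app name):
#   wait_for_rollout checkout 15m || echo "rollout did not become healthy"
```

Returning the plugin's own exit status keeps the helper easy to chain into a pipeline step.
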
Other useful reads:

    # Just the analysis runs (the metric gate)
    kubectl get analysisrun -n <app>

    # Pod-level detail when something's stuck
    kubectl get pods -n <app> -l app=<app>

Promote or abort

    # Advance past the current pause to the next step
    kubectl argo rollouts promote <app> -n <app>

    # Force full promotion, skipping remaining steps, pauses, and analysis
    kubectl argo rollouts promote <app> -n <app> --full

    # Abort: roll traffic back to the stable ReplicaSet
    kubectl argo rollouts abort <app> -n <app>

    # After fixing the underlying issue, restart the canary
    kubectl argo rollouts retry rollout <app> -n <app>

The metric gate aborts automatically on failure (successCondition: result[0] >= 0.95, failureLimit: 2): up to two failed measurements are tolerated, and a third fails the analysis and aborts the rollout. You rarely need a manual abort unless
you’re cutting a deploy short.
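
To make the gate settings concrete, here is a minimal sketch of what an AnalysisTemplate carrying them could look like. Only `successCondition` and `failureLimit` come from this guide. The template name, the `app` argument, the interval, the Prometheus address, and the query (including the `http_requests_total` metric and its labels) are assumptions; the real template on Ultron Infra is not shown here and may differ.

```shell
# print_success_rate_template: hypothetical helper that emits a sketch
# AnalysisTemplate on stdout. It does not touch the cluster.
print_success_rate_template() {
  cat <<'EOF'
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate            # hypothetical name
spec:
  args:
    - name: app                 # hypothetical argument
  metrics:
    - name: success-rate
      interval: 1m              # assumed; not stated in this guide
      successCondition: result[0] >= 0.95
      failureLimit: 2           # a third failed measurement fails the run
      provider:
        prometheus:
          address: http://prometheus.example.internal:9090   # placeholder
          query: |
            sum(rate(http_requests_total{app="{{args.app}}",code!~"5.."}[2m]))
            /
            sum(rate(http_requests_total{app="{{args.app}}"}[2m]))
EOF
}

# Review only; validate against the cluster without applying:
#   print_success_rate_template | kubectl apply -n <app> --dry-run=server -f -
```
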
The safe config-cutover trick
The canary doubles as a safe way to repoint config (a new DB, a new Keycloak, changed env). The mechanism is the readiness probe:

    flowchart LR
      Bad[bad repoint] --> New[new pods start]
      New -->|/readyz fails| NR[never become Ready]
      NR --> Hold[canary won't promote]
      Hold --> Stable[stable pods keep serving]

A bad repoint makes the new pods fail their /readyz probe, so they never
become Ready, so the canary won’t promote — and the old (stable) pods keep
serving live traffic. This is exactly how an auth-provider cutover is made
safe: if the new config had been wrong, no users would have seen it.
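
During a cutover it helps to see which revision each pod belongs to and whether it is Ready. The sketch below is one way to check, assuming pods carry the `app=<app>` label used elsewhere in this guide. The function name `show_cutover_state` is hypothetical; `rollouts-pod-template-hash` is the label Argo Rollouts puts on the pods of each ReplicaSet.

```shell
# show_cutover_state APP
# Hypothetical helper: lists the app's pods with their revision hash,
# then prints a one-shot rollout summary.
show_cutover_state() {
  # -L adds the label value as an extra column, so canary and stable
  # pods can be told apart next to the READY column.
  kubectl get pods -n "$1" -l "app=$1" -L rollouts-pod-template-hash
  kubectl argo rollouts get rollout "$1" -n "$1"
}

# Usage (hypothetical app name):
#   show_cutover_state checkout
```

If the repoint is bad, pods with the new hash should stay unready while pods with the stable hash remain Ready. At that point `kubectl argo rollouts abort <app> -n <app>` ends the attempt.
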