Disaster recovery / rebuild

Ultron Infra runs on one box (ultron), which makes that box a single point of failure. The recovery story is “reinstall + resync”: GitOps rebuilds the cluster from webb1es/gitops, and a short list of out-of-band items is recreated by hand. Backups cover the data. The Penvoice manifests referenced below are simply the example app that happens to be onboarded onto this instance.

| In Git (auto via Argo CD) | By hand (out-of-band) |
| --- | --- |
| App-of-apps: cnpg-operator, keycloak-operator, penvoice, keycloak-test | k3s install + Helm bootstrap (cert-manager, monitoring, Argo) |
| Rollout, AnalysisTemplate, ingress, ServiceMonitor, ConfigMaps | Secrets: penvoice-api-kc, penvoice-pg-backup-creds, keycloak-pg-backup-creds |
| Postgres Cluster definitions + backup config | Register the gitops repo in Argo CD (private → PAT) |
| Keycloak instance CR | Keycloak realm penvoice + clients (unless via KeycloakRealmImport) |
| | GHCR package public (or an imagePullSecret) |

Postgres data is not in Git — it’s restored from the Oracle Object Storage backups (see Backup & restore). The Keycloak realm also lives in its DB, which is backed up.

sequenceDiagram
  participant Op as Operator
  participant Node as ultron
  participant Argo as Argo CD
  participant Git as gitops repo
  participant KC as Keycloak

  Op->>Node: 1. install k3s (keep bundled Traefik)
  Op->>Node: 2. firewall: open 80/443, 6443 stays private
  Op->>Node: 3. Helm bootstrap (cert-manager, monitoring, Argo)
  Op->>Argo: 4. register gitops repo (PAT) + apply root-app
  Argo->>Git: reconcile apps/ (cnpg, keycloak-operator, penvoice, keycloak-test)
  Op->>Node: 5. recreate out-of-band Secrets
  Op->>KC: 6. configure realm penvoice + clients + audience mapper

1. Install k3s (keep bundled Traefik)

curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--write-kubeconfig-mode 644 --tls-san $(tailscale ip -4) --tls-san ultron" sh -

The --tls-san flags add the Tailscale IP and the host name ultron to the API server certificate, which is what makes kubectl over Tailscale work.
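
To use the cluster from another machine on the tailnet, the server address in the kubeconfig has to match one of those SANs. A minimal sketch, assuming the k3s default kubeconfig path /etc/rancher/k3s/k3s.yaml and its default server entry https://127.0.0.1:6443; the helper name kubeconfig_for_host is hypothetical:

```shell
# Rewrite the loopback API server address in a k3s kubeconfig (read from stdin)
# to a host name or IP that is covered by --tls-san.
# Assumes the k3s default server entry https://127.0.0.1:6443.
kubeconfig_for_host() {
  sed "s#https://127\.0\.0\.1:6443#https://$1:6443#"
}
```

Assuming SSH access to the node, usage might look like `ssh ultron cat /etc/rancher/k3s/k3s.yaml | kubeconfig_for_host ultron > ~/.kube/ultron.yaml`, followed by `KUBECONFIG=~/.kube/ultron.yaml kubectl get nodes`.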

2. Firewall (open 80/443, 6443 stays private)

Modern k3s (kube-router) usually needs no host-firewall changes; make sure ports 80/443 are accepted and verify that pods have egress. In the VCN, open 80/443 to the internet; 6443 stays private (Tailscale only).

3. Helm bootstrap (own namespaces, pinned versions)

Install, in order, at the pinned versions:

  • cert-manager + letsencrypt-staging / letsencrypt-prod ClusterIssuers (HTTP-01 via Traefik).
  • kube-prometheus-stack.
  • Argo CD + Rollouts + Workflows + Events. Expose Argo CD at argocd.webbies.dev (Traefik ingress, server.insecure: true, cert-manager annotation).
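
The install order above could be scripted roughly as follows. This is a sketch, not the instance's actual bootstrap script: the helper names, the *_VERSION variables, the release names, and the namespace names are all placeholders, and the two --set flags assume recent cert-manager and argo-cd charts (older cert-manager charts use installCRDs=true instead of crds.enabled=true).

```shell
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# helm_pinned <release> <chart> <version> <namespace> [extra helm args...]
# Refuses to install when no version is given, so nothing floats to "latest".
helm_pinned() {
  if [ -z "${3:-}" ]; then
    echo "refusing to install $1 without a pinned version" >&2
    return 1
  fi
  hp_release=$1 hp_chart=$2 hp_version=$3 hp_namespace=$4
  shift 4
  run helm upgrade --install "$hp_release" "$hp_chart" \
    --version "$hp_version" --namespace "$hp_namespace" --create-namespace "$@"
}

# Install order from the list above. Set the *_VERSION variables to the
# versions this instance pins; they are intentionally not filled in here.
bootstrap() {
  run helm repo add jetstack https://charts.jetstack.io &&
  run helm repo add prometheus-community https://prometheus-community.github.io/helm-charts &&
  run helm repo add argo https://argoproj.github.io/argo-helm &&
  run helm repo update &&
  helm_pinned cert-manager jetstack/cert-manager "$CERT_MANAGER_VERSION" cert-manager \
    --set crds.enabled=true &&
  helm_pinned monitoring prometheus-community/kube-prometheus-stack "$KPS_VERSION" monitoring &&
  helm_pinned argocd argo/argo-cd "$ARGOCD_VERSION" argocd \
    --set 'configs.params.server\.insecure=true' &&
  helm_pinned argo-rollouts argo/argo-rollouts "$ROLLOUTS_VERSION" argo-rollouts &&
  helm_pinned argo-workflows argo/argo-workflows "$WORKFLOWS_VERSION" argo-workflows &&
  helm_pinned argo-events argo/argo-events "$EVENTS_VERSION" argo-events
}
```

Running `DRY_RUN=1 bootstrap` prints the helm commands without touching the cluster, which is a cheap way to review the pinned versions before a rebuild. The ClusterIssuers and the Argo CD ingress still have to be applied separately.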

4. Register the gitops repo and apply the root app

# Register the private gitops repo in Argo CD (PAT, Contents:read)
# then bootstrap everything:
kubectl apply -f bootstrap/root-app.yaml

Argo CD now reconciles cnpg-operator, keycloak-operator, penvoice, and keycloak-test.
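
One way to register the private repo without clicking through the UI is Argo CD's declarative repository Secret. The sketch below renders such a Secret; the Secret name gitops-repo, the argocd namespace, and the repo URL used in the usage note are assumptions, not values taken from this instance.

```shell
# Render an Argo CD repository Secret for a private Git repo.
# Usage: argocd_repo_secret <repo-url> <username> <token>
# The Secret name and namespace are assumptions; adjust to the Argo CD install.
argocd_repo_secret() {
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: gitops-repo
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: repository
stringData:
  type: git
  url: "$1"
  username: "$2"
  password: "$3"
EOF
}
```

Assuming the repo lives on GitHub, usage could be `argocd_repo_secret https://github.com/webb1es/gitops.git "$GH_USER" "$GITOPS_PAT" | kubectl apply -f -`, followed by the `kubectl apply -f bootstrap/root-app.yaml` shown above. GH_USER and GITOPS_PAT are hypothetical environment variables holding the account name and the PAT.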

5. Recreate out-of-band Secrets

These are not in Git, so create them by hand:

  • penvoice/penvoice-api-kc — Keycloak API client secret.
  • penvoice/penvoice-pg-backup-creds and keycloak/keycloak-pg-backup-creds: Oracle Object Storage S3 keys. The access key is plain hex; the secret key contains +, / and = characters. Don’t swap them (see Troubleshooting).
  • Ensure ghcr.io/webb1es/penvoice-api is public (or add an imagePullSecret).
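
Because swapping the two S3 keys is an easy mistake, a small guard before creating the Secret can help. This is a sketch under assumptions: the helper names are hypothetical, and the Secret keys ACCESS_KEY_ID and ACCESS_SECRET_KEY are placeholders that must match whatever the Postgres Cluster backup config in Git actually references.

```shell
# Succeeds only if the value is non-empty and purely hexadecimal.
is_hex() {
  case $1 in
    ''|*[!0-9a-fA-F]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Usage: create_backup_creds <namespace> <secret-name> <access-key> <secret-key>
# Refuses to create the Secret when the keys look swapped: the access key
# should be plain hex, the secret key should not be.
create_backup_creds() {
  if ! is_hex "$3" || is_hex "$4"; then
    echo "keys look swapped: access key must be hex, secret key must not be" >&2
    return 1
  fi
  # Key names below are assumptions; match them to the backup config in Git.
  kubectl create secret generic "$2" --namespace "$1" \
    --from-literal=ACCESS_KEY_ID="$3" \
    --from-literal=ACCESS_SECRET_KEY="$4"
}
```

For example, `create_backup_creds penvoice penvoice-pg-backup-creds "$S3_ACCESS_KEY" "$S3_SECRET_KEY"`, and the same again for keycloak/keycloak-pg-backup-creds. S3_ACCESS_KEY and S3_SECRET_KEY are hypothetical variables holding the two values.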

6. Configure the Keycloak realm

Configure realm penvoice with:

  • client penvoice-api: confidential, service-account roles manage-users / query-users.
  • client penvoice-web: public, PKCE.
  • an audience mapper adding aud: penvoice-api.

Do it in the admin console (<host>/admin, creds in the operator’s <instance>-initial-admin Secret — see Access & consoles), or declaratively as a KeycloakRealmImport CR (preferred — the operator applies it, making the realm config-as-code in Git).
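
A declarative version of the realm described above might look like the sketch below. It is an illustration, not the realm actually used here: the resource name penvoice-realm, the mapper name, placing the audience mapper on penvoice-web, and S256 as the PKCE method are assumptions, and the Keycloak CR name is passed in because it is not stated on this page. The client secret is deliberately left out so it never lands in Git; it still has to end up in penvoice/penvoice-api-kc.

```shell
# Render a KeycloakRealmImport for realm penvoice.
# Usage: realm_import <keycloak-cr-name>
realm_import() {
  cat <<EOF
apiVersion: k8s.keycloak.org/v2alpha1
kind: KeycloakRealmImport
metadata:
  name: penvoice-realm
spec:
  keycloakCRName: $1
  realm:
    realm: penvoice
    enabled: true
    clients:
      - clientId: penvoice-api
        publicClient: false
        serviceAccountsEnabled: true
      - clientId: penvoice-web
        publicClient: true
        standardFlowEnabled: true
        attributes:
          pkce.code.challenge.method: S256
        protocolMappers:
          - name: penvoice-api-audience
            protocol: openid-connect
            protocolMapper: oidc-audience-mapper
            config:
              included.client.audience: penvoice-api
              access.token.claim: "true"
    users:
      - username: service-account-penvoice-api
        enabled: true
        serviceAccountClientId: penvoice-api
        clientRoles:
          realm-management:
            - manage-users
            - query-users
EOF
}
```

The output would normally be committed to the gitops repo so Argo CD applies it; for a quick manual test, `realm_import <keycloak-cr-name> | kubectl apply -n keycloak -f -` should work, assuming the Keycloak instance lives in the keycloak namespace. As far as the operator's documented behaviour goes, a realm import creates a missing realm but does not overwrite an existing one, so after a database restore that already contains the realm, the import is expected to be a no-op.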