Disaster recovery / rebuild
Ultron Infra runs on a single box (ultron), which makes it a single point of failure. The
recovery story is “reinstall + resync”: GitOps rebuilds
the cluster from webb1es/gitops, and a short list of out-of-band items gets
recreated by hand. Backups cover the data. The Penvoice manifests below are just
the example app that happens to be onboarded onto this instance.
What’s in Git vs by hand
| In Git (auto via Argo CD) | By hand (out-of-band) |
|---|---|
| App-of-apps: cnpg-operator, keycloak-operator, penvoice, keycloak-test | k3s install + Helm bootstrap (cert-manager, monitoring, Argo) |
| Rollout, AnalysisTemplate, ingress, ServiceMonitor, ConfigMaps | Secrets: penvoice-api-kc, penvoice-pg-backup-creds, keycloak-pg-backup-creds |
| Postgres Cluster definitions + backup config | Register the gitops repo in Argo CD (private → PAT) |
| Keycloak instance CR | Keycloak realm penvoice + clients (unless via KeycloakRealmImport) |
| | GHCR package public (or an imagePullSecret) |
Postgres data is not in Git — it’s restored from the Oracle Object Storage backups (see Backup & restore). The Keycloak realm also lives in its DB, which is backed up.
The sequence
```mermaid
sequenceDiagram
    participant Op as Operator
    participant Node as ultron
    participant Argo as Argo CD
    participant Git as gitops repo
    participant KC as Keycloak
    Op->>Node: 1. install k3s (keep bundled Traefik)
    Op->>Node: 2. firewall: open 80/443, 6443 stays private
    Op->>Node: 3. Helm bootstrap (cert-manager, monitoring, Argo)
    Op->>Argo: 4. register gitops repo (PAT) + apply root-app
    Argo->>Git: reconcile apps/ (cnpg, keycloak-operator, penvoice, keycloak-test)
    Op->>Node: 5. recreate out-of-band Secrets
    Op->>KC: 6. configure realm penvoice + clients + audience mapper
```
1. Install k3s (keep bundled Traefik)
```shell
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--write-kubeconfig-mode 644 --tls-san $(tailscale ip -4) --tls-san ultron" sh -
```

The `--tls-san` flags add the node's Tailscale IP and the `ultron` hostname to the API server certificate, which is what makes kubectl over Tailscale work.
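As a rough sketch of the client side: k3s writes its kubeconfig to `/etc/rancher/k3s/k3s.yaml` with the server set to `https://127.0.0.1:6443`, so a copy used from another machine needs that address swapped for one of the `--tls-san` values. The helper name `kubeconfig_for_host` is hypothetical, and the sketch assumes the default API port.

```shell
# Rewrite the API server address in a k3s kubeconfig read from stdin.
# $1: the host kubectl should dial. It must match one of the --tls-san
#     values (for example the hostname ultron or the node's Tailscale IP).
kubeconfig_for_host() {
  sed "s#https://127\.0\.0\.1:6443#https://$1:6443#"
}
```

Possible usage, assuming SSH access to the node and a local path of your choosing: `ssh ultron cat /etc/rancher/k3s/k3s.yaml | kubeconfig_for_host ultron > ~/.kube/ultron.yaml`, then `KUBECONFIG=~/.kube/ultron.yaml kubectl get nodes`.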
2. Firewall
Modern k3s (kube-router) usually needs no host-firewall changes; make sure the host accepts traffic on 80/443 and verify that pods have egress. In the VCN, open 80/443 to the internet; 6443 stays private (Tailscale only).
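One way to check the resulting exposure is a small port probe, run from a machine that is outside the VCN and not on the tailnet. This is only a sketch: the function names are hypothetical, the node's public address is something you supply, and the probe relies on bash's `/dev/tcp` pseudo-device and the coreutils `timeout` command being available.

```shell
# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
# $1: host, $2: port. Uses bash's /dev/tcp, so bash must be installed.
port_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Print one PASS/FAIL line per expectation from this section:
# 80 and 443 reachable, 6443 not reachable.
# $1: public IP or DNS name of the node (supplied by the operator).
check_exposure() {
  for port in 80 443; do
    if port_open "$1" "$port"; then
      echo "PASS $port open"
    else
      echo "FAIL $port closed"
    fi
  done
  if port_open "$1" 6443; then
    echo "FAIL 6443 reachable"
  else
    echo "PASS 6443 closed"
  fi
}
```

Calling `check_exposure <public-address>` prints three lines; ports 80/443 only report open once something (Traefik) is listening on them.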
3. Helm bootstrap (own namespaces, pinned versions)
Install, in order, at the pinned versions:
- cert-manager + `letsencrypt-staging` / `letsencrypt-prod` ClusterIssuers (HTTP-01 via Traefik).
- kube-prometheus-stack.
- Argo CD + Rollouts + Workflows + Events. Expose Argo CD at `argocd.webbies.dev` (Traefik ingress, `server.insecure: true`, cert-manager annotation).
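The install order above could be scripted roughly as follows. This sketch only prints the Helm commands so they can be reviewed first. The release names, namespaces, and the `*_VERSION` environment variables are assumptions; the pinned versions themselves are not stated here and must be supplied by the operator. The chart repositories are the upstream jetstack, prometheus-community, and argo-helm ones. ClusterIssuers, the Argo CD ingress, and any chart values are left out.

```shell
# Print (not run) one pinned Helm install command.
# $1: release, $2: chart, $3: namespace, $4: pinned chart version.
pinned_install() {
  printf 'helm upgrade --install %s %s --namespace %s --create-namespace --version %s\n' \
    "$1" "$2" "$3" "$4"
}

# Emit the bootstrap sequence in the documented order.
# Each version comes from the environment; the call aborts if one is unset.
bootstrap_plan() {
  echo 'helm repo add jetstack https://charts.jetstack.io'
  echo 'helm repo add prometheus-community https://prometheus-community.github.io/helm-charts'
  echo 'helm repo add argo https://argoproj.github.io/argo-helm'
  echo 'helm repo update'
  pinned_install cert-manager jetstack/cert-manager cert-manager \
    "${CERT_MANAGER_VERSION:?set to the pinned chart version}"
  pinned_install monitoring prometheus-community/kube-prometheus-stack monitoring \
    "${KUBE_PROMETHEUS_STACK_VERSION:?set to the pinned chart version}"
  pinned_install argocd argo/argo-cd argocd \
    "${ARGO_CD_VERSION:?set to the pinned chart version}"
  pinned_install argo-rollouts argo/argo-rollouts argo-rollouts \
    "${ARGO_ROLLOUTS_VERSION:?set to the pinned chart version}"
  pinned_install argo-workflows argo/argo-workflows argo-workflows \
    "${ARGO_WORKFLOWS_VERSION:?set to the pinned chart version}"
  pinned_install argo-events argo/argo-events argo-events \
    "${ARGO_EVENTS_VERSION:?set to the pinned chart version}"
}
```

With the version variables exported, `bootstrap_plan` prints the plan and `bootstrap_plan | sh` runs it. Printing first keeps the pinned versions visible before anything touches the cluster.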
4. Register the repo + apply the root app
```shell
# Register the private gitops repo in Argo CD (PAT, Contents:read)
# then bootstrap everything:
kubectl apply -f bootstrap/root-app.yaml
```

Argo CD now reconciles cnpg-operator, keycloak-operator, penvoice, and keycloak-test.
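The repo registration can be done in the Argo CD UI or CLI; one declarative alternative is a Secret labeled `argocd.argoproj.io/secret-type: repository`. The sketch below renders such a Secret. The Secret name, the `argocd` namespace, and the function name are assumptions, and the repo URL is passed in rather than guessed.

```shell
# Print an Argo CD repository Secret for a private HTTPS Git repo.
# $1: repo URL, $2: personal access token (Contents: read).
render_repo_secret() {
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: gitops-repo        # hypothetical name
  namespace: argocd        # assumed Argo CD namespace
  labels:
    argocd.argoproj.io/secret-type: repository
stringData:
  type: git
  url: "$1"
  username: git            # placeholder; token auth usually needs only a non-empty username
  password: "$2"
EOF
}
```

Possible usage, assuming the token is in a `GITOPS_PAT` variable: `render_repo_secret <repo-url> "$GITOPS_PAT" | kubectl apply -f -`, followed by the `kubectl apply -f bootstrap/root-app.yaml` step. The token is never written to disk this way.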
5. Recreate out-of-band Secrets
Not in Git, so create by hand:
- `penvoice/penvoice-api-kc`: Keycloak API client secret.
- `penvoice/penvoice-pg-backup-creds` and `keycloak/keycloak-pg-backup-creds`: Oracle Object Storage S3 keys. The access key is clean hex; the secret key contains `+`, `/`, or `=` characters. Don't swap them (see Troubleshooting).
- Ensure `ghcr.io/webb1es/penvoice-api` is public (or add an `imagePullSecret`).
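To guard against swapping the two S3 values, the hex rule above can be turned into a check that runs before the Secret is created. This is a sketch with hypothetical function names. The key names `ACCESS_KEY_ID` and `ACCESS_SECRET_KEY` are assumptions; use whatever names the Cluster's backup config actually references.

```shell
# Return 0 if $1 is a non-empty string of hex digits only.
is_hex() {
  case "$1" in
    ''|*[!0-9a-fA-F]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Refuse to proceed when the access key does not look like plain hex,
# which is the usual sign that the two values were swapped.
# $1: access key, $2: secret key.
check_s3_pair() {
  if ! is_hex "$1"; then
    echo "access key is not plain hex; are the two values swapped?" >&2
    return 1
  fi
}

# Create one backup-creds Secret after the sanity check passes.
# $1: namespace, $2: Secret name, $3: access key, $4: secret key.
create_backup_creds() {
  check_s3_pair "$3" "$4" || return 1
  kubectl create secret generic "$2" --namespace "$1" \
    --from-literal=ACCESS_KEY_ID="$3" \
    --from-literal=ACCESS_SECRET_KEY="$4"
}
```

Possible usage: `create_backup_creds penvoice penvoice-pg-backup-creds "$ACCESS_KEY" "$SECRET_KEY"`, and the same call with `keycloak keycloak-pg-backup-creds` for the Keycloak database.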
6. Configure the Keycloak realm
Configure realm penvoice with:
- client `penvoice-api`: confidential, service-account roles `manage-users` / `query-users`.
- client `penvoice-web`: public, PKCE.
- an audience mapper adding `aud: penvoice-api`.
Do it in the admin console (<host>/admin, creds in the operator’s
<instance>-initial-admin Secret — see Access & consoles),
or declaratively as a KeycloakRealmImport CR (preferred — the operator
applies it, making the realm config-as-code in Git).
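A minimal sketch of what the declarative route might look like, rendered from shell so the namespace and Keycloak CR name can be passed in. The CR name `penvoice-realm`, the function name, and the choice to attach the audience mapper to `penvoice-web` are assumptions. Redirect URIs and the service-account role assignments (`manage-users` / `query-users`) are deliberately left out and would still need to be added.

```shell
# Print a KeycloakRealmImport manifest for the penvoice realm.
# $1: namespace of the Keycloak instance, $2: name of the Keycloak CR.
render_realm_import() {
  cat <<EOF
apiVersion: k8s.keycloak.org/v2alpha1
kind: KeycloakRealmImport
metadata:
  name: penvoice-realm          # hypothetical name
  namespace: $1
spec:
  keycloakCRName: $2
  realm:
    realm: penvoice
    enabled: true
    clients:
      - clientId: penvoice-api  # confidential client with a service account
        publicClient: false
        serviceAccountsEnabled: true
      - clientId: penvoice-web  # public client, PKCE with S256
        publicClient: true
        standardFlowEnabled: true
        attributes:
          pkce.code.challenge.method: S256
        protocolMappers:
          - name: penvoice-api-audience
            protocol: openid-connect
            protocolMapper: oidc-audience-mapper
            config:
              included.client.audience: penvoice-api
              access.token.claim: "true"
EOF
}
```

Possible usage: `render_realm_import <namespace> <keycloak-cr-name> | kubectl apply -f -`, or commit the rendered manifest to the gitops repo so Argo CD applies it.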