Example: Keycloak in-cluster

The generic recipe is in Onboard an app; this page shows it filled in for an in-cluster Keycloak — a stateful, operator-managed workload onboarded onto Ultron Infra.

Keycloak originally ran as a host-Docker container behind host nginx. It was migrated into the cluster as an operator-managed instance at test-auth.webbies.dev, with its own CloudNativePG database and a clean-slate realm (no data migration). The Penvoice API and web app were then repointed at it, and the old host Keycloak container + host nginx were retired — leaving Traefik as the single edge.

flowchart TD
  Op[apps/keycloak-operator] --> Operator[Keycloak Operator 26.6.3]
  KT[apps/keycloak-test] --> CR[Keycloak CR keycloak-test]
  KT --> PG[CNPG Cluster keycloak-test-pg]
  Operator -. manages .-> CR
  CR --> Svc[keycloak-test-service :8080]
  Ing[Ingress test-auth.webbies.dev] --> Svc
  PG --> CR

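apps/keycloak-operator.yaml is an Argo CD Application that syncs platform/keycloak-operator into the keycloak namespace, pinned to operator 26.6.3. It uses ServerSideApply because the CRDs are ~500KB, which is more than client-side apply can handle (it stores the whole object in the kubectl.kubernetes.io/last-applied-configuration annotation, and annotations are capped at 256KiB):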

syncPolicy:
  automated: { prune: true, selfHeal: true }
  syncOptions:
    - CreateNamespace=true
    - ServerSideApply=true # CRDs are ~500KB
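
For context, the syncPolicy fragment sits inside an Application shaped roughly like the sketch below. The path, destination namespace, and sync options come from this page; the repoURL, targetRevision, project, and the argocd namespace are placeholders or assumptions, not values taken from the real repository.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: keycloak-operator
  namespace: argocd                 # assumption: Argo CD's usual namespace
spec:
  project: default                  # assumption
  source:
    repoURL: https://example.com/your-org/infra.git   # hypothetical placeholder
    targetRevision: main            # assumption
    path: platform/keycloak-operator
  destination:
    server: https://kubernetes.default.svc
    namespace: keycloak
  syncPolicy:
    automated: { prune: true, selfHeal: true }
    syncOptions:
      - CreateNamespace=true
      - ServerSideApply=true        # CRDs are ~500KB
```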

workloads/keycloak-test/keycloak-test.yaml is a Keycloak CR. It runs HTTP-only behind Traefik (which terminates TLS), trusts forwarded headers, and disables the operator’s own ingress so we can supply a Traefik one:

spec:
  instances: 1
  db:
    vendor: postgres
    host: keycloak-test-pg-rw # CNPG read-write service
    database: keycloak
  http:
    httpEnabled: true # Traefik terminates TLS
  hostname:
    hostname: https://test-auth.webbies.dev
  proxy:
    headers: xforwarded
  ingress:
    enabled: false # we use our own Traefik ingress

Its DB is a CNPG Cluster named keycloak-test-pg (db/owner keycloak, 2Gi local-path). CNPG auto-creates the keycloak-test-pg-app Secret, whose username/password the CR reads. The CR also reaches Postgres via the CNPG read-write service keycloak-test-pg-rw.
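
A minimal sketch of how these two pieces could fit together, assuming a single Postgres instance and that both objects live in the same namespace. The cluster name, database/owner, size, storage class, and Secret name come from the text above; the instance count and the exact layout of the real manifests are assumptions.

```yaml
# CNPG Cluster: creates the keycloak database and the keycloak-test-pg-app Secret
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: keycloak-test-pg
spec:
  instances: 1                      # assumption: not stated on this page
  storage:
    size: 2Gi
    storageClass: local-path
  bootstrap:
    initdb:
      database: keycloak
      owner: keycloak
---
# Keycloak CR fragment: read credentials from the CNPG-generated Secret
apiVersion: k8s.keycloak.org/v2alpha1
kind: Keycloak
metadata:
  name: keycloak-test
spec:
  db:
    vendor: postgres
    host: keycloak-test-pg-rw
    database: keycloak
    usernameSecret:
      name: keycloak-test-pg-app
      key: username
    passwordSecret:
      name: keycloak-test-pg-app
      key: password
```

Because the Secret is referenced by name, Keycloak picks up whatever credentials CNPG generated, without copying them into Git.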

ingress.yaml exposes test-auth.webbies.dev over Traefik with a letsencrypt-prod cert, routing to the operator-created keycloak-test-service on :8080.
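
The Ingress could look roughly like this. The host, backend Service, and port are from this page. The sketch assumes the certificate is issued by cert-manager through a ClusterIssuer named letsencrypt-prod and that the ingress class is called traefik; the Ingress name and TLS Secret name are hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: keycloak-test                                 # hypothetical name
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod  # assumption: cert-manager issues the cert
spec:
  ingressClassName: traefik                           # assumption
  tls:
    - hosts:
        - test-auth.webbies.dev
      secretName: test-auth-webbies-dev-tls           # hypothetical Secret name
  rules:
    - host: test-auth.webbies.dev
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: keycloak-test-service           # created by the operator
                port:
                  number: 8080
```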

The realm penvoice was configured fresh (admin console, or declaratively as a KeycloakRealmImport). It holds two clients plus an audience mapper:

| Client | Type | Notes |
| --- | --- | --- |
| penvoice-api | confidential | service-account roles manage-users / query-users |
| penvoice-web | public, PKCE | the SPA; Authorization Code + PKCE |

The audience mapper is the non-obvious piece: tokens issued to penvoice-web carry aud: ["account"] by default. A client scope with an Audience mapper adds penvoice-api as a custom audience and is attached as a Default scope to penvoice-web, so the API’s audience check passes.
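
If the realm is managed declaratively, the audience pieces might be expressed as in the fragment below. This is a partial sketch, not the realm that was actually imported: the scope name penvoice-api-audience and the resource name are hypothetical, and a complete import also has to carry the rest of the realm, including the built-in scopes that penvoice-web relies on.

```yaml
apiVersion: k8s.keycloak.org/v2alpha1
kind: KeycloakRealmImport
metadata:
  name: penvoice-realm                    # hypothetical name
spec:
  keycloakCRName: keycloak-test
  realm:
    realm: penvoice
    enabled: true
    clientScopes:
      - name: penvoice-api-audience       # hypothetical scope name
        protocol: openid-connect
        protocolMappers:
          - name: penvoice-api-audience
            protocol: openid-connect
            protocolMapper: oidc-audience-mapper
            config:
              included.custom.audience: penvoice-api
              access.token.claim: "true"
              id.token.claim: "false"
    clients:
      - clientId: penvoice-web
        publicClient: true
        standardFlowEnabled: true         # Authorization Code flow
        attributes:
          pkce.code.challenge.method: S256
        defaultClientScopes:
          - penvoice-api-audience         # also list the built-in scopes you need
```

With the scope attached as a default, access tokens issued to penvoice-web should include penvoice-api in aud without the SPA requesting anything extra.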

sequenceDiagram
  participant W as penvoice-web (SPA)
  participant K as Keycloak (test-auth.webbies.dev)
  participant A as penvoice-api
  W->>K: Authorization Code + PKCE (client penvoice-web)
  K-->>W: access token (aud: penvoice-api)
  W->>A: GET /v1/me  Authorization: Bearer <token>
  A->>K: fetch JWKS, validate signature + aud + iss
  A-->>W: 200 (or 403 email_verification_required)

The API is a pure OAuth2 resource server: it validates bearer JWTs against the realm penvoice issuer (https://test-auth.webbies.dev/realms/penvoice) and the expected aud: penvoice-api — it never issues tokens. These values are exactly the KC_* keys in the API’s ConfigMap (see the API example).

5. The cutover used the Rollout canary as a safety net

Repointing the API at the new in-cluster Keycloak is a config change, and the API canary doubles as a guardrail for it. The repoint was shipped by bumping the penvoice.app/redeploy: "kc-test-cutover" annotation in rollout.yaml, which rolls out new pods that pick up the updated config. Had the new config been wrong, the new pods would have failed /readyz, the canary would have refused to promote, and the old pods would have kept serving (see Progressive delivery).
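
A hedged sketch of where that annotation plausibly lives. For a change to trigger a new rollout it has to alter the pod template, so the fragment assumes the annotation sits under spec.template.metadata. The Rollout name, container name, and probe port are hypothetical; the annotation key, its value, and the /readyz path are from this page. Selector, image, and canary steps are omitted.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: penvoice-api                      # hypothetical name
spec:
  template:
    metadata:
      annotations:
        penvoice.app/redeploy: "kc-test-cutover"   # changing this value starts a new canary
    spec:
      containers:
        - name: api                       # hypothetical container name
          readinessProbe:
            httpGet:
              path: /readyz
              port: 8080                  # assumption: API port is not stated here
```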