hermes-mcp/docs/runbooks/2026-06-14-outage-fix-log.md
2026-06-14 12:45:44 -04:00

Outage Fix Log — 2026-06-14

This is the step-by-step record of what was changed to restore public access to the SquareMCP / FetcherPay commercial sites.


Environment

  • Host: 104.190.60.129 (MicroK8s + Docker)
  • Edge proxy: Traefik v3 in Docker, binds :80, :443, :8080
  • Hermes MCP: K8s pod with hostNetwork: true on :3456
  • Key files:
    • /home/garfield/traefik-compose.yml
    • /home/garfield/traefik.yml
    • /home/garfield/letsencrypt/manual/tls.yml
    • /home/garfield/Downloads/docker-compose.prod.yml

1. Attach Traefik to the FetcherPay network

File: /home/garfield/traefik-compose.yml

Added the fetcherpay external network so Traefik can reach FetcherPay Docker backends.

services:
  traefik:
    ...
    networks:
      - hermes-net
      - obsidian-net
      - fetcherpay

networks:
  hermes-net:
    external: true
    name: hermes-mcp_hermes-net
  obsidian-net:
    external: true
    name: obsidian_obsidian-net
  fetcherpay:
    external: true
    name: fetcherpay_fetcherpay
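Once Traefik has been recreated with the new network, membership can be confirmed from the host. A minimal sketch, assuming the container is named traefik (as in step 7); it lists every attached network through a docker inspect Go template and looks for an exact name match. on_network is a hypothetical helper name.

```shell
#!/bin/sh
# on_network CONTAINER NETWORK: succeed if CONTAINER is attached to NETWORK.
# The Go template prints one attached network name per line; grep then
# requires an exact, whole-line match (so "fetcherpay" != "fetcherpay_fetcherpay").
on_network() {
  docker inspect -f '{{range $name, $cfg := .NetworkSettings.Networks}}{{println $name}}{{end}}' "$1" \
    | grep -Fxq -- "$2"
}

# Example:
# on_network traefik fetcherpay_fetcherpay && echo "traefik can reach FetcherPay backends"
```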

2. Rebuild the Traefik file-provider routing config

File: /home/garfield/letsencrypt/manual/tls.yml

Final config includes routers and services for:

  • hermes.squaremcp.com
  • app.squaremcp.com
  • docs.squaremcp.com
  • squaremcp.com / www.squaremcp.com
  • tiktok.squaremcp.com
  • fetcherpay.com / www.fetcherpay.com
  • workflow.fetcherpay.com
  • mail.fetcherpay.com
  • git.fetcherpay.com

Path-specific rules that route to Hermes (104.190.60.129:3456):

  • /api/pilot-request on squaremcp.com / www.squaremcp.com
  • /auth/tiktok and /api/pilot-request on tiktok.squaremcp.com

Full final config:

http:
  routers:
    hermes:
      rule: "Host(`hermes.squaremcp.com`)"
      service: hermes
      entryPoints: [websecure]
      tls: { certResolver: letsencrypt }

    squaremcp-app:
      rule: "Host(`app.squaremcp.com`)"
      service: squaremcp-app
      entryPoints: [websecure]
      tls: {}

    squaremcp-docs:
      rule: "Host(`docs.squaremcp.com`)"
      service: squaremcp-docs
      entryPoints: [websecure]
      tls: {}

    squaremcp-site-main:
      rule: "Host(`squaremcp.com`) || Host(`www.squaremcp.com`)"
      service: squaremcp-site
      priority: 10
      entryPoints: [websecure]
      tls: {}

    squaremcp-site-pilot:
      rule: "(Host(`squaremcp.com`) || Host(`www.squaremcp.com`)) && PathPrefix(`/api/pilot-request`)"
      service: hermes
      priority: 30
      entryPoints: [websecure]
      tls: {}

    squaremcp-tiktok-main:
      rule: "Host(`tiktok.squaremcp.com`)"
      service: squaremcp-site
      priority: 10
      entryPoints: [websecure]
      tls: {}

    squaremcp-tiktok-auth:
      rule: "Host(`tiktok.squaremcp.com`) && PathPrefix(`/auth/tiktok`)"
      service: hermes
      priority: 30
      entryPoints: [websecure]
      tls: {}

    squaremcp-tiktok-pilot:
      rule: "Host(`tiktok.squaremcp.com`) && PathPrefix(`/api/pilot-request`)"
      service: hermes
      priority: 30
      entryPoints: [websecure]
      tls: {}

    fetcherpay-root:
      rule: "Host(`fetcherpay.com`) || Host(`www.fetcherpay.com`)"
      service: fetcherpay-web
      priority: 60
      entryPoints: [websecure]
      tls: {}

    workflow:
      rule: "Host(`workflow.fetcherpay.com`)"
      service: temporal-ui
      priority: 60
      entryPoints: [websecure]
      tls: {}

    mail:
      rule: "Host(`mail.fetcherpay.com`)"
      service: poste
      priority: 60
      entryPoints: [websecure]
      tls: {}

    git:
      rule: "Host(`git.fetcherpay.com`)"
      service: gitea
      priority: 60
      entryPoints: [websecure]
      tls: {}

  services:
    hermes:
      loadBalancer:
        servers: [{ url: "http://104.190.60.129:3456" }]
        passHostHeader: true
    squaremcp-app:
      loadBalancer:
        servers: [{ url: "http://10.152.183.164:80" }]
        passHostHeader: true
    squaremcp-docs:
      loadBalancer:
        servers: [{ url: "http://10.152.183.130:80" }]
        passHostHeader: true
    squaremcp-site:
      loadBalancer:
        servers: [{ url: "http://10.152.183.48:80" }]
        passHostHeader: true
    fetcherpay-web:
      loadBalancer:
        servers: [{ url: "http://172.20.0.9:80" }]
        passHostHeader: true
    temporal-ui:
      loadBalancer:
        servers: [{ url: "http://172.20.0.3:8080" }]
        passHostHeader: true
    poste:
      loadBalancer:
        servers: [{ url: "http://poste:80" }]
        passHostHeader: true
    gitea:
      loadBalancer:
        servers: [{ url: "http://gitea:3000" }]
        passHostHeader: true

tls:
  certificates:
    - certFile: /letsencrypt/manual/certs/squaremcp-app.crt
      keyFile:  /letsencrypt/manual/certs/squaremcp-app.key
    - certFile: /letsencrypt/manual/certs/squaremcp-docs.crt
      keyFile:  /letsencrypt/manual/certs/squaremcp-docs.key
    - certFile: /letsencrypt/manual/certs/squaremcp-site.crt
      keyFile:  /letsencrypt/manual/certs/squaremcp-site.key
    - certFile: /letsencrypt/manual/certs/fetcherpay-root.crt
      keyFile:  /letsencrypt/manual/certs/fetcherpay-root.key
    - certFile: /letsencrypt/manual/certs/mail-fetcherpay.crt
      keyFile:  /letsencrypt/manual/certs/mail-fetcherpay.key
    - certFile: /letsencrypt/manual/certs/git-fetcherpay.crt
      keyFile:  /letsencrypt/manual/certs/git-fetcherpay.key
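The priorities decide which router wins when several rules match one request: Traefik picks the matching router with the highest priority (without an explicit priority it falls back to the length of the rule string). The toy model below reproduces only the squaremcp.com routers from tls.yml to show why /api/pilot-request and /auth/tiktok/* reach Hermes while every other path reaches squaremcp-site. It is a hypothetical illustration, not something Traefik runs; ROUTES and route are invented names.

```shell
#!/bin/sh
# Toy model of Traefik router selection for the squaremcp.com hosts in tls.yml.
# Columns: priority, Host() value, PathPrefix() value ("/" = host-only rule), service.
ROUTES='30 squaremcp.com /api/pilot-request hermes
30 www.squaremcp.com /api/pilot-request hermes
30 tiktok.squaremcp.com /auth/tiktok hermes
30 tiktok.squaremcp.com /api/pilot-request hermes
10 squaremcp.com / squaremcp-site
10 www.squaremcp.com / squaremcp-site
10 tiktok.squaremcp.com / squaremcp-site'

# route HOST PATH: print the service of the highest-priority router whose rule matches.
route() {
  printf '%s\n' "$ROUTES" | sort -rn | (
    while read -r prio rhost prefix service; do
      [ "$1" = "$rhost" ] || continue   # Host() compares the whole host name
      case $2 in
        "$prefix"*)                     # PathPrefix() is a plain string-prefix match
          printf '%s\n' "$service"
          exit 0 ;;
      esac
    done
    exit 1                              # no router matched: Traefik would answer 404
  )
}

# Example: route www.squaremcp.com /api/pilot-request   # prints "hermes"
```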

3. Extract static TLS certificates from K8s cert-manager secrets

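Because Traefik's GoDaddy DNS-01 resolver fails with DUPLICATE_RECORD when _acme-challenge.* TXT records already exist, valid certificates were extracted instead from the K8s secrets that cert-manager already held. These files are static copies: when cert-manager renews a secret, the extraction has to be repeated.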

mkdir -p /home/garfield/letsencrypt/manual/certs
cd /home/garfield/letsencrypt/manual/certs

# squaremcp-app
microk8s kubectl get secret squaremcp-app-tls -n fetcherpay -o jsonpath='{.data.tls\.crt}' | base64 -d > squaremcp-app.crt
microk8s kubectl get secret squaremcp-app-tls -n fetcherpay -o jsonpath='{.data.tls\.key}' | base64 -d > squaremcp-app.key

# squaremcp-docs
microk8s kubectl get secret squaremcp-docs-tls -n fetcherpay -o jsonpath='{.data.tls\.crt}' | base64 -d > squaremcp-docs.crt
microk8s kubectl get secret squaremcp-docs-tls -n fetcherpay -o jsonpath='{.data.tls\.key}' | base64 -d > squaremcp-docs.key

# squaremcp-site (covers squaremcp.com / www.squaremcp.com / tiktok.squaremcp.com)
microk8s kubectl get secret squaremcp-tls -n fetcherpay -o jsonpath='{.data.tls\.crt}' | base64 -d > squaremcp-site.crt
microk8s kubectl get secret squaremcp-tls -n fetcherpay -o jsonpath='{.data.tls\.key}' | base64 -d > squaremcp-site.key

# fetcherpay-root
microk8s kubectl get secret fetcherpay-root-tls -n fetcherpay -o jsonpath='{.data.tls\.crt}' | base64 -d > fetcherpay-root.crt
microk8s kubectl get secret fetcherpay-root-tls -n fetcherpay -o jsonpath='{.data.tls\.key}' | base64 -d > fetcherpay-root.key

# mail.fetcherpay.com
microk8s kubectl get secret mail-fetcherpay-tls -n email -o jsonpath='{.data.tls\.crt}' | base64 -d > mail-fetcherpay.crt
microk8s kubectl get secret mail-fetcherpay-tls -n email -o jsonpath='{.data.tls\.key}' | base64 -d > mail-fetcherpay.key

# git.fetcherpay.com
microk8s kubectl get secret fetcherpay-git-tls -n fetcherpay -o jsonpath='{.data.tls\.crt}' | base64 -d > git-fetcherpay.crt
microk8s kubectl get secret fetcherpay-git-tls -n fetcherpay -o jsonpath='{.data.tls\.key}' | base64 -d > git-fetcherpay.key
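The twelve commands above repeat one pattern per secret. A minimal sketch that wraps it in a function, assuming the same secret names and namespaces; extract_tls is a hypothetical helper, its default output directory is the certs directory created above, and umask 077 is an added choice so the private keys are readable only by their owner (which assumes Traefik reads them as root inside its container).

```shell
#!/bin/sh
# extract_tls NAMESPACE SECRET BASENAME [OUTDIR]
# Decodes tls.crt and tls.key from a cert-manager secret into OUTDIR/BASENAME.{crt,key}.
# The body runs in a subshell so the umask change does not leak into the caller.
extract_tls() (
  ns=$1 secret=$2 name=$3
  outdir=${4:-/home/garfield/letsencrypt/manual/certs}
  umask 077                                   # new files: owner read/write only
  mkdir -p "$outdir" || exit 1
  for part in crt key; do
    # Capture first so a failed kubectl call exits before any file is created.
    data=$(microk8s kubectl get secret "$secret" -n "$ns" \
      -o "jsonpath={.data.tls\\.$part}") || exit 1
    [ -n "$data" ] || exit 1
    printf '%s' "$data" | base64 -d > "$outdir/$name.$part" || exit 1
  done
)

# Usage, mirroring the secrets above:
# extract_tls fetcherpay squaremcp-app-tls   squaremcp-app
# extract_tls email      mail-fetcherpay-tls mail-fetcherpay
```

To confirm that a certificate and key belong together, the output of openssl x509 -noout -pubkey -in NAME.crt can be compared with openssl pkey -pubout -in NAME.key.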

4. Start stopped backend containers

FetcherPay web

docker compose -p fetcherpay -f /home/garfield/docker-compose.fetcherpay.yml up -d fetcherpay-web

Poste (mail)

docker compose -p fetcherpay -f /home/garfield/Downloads/docker-compose.prod.yml up -d poste

Postgres + Gitea (git)

Gitea credentials were recovered from the existing Gitea config volume:

docker run --rm -v fetcherpay_gitea_data:/data alpine \
  sh -c 'cat /data/gitea/conf/app.ini | grep -E "^(NAME|USER|PASSWD|HOST|DB_TYPE)"'
# DB_TYPE = postgres
# HOST    = postgres:5432
# NAME    = gitea
# USER    = fetcherpay
# PASSWD  = fetcherpay_secure_2024

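Then postgres and gitea were started with the required env vars. Docker Compose interpolates variables across the whole file, not only the services being started, which is why the variables for the other services in docker-compose.prod.yml are exported as well: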

cd /home/garfield/Downloads
export POSTGRES_USER=fetcherpay
export POSTGRES_PASSWORD=fetcherpay_secure_2024
export POSTGRES_DB=postgres
export GITEA_HOST=git.fetcherpay.com
export GITEA_DB=gitea
export MAIL_HOST=mail.fetcherpay.com
export WEB_HOST=fetcherpay.com
export API_HOST=api.fetcherpay.com
export PROM_HOST=prometheus.fetcherpay.com
export GRAFANA_HOST=grafana.fetcherpay.com
export ADMINER_HOST=adminer.fetcherpay.com
export TEMPORAL_HOST=workflow.fetcherpay.com
export REDIS_PASSWORD=redis_pass
export MYSQL_ROOT_PASSWORD=mysql_root
export MYSQL_DATABASE=fetcherpay
export MYSQL_USER=fetcherpay
export MYSQL_PASSWORD=mysql_pass
export GRAFANA_ADMIN_PASSWORD=admin
export ADMINER_USERS=admin:admin
export TRAEFIK_DASHBOARD_HOST=traefik.fetcherpay.com

docker compose -p fetcherpay -f docker-compose.prod.yml up -d postgres gitea
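An unset variable is interpolated as an empty string (Compose only prints a warning), so a pre-flight check can catch a missing export before the containers start; docker compose -f docker-compose.prod.yml config also renders the file with every value substituted. A minimal sketch; require_env is a hypothetical helper and the variable list in the example is illustrative, not taken from the compose file.

```shell
#!/bin/sh
# require_env NAME...: fail, listing the culprits on stderr, if any named
# variable is unset or empty.
require_env() (
  missing=0
  for name in "$@"; do
    case $name in
      ''|[!A-Za-z_]*|*[!A-Za-z0-9_]*)        # refuse anything that is not a valid name
        echo "invalid variable name: $name" >&2
        exit 2 ;;
    esac
    eval "value=\${$name:-}"                 # indirect lookup without bash-only ${!name}
    if [ -z "$value" ]; then
      echo "missing env var: $name" >&2
      missing=1
    fi
  done
  exit "$missing"
)

# Example (illustrative list):
# require_env POSTGRES_USER POSTGRES_PASSWORD POSTGRES_DB GITEA_HOST GITEA_DB &&
#   docker compose -p fetcherpay -f docker-compose.prod.yml up -d postgres gitea
```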

5. Fix workflow.fetcherpay.com

The Docker label on the temporal service pointed Traefik at port 7233, Temporal's gRPC frontend, instead of the HTTP web UI, so requests returned 502. A file-provider router was added in tls.yml that routes workflow.fetcherpay.com to the temporal-ui service (http://172.20.0.3:8080).


6. Fix Gitea SSH port conflict

The host port 2222 was already in use by an unknown process and could not be freed. The Gitea SSH mapping was changed from 2222:22 to 22222:22.

File: /home/garfield/Downloads/docker-compose.prod.yml

gitea:
  ...
  ports:
    - "22222:22" # SSH (optional for git over SSH)

The gitea container was then recreated with the new mapping.
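Before choosing a replacement port, it can be checked against the host's listening sockets. A minimal sketch that parses ss -ltn output, where the fourth column is the local address:port; port_in_use is a hypothetical helper. Running sudo ss -ltnp 'sport = :2222' additionally shows the owning process, which may help identify the unknown holder of 2222.

```shell
#!/bin/sh
# port_in_use PORT: read `ss -ltn` output on stdin and succeed if PORT has a listener.
# Column 4 is "address:port" (e.g. 0.0.0.0:2222 or [::]:2222); the port is the text
# after the last colon, so IPv4 and IPv6 listeners are both handled.
port_in_use() {
  awk -v p="$1" 'NR > 1 { n = split($4, a, ":"); if (a[n] == p) found = 1 }
                 END { exit !found }'
}

# Example:
# ss -ltn | port_in_use 22222 && echo "22222 is already taken"
```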


7. Restart Traefik after every config change

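docker restart traefik

A restart does not apply changes to traefik-compose.yml itself, such as the network added in step 1; those require recreating the container with docker compose -f /home/garfield/traefik-compose.yml up -d.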
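Traefik needs a moment after a restart to load its providers, so verification is more reliable if it first waits until Traefik answers again. A minimal polling sketch; wait_for is a hypothetical helper, and any HTTP response (even a 404) counts as "up" because curl without --fail exits 0 whenever it receives a response.

```shell
#!/bin/sh
# wait_for URL [ATTEMPTS]: poll URL once per second until curl gets any HTTP response.
wait_for() (
  url=$1 attempts=${2:-30} i=0
  while [ "$i" -lt "$attempts" ]; do
    # Without --fail, curl exits 0 for every HTTP status, so this only tests reachability.
    if curl -s -o /dev/null --max-time 5 "$url"; then
      exit 0
    fi
    i=$((i + 1))
    sleep 1
  done
  exit 1
)

# Example:
# docker restart traefik && wait_for https://squaremcp.com/ && echo "Traefik is serving again"
```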

8. Verification results

Final public reachability check:

https://hermes.squaremcp.com/openapi-living-brief.json       -> 200 (cert=0)
https://app.squaremcp.com/                                   -> 200 (cert=0)
https://docs.squaremcp.com/                                  -> 200 (cert=0)
https://squaremcp.com/                                       -> 200 (cert=0)
https://www.squaremcp.com/                                   -> 200 (cert=0)
https://tiktok.squaremcp.com/                                -> 200 (cert=0)
https://tiktok.squaremcp.com/auth/tiktok/start               -> 302 (cert=0)
https://fetcherpay.com/                                      -> 200 (cert=0)
https://www.fetcherpay.com/                                  -> 200 (cert=0)
https://workflow.fetcherpay.com/                             -> 200 (cert=0)
https://mail.fetcherpay.com/                                 -> 302 (cert=0)
https://git.fetcherpay.com/                                  -> 200 (cert=0)

POST /api/pilot-request (tiktok)                             -> 201
POST /api/pilot-request (root/www)                           -> 201
GET /auth/tiktok/start                                       -> 302

cert=0 means TLS certificate verification passed.
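The log does not record the exact command behind these checks. One way to produce the same "status (cert=N)" lines is curl's --write-out variables %{http_code} and %{ssl_verify_result}, the latter being 0 when certificate verification succeeded. check_url is a hypothetical helper.

```shell
#!/bin/sh
# check_url URL: print "URL -> HTTP_STATUS (cert=VERIFY_RESULT)" in the format of the log.
# %{ssl_verify_result} is 0 when curl verified the server certificate successfully.
check_url() (
  rc=0
  result=$(curl -sS -o /dev/null --max-time 10 \
    -w '%{http_code} %{ssl_verify_result}' "$1") || rc=$?
  if [ "$rc" -ne 0 ]; then
    # e.g. curl exit code 60 when the certificate cannot be verified
    printf '%-60s -> FAILED (curl exit %s)\n' "$1" "$rc"
    exit "$rc"
  fi
  printf '%-60s -> %s (cert=%s)\n' "$1" "${result% *}" "${result#* }"
)

# Example:
# for u in https://squaremcp.com/ https://git.fetcherpay.com/; do check_url "$u"; done
```

curl does not follow redirects unless -L is given, which is consistent with the 302s recorded for mail.fetcherpay.com and /auth/tiktok/start.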


9. Push documentation to Gitea

After committing the runbooks to the local hermes-mcp repo, the push to git.fetcherpay.com failed because:

  1. The Gitea instance was in install mode (INSTALL_LOCK = false in the runtime /etc/gitea/app.ini).
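  2. The configured image gitea/gitea:1.22-rootless only knows database migrations up to version 299, while the existing database was already at version 321. Gitea refuses to run against a database migrated by a newer release, so the container exited on startup once the install lock was enabled.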

Changes made

File: /home/garfield/Downloads/docker-compose.prod.yml

  • Added an environment variable to lock the installation:
    environment:
      ...
      - GITEA__security__INSTALL_LOCK=true
  • Upgraded the Gitea image:
    image: gitea/gitea:1.24.6-rootless
  • The SSH host port mapping had already been changed from 2222:22 to 22222:22 (see step 6).

Commands

# Recreate Gitea with the updated config/image
cd /home/garfield/Downloads
export ... # (same env vars as step 4)
docker compose -p fetcherpay -f docker-compose.prod.yml up -d gitea

# Create the hermes-mcp repository under the existing Gitea admin user
TOKEN=$(docker exec gitea gitea --config /data/gitea/conf/app.ini admin user generate-access-token \
  --username yuukiii --token-name deployment-push --raw --scopes write:user,write:repository,write:admin)
curl -X POST "https://git.fetcherpay.com/api/v1/user/repos" \
  -H "Authorization: token $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name":"hermes-mcp","private":false,"description":"Hermes MCP / SquareMCP monorepo"}'

# Push the runbook commit and update the local remote
cd /home/garfield/hermes-mcp
git push https://yuukiii:${TOKEN}@git.fetcherpay.com/yuukiii/hermes-mcp.git main
git remote set-url origin https://git.fetcherpay.com/yuukiii/hermes-mcp.git

Result: main branch with the runbooks is now live at https://git.fetcherpay.com/yuukiii/hermes-mcp.
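To avoid the startup failure described above when picking an image, the database's migration version can be read before upgrading. A minimal sketch, assuming Gitea keeps it in a single-row table named version (worth confirming with \dt in psql), that the Postgres container is named postgres (hypothetical; check docker ps), and that the gitea database and fetcherpay user from step 4 apply. gitea_db_version and image_too_old are hypothetical helpers, and the newest migration an image supports (299 for 1.22 in this log) has to be known separately.

```shell
#!/bin/sh
# gitea_db_version [CONTAINER]: print the migration version stored in Gitea's database.
# Assumes a Postgres container (default name "postgres", hypothetical) holding the "gitea" DB.
gitea_db_version() {
  docker exec "${1:-postgres}" psql -U fetcherpay -d gitea -tAc 'SELECT version FROM version'
}

# image_too_old IMAGE_MAX DB_VERSION: succeed when the database has been migrated past
# the newest migration the image knows about, i.e. Gitea would refuse to start.
image_too_old() {
  [ "$2" -gt "$1" ]
}

# Example with the numbers from this log:
# image_too_old 299 "$(gitea_db_version)" && echo "pick a newer image than 1.22"
```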


Notes / gotchas

  • /api/pilot-request is POST-only. A GET request returns 404, which is expected.
  • The /auth/tiktok routes are /auth/tiktok/start and /auth/tiktok/callback; the Traefik PathPrefix(/auth/tiktok) rule correctly forwards both.
  • Static certificate extraction required root access; Docker root containers were used when sudo began prompting for a password.
  • The local git remote was updated from the non-existent garfield/hermes-mcp path to yuukiii/hermes-mcp because the only existing Gitea admin user is yuukiii.