# Infrastructure Findings — SquareMCP / FetcherPay

This document captures the as-built architecture, ingress behavior, monitoring state, and Hermes route table discovered during the 2026-06-14 outage response.

---

## 1. High-level architecture

The single production server (`104.190.60.129`) hosts two separate ingress layers:

| Ingress Layer | Technology | Serves |
|---|---|---|
| **Docker edge proxy** | Traefik v3 | `*.fetcherpay.com` Docker Compose stacks, plus static file-provider routes for `*.squaremcp.com` |
| **Kubernetes ingress** | nginx-ingress-microk8s + cert-manager | `*.squaremcp.com` K8s workloads (currently bypassed by Traefik) |

Both layers use Let’s Encrypt TLS. Public ports `80`/`443` are bound by the Docker Traefik container, so its `iptables` rules win over host-network K8s services.

---

## 2. Traefik configuration

### Static config

**File:** `/home/garfield/traefik.yml`

- Dashboard enabled on `:8080` with `insecure: true`.
- Entrypoints: `web` (HTTP → HTTPS redirect) and `websecure` (HTTPS, `:443`).
- Providers: Docker (socket) + file provider (`/letsencrypt/manual/tls.yml`, `watch: true`).
- Certificate resolver: `letsencrypt` via GoDaddy DNS-01.

### Compose

**File:** `/home/garfield/traefik-compose.yml`

- Networks: `hermes-net`, `obsidian-net`, `fetcherpay` (all external).
- Volumes: Docker socket, static config, `letsencrypt` directory.

### Dynamic routing

**File:** `/home/garfield/letsencrypt/manual/tls.yml`

Final state after the fix has file-provider routers for all commercial domains and path-specific rules that send `/api/pilot-request` and `/auth/tiktok` to Hermes.

---

## 3. Kubernetes ingress mismatch

- **Controller class:** `public`
- **Ingress class used by manifests:** `nginx`

This means the active controller ignores most Ingress resources. Even if Traefik were removed, those Ingresses would not be served until the class is reconciled.

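The class mismatch described above can be detected mechanically. The following is a minimal sketch, not part of the existing tooling: it assumes the Ingress manifests have already been parsed into plain objects, and it reads the standard `spec.ingressClassName` field with a fallback to the legacy `kubernetes.io/ingress.class` annotation. The helper names (`requestedClass`, `findClassMismatches`) and the sample resource names are hypothetical.

```typescript
// Minimal shape of a Kubernetes Ingress, limited to the fields that select a controller.
interface IngressLike {
  metadata: { name: string; annotations?: Record<string, string> };
  spec?: { ingressClassName?: string };
}

// Legacy annotation that predates spec.ingressClassName.
const LEGACY_CLASS_ANNOTATION = "kubernetes.io/ingress.class";

// Returns the class an Ingress asks for, or undefined if it names none.
function requestedClass(ing: IngressLike): string | undefined {
  return (
    ing.spec?.ingressClassName ??
    ing.metadata.annotations?.[LEGACY_CLASS_ANNOTATION]
  );
}

// Splits Ingresses into those that name a different class than the controller
// and those that name no class at all. Unclassed Ingresses are reported
// separately because whether they are served depends on the cluster's
// default-class setup, which this document does not describe.
function findClassMismatches(
  ingresses: IngressLike[],
  controllerClass: string,
): { mismatched: string[]; unclassed: string[] } {
  const mismatched: string[] = [];
  const unclassed: string[] = [];
  for (const ing of ingresses) {
    const cls = requestedClass(ing);
    if (cls === undefined) {
      unclassed.push(ing.metadata.name);
    } else if (cls !== controllerClass) {
      mismatched.push(`${ing.metadata.name} (class: ${cls})`);
    }
  }
  return { mismatched, unclassed };
}

// Hypothetical sample data; real input would come from the parsed manifests.
const sampleIngresses: IngressLike[] = [
  { metadata: { name: "example-app" }, spec: { ingressClassName: "nginx" } },
  { metadata: { name: "example-docs" }, spec: { ingressClassName: "public" } },
];

console.log(findClassMismatches(sampleIngresses, "public"));
```

Run against the real manifests with the controller class `public`, any entry under `mismatched` is an Ingress the active controller would skip.
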
Affected manifests include:

- `hermes-mcp/hermes-k8s.yaml`
- `hermes-mcp/product/app/app-k8s.yaml`
- `hermes-mcp/docs/docs-k8s.yaml`
- `hermes-mcp/product/site/squaremcp-k8s-ingress.yaml`

---

## 4. Hermes MCP route table

**File:** `hermes-mcp/src/index.ts`

### Public / commercial endpoints

| Method | Path | Notes |
|---|---|---|
| `GET` | `/` | Static files from `../product` |
| `GET` | `/openapi-living-brief.json` | Obsidian-only OpenAPI spec for ChatGPT |
| `GET` | `/openapi.json` | Full OpenAPI spec |
| `GET` | `/auth/tiktok/start` | Redirect to TikTok Login Kit |
| `GET` | `/auth/tiktok/callback` | TikTok OAuth callback |
| `POST` | `/api/pilot-request` | Public form submission; origin-gated |
| `GET` | `/health` | Liveness/readiness probe |

### OAuth / MCP discovery

| Method | Path |
|---|---|
| `POST` | `/oauth/register` |
| `GET` / `POST` | `/oauth/authorize` |
| `POST` | `/oauth/token` |
| `GET` | `/.well-known/oauth-authorization-server` |
| `GET` | `/.well-known/openid-configuration` |
| `GET` / `POST` / `DELETE` | `/mcp` |
| `GET` | `/sse` |
| `POST` | `/messages` |
| `GET` | `/tools` |

### Capability-guarded tool API

All `/api/*` tool routes require auth + capability grant:

| Capability | Example endpoints |
|---|---|
| `obsidian` | `/api/obsidian/search`, `/api/obsidian/note`, `/api/obsidian/note/append`, `/api/obsidian/sync` |
| `email` | `/api/email/profile`, `/api/email/search`, `/api/email/read`, `/api/email/send` |
| `whatsapp` | `/api/whatsapp/send`, `/api/whatsapp/templates` |
| `linkedin` | `/api/linkedin/profile`, `/api/linkedin/post`, `/api/linkedin/message` |
| `telegram` | `/api/telegram/me`, `/api/telegram/message`, `/api/telegram/updates` |
| `discord` | `/api/discord/me`, `/api/discord/guilds`, `/api/discord/message` |
| `instagram` | `/api/instagram/profile`, `/api/instagram/media`, `/api/instagram/post` |
| `twitter` | `/api/twitter/search`, `/api/twitter/tweets`, `/api/twitter/tweet` |
| `facebook` | `/api/facebook/page`, `/api/facebook/posts`, `/api/facebook/post` |
| `tiktok` | `/api/tiktok/profile`, `/api/tiktok/video`, `/api/tiktok/video/status` |

### Health endpoint

```typescript
app.get('/health', (_req, res) => {
  res.json({
    status: 'ok',
    service: 'hermes-mcp',
    toolCount,
    transports,
    endpoints,
  });
});
```

Used by both K8s readiness and liveness probes in `hermes-k8s.yaml`.

---

## 5. Monitoring gaps

### Prometheus / Grafana

- Prometheus and Grafana containers exist in `docker-compose.fetcherpay.yml`.
- Prometheus scrapes itself, `fetcherpay-api:3000`, and Docker metrics at `172.20.0.1:9323`.
- **Hermes MCP is not scraped** and has no `/metrics` endpoint.
- No Alertmanager, no alert rules.

### Health checks

- Hermes has `/health` but no `/ready` or `/livez` separation.
- Docker health checks exist for Postgres, MySQL, Redis, Gitea, and FetcherPay API, but **not for Hermes**.

### Uptime / synthetic probes

- No blackbox exporter.
- No external uptime monitoring (Pingdom, UptimeRobot, Grafana Cloud, etc.).
- No cert-expiry alerting.
- No K8s ingress reconciliation check.

### Logs

- No centralized log aggregation (Loki, Vector, Fluentd).

---

## 6. Secret management

- `hermes-k8s.yaml` is gitignored and contains plaintext secrets (email, DB, OAuth, API keys).
- Docker Compose stacks rely on exported env vars or `.env` files.
- No Sealed Secrets, External Secrets Operator, or Vault in use.

---

## 7. Notable risks

1. **Single point of failure:** one residential IP, one host, one edge proxy.
2. **Split edge:** two ingress controllers with conflicting class configuration.
3. **Manual certificate workaround:** static K8s-extracted certs in Traefik must be manually rotated before expiry.
4. **No observability:** no metrics, alerting, or synthetic probes for the commercial domains.
5. **Stopped services not detected:** Docker restart policies only help if containers were initially started.
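
Risk 3 and the missing cert-expiry alerting both come down to one decision: given a certificate's `notAfter` date, how urgent is rotation? The sketch below shows only that decision logic. It is an illustration under stated assumptions: the 21-day and 7-day thresholds, the function names, and the sample dates are hypothetical, and fetching the certificate from the live endpoint is deliberately left out.

```typescript
type Severity = "ok" | "warning" | "critical" | "expired";

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Whole days remaining until the certificate's notAfter timestamp.
// The result is negative once the certificate has expired.
function daysUntilExpiry(notAfter: Date, now: Date): number {
  return Math.floor((notAfter.getTime() - now.getTime()) / MS_PER_DAY);
}

// Maps the remaining lifetime to a severity level.
// warnDays and criticalDays are illustrative defaults, not values from this report.
function classifyExpiry(
  notAfter: Date,
  now: Date,
  warnDays = 21,
  criticalDays = 7,
): Severity {
  const days = daysUntilExpiry(notAfter, now);
  if (days < 0) return "expired";
  if (days <= criticalDays) return "critical";
  if (days <= warnDays) return "warning";
  return "ok";
}

// Hypothetical dates: a certificate expiring 17 days after the check runs.
const checkTime = new Date("2026-06-14T00:00:00Z");
const sampleNotAfter = new Date("2026-07-01T00:00:00Z");
console.log(classifyExpiry(sampleNotAfter, checkTime)); // prints "warning"
```

A scheduled job could feed this function the `notAfter` value of each manually installed certificate and raise a notification on anything other than `ok`.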