# 2026-06-14 Public Edge Outage — Vault Index
All documentation for the outage, its root cause, the fix, and the follow-up plan lives in this SquareMCP vault folder.
## Files
| File | Purpose |
|---|---|
| `2026-06-14-public-edge-outage-rca.md` | Root cause analysis and incident timeline. |
| `2026-06-14-outage-fix-log.md` | Step-by-step record of every config change, command, and verification result. |
| `2026-06-14-infrastructure-findings.md` | As-built architecture, Traefik/K8s behavior, Hermes route table, and monitoring gaps. |
| `2026-06-14-active-issues-and-debt.md` | What is still down, remaining technical debt, and recommended next steps. |
| `2026-06-14-public-edge-outage-plan.md` | Proposed runbook, monitoring, probes, and alerting plan (Phases 1–4). |
| `2026-06-14-outage-index.md` | This file. |
## Quick status
- ✅ All listed `squaremcp.com` domains reachable with valid TLS.
- ✅ All listed `fetcherpay.com` domains reachable with valid TLS.
- ✅ Hermes path routes (`/api/pilot-request`, `/auth/tiktok`) verified.
- ⚠️ K8s nginx-ingress remains bypassed by Traefik.
- ⚠️ Several FetcherPay services still stopped (`api`, Prometheus, Grafana, Adminer).
- ⚠️ No automated monitoring or alerting yet.
## Reference paths on disk
- Traefik compose: `/home/garfield/traefik-compose.yml`
- Traefik static config: `/home/garfield/traefik.yml`
- Traefik dynamic config: `/home/garfield/letsencrypt/manual/tls.yml`
- Static certs: `/home/garfield/letsencrypt/manual/certs/`
- FetcherPay prod compose: `/home/garfield/Downloads/docker-compose.prod.yml`
- Hermes K8s manifest: `/home/garfield/hermes-mcp/hermes-k8s.yaml`