Active Issues and Remaining Debt — 2026-06-14
What is working now
All commercial domains verified reachable with valid TLS:
- `hermes.squaremcp.com/openapi-living-brief.json`
- `app.squaremcp.com`
- `docs.squaremcp.com`
- `squaremcp.com` / `www.squaremcp.com`
- `tiktok.squaremcp.com`
- `fetcherpay.com` / `www.fetcherpay.com`
- `workflow.fetcherpay.com`
- `mail.fetcherpay.com`
- `git.fetcherpay.com`
Hermes path-specific routes verified:
- `POST /api/pilot-request` → 201 on `squaremcp.com`, `www.squaremcp.com`, `tiktok.squaremcp.com`
- `GET /auth/tiktok/start` → 302 on `tiktok.squaremcp.com`
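
The route checks above could be automated along the following lines. This is a minimal Python sketch, not the contents of `scripts/verify-public-endpoints.sh`. The `https://` scheme follows from the TLS note above, but the empty JSON body sent with `POST` is a placeholder assumption (the real pilot-request payload is not given here, and a real POST may create records). The names `fetch_status` and `find_mismatches` are hypothetical.

```python
import urllib.error
import urllib.request

# Expected status codes, copied from the verified routes listed above.
EXPECTED = [
    ("POST", "https://squaremcp.com/api/pilot-request", 201),
    ("POST", "https://www.squaremcp.com/api/pilot-request", 201),
    ("POST", "https://tiktok.squaremcp.com/api/pilot-request", 201),
    ("GET", "https://tiktok.squaremcp.com/auth/tiktok/start", 302),
]


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so a 302 stays visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 3xx code


def fetch_status(method, url, timeout=10.0):
    """Return the HTTP status code for one request."""
    opener = urllib.request.build_opener(_NoRedirect)
    # Assumption: placeholder body; the real payload is not documented here.
    data = b"{}" if method == "POST" else None
    req = urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    try:
        with opener.open(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 3xx/4xx/5xx responses arrive as HTTPError


def find_mismatches(expected, fetch=fetch_status):
    """Return (method, url, want, got) for every check that did not match."""
    failures = []
    for method, url, want in expected:
        try:
            got = fetch(method, url)
        except OSError:  # DNS, connection, TLS, and timeout errors
            got = None
        if got != want:
            failures.append((method, url, want, got))
    return failures
```

Calling `find_mismatches(EXPECTED)` performs the live requests; an empty list means every route returned its expected code. Passing a stub as `fetch` keeps the comparison logic testable without network access.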
Still down / not addressed
| Subdomain / Service | Why it is down | What would fix it |
|---|---|---|
| `api.fetcherpay.com` | `fetcherpay-api` container not running | Start `fetcherpay-api` (needs env vars, Postgres, Redis) |
| `prometheus.fetcherpay.com` | Prometheus container not running | Start Prometheus from `docker-compose.fetcherpay.yml` |
| `grafana.fetcherpay.com` | Grafana container not running | Start Grafana from `docker-compose.fetcherpay.yml` |
| `adminer.fetcherpay.com` | Adminer container not running | Start Adminer from `docker-compose.fetcherpay.yml` |
| `traefik.fetcherpay.com` | Traefik dashboard is on `:8080` but not routed through a public host label | Add a secure router or restrict dashboard to localhost/VPN |
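
Four of the five rows above come down to a container that is not running, so a container-up check might look roughly like this. It is a sketch under assumptions: only `fetcherpay-api` is named in the table, so the other three container names are guesses that would need to match `docker-compose.fetcherpay.yml` (Compose often generates names such as `project-service-1`). The function names are hypothetical.

```python
import subprocess

# Only "fetcherpay-api" appears in the table above. The other three names
# are assumptions and must be adjusted to the real container names.
REQUIRED = ["fetcherpay-api", "prometheus", "grafana", "adminer"]


def missing_containers(required, ps_output):
    """Return the required names that are absent from `docker ps` output.

    `ps_output` is expected to hold one container name per line, as
    produced by `docker ps --format '{{.Names}}'`.
    """
    running = {line.strip() for line in ps_output.splitlines() if line.strip()}
    return [name for name in required if name not in running]


def running_container_names():
    """Ask the local Docker daemon for the names of running containers."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

On the host, `missing_containers(REQUIRED, running_container_names())` returns the services that still need to be started. Matching is exact, so a prefix or label-based match may be a better fit once the real names are known.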
Architectural debt
- K8s nginx-ingress is bypassed
  - Traefik’s Docker iptables rules intercept all public HTTP/S traffic.
  - The active nginx-ingress controller class is `public`; manifests use `nginx`.
  - Long term: either reconcile `ingressClassName` or migrate the public edge to K8s.
- Manual static certificate workaround
  - Traefik cannot issue new certs via GoDaddy DNS-01 for several domains because of `DUPLICATE_RECORD` TXT errors.
  - Certs are extracted from K8s cert-manager secrets and loaded statically.
  - These must be manually rotated before expiry.
- No observability
  - No synthetic uptime probes.
  - No cert-expiry alerting.
  - No Hermes `/metrics` endpoint.
  - No Alertmanager / Slack alerts.
  - No centralized logs.
- Secret management
  - Plaintext secrets in `hermes-k8s.yaml` and compose env vars.
  - No Sealed Secrets / External Secrets / Vault.
- Single point of failure
  - One host, one residential IP, one edge proxy.
  - No redundancy or failover.
- Gitea SSH port
  - Changed from `2222` to `22222` due to an unknown process binding `2222`.
  - The original occupant of port `2222` was never identified; a reboot would be needed to clear it.
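
Because the statically loaded certificates must be rotated by hand and nothing currently alerts on expiry, a small expiry check is one way to reduce that risk until proper alerting exists. The sketch below uses only the Python standard library. The 21-day threshold and the function names are assumptions, not values taken from the outage plan.

```python
import socket
import ssl
import time

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after, now=None):
    """Convert a certificate notAfter string into days remaining.

    `not_after` uses the format returned by getpeercert(), for example
    "Jan  5 09:34:43 2018 GMT". `now` is a Unix timestamp for testing.
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / SECONDS_PER_DAY


def fetch_not_after(host, port=443, timeout=10.0):
    """Read the notAfter field of the certificate served by host:port.

    The default context verifies the chain, so an already expired or
    otherwise invalid certificate raises ssl.SSLError instead.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def expiring_soon(hosts, threshold_days=21, fetch=fetch_not_after, now=None):
    """Return (host, days_left) for certs expiring within the threshold.

    threshold_days=21 is an assumed default, not a documented policy.
    """
    flagged = []
    for host in hosts:
        days_left = days_until_expiry(fetch(host), now=now)
        if days_left <= threshold_days:
            flagged.append((host, days_left))
    return flagged
```

For example, `expiring_soon(["git.fetcherpay.com", "app.squaremcp.com"])` would connect to each host and list those that need rotation soon. Connection and TLS errors are left to propagate so that a failing host is not silently treated as healthy.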
Recommended next steps
See 2026-06-14-public-edge-outage-plan.md for the full phased plan. Priorities:
- Immediate: finalize RCA, runbook, and `scripts/verify-public-endpoints.sh`.
- This week: deploy blackbox exporter + cert-expiry alerts + container-up check.
- Next sprint: add Hermes `/metrics`, Grafana dashboards, Alertmanager Slack routing.
- Future: decide on K8s edge migration vs. reconciling ingress classes.