Garfield 8d62e4d9d5 feat: multi-tenant credential isolation + architecture docs

- Add src/multitenancy/ with AES-256-GCM credential store, WhatsApp
  webhook router (phone_number_id -> customerId), and per-customer
  audit log (90-day Redis TTL)
- Add src/billing/ with plan definitions and meterMiddleware that
  resolves API key -> Customer object with getCredential() closure
- Refactor all src/clients/* to accept optional customer param,
  falling back to env vars for backward compat with single-user mode
- Thread customer through handleToolCall(name, args, customer?)
- Add customers table to MySQL schema initDatabase()
- Add /webhook/whatsapp (immediate 200 + async routing) and
  /api/connect/* onboarding endpoints to index.ts
- Add Redis 7 to docker-compose.yml; add REDIS_URL and
  CREDENTIAL_ENCRYPTION_KEY to hermes-k8s.yaml
- Add product/incubation/ with architecture write-up and PlantUML
  diagrams (system architecture + 5 user flows)
- Extend OpenAPI spec in manifest.ts with all platform endpoints

Verification: 3 isolation tests (credential, webhook routing, audit
log) passed against live Redis. Deployed to hermes.squaremcp.com.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-05-08 11:27:29 -04:00


hermes-mcp — Architecture

Version: post multi-tenancy (2026-05-08)
Deployed: hermes.squaremcp.com (MicroK8s)

Overview

hermes-mcp is a TypeScript/Node.js MCP gateway that gives AI agents (Claude, ChatGPT, opencode) authenticated access to messaging and productivity platforms — WhatsApp, LinkedIn, Telegram, Discord, Instagram, Twitter, email, and Obsidian.

It was built first as a single-user prototype for the builder, then extended with multi-tenant credential isolation so that multiple paying customers can connect their own platform accounts without one customer's credentials or data being reachable from another's.

Stack

Layer	Technology
Runtime	Node.js 20, TypeScript, ESM
MCP transport	`@modelcontextprotocol/sdk` — Streamable HTTP + SSE
HTTP server	Express 4
Database	MySQL 8 (`mysql2`) — OAuth clients, tokens, customers
Cache / credential store	Redis 7 (`redis` npm, v5)
Deployment	MicroK8s single-node, Traefik/nginx ingress, Let's Encrypt TLS

Directory structure

src/
├── index.ts                   Express server, MCP sessions, REST endpoints, OAuth
├── tools.ts                   Tool registry + handleToolCall(name, args, customer?)
├── db.ts                      MySQL pool init, schema migrations
├── oauth.ts                   OAuth 2.0 server (DCR, authorize, token)
├── imap.ts                    Multi-account IMAP email reader
├── smtp.ts                    Multi-account SMTP email sender
├── manifest.ts                OpenAPI + ChatGPT plugin manifest generation
│
├── clients/                   One file per platform
│   ├── whatsapp.ts            Meta Cloud API
│   ├── linkedin.ts            LinkedIn API v2
│   ├── telegram.ts            Telegram Bot API
│   ├── discord.ts             Discord API v10
│   ├── instagram.ts           Meta Graph API (Instagram Business)
│   ├── twitter.ts             Twitter API v2
│   └── obsidian.ts            Local filesystem vault
│
├── multitenancy/              Added 2026-05-08
│   ├── credential-store.ts    AES-256-GCM encrypted credentials in Redis
│   ├── webhook-router.ts      WhatsApp phone_number_id → customerId routing
│   └── audit-log.ts           Per-customer tool call audit trail (90-day TTL)
│
└── billing/                   Added 2026-05-08
    ├── plans.ts               Plan definitions (free/starter/growth/enterprise)
    └── middleware.ts          Customer resolution + meterMiddleware

Multi-tenancy design

Credential isolation

Each customer's platform tokens are stored encrypted in Redis under a namespaced key:

creds:{customerId}:{platform}

Encryption is AES-256-GCM with a 32-byte key from CREDENTIAL_ENCRYPTION_KEY (env var). IV and auth tag are prepended to the ciphertext as hex. The key must never be rotated without first re-encrypting all stored credentials: anything encrypted under the old key fails authentication under the new one and cannot be decrypted.
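The scheme can be sketched with Node's built-in crypto module. This is a minimal illustration, not the contents of credential-store.ts: the function names, the 12-byte IV, and the exact iv + tag + ciphertext ordering are assumptions, and the key is passed in as a Buffer because the encoding of CREDENTIAL_ENCRYPTION_KEY is not stated here.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

const IV_BYTES = 12;  // assumed: the usual GCM nonce size
const TAG_BYTES = 16; // default GCM auth tag size in Node

// Hypothetical helper: encrypts a credential object and returns
// hex(iv || authTag || ciphertext).
function encryptCredential(plain: object, key: Buffer): string {
  const iv = randomBytes(IV_BYTES); // fresh IV for every write
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const body = Buffer.concat([
    cipher.update(JSON.stringify(plain), "utf8"),
    cipher.final(),
  ]);
  const tag = cipher.getAuthTag(); // only available after final()
  return Buffer.concat([iv, tag, body]).toString("hex");
}

// Hypothetical helper: reverses encryptCredential. Throws if the key is
// wrong or the payload was modified, because the auth tag check fails.
function decryptCredential<T>(payload: string, key: Buffer): T {
  const raw = Buffer.from(payload, "hex");
  const iv = raw.subarray(0, IV_BYTES);
  const tag = raw.subarray(IV_BYTES, IV_BYTES + TAG_BYTES);
  const body = raw.subarray(IV_BYTES + TAG_BYTES);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  const plain = Buffer.concat([decipher.update(body), decipher.final()]);
  return JSON.parse(plain.toString("utf8")) as T;
}
```

Because decryption with a different key throws rather than returning garbage, a rotation that skips re-encryption surfaces as hard failures on every credential read.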

Customer resolution

The meterMiddleware resolves an API key to a Customer object on every request:

1. Check the Redis cache: customer:apikey:{apiKey} (60s TTL)
2. On a miss: SELECT id, plan, active, email FROM customers WHERE api_key = ?
3. Attach the getCredential() closure (not cacheable, because functions can't be JSON serialized)
4. Write the serialisable fields back to the Redis cache

interface Customer {
  id: string;
  plan: PlanKey;
  active: boolean;
  email: string;
  getCredential: <T extends PlatformCredentials>(platform: Platform) => Promise<T | null>;
}

The credential loader is attached at resolution time, capturing id in a closure. Tool handlers call customer.getCredential('whatsapp') — they cannot accidentally use the wrong customer's ID.
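A minimal sketch of how the closure and the cacheable fields can be kept apart. The names hydrateCustomer, toCacheRecord, and CredentialLoader are hypothetical, and the Platform union is trimmed for brevity; the actual middleware may be organised differently.

```typescript
type PlanKey = "free" | "starter" | "growth" | "enterprise";
type Platform = "whatsapp" | "email"; // trimmed for the sketch
type PlatformCredentials = Record<string, string>;

// The fields that can safely round-trip through JSON in the Redis cache.
interface CustomerRecord {
  id: string;
  plan: PlanKey;
  active: boolean;
  email: string;
}

interface Customer extends CustomerRecord {
  getCredential: <T extends PlatformCredentials>(platform: Platform) => Promise<T | null>;
}

// Hypothetical signature for the underlying credential store lookup.
type CredentialLoader = (
  customerId: string,
  platform: Platform,
) => Promise<PlatformCredentials | null>;

// Binds the customer id once. Callers only ever pass a platform, so a
// handler has no way to name a different customer.
function hydrateCustomer(record: CustomerRecord, load: CredentialLoader): Customer {
  const { id } = record;
  return {
    ...record,
    getCredential: async <T extends PlatformCredentials>(platform: Platform) =>
      (await load(id, platform)) as T | null,
  };
}

// Strips the closure so only plain data is written to the cache.
function toCacheRecord(customer: Customer): CustomerRecord {
  const { id, plan, active, email } = customer;
  return { id, plan, active, email };
}
```

In this sketch a cache hit would also go through hydrateCustomer, since the cached JSON carries no function.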

Backward compatibility

All platform clients have customer as an optional second parameter. When absent (single-user mode via MCP_API_KEY), they fall back to env vars — the builder's existing setup is unchanged.

export async function sendMessage(args, customer?: Customer) {
  if (customer) {
    const creds = await customer.getCredential<WhatsAppCredentials>('whatsapp');
    if (!creds) throw new Error('WhatsApp not connected for this account');
    // use creds.phoneNumberId, creds.accessToken
  } else {
    // read WHATSAPP_DEFAULT_PHONE_NUMBER_ID etc from process.env
  }
}

WhatsApp webhook routing

Meta sends all inbound messages for all connected numbers to one webhook endpoint. The router uses a Redis lookup table populated at onboarding:

wa_phone_id:{phoneNumberId} → customerId

The webhook endpoint acknowledges immediately (within Meta's 20-second SLA) and routes asynchronously:

app.post('/webhook/whatsapp', express.json(), async (req, res) => {
  res.status(200).send('EVENT_RECEIVED');   // sync — never blocked by routing
  try {
    const events = await routeWhatsAppWebhook(req.body);
    for (const event of events) await handleInboundWhatsAppMessage(event);
  } catch (err) { console.error(err); }
});
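The routing step can be sketched as two small functions: one that pulls (phone_number_id, message) pairs out of the webhook body, and one that maps each pair to a customer. This is an illustration under assumptions, not the code in webhook-router.ts: the type and function names are hypothetical, the payload type covers only the fields used here, and the synchronous lookup callback stands in for an awaited Redis GET on wa_phone_id:{phoneNumberId}.

```typescript
// Subset of the WhatsApp Cloud API webhook payload used by this sketch.
interface WebhookBody {
  entry?: Array<{
    changes?: Array<{
      value?: {
        metadata?: { phone_number_id?: string };
        messages?: Array<{ id: string; from: string }>;
      };
    }>;
  }>;
}

interface InboundEvent {
  phoneNumberId: string; // the receiving business number
  from: string;          // the sender
  messageId: string;
}

// Redis key for the lookup table populated at onboarding.
const waPhoneKey = (phoneNumberId: string): string => `wa_phone_id:${phoneNumberId}`;

// Flattens the nested payload. Changes without messages (for example
// delivery status updates) produce no events.
function extractInboundEvents(body: WebhookBody): InboundEvent[] {
  const events: InboundEvent[] = [];
  for (const entry of body.entry ?? []) {
    for (const change of entry.changes ?? []) {
      const phoneNumberId = change.value?.metadata?.phone_number_id;
      if (!phoneNumberId) continue;
      for (const msg of change.value?.messages ?? []) {
        events.push({ phoneNumberId, from: msg.from, messageId: msg.id });
      }
    }
  }
  return events;
}

// Pairs each event with its customer and drops events for numbers that
// were never registered, so they cannot fall through to another tenant.
function routeEvents(
  events: InboundEvent[],
  lookup: (key: string) => string | undefined, // stand-in for Redis GET
): Array<{ customerId: string; event: InboundEvent }> {
  const routed: Array<{ customerId: string; event: InboundEvent }> = [];
  for (const event of events) {
    const customerId = lookup(waPhoneKey(event.phoneNumberId));
    if (customerId) routed.push({ customerId, event });
  }
  return routed;
}
```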

Audit log

Every tool call (when a Customer is present) is logged to Redis:

log_seq:{customerId}:{date}          INCR counter
logs:{customerId}:{date}:{seq}       JSON entry, EX 7776000 (90 days)

The sequence key ensures chronological ordering without ULIDs. No cross-customer query path exists — all retrieval functions require customerId as the first argument.
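A rough sketch of the write path for the two keys above. The AuditRedis interface is a hypothetical narrow view of the three Redis commands involved (named after node-redis's incr, expire, and setEx), the AuditEntry fields are placeholders, and the YYYY-MM-DD UTC date format is an assumption. The 95-day counter TTL follows the key summary table.

```typescript
const LOG_TTL_SECONDS = 90 * 24 * 60 * 60; // 7776000, as in the key layout above
const SEQ_TTL_SECONDS = 95 * 24 * 60 * 60; // counter outlives its entries

// Hypothetical minimal client surface, so the sketch runs without Redis.
interface AuditRedis {
  incr(key: string): Promise<number>;
  expire(key: string, seconds: number): Promise<unknown>;
  setEx(key: string, seconds: number, value: string): Promise<unknown>;
}

// Placeholder shape; the real AuditEntry is not specified here.
interface AuditEntry {
  tool: string;
  ok: boolean;
  at: string;
}

// Assumed date format for the {date} key segment: YYYY-MM-DD in UTC.
const utcDate = (d: Date): string => d.toISOString().slice(0, 10);

// Appends one entry for a customer and returns the key it was stored under.
// customerId is a required argument, mirroring the retrieval functions.
async function writeAuditEntry(
  redis: AuditRedis,
  customerId: string,
  entry: AuditEntry,
  now: Date = new Date(),
): Promise<string> {
  const date = utcDate(now);
  const seqKey = `log_seq:${customerId}:${date}`;
  const seq = await redis.incr(seqKey); // atomic, so concurrent writers get distinct numbers
  if (seq === 1) await redis.expire(seqKey, SEQ_TTL_SECONDS); // first write of the day
  const logKey = `logs:${customerId}:${date}:${seq}`;
  await redis.setEx(logKey, LOG_TTL_SECONDS, JSON.stringify(entry));
  return logKey;
}
```

Reading a day back in order then amounts to reading the counter and fetching sequence numbers 1 through N for that customer and date.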

Redis key namespace summary

Key	Value	TTL
`creds:{customerId}:{platform}`	AES-256-GCM encrypted JSON	none (permanent until revoked)
`wa_phone_id:{phoneNumberId}`	customerId string	none
`customer:apikey:{apiKey}`	JSON (id, plan, active, email)	60s
`log_seq:{customerId}:{date}`	integer counter	95 days
`logs:{customerId}:{date}:{seq}`	JSON AuditEntry	90 days

Request paths

MCP tool calls (existing single-user)

Claude.ai → POST /mcp → requireAuth(MCP_API_KEY) → handleToolCall(name, args)
                                                     → client(args, undefined)
                                                     → env vars → Platform API

Multi-tenant REST tool calls

Customer → POST /api/whatsapp/send → requireAuth → handleToolCall(name, args)
                                                    → client(args, undefined)
                                                    → env vars

(REST endpoints do not yet thread customer into handleToolCall, so these calls still run on the env-var credentials; future work.)

Multi-tenant onboarding

Customer → POST /api/connect/whatsapp → meterMiddleware → storeCredential()
                                                         → registerWhatsAppNumber()

Inbound WhatsApp webhook

Meta → POST /webhook/whatsapp → 200 immediately
                              → routeWhatsAppWebhook(body)
                              → Redis lookup phone_number_id → customerId
                              → getCredential(customerId, 'whatsapp')
                              → handleInboundWhatsAppMessage(event)

PlantUML diagrams

architecture-system.puml — component and dependency diagram
architecture-userflows.puml — sequence diagrams for all 5 flows

Render with plantuml.com, the PlantUML VS Code extension, or plantuml -tsvg architecture-system.puml.

What's not yet done

Item	Notes
Customer provisioning	`customers` table exists but needs an INSERT path (Stripe webhook → seed row)
MCP session → Customer	MCP calls don't resolve customers; sessions still use single-user env vars
Email multi-tenancy	`imap.ts` / `smtp.ts` use Account enum; `customer.getCredential('email')` not wired
Usage metering	`meter.ts` not implemented; plan limits not enforced
Obsidian per-customer vault	Currently one global vault path from env
Key rotation tooling	Script to re-encrypt all `creds:*` keys under a new `CREDENTIAL_ENCRYPTION_KEY`
