feat: multi-tenant credential isolation + architecture docs

- Add src/multitenancy/ with AES-256-GCM credential store, WhatsApp webhook
  router (phone_number_id -> customerId), and per-customer audit log
  (90-day Redis TTL)
- Add src/billing/ with plan definitions and meterMiddleware that resolves
  API key -> Customer object with getCredential() closure
- Refactor all src/clients/* to accept optional customer param, falling
  back to env vars for backward compat with single-user mode
- Thread customer through handleToolCall(name, args, customer?)
- Add customers table to MySQL schema initDatabase()
- Add /webhook/whatsapp (immediate 200 + async routing) and /api/connect/*
  onboarding endpoints to index.ts
- Add Redis 7 to docker-compose.yml; add REDIS_URL and
  CREDENTIAL_ENCRYPTION_KEY to hermes-k8s.yaml
- Add product/incubation/ with architecture write-up and PlantUML diagrams
  (system architecture + 5 user flows)
- Extend OpenAPI spec in manifest.ts with all platform endpoints

Verification: 3 isolation tests (credential, webhook routing, audit log)
passed against live Redis. Deployed to hermes.squaremcp.com.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

208
product/incubation/ARCHITECTURE.md
Normal file
@@ -0,0 +1,208 @@
# hermes-mcp — Architecture

**Version:** post multi-tenancy (2026-05-08)
**Deployed:** hermes.squaremcp.com (MicroK8s)

---

## Overview

hermes-mcp is a TypeScript/Node.js MCP gateway that gives AI agents (Claude, ChatGPT, opencode) authenticated access to messaging and productivity platforms: WhatsApp, LinkedIn, Telegram, Discord, Instagram, Twitter, email, and Obsidian.

It began as a single-user prototype for the builder and was then extended with multi-tenant credential isolation, so that multiple paying customers can each connect their own platform accounts without any customer's credentials or data being exposed to another.

---

## Stack

| Layer | Technology |
|-------|------------|
| Runtime | Node.js 20, TypeScript, ESM |
| MCP transport | `@modelcontextprotocol/sdk` (Streamable HTTP + SSE) |
| HTTP server | Express 4 |
| Database | MySQL 8 (`mysql2`): OAuth clients, tokens, customers |
| Cache / credential store | Redis 7 (`redis` npm, v5) |
| Deployment | MicroK8s single-node, Traefik/nginx ingress, Let's Encrypt TLS |

---

## Directory structure

```
src/
├── index.ts                  Express server, MCP sessions, REST endpoints, OAuth
├── tools.ts                  Tool registry + handleToolCall(name, args, customer?)
├── db.ts                     MySQL pool init, schema migrations
├── oauth.ts                  OAuth 2.0 server (DCR, authorize, token)
├── imap.ts                   Multi-account IMAP email reader
├── smtp.ts                   Multi-account SMTP email sender
├── manifest.ts               OpenAPI + ChatGPT plugin manifest generation
│
├── clients/                  One file per platform
│   ├── whatsapp.ts           Meta Cloud API
│   ├── linkedin.ts           LinkedIn API v2
│   ├── telegram.ts           Telegram Bot API
│   ├── discord.ts            Discord API v10
│   ├── instagram.ts          Meta Graph API (Instagram Business)
│   ├── twitter.ts            Twitter API v2
│   └── obsidian.ts           Local filesystem vault
│
├── multitenancy/             Added 2026-05-08
│   ├── credential-store.ts   AES-256-GCM encrypted credentials in Redis
│   ├── webhook-router.ts     WhatsApp phone_number_id → customerId routing
│   └── audit-log.ts          Per-customer tool call audit trail (90-day TTL)
│
└── billing/                  Added 2026-05-08
    ├── plans.ts              Plan definitions (free/starter/growth/enterprise)
    └── middleware.ts         Customer resolution + meterMiddleware
```

---

## Multi-tenancy design

### Credential isolation

Each customer's platform tokens are stored encrypted in Redis under a namespaced key:

```
creds:{customerId}:{platform}
```

Encryption is AES-256-GCM with a 32-byte key read from the `CREDENTIAL_ENCRYPTION_KEY` environment variable. The IV and auth tag are prepended to the ciphertext as hex. **The key must never be rotated without first re-encrypting all stored credentials**, because anything encrypted under the old key cannot be decrypted with a new one.

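The scheme above can be sketched with Node's built-in `node:crypto` module. This is a minimal illustration, not the contents of `credential-store.ts`: the function names, the 12-byte IV length, and the exact `iv | tag | ciphertext` ordering are assumptions, since the text only states that the IV and auth tag are prepended as hex.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Assumed sizes: 12-byte IV (the usual GCM choice) and 16-byte auth tag.
const IV_BYTES = 12;
const TAG_BYTES = 16;

// Encrypts a JSON string and returns hex(iv | tag | ciphertext).
// The layout order is an assumption made for this sketch.
function encryptCredential(plaintext: string, key: Buffer): string {
  const iv = randomBytes(IV_BYTES); // fresh IV for every value
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag();
  return Buffer.concat([iv, tag, ciphertext]).toString("hex");
}

// Reverses encryptCredential. Throws if the key is wrong or the payload
// was tampered with, because GCM verifies the auth tag in final().
function decryptCredential(payloadHex: string, key: Buffer): string {
  const payload = Buffer.from(payloadHex, "hex");
  const iv = payload.subarray(0, IV_BYTES);
  const tag = payload.subarray(IV_BYTES, IV_BYTES + TAG_BYTES);
  const ciphertext = payload.subarray(IV_BYTES + TAG_BYTES);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

Decrypting with a different key fails the tag check instead of returning garbage, which is why rotating the key without re-encrypting makes every stored credential unreadable.
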

### Customer resolution

The `meterMiddleware` resolves an API key to a `Customer` object on every request:

1. Check the Redis cache: `customer:apikey:{apiKey}` (60s TTL)
2. On a miss: `SELECT id, plan, active, email FROM customers WHERE api_key = ?`
3. Attach the `getCredential()` closure (not cacheable, because functions can't be JSON-serialized)
4. Write the serializable fields back to the Redis cache

```typescript
interface Customer {
  id: string;
  plan: PlanKey;
  active: boolean;
  email: string;
  getCredential: <T extends PlatformCredentials>(platform: Platform) => Promise<T | null>;
}
```

The credential loader is attached at resolution time and captures `id` in a closure. Tool handlers call `customer.getCredential('whatsapp')`, so they cannot accidentally pass the wrong customer's ID.

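The resolution steps and the closure can be sketched as follows. Everything here is a hypothetical stand-in: `resolveCustomer`, `Deps`, and the simplified types are illustration only, a `Map` replaces Redis (so the 60s TTL is not modeled), and `findByApiKey` replaces the MySQL query.

```typescript
type Platform = "whatsapp" | "telegram";
type PlatformCredentials = Record<string, string>;

// The serializable part of a customer, i.e. what may be cached.
interface CustomerRow { id: string; plan: string; active: boolean; email: string; }

interface Customer extends CustomerRow {
  getCredential: <T extends PlatformCredentials>(platform: Platform) => Promise<T | null>;
}

// Hypothetical dependencies standing in for Redis, MySQL and the credential store.
interface Deps {
  cache: Map<string, string>;
  findByApiKey: (apiKey: string) => Promise<CustomerRow | null>;
  loadCredential: (customerId: string, platform: Platform) => Promise<PlatformCredentials | null>;
}

async function resolveCustomer(apiKey: string, deps: Deps): Promise<Customer | null> {
  const cacheKey = `customer:apikey:${apiKey}`;
  const cached = deps.cache.get(cacheKey);
  let row: CustomerRow | null = cached ? (JSON.parse(cached) as CustomerRow) : null;

  if (!row) {
    row = await deps.findByApiKey(apiKey); // cache miss: fall through to the database
    if (!row) return null;
    deps.cache.set(cacheKey, JSON.stringify(row)); // only plain fields are cached
  }

  // The closure captures this customer's id, so callers never supply one.
  const id = row.id;
  return {
    ...row,
    getCredential: async <T extends PlatformCredentials>(platform: Platform) =>
      (await deps.loadCredential(id, platform)) as T | null,
  };
}
```

Because the closure is rebuilt on every resolution, a cache hit still yields a full `Customer`; only the four plain fields ever reach the cache.
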

### Backward compatibility

All platform clients take `customer` as an optional second parameter. When it is absent (single-user mode via `MCP_API_KEY`), they fall back to env vars, so the builder's existing setup is unchanged.

```typescript
export async function sendMessage(args, customer?: Customer) {
  if (customer) {
    // Multi-tenant path: credentials come from the customer's own store
    const creds = await customer.getCredential<WhatsAppCredentials>('whatsapp');
    if (!creds) throw new Error('WhatsApp not connected for this account');
    // use creds.phoneNumberId, creds.accessToken
  } else {
    // Single-user path: read WHATSAPP_DEFAULT_PHONE_NUMBER_ID etc from process.env
  }
}
```

### WhatsApp webhook routing

Meta delivers inbound messages for all connected numbers to a single webhook endpoint. The router uses a Redis lookup table populated at onboarding:

```
wa_phone_id:{phoneNumberId} → customerId
```

The webhook endpoint acknowledges immediately (within Meta's 20-second response window) and routes asynchronously:

```typescript
app.post('/webhook/whatsapp', express.json(), async (req, res) => {
  // Acknowledge first, so routing can never delay the response to Meta
  res.status(200).send('EVENT_RECEIVED');
  try {
    const events = await routeWhatsAppWebhook(req.body);
    for (const event of events) await handleInboundWhatsAppMessage(event);
  } catch (err) { console.error(err); }
});
```
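
The lookup step can be sketched as below. This is an assumption-laden illustration rather than the code in `webhook-router.ts`: `routeWebhookSketch`, `RoutedEvent`, and the synchronous `lookup` callback (standing in for a Redis `GET`) are hypothetical, and only the payload fields needed for routing are typed. The field path `entry[].changes[].value.metadata.phone_number_id` follows the WhatsApp Cloud API webhook payload.

```typescript
// Only the parts of the webhook payload that routing needs.
interface WhatsAppWebhookBody {
  entry?: Array<{
    changes?: Array<{
      value?: {
        metadata?: { phone_number_id?: string };
        messages?: Array<{ id: string; from: string; type: string }>;
      };
    }>;
  }>;
}

// Hypothetical shape for a message that has been matched to a customer.
interface RoutedEvent {
  customerId: string;
  phoneNumberId: string;
  messageId: string;
  from: string;
}

// Stand-in for a Redis GET on the wa_phone_id:{phoneNumberId} table.
type Lookup = (key: string) => string | undefined;

function routeWebhookSketch(body: WhatsAppWebhookBody, lookup: Lookup): RoutedEvent[] {
  const events: RoutedEvent[] = [];
  for (const entry of body.entry ?? []) {
    for (const change of entry.changes ?? []) {
      const phoneNumberId = change.value?.metadata?.phone_number_id;
      if (!phoneNumberId) continue;
      const customerId = lookup(`wa_phone_id:${phoneNumberId}`);
      if (!customerId) continue; // unknown number: drop instead of guessing an owner
      for (const msg of change.value?.messages ?? []) {
        events.push({ customerId, phoneNumberId, messageId: msg.id, from: msg.from });
      }
    }
  }
  return events;
}
```

Dropping events for unregistered numbers keeps a misrouted message from ever being attributed to the wrong customer.
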

### Audit log

Every tool call made with a `Customer` present is logged to Redis:

```
log_seq:{customerId}:{date}        INCR counter
logs:{customerId}:{date}:{seq}     JSON entry, EX 7776000 (90 days)
```

The per-day sequence counter keeps entries in chronological order without needing ULIDs. No cross-customer query path exists: all retrieval functions require `customerId` as the first argument.

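The key scheme can be sketched with an in-memory stand-in for Redis. The class name, the `AuditEntry` fields, and the `YYYY-MM-DD` UTC date format are assumptions made for this illustration; a real implementation would use `INCR` and `SET ... EX` instead of the `Map`s, and the TTL here is only recorded, not enforced.

```typescript
// Hypothetical entry shape; the real AuditEntry fields are not given in the text.
interface AuditEntry { tool: string; at: string; }

const NINETY_DAYS_SECONDS = 90 * 24 * 60 * 60; // 7776000, matching the EX value above

class InMemoryAuditLog {
  private counters = new Map<string, number>();
  private entries = new Map<string, { json: string; ttlSeconds: number }>();

  // Appends one entry and returns the key it was stored under.
  append(customerId: string, entry: AuditEntry, now: Date = new Date()): string {
    const date = now.toISOString().slice(0, 10); // assumed date format: YYYY-MM-DD (UTC)
    const seqKey = `log_seq:${customerId}:${date}`;
    const seq = (this.counters.get(seqKey) ?? 0) + 1; // mirrors Redis INCR
    this.counters.set(seqKey, seq);
    const key = `logs:${customerId}:${date}:${seq}`;
    this.entries.set(key, { json: JSON.stringify(entry), ttlSeconds: NINETY_DAYS_SECONDS });
    return key;
  }

  // customerId comes first and is required, so there is no cross-customer read.
  list(customerId: string, date: string): AuditEntry[] {
    const count = this.counters.get(`log_seq:${customerId}:${date}`) ?? 0;
    const out: AuditEntry[] = [];
    for (let seq = 1; seq <= count; seq++) {
      const hit = this.entries.get(`logs:${customerId}:${date}:${seq}`);
      if (hit) out.push(JSON.parse(hit.json) as AuditEntry);
    }
    return out;
  }
}
```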

---

## Redis key namespace summary

| Key | Value | TTL |
|-----|-------|-----|
| `creds:{customerId}:{platform}` | AES-256-GCM encrypted JSON | none (permanent until revoked) |
| `wa_phone_id:{phoneNumberId}` | customerId string | none |
| `customer:apikey:{apiKey}` | JSON (id, plan, active, email) | 60s |
| `log_seq:{customerId}:{date}` | integer counter | 95 days |
| `logs:{customerId}:{date}:{seq}` | JSON AuditEntry | 90 days |

---

## Request paths

### MCP tool calls (existing single-user)

```
Claude.ai → POST /mcp → requireAuth(MCP_API_KEY) → handleToolCall(name, args)
                                                 → client(args, undefined)
                                                 → env vars → Platform API
```

### Multi-tenant REST tool calls

```
Customer → POST /api/whatsapp/send → requireAuth → handleToolCall(name, args)
                                                 → client(args, undefined)
                                                 → env vars
```

*(REST endpoints do not yet pass a `Customer` to `handleToolCall`, so they still run on env-var credentials; threading the customer through is future work.)*

### Multi-tenant onboarding

```
Customer → POST /api/connect/whatsapp → meterMiddleware → storeCredential()
                                                        → registerWhatsAppNumber()
```

### Inbound WhatsApp webhook

```
Meta → POST /webhook/whatsapp → 200 immediately
                              → routeWhatsAppWebhook(body)
                              → Redis lookup phone_number_id → customerId
                              → getCredential(customerId, 'whatsapp')
                              → handleInboundWhatsAppMessage(event)
```

---

## PlantUML diagrams

- [`architecture-system.puml`](./architecture-system.puml): component and dependency diagram
- [`architecture-userflows.puml`](./architecture-userflows.puml): sequence diagrams for all 5 flows

Render with [plantuml.com](https://www.plantuml.com/plantuml/uml/), the PlantUML VS Code extension, or `plantuml -tsvg architecture-system.puml`.

---

## What's not yet done

| Item | Notes |
|------|-------|
| Customer provisioning | `customers` table exists but needs an INSERT path (Stripe webhook → seed row) |
| MCP session → Customer | MCP calls don't resolve customers; sessions still use single-user env vars |
| Email multi-tenancy | `imap.ts` / `smtp.ts` use Account enum; `customer.getCredential('email')` not wired |
| Usage metering | `meter.ts` not implemented; plan limits not enforced |
| Obsidian per-customer vault | Currently one global vault path from env |
| Key rotation tooling | Script to re-encrypt all `creds:*` keys under a new `CREDENTIAL_ENCRYPTION_KEY` |