Runbooks¶
Condensed execution guides. Each section links to the full source document in ftl-docs/guides/ for complete step-by-step detail.
CFD margin trading cutover¶
Source: ftl-docs/guides/CFD-CUTOVER.md
Estimated window: 30 minutes (15 min pre-flight, 10 min cutover, 5 min smoke)
When to run: during a no-match window, after staging has held a green end-to-end test for at least 48 hours.
This runbook switches production from the legacy buy-and-hold AMM to the CFD margin trading model.
Pre-flight¶
1. Confirm staging CI is green on release/staging for all three repos (ftl-backend, ftl-frontend, ftl-docs):
    gh run list --repo JetaFutures/ftl-backend --branch release/staging --limit 1 --json status,conclusion
    gh run list --repo JetaFutures/ftl-frontend --branch release/staging --limit 1 --json status,conclusion
   Both must report status: completed, conclusion: success.
2. Confirm the staging migrate job ran at the same SHA as release/staging:
3. Run a 5-minute k6 load test on staging:
   Acceptance: zero 5xx, P99 < 250 ms on POST /api/positions/open.
4. Take a manual PITR backup of prod Postgres:
   Record the backup ID; the rollback plan needs it.
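The CI check in the first pre-flight item can be scripted so that a non-green run stops the cutover early. The sketch below is a minimal example, not part of the source guide: the function names ci_is_green and check_staging_ci are hypothetical, and it assumes gh's built-in --jq flag is used to reduce the JSON to a "status conclusion" string.

```shell
# ci_is_green "<status> <conclusion>"
# Succeeds only when the latest run has finished and succeeded.
ci_is_green() {
  [ "$1" = "completed success" ]
}

# Hypothetical wrapper around the two gh commands from the pre-flight list.
# Requires an authenticated gh CLI; nothing runs until the function is called.
check_staging_ci() {
  for repo in ftl-backend ftl-frontend; do
    result=$(gh run list --repo "JetaFutures/$repo" --branch release/staging \
      --limit 1 --json status,conclusion \
      --jq '.[0] | "\(.status) \(.conclusion)"') || return 1
    if ci_is_green "$result"; then
      echo "$repo: green"
    else
      echo "$repo: NOT green ($result)" >&2
      return 1
    fi
  done
}
```

Calling check_staging_ci before the flip returns non-zero as soon as one repo is not green, which makes it easy to chain with `&&` in a pre-flight script.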
The flip¶
1. Optionally block user trades during cutover (return 503 for protected routes):
2. Drain legacy share positions. Run the migration script in dry-run mode first, then live:
    cd ftl-backend
    go run ./scripts/migrate-legacy-positions.go \
      -db "$(AZURE_CONFIG_DIR=~/.azure-ftl bash ftl-infra/scripts/print-db-url.sh prd)" \
      -dry-run
    # Review output. If it looks correct:
    go run ./scripts/migrate-legacy-positions.go \
      -db "$(AZURE_CONFIG_DIR=~/.azure-ftl bash ftl-infra/scripts/print-db-url.sh prd)"
3. Promote branches into release/prod (fast-forward only):
4. Approve the GitHub Environment gate, ftl-backend first. Wait for the /health smoke in CI to go green before approving ftl-frontend.
5. Confirm Single revision mode before and after deploy:
Smoke tests¶
Run from a dev machine against prod with a QA account cookie jar:
# Open a position
curl -X POST https://api.ftljeta.cloud/api/positions/open \
  -H 'Content-Type: application/json' \
  -H "Idempotency-Key: $(uuidgen)" \
  -b "$COOKIE_JAR" \
  -d '{"instrumentId":"<known-id>","direction":"long","lotSize":0.01}'
# List open positions (URL quoted so the shell does not treat ? as a glob)
curl -b "$COOKIE_JAR" 'https://api.ftljeta.cloud/api/positions?status=open'
# Close the position (replace <id>; the quotes keep < and > from being parsed as redirections)
curl -X POST -b "$COOKIE_JAR" -H 'Content-Type: application/json' \
  -d "{\"clientRequestId\":\"smoke-$(date +%s)\"}" \
  'https://api.ftljeta.cloud/api/positions/<id>/close'
Acceptance: all requests return 2xx; closedBy on the closed position is user; balance after close equals balance before open plus realised PnL.
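The balance part of the acceptance rule is plain arithmetic, so it can be checked mechanically once the three numbers have been read from the API responses. The helper below is a sketch under assumptions: the name balance_reconciles and the 0.005 tolerance are illustrative, and how the balances and PnL are extracted from the responses is left to the operator.

```shell
# balance_reconciles BEFORE AFTER PNL
# Succeeds when AFTER equals BEFORE + PNL to within 0.005
# (tolerance is an assumption to absorb decimal rounding).
balance_reconciles() {
  awk -v before="$1" -v after="$2" -v pnl="$3" 'BEGIN {
    diff = after - (before + pnl)
    if (diff < 0) diff = -diff
    if (diff < 0.005) exit 0
    exit 1
  }'
}
```

For example, `balance_reconciles 1000.00 987.50 -12.50` succeeds, while a mismatch such as `balance_reconciles 1000.00 990.00 -12.50` returns non-zero.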
Re-open¶
Rollback¶
If smoke fails or a critical bug surfaces within the first hour:
# Revert all three Container Apps to the previous revision
AZURE_CONFIG_DIR=~/.azure-ftl bash ftl-infra/scripts/revert-revision.sh prd ftl-prd-api -1
AZURE_CONFIG_DIR=~/.azure-ftl bash ftl-infra/scripts/revert-revision.sh prd ftl-prd-ws -1
AZURE_CONFIG_DIR=~/.azure-ftl bash ftl-infra/scripts/revert-revision.sh prd ftl-prd-flusher -1
Revert the frontend bundle:
Migrations 000024–000026 are additive (new tables, new columns with defaults). The reverted code ignores the new tables — no schema rollback needed unless a data-corruption bug is found. If that happens, restore from the PITR backup taken in pre-flight step 4:
AZURE_CONFIG_DIR=~/.azure-ftl az postgres flexible-server restore \
--restore-time "<timestamp from pre-flight backup>" \
--source-server ftl-prd-pg --name ftl-prd-pg-rolled-back
Write a post-mortem in ftl-docs/reports/cfd-cutover-rollback-<date>.md within 48 hours.
Path 2 production deployment¶
Source: ftl-docs/guides/22-prod-path2-cutover.md and ftl-docs/guides/23-path2-deployment-runbook.md
Decision: split-domain topology (Path 2) chosen because the registrar for jetafutures.com cannot add NS records to redirect a subdomain to Azure. Path 1 (single domain, Azure DNS delegation) is blocked by the registrar constraint.
Final topology¶
flowchart TB
browser --> spa["ftl.jetafutures.com\n(Azure SWA + Cloudflare)"]
spa -- fetch/wss --> api["api.ftljeta.cloud\n(Azure Container Apps + Cloudflare proxy)"]
spa -- wss --> ws["ws.ftljeta.cloud\n(Azure Container Apps + Cloudflare DNS-only)"]
api --> data["Postgres Flex · Redis · Key Vault · ACR"]
ws --> data
| Component | Domain | Provider |
|---|---|---|
| Frontend SPA | ftl.jetafutures.com | Azure SWA Standard (eastasia), Cloudflare-proxied |
| API | api.ftljeta.cloud | Azure Container Apps (centralindia), Cloudflare proxy (orange) |
| WS | ws.ftljeta.cloud | Azure Container Apps (centralindia), Cloudflare DNS-only (grey, permanent) |
Cookie strategy: Domain=.ftljeta.cloud, SameSite=None; Secure; Partitioned, HttpOnly. WS auth uses a JWT in the query string — the cookie is not relevant for WebSocket upgrades.
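One way to confirm the cookie strategy on a live response is to inspect the Set-Cookie header for each required attribute. The following is a rough sketch: cookie_has_prod_attrs is a hypothetical helper, the cookie name in the usage example is made up, and the check accepts the domain with or without the leading dot because some HTTP libraries strip it when serializing.

```shell
# cookie_has_prod_attrs 'Set-Cookie header value'
# Succeeds when every attribute from the cookie strategy is present.
cookie_has_prod_attrs() {
  # Lowercase, drop a trailing CR, and normalize "domain=." to "domain=".
  lower=$(printf '%s' "$1" | tr -d '\r' | tr '[:upper:]' '[:lower:]' |
    sed 's/domain=\./domain=/')
  for attr in 'domain=ftljeta.cloud' 'samesite=none' 'secure' 'partitioned' 'httponly'; do
    case "; $lower;" in
      *"; $attr;"*) ;;                                 # attribute found
      *) echo "missing: $attr" >&2; return 1 ;;
    esac
  done
}
```

A header value such as `access_token=abc; Path=/; Domain=ftljeta.cloud; HttpOnly; Secure; SameSite=None; Partitioned` passes, while a local-dev style cookie with SameSite=Lax is rejected with a "missing" message on stderr.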
Required one-time code changes¶
These are already merged if the staging dry-run was completed. Verify with git log.
ftl-backend/internal/auth/cookies.go — uses SameSite=None; Secure; Partitioned when cfg.Secure == true (production/staging), and SameSite=Lax in local dev (HTTP). Both cookie functions (SetAccessCookie, SetRefreshCookie) and ClearAuthCookies follow the same buildCookie helper, so the attribute is applied consistently.
ftl-backend/cmd/api-server/main.go — reads CORS_ORIGINS env var (comma-separated list of allowed origins); falls back to http://localhost:5173,http://localhost:8080 when unset.
ftl-backend/cmd/ws-server/main.go — builds the originAllow map from CORS_ORIGINS env var; falls back to http://localhost:5173 when unset.
ftl-frontend/src/lib/api.ts — selects the API base URL at build time: staging build (MODE=staging) uses https://api.ftljeta.cloud/api; production build uses https://prd-api.ftljeta.cloud/api; local dev falls back to /api.
ftl-frontend/src/hooks/useWebSocket.ts — selects the WS base URL at build time: staging build uses wss://ws.ftljeta.cloud; production build uses wss://prd-ws.ftljeta.cloud; local dev derives the URL from window.location.host.
ftl-frontend/public/staticwebapp.config.json — has navigationFallback rule so deep-link reloads (/stadium, /leaderboard, etc.) serve index.html.
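The CORS_ORIGINS behaviour described for the API server (comma-separated allow-list with a localhost fallback) can be illustrated with a small shell equivalent. This is only a sketch of the described logic, not the Go implementation: origin_allowed is a hypothetical name, it applies the fallback when the variable is unset or empty, and it assumes exact string matching with no whitespace trimming.

```shell
# origin_allowed ORIGIN
# Succeeds when ORIGIN appears in the comma-separated CORS_ORIGINS list.
# Runs in a subshell so the IFS and glob changes do not leak to the caller.
origin_allowed() (
  list=${CORS_ORIGINS:-http://localhost:5173,http://localhost:8080}
  set -f    # no glob expansion while splitting
  IFS=,
  for allowed in $list; do
    [ "$allowed" = "$1" ] && exit 0
  done
  exit 1
)
```

With `CORS_ORIGINS=https://ftl.jetafutures.com` set, `origin_allowed https://ftl.jetafutures.com` succeeds and any other origin fails, which is a quick sanity check on the value before it is set on the Container App.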
Execution order¶
1. Provision prod backend infra following ftl-docs/guides/22-prod-path2-cutover.md §2.3. Separate resource group ftl-prd-rg-cin. Use prod SKUs (Postgres production tier, Redis Standard C1 minimum, ws min=3, api min=2, flusher min=1).
2. Capture backend FQDNs after deploy:
    API_FQDN=$(AZURE_CONFIG_DIR=~/.azure-ftl az containerapp show \
      -g ftl-prd-rg-cin -n ftl-prd-api \
      --query properties.configuration.ingress.fqdn -o tsv)
    WS_FQDN=$(AZURE_CONFIG_DIR=~/.azure-ftl az containerapp show \
      -g ftl-prd-rg-cin -n ftl-prd-ws \
      --query properties.configuration.ingress.fqdn -o tsv)
3. Add Cloudflare DNS records on ftljeta.cloud, all DNS-only (grey) initially:
   | Type | Name | Target | Proxy |
   |---|---|---|---|
   | CNAME | api | $API_FQDN | DNS-only (grey) initially |
   | CNAME | ws | $WS_FQDN | DNS-only (grey) permanently |
   | TXT | asuid.api | CAE customDomainVerificationId | N/A |
   | TXT | asuid.ws | Same value | N/A |
   Get the verification ID:
    AZURE_CONFIG_DIR=~/.azure-ftl az containerapp env show \
      -g ftl-prd-rg-cin -n ftl-prd-cae-cin \
      --query 'properties.customDomainConfiguration.customDomainVerificationId' -o tsv
4. Bind custom hostnames on the Container Apps (issues TLS certs):
    AZURE_CONFIG_DIR=~/.azure-ftl az containerapp hostname add \
      -g ftl-prd-rg-cin -n ftl-prd-api --hostname api.ftljeta.cloud
    AZURE_CONFIG_DIR=~/.azure-ftl az containerapp hostname bind \
      -g ftl-prd-rg-cin -n ftl-prd-api --hostname api.ftljeta.cloud \
      --environment ftl-prd-cae-cin --validation-method CNAME
    AZURE_CONFIG_DIR=~/.azure-ftl az containerapp hostname add \
      -g ftl-prd-rg-cin -n ftl-prd-ws --hostname ws.ftljeta.cloud
    AZURE_CONFIG_DIR=~/.azure-ftl az containerapp hostname bind \
      -g ftl-prd-rg-cin -n ftl-prd-ws --hostname ws.ftljeta.cloud \
      --environment ftl-prd-cae-cin --validation-method CNAME
   Each bind call takes up to 20 minutes. Output shows bindingType: SniEnabled on success.
5. After the cert succeeds on api.ftljeta.cloud, flip it to Proxied (orange) in Cloudflare. Leave ws.ftljeta.cloud grey: Cloudflare Free does not reliably sustain WebSocket connections.
6. Set prod env vars on ftl-prd-api:
7. Build and deploy the SPA with prod values:
    npm run build
    SWA_TOKEN=$(AZURE_CONFIG_DIR=~/.azure-ftl az staticwebapp secrets list \
      -g ftl-prd-rg-cin -n ftl-prd-frontend \
      --query properties.apiKey -o tsv)
    npx -y @azure/static-web-apps-cli deploy ./dist \
      --deployment-token "$SWA_TOKEN" --env production
   Note: The staging vs production API/WS URLs are baked in at build time via import.meta.env.MODE and import.meta.env.PROD. For a production build use npm run build (MODE=production). For a staging build use npm run build:staging (MODE=staging).
8. Hand the jetafutures.com registrar owner one record:
   Wait for propagation (usually under 1 hour). Azure issues the managed cert automatically once the CNAME resolves.
9. Update Google OAuth prod client in GCP Console:
   - Authorized JS origin: https://ftl.jetafutures.com
   - Authorized redirect URI: https://api.ftljeta.cloud/api/auth/google/callback
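Because the hostname bind validates via CNAME, it can save a failed 20-minute bind attempt to confirm that the Cloudflare record already resolves to the captured FQDN. The helper below is an illustrative sketch: cname_matches is a hypothetical name, and it only normalizes the comparison (trailing dot and letter case); the lookup itself is shown in the usage note.

```shell
# cname_matches RESOLVED EXPECTED
# Succeeds when both names are equal after dropping a trailing dot
# and lowercasing. An empty RESOLVED (record not found) always fails.
cname_matches() {
  resolved=$(printf '%s' "${1%.}" | tr '[:upper:]' '[:lower:]')
  expected=$(printf '%s' "${2%.}" | tr '[:upper:]' '[:lower:]')
  [ -n "$resolved" ] && [ "$resolved" = "$expected" ]
}
```

A possible call, assuming dig is installed and API_FQDN was captured earlier: `cname_matches "$(dig +short CNAME api.ftljeta.cloud)" "$API_FQDN" && echo "api CNAME ready"`. This works only while the record is DNS-only (grey); once proxied, Cloudflare no longer exposes the CNAME target.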
Pre-traffic verification¶
# Health checks
curl -fsS https://api.ftljeta.cloud/health
curl -fsS https://ws.ftljeta.cloud/health
# CORS preflight
curl -sSI -X OPTIONS \
-H "Origin: https://ftl.jetafutures.com" \
-H "Access-Control-Request-Method: GET" \
https://api.ftljeta.cloud/api/me \
| grep -iE "access-control|HTTP"
# Expect: 204, access-control-allow-origin: https://ftl.jetafutures.com,
# access-control-allow-credentials: true
# SPA deep-link fallback
curl -fsS -o /dev/null -w "%{http_code}\n" https://ftl.jetafutures.com/stadium
# Expect: 200
Also run cross-browser smoke: Chrome, Firefox, Safari (macOS and iOS), and Brave with default shields. Safari is the critical test — cross-root ITP blocks cookies on reload if SameSite=None is not properly set. If Safari fails: implement a Storage Access API call after login before shipping.
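The CORS preflight expectation can be turned into a pass/fail check instead of reading the grep output by eye. The sketch below is one possible approach: cors_preflight_ok is a hypothetical helper that reads response headers on stdin, and it assumes the server emits a single space after each header colon.

```shell
# cors_preflight_ok EXPECTED_ORIGIN   (response headers on stdin)
# Succeeds when the allow-origin header matches EXPECTED_ORIGIN exactly
# and allow-credentials is true. Header names are matched case-insensitively.
cors_preflight_ok() {
  headers=$(tr -d '\r')    # curl -I output uses CRLF line endings
  printf '%s\n' "$headers" |
    grep -iqxF "access-control-allow-origin: $1" &&
  printf '%s\n' "$headers" |
    grep -iqxF "access-control-allow-credentials: true"
}
```

It can be fed from the same curl call used in the pre-traffic checks, for example `curl -sSI -X OPTIONS -H "Origin: https://ftl.jetafutures.com" -H "Access-Control-Request-Method: GET" https://api.ftljeta.cloud/api/me | cors_preflight_ok https://ftl.jetafutures.com`.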
Rollback¶
- Frontend: redeploy the previous dist/ via swa deploy (~2 min).
- Backend: az containerapp revision activate --revision <previous-revision-name> for api, ws, and flusher (~30s each).
- DNS: ask the jetafutures.com owner to remove the CNAME. Cloudflare records on ftljeta.cloud are fully under your control.
- OAuth: changes to the prod OAuth client are additive, so no rollback is needed; the staging client is untouched.
Local dev common issues¶
Source: ftl-docs/guides/LOCAL-DEV-GOTCHAS.md
schema_migrations.version drifted¶
Symptom: admin_settings, feature_flags, or other later-migration tables exist in Postgres, but SELECT version FROM schema_migrations; reports a version several migrations behind the highest migrations/NNNNN_*.up.sql on disk.
Cause: earlier migrations were applied via a manual psql path or a golang-migrate run was interrupted. The version counter never moved.
Fix:
- For each migration from version+1 upward, open the up.sql and confirm the change is already live (e.g. \d admin_settings for migration 21).
- Apply any genuinely missing migrations as plain SQL inside a transaction.
- Bump the version row:
- Verify with the next planned migrate run.
If everything from version+1 onward is already live, just bump the row. The 2026-05-11 regression sweep did exactly this: bumped from 19 to 22 because migrations 20 and 21 were already in place.
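To see how far the counter has drifted, it helps to compute the highest migration number on disk and compare it with the value in schema_migrations. The function below is a minimal sketch: highest_migration is a hypothetical name, and it assumes migration files are named with a zero-padded numeric prefix followed by an underscore, as in the NNNNN_*.up.sql pattern above.

```shell
# highest_migration DIR
# Prints the largest numeric prefix among DIR/*.up.sql (0 if none).
# Runs in a subshell so its variables do not leak to the caller.
highest_migration() (
  max=0
  for f in "$1"/*.up.sql; do
    [ -e "$f" ] || continue               # glob matched nothing
    n=${f##*/}                            # strip directory
    n=${n%%_*}                            # keep the numeric prefix
    case $n in ''|*[!0-9]*) continue ;; esac
    n=$(printf '%s' "$n" | sed 's/^0*//') # drop zero padding
    [ -n "$n" ] || n=0
    [ "$n" -gt "$max" ] && max=$n
  done
  echo "$max"
)
```

Comparing `highest_migration migrations` with the result of `SELECT version FROM schema_migrations;` gives the range of migrations that need to be inspected by hand.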
Wallet balance stale after direct Postgres write¶
Symptom: UPDATE wallets SET balance = ... in psql succeeds, but /api/wallet still returns the old amount.
Cause: wallet.Service.GetByUserID is read-through with Redis as the authority. The Redis hash wallet:<user_id> field balance wins whenever it exists. See internal/wallet/service.go GetByUserID.
Fix: also write Redis:
Or restart ftl-api-server to drop the hash and let the next read fall through to Postgres.
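Writing both stores in one step avoids the stale read. The helper below is a sketch with several assumptions that should be checked against the local setup: the Redis container name ftl-redis and the user_id column are guesses, and the value format Redis expects for the balance field is not confirmed by this page. The Postgres container, user, and database names are the ones used elsewhere in this section, and the Redis key and field follow the wallet:<user_id> hash described in the cause.

```shell
# set_wallet_balance USER_ID AMOUNT
# Updates Postgres and the Redis hash together (local dev only).
# With DRY_RUN set, the commands are printed instead of executed.
set_wallet_balance() {
  user_id=$1
  amount=$2
  run=${DRY_RUN:+echo}
  # Assumption: wallets has a user_id column.
  $run docker exec ftl-postgres psql -U ftl_admin -d ftl2026 \
    -c "UPDATE wallets SET balance = $amount WHERE user_id = '$user_id';"
  # Assumption: the Redis container is named ftl-redis.
  $run docker exec ftl-redis redis-cli HSET "wallet:$user_id" balance "$amount"
}
```

Running it once with `DRY_RUN=1` shows the exact commands before anything is written.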
Replay prices don't move in the UI¶
Symptom: go run scripts/local-replay/ runs without error but prices don't move. Redis pub/sub publishes but no instrument hash exists at the expected key.
Cause: replay scripts target sportmonks_id 1000–1049. If those instruments are missing from Postgres, the WS server drops the broadcast.
Fix:
docker exec ftl-postgres psql -U ftl_admin -d ftl2026 \
-c "SELECT COUNT(*) FROM instruments WHERE sportmonks_id BETWEEN 1000 AND 1049;"
DATABASE_URL="postgres://ftl_admin:localdev@localhost:5432/ftl2026?sslmode=disable" \
go run ./scripts/seed
docker restart ftl-api-server # re-hydrates Redis instrument hashes on boot
api-server's HydrateRedis only runs at startup — adding instruments to Postgres without restarting leaves Redis without the keys the replay targets.
Activity feed empty after a successful test trade¶
Symptom: trade returns 200 filled but /api/activity returns {"count":0,"events":[]}.
Cause: the activity-feed gate in internal/activity/service.go shouldAppend intentionally drops any trade that does not satisfy quantity >= 10 OR total >= ₹2000. Micro trades produce no feed entry.
Fix: trigger a trade of at least 10 shares, or any trade that crosses the ₹2000 total threshold.
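The gate condition is easy to evaluate ahead of time when choosing a test trade size. The snippet below mirrors the rule as stated here; it is an illustration in shell, not the Go code from shouldAppend, and the function name should_append is simply borrowed for readability.

```shell
# should_append QUANTITY TOTAL
# Succeeds when a trade would pass the activity-feed gate:
# quantity >= 10 OR total >= 2000 (total in rupees).
should_append() {
  awk -v qty="$1" -v total="$2" 'BEGIN {
    if (qty >= 10 || total >= 2000) exit 0
    exit 1
  }'
}
```

For instance, `should_append 3 2000` succeeds on the total threshold alone, while `should_append 9 1999.99` fails both conditions and would produce an empty feed.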
Goroutine-side notifications need a few seconds to surface¶
Symptom: /api/notifications/unread-count shows a new notification, but a follow-up /api/activity call doesn't have the event row yet.
Cause: the activity append is go func() { ... }() from the trade handler with a 2-second timeout. It is best-effort and detached.
Fix: wait at least 2 seconds before asserting the feed, or use a retry loop. E2E tests should use await page.waitForResponse(...) rather than asserting immediately.
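For shell-based checks, the retry loop mentioned in the fix can be as small as the sketch below. The helper name retry_until is hypothetical, and the attempt count and delay are parameters so they can be tuned around the 2-second timeout.

```shell
# retry_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted,
# sleeping DELAY_SECONDS between tries. Returns 0 on first success.
retry_until() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

A feed assertion could then be wrapped as `retry_until 5 1 feed_has_events`, where feed_has_events is a function (left to the reader) that calls /api/activity and succeeds once the expected event row is present.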