Runbooks

Condensed execution guides. Each section links to the full source document in ftl-docs/guides/ for complete step-by-step detail.

CFD margin trading cutover

Source: ftl-docs/guides/CFD-CUTOVER.md

Estimated window: 30 minutes (15 min pre-flight, 10 min cutover, 5 min smoke)

When to run: during a no-match window, after staging has held a green end-to-end test for at least 48 hours.

This runbook switches production from the legacy buy-and-hold AMM to the CFD margin trading model.

Pre-flight

Confirm staging CI is green on release/staging for all three repos (ftl-backend, ftl-frontend, ftl-docs):

```
gh run list --repo JetaFutures/ftl-backend  --branch release/staging --limit 1 --json status,conclusion
gh run list --repo JetaFutures/ftl-frontend --branch release/staging --limit 1 --json status,conclusion
gh run list --repo JetaFutures/ftl-docs     --branch release/staging --limit 1 --json status,conclusion
```

All three must report status "completed" and conclusion "success".

Confirm the staging migrate job ran at the same SHA as release/staging:

```
AZURE_CONFIG_DIR=~/.azure-ftl az containerapp job execution list \
  -n ftl-stg-migrate -g ftl-stg-rg-cin \
  --query '[0].{status:properties.status, end:properties.endTime}' -o json
```

Run a 5-minute k6 load test on staging:
```
cd ftl-frontend
k6 run --vus 50 --duration 5m loadtest-cfd.js
```
Acceptance: zero 5xx, P99 < 250 ms on POST /api/positions/open.

Take a manual (on-demand) backup of prod Postgres:

```
AZURE_CONFIG_DIR=~/.azure-ftl az postgres flexible-server backup create \
  --resource-group ftl-prd-rg-cin --name ftl-prd-pg \
  --backup-name "pre-cfd-cutover-$(date +%Y%m%d-%H%M)"
```

Record the backup name and the UTC time the backup completed. The rollback restore takes a point-in-time timestamp (--restore-time), not a backup ID.

The flip

Optionally block user trades during cutover (return 503 for protected routes):

```
AZURE_CONFIG_DIR=~/.azure-ftl bash ftl-infra/scripts/flag.sh prd coming_soon true
```

Drain legacy share positions. Run the migration script in dry-run mode first, then live. Run this from the directory that holds the ftl-* checkouts, as in the other steps; the ftl-infra path does not resolve from inside ftl-backend:

```
DB_URL="$(AZURE_CONFIG_DIR=~/.azure-ftl bash ftl-infra/scripts/print-db-url.sh prd)"
(cd ftl-backend && go run ./scripts/migrate-legacy-positions.go -db "$DB_URL" -dry-run)
# Review output. If it looks correct:
(cd ftl-backend && go run ./scripts/migrate-legacy-positions.go -db "$DB_URL")
```

Promote branches into release/prod (fast-forward only):

```
for r in ftl-backend ftl-frontend; do
  (cd "$r" && git fetch origin && git checkout release/prod && \
   git merge --ff-only origin/release/staging && git push)
done
```

Approve the GitHub Environment gate for ftl-backend first. Wait for the /health smoke check in CI to go green before approving ftl-frontend.

Confirm Single revision mode before and after deploy:

```
AZURE_CONFIG_DIR=~/.azure-ftl az containerapp show -n ftl-prd-api -g ftl-prd-rg-cin \
  --query 'properties.configuration.activeRevisionsMode' -o tsv
# Must print: Single
```

Smoke tests

Run from a dev machine against prod with a QA account cookie jar:

```
# Open a position
curl -X POST https://api.ftljeta.cloud/api/positions/open \
  -H 'Content-Type: application/json' \
  -H "Idempotency-Key: $(uuidgen)" \
  -b "$COOKIE_JAR" \
  -d '{"instrumentId":"<known-id>","direction":"long","lotSize":0.01}'

# List open positions (quoted so the shell does not treat ? as a glob)
curl -b "$COOKIE_JAR" 'https://api.ftljeta.cloud/api/positions?status=open'

# Close the position
curl -X POST -b "$COOKIE_JAR" -H 'Content-Type: application/json' \
  -d "{\"clientRequestId\":\"smoke-$(date +%s)\"}" \
  https://api.ftljeta.cloud/api/positions/<id>/close
```

Acceptance: all requests return 2xx; closedBy on the closed position is user; balance after close equals balance before open plus realised PnL.
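The balance check is simple arithmetic, but floating point makes it easy to get wrong. The Go sketch below keeps everything in integer paise. It assumes realised PnL is (close − open) × units, negated for shorts, where units = lot size × contract size as a whole number, and it ignores fees and spread. All names and the formula are hypothetical and not taken from ftl-backend.

```go
package main

import "fmt"

// Direction of a CFD position.
type Direction int

const (
	Long  Direction = 1
	Short Direction = -1
)

// realisedPnL returns profit (positive) or loss (negative) in paise.
// Assumed formula: direction × (close − open) × units; no fees or spread.
func realisedPnL(dir Direction, openPrice, closePrice, units int64) int64 {
	return int64(dir) * (closePrice - openPrice) * units
}

// reconciles applies the smoke-test rule: balance after close must equal
// balance before open plus realised PnL (all values in paise).
func reconciles(before, after, pnl int64) bool {
	return after == before+pnl
}

func main() {
	pnl := realisedPnL(Short, 10_500, 10_200, 10) // a short gains when price falls
	fmt.Println(pnl, reconciles(1_000_000, 1_003_000, pnl))
}
```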

Re-open

```
AZURE_CONFIG_DIR=~/.azure-ftl bash ftl-infra/scripts/flag.sh prd coming_soon false
```

Rollback

If smoke fails or a critical bug surfaces within the first hour:

```
# Revert all three Container Apps to the previous revision
AZURE_CONFIG_DIR=~/.azure-ftl bash ftl-infra/scripts/revert-revision.sh prd ftl-prd-api     -1
AZURE_CONFIG_DIR=~/.azure-ftl bash ftl-infra/scripts/revert-revision.sh prd ftl-prd-ws      -1
AZURE_CONFIG_DIR=~/.azure-ftl bash ftl-infra/scripts/revert-revision.sh prd ftl-prd-flusher -1
```

Revert the frontend bundle:

```
AZURE_CONFIG_DIR=~/.azure-ftl bash ftl-infra/scripts/revert-fe-bundle.sh prd
```

Migrations 000024–000026 are additive (new tables, new columns with defaults). The reverted code ignores the new tables, so no schema rollback is needed unless a data-corruption bug is found. If that happens, restore to the timestamp recorded with the pre-flight backup:

```
AZURE_CONFIG_DIR=~/.azure-ftl az postgres flexible-server restore \
  --resource-group ftl-prd-rg-cin \
  --restore-time "<timestamp from pre-flight backup>" \
  --source-server ftl-prd-pg --name ftl-prd-pg-rolled-back
```

Write a post-mortem in ftl-docs/reports/cfd-cutover-rollback-<date>.md within 48 hours.

Path 2 production deployment

Source: ftl-docs/guides/22-prod-path2-cutover.md and ftl-docs/guides/23-path2-deployment-runbook.md

Decision: split-domain topology (Path 2), chosen because the registrar for jetafutures.com cannot add NS records to delegate a subdomain to Azure DNS. Path 1 (single domain, Azure DNS delegation) is blocked by that registrar constraint.

Final topology

```mermaid
flowchart TB
    browser --> spa["ftl.jetafutures.com<br/>(Azure SWA + Cloudflare)"]
    spa -- fetch/wss --> api["api.ftljeta.cloud<br/>(Azure Container Apps + Cloudflare proxy)"]
    spa -- wss --> ws["ws.ftljeta.cloud<br/>(Azure Container Apps + Cloudflare DNS-only)"]
    api --> data["Postgres Flex · Redis · Key Vault · ACR"]
    ws --> data
```

| Component | Domain | Provider |
|---|---|---|
| Frontend SPA | `ftl.jetafutures.com` | Azure SWA Standard (`eastasia`), Cloudflare-proxied |
| API | `api.ftljeta.cloud` | Azure Container Apps (`centralindia`), Cloudflare proxy (orange) |
| WS | `ws.ftljeta.cloud` | Azure Container Apps (`centralindia`), Cloudflare DNS-only (grey, permanent) |

Cookie strategy: Domain=.ftljeta.cloud; SameSite=None; Secure; Partitioned; HttpOnly. WebSocket auth uses a JWT in the query string, so the cookie is not relevant for WebSocket upgrades.

Required one-time code changes

These are already merged if the staging dry-run was completed. Verify with git log.

ftl-backend/internal/auth/cookies.go — uses SameSite=None; Secure; Partitioned when cfg.Secure == true (production/staging), and SameSite=Lax in local dev (HTTP). Both cookie functions (SetAccessCookie, SetRefreshCookie) and ClearAuthCookies follow the same buildCookie helper, so the attribute is applied consistently.

ftl-backend/cmd/api-server/main.go — reads CORS_ORIGINS env var (comma-separated list of allowed origins); falls back to http://localhost:5173,http://localhost:8080 when unset.

ftl-backend/cmd/ws-server/main.go — builds the originAllow map from CORS_ORIGINS env var; falls back to http://localhost:5173 when unset.

ftl-frontend/src/lib/api.ts — selects the API base URL at build time: staging build (MODE=staging) uses https://api.ftljeta.cloud/api; production build uses https://prd-api.ftljeta.cloud/api; local dev falls back to /api.

ftl-frontend/src/hooks/useWebSocket.ts — selects the WS base URL at build time: staging build uses wss://ws.ftljeta.cloud; production build uses wss://prd-ws.ftljeta.cloud; local dev derives the URL from window.location.host.

ftl-frontend/public/staticwebapp.config.json — has navigationFallback rule so deep-link reloads (/stadium, /leaderboard, etc.) serve index.html.
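The CORS_ORIGINS handling described for both servers amounts to parsing a comma-separated list into a lookup set, with a fallback. The Go sketch below is minimal. `loadOrigins` is a hypothetical name, and trimming whitespace, skipping empty entries, and treating an empty variable as unset are assumptions the source does not specify. The fallback shown is the api-server's.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// loadOrigins turns a comma-separated CORS_ORIGINS value into an allow set,
// falling back to the given defaults when the value is unset or blank.
func loadOrigins(raw, fallback string) map[string]bool {
	if strings.TrimSpace(raw) == "" {
		raw = fallback
	}
	allow := make(map[string]bool)
	for _, o := range strings.Split(raw, ",") {
		if o = strings.TrimSpace(o); o != "" {
			allow[o] = true
		}
	}
	return allow
}

func main() {
	allow := loadOrigins(os.Getenv("CORS_ORIGINS"), "http://localhost:5173,http://localhost:8080")
	fmt.Println(allow["https://ftl.jetafutures.com"], allow["http://localhost:5173"])
}
```

Lookups are exact string matches, so an entry must match the browser's Origin header exactly: same scheme, same host, and no trailing slash.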

Execution order

Provision prod backend infra following ftl-docs/guides/22-prod-path2-cutover.md §2.3, in the separate resource group ftl-prd-rg-cin. Use prod SKUs: Postgres production tier, Redis Standard C1 minimum, and minimum replicas of 3 for ws, 2 for api, and 1 for flusher.

Capture backend FQDNs after deploy:

```
API_FQDN=$(AZURE_CONFIG_DIR=~/.azure-ftl az containerapp show \
  -g ftl-prd-rg-cin -n ftl-prd-api \
  --query properties.configuration.ingress.fqdn -o tsv)
WS_FQDN=$(AZURE_CONFIG_DIR=~/.azure-ftl az containerapp show \
  -g ftl-prd-rg-cin -n ftl-prd-ws \
  --query properties.configuration.ingress.fqdn -o tsv)
```

Add Cloudflare DNS records on ftljeta.cloud — all DNS-only (grey) initially:

| Type | Name | Target | Proxy |
|---|---|---|---|
| CNAME | `api` | `$API_FQDN` | DNS-only (grey) initially |
| CNAME | `ws` | `$WS_FQDN` | DNS-only (grey) permanently |
| TXT | `asuid.api` | CAE `customDomainVerificationId` | N/A |
| TXT | `asuid.ws` | Same value | N/A |

Get the verification ID:

```
AZURE_CONFIG_DIR=~/.azure-ftl az containerapp env show \
  -g ftl-prd-rg-cin -n ftl-prd-cae-cin \
  --query 'properties.customDomainConfiguration.customDomainVerificationId' -o tsv
```

Bind custom hostnames on the Container Apps (issues TLS certs):

```
AZURE_CONFIG_DIR=~/.azure-ftl az containerapp hostname add \
  -g ftl-prd-rg-cin -n ftl-prd-api --hostname api.ftljeta.cloud
AZURE_CONFIG_DIR=~/.azure-ftl az containerapp hostname bind \
  -g ftl-prd-rg-cin -n ftl-prd-api --hostname api.ftljeta.cloud \
  --environment ftl-prd-cae-cin --validation-method CNAME

AZURE_CONFIG_DIR=~/.azure-ftl az containerapp hostname add \
  -g ftl-prd-rg-cin -n ftl-prd-ws --hostname ws.ftljeta.cloud
AZURE_CONFIG_DIR=~/.azure-ftl az containerapp hostname bind \
  -g ftl-prd-rg-cin -n ftl-prd-ws --hostname ws.ftljeta.cloud \
  --environment ftl-prd-cae-cin --validation-method CNAME
```

Each bind call takes up to 20 minutes. Output shows bindingType: SniEnabled on success.

After cert succeeds on api.ftljeta.cloud: flip it to Proxied (orange) in Cloudflare. Leave ws.ftljeta.cloud grey — Cloudflare Free does not reliably sustain WebSocket connections.

Set prod env vars on ftl-prd-api:

```
AZURE_CONFIG_DIR=~/.azure-ftl az containerapp update \
  -g ftl-prd-rg-cin -n ftl-prd-api \
  --set-env-vars "COOKIE_DOMAIN=.ftljeta.cloud" \
                 "CORS_ORIGINS=https://ftl.jetafutures.com"
```

Build and deploy the SPA with prod values:

```
npm run build

SWA_TOKEN=$(AZURE_CONFIG_DIR=~/.azure-ftl az staticwebapp secrets list \
  -g ftl-prd-rg-cin -n ftl-prd-frontend \
  --query properties.apiKey -o tsv)
npx -y @azure/static-web-apps-cli deploy ./dist \
  --deployment-token "$SWA_TOKEN" --env production
```

Note: The staging vs production API/WS URLs are baked in at build time via import.meta.env.MODE and import.meta.env.PROD. For a production build use npm run build (MODE=production). For a staging build use npm run build:staging (MODE=staging).

Hand the jetafutures.com registrar owner one record:
```
CNAME ftl.jetafutures.com → <SWA default hostname>
```
Wait for propagation (usually under 1 hour). Azure issues the managed cert automatically once the CNAME resolves.

Update the Google OAuth prod client in the GCP Console:

Authorized JavaScript origin: https://ftl.jetafutures.com

Authorized redirect URI: https://api.ftljeta.cloud/api/auth/google/callback

Pre-traffic verification

```
# Health checks
curl -fsS https://api.ftljeta.cloud/health
curl -fsS https://ws.ftljeta.cloud/health

# CORS preflight
curl -sSI -X OPTIONS \
  -H "Origin: https://ftl.jetafutures.com" \
  -H "Access-Control-Request-Method: GET" \
  https://api.ftljeta.cloud/api/me \
  | grep -iE "access-control|HTTP"
# Expect: 204, access-control-allow-origin: https://ftl.jetafutures.com,
#         access-control-allow-credentials: true

# SPA deep-link fallback
curl -fsS -o /dev/null -w "%{http_code}\n" https://ftl.jetafutures.com/stadium
# Expect: 200
```

Also run a cross-browser smoke test: Chrome, Firefox, Safari (macOS and iOS), and Brave with default shields. Safari is the critical test: its Intelligent Tracking Prevention (ITP) blocks cross-site cookies on reload if SameSite=None is not properly set. If Safari fails, implement a Storage Access API call after login before shipping.

Rollback

Frontend: redeploy the previous dist/ via swa deploy (~2 min).

Backend: az containerapp revision activate -g ftl-prd-rg-cin -n <app-name> --revision <previous-revision-name> for api, ws, and flusher (~30s each).

DNS: ask the jetafutures.com owner to remove the CNAME. Cloudflare records on ftljeta.cloud are fully under your control.

OAuth: changes to the prod OAuth client are additive, so no rollback is needed; the staging client is untouched.

Local dev common issues

Source: ftl-docs/guides/LOCAL-DEV-GOTCHAS.md

`schema_migrations.version` drifted

Symptom: admin_settings, feature_flags, or other later-migration tables exist in Postgres, but SELECT version FROM schema_migrations; reports a version several migrations behind the highest migrations/NNNNN_*.up.sql on disk.

Cause: earlier migrations were applied via a manual psql path or a golang-migrate run was interrupted. The version counter never moved.

Fix:

For each migration from version+1 upward, open the up.sql and confirm the change is already live (e.g. \d admin_settings for migration 21).
Apply any genuinely missing migrations as plain SQL inside a transaction.

Bump the version row:

```
UPDATE schema_migrations SET version = <N>, dirty = false;
```

Verify with the next planned migrate run.

If everything from version+1 onward is already live, just bump the row. The 2026-05-11 regression sweep did exactly this: bumped from 19 to 22 because migrations 20 and 21 were already in place.

Wallet balance stale after direct Postgres write

Symptom: UPDATE wallets SET balance = ... in psql succeeds, but /api/wallet still returns the old amount.

Cause: wallet.Service.GetByUserID is read-through with Redis as the authority. The Redis hash wallet:<user_id> field balance wins whenever it exists. See internal/wallet/service.go GetByUserID.

Fix: also write Redis:

```
docker exec ftl-redis redis-cli HSET wallet:<user-id> balance <amount>
```

Or restart ftl-api-server to drop the hash and let the next read fall through to Postgres.
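The read-through order explains the stale value. The Go sketch below uses in-memory maps as hypothetical stand-ins for the Redis hash and the Postgres wallets table. It is not the wallet.Service code; it only reproduces the lookup order described under Cause.

```go
package main

import "fmt"

// walletReader models a read-through lookup: the cache (Redis hash
// wallet:<user_id>, field balance) wins whenever the entry exists; otherwise
// the value is read from the database (Postgres) and cached.
type walletReader struct {
	cache map[string]int64 // stand-in for Redis
	db    map[string]int64 // stand-in for Postgres
}

func (w *walletReader) GetBalance(userID string) (int64, bool) {
	if bal, ok := w.cache[userID]; ok {
		return bal, true // cache hit: the database is never consulted
	}
	bal, ok := w.db[userID]
	if ok {
		w.cache[userID] = bal // populate the cache for later reads
	}
	return bal, ok
}

func main() {
	w := &walletReader{cache: map[string]int64{}, db: map[string]int64{"u1": 500}}
	first, _ := w.GetBalance("u1") // miss: reads 500 from db and caches it
	w.db["u1"] = 900               // direct psql write; cache not updated
	second, _ := w.GetBalance("u1")
	fmt.Println(first, second) // 500 500
	delete(w.cache, "u1")      // drop the cached entry
	third, _ := w.GetBalance("u1")
	fmt.Println(third) // 900
}
```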

Replay prices don't move in the UI

Symptom: go run ./scripts/local-replay/ runs without error, but prices don't move. Redis pub/sub publishes, but no instrument hash exists at the expected key.

Cause: replay scripts target sportmonks_id 1000–1049. If those instruments are missing from Postgres, the WS server drops the broadcast.

Fix:

```
docker exec ftl-postgres psql -U ftl_admin -d ftl2026 \
  -c "SELECT COUNT(*) FROM instruments WHERE sportmonks_id BETWEEN 1000 AND 1049;"
```

If the count is 0:

```
DATABASE_URL="postgres://ftl_admin:localdev@localhost:5432/ftl2026?sslmode=disable" \
  go run ./scripts/seed
docker restart ftl-api-server   # re-hydrates Redis instrument hashes on boot
```

api-server's HydrateRedis only runs at startup — adding instruments to Postgres without restarting leaves Redis without the keys the replay targets.
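The restart requirement follows from hydration running only once. The Go sketch below is a simplified model of that lifecycle. `priceHub`, `hydrate`, and `broadcast` are hypothetical names, and a single in-memory map stands in for the Redis instrument hashes that the api-server writes and the WS server reads. A tick for a sportmonks_id that was not present at hydration time is dropped, just as the replay ticks are.

```go
package main

import "fmt"

// priceHub stands in for the Redis instrument hashes the WS server relies on.
type priceHub struct {
	known map[int]bool // sportmonks_id -> instrument hash exists
}

// hydrate copies instrument IDs from the database once, like HydrateRedis
// running at api-server startup.
func (h *priceHub) hydrate(dbIDs []int) {
	h.known = make(map[int]bool, len(dbIDs))
	for _, id := range dbIDs {
		h.known[id] = true
	}
}

// broadcast reports whether a price tick would be delivered; ticks for
// instruments missing at hydration time are dropped.
func (h *priceHub) broadcast(sportmonksID int, price float64) bool {
	if !h.known[sportmonksID] {
		return false
	}
	// Fan-out of price to subscribers omitted.
	return true
}

func main() {
	db := []int{} // instruments table empty at boot
	hub := &priceHub{}
	hub.hydrate(db)

	db = append(db, 1000)                  // seed adds the instrument later
	fmt.Println(hub.broadcast(1000, 1.25)) // false: not hydrated yet
	hub.hydrate(db)                        // the "restart"
	fmt.Println(hub.broadcast(1000, 1.25)) // true
}
```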

Activity feed empty after a successful test trade

Symptom: trade returns 200 filled but /api/activity returns {"count":0,"events":[]}.

Cause: the activity-feed gate in internal/activity/service.go (shouldAppend) intentionally drops any trade unless quantity >= 10 or total >= ₹2000. Micro trades produce no feed entry.

Fix: trigger a trade of at least 10 shares, or any trade that crosses the ₹2000 total threshold.
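The gate reduces to a single boolean expression. The Go sketch below is minimal and assumes the trade total is held in paise, so ₹2000 is 200000. The real shouldAppend in internal/activity/service.go may take different arguments.

```go
package main

import "fmt"

const (
	minQuantity   = 10
	minTotalPaise = 2000 * 100 // ₹2000 expressed in paise (assumed unit)
)

// shouldAppend keeps a trade in the activity feed only if it is large enough
// by quantity OR by total value; anything smaller is dropped.
func shouldAppend(quantity, totalPaise int64) bool {
	return quantity >= minQuantity || totalPaise >= minTotalPaise
}

func main() {
	fmt.Println(shouldAppend(5, 150_000)) // false: 5 shares, ₹1500
	fmt.Println(shouldAppend(10, 1_000))  // true: quantity threshold met
	fmt.Println(shouldAppend(1, 200_000)) // true: exactly ₹2000
}
```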

Goroutine-side notifications need a few seconds to surface

Symptom: /api/notifications/unread-count shows a new notification, but a follow-up /api/activity call doesn't have the event row yet.

Cause: the activity append runs in a detached goroutine (go func() { ... }()) launched from the trade handler, with a 2-second timeout. It is best-effort: nothing waits for it to finish.

Fix: wait at least 2 seconds before asserting the feed, or use a retry loop. E2E tests should use await page.waitForResponse(...) rather than asserting immediately.
