Runbooks

Condensed execution guides. Each section links to the full source document in ftl-docs/guides/ for complete step-by-step detail.


CFD margin trading cutover

Source: ftl-docs/guides/CFD-CUTOVER.md

Estimated window: 30 minutes (15 min pre-flight, 10 min cutover, 5 min smoke)

When to run: during a no-match window, after staging has held a green end-to-end test for at least 48 hours.

This runbook switches production from the legacy buy-and-hold AMM to the CFD margin trading model.

Pre-flight

  1. Confirm staging CI is green on release/staging for all three repos (ftl-backend, ftl-frontend, ftl-docs):

    gh run list --repo JetaFutures/ftl-backend  --branch release/staging --limit 1 --json status,conclusion
    gh run list --repo JetaFutures/ftl-frontend --branch release/staging --limit 1 --json status,conclusion
    gh run list --repo JetaFutures/ftl-docs     --branch release/staging --limit 1 --json status,conclusion
    
    All three must report status:completed, conclusion:success.

  2. Confirm the staging migrate job ran at the same SHA as release/staging:

    AZURE_CONFIG_DIR=~/.azure-ftl az containerapp job execution list \
      -n ftl-stg-migrate -g ftl-stg-rg-cin \
      --query '[0].{status:properties.status, end:properties.endTime}' -o json
    

  3. Run a 5-minute k6 load test on staging:

    cd ftl-frontend
    k6 run --vus 50 --duration 5m loadtest-cfd.js
    
    Acceptance: zero 5xx, P99 < 250 ms on POST /api/positions/open.

  4. Take a manual (on-demand) backup of prod Postgres:

    AZURE_CONFIG_DIR=~/.azure-ftl az postgres flexible-server backup create \
      --resource-group ftl-prd-rg-cin --name ftl-prd-pg \
      --backup-name "pre-cfd-cutover-$(date +%Y%m%d-%H%M)"
    
    Record the backup name and the UTC time it completed. The rollback plan restores to that timestamp.
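The CI check in pre-flight step 1 can be scripted so the result is not judged by eye. The following is a minimal Go sketch, not part of the FTL repos: it assumes the JSON printed by gh run list --json status,conclusion is passed as the first argument, and the name runsAreGreen is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// ciRun mirrors the two fields requested with --json status,conclusion.
type ciRun struct {
	Status     string `json:"status"`
	Conclusion string `json:"conclusion"`
}

// runsAreGreen reports whether the payload holds at least one run and every
// run is completed with a successful conclusion.
func runsAreGreen(payload []byte) (bool, error) {
	var runs []ciRun
	if err := json.Unmarshal(payload, &runs); err != nil {
		return false, fmt.Errorf("decode gh output: %w", err)
	}
	if len(runs) == 0 {
		return false, nil // no run found on the branch counts as not green
	}
	for _, r := range runs {
		if r.Status != "completed" || r.Conclusion != "success" {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	// Sample payload; pass real gh output as the first argument to check it.
	payload := `[{"status":"completed","conclusion":"success"}]`
	if len(os.Args) > 1 {
		payload = os.Args[1]
	}
	green, err := runsAreGreen([]byte(payload))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	if !green {
		fmt.Println("not green")
		os.Exit(1)
	}
	fmt.Println("green")
}
```

Saved under a hypothetical file name such as cigreen.go, it could be run once per repo as go run cigreen.go "$(gh run list --repo JetaFutures/ftl-backend --branch release/staging --limit 1 --json status,conclusion)". A non-zero exit code then stops a wrapping script before the cutover proceeds.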

The flip

  1. Optionally block user trades during cutover (return 503 for protected routes):

    AZURE_CONFIG_DIR=~/.azure-ftl bash ftl-infra/scripts/flag.sh prd coming_soon true
    

  2. Drain legacy share positions — run the migration script in dry-run mode first, then live:

    # Resolve the DB URL from the workspace root, where ftl-infra/ is checked out.
    DB_URL="$(AZURE_CONFIG_DIR=~/.azure-ftl bash ftl-infra/scripts/print-db-url.sh prd)"
    (cd ftl-backend && go run ./scripts/migrate-legacy-positions.go -db "$DB_URL" -dry-run)
    # Review output. If it looks correct:
    (cd ftl-backend && go run ./scripts/migrate-legacy-positions.go -db "$DB_URL")
    

  3. Promote branches into release/prod (fast-forward only):

    for r in ftl-backend ftl-frontend; do
      (cd $r && git fetch origin && git checkout release/prod && \
       git merge --ff-only origin/release/staging && git push)
    done
    

  4. Approve the GitHub Environment gate — ftl-backend first. Wait for /health smoke in CI to go green before approving ftl-frontend.

  5. Confirm Single revision mode before and after deploy:

    AZURE_CONFIG_DIR=~/.azure-ftl az containerapp show -n ftl-prd-api -g ftl-prd-rg-cin \
      --query 'properties.configuration.activeRevisionsMode' -o tsv
    # Must print: Single
    

Smoke tests

Run from a dev machine against prod with a QA account cookie jar:

# Open a position
curl -X POST https://api.ftljeta.cloud/api/positions/open \
  -H 'Content-Type: application/json' \
  -H "Idempotency-Key: $(uuidgen)" \
  -b "$COOKIE_JAR" \
  -d '{"instrumentId":"<known-id>","direction":"long","lotSize":0.01}'

# List open positions (URL quoted so the shell does not glob the "?")
curl -b "$COOKIE_JAR" "https://api.ftljeta.cloud/api/positions?status=open"

# Close the position (replace <id> with the id returned by the open call)
curl -X POST -b "$COOKIE_JAR" -H 'Content-Type: application/json' \
  -d "{\"clientRequestId\":\"smoke-$(date +%s)\"}" \
  "https://api.ftljeta.cloud/api/positions/<id>/close"

Acceptance: all requests return 2xx; closedBy on the closed position is user; balance after close equals balance before open plus realised PnL.
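The balance criterion is plain arithmetic, so it can be checked mechanically. The sketch below is an illustration only: it assumes balances and PnL are read as integer minor units so the comparison is exact, and the function name balanceReconciles and the sample figures are hypothetical.

```go
package main

import "fmt"

// balanceReconciles reports whether the balance after closing equals the
// balance before opening plus the realised PnL. All amounts are integer
// minor units (an assumption) so no floating-point tolerance is needed.
func balanceReconciles(beforeOpen, afterClose, realisedPnL int64) bool {
	return afterClose == beforeOpen+realisedPnL
}

func main() {
	// Illustrative numbers only: 10,000.00 before the trade, a loss of 12.50.
	fmt.Println(balanceReconciles(1_000_000, 998_750, -1_250)) // true
	// The same balances with the PnL sign flipped do not reconcile.
	fmt.Println(balanceReconciles(1_000_000, 998_750, 1_250)) // false
}
```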

Re-open

AZURE_CONFIG_DIR=~/.azure-ftl bash ftl-infra/scripts/flag.sh prd coming_soon false

Rollback

If smoke fails or a critical bug surfaces within the first hour:

# Revert all three Container Apps to the previous revision
AZURE_CONFIG_DIR=~/.azure-ftl bash ftl-infra/scripts/revert-revision.sh prd ftl-prd-api     -1
AZURE_CONFIG_DIR=~/.azure-ftl bash ftl-infra/scripts/revert-revision.sh prd ftl-prd-ws      -1
AZURE_CONFIG_DIR=~/.azure-ftl bash ftl-infra/scripts/revert-revision.sh prd ftl-prd-flusher -1

Revert the frontend bundle:

AZURE_CONFIG_DIR=~/.azure-ftl bash ftl-infra/scripts/revert-fe-bundle.sh prd

Migrations 000024–000026 are additive (new tables, new columns with defaults). The reverted code ignores the new tables — no schema rollback needed unless a data-corruption bug is found. If that happens, restore from the PITR backup taken in pre-flight step 4:

AZURE_CONFIG_DIR=~/.azure-ftl az postgres flexible-server restore \
  --resource-group ftl-prd-rg-cin \
  --restore-time "<timestamp from pre-flight backup>" \
  --source-server ftl-prd-pg --name ftl-prd-pg-rolled-back

Write a post-mortem in ftl-docs/reports/cfd-cutover-rollback-<date>.md within 48 hours.


Path 2 production deployment

Source: ftl-docs/guides/22-prod-path2-cutover.md and ftl-docs/guides/23-path2-deployment-runbook.md

Decision: the split-domain topology (Path 2) was chosen because the registrar for jetafutures.com cannot add the NS records needed to delegate a subdomain to Azure. That same registrar constraint blocks Path 1 (single domain, Azure DNS delegation).

Final topology

flowchart TB
    browser --> spa["ftl.jetafutures.com\n(Azure SWA + Cloudflare)"]
    spa -- fetch/wss --> api["api.ftljeta.cloud\n(Azure Container Apps + Cloudflare proxy)"]
    spa -- wss --> ws["ws.ftljeta.cloud\n(Azure Container Apps + Cloudflare DNS-only)"]
    api --> data["Postgres Flex · Redis · Key Vault · ACR"]
    ws --> data

| Component | Domain | Provider |
| --- | --- | --- |
| Frontend SPA | ftl.jetafutures.com | Azure SWA Standard (eastasia), Cloudflare-proxied |
| API | api.ftljeta.cloud | Azure Container Apps (centralindia), Cloudflare proxy (orange) |
| WS | ws.ftljeta.cloud | Azure Container Apps (centralindia), Cloudflare DNS-only (grey, permanent) |

Cookie strategy: Domain=.ftljeta.cloud; SameSite=None; Secure; Partitioned; HttpOnly. WS auth uses a JWT in the query string, so the cookie is not relevant for WebSocket upgrades.

Required one-time code changes

These are already merged if the staging dry-run was completed. Verify with git log.

ftl-backend/internal/auth/cookies.go — uses SameSite=None; Secure; Partitioned when cfg.Secure == true (production/staging), and SameSite=Lax in local dev (HTTP). Both cookie functions (SetAccessCookie, SetRefreshCookie) and ClearAuthCookies follow the same buildCookie helper, so the attribute is applied consistently.
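To make the attribute switch concrete, here is a minimal sketch of what such a helper could look like. It is not the code in cookies.go: the function name buildCookieHeader, the cookie name access_token, and the hasAttr helper are hypothetical, and only the attribute choices come from the description above.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// buildCookieHeader is a hypothetical stand-in for the buildCookie helper.
// It returns the Set-Cookie header value for one auth cookie.
func buildCookieHeader(name, value, domain string, secure bool) string {
	c := http.Cookie{
		Name:     name,
		Value:    value,
		Path:     "/",
		Domain:   domain,
		HttpOnly: true,
		SameSite: http.SameSiteLaxMode, // local dev over plain HTTP
	}
	if !secure {
		return c.String()
	}
	c.Secure = true
	c.SameSite = http.SameSiteNoneMode
	// Go 1.23 added http.Cookie.Partitioned; appending the attribute by hand
	// keeps this sketch building on older toolchains.
	return c.String() + "; Partitioned"
}

// hasAttr reports whether a Set-Cookie value carries the given attribute.
func hasAttr(header, attr string) bool {
	for _, part := range strings.Split(header, ";") {
		if strings.EqualFold(strings.TrimSpace(part), attr) {
			return true
		}
	}
	return false
}

func main() {
	prod := buildCookieHeader("access_token", "abc", ".ftljeta.cloud", true)
	dev := buildCookieHeader("access_token", "abc", "", false)
	fmt.Println("prod:", prod)
	fmt.Println("dev: ", dev)
	fmt.Println("prod partitioned:", hasAttr(prod, "Partitioned"))
}
```

Routing set and clear paths through one builder is what keeps the attributes identical. Browsers generally ignore a clearing cookie whose Domain and Path differ from the cookie being removed. Note that Go serialises the domain without its leading dot, which browsers treat the same way.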

ftl-backend/cmd/api-server/main.go — reads CORS_ORIGINS env var (comma-separated list of allowed origins); falls back to http://localhost:5173,http://localhost:8080 when unset.

ftl-backend/cmd/ws-server/main.go — builds the originAllow map from CORS_ORIGINS env var; falls back to http://localhost:5173 when unset.
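The CORS_ORIGINS handling described for both servers amounts to splitting a comma-separated list into a lookup set, with a development fallback. A minimal sketch under that reading follows. The names parseOrigins and devOrigins are hypothetical, and the real servers may differ in detail, for example in how they treat whitespace.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// devOrigins is the api-server fallback quoted above, used when the variable is unset.
const devOrigins = "http://localhost:5173,http://localhost:8080"

// parseOrigins turns a comma-separated list into a lookup set, ignoring blank
// entries and surrounding whitespace. An empty list yields the fallback.
func parseOrigins(raw, fallback string) map[string]bool {
	if strings.TrimSpace(raw) == "" {
		raw = fallback
	}
	allowed := make(map[string]bool)
	for _, o := range strings.Split(raw, ",") {
		if o = strings.TrimSpace(o); o != "" {
			allowed[o] = true
		}
	}
	return allowed
}

func main() {
	allowed := parseOrigins(os.Getenv("CORS_ORIGINS"), devOrigins)
	for _, origin := range []string{"https://ftl.jetafutures.com", "http://localhost:5173"} {
		fmt.Printf("%-30s allowed=%v\n", origin, allowed[origin])
	}
}
```

The lookup is an exact string match, so a trailing slash or a different scheme in CORS_ORIGINS would cause the origin to be rejected.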

ftl-frontend/src/lib/api.ts — selects the API base URL at build time: staging build (MODE=staging) uses https://api.ftljeta.cloud/api; production build uses https://prd-api.ftljeta.cloud/api; local dev falls back to /api.

ftl-frontend/src/hooks/useWebSocket.ts — selects the WS base URL at build time: staging build uses wss://ws.ftljeta.cloud; production build uses wss://prd-ws.ftljeta.cloud; local dev derives the URL from window.location.host.

ftl-frontend/public/staticwebapp.config.json — has navigationFallback rule so deep-link reloads (/stadium, /leaderboard, etc.) serve index.html.

Execution order

  1. Provision prod backend infra following ftl-docs/guides/22-prod-path2-cutover.md §2.3. Separate resource group ftl-prd-rg-cin. Use prod SKUs (Postgres production tier, Redis Standard C1 minimum, ws min=3, api min=2, flusher min=1).

  2. Capture backend FQDNs after deploy:

    API_FQDN=$(AZURE_CONFIG_DIR=~/.azure-ftl az containerapp show \
      -g ftl-prd-rg-cin -n ftl-prd-api \
      --query properties.configuration.ingress.fqdn -o tsv)
    WS_FQDN=$(AZURE_CONFIG_DIR=~/.azure-ftl az containerapp show \
      -g ftl-prd-rg-cin -n ftl-prd-ws \
      --query properties.configuration.ingress.fqdn -o tsv)
    

  3. Add Cloudflare DNS records on ftljeta.cloud — all DNS-only (grey) initially:

| Type | Name | Target | Proxy |
| --- | --- | --- | --- |
| CNAME | api | $API_FQDN | DNS-only (grey) initially |
| CNAME | ws | $WS_FQDN | DNS-only (grey) permanently |
| TXT | asuid.api | CAE customDomainVerificationId | N/A |
| TXT | asuid.ws | Same value | N/A |

Get the verification ID:

AZURE_CONFIG_DIR=~/.azure-ftl az containerapp env show \
  -g ftl-prd-rg-cin -n ftl-prd-cae-cin \
  --query 'properties.customDomainConfiguration.customDomainVerificationId' -o tsv

  4. Bind custom hostnames on the Container Apps (issues TLS certs):

    AZURE_CONFIG_DIR=~/.azure-ftl az containerapp hostname add \
      -g ftl-prd-rg-cin -n ftl-prd-api --hostname api.ftljeta.cloud
    AZURE_CONFIG_DIR=~/.azure-ftl az containerapp hostname bind \
      -g ftl-prd-rg-cin -n ftl-prd-api --hostname api.ftljeta.cloud \
      --environment ftl-prd-cae-cin --validation-method CNAME
    
    AZURE_CONFIG_DIR=~/.azure-ftl az containerapp hostname add \
      -g ftl-prd-rg-cin -n ftl-prd-ws --hostname ws.ftljeta.cloud
    AZURE_CONFIG_DIR=~/.azure-ftl az containerapp hostname bind \
      -g ftl-prd-rg-cin -n ftl-prd-ws --hostname ws.ftljeta.cloud \
      --environment ftl-prd-cae-cin --validation-method CNAME
    
    Each bind call takes up to 20 minutes. Output shows bindingType: SniEnabled on success.

  5. After the certificate is issued for api.ftljeta.cloud, flip that record to Proxied (orange) in Cloudflare. Leave ws.ftljeta.cloud grey: Cloudflare Free does not reliably sustain WebSocket connections.

  6. Set prod env vars on ftl-prd-api:

    AZURE_CONFIG_DIR=~/.azure-ftl az containerapp update \
      -g ftl-prd-rg-cin -n ftl-prd-api \
      --set-env-vars "COOKIE_DOMAIN=.ftljeta.cloud" \
                     "CORS_ORIGINS=https://ftl.jetafutures.com"
    

  7. Build and deploy the SPA with prod values:

    npm run build
    
    SWA_TOKEN=$(AZURE_CONFIG_DIR=~/.azure-ftl az staticwebapp secrets list \
      -g ftl-prd-rg-cin -n ftl-prd-frontend \
      --query properties.apiKey -o tsv)
    npx -y @azure/static-web-apps-cli deploy ./dist \
      --deployment-token "$SWA_TOKEN" --env production
    
    Note: The staging vs production API/WS URLs are baked in at build time via import.meta.env.MODE and import.meta.env.PROD. For a production build use npm run build (MODE=production). For a staging build use npm run build:staging (MODE=staging).

  8. Hand the jetafutures.com registrar owner one record:

    CNAME ftl.jetafutures.com → <SWA default hostname>
    
    Wait for propagation (usually under 1 hour). Azure issues the managed cert automatically once the CNAME resolves.

  9. Update the Google OAuth prod client in GCP Console:

      • Authorized JS origin: https://ftl.jetafutures.com
      • Authorized redirect URI: https://api.ftljeta.cloud/api/auth/google/callback

Pre-traffic verification

# Health checks
curl -fsS https://api.ftljeta.cloud/health
curl -fsS https://ws.ftljeta.cloud/health

# CORS preflight
curl -sSI -X OPTIONS \
  -H "Origin: https://ftl.jetafutures.com" \
  -H "Access-Control-Request-Method: GET" \
  https://api.ftljeta.cloud/api/me \
  | grep -iE "access-control|HTTP"
# Expect: 204, access-control-allow-origin: https://ftl.jetafutures.com,
#         access-control-allow-credentials: true

# SPA deep-link fallback
curl -fsS -o /dev/null -w "%{http_code}\n" https://ftl.jetafutures.com/stadium
# Expect: 200

Also run cross-browser smoke: Chrome, Firefox, Safari (macOS and iOS), and Brave with default shields. Safari is the critical test — cross-root ITP blocks cookies on reload if SameSite=None is not properly set. If Safari fails: implement a Storage Access API call after login before shipping.

Rollback

  • Frontend: redeploy the previous dist/ via swa deploy (~2 min).
  • Backend: az containerapp revision activate -g ftl-prd-rg-cin -n <app-name> --revision <previous-revision-name> for api, ws, and flusher (~30s each).
  • DNS: ask the jetafutures.com owner to remove the CNAME. Cloudflare records on ftljeta.cloud are fully under your control.
  • OAuth: changes to the prod OAuth client are additive — no rollback needed; the staging client is untouched.

Local dev common issues

Source: ftl-docs/guides/LOCAL-DEV-GOTCHAS.md

schema_migrations.version drifted

Symptom: admin_settings, feature_flags, or other later-migration tables exist in Postgres, but SELECT version FROM schema_migrations; reports a version several migrations behind the highest migrations/NNNNN_*.up.sql on disk.

Cause: earlier migrations were applied via a manual psql path or a golang-migrate run was interrupted. The version counter never moved.

Fix:

  1. For each migration from version+1 upward, open the up.sql and confirm the change is already live (e.g. \d admin_settings for migration 21).
  2. Apply any genuinely missing migrations as plain SQL inside a transaction.
  3. Bump the version row:
    UPDATE schema_migrations SET version = <N>;
    
  4. Verify with the next planned migrate run.

If everything from version+1 onward is already live, just bump the row. The 2026-05-11 regression sweep did exactly this: bumped from 19 to 22 because migrations 20 and 21 were already in place.
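Step 1 of the fix starts from the list of migrations above the recorded version. A small sketch of that comparison is shown below. It assumes golang-migrate style file names (digits, underscore, name, .up.sql). The function name pendingVersions and the sample file names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strconv"
)

// upFile matches base names such as 000021_admin_settings.up.sql.
var upFile = regexp.MustCompile(`^(\d+)_.*\.up\.sql$`)

// pendingVersions returns every on-disk up-migration version greater than
// the version recorded in schema_migrations, in ascending order.
func pendingVersions(files []string, recorded int) []int {
	var pending []int
	for _, name := range files {
		m := upFile.FindStringSubmatch(name)
		if m == nil {
			continue // down migrations and unrelated files are skipped
		}
		v, err := strconv.Atoi(m[1])
		if err != nil {
			continue
		}
		if v > recorded {
			pending = append(pending, v)
		}
	}
	sort.Ints(pending)
	return pending
}

func main() {
	// Hypothetical directory listing; only the numeric prefixes matter here.
	files := []string{
		"000019_a.up.sql", "000019_a.down.sql",
		"000020_b.up.sql", "000021_c.up.sql", "000022_d.up.sql",
	}
	fmt.Println(pendingVersions(files, 19)) // [20 21 22]
}
```

Each version in the result is one up.sql to inspect against the live schema before the version row is bumped. The function expects base names, so paths returned by a directory walk would need filepath.Base first.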

Wallet balance stale after direct Postgres write

Symptom: UPDATE wallets SET balance = ... in psql succeeds, but /api/wallet still returns the old amount.

Cause: wallet.Service.GetByUserID is read-through with Redis as the authority. The Redis hash wallet:<user_id> field balance wins whenever it exists. See internal/wallet/service.go GetByUserID.

Fix: also write Redis:

docker exec ftl-redis redis-cli HSET wallet:<user_id> balance <amount>

Or restart ftl-api-server to drop the hash and let the next read fall through to Postgres.
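The behaviour is easier to see in a toy model of a read-through cache. The sketch below is not the code in internal/wallet/service.go: plain maps stand in for Redis and Postgres, the walletStore type is hypothetical, and balances are simplified to int64.

```go
package main

import "fmt"

// walletStore is a toy model: cache stands in for the Redis hash
// wallet:<user_id>, db for the Postgres wallets table.
type walletStore struct {
	cache map[string]int64
	db    map[string]int64
}

// GetByUserID is read-through: a cached balance wins whenever it exists;
// otherwise the value is loaded from db and cached for later reads.
func (s *walletStore) GetByUserID(id string) (int64, bool) {
	if bal, ok := s.cache[id]; ok {
		return bal, true
	}
	bal, ok := s.db[id]
	if ok {
		s.cache[id] = bal
	}
	return bal, ok
}

func main() {
	s := &walletStore{
		cache: map[string]int64{},
		db:    map[string]int64{"u1": 500},
	}
	first, _ := s.GetByUserID("u1") // cache miss: loads 500 and caches it
	s.db["u1"] = 900                // direct database write, cache untouched
	stale, _ := s.GetByUserID("u1") // cache hit: still the old value
	delete(s.cache, "u1")           // dropping the cached entry
	fresh, _ := s.GetByUserID("u1") // falls through to the database again
	fmt.Println(first, stale, fresh) // 500 500 900
}
```

Both fixes above map onto this model: HSET overwrites the cached value directly, while dropping the hash forces the next read back to the database.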

Replay prices don't move in the UI

Symptom: go run ./scripts/local-replay/ runs without error but prices don't move. Redis pub/sub publishes but no instrument hash exists at the expected key.

Cause: replay scripts target sportmonks_id 1000–1049. If those instruments are missing from Postgres, the WS server drops the broadcast.

Fix:

docker exec ftl-postgres psql -U ftl_admin -d ftl2026 \
  -c "SELECT COUNT(*) FROM instruments WHERE sportmonks_id BETWEEN 1000 AND 1049;"

If the count is 0:

DATABASE_URL="postgres://ftl_admin:localdev@localhost:5432/ftl2026?sslmode=disable" \
  go run ./scripts/seed
docker restart ftl-api-server   # re-hydrates Redis instrument hashes on boot

api-server's HydrateRedis only runs at startup, so adding instruments to Postgres without restarting leaves Redis without the keys the replay targets.

Activity feed empty after a successful test trade

Symptom: trade returns 200 filled but /api/activity returns {"count":0,"events":[]}.

Cause: the activity-feed gate in internal/activity/service.go shouldAppend intentionally keeps only trades with quantity >= 10 OR total >= ₹2000. Smaller (micro) trades produce no feed entry.

Fix: trigger a trade of at least 10 shares, or any trade that crosses the ₹2000 total threshold.
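The gate reduces to a single boolean expression. The sketch below restates it for quick reference. The parameter types and constant names are assumptions, and the real shouldAppend may take a richer trade struct.

```go
package main

import "fmt"

const (
	minQuantity = 10   // shares
	minTotal    = 2000 // rupees
)

// shouldAppend mirrors the described gate: a trade reaches the activity feed
// when either threshold is met; everything smaller is dropped.
func shouldAppend(quantity int, total float64) bool {
	return quantity >= minQuantity || total >= minTotal
}

func main() {
	fmt.Println(shouldAppend(1, 150))  // micro trade: false
	fmt.Println(shouldAppend(10, 150)) // quantity threshold met: true
	fmt.Println(shouldAppend(2, 2500)) // total threshold met: true
}
```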

Goroutine-side notifications need a few seconds to surface

Symptom: /api/notifications/unread-count shows a new notification, but a follow-up /api/activity call doesn't have the event row yet.

Cause: the trade handler launches the activity append in a detached goroutine (go func() { ... }()) with a 2-second timeout. The append is best-effort, so the event row can land after the trade response has already returned.

Fix: wait at least 2 seconds before asserting the feed, or use a retry loop. E2E tests should use await page.waitForResponse(...) rather than asserting immediately.
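For Go-side integration tests, the retry loop can be a small polling helper bounded by a context. This is a sketch under assumptions: pollUntil and simulate are hypothetical names, and the check function stands in for "GET /api/activity now contains the event".

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// pollUntil calls check immediately and then once per interval until it
// returns true, returns an error, or ctx expires.
func pollUntil(ctx context.Context, interval time.Duration, check func() (bool, error)) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		ok, err := check()
		if err != nil {
			return err
		}
		if ok {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
		}
	}
}

// simulate polls a fake check that turns true on the succeedAfter-th call.
// It returns how many calls were made and the polling error, if any.
func simulate(succeedAfter, timeoutMs, intervalMs int) (int, error) {
	calls := 0
	check := func() (bool, error) {
		calls++
		return calls >= succeedAfter, nil
	}
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(timeoutMs)*time.Millisecond)
	defer cancel()
	err := pollUntil(ctx, time.Duration(intervalMs)*time.Millisecond, check)
	return calls, err
}

func main() {
	// The event "appears" on the third poll, well inside the 5-second budget.
	calls, err := simulate(3, 5000, 100)
	fmt.Println(calls, err) // 3 <nil>
}
```

A budget somewhat above the 2-second append timeout, for example 5 seconds, leaves headroom without slowing the passing case, because the loop returns as soon as the check succeeds.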