Deployment (current state)

Companion: azure (CLI setup), prod-launch-runbook (one-time WC26 launch fixes), prod-deployment-runbook (staging → prod promotion flow).

Last verified against measured infra: 2026-06-11. Replaces the legacy May-2026 deployment doc, which referenced a development branch that no longer exists.

1. Branch flow

gitGraph
   commit id: "release/staging"
   branch dev-amal-integration-staging-<date>
   checkout dev-amal-integration-staging-<date>
   commit id: "cut from release/staging --no-track"
   branch feat/your-feature
   checkout feat/your-feature
   commit id: "work"
   checkout dev-amal-integration-staging-<date>
   merge feat/your-feature id: "PR → integration"
   checkout release/staging
   merge dev-amal-integration-staging-<date> id: "PR → release/staging"
   commit id: "CI: migrate + roll + smoke"
   branch staging-to-prod-release-<date>
   checkout staging-to-prod-release-<date>
   commit id: "cut from release/staging"
   checkout release/staging
   branch prod-integration-<date>
   checkout prod-integration-<date>
   commit id: "cut from release/prod --no-track"
   merge staging-to-prod-release-<date> id: "PR 1"
   checkout release/prod
   merge prod-integration-<date> id: "PR 2 — gated by reviewer"
   commit id: "CI: migrate + approval + roll prod"

Branch	Cut from	Convention
`feat/<slug>`	integration branch	new features
`bug/<slug>`	integration branch	bug fixes
`dev-amal-integration-staging-<date>`	`release/staging`	weekly integration branch
`release/staging`	(long-lived)	CI integration target
`staging-to-prod-release-<date>`	`release/staging`	promotion-only, no own commits
`prod-integration-<date>`	`release/prod`	review seam for prod cutover
`release/prod`	(long-lived)	production deploy target

main is NOT the integration branch

main is unused for day-to-day work. Cut feature branches off the dated integration branch (which itself is cut off release/staging with --no-track). Never branch off main.

Rules:

Cut every feat/ / bug/ branch off the current dated integration branch, never off release/staging or main.
Use --no-track on the integration cut so that the new branch has no upstream and a bare git push can never land on release/staging. Publishing the branch requires an explicit git push -u origin <branch>.
One branch = one scope. Branch again if scope expands.
Push to origin immediately after the first meaningful commit.
Every PR description includes Author: @<github-handle>.
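
The cuts described in these rules could be scripted roughly as follows. This is a minimal sketch, not the team's actual tooling: the function names are hypothetical, and the YYYY-MM-DD shape of the <date> suffix is an assumption, since this doc does not state the format.

```shell
#!/usr/bin/env bash
# Sketch of the branch cuts described above. Function names are hypothetical.
set -eu

# ASSUMPTION: the <date> suffix is formatted YYYY-MM-DD; the doc does not say.
integration_branch_name() {
  printf 'dev-amal-integration-staging-%s\n' "$1"
}

# Cut the dated integration branch from release/staging without an upstream.
cut_integration_branch() {
  local branch
  branch="$(integration_branch_name "$1")"
  git fetch origin
  # --no-track records no upstream, so under Git's default push settings a
  # bare `git push` fails instead of guessing a target.
  git switch --no-track -c "$branch" origin/release/staging
  # Publishing is therefore an explicit step.
  git push -u origin "$branch"
}

# Cut a feat/ or bug/ branch from the integration branch (never from main).
# $1 = feat|bug, $2 = slug, $3 = integration branch to cut from.
cut_work_branch() {
  local kind slug base
  kind="$1"; slug="$2"; base="$3"
  case "$kind" in
    feat|bug) ;;
    *) echo "kind must be feat or bug, got '$kind'" >&2; return 1 ;;
  esac
  git switch -c "$kind/$slug" "$base"
}
```

For example, `cut_integration_branch 2026-06-11` followed by `cut_work_branch feat your-feature "$(integration_branch_name 2026-06-11)"` would produce the first two branch cuts in the graph above.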

For the staging → prod flow specifically, see prod-deployment-runbook.

2. Services

Three Go microservices on Azure Container Apps in centralindia. Replica counts below are measured 2026-06-11 on prod (ftl-prd-rg-cin).

Service	Container App	vCPU / Memory	min / max replicas	Ingress
api-server	`ftl-prd-api`	0.5 vCPU / 1 GiB	1 / 2	external
ws-server	`ftl-prd-ws`	1.0 vCPU / 2 GiB	1 / 2	external
flusher	`ftl-prd-flusher`	0.5 vCPU / 1 GiB	1 / 1	none (internal)

Staging (ftl-stg-rg-cin):

Service	Container App	vCPU / Memory	min / max	Ingress
api-server	`ftl-stg-api`	0.5 vCPU / 1 GiB	1 / 2	external
ws-server	`ftl-stg-ws`	0.5 vCPU / 1 GiB	1 / 2	external
flusher	`ftl-stg-flusher`	0.25 vCPU / 0.5 GiB	1 / 1	none
whatsapp-bot	`ftl-stg-whatsapp-bot`	0.25 vCPU / 0.5 GiB	1 / 3	external
frontend	`ftl-stg-frontend`	0.25 vCPU / 0.5 GiB	1 / 1	external

No frontend / whatsapp-bot Container Apps on prod today — the frontend is served by Azure Static Web App ftl-prd-frontend (Free SKU). See prod-snapshot-2026-06-11.

Decision: Custom Go WebSocket server on Container Apps rather than Azure Web PubSub — 15× cheaper at 50K concurrent connections.

Decision: Redis Lua atomic scripts over a message queue — single-threaded execution, zero lock contention, no Service Bus cost.

3. Ingress endpoints

Public DNS names (custom domains) and Container Apps system FQDNs:

Prod:

Service	Custom domain (use this)	Azure FQDN (fallback)
api	`https://prd-api.ftljeta.cloud`	`https://ftl-prd-api.livelywave-a872f0a8.centralindia.azurecontainerapps.io`
ws	`wss://prd-ws.ftljeta.cloud`	`wss://ftl-prd-ws.livelywave-a872f0a8.centralindia.azurecontainerapps.io`
frontend (SWA)	`https://ftl.jetafutures.com`	`https://blue-grass-01f012000.7.azurestaticapps.net`

Staging:

Service	Custom domain (use this)	Azure FQDN (fallback)
api	`https://api.ftljeta.cloud`	`https://ftl-stg-api.kindfield-0bb54ed4.centralindia.azurecontainerapps.io`
ws	`wss://ws.ftljeta.cloud`	`wss://ftl-stg-ws.kindfield-0bb54ed4.centralindia.azurecontainerapps.io`
frontend (SWA)	`https://yellow-sand-0d6f5a100.7.azurestaticapps.net`	same

Both environments' api and ws hosts live under ftljeta.cloud: prod hostnames carry a prd- prefix (prd-api, prd-ws), while staging uses the bare names (api, ws). Because the hostnames differ only by that prefix, pointing a prod URL at staging or vice versa silently hits the wrong system. Smoke and runbook calls must use the env-correct domain.
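
One way to keep smoke and runbook calls env-correct is to resolve the hostname from an environment name instead of typing it by hand. The helper names below are hypothetical; the URLs are the custom domains from the tables above.

```shell
#!/usr/bin/env bash
# Sketch: resolve env-correct endpoints. Helper names are hypothetical.
set -eu

# Map an environment name to its public API base URL.
api_base_url() {
  case "$1" in
    prod)    echo "https://prd-api.ftljeta.cloud" ;;
    staging) echo "https://api.ftljeta.cloud" ;;
    *) echo "unknown environment '$1' (expected prod or staging)" >&2; return 1 ;;
  esac
}

# Map an environment name to its public WebSocket base URL.
ws_base_url() {
  case "$1" in
    prod)    echo "wss://prd-ws.ftljeta.cloud" ;;
    staging) echo "wss://ws.ftljeta.cloud" ;;
    *) echo "unknown environment '$1' (expected prod or staging)" >&2; return 1 ;;
  esac
}
```

A staging smoke call might then read `curl -fsS "$(api_base_url staging)/health"`, assuming /health is served at the API root, which this doc does not confirm.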

4. Resource group + Container Apps Environment

Environment	Resource group	Container Apps Environment
Staging	`ftl-stg-rg-cin`	`ftl-stg-cae-cin`
Production	`ftl-prd-rg-cin`	`ftl-prd-cae-cin`

Both environments use the Consumption workload profile, are not zone-redundant, and run in the same centralindia region.

5. Single-revision mode requirement

Always confirm Single-revision mode before scaling

Running scale-down.sh / scale-up.sh (or any az containerapp update) against an app in Multiple revision mode creates a new revision on every call. Old revisions keep their original minReplicas and keep billing until they are deactivated.

2026-05-02 incident reference: ftl-stg-api accumulated 9 revisions (10 always-warm replicas) after repeated az containerapp update calls. Actual burn was ₹1,234/day vs the ₹150/day plan. See azure-cost-audit-2026-05-02.md for the post-mortem.

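Verify on every app (prod shown; for staging, swap in the ftl-stg-* app names and -g ftl-stg-rg-cin):

# Print each app name followed by its revision mode.
for app in ftl-prd-api ftl-prd-ws ftl-prd-flusher; do
  printf '%s\t' "$app"
  AZURE_CONFIG_DIR=~/.azure-ftl az containerapp show \
    -g ftl-prd-rg-cin -n "$app" \
    --query 'properties.configuration.activeRevisionsMode' -o tsv
done
# Expected: 'Single' after every app name.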

If Multiple, fix it:

AZURE_CONFIG_DIR=~/.azure-ftl az containerapp revision set-mode \
  -g <resource-group> -n <app-name> --mode Single

Then deactivate non-traffic revisions:

# List
AZURE_CONFIG_DIR=~/.azure-ftl az containerapp revision list \
  -g <resource-group> -n <app-name> \
  --query "[?properties.active].name" -o tsv

# Deactivate each non-live
AZURE_CONFIG_DIR=~/.azure-ftl az containerapp revision deactivate \
  -g <resource-group> -n <app-name> --revision <revision-name>
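
The mode check in this section can also be wrapped as a guard that runs before any scaling command. This is a sketch under assumptions: the function names are hypothetical, and it relies only on the az containerapp show query already used in this section.

```shell
#!/usr/bin/env bash
# Sketch: refuse to continue unless an app is in Single revision mode.
set -eu

# Print the revision mode of one Container App.
# $1 = resource group, $2 = app name.
revision_mode() {
  AZURE_CONFIG_DIR=~/.azure-ftl az containerapp show \
    -g "$1" -n "$2" \
    --query 'properties.configuration.activeRevisionsMode' -o tsv
}

# Return 0 only when the app reports 'Single'; otherwise explain and fail.
require_single_mode() {
  local mode
  mode="$(revision_mode "$1" "$2")" || return 1
  if [ "$mode" != "Single" ]; then
    echo "refusing: $2 is in '$mode' revision mode, expected Single" >&2
    return 1
  fi
}
```

A caller could chain it in front of an update, for example `require_single_mode ftl-stg-rg-cin ftl-stg-api && az containerapp update ...`, so that a Multiple-mode app stops the command instead of silently accumulating revisions.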

6. Scale-to-zero overnight (STAGING ONLY)

Staging runs scale-down.sh / scale-up.sh to cap idle burn. These scripts MUST NOT be pointed at prod: they stop the Postgres Flex server and set all Container Apps to minReplicas=0, which would take prod offline.

# Night — staging only
AZURE_CONFIG_DIR=~/.azure-ftl bash ftl-infra/scripts/scale-down.sh

# Morning — staging only
AZURE_CONFIG_DIR=~/.azure-ftl bash ftl-infra/scripts/scale-up.sh

What they do:

scale-down.sh — stops the Postgres Flex server and sets --min-replicas=0 on staging Container Apps.
scale-up.sh — starts Postgres, waits 60 s, then sets --min-replicas=1 on ftl-stg-api, ftl-stg-ws, ftl-stg-flusher, and ftl-stg-frontend.
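
A guard like the following would make the staging-only rule mechanical rather than a convention. It is a hypothetical re-creation of the scale-down steps listed above, not the contents of scale-down.sh; the function names are invented here, and the Postgres server name is a parameter because this doc does not give it.

```shell
#!/usr/bin/env bash
# Sketch: scale-down steps with a staging-only guard in front.
set -eu

STAGING_RG="ftl-stg-rg-cin"

# Fail unless the target resource group is the staging one.
assert_staging_rg() {
  if [ "$1" != "$STAGING_RG" ]; then
    echo "refusing: '$1' is not $STAGING_RG; scale-to-zero is staging only" >&2
    return 1
  fi
}

# $1 = resource group, $2 = Postgres Flex server name (not given in this doc),
# remaining arguments = Container App names to scale to zero.
scale_down() {
  local rg pg app
  rg="$1"; pg="$2"; shift 2
  # Explicit `|| return 1` so the guard still stops us when called inside `if`.
  assert_staging_rg "$rg" || return 1
  AZURE_CONFIG_DIR=~/.azure-ftl az postgres flexible-server stop \
    -g "$rg" -n "$pg" || return 1
  for app in "$@"; do
    AZURE_CONFIG_DIR=~/.azure-ftl az containerapp update \
      -g "$rg" -n "$app" --min-replicas 0 || return 1
  done
}
```

With this shape, `scale_down ftl-prd-rg-cin ...` exits before any az call is made.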

Azure Flex auto-restart

Azure auto-restarts a stopped Flex server after 7 days. Because the server is started every morning and stopped again every night, that clock restarts daily and the 7-day limit is never reached during active development.

7. Staging cost reference (measured)

Snapshot of the staging RG cost over the last 11 days (June 1–11, 2026):

Service	MTD INR	Share
Redis Standard C1	1 253	36 %
PostgreSQL D2ds_v5	1 021	29 %
Container Apps	1 017	29 %
ACR Basic	158	5 %
Everything else (KV, IPs, etc.)	35	1 %
Total MTD	~3 484	100 %
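
The Share column and a rough daily figure can be re-derived from the MTD values. The sketch below feeds the table's numbers to awk; the function name is hypothetical, and the daily figure is a simple average over the 11 days, not a measured rate.

```shell
#!/usr/bin/env bash
# Sketch: recompute per-service share and average daily burn from MTD costs.
set -eu

# Input: "name|cost" lines on stdin. $1 = number of days the costs cover.
cost_shares() {
  awk -F'|' -v days="$1" '
    { name[NR] = $1; cost[NR] = $2; total += $2 }
    END {
      for (i = 1; i <= NR; i++)
        printf "%s: %d INR (%.0f%%)\n", name[i], cost[i], 100 * cost[i] / total
      printf "Total: %d INR, about %.0f INR/day over %d days\n", total, total / days, days
    }'
}

# Values copied from the table above (June 1-11, 2026).
cost_shares 11 <<'EOF'
Redis Standard C1|1253
PostgreSQL D2ds_v5|1021
Container Apps|1017
ACR Basic|158
Everything else|35
EOF
```

The line items sum to 3 484 INR, which works out to roughly 317 INR per day across the 11-day window.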

Detailed cost model + per-tier projections: cost-estimate-audit-2026-06-11.

8. CI pipeline overview

flowchart LR
    A[push to release/staging] --> B[lint + test]
    B --> C[build + push 4 images]
    C --> D[migrate job]
    D --> E[roll api / ws / flusher]
    E --> F[health smoke]
    F --> G[staging live]

    H[push to release/prod] --> I[lint + test]
    I --> J[build + push 4 images]
    J --> K[production env approval gate]
    K --> L[migrate job]
    L --> M[roll api / ws / flusher]
    M --> N[health smoke]
    N --> O[prod live]

Each push to release/staging fires deploy-staging.yml. Each push to release/prod fires deploy-prod.yml, which pauses at the production GitHub Environment approval gate. The gate only has an effect once required reviewers are enabled (see github-env-audit-2026-06-11).

After approval, CI runs migrations against prod Postgres, then rolls the three Container Apps in parallel, then smokes /health.
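
The final health smoke could look something like the retry loop below. This is a sketch only: the workflow files are not shown in this doc, so the function name, the retry count, the delay, and the assumption that /health hangs directly off the base URL are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: poll /health until it answers or the attempts run out.
set -eu

# $1 = base URL, $2 = max attempts (default 5), $3 = seconds between tries (default 3).
smoke_health() {
  local url attempts delay i
  url="$1"; attempts="${2:-5}"; delay="${3:-3}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl fail on HTTP errors, so a 5xx counts as unhealthy.
    if curl -fsS --max-time 10 "$url/health" >/dev/null; then
      echo "healthy after $i attempt(s): $url"
      return 0
    fi
    # Wait only if another attempt is coming.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "unhealthy after $attempts attempts: $url" >&2
  return 1
}
```

Retrying matters here because a freshly rolled revision may need a few seconds before it serves traffic; a single-shot check right after the roll could report a false failure.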

9. What's next (action items)

For the immediate WC26 launch sequence see today-2026-06-11. For one-off prod fixes (PG SKU, Redis tier, env vars) see prod-launch-runbook. For the staging → prod promotion flow see prod-deployment-runbook.