Handoff: PR #553 — production security hardening and deploy
date: 2026-05-31T18:19:37+0900 git_commit: a15ec82bb6d7342ae9cea9a30229bc7b16ab16a1 branch: main issue: PR #553 tests: passing with one unrelated e2e failure noted
Handoff: PR #553 — production security hardening and deploy
What I was doing
Security-hardening exponential after finding that .env.prod.backup contained production secrets. The risky backup file was deleted locally, env files were hardened out of Docker build context, high-risk security fixes were implemented, pushed, and deployed to production ECS.
PR: https://github.com/namuh-eng/exponential/pull/553
Current State
- Branch:
mainnow contains the merged security hardening via merge commitb26f6a4. Source branch wascodex/fix-production-session-secret. - Production deploy: completed manually from this branch using image tag
a15ec82.
- API: exponential-api:36, running 699486076867.dkr.ecr.us-east-1.amazonaws.com/exponential-api:a15ec82.
- Web: exponential-web:40, running 699486076867.dkr.ecr.us-east-1.amazonaws.com/exponential-web:a15ec82.
- ECS rollout state: COMPLETED, desired/running 1/1 for both services.
- Prod smoke passed: web root,
/api/healthz, and token-authenticated/api/metrics/red. - Production metrics are protected now: unauthenticated
/api/metrics/redreturns404, token-authenticated returns200. - Security headers are live on
https://exponential.namuh.co/. .env.prod.backupis absent locally..envand/.envare ignored for Docker context;.env.exampleremains allowed.- Uncommitted/untracked local file:
landing.htmlonly; unrelated, do not commit unless the owner asks.
Test State
Passing:
make checkmake testcd apps/api && go test ./...pnpm audit --audit-level=moderate-> no known vulnerabilitiesgovulncheck ./...-> no vulnerabilities found- Targeted E2E:
pnpm --filter @exponential/web test:e2e -- tests/e2e/workspace-auth-methods.spec.ts tests/e2e/settings-api-oauth.spec.ts
Full E2E caveat:
make test-e2eran the real suite: 140 passed, 1 flaky passed on retry, 1 failed.- Failing test:
apps/web/tests/e2e/workspace-ai-settings.spec.tsexpected self-demotion to return 200, but API correctly returned 400 for "Use your account settings to change your own access". This appears unrelated to this security pass. - Flaky retry-pass:
tests/e2e/views.spec.tshadERR_EMPTY_RESPONSEonce, then passed on retry.
Recent Commits
a15ec82 fix: require token for production metrics smoke1d12829 fix: harden production security controlsafd86b9 fix: reject missing rendered ecs secrets85bdd41 fix: exclude env files from docker builds4f1b888 fix: route web api calls internally
Key files touched
.dockerignore— excludes.envand/.envfrom Docker contexts while allowing examples.apps/api/internal/auth/auth.go:33— principal now carries PAT scopes.apps/api/internal/auth/auth.go:97andapps/api/internal/auth/auth.go:138— PAT scope enforcement for read/write methods.apps/api/internal/auth/auth.go:162— CSRF fails closed in production when allowed origins are not configured.apps/api/internal/auth/auth.go:682— PAT scopes are loaded frompersonal_access_token.scopes, defaulting legacy empty PAT scopes to read-only.apps/api/internal/tokens/handler.go:82— unsupported PAT scopes are rejected instead of silently granting arbitrary scope strings.apps/api/internal/sanitizehtml/html.go:18— central rich-text sanitizer using bluemonday.apps/api/internal/issues/handler.go:652andapps/api/internal/issues/handler.go:755— issue descriptions sanitized on create/update.apps/api/internal/inbound/handler.go:212andapps/api/internal/inbound/handler.go:214— inbound email HTML/plaintext descriptions sanitized before issue creation.apps/api/internal/authproviders/firstparty.go:58andapps/api/internal/authproviders/firstparty.go:161— server-side workspace auth-method policy enforced for Google and magic link starts.apps/api/internal/authproviders/oauth.go:50— invalid OAuth redirect URI returns JSON 400 instead of redirecting to attacker-controlled URLs.apps/api/internal/workspaces/api_current.go:461andapps/api/internal/workspaces/api_current.go:482— server-side OAuth redirect/scope validation for workspace OAuth app create/update.apps/api/internal/http/router.go:72andapps/api/internal/http/router.go:153— RED metrics endpoint requiresEXPONENTIAL_METRICS_TOKENin production.apps/web/next.config.js:27— security headers/CSP configured.infra/ecs/api-task-definition.json:69— API task receivesEXPONENTIAL_METRICS_TOKENfrom Secrets Manager.scripts/smoke-prod.sh:8andscripts/smoke-prod.sh:46— production smoke reads the metrics token and sendsX-Metrics-Token.
Learnings
.env.prod.backupwas ignored by git and not found in git history, but it was still a Docker build-context risk until.dockerignorewas hardened. Final images checked earlier did not contain.env, but excluding all.envfrom context is the durable fix.- PR #553 was merged to
mainas merge commitb26f6a4; this handoff file is intentionally force-added under.handoffs/despite the directory normally being gitignored. - Production deploy initially required a metrics token because the new metrics guard made
/api/metrics/redprivate. A new AWS Secrets Manager secret was created:arn:aws:secretsmanager:us-east-1:699486076867:secret:exponential/metrics-token-1cAwo4, and GitHub repo variableMETRICS_TOKEN_SECRET_ARNwas set to that ARN. - A manual GitHub Actions deploy run was started on the branch and then cancelled because it stayed queued/in-progress without jobs; local break-glass deploy succeeded.
- Rotate secrets after code deploy, not before, so the deployed code can consume the new protected secret paths and metrics smoke can authenticate.
Blockers
- No current deployment blocker. Production is running the hardened images.
- Remaining security hygiene work is secret rotation by the human/operator.
- Full E2E has one unrelated failure in workspace AI settings that should be fixed separately before treating the whole suite as green.
Next steps
- Rotate any real secrets that were in
.env.prod.backup: session secret, DB password/URL, Redis auth if present, Google OAuth client secret, inbound/webhook secrets, and any copied PAT/API tokens. - After each secret rotation, force ECS service redeploy so tasks reload Secrets Manager values:
aws ecs update-service --cluster exponential-cluster --service exponential-api --force-new-deployment --region us-east-1 && aws ecs update-service --cluster exponential-cluster --service exponential-web --force-new-deployment --region us-east-1
- Re-run production smoke after rotation.
- Separately fix or update
workspace-ai-settings.spec.tsself-demotion expectation/API behavior, then re-run fullmake test-e2e.