Atria - CDK Stack Refactoring Guide
Overview
This document summarizes the refactoring of the Atria CDK infrastructure from a single monolithic stack to a modular nested stack architecture. The primary motivation was to overcome CloudFormation's 500-resource limit that was blocking deployments.
This guide explains how to migrate from the monolithic cdk-stack.ts to a nested stack architecture that avoids the CloudFormation 500 resource limit.
Architecture
┌─────────────────────────────────────────────────────────────────────────────┐
│ CdkStackRefactored (Main) │
│ STATEFUL (preserved): │
│ • Cognito User Pool, Client, Identity Pool │
│ • Shared S3 Bucket │
│ • Shared Email Notification Lambda │
│ COORDINATION: │
│ • SSM Parameters (write ARNs/IDs for nested stacks) │
│ • Amplify App (collects all API URLs) │
│ • Stack Outputs │
└─────────────────────────────────────────────────────────────────────────────┘
│
┌───────────────┬───────────────┼───────────────┬───────────────┐
│ │ │ │ │
▼ ▼ ▼ ▼ ▼
┌──────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐
│AdminApi │ │DeviceHealth│ │Installer │ │SiteSurvey │ │GrefInstalls│
│Stack │ │Stack │ │Stack │ │Stack │ │Stack │
│~50 res │ │~80 res │ │~150 res │ │~80 res │ │~60 res │
└──────────┘ └───────────┘ └───────────┘ └───────────┘ └───────────┘
│ │
▼ ▼
┌───────────┐ ┌───────────────┐
│SitePictures│ │DeviceDashboard│
│Stack │ │Stack (existing)│
│~40 res │ │~50 res │
└───────────┘ └───────────────┘
Key Principles
1. Stateful Resources Stay in Main Stack
These resources contain user data and must be preserved across deployments:
Cognito User Pool - User accounts and authentication
Cognito Identity Pool - Federated identities
Shared S3 Bucket - User uploads, images, exports
2. SSM Parameter Store for Decoupling
Instead of passing construct references (which creates CloudFormation dependencies), we use SSM:
// Main stack WRITES:
ssmWriter.write(SSM_PATHS.COGNITO.USER_POOL_ARN, userPool.userPoolArn)
// Nested stack READS:
const userPoolArn = ssm.StringParameter.valueForStringParameter(
this,
SSM_PATHS.COGNITO.USER_POOL_ARN
)
Benefits:
No circular dependencies
Stacks can be deployed independently
Easy to reference values across stack boundaries
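The contents of lib/shared/ssm-parameters.ts are not shown in this guide. As a rough sketch, assuming the /atria/{category}/{resource-name} convention used for parameter names, the path constants could be built as follows. The ssmPath helper and the exact shape of SSM_PATHS are assumptions for illustration, not the project's actual code.

```typescript
// Hypothetical sketch of lib/shared/ssm-parameters.ts (the real file is not shown here).
const SSM_PREFIX = "/atria"

// Builds "/atria/{category}/{resource-name}" and rejects segments that are not kebab-case.
function ssmPath(category: string, resourceName: string): string {
  const kebabCase = /^[a-z0-9]+(-[a-z0-9]+)*$/
  for (const segment of [category, resourceName]) {
    if (!kebabCase.test(segment)) {
      throw new Error(`Invalid SSM path segment: "${segment}"`)
    }
  }
  return `${SSM_PREFIX}/${category}/${resourceName}`
}

// Assumed shape of the SSM_PATHS constant referenced in the snippets above.
const SSM_PATHS = {
  COGNITO: {
    USER_POOL_ID: ssmPath("cognito", "user-pool-id"),
    USER_POOL_ARN: ssmPath("cognito", "user-pool-arn"),
  },
  S3: {
    SHARED_BUCKET_NAME: ssmPath("s3", "shared-bucket-name"),
  },
  LAMBDA: {
    SHARED_EMAIL_NOTIFICATION_ARN: ssmPath("lambda", "shared-email-notification-arn"),
  },
} as const
```

Building every path through one helper keeps typos out of the parameter names, since the writer in the main stack and the readers in nested stacks share the same constants.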
3. Each Applet = One Nested Stack
Each feature area becomes its own nested stack:
DeviceHealthNestedStack - Device health monitoring
InstallerNestedStack - Installation workflow
SiteSurveyNestedStack - Site survey forms
GrefInstallsNestedStack - GREF installation tracking
SitePicturesNestedStack - Site photo management
ProvisioningNestedStack - Device provisioning
AdminApiNestedStack - Admin user management
DeviceDashboardStack - Device dashboard (already exists)
Migration Steps
Phase 1: Setup Shared Infrastructure (DO FIRST)
Create the shared utilities:
lib/shared/ssm-parameters.ts - SSM path constants and helpers
lib/shared/api-utils.ts - Reusable API Gateway configurations
lib/shared/index.ts - Exports
Update existing stack to write SSM parameters:
// Add to existing cdk-stack.ts temporarily
const ssmWriter = new SsmParameterWriter(this, "Main")
ssmWriter.write(SSM_PATHS.COGNITO.USER_POOL_ARN, userPool.userPoolArn)
// ... etc
Deploy to create SSM parameters.
Phase 2: Migrate One Stack at a Time
Recommended order (least dependencies first):
✅ DeviceDashboardStack (already done)
✅ DeviceHealthStack (example provided)
SitePicturesStack (simple, few dependencies)
GrefInstallsStack (self-contained)
ProvisioningStack (self-contained)
SiteSurveyStack (depends on shared email lambda)
InstallerStack (most complex, has step functions)
AdminApiStack (depends on Device Dashboard table)
For each migration:
Create the nested stack file in lib/stacks/
Move the relevant resources from cdk-stack.ts
Replace construct references with SSM parameter reads
Test deployment with cdk diff
Deploy and verify functionality
Remove migrated code from cdk-stack.ts
Phase 3: Switch to Refactored Main Stack
Once all nested stacks are created:
Rename cdk-stack.ts to cdk-stack-legacy.ts
Rename cdk-stack-refactored.ts to cdk-stack.ts
Update bin/cdk.ts to use the new stack
Deploy with cdk deploy
Handling Specific Resources
API Gateway Authorizers
Problem: Cognito authorizers need a User Pool reference, which creates a dependency on the stack that owns the User Pool.
Solution: Use CfnAuthorizer (an L1 construct) with the User Pool ARN passed as a string:
// Read User Pool ARN from SSM
const userPoolArn = ssm.StringParameter.valueForStringParameter(
this,
SSM_PATHS.COGNITO.USER_POOL_ARN
)
// Create authorizer using ARN (no construct dependency)
const authorizer = new apigateway.CfnAuthorizer(this, "MyAuthorizer", {
name: "MyAuthorizer",
type: "COGNITO_USER_POOLS",
restApiId: api.restApiId,
identitySource: "method.request.header.Authorization",
providerArns: [userPoolArn],
})
Cross-Stack Table Access
Problem: Stack A needs to write to a table in Stack B.
Solution: Use ARN-based permissions:
// Stack B writes table name to SSM
ssmWriter.write(SSM_PATHS.DYNAMODB.MY_TABLE, table.tableName)
// Stack A reads and grants permission by ARN
const tableName = ssm.StringParameter.valueForStringParameter(...)
lambdaRole.addToPolicy(new iam.PolicyStatement({
actions: ["dynamodb:GetItem", "dynamodb:PutItem"],
resources: [
`arn:aws:dynamodb:${cdk.Aws.REGION}:${cdk.Aws.ACCOUNT_ID}:table/${tableName}`,
],
}))
Lambda Cross-Invocation
Problem: Lambda A needs to invoke Lambda B in another stack.
Solution: Write function ARN to SSM:
// Stack B writes Lambda ARN
ssmWriter.write(SSM_PATHS.LAMBDA.MY_FUNCTION_ARN, myLambda.functionArn)
// Stack A grants invoke permission by ARN
const functionArn = ssm.StringParameter.valueForStringParameter(...)
lambdaRole.addToPolicy(new iam.PolicyStatement({
actions: ["lambda:InvokeFunction"],
resources: [functionArn],
}))
SSM Parameter Naming Convention
/atria/{category}/{resource-name}
Examples:
/atria/cognito/user-pool-id
/atria/cognito/user-pool-arn
/atria/s3/shared-bucket-name
/atria/api/device-health-url
/atria/dynamodb/installer-device-details-table
/atria/lambda/shared-email-notification-arn
Deployment Strategy
Option A: Blue-Green (Recommended for Production)
Deploy refactored stack with different ID (e.g., AtriaV2)
Test thoroughly
Update DNS/Amplify to point to new stack
Keep old stack running as backup
Delete old stack once verified
Option B: In-Place (Development/Staging)
Backup all stateful resources
Add removalPolicy: cdk.RemovalPolicy.RETAIN to stateful resources
Deploy nested stacks incrementally
Verify each migration step
Remove migrated code from main stack
Avoiding Common Pitfalls
1. Circular Dependencies
Symptom: CDK synth fails with circular dependency error
Fix: Use SSM parameters instead of direct construct references
2. Resource Replacement
Symptom: CDK wants to delete and recreate resources
Fix:
Use removalPolicy: RETAIN for stateful resources
Keep resource IDs consistent
Use tableName, bucketName, etc. to maintain physical resource names
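Keeping resource IDs consistent matters because CloudFormation tracks resources by logical ID: a stateful resource whose logical ID disappears from the template is scheduled for deletion unless it is retained. A minimal sketch of a pre-deploy check follows, assuming the currently deployed template and the newly synthesized one are available as parsed JSON. The function name, the list of stateful types, and the logical IDs in the example are hypothetical.

```typescript
// Minimal view of a CloudFormation template: logical ID -> resource definition.
interface TemplateResource {
  Type: string
}
interface Template {
  Resources: Record<string, TemplateResource>
}

// Resource types treated as stateful in this sketch; adjust to match your stack.
const STATEFUL_TYPES = [
  "AWS::Cognito::UserPool",
  "AWS::Cognito::IdentityPool",
  "AWS::S3::Bucket",
  "AWS::DynamoDB::Table",
]

// Returns logical IDs of stateful resources that exist in `current`
// but are missing from `next` or have a different type there.
function findStatefulAtRisk(current: Template, next: Template): string[] {
  const atRisk: string[] = []
  for (const logicalId of Object.keys(current.Resources)) {
    const existing: TemplateResource | undefined = current.Resources[logicalId]
    if (existing === undefined || STATEFUL_TYPES.indexOf(existing.Type) === -1) {
      continue
    }
    const replacement: TemplateResource | undefined = next.Resources[logicalId]
    if (replacement === undefined || replacement.Type !== existing.Type) {
      atRisk.push(logicalId)
    }
  }
  return atRisk
}

// Example with made-up logical IDs: the bucket is missing from the new template.
const deployedTemplate: Template = {
  Resources: {
    AtriaUserPool: { Type: "AWS::Cognito::UserPool" },
    AtriaSharedBucket: { Type: "AWS::S3::Bucket" },
    SitePicturesLambda: { Type: "AWS::Lambda::Function" },
  },
}
const synthesizedTemplate: Template = {
  Resources: {
    AtriaUserPool: { Type: "AWS::Cognito::UserPool" },
  },
}
console.log(findStatefulAtRisk(deployedTemplate, synthesizedTemplate))
```

This only catches removed or retyped logical IDs. Property changes that force replacement, such as a changed tableName, still need to be reviewed in the cdk diff output.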
3. CloudFormation Limits
Nested Stack Limits:
Each nested stack can have up to 500 resources
You can have multiple nested stacks
Max stack depth is 5 levels (root → nested1 → nested2 → etc.)
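One way to keep an eye on the per-stack limit is to count the entries under Resources in each synthesized template. The sketch below assumes the templates have already been loaded as parsed JSON. The function names and the 80% warning threshold are illustrative choices, not part of the project.

```typescript
// CloudFormation's per-stack resource limit referred to in this guide.
const CFN_RESOURCE_LIMIT = 500

// Only the part of a template this check needs.
interface CfnTemplate {
  Resources?: Record<string, unknown>
}

// Counts the logical resources declared in one template.
function countResources(template: CfnTemplate): number {
  return template.Resources ? Object.keys(template.Resources).length : 0
}

// Returns the names of stacks whose resource count is at or above
// `threshold` (a fraction of the limit), so they can be split early.
function stacksNearLimit(
  templates: Record<string, CfnTemplate>,
  threshold: number = 0.8
): string[] {
  const flagged: string[] = []
  for (const stackName of Object.keys(templates)) {
    const template: CfnTemplate | undefined = templates[stackName]
    if (template === undefined) {
      continue
    }
    if (countResources(template) >= CFN_RESOURCE_LIMIT * threshold) {
      flagged.push(stackName)
    }
  }
  return flagged
}
```

In the parent template, each nested stack counts as a single AWS::CloudFormation::Stack resource, which is why moving applets out of the main stack frees up room there.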
4. SSM Parameter Timing
Symptom: Nested stack can't find SSM parameter
Fix: Deploy in correct order:
Main stack (writes SSM parameters)
Nested stacks (read SSM parameters)
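For initial deployment, you may need:
cdk deploy MainStack
cdk deploy MainStack/DeviceHealthStack
Note: the second command only applies if DeviceHealthStack is defined as a separate top-level stack. A stack that extends cdk.NestedStack is deployed together with its parent and cannot be targeted by cdk deploy on its own.
Testing Checklist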
Before each deployment:
[ ] Run cdk diff to review changes
[ ] Check no stateful resources are being replaced
[ ] Verify API URLs are correctly propagated
[ ] Test authentication flow works
[ ] Verify Lambda environment variables
[ ] Check IAM permissions are sufficient
Rollback Plan
If issues occur:
Revert to legacy cdk-stack.ts
Deploy to restore original resources
SSM parameters don't affect functionality if unused
Nested stacks can be deleted without affecting main stack stateful resources
File Structure After Migration
devops/cdk/lib/
├── shared/
│ ├── index.ts # Shared exports
│ ├── ssm-parameters.ts # SSM path constants and helpers
│ └── api-utils.ts # API Gateway configurations
├── stacks/
│ ├── device-health-stack.ts # Device Health nested stack
│ ├── installer-stack.ts # Installer nested stack
│ ├── site-survey-stack.ts # Site Survey nested stack
│ ├── gref-installs-stack.ts # GREF Installs nested stack
│ ├── site-pictures-stack.ts # Site Pictures nested stack
│ ├── provisioning-stack.ts # Provisioning nested stack
│ └── admin-api-stack.ts # Admin API nested stack
├── device-dashboard-stack.ts # Existing nested stack
├── cdk-stack.ts # Refactored main stack
└── cdk-stack-legacy.ts # Original monolithic stack (backup)
Estimated Resource Counts
| Stack | Approx Resources | Status |
|---|---|---|
| Main (Stateful) | ~30 | Keep in main |
| DeviceHealth | ~80 | ✅ Migrated |
| DeviceDashboard | ~50 | ✅ Already nested |
| Installer | ~150 | To migrate |
| SiteSurvey | ~80 | To migrate |
| SitePictures | ~40 | To migrate |
| GrefInstalls | ~60 | To migrate |
| Provisioning | ~35 | To migrate |
| AdminApi | ~50 | To migrate |
| Total | ~575 | Split across stacks, each under 500 |
With this architecture, each nested stack stays well under the 500 resource limit, and the main stack only contains stateful resources and coordination logic.
Atria CDK Nested Stack Migration Plan
Executive Summary
This document outlines the step-by-step plan to migrate the monolithic cdk-stack.ts (4,058 lines, ~575 resources) into modular nested stacks to resolve the CloudFormation 500 resource limit error.
Timeline Estimate: 2-3 weeks (1-2 days per applet)
Phase 0: Preparation (Day 1)
0.1 Understand the Pattern
Review the example nested stack at lib/stacks/device-health-stack.ts:
// KEY PATTERN 1: Read shared resources from SSM (not direct references)
const userPoolArn = ssm.StringParameter.valueForStringParameter(
this,
SSM_PATHS.COGNITO.USER_POOL_ARN
)
// KEY PATTERN 2: Use CfnAuthorizer with ARN string (avoids construct dependency)
const authorizer = createCfnAuthorizer(this, "MyAuthorizer", api, userPoolArn)
// KEY PATTERN 3: Write outputs to SSM for other stacks
ssmWriter.write(SSM_PATHS.API.DEVICE_HEALTH_API_URL, api.url)
0.2 Backup Current State
# Create a backup branch
git checkout -b backup/pre-nested-stack-migration
git add .
git commit -m "Backup before nested stack migration"
git push origin backup/pre-nested-stack-migration
# Return to main development
git checkout dev
0.3 Verify Shared Utilities Exist
Confirm these files are in place:
[ ] lib/shared/ssm-parameters.ts
[ ] lib/shared/api-utils.ts
[ ] lib/shared/index.ts
[ ] lib/stacks/device-health-stack.ts (example)
Phase 1: Deploy SSM Parameters (Day 1)
1.1 Add SSM Parameter Writes to Existing Stack
Add the import to the top of the current cdk-stack.ts and the rest of this code block to the end of the constructor (before its closing brace):
import { SsmParameterWriter, SSM_PATHS } from "./shared/ssm-parameters"
// ... at the end of constructor, before closing brace ...
// =========================================================================
// SSM PARAMETERS (for nested stack migration)
// =========================================================================
const ssmWriter = new SsmParameterWriter(this, "Migration")
// Cognito
ssmWriter.write(SSM_PATHS.COGNITO.USER_POOL_ID, atriaUserPool.userPoolId)
ssmWriter.write(SSM_PATHS.COGNITO.USER_POOL_ARN, atriaUserPool.userPoolArn)
ssmWriter.write(
SSM_PATHS.COGNITO.USER_POOL_CLIENT_ID,
atriaUserPoolClient.userPoolClientId
)
ssmWriter.write(SSM_PATHS.COGNITO.IDENTITY_POOL_ID, atriaIdentityPool.ref)
// S3
ssmWriter.write(SSM_PATHS.S3.SHARED_BUCKET_NAME, atriaSharedS3Bucket.bucketName)
ssmWriter.write(SSM_PATHS.S3.SHARED_BUCKET_ARN, atriaSharedS3Bucket.bucketArn)
// Lambda
ssmWriter.write(
SSM_PATHS.LAMBDA.SHARED_EMAIL_NOTIFICATION_ARN,
sharedEmailNotificationLambda.functionArn
)
ssmWriter.write(
SSM_PATHS.LAMBDA.SHARED_EMAIL_NOTIFICATION_NAME,
sharedEmailNotificationLambda.functionName
)
1.2 Deploy SSM Parameters
cd devops/cdk
npx cdk diff
npx cdk deploy
1.3 Verify SSM Parameters
aws ssm get-parameters-by-path --path "/atria/" --recursive
Expected: 8+ parameters created
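To confirm the parameters exist before migrating any applet, the JSON printed by the command above can be checked against the paths the nested stacks will read. This is a minimal sketch, assuming the CLI's default JSON output with a top-level Parameters array. The function name is hypothetical, and the required list only includes paths that appear in this guide's naming examples.

```typescript
// Subset of the `aws ssm get-parameters-by-path` JSON output that this check needs.
interface SsmParameter {
  Name: string
  Value: string
}
interface GetParametersByPathOutput {
  Parameters: SsmParameter[]
}

// Paths taken from the naming examples in this guide; extend with the rest of SSM_PATHS.
const REQUIRED_PARAMETERS = [
  "/atria/cognito/user-pool-id",
  "/atria/cognito/user-pool-arn",
  "/atria/s3/shared-bucket-name",
  "/atria/lambda/shared-email-notification-arn",
]

// Returns the required parameter names that are absent from the CLI output.
function findMissingParameters(cliOutputJson: string, required: string[]): string[] {
  const output = JSON.parse(cliOutputJson) as GetParametersByPathOutput
  const present: Record<string, boolean> = {}
  for (const parameter of output.Parameters) {
    present[parameter.Name] = true
  }
  return required.filter((name) => present[name] !== true)
}

// Example with a truncated, made-up response containing two of the four parameters.
const sampleOutput = JSON.stringify({
  Parameters: [
    { Name: "/atria/cognito/user-pool-id", Value: "example-pool-id" },
    { Name: "/atria/cognito/user-pool-arn", Value: "example-pool-arn" },
  ],
})
console.log(findMissingParameters(sampleOutput, REQUIRED_PARAMETERS))
```

An empty result means every listed parameter is in place and nested stacks that read them should resolve at deploy time.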
Phase 2: Migrate Applets (Days 2-10)
Migration Order (Simplest → Most Complex)
| Order | Applet | Est. Resources | Dependencies | Complexity |
|---|---|---|---|---|
| 1 | Site Pictures | ~40 | Shared S3, Cognito | ⭐ Simple |
| 2 | GREF Installs | ~60 | Cognito only | ⭐ Simple |
| 3 | Provisioning | ~35 | Cognito, S3 | ⭐ Simple |
| 4 | Device Health | ~80 | Cognito, S3, External tables | ⭐⭐ Medium |
| 5 | Site Survey | ~80 | Cognito, S3, Email Lambda | ⭐⭐ Medium |
| 6 | Admin API | ~50 | Cognito, Device Dashboard table | ⭐⭐ Medium |
| 7 | Installer | ~150 | Cognito, S3, Step Functions, Multiple tables | ⭐⭐⭐ Complex |
2.1 Site Pictures Stack (Day 2)
Location: lib/stacks/site-pictures-stack.ts
Resources to migrate:
[ ] AtriaSitePicturesLambdaRole
[ ] AtriaSitePicturesLambda
[ ] AtriaSitePicturesAPI (API Gateway)
[ ] AtriaSitePicturesAuthorizer
[ ] All API resources and methods (~15 resources)
Steps:
Create lib/stacks/site-pictures-stack.ts
Copy relevant code from cdk-stack.ts lines ~2280-2620
Replace atriaUserPool references with SSM parameter reads
Replace atriaSharedS3Bucket references with SSM reads
Add to main stack: new SitePicturesNestedStack(this, "SitePicturesStack", {...})
Deploy and test
Remove migrated code from cdk-stack.ts
Test checklist:
[ ] GET /sitespictures/{siteId}/pictures returns data
[ ] POST /sitespictures/{siteId}/pictures/presigned-upload returns URL
[ ] POST /sitespictures/{siteId}/pictures/upload-complete succeeds
[ ] DELETE /sitespictures/{siteId}/pictures works
[ ] Authentication with Cognito works
2.2 GREF Installs Stack (Day 3)
Location: lib/stacks/gref-installs-stack.ts
Resources to migrate:
[ ] AtriaGrefInstallsS3Bucket
[ ] AtriaGrefInstallsResourcesTable (DynamoDB)
[ ] AtriaGrefInstallsDevicesTable (DynamoDB + GSI)
[ ] AtriaGrefInstallsLambdaRole
[ ] AtriaGrefInstallsLambda
[ ] AtriaGrefInstallsAPI (API Gateway)
[ ] All API resources and methods (~20 resources)
Steps:
Create lib/stacks/gref-installs-stack.ts
Copy relevant code from cdk-stack.ts lines ~3150-3600
Replace Cognito references with SSM reads
Deploy and test
Remove migrated code from cdk-stack.ts
Test checklist:
[ ] GET /gref/sites returns site list
[ ] GET /gref/sites/{siteId} returns site details
[ ] POST /gref/sites/{siteId} creates site data
[ ] GET /gref/sites/{siteId}/json returns JSON
[ ] PUT /gref/sites/{siteId}/json updates JSON
[ ] GET /gref/hardware/unassigned returns hardware
[ ] GET /gref/units/scalar returns units
2.3 Provisioning Stack (Day 4)
Location: lib/stacks/provisioning-stack.ts
Resources to migrate:
[ ] AtriaProvisioningS3Bucket
[ ] AtriaProvisioningLambda
[ ] AtriaProvisioningAPI (API Gateway)
[ ] AtriaProvisioningAuthorizer
[ ] Proxy resource and methods
Steps:
Create lib/stacks/provisioning-stack.ts
Copy relevant code from cdk-stack.ts lines ~975-1100
Replace references with SSM reads
Deploy and test
Remove migrated code
Test checklist:
[ ] API responds to requests
[ ] S3 bucket operations work
[ ] Authentication works
2.4 Device Health Stack (Day 5)
Note: Example already created at lib/stacks/device-health-stack.ts
Resources to migrate:
[ ] AtriaDeviceHealthUserDevicesTable (DynamoDB)
[ ] AtriaDeviceHealthFeedbackTable (DynamoDB)
[ ] AtriaDeviceHealthLambdaRole
[ ] AtriaDeviceHealthUserDevicesLambda
[ ] AtriaDeviceHealthSiteHealthLambda