Atria - CDK Stack Refactoring Guide

Overview

This document summarizes the refactoring of the Atria CDK infrastructure from a single monolithic stack to a modular nested stack architecture. The primary motivation was to overcome CloudFormation's 500-resource limit that was blocking deployments.


CDK Stack Refactoring Guide

Overview

This guide explains how to migrate from the monolithic cdk-stack.ts to a nested stack architecture that avoids the CloudFormation 500 resource limit.

Architecture

┌────────────────────────────────────────────────────────────────────────────┐
│                         CdkStackRefactored (Main)                          │
│  STATEFUL (preserved):                                                     │
│    • Cognito User Pool, Client, Identity Pool                              │
│    • Shared S3 Bucket                                                      │
│    • Shared Email Notification Lambda                                      │
│  COORDINATION:                                                             │
│    • SSM Parameters (write ARNs/IDs for nested stacks)                     │
│    • Amplify App (collects all API URLs)                                   │
│    • Stack Outputs                                                         │
└────────────────────────────────────────────────────────────────────────────┘
      ┌───────────────┬───────────────┼───────────────┬───────────────┐
      │               │               │               │               │
      ▼               ▼               ▼               ▼               ▼
┌────────────┐  ┌────────────┐  ┌────────────┐  ┌────────────┐  ┌────────────┐
│AdminApi    │  │DeviceHealth│  │Installer   │  │SiteSurvey  │  │GrefInstalls│
│Stack       │  │Stack       │  │Stack       │  │Stack       │  │Stack       │
│~50 res     │  │~80 res     │  │~150 res    │  │~80 res     │  │~60 res     │
└────────────┘  └────────────┘  └────────────┘  └────────────┘  └────────────┘
                      │                               │
                      ▼                               ▼
                ┌────────────┐                ┌────────────────┐
                │SitePictures│                │DeviceDashboard │
                │Stack       │                │Stack (existing)│
                │~40 res     │                │~50 res         │
                └────────────┘                └────────────────┘

Key Principles

1. Stateful Resources Stay in Main Stack

These resources contain user data and must be preserved across deployments:

  • Cognito User Pool - User accounts and authentication

  • Cognito Identity Pool - Federated identities

  • Shared S3 Bucket - User uploads, images, exports

2. SSM Parameter Store for Decoupling

Instead of passing construct references (which creates CloudFormation dependencies), we use SSM:

    // Main stack WRITES:
    ssmWriter.write(SSM_PATHS.COGNITO.USER_POOL_ARN, userPool.userPoolArn)

    // Nested stack READS:
    const userPoolArn = ssm.StringParameter.valueForStringParameter(
      this,
      SSM_PATHS.COGNITO.USER_POOL_ARN
    )

Benefits:

  • No circular dependencies

  • Stacks can be deployed independently

  • Easy to reference values across stack boundaries

3. Each Applet = One Nested Stack

Each feature area becomes its own nested stack:

  • DeviceHealthNestedStack - Device health monitoring

  • InstallerNestedStack - Installation workflow

  • SiteSurveyNestedStack - Site survey forms

  • GrefInstallsNestedStack - GREF installation tracking

  • SitePicturesNestedStack - Site photo management

  • ProvisioningNestedStack - Device provisioning

  • AdminApiNestedStack - Admin user management

  • DeviceDashboardStack - Device dashboard (already exists)

Migration Steps

Phase 1: Setup Shared Infrastructure (DO FIRST)

  1. Create the shared utilities:

    • lib/shared/ssm-parameters.ts - SSM path constants and helpers

    • lib/shared/api-utils.ts - Reusable API Gateway configurations

    • lib/shared/index.ts - Exports

  2. Update existing stack to write SSM parameters:

    // Add to existing cdk-stack.ts temporarily
    const ssmWriter = new SsmParameterWriter(this, "Main")
    ssmWriter.write(SSM_PATHS.COGNITO.USER_POOL_ARN, userPool.userPoolArn)
    // ... etc

  3. Deploy to create SSM parameters.

Phase 2: Migrate One Stack at a Time

Recommended order (least dependencies first):

  1. ✅ DeviceDashboardStack (already done)

  2. ✅ DeviceHealthStack (example provided)

  3. SitePicturesStack (simple, few dependencies)

  4. GrefInstallsStack (self-contained)

  5. ProvisioningStack (self-contained)

  6. SiteSurveyStack (depends on shared email lambda)

  7. InstallerStack (most complex, has step functions)

  8. AdminApiStack (depends on Device Dashboard table)
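The ordering rule above can be checked mechanically before committing to a schedule. The following is a minimal sketch, assuming the only applet-to-applet dependency is the one noted in the list (AdminApi on the Device Dashboard table); the names appletDependencies and firstOrderViolation are hypothetical and do not exist in the repository.

```typescript
// Dependencies between applet stacks, taken from the notes in the list above.
// Dependencies on the main stack (Cognito, shared S3, shared email Lambda)
// are left out because the main stack is always deployed first.
const appletDependencies: Record<string, string[]> = {
  DeviceDashboard: [],
  DeviceHealth: [],
  SitePictures: [],
  GrefInstalls: [],
  Provisioning: [],
  SiteSurvey: [],
  Installer: [],
  AdminApi: ["DeviceDashboard"],
}

// Returns a message for the first applet whose dependency is not migrated
// earlier in the order, or null when the order is consistent.
function firstOrderViolation(
  order: string[],
  deps: Record<string, string[]>
): string | null {
  for (let i = 0; i < order.length; i++) {
    const applet = order[i]
    const required = deps[applet] || []
    for (const dep of required) {
      const depIndex = order.indexOf(dep)
      if (depIndex === -1 || depIndex > i) {
        return `${applet} needs ${dep} to be migrated first`
      }
    }
  }
  return null
}

const recommendedOrder = [
  "DeviceDashboard", "DeviceHealth", "SitePictures", "GrefInstalls",
  "Provisioning", "SiteSurvey", "Installer", "AdminApi",
]

console.log(firstOrderViolation(recommendedOrder, appletDependencies))
```

If further cross-applet dependencies turn up during migration, adding them to the map keeps the schedule honest without re-reading the whole stack.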

For each migration:

  1. Create the nested stack file in lib/stacks/

  2. Move the relevant resources from cdk-stack.ts

  3. Replace construct references with SSM parameter reads

  4. Review the planned changes with cdk diff

  5. Deploy and verify functionality

  6. Remove migrated code from cdk-stack.ts

Phase 3: Switch to Refactored Main Stack

Once all nested stacks are created:

  1. Rename cdk-stack.ts to cdk-stack-legacy.ts

  2. Rename cdk-stack-refactored.ts to cdk-stack.ts

  3. Update bin/cdk.ts to use the new stack

  4. Deploy with cdk deploy

Handling Specific Resources

API Gateway Authorizers

Problem: Cognito authorizers need a User Pool reference, which creates cross-stack dependencies.

Solution: Use CfnAuthorizer (L1 construct) with ARN string:

    // Read User Pool ARN from SSM
    const userPoolArn = ssm.StringParameter.valueForStringParameter(
      this,
      SSM_PATHS.COGNITO.USER_POOL_ARN
    )

    // Create authorizer using ARN (no construct dependency)
    const authorizer = new apigateway.CfnAuthorizer(this, "MyAuthorizer", {
      name: "MyAuthorizer",
      type: "COGNITO_USER_POOLS",
      restApiId: api.restApiId,
      identitySource: "method.request.header.Authorization",
      providerArns: [userPoolArn],
    })

Cross-Stack Table Access

Problem: Stack A needs to write to a table in Stack B.

Solution: Use ARN-based permissions:

    // Stack B writes table name to SSM
    ssmWriter.write(SSM_PATHS.DYNAMODB.MY_TABLE, table.tableName)

    // Stack A reads and grants permission by ARN
    const tableName = ssm.StringParameter.valueForStringParameter(...)
    lambdaRole.addToPolicy(new iam.PolicyStatement({
      actions: ["dynamodb:GetItem", "dynamodb:PutItem"],
      resources: [
        `arn:aws:dynamodb:${cdk.Aws.REGION}:${cdk.Aws.ACCOUNT_ID}:table/${tableName}`,
      ],
    }))
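One detail to watch with ARN-based grants: Query and Scan calls against a secondary index are authorized against the index ARN, which sits under the table ARN, so a policy built from the table name alone may not cover GSI reads. The sketch below builds both ARNs from plain strings. The helper name dynamoTableArns and the sample values are hypothetical; inside a stack, the region and account arguments would be the cdk.Aws.REGION and cdk.Aws.ACCOUNT_ID tokens used above.

```typescript
// Hypothetical helper: derive the IAM resource ARNs for a DynamoDB table
// from its name. Secondary indexes have their own ARNs under the table ARN.
function dynamoTableArns(
  region: string,
  accountId: string,
  tableName: string
): string[] {
  const tableArn = `arn:aws:dynamodb:${region}:${accountId}:table/${tableName}`
  return [
    tableArn, // item-level operations on the base table
    `${tableArn}/index/*`, // Query/Scan on any secondary index of the table
  ]
}

// Example with literal values; the account ID and table name are placeholders.
const sampleTableArns = dynamoTableArns(
  "us-east-1",
  "123456789012",
  "installer-device-details"
)

console.log(sampleTableArns)
```

The returned array can be passed directly as the resources value of the policy statement, alongside whatever actions the Lambda actually needs.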

Lambda Cross-Invocation

Problem: Lambda A needs to invoke Lambda B in another stack.

Solution: Write function ARN to SSM:

    // Stack B writes Lambda ARN
    ssmWriter.write(SSM_PATHS.LAMBDA.MY_FUNCTION_ARN, myLambda.functionArn)

    // Stack A grants invoke permission by ARN
    const functionArn = ssm.StringParameter.valueForStringParameter(...)
    lambdaRole.addToPolicy(new iam.PolicyStatement({
      actions: ["lambda:InvokeFunction"],
      resources: [functionArn],
    }))

SSM Parameter Naming Convention

    /atria/{category}/{resource-name}

    Examples:
    /atria/cognito/user-pool-id
    /atria/cognito/user-pool-arn
    /atria/s3/shared-bucket-name
    /atria/api/device-health-url
    /atria/dynamodb/installer-device-details-table
    /atria/lambda/shared-email-notification-arn
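The convention can be enforced in code rather than by review. The sketch below shows one possible shape for a path helper. The function name ssmPath, the category list, and the kebab-case rule are assumptions made for illustration and are not taken from lib/shared/ssm-parameters.ts.

```typescript
// Hypothetical helper: builds /atria/{category}/{resource-name} paths.
// The category list mirrors the examples above; extend it as needed.
type SsmCategory = "cognito" | "s3" | "api" | "dynamodb" | "lambda"

const SSM_ROOT = "/atria"

// Lowercase words or digits separated by single hyphens, e.g. "user-pool-arn".
const KEBAB_CASE = /^[a-z0-9]+(-[a-z0-9]+)*$/

function ssmPath(category: SsmCategory, resourceName: string): string {
  if (!KEBAB_CASE.test(resourceName)) {
    throw new Error(`SSM resource name must be kebab-case: "${resourceName}"`)
  }
  return `${SSM_ROOT}/${category}/${resourceName}`
}

// Paths matching three of the examples above.
const examplePaths = {
  userPoolArn: ssmPath("cognito", "user-pool-arn"),
  sharedBucketName: ssmPath("s3", "shared-bucket-name"),
  deviceHealthUrl: ssmPath("api", "device-health-url"),
}

console.log(examplePaths)
```

Building every path through one function means a typo in a category or a stray uppercase letter fails at synth time instead of surfacing later as a missing parameter.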

Deployment Strategy

Option A: Blue-Green (Recommended for Production)

  1. Deploy refactored stack with different ID (e.g., AtriaV2)

  2. Test thoroughly

  3. Update DNS/Amplify to point to new stack

  4. Keep old stack running as backup

  5. Delete old stack once verified

Option B: In-Place (Development/Staging)

  1. Backup all stateful resources

  2. Add removalPolicy: cdk.RemovalPolicy.RETAIN to stateful resources

  3. Deploy nested stacks incrementally

  4. Verify each migration step

  5. Remove migrated code from main stack

Avoiding Common Pitfalls

1. Circular Dependencies

Symptom: CDK synth fails with circular dependency error

Fix: Use SSM parameters instead of direct construct references

2. Resource Replacement

Symptom: CDK wants to delete and recreate resources

Fix:

  • Use removalPolicy: RETAIN for stateful resources

  • Keep resource IDs consistent

  • Use tableName, bucketName, etc. to maintain physical resource names
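Moving a construct from the main stack into a nested stack removes it from the main template and adds it to a different one, which CloudFormation treats as a delete plus a create. It is therefore worth checking the synthesized templates for stateful resources that disappear. The following sketch compares two parsed templates; the function name, the list of stateful types, and the logical IDs in the example are assumptions for illustration.

```typescript
// Minimal shape of a synthesized CloudFormation template.
interface CfnTemplate {
  Resources: Record<string, { Type: string }>
}

// Resource types treated as stateful for this check (an assumed list).
const STATEFUL_TYPES = [
  "AWS::Cognito::UserPool",
  "AWS::Cognito::IdentityPool",
  "AWS::S3::Bucket",
  "AWS::DynamoDB::Table",
]

// Returns logical IDs of stateful resources that exist in `before`
// but are missing, or have a different type, in `after`.
function findDroppedStatefulResources(
  before: CfnTemplate,
  after: CfnTemplate
): string[] {
  return Object.keys(before.Resources).filter((logicalId) => {
    const oldType = before.Resources[logicalId].Type
    if (STATEFUL_TYPES.indexOf(oldType) === -1) return false
    const current = after.Resources[logicalId]
    return current === undefined || current.Type !== oldType
  })
}

// Example with placeholder logical IDs.
const templateBefore: CfnTemplate = {
  Resources: {
    AtriaUserPoolA1B2: { Type: "AWS::Cognito::UserPool" },
    GrefBucketC3D4: { Type: "AWS::S3::Bucket" },
    GrefLambdaE5F6: { Type: "AWS::Lambda::Function" },
  },
}
const templateAfter: CfnTemplate = {
  Resources: {
    AtriaUserPoolA1B2: { Type: "AWS::Cognito::UserPool" },
  },
}

console.log(findDroppedStatefulResources(templateBefore, templateAfter))
```

In practice the two inputs would be the main stack's template JSON from cdk.out before and after the change, loaded with JSON.parse. Any ID the check reports needs an explicit plan (retain and re-import, or data migration) before deploying.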

3. CloudFormation Limits

Nested Stack Limits:

  • Each nested stack can have up to 500 resources

  • You can have multiple nested stacks

  • Max stack depth is 5 levels (root → nested1 → nested2 → etc.)

4. SSM Parameter Timing

Symptom: Nested stack can't find SSM parameter

Fix: Deploy in correct order:

  1. Main stack (writes SSM parameters)

  2. Nested stacks (read SSM parameters)

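For initial deployment, you may need two passes:

    cdk deploy MainStack
    cdk deploy MainStack/DeviceHealthStack

The second command only applies if the applet stack is deployable on its own. A cdk.NestedStack is deployed as part of its parent and cannot be targeted separately; in that case, deploy the SSM parameter writes first (Phase 1) and add the nested stack to the main stack in a later deployment.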

Testing Checklist

Before each deployment:

  • [ ] Run cdk diff to review changes

  • [ ] Check no stateful resources are being replaced

  • [ ] Verify API URLs are correctly propagated

  • [ ] Test authentication flow works

  • [ ] Verify Lambda environment variables

  • [ ] Check IAM permissions are sufficient

Rollback Plan

If issues occur:

  1. Revert to legacy cdk-stack.ts

  2. Deploy to restore original resources

  3. SSM parameters don't affect functionality if unused

  4. Nested stacks can be deleted without affecting main stack stateful resources

File Structure After Migration

devops/cdk/lib/
├── shared/
│   ├── index.ts                   # Shared exports
│   ├── ssm-parameters.ts          # SSM path constants and helpers
│   └── api-utils.ts               # API Gateway configurations
├── stacks/
│   ├── device-health-stack.ts     # Device Health nested stack
│   ├── installer-stack.ts         # Installer nested stack
│   ├── site-survey-stack.ts       # Site Survey nested stack
│   ├── gref-installs-stack.ts     # GREF Installs nested stack
│   ├── site-pictures-stack.ts     # Site Pictures nested stack
│   ├── provisioning-stack.ts      # Provisioning nested stack
│   └── admin-api-stack.ts         # Admin API nested stack
├── device-dashboard-stack.ts      # Existing nested stack
├── cdk-stack.ts                   # Refactored main stack
└── cdk-stack-legacy.ts            # Original monolithic stack (backup)

Estimated Resource Counts

Stack             Approx Resources   Status
----------------  -----------------  ------------------
Main (Stateful)   ~30                Keep in main
DeviceHealth      ~80                ✅ Migrated
DeviceDashboard   ~50                ✅ Already nested
Installer         ~150               To migrate
SiteSurvey        ~80                To migrate
SitePictures      ~40                To migrate
GrefInstalls      ~60                To migrate
Provisioning      ~35                To migrate
AdminApi          ~50                To migrate
Total             ~575               Well under limits

With this architecture, each nested stack stays well under the 500 resource limit, and the main stack only contains stateful resources and coordination logic.
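The arithmetic behind that claim is worth making explicit: the limit applies per stack, and a nested stack counts as a single AWS::CloudFormation::Stack resource in its parent. The sketch below uses the approximate counts from the table; the variable names are hypothetical.

```typescript
const CFN_RESOURCE_LIMIT = 500

// Approximate resource counts from the table above.
const estimatedResources: Record<string, number> = {
  Main: 30,
  DeviceHealth: 80,
  DeviceDashboard: 50,
  Installer: 150,
  SiteSurvey: 80,
  SitePictures: 40,
  GrefInstalls: 60,
  Provisioning: 35,
  AdminApi: 50,
}

const stackNames = Object.keys(estimatedResources)

// What a single monolithic stack would have to hold.
const monolithTotal = stackNames.reduce(
  (sum, stack) => sum + estimatedResources[stack],
  0
)

// Stacks that would still exceed the limit after the split.
const stacksOverLimit = stackNames.filter(
  (stack) => estimatedResources[stack] > CFN_RESOURCE_LIMIT
)

// Remaining room in the largest nested stack (Installer).
const installerHeadroom = CFN_RESOURCE_LIMIT - estimatedResources["Installer"]

console.log(`monolith total: ${monolithTotal}`) // 575, above the 500 limit
console.log(`stacks over limit: ${stacksOverLimit.length}`) // 0
console.log(`Installer headroom: ${installerHeadroom}`) // 350
```

The combined total of about 575 explains why the monolithic stack failed, while the largest nested stack (Installer, about 150) still has roughly 350 resources of headroom.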

Atria CDK Nested Stack Migration Plan

Executive Summary

This document outlines the step-by-step plan to migrate the monolithic cdk-stack.ts (4,058 lines, ~575 resources) into modular nested stacks to resolve the CloudFormation 500 resource limit error.

Timeline Estimate: 2-3 weeks (1-2 days per applet)


Phase 0: Preparation (Day 1)

0.1 Understand the Pattern

Review the example nested stack at lib/stacks/device-health-stack.ts:

    // KEY PATTERN 1: Read shared resources from SSM (not direct references)
    const userPoolArn = ssm.StringParameter.valueForStringParameter(
      this,
      SSM_PATHS.COGNITO.USER_POOL_ARN
    )

    // KEY PATTERN 2: Use CfnAuthorizer with ARN string (avoids construct dependency)
    const authorizer = createCfnAuthorizer(this, "MyAuthorizer", api, userPoolArn)

    // KEY PATTERN 3: Write outputs to SSM for other stacks
    ssmWriter.write(SSM_PATHS.API.DEVICE_HEALTH_API_URL, api.url)

0.2 Backup Current State

    # Create a backup branch
    git checkout -b backup/pre-nested-stack-migration
    git add .
    git commit -m "Backup before nested stack migration"
    git push origin backup/pre-nested-stack-migration

    # Return to main development
    git checkout dev

0.3 Verify Shared Utilities Exist

Confirm these files are in place:

  • [ ] lib/shared/ssm-parameters.ts

  • [ ] lib/shared/api-utils.ts

  • [ ] lib/shared/index.ts

  • [ ] lib/stacks/device-health-stack.ts (example)


Phase 1: Deploy SSM Parameters (Day 1)

1.1 Add SSM Parameter Writes to Existing Stack

Add this code block to the end of the current cdk-stack.ts (before the closing braces):

    import { SsmParameterWriter, SSM_PATHS } from "./shared/ssm-parameters"

    // ... at the end of constructor, before closing brace ...

    // =========================================================================
    // SSM PARAMETERS (for nested stack migration)
    // =========================================================================
    const ssmWriter = new SsmParameterWriter(this, "Migration")

    // Cognito
    ssmWriter.write(SSM_PATHS.COGNITO.USER_POOL_ID, atriaUserPool.userPoolId)
    ssmWriter.write(SSM_PATHS.COGNITO.USER_POOL_ARN, atriaUserPool.userPoolArn)
    ssmWriter.write(
      SSM_PATHS.COGNITO.USER_POOL_CLIENT_ID,
      atriaUserPoolClient.userPoolClientId
    )
    ssmWriter.write(SSM_PATHS.COGNITO.IDENTITY_POOL_ID, atriaIdentityPool.ref)

    // S3
    ssmWriter.write(SSM_PATHS.S3.SHARED_BUCKET_NAME, atriaSharedS3Bucket.bucketName)
    ssmWriter.write(SSM_PATHS.S3.SHARED_BUCKET_ARN, atriaSharedS3Bucket.bucketArn)

    // Lambda
    ssmWriter.write(
      SSM_PATHS.LAMBDA.SHARED_EMAIL_NOTIFICATION_ARN,
      sharedEmailNotificationLambda.functionArn
    )
    ssmWriter.write(
      SSM_PATHS.LAMBDA.SHARED_EMAIL_NOTIFICATION_NAME,
      sharedEmailNotificationLambda.functionName
    )

1.2 Deploy SSM Parameters

    cd devops/cdk
    npx cdk diff
    npx cdk deploy

1.3 Verify SSM Parameters

aws ssm get-parameters-by-path --path "/atria/" --recursive

Expected: 8+ parameters created
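Counting parameters by eye is easy to get wrong, so the CLI output can be checked against the paths the nested stacks will read. This is a minimal sketch that assumes the JSON from the command above has been saved and parsed. The four required paths are the ones given as examples of the /atria/{category}/{resource-name} convention; the remaining paths, the function name, and the sample values are assumptions.

```typescript
// Relevant part of the JSON printed by `aws ssm get-parameters-by-path`.
interface GetParametersByPathOutput {
  Parameters: { Name: string; Value: string }[]
}

// Paths the nested stacks expect to read. Only the four paths documented as
// naming-convention examples are listed; add the rest from ssm-parameters.ts.
const REQUIRED_PATHS = [
  "/atria/cognito/user-pool-id",
  "/atria/cognito/user-pool-arn",
  "/atria/s3/shared-bucket-name",
  "/atria/lambda/shared-email-notification-arn",
]

// Returns the required paths that are absent from the CLI output.
function missingParameters(
  cliOutput: GetParametersByPathOutput,
  required: string[]
): string[] {
  const present = cliOutput.Parameters.map((p) => p.Name)
  return required.filter((path) => present.indexOf(path) === -1)
}

// Example: output captured after a partial deployment (placeholder values).
const sampleCliOutput: GetParametersByPathOutput = {
  Parameters: [
    { Name: "/atria/cognito/user-pool-id", Value: "us-east-1_EXAMPLE" },
    {
      Name: "/atria/cognito/user-pool-arn",
      Value: "arn:aws:cognito-idp:us-east-1:123456789012:userpool/us-east-1_EXAMPLE",
    },
  ],
}

console.log(missingParameters(sampleCliOutput, REQUIRED_PATHS))
```

An empty result means every listed path exists; anything else should be resolved before the first nested stack is deployed, since a missing parameter fails the stack at deploy time.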


Phase 2: Migrate Applets (Days 2-10)

Migration Order (Simplest → Most Complex)

Order  Applet         Est. Resources  Dependencies                                  Complexity
-----  -------------  --------------  --------------------------------------------  -----------
1      Site Pictures  ~40             Shared S3, Cognito                            ⭐ Simple
2      GREF Installs  ~60             Cognito only                                  ⭐ Simple
3      Provisioning   ~35             Cognito, S3                                   ⭐ Simple
4      Device Health  ~80             Cognito, S3, External tables                  ⭐⭐ Medium
5      Site Survey    ~80             Cognito, S3, Email Lambda                     ⭐⭐ Medium
6      Admin API      ~50             Cognito, Device Dashboard table               ⭐⭐ Medium
7      Installer      ~150            Cognito, S3, Step Functions, Multiple tables  ⭐⭐⭐ Complex


2.1 Site Pictures Stack (Day 2)

Location: lib/stacks/site-pictures-stack.ts

Resources to migrate:

  • [ ] AtriaSitePicturesLambdaRole

  • [ ] AtriaSitePicturesLambda

  • [ ] AtriaSitePicturesAPI (API Gateway)

  • [ ] AtriaSitePicturesAuthorizer

  • [ ] All API resources and methods (~15 resources)

Steps:

  1. Create lib/stacks/site-pictures-stack.ts

  2. Copy relevant code from cdk-stack.ts lines ~2280-2620

  3. Replace atriaUserPool references with SSM parameter reads

  4. Replace atriaSharedS3Bucket references with SSM reads

  5. Add to main stack: new SitePicturesNestedStack(this, "SitePicturesStack", {...})

  6. Deploy and test

  7. Remove migrated code from cdk-stack.ts

Test checklist:

  • [ ] GET /sitespictures/{siteId}/pictures returns data

  • [ ] POST /sitespictures/{siteId}/pictures/presigned-upload returns URL

  • [ ] POST /sitespictures/{siteId}/pictures/upload-complete succeeds

  • [ ] DELETE /sitespictures/{siteId}/pictures works

  • [ ] Authentication with Cognito works


2.2 GREF Installs Stack (Day 3)

Location: lib/stacks/gref-installs-stack.ts

Resources to migrate:

  • [ ] AtriaGrefInstallsS3Bucket

  • [ ] AtriaGrefInstallsResourcesTable (DynamoDB)

  • [ ] AtriaGrefInstallsDevicesTable (DynamoDB + GSI)

  • [ ] AtriaGrefInstallsLambdaRole

  • [ ] AtriaGrefInstallsLambda

  • [ ] AtriaGrefInstallsAPI (API Gateway)

  • [ ] All API resources and methods (~20 resources)

Steps:

  1. Create lib/stacks/gref-installs-stack.ts

  2. Copy relevant code from cdk-stack.ts lines ~3150-3600

  3. Replace Cognito references with SSM reads

  4. Deploy and test

  5. Remove migrated code from cdk-stack.ts

Test checklist:

  • [ ] GET /gref/sites returns site list

  • [ ] GET /gref/sites/{siteId} returns site details

  • [ ] POST /gref/sites/{siteId} creates site data

  • [ ] GET /gref/sites/{siteId}/json returns JSON

  • [ ] PUT /gref/sites/{siteId}/json updates JSON

  • [ ] GET /gref/hardware/unassigned returns hardware

  • [ ] GET /gref/units/scalar returns units


2.3 Provisioning Stack (Day 4)

Location: lib/stacks/provisioning-stack.ts

Resources to migrate:

  • [ ] AtriaProvisioningS3Bucket

  • [ ] AtriaProvisioningLambda

  • [ ] AtriaProvisioningAPI (API Gateway)

  • [ ] AtriaProvisioningAuthorizer

  • [ ] Proxy resource and methods

Steps:

  1. Create lib/stacks/provisioning-stack.ts

  2. Copy relevant code from cdk-stack.ts lines ~975-1100

  3. Replace references with SSM reads

  4. Deploy and test

  5. Remove migrated code

Test checklist:

  • [ ] API responds to requests

  • [ ] S3 bucket operations work

  • [ ] Authentication works


2.4 Device Health Stack (Day 5)

Note: Example already created at lib/stacks/device-health-stack.ts

Resources to migrate:

  • [ ] AtriaDeviceHealthUserDevicesTable (DynamoDB)

  • [ ] AtriaDeviceHealthFeedbackTable (DynamoDB)

  • [ ] AtriaDeviceHealthLambdaRole

  • [ ] AtriaDeviceHealthUserDevicesLambda

  • [ ] AtriaDeviceHealthSiteHealthLambda