Atria - CDK Stack Refactoring Migration Report

Executive Summary

This document details the comprehensive refactoring of the Atria CDK infrastructure from a monolithic single-stack architecture to a modular nested stack architecture. The migration was necessary to overcome AWS CloudFormation's 500-resource limit while improving maintainability, deployment isolation, and scalability.

Key Achievements

Metric               | Before                      | After
---------------------|-----------------------------|-------------------------------------------
Stack Architecture   | Monolithic (~575 resources) | 7 Nested Stacks + Main Stack
Main Stack Lines     | ~4,314                      | 1,839 (57% reduction after legacy cleanup)
New Stack Files      | 0                           | 7 nested stacks
Shared Utilities     | 0                           | 3 modules
Total New TypeScript | 0                           | ~3,608 lines
Pull Requests Merged | 1                           | 8
Lines Removed        | -                           | 2,475 lines (legacy code cleanup)


Table of Contents

  1. Problem Statement

  2. Solution Architecture

  3. Migration Phases

  4. File Changes Summary

  5. Nested Stack Details

  6. Shared Utilities

  7. Migration Principles

  8. Deployment Considerations

  9. Commit History

  10. Future Improvements


Problem Statement

The original Atria CDK infrastructure was implemented as a single monolithic stack (cdk-stack.ts) containing all resources for multiple application features. This approach led to:

  1. Resource Limit Risk: CloudFormation stacks are limited to 500 resources each. The monolithic Atria stack had outgrown this limit, at roughly 575 resources.

  2. Deployment Coupling: Changes to one feature required full stack deployment, increasing risk and deployment time.

  3. Maintainability Challenges: A single 4,250+ line file became difficult to navigate and maintain.

  4. Team Collaboration: Multiple developers couldn't work on different features without merge conflicts.


Solution Architecture

Nested Stack Pattern

We adopted AWS CDK's nested stack pattern, in which a parent CloudFormation stack contains other stacks as resources. Each nested stack counts as a single resource in its parent and has its own 500-resource limit, so distributing resources across child stacks raises the effective ceiling while the application remains a single deployment unit.

┌─────────────────────────────────────────────────────────────────┐
│                    Main Stack (cdk-stack.ts)                    │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │  Stateful Resources                                     │   │
│   │  • Cognito User Pool & Identity Pool                    │   │
│   │  • DynamoDB Tables (all with RETAIN policy)             │   │
│   │  • S3 Shared Bucket                                     │   │
│   │  • Device Dashboard Stack                               │   │
│   │  • Amplify App                                          │   │
│   │  • SSM Parameters                                       │   │
│   └─────────────────────────────────────────────────────────┘   │
│                                                                 │
│   ┌──────────────┐     ┌──────────────┐     ┌──────────────┐    │
│   │Site Pictures │     │GREF Installs │     │ Provisioning │    │
│   │ Nested Stack │     │ Nested Stack │     │ Nested Stack │    │
│   └──────────────┘     └──────────────┘     └──────────────┘    │
│                                                                 │
│   ┌──────────────┐     ┌──────────────┐     ┌──────────────┐    │
│   │Device Health │     │ Site Survey  │     │  Admin API   │    │
│   │ Nested Stack │     │ Nested Stack │     │ Nested Stack │    │
│   └──────────────┘     └──────────────┘     └──────────────┘    │
│                                                                 │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │                 Installer Nested Stack                  │   │
│   │    (Largest - includes Step Functions, EventBridge)     │   │
│   └─────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘
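
The limit arithmetic behind this pattern can be illustrated with a small check over synthesized templates. This is a minimal sketch, not part of the Atria codebase: the `StackTemplate` type, the helper names and the resource counts are assumptions chosen for illustration, and only the 500-resource figure comes from this report.

```typescript
// Minimal view of a synthesized CloudFormation template: only the
// top-level Resources map is needed to count resources.
interface StackTemplate {
  Resources?: Record<string, { Type: string }>;
}

// Per-stack resource limit cited in this report.
const RESOURCE_LIMIT = 500;

// Number of logical resources declared in one template.
function countResources(template: StackTemplate): number {
  return Object.keys(template.Resources ?? {}).length;
}

// Names of the stacks whose templates exceed the limit.
function stacksOverLimit(templates: Record<string, StackTemplate>): string[] {
  return Object.keys(templates).filter(
    (name) => countResources(templates[name]) > RESOURCE_LIMIT,
  );
}

// Hypothetical helper: builds a template with `n` placeholder resources.
function placeholderTemplate(n: number): StackTemplate {
  const Resources: Record<string, { Type: string }> = {};
  for (let i = 0; i < n; i++) {
    Resources[`Resource${i}`] = { Type: "AWS::Lambda::Function" };
  }
  return { Resources };
}

// Illustrative counts only: one 575-resource monolith versus the same
// total spread over a parent stack and two nested stacks.
const monolith = { Main: placeholderTemplate(575) };
const split = {
  Main: placeholderTemplate(175),
  Installer: placeholderTemplate(250),
  DeviceHealth: placeholderTemplate(150),
};
```

Here `stacksOverLimit(monolith)` reports the `Main` stack, while `stacksOverLimit(split)` reports nothing because every template stays under the limit on its own.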

Key Design Decisions

  1. Stateful vs Stateless Separation: DynamoDB tables, Cognito, and S3 buckets remain in the main stack with RETAIN policies. Only stateless resources (Lambdas, API Gateways, IAM Roles) are moved to nested stacks.

  2. Props-Based Resource Sharing: Nested stacks receive references to stateful resources via TypeScript props interfaces, ensuring type safety and clear contracts.

  3. SSM Parameter Store: For values that can't be passed via props (for example, the Cognito User Pool ARN used by authorizers), we use SSM Parameter Store as an intermediary.

  4. CfnAuthorizer Pattern: Nested stacks use CfnAuthorizer instead of CognitoUserPoolsAuthorizer to avoid cross-stack reference issues.
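
Decision 4 can be sketched without any dependencies by building the property bag that the L1 `CfnAuthorizer` construct expects. The local interface below mirrors a subset of `CfnAuthorizerProps` from `aws-cdk-lib/aws-apigateway`; the helper name, the authorizer naming scheme, the REST API id and the sample ARN are assumptions for illustration and are not taken from the Atria code.

```typescript
// Subset of the properties accepted by CfnAuthorizer
// (AWS::ApiGateway::Authorizer), redeclared locally so the sketch
// runs without aws-cdk-lib installed.
interface CognitoAuthorizerProps {
  name: string;
  restApiId: string;
  type: "COGNITO_USER_POOLS";
  providerArns: string[];
  identitySource: string;
}

// Builds authorizer properties from plain strings. Because only strings
// cross the stack boundary (for example an ARN read from SSM), no
// construct object from the main stack is referenced.
function buildCognitoAuthorizerProps(
  apiName: string,
  restApiId: string,
  userPoolArn: string,
): CognitoAuthorizerProps {
  return {
    name: `${apiName}-CognitoAuthorizer`, // hypothetical naming scheme
    restApiId,
    type: "COGNITO_USER_POOLS",
    providerArns: [userPoolArn],
    identitySource: "method.request.header.Authorization",
  };
}

// Placeholder values; a real stack would use the RestApi's id and the
// user pool ARN resolved at deploy time.
const authorizerProps = buildCognitoAuthorizerProps(
  "Atria-SitePicturesAPI",
  "abc123",
  "arn:aws:cognito-idp:us-east-1:123456789012:userpool/us-east-1_EXAMPLE",
);
```

Inside a nested stack, an object like this would typically be passed to `new apigateway.CfnAuthorizer(this, "Authorizer", authorizerProps)`, with each API method then pointing at the authorizer by its `ref`.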


Migration Phases

Phase 0: RETAIN Policy Setup

Branch: backup/pre-nested-stack-migration
Commit: 9b7e0f8
Date: 2026-01-21

Set RemovalPolicy.RETAIN on all stateful resources to prevent accidental data loss during migration:

  • All DynamoDB tables

  • S3 Shared Bucket

  • Cognito User Pool
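
In a synthesized template, `RemovalPolicy.RETAIN` shows up as `DeletionPolicy: Retain` and `UpdateReplacePolicy: Retain` on the resource. The sketch below shows one way this phase could be verified before migrating anything; the helper name and the sample resources are hypothetical, and the list of stateful types simply follows the three bullets above.

```typescript
// Subset of a CloudFormation resource entry relevant to retention.
interface TemplateResource {
  Type: string;
  DeletionPolicy?: string;
  UpdateReplacePolicy?: string;
}

// Resource types treated as stateful in Phase 0.
const STATEFUL_TYPES = [
  "AWS::DynamoDB::Table",
  "AWS::S3::Bucket",
  "AWS::Cognito::UserPool",
];

// Logical IDs of stateful resources that are not fully retained.
function findUnretained(resources: Record<string, TemplateResource>): string[] {
  return Object.keys(resources).filter((id) => {
    const resource = resources[id];
    const stateful = STATEFUL_TYPES.indexOf(resource.Type) !== -1;
    const retained =
      resource.DeletionPolicy === "Retain" &&
      resource.UpdateReplacePolicy === "Retain";
    return stateful && !retained;
  });
}

// Hypothetical template fragment: one retained table, one bucket that
// would be deleted, and one stateless Lambda that is ignored.
const sampleResources: Record<string, TemplateResource> = {
  DevicesTable: {
    Type: "AWS::DynamoDB::Table",
    DeletionPolicy: "Retain",
    UpdateReplacePolicy: "Retain",
  },
  SharedBucket: { Type: "AWS::S3::Bucket", DeletionPolicy: "Delete" },
  SomeLambda: { Type: "AWS::Lambda::Function" },
};

const unretained = findUnretained(sampleResources);
```

With this sample, `unretained` contains only `SharedBucket`, the one stateful resource that would not survive a stack deletion or replacement.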

Phase 1: SSM Parameters & Shared Utilities

Branch: feature/phase-1-ssm-parameters
Commit: 556f094
Date: 2026-01-21

Created shared infrastructure for cross-stack communication:

  • lib/shared/ssm-parameters.ts - SSM path constants and helper classes

  • lib/shared/api-utils.ts - Reusable API Gateway utilities

  • lib/shared/index.ts - Barrel exports

Phase 2: Site Pictures Migration

Branch: feature/phase-2-site-pictures-nested-stack
Commits: b4231e0, a1c4535, 5bb214d, ee52b90
Date: 2026-01-21

First applet migrated as proof of concept:

  • Created site-pictures-stack.ts (338 lines)

  • Lambda renamed to Atria-SitePicturesLambda to avoid conflicts

  • API Gateway with full CORS support

Phase 3: GREF Installs Migration

Branch: feature/phase-3-site-pictures-gref-installs
Commit: 4d2cf0d
Date: 2026-01-21

Migrated GREF Installs applet:

  • Created gref-installs-stack.ts (353 lines)

  • 2 Lambda functions, 2 DynamoDB table references

Phase 4: Provisioning Migration

Branch: feature/phase-4-provisioning
Commit: 5e9dfed
Date: 2026-01-21

Migrated Provisioning applet:

  • Created provisioning-stack.ts (276 lines)

  • IoT Wireless permissions included

Phase 5: Device Health Migration

Branch: feature/phase-5-device-health
Commit: bd016b1
Date: 2026-01-21

Migrated Device Health applet (complex):

  • Created device-health-stack.ts (568 lines)

  • 5 Lambda functions (Readings, Worker, Site Health, User Devices, Feedback)

  • Timestream database permissions

  • EventBridge scheduled rules

Phase 6: Site Survey Migration

Branch: feature/phase-6-site-survey
Commit: f2a442b
Date: 2026-01-21

Migrated Site Survey applet:

  • Created site-survey-stack.ts (529 lines)

  • CRUD Lambda + Reports Lambda

  • Email notification integration

  • SES permissions

Phase 7: Admin API Migration

Branch: feature/phase-7-admin-api
Commit: a635395
Date: 2026-01-21

Migrated Admin API (platform-level):

  • Created admin-api-stack.ts (324 lines)

  • User management Lambda with Cognito admin permissions

  • API routes: /users, /users/{userId}, /tags

Phase 8: Installer Migration

Branch: feature/phase-8-installer
Commit: 95a9009
Date: 2026-01-21

Migrated Installer applet (largest and most complex):

  • Created installer-stack.ts (934 lines)

  • 6 Lambda functions, including one built with the PythonFunction construct

  • PythonLayerVersion for image processing

  • Step Functions state machine

  • EventBridge scheduled rule (15-minute interval)

  • Complex API Gateway with multipart form-data support


File Changes Summary

New Files Created

File                              | Lines | Purpose
----------------------------------|-------|----------------------------
lib/stacks/site-pictures-stack.ts | 338   | Site Pictures nested stack
lib/stacks/gref-installs-stack.ts | 353   | GREF Installs nested stack
lib/stacks/provisioning-stack.ts  | 276   | Provisioning nested stack
lib/stacks/device-health-stack.ts | 568   | Device Health nested stack
lib/stacks/site-survey-stack.ts   | 529   | Site Survey nested stack
lib/stacks/admin-api-stack.ts     | 324   | Admin API nested stack
lib/stacks/installer-stack.ts     | 934   | Installer nested stack
lib/stacks/index.ts               | 13    | Barrel exports
lib/shared/ssm-parameters.ts      | 103   | SSM utilities
lib/shared/api-utils.ts           | 152   | API Gateway utilities
lib/shared/index.ts               | 18    | Barrel exports

Total New TypeScript: ~3,608 lines

Modified Files

File             | Insertions | Deletions | Net Change
-----------------|------------|-----------|-----------
lib/cdk-stack.ts | +1,178     | -922      | +256

Overall Statistics

12 files changed, 4,786 insertions(+), 922 deletions(-)

Nested Stack Details

1. Site Pictures Nested Stack

File: lib/stacks/site-pictures-stack.ts

Resources:

  • IAM Role: Atria-SitePicturesLambdaRole

  • Lambda: Atria-SitePicturesLambda

  • API Gateway: Atria-SitePicturesAPI

Props Interface:

interface SitePicturesNestedStackProps {
  // No required props - reads from SSM
}

API Routes:

  • GET /pictures/{siteId} - Get pictures for a site

  • POST /pictures/{siteId} - Upload picture

  • DELETE /pictures/{siteId}/{pictureKey} - Delete picture


2. GREF Installs Nested Stack

File: lib/stacks/gref-installs-stack.ts

Resources:

  • IAM Role: Atria-GrefInstallsLambdaRole

  • Lambda: Atria-GrefInstallsLambda

  • Lambda: Atria-GrefInstallsDevicesLambda

  • API Gateway: Atria-GrefInstallsAPI

Props Interface:

interface GrefInstallsNestedStackProps {
  resourcesTableName: string
  resourcesTableArn: string
  devicesTableName: string
  devicesTableArn: string
  lorawanGatewaysTableName: string
  lorawanDevicesTableName: string
}
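
As an illustration of how such a props contract might be consumed, the sketch below maps the table names from the props onto Lambda environment variables. The environment variable names, the helper and the sample values are hypothetical; only the props fields come from the interface above.

```typescript
// Props contract as listed in this report.
interface GrefInstallsNestedStackProps {
  resourcesTableName: string;
  resourcesTableArn: string;
  devicesTableName: string;
  devicesTableArn: string;
  lorawanGatewaysTableName: string;
  lorawanDevicesTableName: string;
}

// Maps table names to environment variables for the Lambda functions.
// The variable names are assumptions; the ARNs are left out because they
// would more plausibly feed IAM policy statements than runtime code.
function toLambdaEnvironment(
  props: GrefInstallsNestedStackProps,
): Record<string, string> {
  return {
    RESOURCES_TABLE_NAME: props.resourcesTableName,
    DEVICES_TABLE_NAME: props.devicesTableName,
    LORAWAN_GATEWAYS_TABLE_NAME: props.lorawanGatewaysTableName,
    LORAWAN_DEVICES_TABLE_NAME: props.lorawanDevicesTableName,
  };
}

// Placeholder table names and ARNs for illustration only.
const grefEnvironment = toLambdaEnvironment({
  resourcesTableName: "Example-Resources",
  resourcesTableArn: "arn:aws:dynamodb:us-east-1:123456789012:table/Example-Resources",
  devicesTableName: "Example-Devices",
  devicesTableArn: "arn:aws:dynamodb:us-east-1:123456789012:table/Example-Devices",
  lorawanGatewaysTableName: "Example-LorawanGateways",
  lorawanDevicesTableName: "Example-LorawanDevices",
});
```

Because every field is required and typed, omitting a table name when instantiating the nested stack becomes a compile-time error rather than a failed deployment.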

3. Provisioning Nested Stack

File: lib/stacks/provisioning-stack.ts

Resources:

  • IAM Role: Atria-ProvisioningLambdaRole

  • Lambda: Atria-ProvisioningLambda

  • API Gateway: Atria-ProvisioningAPI

Props Interface:

interface ProvisioningNestedStackProps {
  lorawanDevicesTableName: string
  lorawanGatewaysTableName: string
}

Special Permissions:

  • iotwireless:* - Full IoT Wireless access for device provisioning


4. Device Health Nested Stack

File: lib/stacks/device-health-stack.ts

Resources:

  • IAM Role: Atria-DeviceHealthLambdaRole

  • Lambda: Atria-DeviceReadingsLambda

  • Lambda: Atria-DeviceReadingsWorkerLambda

  • Lambda: Atria-SiteHealthLambda

  • Lambda: Atria-UserDevicesLambda

  • Lambda: Atria-FeedbackLambda

  • EventBridge Rule: Worker trigger (every 5 minutes)

  • API Gateway: Atria-DeviceHealthAPI
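
The worker trigger interval corresponds to an EventBridge rate expression. The helper below is a hypothetical sketch of how the 5-minute interval here (or the 15-minute interval used by the Installer stack) could be rendered; EventBridge expects the singular unit when the value is 1.

```typescript
// Renders an EventBridge rate expression for a whole number of minutes.
function rateExpression(minutes: number): string {
  if (Math.floor(minutes) !== minutes || minutes < 1) {
    throw new Error(`Interval must be a positive whole number: ${minutes}`);
  }
  // EventBridge requires "minute" for 1 and "minutes" otherwise.
  return minutes === 1 ? "rate(1 minute)" : `rate(${minutes} minutes)`;
}

const workerSchedule = rateExpression(5);
const migrationSchedule = rateExpression(15);
```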

Props Interface:

interface DeviceHealthNestedStackProps {
  userDevicesTableName: string
  userDevicesTableArn: string
  feedbackTableName: string
  feedbackTableArn: string
  timestreamDatabaseName: string
  timestreamTableName: string
  sharedS3BucketName: string
  sharedS3BucketArn: string
}

5. Site Survey Nested Stack

File: lib/stacks/site-survey-stack.ts

Resources:

  • IAM Role: Atria-SiteSurveyLambdaRole

  • Lambda: Atria-SiteSurveyCrudLambda

  • Lambda: Atria-SiteSurveyReportsLambda

  • API Gateway: Atria-SiteSurveyAPI

Props Interface:

interface SiteSurveyNestedStackProps {
  siteSurveyTableName: string
  siteSurveyTableArn: string
  lorawanSitesTableName: string
  sharedS3BucketName: string
  sharedS3BucketArn: string
  sharedEmailNotificationLambdaArn: string
}

6. Admin API Nested Stack

File: lib/stacks/admin-api-stack.ts

Resources:

  • IAM Role: Atria-AdminUserManagementLambdaRole

  • Lambda: Atria-AdminUserManagementLambda

  • API Gateway: Atria-AdminAPI

Props Interface:

interface AdminApiNestedStackProps {
  userPoolId: string
  userPoolArn: string
  deviceDashboardUserTagsTableName: string
}

Special Permissions:

  • cognito-idp:Admin* - Cognito admin operations for user management
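
In IAM policy JSON, the `cognito-idp:Admin*` wildcard covers the Admin-prefixed Cognito actions such as `AdminCreateUser`. The sketch below builds such a statement from the `userPoolArn` prop. Scoping `Resource` to that single ARN is an assumption, since this report does not state which resource the permission is attached to, and the helper name and sample ARN are hypothetical.

```typescript
// Plain-object form of one IAM policy statement.
interface PolicyStatementJson {
  Effect: "Allow";
  Action: string[];
  Resource: string[];
}

// Grants the Admin-prefixed Cognito actions on one user pool only.
function cognitoAdminStatement(userPoolArn: string): PolicyStatementJson {
  return {
    Effect: "Allow",
    Action: ["cognito-idp:Admin*"],
    Resource: [userPoolArn], // assumed scoping, not confirmed by the report
  };
}

// Placeholder ARN for illustration.
const adminStatement = cognitoAdminStatement(
  "arn:aws:cognito-idp:us-east-1:123456789012:userpool/us-east-1_EXAMPLE",
);
```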


7. Installer Nested Stack

File: lib/stacks/installer-stack.ts

Resources:

  • IAM Role: Atria-InstallerLambdaRole

  • IAM Role: Atria-InstallerAdminUserManagementLambdaRole

  • IAM Role: Atria-InstallerMigrationLambdaRole

  • Lambda: Atria-InstallerDeviceDetailsLambda

  • Lambda: Atria-InstallerEntityResourceLambda

  • Lambda: Atria-InstallerProcessImagesLambda (PythonFunction)

  • Lambda: Atria-InstallerInstallationSubmissionLambda

  • Lambda: Atria-InstallerAdminUserManagementLambda

  • Lambda: Atria-InstallerMigrationLambda

  • PythonLayerVersion: Image processing dependencies

  • Step Functions State Machine: Migration workflow

  • EventBridge Rule: Migration trigger (every 15 minutes)

  • API Gateway: Atria-InstallerAPI

Props Interface:

interface InstallerNestedStackProps {
  userPoolId: string
  userPoolArn: string
  sharedS3Bucket: s3.IBucket
  deviceDetailsTableName: string
  deviceDetailsTableArn: string
  entityResourceTableName: string
  entityResourceTableArn: string
  lorawanDevicesTableName: string
  lorawanGatewaysTableName: string
  siteIdIndexName: string
}

Special Features:

  • S3 event trigger (attached from main stack)

  • Binary media type support for file uploads

  • Step Functions for scheduled data migration


Shared Utilities

SSM Parameters (lib/shared/ssm-parameters.ts)

Provides constants and helpers for SSM Parameter Store:

export const SSM_PATHS = {
  COGNITO: {
    USER_POOL_ID: "/atria/cognito/user-pool-id",
    USER_POOL_ARN: "/atria/cognito/user-pool-arn",
    // ...
  },
  S3: {
    SHARED_BUCKET_NAME: "/atria/s3/shared-bucket-name",
    // ...
  },
  // ...
}

export class SsmParameterWriter { /* ... */ }

export function readSsmParameter(scope, path) { /* ... */ }
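
Centralizing the paths in one constant also makes them easy to sanity-check. The sketch below is not from the Atria code: it copies only the three paths shown above, and the `collectPaths` and `validatePaths` helpers are hypothetical. It flattens the constant and reports paths that are duplicated or that fall outside the `/atria/` prefix.

```typescript
// Only the paths shown in this report; the real constant has more entries.
const SSM_PATHS = {
  COGNITO: {
    USER_POOL_ID: "/atria/cognito/user-pool-id",
    USER_POOL_ARN: "/atria/cognito/user-pool-arn",
  },
  S3: {
    SHARED_BUCKET_NAME: "/atria/s3/shared-bucket-name",
  },
} as const;

// A nested tree of path constants.
interface PathTree {
  readonly [key: string]: string | PathTree;
}

// Flattens the tree into a list of parameter paths.
function collectPaths(tree: PathTree): string[] {
  const paths: string[] = [];
  for (const key of Object.keys(tree)) {
    const value = tree[key];
    if (typeof value === "string") {
      paths.push(value);
    } else {
      paths.push(...collectPaths(value));
    }
  }
  return paths;
}

// Returns a description of every path that is duplicated or lacks the prefix.
function validatePaths(paths: string[], prefix: string): string[] {
  const problems: string[] = [];
  const seen: Record<string, boolean> = {};
  for (const path of paths) {
    if (path.indexOf(prefix) !== 0) {
      problems.push(`${path}: missing prefix ${prefix}`);
    }
    if (seen[path]) {
      problems.push(`${path}: duplicate`);
    }
    seen[path] = true;
  }
  return problems;
}

const allPaths = collectPaths(SSM_PATHS);
const pathProblems = validatePaths(allPaths, "/atria/");
```

A check like this could run in a unit test so that a mistyped or reused path is caught before a nested stack tries to read it at deploy time.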

API Utilities (lib/shared/api-utils.ts)

Reusable API Gateway configurations:

  • CORS configurations

  • Method options with Cognito authentication

  • Lambda integration options

  • Request/response templates


Migration Principles

1. Stateful Resource Preservation

All stateful resources remain in the main stack:

  • DynamoDB tables with RemovalPolicy.RETAIN

  • Cognito User Pool and Identity Pool

  • S3 Buckets

2. Lambda Naming Convention

Lambdas created in nested stacks use a new naming pattern so that they do not conflict in CloudFormation with the existing functions:

Atria-{FeatureName}Lambda

Examples: Atria-SitePicturesLambda, Atria-DeviceReadingsLambda
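
The convention is simple enough to capture in a helper so that names are never typed by hand. This is a hypothetical sketch; the PascalCase check is an added assumption based on the examples above.

```typescript
// Builds a Lambda function name following the Atria-{FeatureName}Lambda pattern.
function lambdaName(featureName: string): string {
  // Assumed rule: feature names are PascalCase with no separators.
  if (!/^[A-Z][A-Za-z0-9]*$/.test(featureName)) {
    throw new Error(`Feature name must be PascalCase: ${featureName}`);
  }
  return `Atria-${featureName}Lambda`;
}

const sitePicturesName = lambdaName("SitePictures");
const deviceReadingsName = lambdaName("DeviceReadings");
```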

3. API URL Handling

API URLs from nested stacks are:

  1. Exposed via public properties on nested stack classes

  2. Passed to Amplify environment variables

  3. Included in FrontendEnvironmentVariables CfnOutput
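
The three steps above can be sketched as a single mapping from API URLs to environment variables. Everything named here is an assumption for illustration: the camelCase feature keys, the `*_API_URL` variable naming, the sample URLs, and the choice of JSON for the output value are not taken from the Atria code.

```typescript
// Converts a camelCase feature key into an environment variable name,
// for example "sitePictures" becomes "SITE_PICTURES_API_URL".
function toEnvVarName(feature: string): string {
  const snake = feature.replace(/([a-z0-9])([A-Z])/g, "$1_$2").toUpperCase();
  return `${snake}_API_URL`;
}

// Builds the environment variable map from the URLs that the nested
// stacks would expose through public properties.
function buildFrontendEnv(apiUrls: Record<string, string>): Record<string, string> {
  const env: Record<string, string> = {};
  for (const feature of Object.keys(apiUrls)) {
    env[toEnvVarName(feature)] = apiUrls[feature];
  }
  return env;
}

// Placeholder URLs; real values are only known at deploy time.
const frontendEnv = buildFrontendEnv({
  sitePictures: "https://example1.execute-api.us-east-1.amazonaws.com/prod/",
  deviceHealth: "https://example2.execute-api.us-east-1.amazonaws.com/prod/",
});

// One map can feed both the Amplify environment variables and a single
// stack output, serialized here as JSON.
const frontendOutputValue = JSON.stringify(frontendEnv);
```

Deriving both consumers from one map keeps the Amplify configuration and the stack output from drifting apart when an API is added or renamed.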