TerraGuard

AWS Infrastructure

Overview of the AWS services powering TerraGuard in production, including the migration from serverless Lambda to the Go event processor.

Overview

TerraGuard runs on AWS with a mix of managed services and self-hosted components on EC2. The architecture balances cost efficiency with operational simplicity, using App Runner for the backend API and EC2 for services that require persistent connections or specialized runtimes.

AWS Services

Compute

| Service | Purpose | Details |
| --- | --- | --- |
| App Runner | Backend API hosting | Auto-scaling, deploys from ECR images |
| EC2 (Server B) | Support services | c7g.xlarge ARM64; runs crawler, geopop, and the Go event processor (the standalone search service was removed) |

Storage

| Service | Purpose | Details |
| --- | --- | --- |
| RDS PostgreSQL | Primary database | PostGIS + pgVector extensions enabled |
| S3 | Static assets & reports | Generated PDF reports, uploaded documents |
| ECR | Container registry | Docker images under terra-guard/ namespace |

Networking & Delivery

| Service | Purpose | Details |
| --- | --- | --- |
| CloudFront | CDN | Caches static assets, terminates TLS |
| Route 53 | DNS | Domain management for all services |

Application Integrations (Runtime)

Beyond hosting, the backend API calls these AWS services directly at runtime via boto3, configured through the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_REGION environment variables:

| Service | Purpose | Details |
| --- | --- | --- |
| S3 | Document storage | Client uploads go straight to S3 via presigned URLs; the extraction pipeline reads them back. Bucket from AWS_S3_BUCKET_NAME, keyed under events/documents/. |
| SES | Transactional email | Auth emails, notifications, and reports. In local/dev, USE_MAILPIT=true swaps SES for Mailpit so nothing leaves the machine. |
| Textract | Document OCR | Optional, off by default (AWS_TEXTRACT_ENABLED=false). When enabled, extracts text from scanned PDFs/images during document ingestion. |
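
As a rough illustration of how these settings might fit together, the sketch below reads the environment variables named in the table and builds a presigned upload URL. The RuntimeSettings class, the helper names, the key layout beneath events/documents/, and the 15-minute expiry are assumptions made for illustration, not TerraGuard's actual code. The only library call relied on is the boto3 S3 client method generate_presigned_url, and the client is passed in so the sketch itself needs only the standard library.

```python
import os
import uuid
from dataclasses import dataclass

DOCUMENT_PREFIX = "events/documents/"  # prefix named in the table above


def env_flag(name: str, default: bool = False) -> bool:
    """Interpret an environment variable such as USE_MAILPIT=true as a boolean."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class RuntimeSettings:
    # Hypothetical container; the fields mirror the documented env vars.
    bucket: str
    region: str
    use_mailpit: bool
    textract_enabled: bool

    @classmethod
    def from_env(cls) -> "RuntimeSettings":
        return cls(
            bucket=os.environ["AWS_S3_BUCKET_NAME"],
            region=os.environ["AWS_REGION"],
            use_mailpit=env_flag("USE_MAILPIT"),
            textract_enabled=env_flag("AWS_TEXTRACT_ENABLED"),  # off by default
        )


def document_key(filename: str) -> str:
    """Build an object key under the documented prefix.

    The random sub-folder is an assumption; the source only gives the prefix.
    """
    safe_name = os.path.basename(filename)  # drop any client-supplied path
    return f"{DOCUMENT_PREFIX}{uuid.uuid4().hex}/{safe_name}"


def presign_upload(s3_client, settings: RuntimeSettings, filename: str,
                   expires_in: int = 900) -> tuple[str, str]:
    """Return (key, url) for a direct client PUT to S3."""
    key = document_key(filename)
    url = s3_client.generate_presigned_url(
        "put_object",
        Params={"Bucket": settings.bucket, "Key": key},
        ExpiresIn=expires_in,  # seconds; 15 minutes is an assumed default
    )
    return key, url
```

With boto3 installed, calling presign_upload(boto3.client("s3", region_name=settings.region), settings, "report.pdf") would return the object key and a URL the client can PUT to directly, so the upload never passes through the backend.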

SQS is no longer in the ingestion path. The Go event processor polls sources and writes to PostgreSQL directly; AWS_SQS_URL and the SQS handler remain only for replaying captured message dumps in tests. See Legacy Services.

Legacy Services (Deprecated)

| Service | Former Purpose | Status |
| --- | --- | --- |
| SQS | Event ingestion queue | Replaced by Go event processor direct writes |
| DynamoDB | Event deduplication state | Replaced by PostgreSQL-based dedup |
| Lambda | Event processing functions | Replaced by Go event processor |

AWS CLI Configuration

All AWS CLI commands use the tg profile:

```bash
# Configure the profile
aws configure --profile tg

# Example commands
aws ecr get-login-password --profile tg --region us-east-1
aws s3 ls s3://terraguard-assets --profile tg
aws ssm send-command --profile tg --instance-ids i-05033852181296c97 ...
```

Migration: Lambda + SQS to Go Event Processor

The original event ingestion pipeline used a serverless architecture: SQS queued incoming events, Lambda functions processed them, and DynamoDB held deduplication state.

This was replaced with the Go event processor for several reasons:

  1. Cold start latency -- Lambda cold starts added 3-5 seconds to event processing, delaying time-sensitive disaster alerts
  2. SQS complexity -- Dead letter queues, retry policies, and visibility timeouts added operational overhead with little benefit at TerraGuard's event volume
  3. DynamoDB cost -- Per-request pricing for deduplication lookups became expensive as the polling frequency increased
  4. Debugging difficulty -- Distributed traces across Lambda, SQS, and DynamoDB were hard to follow compared to a single-process log stream

The new architecture is a single Go binary that handles polling, deduplication, normalization, and database writes.

The event processor runs as a Docker container on Server B alongside the other Go services. It polls data sources on a configurable schedule, deduplicates against PostgreSQL, and sends a webhook to the backend API to trigger the enrichment pipeline via Inngest.
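
The poll, deduplicate, write, and notify cycle described above can be sketched roughly as follows. The production processor is a Go binary writing to PostgreSQL; this Python version uses the standard-library sqlite3 module purely as a stand-in so the example runs anywhere. The events table, its (source, external_id) uniqueness key, and the fetch and notify callables are all hypothetical. The source does not say how deduplication is implemented, so the ON CONFLICT DO NOTHING approach shown here is one plausible option, not a description of the actual method.

```python
import sqlite3
from typing import Callable, Iterable

# Assumed schema: one row per event, unique per (source, external_id).
SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
    source      TEXT NOT NULL,
    external_id TEXT NOT NULL,
    title       TEXT NOT NULL,
    PRIMARY KEY (source, external_id)
)
"""


def normalize(source: str, raw: dict) -> tuple[str, str, str]:
    """Map a raw feed item onto the (assumed) events schema."""
    return source, str(raw["id"]), raw.get("title", "").strip()


def process_batch(
    conn: sqlite3.Connection,
    source: str,
    fetch: Callable[[], Iterable[dict]],
    notify: Callable[[str, list[str]], None],
) -> int:
    """Run one polling cycle and return the number of newly stored events."""
    new_ids: list[str] = []
    with conn:  # commits on success, rolls back if an exception escapes
        for raw in fetch():
            row = normalize(source, raw)
            cur = conn.execute(
                "INSERT INTO events (source, external_id, title) "
                "VALUES (?, ?, ?) "
                "ON CONFLICT (source, external_id) DO NOTHING",
                row,
            )
            if cur.rowcount == 1:  # 0 means the event was already stored
                new_ids.append(row[1])
    if new_ids:
        # Stand-in for the webhook that triggers the enrichment pipeline.
        notify(source, new_ids)
    return len(new_ids)
```

Letting the database's uniqueness constraint reject duplicates keeps the dedup state in the same place as the events themselves, which matches the move away from a separate DynamoDB lookup. In the real processor, notify would correspond to the webhook call to the backend API.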

Cost Optimization

Key cost decisions in the current architecture:

  • App Runner over ECS/EKS -- Simpler scaling model for a single backend service, no cluster management overhead
  • Single EC2 instance (Server B) -- Co-locating three lightweight services on one c7g.xlarge is cheaper than running three separate App Runner services
  • ARM64 (Graviton) -- Server B uses Graviton processors for better price/performance on Go and Rust workloads
  • Inngest Cloud over self-hosted -- Managed job queue eliminates the need for a dedicated Redis instance in production (Redis is only used locally)
  • Vercel for frontend -- Free tier covers the Next.js deployment, avoiding App Runner costs for static/SSR content
