TerraGuard

Server B Deployment

Deployment architecture for Server B, the EC2 instance that hosts TerraGuard's crawler and geocoding services behind a Caddy reverse proxy, with automated CI/CD.

Overview

Server B is a single EC2 instance that hosts support services behind a Caddy reverse proxy. It provides web content crawling and reverse geocoding to the backend API. (Web/news search is no longer hosted here — it now runs inside the Backend API using Serper.dev and Brave Search; see Search Layer.)

Instance Specification

| Property | Value |
| --- | --- |
| Instance Type | c7g.xlarge (ARM64 Graviton3) |
| vCPUs | 4 |
| Memory | 8 GB |
| Instance ID | i-05033852181296c97 |
| Elastic IP | 34.200.207.223 |
| OS | Amazon Linux 2023 (ARM64) |
| Region | us-east-1 |

Services

TerraGuard support services run as Docker containers managed by Docker Compose:

| Service | Domain | Internal Port | Description |
| --- | --- | --- | --- |
| tg-web-crawler | crawler.terraguard.ai | 8091 | Async web crawling with strategy fallback |
| tg-geo-pop | geopop.terraguard.ai | 8077 | Reverse geocoding & population analysis |

Supporting containers that are not publicly exposed:

| Service | Internal Port | Description |
| --- | --- | --- |
| Crawler Worker | 8092 | Python crawl4ai browser-based extraction |

Caddy Reverse Proxy

Caddy serves as the entry point for all external traffic. It handles TLS termination, automatic HTTPS certificate provisioning via Let's Encrypt, and API key authentication.

Automatic HTTPS

Caddy automatically obtains and renews TLS certificates from Let's Encrypt for each domain it serves. No manual certificate management is required.

API Key Authentication

All service endpoints are protected at the Caddy layer with API key validation. Requests must include the correct key in the X-API-Key header:

curl -H "X-API-Key: your-api-key" \
  https://crawler.terraguard.ai/v1/health

This means the individual services do not need to implement their own authentication -- Caddy rejects unauthenticated requests before they reach the backend containers.
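
One way to confirm this behavior is to call a health endpoint twice, once without the header and once with it. The sketch below assumes `API_KEY` is exported in the calling shell; the function names (`http_status`, `check_status`, `verify_auth`) are illustrative and not part of TerraGuard.

```shell
# Print only the HTTP status code for a URL; extra arguments are passed to curl.
http_status() {
  local url="$1"
  shift
  curl -s -o /dev/null -w '%{http_code}' "$@" "$url"
}

# Compare an observed status code with the expected one and report the result.
check_status() {
  local label="$1" expected="$2" actual="$3"
  if [ "$actual" = "$expected" ]; then
    echo "ok: $label ($actual)"
  else
    echo "FAIL: $label (expected $expected, got $actual)"
    return 1
  fi
}

# Expect 401 without a key and 200 with the key from $API_KEY (assumed to be set).
verify_auth() {
  local url="$1"
  check_status "no key" 401 "$(http_status "$url")" &&
    check_status "with key" 200 "$(http_status "$url" -H "X-API-Key: $API_KEY")"
}
```

For example, `verify_auth https://crawler.terraguard.ai/v1/health` should print two `ok:` lines if Caddy is configured as described.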

Caddyfile Structure

The Caddy configuration routes each domain to its corresponding container:

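crawler.terraguard.ai {
    # Match only requests whose X-API-Key header equals the API_KEY environment variable
    @authenticated header X-API-Key {env.API_KEY}
    # Forward authenticated requests to the crawler container
    handle @authenticated {
        reverse_proxy tg-web-crawler-api:8091
    }
    # Reject everything else
    respond 401
}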

geopop.terraguard.ai {
    @authenticated header X-API-Key {env.API_KEY}
    handle @authenticated {
        reverse_proxy geopop-api:8077
    }
    respond 401
}
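
Both site blocks share the same shape, so exposing another service would mean repeating it with a different domain and upstream. As a rough illustration, the hypothetical helper below (`caddy_site_block` is not part of the deployment) prints a block in that shape for a given domain and upstream address.

```shell
# Print a Caddyfile site block that follows the pattern used for the
# crawler and geopop domains. {env.API_KEY} is emitted literally so that
# Caddy, not the shell, resolves it.
caddy_site_block() {
  local domain="$1" upstream="$2"
  cat <<EOF
$domain {
    @authenticated header X-API-Key {env.API_KEY}
    handle @authenticated {
        reverse_proxy $upstream
    }
    respond 401
}
EOF
}
```

Calling `caddy_site_block geopop.terraguard.ai geopop-api:8077` reproduces the second block shown above.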

Docker Compose

All services are orchestrated with a single docker-compose.yml file:

# Start all services
docker compose up -d

# View logs
docker compose logs -f

# Restart a specific service
docker compose restart tg-web-crawler-api

# Pull latest images and recreate
docker compose pull && docker compose up -d

Images are pulled from ECR. Each service image is built and pushed by its GitHub Actions CI/CD pipeline.
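
Pulling from a private ECR registry requires Docker to be authenticated against it. How Server B handles this is not stated here; if a manual login is needed, a sketch along these lines should work. The account ID is a placeholder you must supply, and the helper names are illustrative.

```shell
# Build the ECR registry hostname for an AWS account and region.
ecr_registry() {
  local account_id="$1" region="${2:-us-east-1}"
  echo "${account_id}.dkr.ecr.${region}.amazonaws.com"
}

# Log Docker in to ECR using a short-lived token from the AWS CLI.
# Assumes the caller's credentials are allowed to request the token.
ecr_login() {
  local account_id="$1" region="${2:-us-east-1}"
  aws ecr get-login-password --region "$region" |
    docker login --username AWS --password-stdin "$(ecr_registry "$account_id" "$region")"
}
```

After `ecr_login <account-id>`, `docker compose pull` can fetch the service images.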

Deployment Process

Deployments are fully automated via GitHub Actions. Each service's pipeline builds its image and pushes it to ECR; Server B then pulls the new image and recreates the container.

Manual Deployment

If needed, you can deploy manually via SSM:

aws ssm send-command \
  --profile tg \
  --instance-ids i-05033852181296c97 \
  --document-name "AWS-RunShellScript" \
  --parameters 'commands=["cd /opt/terraguard && docker compose pull && docker compose up -d"]'
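
`send-command` returns immediately, so the command above does not show whether the deploy succeeded. The sketch below wraps it with a wait and a result lookup. The function names are hypothetical; the profile, instance ID, and remote command are the ones used above.

```shell
# Format one shell command as the --parameters value for AWS-RunShellScript.
ssm_parameters() {
  printf 'commands=["%s"]' "$1"
}

# Send the deploy command, wait for it to finish, then print its status and output.
deploy_via_ssm() {
  local instance_id="$1" profile="${2:-tg}"
  local command_id
  command_id=$(aws ssm send-command \
    --profile "$profile" \
    --instance-ids "$instance_id" \
    --document-name "AWS-RunShellScript" \
    --parameters "$(ssm_parameters "cd /opt/terraguard && docker compose pull && docker compose up -d")" \
    --query 'Command.CommandId' \
    --output text) || return 1

  # The waiter exits non-zero if the command fails or the wait times out;
  # the invocation details are printed either way.
  aws ssm wait command-executed \
    --profile "$profile" \
    --command-id "$command_id" \
    --instance-id "$instance_id"

  aws ssm get-command-invocation \
    --profile "$profile" \
    --command-id "$command_id" \
    --instance-id "$instance_id" \
    --query '[Status, StandardOutputContent, StandardErrorContent]' \
    --output text
}
```

Usage: `deploy_via_ssm i-05033852181296c97`.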

Or SSH directly (for debugging only):

ssh -i terraguard-search-vps.pem ec2-user@34.200.207.223

Health Checks

Verify all services are running:

# Crawler API
curl -H "X-API-Key: $API_KEY" https://crawler.terraguard.ai/v1/health

# GeoPop API
curl -H "X-API-Key: $API_KEY" https://geopop.terraguard.ai/api/v1/health

All endpoints should return HTTP 200 with a JSON health status.
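
Right after a deploy, containers may need a few seconds before they respond. A small retry wrapper around the checks above is one option. The names `retry`, `service_healthy`, and `wait_for_services`, as well as the attempt count and delay, are assumptions chosen for illustration; `API_KEY` is assumed to be exported.

```shell
# Retry a command up to $1 times, sleeping $2 seconds between attempts.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Fails on connection errors and on HTTP 4xx/5xx responses (curl -f).
service_healthy() {
  curl -fsS --max-time 10 -o /dev/null -H "X-API-Key: $API_KEY" "$1"
}

# Poll both documented health endpoints: up to 5 attempts, 3 seconds apart.
wait_for_services() {
  retry 5 3 service_healthy https://crawler.terraguard.ai/v1/health &&
    retry 5 3 service_healthy https://geopop.terraguard.ai/api/v1/health
}
```

`wait_for_services` exits with status 0 only if both endpoints respond successfully within the retry budget.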