Deploy Apache Superset on AWS with ECS, RDS, ElastiCache, and ALB. Production-grade architecture, security, and scaling patterns for analytics at scale.
Deploying Apache Superset to production on AWS requires more than spinning up an EC2 instance and pointing a load balancer at it. You need a hardened, scalable architecture that handles concurrent users, query latency, state management, and security at the level your organization demands. This guide walks you through the production deployment patterns that teams at scale-ups and mid-market companies use to run analytics infrastructure without the operational overhead of Looker or Tableau.
We'll cover the AWS stack—ECS, RDS, ElastiCache, Application Load Balancer (ALB), and IAM—and explain the architectural decisions that separate a weekend project from a production system that scales to hundreds of concurrent dashboard users.
AWS is the natural home for Superset deployments at scale. Unlike managed SaaS platforms, you retain full control over your deployment, data residency, and customization. You avoid the per-seat licensing and vendor lock-in that come with Looker or Tableau. At the same time, AWS services like ECS, RDS, and ElastiCache abstract away infrastructure management, letting your team focus on analytics and dashboards rather than server patching.
The cost model is transparent and predictable. You pay for compute (ECS tasks), database capacity (RDS), caching (ElastiCache), and data transfer—not per user or per dashboard. For teams embedding analytics or running internal BI platforms, this often costs 30-50% less than enterprise BI vendors while giving you the flexibility to customize every layer of the stack.
Apache Superset itself is battle-tested. It powers analytics at companies like Airbnb, where it was born, and thousands of organizations worldwide. When you deploy Superset on AWS, you're building on a proven, open-source foundation rather than betting on a proprietary platform's roadmap.
A production Superset deployment on AWS follows a three-tier architecture: a stateless application layer (ECS), a persistent data layer (RDS), and a caching layer (ElastiCache). This separation of concerns is critical.
Application Layer (ECS): Superset runs as a containerized workload on Amazon ECS (Elastic Container Service). ECS is AWS's managed container orchestration service—simpler than Kubernetes if you're not already running it, and deeply integrated with other AWS services. You define a task definition that specifies the Docker image, CPU and memory allocation, environment variables, and secrets. ECS launches multiple copies of this task across availability zones, and an Application Load Balancer distributes traffic across them. If a task crashes, ECS automatically replaces it. If traffic spikes, you can scale up the task count in seconds.
Data Layer (RDS): Superset's metadata—dashboards, charts, user accounts, permissions, and query history—lives in a relational database. In production, this is Amazon RDS (Relational Database Service) running PostgreSQL. RDS handles automated backups, multi-AZ failover, and point-in-time recovery. You don't manage patches or replication; AWS does. The database is not publicly accessible; it lives in a private subnet and communicates with ECS tasks via a security group.
Caching Layer (ElastiCache): Superset can cache in either Redis or Memcached, but Redis is the usual production choice because it can also hold server-side sessions and act as the Celery broker for background jobs. ElastiCache is AWS's managed Redis/Memcached service. It gives you high-performance, in-memory storage without running your own Redis cluster. For production, you'll use ElastiCache Redis with multi-AZ replication and automatic failover, so a failed cache node doesn't take sessions, cached results, and queued jobs down with it.
These three layers communicate over private networks. Users reach Superset through an Application Load Balancer (ALB), which terminates TLS and routes traffic to healthy ECS tasks. Everything is secured with security groups and IAM roles: no task is directly reachable from the internet, and no database credentials are hardcoded in your Docker image.
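The network boundaries described above can be written down as data and checked before any infrastructure exists. The following is a minimal sketch, not an AWS API call: the security group names (`alb-sg`, `ecs-sg`, `rds-sg`, `redis-sg`) are hypothetical, and the ports are the defaults used elsewhere in this guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IngressRule:
    target: str  # security group that receives the traffic
    source: str  # CIDR block or security group allowed to connect
    port: int


# Hypothetical group names; only the ALB accepts traffic from the internet.
RULES = [
    IngressRule("alb-sg", "0.0.0.0/0", 443),
    IngressRule("alb-sg", "0.0.0.0/0", 80),   # kept open only to redirect to HTTPS
    IngressRule("ecs-sg", "alb-sg", 8088),    # Superset container port
    IngressRule("rds-sg", "ecs-sg", 5432),    # PostgreSQL
    IngressRule("redis-sg", "ecs-sg", 6379),  # Redis
]


def publicly_reachable(rules):
    """Return the groups that accept traffic from any IPv4 address."""
    return {rule.target for rule in rules if rule.source == "0.0.0.0/0"}


def allowed_sources(rules, target):
    """Return the sorted, de-duplicated sources allowed to reach a group."""
    return sorted({rule.source for rule in rules if rule.target == target})
```

A check such as `publicly_reachable(RULES) == {"alb-sg"}` makes a useful guard in a pre-deployment test: if someone opens the database or cache tier to the internet, the check fails.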
Before you can run Superset on ECS, you need a Docker image. You have two options: use the official Apache Superset image from Docker Hub, or build your own.
For most teams, starting with the official image is the right choice. You can pull apache/superset:latest from Docker Hub; for production, pin a specific version tag so builds are reproducible. You'll likely also want to customize the image by adding Python packages for database drivers, installing custom fonts for dashboards, or baking in your own configurations.
Here's a minimal Dockerfile that extends the official image:
FROM apache/superset:latest
# The official image runs as a non-root user, so switch to root to install packages
USER root
# Install additional database drivers
RUN pip install --no-cache-dir psycopg2-binary snowflake-sqlalchemy pymongo
# Copy custom configuration
COPY superset_config.py /app/superset_config.py
ENV SUPERSET_CONFIG_PATH=/app/superset_config.py
# Drop back to the unprivileged user for runtime
USER superset
This image includes the PostgreSQL driver (psycopg2), the Snowflake SQLAlchemy dialect, and the pymongo client for MongoDB, which are common requirements for mid-market analytics teams. You build it, tag it with your AWS account ID and ECR (Elastic Container Registry) URI, and push it to ECR. ECS pulls from ECR during task launch.
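The build, tag, and push steps follow a fixed naming pattern, so it can help to generate them rather than type them. This is a small sketch under stated assumptions: the repository name `superset` and the tag are placeholders, and the helper names are hypothetical. The shell commands it emits use the standard AWS CLI and Docker syntax.

```python
def ecr_registry(account_id: str, region: str) -> str:
    """Return the ECR registry hostname for an account and region."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


def ecr_image_uri(account_id: str, region: str, repository: str, tag: str) -> str:
    """Return the full image URI that ECS pulls at task launch."""
    return f"{ecr_registry(account_id, region)}/{repository}:{tag}"


def push_commands(account_id: str, region: str, repository: str, tag: str) -> list:
    """Return the shell commands to log in to ECR, build the image, and push it."""
    registry = ecr_registry(account_id, region)
    uri = ecr_image_uri(account_id, region, repository, tag)
    return [
        f"aws ecr get-login-password --region {region} "
        f"| docker login --username AWS --password-stdin {registry}",
        f"docker build -t {uri} .",
        f"docker push {uri}",
    ]
```

Printing `push_commands("123456789012", "us-east-1", "superset", "v1")` gives three commands you can paste into a CI job. Using an immutable tag per build, rather than reusing `latest`, makes rollbacks simpler.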
Next, you define an ECS task definition. This is a JSON template that tells ECS how to run your Superset container. Key fields include:
Container Definition:
Task Role and Execution Role:
Here's a simplified task definition snippet:
{
  "family": "superset-prod",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "512",
  "memory": "1024",
  "executionRoleArn": "arn:aws:iam::ACCOUNT_ID:role/ecsTaskExecutionRole",
  "taskRoleArn": "arn:aws:iam::ACCOUNT_ID:role/ecsTaskRole",
  "containerDefinitions": [
    {
      "name": "superset",
      "image": "ACCOUNT_ID.dkr.ecr.REGION.amazonaws.com/superset:latest",
      "portMappings": [
        {
          "containerPort": 8088,
          "protocol": "tcp"
        }
      ],
      "environment": [
        {
          "name": "SUPERSET_ENV",
          "value": "production"
        },
        {
          "name": "REDIS_URL",
          "value": "redis://superset-cache.abc123.ng.0001.use1.cache.amazonaws.com:6379/0"
        }
      ],
      "secrets": [
        {
          "name": "SQLALCHEMY_DATABASE_URI",
          "valueFrom": "arn:aws:secretsmanager:REGION:ACCOUNT_ID:secret:superset-db-uri"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/superset",
          "awslogs-region": "REGION",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ]
}
This tells ECS to run Superset with 512 CPU units (0.5 vCPU) and 1024 MB of memory. The container pulls its database URI and other secrets from AWS Secrets Manager at runtime. Logs go to CloudWatch, where you can search, filter, and alert on them. For detailed guidance on task definition parameters, the Amazon ECS Task Definition Parameters documentation covers every field.
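Fargate only accepts certain CPU and memory pairings, and an invalid pair is rejected when the task definition is registered. The sketch below checks a pair before you submit it. The table covers the Linux sizes up to 4 vCPU; larger sizes exist but are omitted here, and the function name is hypothetical.

```python
# Fargate CPU units -> allowed memory values in MiB (sizes up to 4 vCPU only).
FARGATE_MEMORY = {
    256: [512, 1024, 2048],
    512: list(range(1024, 4096 + 1, 1024)),
    1024: list(range(2048, 8192 + 1, 1024)),
    2048: list(range(4096, 16384 + 1, 1024)),
    4096: list(range(8192, 30720 + 1, 1024)),
}


def is_valid_fargate_size(cpu: str, memory: str) -> bool:
    """Check a task definition's "cpu" and "memory" strings against the table."""
    try:
        return int(memory) in FARGATE_MEMORY.get(int(cpu), [])
    except ValueError:
        # Non-numeric input such as "0.5 vCPU" is not handled by this sketch.
        return False
```

The values in the task definition above ("512" and "1024") pass this check. If monitoring later shows memory pressure, 512 CPU units can be paired with up to 4096 MiB before you need to move to a larger CPU size.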
Superset's metadata database is its source of truth. Every dashboard, chart, user, and permission is stored here. In production, you absolutely cannot use SQLite (the default development database). You need a managed, backed-up, replicated database.
Amazon RDS PostgreSQL is the standard choice. Here's what a production RDS instance looks like:
Instance Configuration:
Network Configuration:
Before your first ECS task starts, you need to initialize the RDS database. Superset includes a CLI command to set up the schema:
superset db upgrade
superset fab create-admin --username admin --firstname Admin --lastname User --email ADMIN_EMAIL --password yourpassword
superset init # creates default roles and permissions
superset load_examples # optional: loads sample dashboards
You run these commands once, either in a separate ECS task or locally with network access to RDS. After initialization, your ECS tasks connect to the pre-initialized database at startup.
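Running initialization as a separate ECS task usually means launching the same task definition once with an overridden command. The following sketch builds the request for the ECS `run_task` API. The cluster name, subnet IDs, and security group IDs are placeholders, the helper name is hypothetical, and it assumes the image has no ENTRYPOINT that would swallow the overridden command.

```python
def init_task_request(cluster, task_definition, subnets, security_groups, command):
    """Build keyword arguments for a one-off Fargate task in private subnets."""
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "launchType": "FARGATE",
        "count": 1,
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": list(subnets),
                "securityGroups": list(security_groups),
                "assignPublicIp": "DISABLED",  # stay inside the private network
            }
        },
        "overrides": {
            "containerOverrides": [
                # "superset" must match the container name in the task definition
                {"name": "superset", "command": list(command)}
            ]
        },
    }
```

With boto3 installed and credentials configured, the request could be submitted as `boto3.client("ecs").run_task(**init_task_request("my-cluster", "superset-prod", ["subnet-..."], ["sg-..."], ["superset", "db", "upgrade"]))`, once per initialization command.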
Store your RDS endpoint and credentials in AWS Secrets Manager. Your task definition references the secret by ARN, and ECS injects the value into the container when the task starts, so the credentials never appear in plaintext in the task definition, the image, or your repository. The connection string looks like:
postgresql://superset_user:PASSWORD@RDS_ENDPOINT:5432/superset
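Generated database passwords often contain characters such as `@`, `/`, or `:` that break a connection URI unless they are percent-encoded. Here is a minimal sketch of building the URI safely before storing it in Secrets Manager. The function name and the host in the usage note are hypothetical.

```python
from urllib.parse import quote


def build_database_uri(user, password, host, database, port=5432):
    """Assemble a PostgreSQL SQLAlchemy URI with percent-encoded credentials."""
    safe_user = quote(user, safe="")
    safe_password = quote(password, safe="")
    return f"postgresql://{safe_user}:{safe_password}@{host}:{port}/{database}"
```

For example, `build_database_uri("superset_user", "p@ss/w:rd", "db.internal", "superset")` encodes the password as `p%40ss%2Fw%3Ard`, so the `@` in the password is no longer mistaken for the separator before the host.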
Superset uses Redis for three critical functions: session storage, query result caching, and background job queues. Without Redis, sessions are lost when a task restarts, cached query results are discarded, and long-running queries have nowhere to queue.
Amazon ElastiCache for Redis is a managed Redis service that handles replication, failover, and scaling. For production Superset, configure it like this:
Cluster Configuration:
Network Configuration:
ElastiCache gives you a Redis endpoint like superset-cache.abc123.ng.0001.use1.cache.amazonaws.com. Your ECS task connects via the REDIS_URL environment variable:
redis://superset-cache.abc123.ng.0001.use1.cache.amazonaws.com:6379/0
Superset automatically uses Redis for:
If Redis becomes unavailable, Superset degrades gracefully but loses some functionality. Sessions are lost on task restart, cached results expire immediately, and async jobs fail. This is why multi-AZ with automatic failover is non-negotiable in production.
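Because one Redis endpoint serves several purposes, it can be convenient to give each purpose its own logical database so that flushing the cache does not wipe queued jobs. The sketch below derives those URLs from a single base URL. The helper name is hypothetical, and it assumes a Redis deployment with cluster mode disabled, since Redis Cluster only supports database 0.

```python
from urllib.parse import urlsplit, urlunsplit


def with_db_index(redis_url: str, db: int) -> str:
    """Return the same Redis URL pointing at a different logical database."""
    if db < 0:
        raise ValueError("Redis database index must be non-negative")
    parts = urlsplit(redis_url)
    # Replace only the path component, which carries the database index.
    return urlunsplit((parts.scheme, parts.netloc, f"/{db}", parts.query, parts.fragment))
```

A deployment might then use `with_db_index(REDIS_URL, 0)` for caching and `with_db_index(REDIS_URL, 1)` for the background job queue. The split is a convention, not something Superset requires.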
Users don't connect directly to ECS tasks. Instead, an Application Load Balancer (ALB) sits in front of them, distributing traffic, terminating TLS, and providing a stable endpoint.
Here's the ALB setup:
Listener Configuration:
Target Group:
/health with 30-second interval, 2 consecutive successes to mark healthy
Security Group:
The ALB can also redirect HTTP to HTTPS: add a listener on port 80 with a redirect action, and a user who visits http://analytics.yourcompany.com is sent to the HTTPS URL.
When you launch an ECS service, you attach it to the target group. ECS automatically registers new tasks and deregisters old ones as you deploy updates. The ALB continuously health-checks tasks and removes unhealthy ones from rotation.
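The health check and redirect settings above map onto parameters of the Elastic Load Balancing v2 API. This sketch builds them as plain dictionaries. The target group name and the unhealthy threshold of 3 are assumptions not stated in this guide, and the helper names are hypothetical.

```python
def target_group_params(vpc_id: str, name: str = "superset-tg") -> dict:
    """Build create_target_group arguments for Fargate tasks behind an ALB."""
    return {
        "Name": name,
        "Protocol": "HTTP",
        "Port": 8088,
        "VpcId": vpc_id,
        "TargetType": "ip",  # awsvpc/Fargate tasks register by IP address
        "HealthCheckProtocol": "HTTP",
        "HealthCheckPath": "/health",
        "HealthCheckIntervalSeconds": 30,
        "HealthyThresholdCount": 2,
        "UnhealthyThresholdCount": 3,  # assumption: not specified in this guide
        "Matcher": {"HttpCode": "200"},
    }


def http_redirect_action() -> dict:
    """Build the default action for a port 80 listener that redirects to HTTPS."""
    return {
        "Type": "redirect",
        "RedirectConfig": {
            "Protocol": "HTTPS",
            "Port": "443",
            "StatusCode": "HTTP_301",
        },
    }
```

With boto3, these would be passed as `boto3.client("elbv2").create_target_group(**target_group_params("vpc-..."))` and as `DefaultActions=[http_redirect_action()]` when creating the port 80 listener.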
Every AWS resource needs permissions. ECS tasks need to pull Docker images, write logs, and retrieve secrets. Superset itself might need to access S3 or assume roles for cross-account database access. IAM roles enforce least-privilege access—each component gets only the permissions it needs.
ECS Task Execution Role: Allows ECS to manage the task on your behalf:
ecr:GetAuthorizationToken: pull Docker images from ECR
ecr:BatchGetImage and ecr:GetDownloadUrlForLayer: download image layers
logs:CreateLogStream and logs:PutLogEvents: write logs to CloudWatch
secretsmanager:GetSecretValue: retrieve database credentials and other secrets
ECS Task Role: Allows the running Superset container to access AWS services:
s3:GetObject and s3:PutObject: export dashboards to S3
sts:AssumeRole: assume roles in other AWS accounts (for cross-account database access)
cloudwatch:PutMetricData: publish custom metrics
Create these roles with IAM, attach the appropriate policies, and reference them in your task definition. Never hardcode AWS credentials in your Docker image or environment variables.
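The execution role permissions listed above can be expressed as an IAM policy document. The sketch below builds one, along with the trust policy that lets ECS tasks assume the role. The ARNs are supplied by the caller and the function name is hypothetical. The AWS managed policy AmazonECSTaskExecutionRolePolicy is a broader alternative if you prefer not to maintain your own.

```python
import json

# Trust policy: allows ECS tasks to assume the role.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "ecs-tasks.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}


def execution_role_policy(secret_arn: str, log_group_arn: str) -> dict:
    """Build a least-privilege policy for the ECS task execution role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # GetAuthorizationToken only works with Resource "*"; the other
                # ECR actions could be narrowed to a repository ARN.
                "Action": [
                    "ecr:GetAuthorizationToken",
                    "ecr:BatchGetImage",
                    "ecr:GetDownloadUrlForLayer",
                ],
                "Resource": "*",
            },
            {
                "Effect": "Allow",
                "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
                "Resource": f"{log_group_arn}:*",
            },
            {
                "Effect": "Allow",
                "Action": ["secretsmanager:GetSecretValue"],
                "Resource": secret_arn,
            },
        ],
    }
```

Calling `json.dumps(execution_role_policy(secret_arn, log_group_arn))` produces the document you can attach to the role. Scoping the secrets statement to a single ARN keeps the role from reading unrelated secrets.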
Superset's behavior is controlled by a Python configuration file. Create a superset_config.py file and bake it into your Docker image (or mount it from a ConfigMap if using Kubernetes).
Key production settings:
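import os
from urllib.parse import urlparse
from cachelib.redis import RedisCache

# Security
SECRET_KEY = os.environ.get('SECRET_KEY') # from Secrets Manager
SESSION_COOKIE_SECURE = True
SESSION_COOKIE_HTTPONLY = True
SESSION_COOKIE_SAMESITE = 'Lax'
# Database
SQLALCHEMY_DATABASE_URI = os.environ.get('SQLALCHEMY_DATABASE_URI') # from Secrets Manager
SQLALCHEMY_TRACK_MODIFICATIONS = False
SQLALCHEMY_POOL_SIZE = 10
SQLALCHEMY_POOL_RECYCLE = 3600
# Redis
REDIS_URL = os.environ['REDIS_URL'] # fail fast if the task definition omits it
_redis = urlparse(REDIS_URL)
# The SQL Lab results backend must be a cache object, not a string
RESULTS_BACKEND = RedisCache(host=_redis.hostname, port=_redis.port or 6379, key_prefix='superset_results')
CACHE_DEFAULT_TIMEOUT = 86400 # 1 day
CACHE_CONFIG = {
    'CACHE_TYPE': 'RedisCache',
    'CACHE_KEY_PREFIX': 'superset_',
    'CACHE_REDIS_URL': REDIS_URL,
}
DATA_CACHE_CONFIG = {**CACHE_CONFIG, 'CACHE_KEY_PREFIX': 'superset_data_'} # chart query results
# Celery (background jobs): Superset reads these settings from CELERY_CONFIG
class CeleryConfig:
    broker_url = REDIS_URL
    result_backend = 'redis://...'
    imports = ('superset.sql_lab', 'superset.tasks.scheduler')
CELERY_CONFIG = CeleryConfig
# Feature flags
FEATURE_FLAGS = {
    'ENABLE_TEMPLATE_PROCESSING': True,
    'VERSIONED_EXPORT': True,
    'DASHBOARD_RBAC': True, # role-based access control
}
# Logging
LOG_LEVEL = 'INFO'
This configuration: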
For detailed security best practices, the Securing Your Superset Installation for Production guide covers HTTPS, reverse proxies, and authentication patterns.
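One weakness of reading settings with `os.environ.get` is that a missing secret silently becomes `None`, and the failure only surfaces later. A small helper, sketched here under a hypothetical name, makes the container fail at startup instead, which ECS reports as a stopped task you can see immediately.

```python
import os


def require_env(name, environ=os.environ):
    """Return an environment variable's value, or raise if it is missing or empty."""
    value = environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value
```

In superset_config.py this could replace the plain lookups, for example `SECRET_KEY = require_env('SECRET_KEY')`. The optional `environ` argument exists only so the helper can be exercised in tests without touching the real environment.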
Once your infrastructure is in place, deploying Superset is straightforward.
Initial Deployment:
Scaling: ECS has two scaling mechanisms:
For a typical mid-market deployment:
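Blue-Green Deployments: For zero-downtime updates, ECS offers two routes. The default rolling update replaces tasks in batches: register a new task definition revision, update the service to use it, and ECS starts new tasks and drains old ones while the ALB keeps routing to healthy targets. For true blue/green, where a full replacement task set comes up behind a second target group and traffic is shifted all at once or gradually, set the service's deployment controller to CODE_DEPLOY and let AWS CodeDeploy manage the cutover. Either way, rolling back means returning to the previous task definition revision.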
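For the automatic scaling discussed in this section, ECS services are scaled through Application Auto Scaling. The sketch below builds the two requests involved: registering the service as a scalable target and attaching a CPU target-tracking policy. The 70% target, the maximum of 10 tasks, and the cooldown values are illustrative assumptions, and the helper names are hypothetical.

```python
def scalable_target(cluster: str, service: str, min_tasks: int = 2, max_tasks: int = 10) -> dict:
    """Build register_scalable_target arguments for an ECS service."""
    return {
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
        "MinCapacity": min_tasks,
        "MaxCapacity": max_tasks,
    }


def cpu_target_tracking_policy(cluster: str, service: str, target_cpu: float = 70.0) -> dict:
    """Build put_scaling_policy arguments that hold average CPU near a target."""
    return {
        "PolicyName": f"{service}-cpu-target",
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_cpu,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
            },
            "ScaleOutCooldown": 60,   # seconds; scale out quickly
            "ScaleInCooldown": 300,   # seconds; scale in conservatively
        },
    }
```

Both dictionaries are meant to be passed to the `application-autoscaling` boto3 client, as `register_scalable_target(**...)` and then `put_scaling_policy(**...)`. A longer scale-in cooldown avoids removing tasks during a brief lull between dashboard refreshes.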
You can't operate what you can't see. Set up comprehensive monitoring from day one.
CloudWatch Logs: ECS tasks write logs to CloudWatch. Create log groups for application logs and for any RDS logs you export to CloudWatch. ALB access logs are delivered to S3 rather than CloudWatch, so query those separately. Use CloudWatch Logs Insights to query logs:
fields @timestamp, @message
| filter @message like /ERROR/
| stats count() by bin(5m)
CloudWatch Metrics: Monitor:
Alarms: Set up alarms for:
When alarms trigger, send notifications to PagerDuty, Slack, or email. For detailed monitoring setup, D23 provides expert data consulting to optimize your infrastructure and alert strategies.
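An alarm is ultimately a set of parameters on a CloudWatch metric. As one illustration, the sketch below builds the arguments for a CPU alarm on the ECS service that notifies an SNS topic, which can in turn fan out to PagerDuty, Slack, or email. The 80% threshold and five one-minute evaluation periods are assumptions, and the helper name is hypothetical.

```python
def cpu_alarm_params(cluster: str, service: str, sns_topic_arn: str, threshold: float = 80.0) -> dict:
    """Build put_metric_alarm arguments for sustained high CPU on an ECS service."""
    return {
        "AlarmName": f"{service}-high-cpu",
        "Namespace": "AWS/ECS",
        "MetricName": "CPUUtilization",
        "Dimensions": [
            {"Name": "ClusterName", "Value": cluster},
            {"Name": "ServiceName", "Value": service},
        ],
        "Statistic": "Average",
        "Period": 60,            # seconds per datapoint
        "EvaluationPeriods": 5,  # alarm after five consecutive breaches
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }
```

The dictionary can be passed to `boto3.client("cloudwatch").put_metric_alarm(**...)`. Requiring several consecutive periods keeps a single slow query from paging anyone.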
Production Superset deployments handle sensitive business data. Security isn't optional.
Network Security:
Secrets Management:
Authentication and Authorization:
Data Encryption:
Compliance:
Running Superset on AWS is cost-effective, but you can optimize further.
Compute:
Database:
Caching:
Data Transfer:
Typical monthly cost for a mid-market deployment:
Pitfall 1: Stateful ECS Tasks If you store files or session data on the task's local filesystem, they're lost when the task restarts. Always use external storage (S3, EFS, or Redis).
Pitfall 2: Hardcoded Secrets Never put database passwords in your Dockerfile or in plaintext environment variables in the task definition. Use Secrets Manager, and rotate regularly.
Pitfall 3: Undersized RDS A db.t3.micro works for development but not production. Start with db.t3.small and monitor CPU/connections. If you see frequent connection pool exhaustion, scale up.
Pitfall 4: No Cache Invalidation If you cache query results for too long, dashboards show stale data. Set appropriate TTLs (e.g., 1 hour for hourly reports, 1 day for slower-moving data).
Pitfall 5: Ignoring Logs Without CloudWatch Logs and alarms, you won't know when something breaks. Set up monitoring before you need it.
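For the cache TTLs in Pitfall 4, it can help to derive the timeout from how often the underlying data refreshes instead of hardcoding a number of seconds. This is a minimal sketch: the cadence labels and the one-hour fallback are assumptions, and the helper name is hypothetical. The resulting value is what you would enter as a cache timeout on a dataset or chart.

```python
from datetime import timedelta

# Assumed refresh cadences, mirroring the examples in Pitfall 4.
TTL_BY_REFRESH = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
}


def cache_timeout_seconds(refresh: str, default: timedelta = timedelta(hours=1)) -> int:
    """Return a cache timeout in seconds for a given data refresh cadence."""
    return int(TTL_BY_REFRESH.get(refresh, default).total_seconds())
```

Here `cache_timeout_seconds("daily")` returns 86400, matching the one-day default used in the configuration earlier in this guide, while an unrecognized cadence falls back to one hour.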
You might wonder: why not use Preset, which is Superset-as-a-Service? Or stick with Looker or Tableau?
Preset is convenient but expensive (per-seat pricing, starting at $200/user/month). You lose control over infrastructure, data residency, and customization.
Looker and Tableau are powerful but proprietary, expensive ($2000-10000/month for typical teams), and lock you into their ecosystem. You can't customize the query engine or embed dashboards without premium licensing.
D23, a managed Apache Superset platform, sits in the middle. You get the operational simplicity of Preset with the control and cost-effectiveness of self-hosted Superset. D23 provides managed hosting, AI-powered analytics, and expert consulting to accelerate your deployment and optimize your infrastructure.
For teams that need:
Deploying Apache Superset to production on AWS is well within reach for engineering and data teams. The architecture is straightforward: stateless ECS tasks, managed RDS, ElastiCache for caching, and an ALB for load balancing. Security, monitoring, and cost optimization are built in from the start.
The key is treating infrastructure as code, automating deployments, and monitoring relentlessly. Start with a 2-task deployment, scale horizontally as needed, and use auto-scaling to handle traffic spikes. Your total cost will be a fraction of enterprise BI platforms, and you'll have the flexibility to customize every layer.
For teams that want to skip the operational overhead, D23 offers managed Apache Superset with AI analytics, API-first architecture, and expert data consulting. Whether you deploy on AWS yourself or use a managed service, Apache Superset gives you production-grade analytics without the platform overhead.
Start small, monitor closely, and iterate. Your analytics infrastructure will scale with your business.