Learn how AWS PrivateLink secures Apache Superset deployments by keeping analytics traffic on private networks. Technical guide for data teams.
When you're running Apache Superset at scale—especially in regulated industries or handling sensitive customer data—network security becomes as critical as query performance. AWS PrivateLink is a foundational tool for keeping your analytics infrastructure isolated from the public internet while maintaining secure, low-latency connectivity across your AWS environment.
What is AWS PrivateLink? At its core, PrivateLink creates private connectivity between your VPCs, AWS services, and third-party services without routing traffic through the internet. Instead of your Superset instance communicating with databases, data warehouses, or downstream services across the public internet (or even through VPN tunnels that add latency), PrivateLink establishes private network endpoints that keep all traffic within AWS's internal network backbone.
For analytics teams, this means:
This is particularly important for D23, which manages Apache Superset deployments for data teams at scale-ups and mid-market companies. When you're embedding self-serve BI or AI-powered analytics into your product, or standardizing dashboards across portfolio companies, network security can't be an afterthought.
Most Superset deployments start simple. You spin up an EC2 instance or ECS task, point it at your RDS database, and expose it via an Application Load Balancer with a public IP. For development, this works fine. But as you move toward production—especially when handling customer data or financial metrics—this architecture creates unnecessary risk.
Here's what typically happens:
PrivateLink solves these problems by design. Instead of routing through internet gateways, all traffic stays within AWS's private network infrastructure.
Understanding PrivateLink's architecture helps you deploy it correctly. There are three main components:
A VPC endpoint is your entry point into PrivateLink. There are two types relevant to Superset deployments:
Interface endpoints create an elastic network interface (ENI) inside your VPC with a private IP address. When your Superset instance needs to reach an AWS service (like RDS, Redshift, or S3), it connects to this ENI instead of routing through the internet. The endpoint then routes traffic privately to the service.
Gateway endpoints are simpler and specifically for S3 and DynamoDB. They don't require an ENI; instead, you add a route table entry that directs traffic to the endpoint.
For most Superset deployments, you'll use interface endpoints to connect to:
PrivateLink establishes a provider-consumer relationship. In a typical Superset setup:
When you create a VPC endpoint for RDS, you're essentially saying: "I want my Superset instance to reach RDS privately." AWS handles the plumbing behind the scenes.
PrivateLink relies on DNS rather than on intercepting traffic. With private DNS enabled, the hostname your Superset instance already uses resolves, from inside the VPC, to a private IP address instead of a public one. A connection to superset-db.us-east-1.rds.amazonaws.com therefore never needs a public route.
The entire connection happens over AWS's private backbone—no internet gateway, no NAT, no public IP involved. For security teams, this is auditable: you can verify that all traffic stays within your VPC and AWS's private network using VPC Flow Logs.
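The DNS behaviour described above can be spot-checked from the Superset host. The following is a minimal sketch, not an official tool: `resolve_ips` and `all_private` are hypothetical helper names, the check only covers IPv4, and "private" here means whatever Python's `ipaddress` module classifies as a private range.

```python
import ipaddress
import socket


def resolve_ips(hostname: str, port: int = 5432) -> list[str]:
    """Return the distinct IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(
        hostname, port, family=socket.AF_INET, proto=socket.IPPROTO_TCP
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def all_private(addresses: list[str]) -> bool:
    """True only if at least one address is given and every one is private."""
    return bool(addresses) and all(
        ipaddress.ip_address(addr).is_private for addr in addresses
    )
```

Run on the Superset host, `all_private(resolve_ips("superset-db.us-east-1.rds.amazonaws.com"))` should return `True` when resolution stays private. A `False` result suggests the hostname is still resolving to a public address.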
Implementing PrivateLink for Superset involves several steps, depending on your architecture. Here's a practical guide:
Start by mapping every service your Superset instance needs to reach:
For each service, determine whether it's AWS-native (and has PrivateLink support) or third-party (requiring a different approach).
Using the AWS Console, CLI, or Infrastructure as Code (Terraform, CloudFormation), create VPC endpoints:
For RDS:
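```bash
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-12345678 \
  --service-name com.amazonaws.us-east-1.rds \
  --vpc-endpoint-type Interface \
  --subnet-ids subnet-12345678 subnet-87654321 \
  --security-group-ids sg-12345678
```

This creates a private endpoint for the RDS API (the com.amazonaws.us-east-1.rds service), so RDS management calls made from your VPC stay off the internet. Connections to the database itself go to the DB instance's own address on its database port (5432 for PostgreSQL), which is already private when the instance runs in your VPC without public access.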
For S3:
```bash
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-12345678 \
  --service-name com.amazonaws.us-east-1.s3 \
  --vpc-endpoint-type Gateway \
  --route-table-ids rtb-12345678
```
S3 gateway endpoints are simpler—they don't require a separate ENI or security group.
For Secrets Manager and other services:
Repeat the interface endpoint creation for each service. The service name follows the pattern: com.amazonaws.<region>.<service-name>.
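To make the naming pattern concrete, here is a small sketch that turns a service inventory into a list of planned endpoints. `endpoint_plan` and `GATEWAY_SERVICES` are hypothetical names used for illustration. The sketch assumes every service follows the `com.amazonaws.<region>.<service-name>` pattern, which is worth confirming against the output of `aws ec2 describe-vpc-endpoint-services` before creating anything.

```python
# Services that, per the text above, use gateway endpoints; everything else
# is treated as an interface endpoint.
GATEWAY_SERVICES = {"s3", "dynamodb"}


def endpoint_plan(region: str, services: list[str]) -> list[dict]:
    """Build one planned endpoint per service, in the order given."""
    plan = []
    for service in services:
        endpoint_type = "Gateway" if service in GATEWAY_SERVICES else "Interface"
        plan.append(
            {
                "service_name": f"com.amazonaws.{region}.{service}",
                "endpoint_type": endpoint_type,
            }
        )
    return plan
```

For example, `endpoint_plan("us-east-1", ["rds", "s3", "secretsmanager"])` yields two interface endpoints and one gateway endpoint, which can then drive the CLI calls or an Infrastructure as Code template.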
PrivateLink endpoints need proper security group rules to allow inbound traffic from your Superset instance.
For the VPC endpoint's security group:
```text
Ingress Rule:
  Protocol: TCP
  Port: 443 (HTTPS, used by AWS service endpoints such as the RDS API and Secrets Manager)
  Source: Security group of your Superset instance
```
For your Superset instance's security group:
```text
Egress Rule:
  Protocol: TCP
  Port: 443
  Destination: Security group of the VPC endpoint
```
If you're using network ACLs, ensure they allow bidirectional traffic on the necessary ports.
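The ingress rule above can be checked programmatically. This sketch assumes you have the `IpPermissions` list from the JSON output of `aws ec2 describe-security-groups` for the endpoint's security group. `allows_tcp_from_group` is a hypothetical helper, and it only looks at rules that reference a source security group, not CIDR-based rules.

```python
def allows_tcp_from_group(
    ip_permissions: list[dict], port: int, source_group_id: str
) -> bool:
    """Check whether any rule admits TCP `port` from `source_group_id`."""
    for rule in ip_permissions:
        protocol = rule.get("IpProtocol")
        if protocol == "-1":
            port_ok = True  # "-1" means all protocols and all ports
        elif protocol == "tcp":
            port_ok = rule.get("FromPort", 0) <= port <= rule.get("ToPort", 65535)
        else:
            continue  # UDP, ICMP, etc. are irrelevant for this check
        groups = {pair.get("GroupId") for pair in rule.get("UserIdGroupPairs", [])}
        if port_ok and source_group_id in groups:
            return True
    return False
```

With the endpoint's rules loaded into `rules`, `allows_tcp_from_group(rules, 443, "sg-superset")` returning `False` points at the security group as the likely cause of a timeout.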
Here's where the magic happens. When you create a VPC endpoint, AWS provides a private DNS name. For example, an RDS endpoint might have:
- superset-db.us-east-1.rds.amazonaws.com (public IP)
- superset-db.us-east-1.rds.amazonaws.com (resolves to private IP via PrivateLink)

Superset's database connection strings don't need to change if you enable private DNS when creating the endpoint. This makes the migration seamless:

```python
# superset_config.py
SQLALCHEMY_DATABASE_URI = "postgresql://user:<password>@superset-db.us-east-1.rds.amazonaws.com:5432/superset"
```

The DNS resolution automatically uses the private endpoint. No code changes required.
For more complex setups, "Apache Superset on AWS" (AWS Integration and Automation) provides CloudFormation templates that handle this configuration automatically.
Once endpoints are created, verify that Superset can reach your data sources:
```bash
# From within your Superset instance
psql -h superset-db.us-east-1.rds.amazonaws.com -U postgres -d superset -c "SELECT 1"
```

A successful connection shows that the database is reachable, but not which path the traffic took. To confirm that it stayed private, check VPC Flow Logs:

```bash
# Pattern assumes the default flow log format (14 space-delimited fields)
aws logs filter-log-events \
  --log-group-name /aws/vpc/flowlogs \
  --filter-pattern "[version, account, interface_id, srcaddr, dstaddr, srcport, dstport, protocol, packets, bytes, start, end, action, log_status]"
```

You should see traffic to your endpoint's private IP, not the public IP of your RDS instance.
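Reading flow log records by eye gets tedious, so a short parser can help. The sketch below assumes records in the default flow log format (14 space-delimited fields). `parse_record` and `is_private_accept` are hypothetical helpers, and any sample line you feed them for testing is your own synthetic data, not real output.

```python
import ipaddress

# Field order of the default VPC Flow Logs record format.
FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]


def parse_record(line: str) -> dict:
    """Split one default-format flow log line into a field dictionary."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


def is_private_accept(record: dict) -> bool:
    """True when an accepted flow has private source and destination addresses."""
    if record["action"] != "ACCEPT":
        return False  # also skips NODATA/SKIPDATA records, whose fields are "-"
    return all(
        ipaddress.ip_address(record[key]).is_private
        for key in ("srcaddr", "dstaddr")
    )
```

Accepted records for which `is_private_accept` returns `False` are the ones worth investigating, since they involve at least one public address.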
For larger organizations—especially private equity firms or venture capital firms managing multiple portfolio companies—you often need Superset to access data sources across different AWS accounts or VPCs.
PrivateLink supports this through endpoint services. Here's the pattern:
"Governing and securing AWS PrivateLink service access at scale" details how to manage this using Service Control Policies (SCPs) and centralized governance.
For Superset deployments across multiple portfolio companies, this pattern is essential. Each portfolio company's Superset instance can securely reach centralized data platforms or other companies' data warehouses—all over private PrivateLink connections.
PrivateLink itself is secure by design, but you can layer additional protections:
Create endpoint policies that restrict which principals can use the endpoint:
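```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Principal": {"AWS": "arn:aws:iam::123456789012:role/superset-instance-role"},
      "Effect": "Allow",
      "Action": "execute-api:Invoke",
      "Resource": "*"
    }
  ]
}
```

This ensures only your Superset instance (via its IAM role) can use the endpoint. Set "Action" to the actions of the service behind the endpoint: execute-api:Invoke applies to API Gateway endpoints, while a Secrets Manager endpoint would use actions such as secretsmanager:GetSecretValue.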
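Because the default policy on a VPC endpoint allows full access, it can be useful to lint policy documents before applying them. This is a minimal sketch with a hypothetical helper, `open_statements`. It only inspects the `Principal` element of `Allow` statements and ignores `Condition` blocks, so treat a clean result as a first pass rather than a full review.

```python
import json


def open_statements(policy_json: str) -> list[int]:
    """Return indexes of Allow statements whose principal is a wildcard."""
    statements = json.loads(policy_json).get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    flagged = []
    for index, statement in enumerate(statements):
        if statement.get("Effect") != "Allow":
            continue
        principal = statement.get("Principal")
        # Principal is either "*" or a mapping such as {"AWS": "<arn>"}.
        aws = principal.get("AWS") if isinstance(principal, dict) else principal
        values = aws if isinstance(aws, list) else [aws]
        if "*" in values:
            flagged.append(index)
    return flagged
```

An empty list means no `Allow` statement names a wildcard principal. Any returned index identifies a statement to tighten before the policy is attached.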
For RDS databases, combine PrivateLink with IAM database authentication (see "Support IAM authentication for AWS RDS databases"), which eliminates the need for hardcoded passwords. Your Superset instance uses temporary credentials derived from its IAM role to authenticate to RDS; each authentication token is valid for 15 minutes.
Even though PrivateLink traffic never touches the internet, encrypt it anyway using TLS. For RDS:
```python
# superset_config.py
SQLALCHEMY_DATABASE_URI = "postgresql://user@superset-db.us-east-1.rds.amazonaws.com:5432/superset?sslmode=require"
```

For Redshift, enable SSL in the cluster configuration.
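Hand-written URIs break easily when a password contains characters such as `@` or `/`. The sketch below builds the same kind of PostgreSQL URI with the credentials percent-encoded and `sslmode` always set. `postgres_uri` is a hypothetical helper, not part of Superset or SQLAlchemy.

```python
from urllib.parse import quote_plus, urlencode


def postgres_uri(user: str, password: str, host: str, database: str,
                 port: int = 5432, sslmode: str = "require") -> str:
    """Build a PostgreSQL SQLAlchemy URI with escaped credentials and TLS set."""
    query = urlencode({"sslmode": sslmode})
    return (
        f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}?{query}"
    )
```

In `superset_config.py` the result could be assigned to `SQLALCHEMY_DATABASE_URI`. Passing `sslmode="verify-full"` additionally verifies the server certificate and hostname, which requires the RDS CA bundle to be available to the client.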
Enable VPC Flow Logs to audit all traffic through your endpoints:
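```bash
aws ec2 create-flow-logs \
  --resource-type VPC \
  --resource-ids vpc-12345678 \
  --traffic-type ALL \
  --log-destination-type cloud-watch-logs \
  --log-group-name /aws/vpc/flowlogs \
  --deliver-logs-permission-arn arn:aws:iam::123456789012:role/flow-logs-role  # placeholder: IAM role allowed to write to CloudWatch Logs
```

Monitor these logs for unexpected traffic patterns or failed connections.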
You're building a SaaS platform and want to embed Superset dashboards for your customers. Your architecture:
Without PrivateLink, your Superset instance either needs a public IP (security risk) or you route through NAT gateways (adds latency and cost). With PrivateLink:
This is exactly the use case that D23 was built for—embedding analytics without the platform overhead.
You're a PE firm with 15 portfolio companies. Each company has its own AWS account and data warehouse. You want to build a centralized Superset instance in your main account to create consolidated KPI dashboards across all companies.
Without PrivateLink:
With PrivateLink:
This pattern scales to dozens of accounts and is fully automated with Terraform or CloudFormation.
You're a healthcare analytics company using Superset to analyze patient data. HIPAA's Security Rule requires safeguards for protected health information (PHI) in transit, and many teams satisfy this by keeping PHI off the public internet entirely.
PrivateLink is essential:
Similarly, financial services companies handling PCI-DSS data or processing credit card information benefit from PrivateLink's private connectivity.
If you're using D23 for managed Apache Superset, PrivateLink integration is built into the platform. Here's how it works:
Default setup: D23 deploys your Superset instance in a D23-managed VPC. By default, you can reach it via a private endpoint within your AWS environment, or securely from the internet via TLS.
PrivateLink for data sources: D23 automatically creates VPC endpoints for your RDS, Redshift, and S3 connections, eliminating internet exposure for data flows.
Cross-account access: If you're accessing data across multiple AWS accounts (common for portfolio companies or multi-tenant SaaS), D23 sets up cross-account PrivateLink connections so Superset can reach data warehouses in other accounts—all privately.
API security: If you're embedding Superset dashboards or using the API, D23 provides private API endpoints via PrivateLink, so your backend services reach Superset without internet exposure.
This approach eliminates the operational overhead of managing PrivateLink yourself while maintaining the security benefits. For data teams that need production-grade analytics without platform overhead, this is ideal.
Symptom: superset-db.us-east-1.rds.amazonaws.com resolves to a public IP instead of a private IP.
Cause: Private DNS is not enabled on the VPC endpoint, or the VPC itself has DNS resolution or DNS hostnames turned off (private DNS needs both).
Fix: Modify the endpoint to enable private DNS:

```bash
aws ec2 modify-vpc-endpoint \
  --vpc-endpoint-id vpce-12345678 \
  --private-dns-enabled
```

Symptom: Superset can't reach RDS; connections time out.
Cause: Security group rules on the endpoint or Superset instance are incorrect.
Fix: Verify security groups:
```bash
# Check endpoint security group allows inbound on port 443
aws ec2 describe-security-groups --group-ids sg-endpoint
# Check Superset instance security group allows outbound on port 443
aws ec2 describe-security-groups --group-ids sg-superset
```

Symptom: Queries are slow even with PrivateLink.
Cause: The VPC endpoint is in a different availability zone from your Superset instance, or you're using a single-AZ endpoint.
Fix: Create multi-AZ endpoints:
```bash
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-12345678 \
  --service-name com.amazonaws.us-east-1.rds \
  --vpc-endpoint-type Interface \
  --subnet-ids subnet-az1 subnet-az2 subnet-az3  # Multiple AZs
```

Symptom: You're trying to access a PrivateLink endpoint service from another account, but the connection fails.
Cause: The endpoint service owner hasn't accepted your connection request, or the endpoint policy restricts access.
Fix: The service owner must accept the connection request:
```bash
# Service owner runs:
aws ec2 accept-vpc-endpoint-connections \
  --service-id vpce-svc-12345678 \
  --vpc-endpoint-ids vpce-12345678
```

Every public IP or internet-facing endpoint is a potential attack vector. Design your Superset deployment so:
For D23 customers, this is handled automatically.
Create dedicated VPC endpoints for RDS, Redshift, S3, and Secrets Manager. This allows granular security policies and easier troubleshooting.
PrivateLink publishes CloudWatch metrics for interface endpoints in the AWS/PrivateLinkEndpoints namespace. List what is available for your endpoints:

```bash
aws cloudwatch list-metrics \
  --namespace AWS/PrivateLinkEndpoints
```

Track bytes processed, connection counts, and dropped packets.
Don't leave endpoint policies open. Restrict access to specific principals:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Principal": {"AWS": "arn:aws:iam::123456789012:role/superset-role"},
      "Effect": "Allow",
      "Action": "*",
      "Resource": "*"
    }
  ]
}
```

Create a diagram showing:
This helps with onboarding, troubleshooting, and compliance audits.
Ensure your PrivateLink setup is resilient:
PrivateLink significantly simplifies compliance for regulated industries. Here's why:
PrivateLink helps you meet SOC 2 requirements for:
PrivateLink satisfies HIPAA's technical safeguards:
For payment card data:
When auditors ask, "How do you ensure cardholder data doesn't touch the internet?" you can show them VPC Flow Logs proving all traffic stayed private.
As analytics evolve, PrivateLink's importance grows:
When Superset uses text-to-SQL or other AI features to query data, those queries flow through PrivateLink, keeping sensitive data private even during AI processing.
As more applications embed analytics APIs, PrivateLink ensures that API calls between your application and Superset never touch the internet.
For SaaS platforms serving multiple customers, PrivateLink enables secure, isolated data flows between each customer's data and their Superset instance.
AWS PrivateLink is not a luxury for Superset deployments at scale; it's a foundational security practice. It removes internet-facing paths to your data sources, simplifies compliance, and can reduce the NAT gateway cost and latency that comes with routing service traffic through the internet.
The implementation is straightforward: identify your data sources, create VPC endpoints, update security groups, and let DNS handle the rest. For teams using D23, this is handled automatically as part of the managed platform.
Whether you're embedding Superset in a SaaS product, consolidating analytics across portfolio companies, or handling regulated data in healthcare or finance, PrivateLink should be your default architecture—not an afterthought.
For more details on Superset security best practices, see the official Apache Superset documentation on production security, which covers TLS enforcement, HSTS headers, and session management alongside network security.
Start with PrivateLink for your data sources, add IAM authentication for credentials, enable VPC Flow Logs for auditing, and you've built a security foundation that scales with your analytics platform.