Master Apache Superset user provisioning with SCIM and JIT SAML. Automate identity sync, reduce overhead, and scale securely for enterprise teams.
User provisioning is the process of creating, updating, and removing user accounts and their access rights within an application. In Apache Superset, this traditionally meant manual intervention—an admin logs in, creates users one by one, assigns roles, and manages permissions through the UI. For teams managing hundreds or thousands of users across multiple data products, this approach becomes a bottleneck.
Automated user provisioning solves this by connecting your identity provider (Okta, Azure AD, Auth0, or another SAML/OIDC system) directly to Superset. When a new employee joins, their account appears in Superset automatically. When they leave, their access is revoked without an admin touching Superset, though how quickly depends on the pattern you choose. This eliminates manual user management, reduces security risk, and scales without adding operational burden.
The two primary patterns for automating this workflow are SCIM (System for Cross-Domain Identity Management) and Just-in-Time (JIT) provisioning via SAML. Both solve the same problem, synchronizing identity state between your identity provider and Superset, but they work differently in practice. Understanding when and how to use each is critical for teams building production analytics infrastructure.
At D23, we've implemented both patterns for customers managing Superset at scale. This guide walks through the mechanics, trade-offs, and concrete implementation steps so you can choose the right approach for your organization.
SCIM is a standardized protocol for automating user and group provisioning across cloud applications. It defines a REST API contract that identity providers use to push user data to applications. Instead of each app building its own provisioning integration, SCIM creates a common language.
Here's how SCIM works in practice:
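The SCIM Flow:
1. An admin assigns a user (or a group) to the Superset application in the identity provider.
2. The identity provider sends a SCIM request, such as `POST /scim/v2/Users`, to Superset's SCIM endpoint.
3. Superset authenticates the request, parses the payload, creates the user record, and maps groups to roles.
4. Later changes arrive as `PATCH` requests, and when the user leaves, the identity provider deactivates or deletes the account.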
This is fundamentally different from JIT provisioning. SCIM is push-based: your identity provider proactively sends user data to Superset. JIT is pull-based: Superset creates users on first login.
For detailed protocol specifications and how major providers implement SCIM, see the SCIM Protocol Documentation and SCIM User Provisioning with Okta. These resources cover the technical structure of SCIM 2.0 requests and responses.
SCIM Request Example:
When Okta provisions a user, it sends something like this:
```http
POST /scim/v2/Users HTTP/1.1
Host: superset.yourcompany.com
Authorization: Bearer your-scim-token
Content-Type: application/scim+json

{
  "schemas": ["urn:ietf:params:scim:schemas:core:2.0:User"],
  "userName": "[email protected]",
  "name": {
    "givenName": "Alice",
    "familyName": "Chen"
  },
  "emails": [{
    "value": "[email protected]",
    "primary": true
  }],
  "groups": [{
    "value": "analytics-team"
  }],
  "active": true
}
```

Your Superset instance receives this payload, parses it, and creates a user record. If that user is part of the "analytics-team" group, you can automatically assign them to a Superset role with the appropriate dataset and dashboard permissions. Keep in mind that RFC 7643 defines `groups` on a User resource as read-only, so some identity providers send memberships through the `/Groups` endpoint instead of inline.
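The group-to-role step described above can be sketched as a plain lookup. This is a minimal illustration, not Superset's own API: the `GROUP_TO_ROLE` table and the `roles_for_scim_user` helper are hypothetical names, and the role names are assumed to exist in your Superset instance.

```python
from typing import Any

# Hypothetical mapping from identity-provider group names to Superset role names.
GROUP_TO_ROLE = {
    "analytics-team": "Analytics",
    "data-engineers": "Data Engineer",
}


def roles_for_scim_user(payload: dict[str, Any],
                        mapping: dict[str, str] = GROUP_TO_ROLE) -> list[str]:
    """Return the Superset role names implied by the groups in a SCIM User payload."""
    roles: list[str] = []
    for group in payload.get("groups") or []:
        role = mapping.get(group.get("value", ""))
        # Skip unmapped groups and avoid assigning the same role twice.
        if role and role not in roles:
            roles.append(role)
    return roles


payload = {
    "userName": "alice",
    "groups": [{"value": "analytics-team"}, {"value": "unmapped-group"}],
}
print(roles_for_scim_user(payload))  # ['Analytics']
```

Keeping the mapping in one table makes it easy to review which identity-provider groups can grant which roles.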
Key advantages of SCIM:
For comprehensive security guidance on implementing SCIM in a production Superset environment, consult the Securing Your Superset Installation for Production guide and the Tutorial - Develop a SCIM endpoint for user provisioning from Microsoft, which covers SCIM 2.0 endpoint development patterns.
Just-in-Time provisioning takes a different approach. Instead of your identity provider pushing user data, Superset creates users on their first login. The identity provider (via SAML assertions) tells Superset who the user is and what groups they belong to, and Superset creates them on the fly.
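The JIT Flow:
1. A user opens Superset and is redirected to the identity provider.
2. The user authenticates, and the identity provider returns a SAML assertion containing the user's attributes and group memberships.
3. Superset maps the assertion's attributes to user fields and creates the account if it does not exist yet.
4. Superset assigns roles based on the mapped groups and signs the user in.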
This is simpler to set up than SCIM because SAML is already widely supported in Superset. You don't need to build a SCIM endpoint. You just configure SAML and tell Superset how to map SAML attributes to user fields and groups.
SAML Attribute Mapping in Superset:
Superset's authentication is built on Flask AppBuilder, and SAML attribute mapping is configured in superset_config.py. The exact setting names depend on the SAML integration you use; the `SAML_*` keys below assume a security manager that reads them:

```python
FROM_EMAIL = "[email protected]"

# Superset registers a custom security manager via CUSTOM_SECURITY_MANAGER,
# which expects a class (not a dotted-path string), for example:
# CUSTOM_SECURITY_MANAGER = SamlSecurityManager  # hypothetical class you provide

AUTH_TYPE = 5  # SAML auth; confirm the constant your SAML integration expects
SAML_METADATA_URL = "https://idp.yourcompany.com/metadata.xml"

# Map SAML attributes to Superset user fields
SAML_ATTRIBUTE_MAPPING = {
    "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress": "email",
    "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/givenname": "first_name",
    "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/surname": "last_name",
    "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/groups": "groups",
}

# Map SAML groups to Superset roles
SAML_GROUP_MAPPING = {
    "analytics-team": "Analytics",
    "data-engineers": "Data Engineer",
    "executives": "Admin",
}
```

When a user logs in with this configuration, Superset automatically creates them and assigns them to the appropriate roles based on their SAML group membership. No manual provisioning needed.
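To make the create-on-first-login behavior concrete, here is a minimal in-memory sketch of the JIT step. It is not Superset's implementation: the `User` dataclass, the `users` dictionary standing in for the user table, and the `jit_provision` function are all hypothetical, and it assumes SAML attributes arrive as a dictionary of lists (one list of values per claim).

```python
from dataclasses import dataclass, field

CLAIMS = "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/"
ATTRIBUTE_MAPPING = {
    CLAIMS + "emailaddress": "email",
    CLAIMS + "givenname": "first_name",
    CLAIMS + "surname": "last_name",
}
GROUPS_CLAIM = CLAIMS + "groups"
GROUP_MAPPING = {"analytics-team": "Analytics", "data-engineers": "Data Engineer"}


@dataclass
class User:
    email: str
    first_name: str = ""
    last_name: str = ""
    roles: list = field(default_factory=list)


def jit_provision(attributes: dict, users: dict) -> User:
    """Create the user on first login, or refresh fields and roles on later logins."""
    # Take the first value of each mapped claim that is present in the assertion.
    fields = {
        name: attributes[claim][0]
        for claim, name in ATTRIBUTE_MAPPING.items()
        if attributes.get(claim)
    }
    if "email" not in fields:
        raise ValueError("assertion is missing the email claim")

    # Translate group names to role names, ignoring groups with no mapping.
    groups = attributes.get(GROUPS_CLAIM, [])
    roles = sorted({GROUP_MAPPING[g] for g in groups if g in GROUP_MAPPING})

    user = users.get(fields["email"])
    if user is None:
        user = users[fields["email"]] = User(email=fields["email"])
    user.first_name = fields.get("first_name", user.first_name)
    user.last_name = fields.get("last_name", user.last_name)
    user.roles = roles  # roles are re-derived on every login
    return user


users = {}
assertion = {
    CLAIMS + "emailaddress": ["alice@example.com"],
    CLAIMS + "givenname": ["Alice"],
    GROUPS_CLAIM: ["analytics-team"],
}
print(jit_provision(assertion, users).roles)  # ['Analytics']
```

Because roles are recomputed only when `jit_provision` runs, a group change in the identity provider takes effect at the user's next login, which is the main behavioral difference from SCIM.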
For enterprise identity providers like Auth0, detailed integration guides are available. See SCIM Provisioning with Auth0 for Auth0-specific patterns, and Provision Users Using SCIM with AWS IAM Identity Center for AWS-based deployments.
Key advantages of JIT provisioning:
Key limitations of JIT provisioning:
Both patterns solve the user provisioning problem, but they're optimized for different scenarios.
Use SCIM when:
Use JIT provisioning when:
In practice, many organizations use both. They might use SCIM for internal employees and JIT for partner/customer access, or they use JIT initially and migrate to SCIM as they scale.
If you decide SCIM is right for your organization, you'll need to implement a SCIM endpoint in your Superset instance. This is an HTTP API that your identity provider calls to provision users.
Superset doesn't ship with a built-in SCIM endpoint, so you'll need to build one. Here's the architecture:
SCIM Endpoint Structure:
A minimal SCIM service implements these endpoints:
- `POST /scim/v2/Users`: Create a new user
- `GET /scim/v2/Users/{id}`: Retrieve a user
- `PATCH /scim/v2/Users/{id}`: Update a user
- `DELETE /scim/v2/Users/{id}`: Delete a user
- `GET /scim/v2/Groups`: List groups
- `POST /scim/v2/Groups`: Create a group
- `PATCH /scim/v2/Groups/{id}`: Update a group

You'll also need discovery endpoints that tell identity providers about your SCIM capabilities. SCIM 2.0 (RFC 7644) defines `/ServiceProviderConfig`, `/ResourceTypes`, and `/Schemas` for this purpose.
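SCIM 2.0 describes a service's capabilities through a `ServiceProviderConfig` resource (RFC 7643, section 5). The following is a minimal sketch of such a document; the `service_provider_config` function name and the specific capability choices (PATCH and filtering on, bulk off) are assumptions for illustration, not what any particular Superset deployment must advertise.

```python
import json


def service_provider_config(base_url: str) -> dict:
    """Build a ServiceProviderConfig document advertising a small feature set."""
    return {
        "schemas": ["urn:ietf:params:scim:schemas:core:2.0:ServiceProviderConfig"],
        "patch": {"supported": True},
        "bulk": {"supported": False, "maxOperations": 0, "maxPayloadSize": 0},
        "filter": {"supported": True, "maxResults": 100},
        "changePassword": {"supported": False},
        "sort": {"supported": False},
        "etag": {"supported": False},
        "authenticationSchemes": [
            {
                "type": "oauthbearertoken",
                "name": "OAuth Bearer Token",
                "description": "Authentication using a bearer token in the Authorization header",
            }
        ],
        "meta": {
            "resourceType": "ServiceProviderConfig",
            "location": f"{base_url}/ServiceProviderConfig",
        },
    }


config = service_provider_config("https://superset.yourcompany.com/scim/v2")
print(json.dumps(config, indent=2))
```

Advertising only what the endpoint really supports matters, because identity providers may rely on this document when deciding which requests to send.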
Implementation in Flask (Superset's Web Framework):
Here's a skeleton of a SCIM endpoint for Superset:
```python
from flask import Blueprint, request
from flask_appbuilder.security.sqla.models import Role, User
from superset import db

scim_bp = Blueprint('scim', __name__, url_prefix='/scim/v2')


@scim_bp.route('/Users', methods=['POST'])
def create_user():
    data = request.get_json(silent=True) or {}

    # Validate SCIM request
    if 'userName' not in data:
        return {"error": "userName required"}, 400

    # Extract user details (tolerate a missing or empty emails list)
    emails = data.get('emails') or [{}]
    email = emails[0].get('value')
    if not email:
        return {"error": "email required"}, 400
    first_name = data.get('name', {}).get('givenName', '')
    last_name = data.get('name', {}).get('familyName', '')
    username = data['userName']
    is_active = data.get('active', True)

    # Check if user exists
    existing_user = db.session.query(User).filter_by(username=username).first()
    if existing_user:
        return {"error": "User already exists"}, 409

    # Create user. The FAB model's column is `active`;
    # `is_active` is a read-only property on the model.
    user = User(
        username=username,
        email=email,
        first_name=first_name,
        last_name=last_name,
        active=is_active,
    )
    db.session.add(user)

    # Assign groups/roles if provided (group name must equal the role name here)
    for group in data.get('groups', []):
        role = db.session.query(Role).filter_by(name=group.get('value')).first()
        if role:
            user.roles.append(role)
    db.session.commit()

    return {
        "schemas": ["urn:ietf:params:scim:schemas:core:2.0:User"],
        "id": str(user.id),  # SCIM ids are strings
        "userName": user.username,
        "emails": [{"value": user.email, "primary": True}],
        "name": {
            "givenName": user.first_name,
            "familyName": user.last_name,
        },
        "active": user.active,
    }, 201


@scim_bp.route('/Users/<int:user_id>', methods=['PATCH'])
def update_user(user_id):
    user = db.session.query(User).filter_by(id=user_id).first()
    if not user:
        return {"error": "User not found"}, 404

    data = request.get_json(silent=True) or {}

    # Handle attribute updates
    for operation in data.get('Operations', []):
        op = (operation.get('op') or '').lower()
        path = operation.get('path')
        value = operation.get('value')

        if op == 'replace':
            if path == 'active':
                user.active = value
            elif path == 'name.givenName':
                user.first_name = value
            elif path == 'name.familyName':
                user.last_name = value

    db.session.commit()
    return {"id": str(user.id)}, 200


@scim_bp.route('/Users/<int:user_id>', methods=['DELETE'])
def delete_user(user_id):
    user = db.session.query(User).filter_by(id=user_id).first()
    if not user:
        return {"error": "User not found"}, 404

    db.session.delete(user)
    db.session.commit()
    return "", 204  # 204 responses carry no body
```

This is a simplified example. Production implementations need:
- Pagination: support for `startIndex` and `count` when listing users

For detailed specifications on SCIM endpoint implementation, refer to the Tutorial - Develop a SCIM endpoint for user provisioning and the Security page of the Apache Superset documentation for Superset-specific security considerations.
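The `startIndex` and `count` pagination mentioned above can be sketched as a pure function that wraps a page of resources in a SCIM `ListResponse`. The `paginate` helper and its `max_count` cap are hypothetical; the clamping rules (indices are 1-based, values below 1 are treated as 1, negative counts are treated as 0) follow RFC 7644, section 3.4.2.4.

```python
from typing import Any, Sequence

LIST_RESPONSE_SCHEMA = "urn:ietf:params:scim:api:messages:2.0:ListResponse"


def paginate(resources: Sequence[dict[str, Any]],
             start_index: int = 1,
             count: int = 100,
             max_count: int = 200) -> dict[str, Any]:
    """Return one page of `resources` as a SCIM ListResponse."""
    start = max(start_index, 1)            # SCIM indices are 1-based
    size = min(max(count, 0), max_count)   # clamp to [0, max_count]
    page = list(resources[start - 1:start - 1 + size])
    return {
        "schemas": [LIST_RESPONSE_SCHEMA],
        "totalResults": len(resources),    # size of the full result set
        "startIndex": start,
        "itemsPerPage": len(page),         # number of resources actually returned
        "Resources": page,
    }


all_users = [{"id": str(i), "userName": f"user{i}"} for i in range(1, 6)]
second_page = paginate(all_users, start_index=3, count=2)
print([u["id"] for u in second_page["Resources"]])  # ['3', '4']
```

In a real endpoint the slice would be pushed down into the database query (offset and limit) rather than applied to a list already loaded in memory.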
At D23, we provide managed SCIM endpoints as part of our Superset hosting service, so you don't have to build this yourself. But understanding the mechanics helps you evaluate whether SCIM is right for your organization.
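One mechanic worth understanding before you evaluate SCIM is how varied PATCH bodies can be. The PATCH handler in the skeleton above only accepts `replace` operations with an explicit `path`, but RFC 7644 also allows the path to be omitted, in which case `value` is an object of attributes, and some providers send `op` capitalized or booleans as strings. The sketch below normalizes those shapes into a flat set of changes; `PATH_TO_FIELD`, `flatten`, and `changes_from_patch` are hypothetical names, and the target field names are assumptions.

```python
from typing import Any

# SCIM attribute path -> attribute name on the user object (assumed names).
PATH_TO_FIELD = {
    "active": "active",
    "name.givenName": "first_name",
    "name.familyName": "last_name",
}


def flatten(value: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Turn {"name": {"givenName": "A"}} into {"name.givenName": "A"}."""
    flat: dict[str, Any] = {}
    for key, item in value.items():
        path = f"{prefix}{key}"
        if isinstance(item, dict):
            flat.update(flatten(item, f"{path}."))
        else:
            flat[path] = item
    return flat


def changes_from_patch(body: dict[str, Any]) -> dict[str, Any]:
    """Collect {field: new_value} updates from a SCIM PatchOp request body."""
    changes: dict[str, Any] = {}
    for operation in body.get("Operations", []):
        op = str(operation.get("op", "")).lower()  # tolerate "Replace" / "Add"
        if op not in ("add", "replace"):
            continue
        path, value = operation.get("path"), operation.get("value")
        if path:
            updates = {path: value}
        else:
            # No path: the value is an object holding the attributes to set.
            updates = flatten(value) if isinstance(value, dict) else {}
        for scim_path, new_value in updates.items():
            field = PATH_TO_FIELD.get(scim_path)
            if field is None:
                continue  # ignore attributes this sketch does not handle
            if field == "active" and isinstance(new_value, str):
                new_value = new_value.lower() == "true"
            changes[field] = new_value
    return changes


deactivate = {"Operations": [{"op": "Replace", "value": {"active": False}}]}
print(changes_from_patch(deactivate))  # {'active': False}
```

Separating parsing from persistence like this keeps the provider-specific quirks testable without a database.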
JIT provisioning is simpler to set up than SCIM. You just need to configure SAML in Superset and define how SAML attributes map to Superset user fields.
Step 1: Install flask-saml2

This guide uses the flask-saml2 package for SAML authentication. Make sure it's installed:

```bash
pip install flask-saml2
```

Step 2: Configure SAML in superset_config.py
```python
# Enable SAML authentication (confirm the constant your SAML integration expects)
AUTH_TYPE = 5  # SAML

# SAML metadata URL (provided by your identity provider)
SAML_METADATA_URL = "https://idp.yourcompany.com/metadata.xml"
# Or use a local metadata file
# SAML_METADATA_FILE = "/path/to/metadata.xml"

# Entity ID (must match what's configured in your identity provider)
SAML_ENTITY_ID = "https://superset.yourcompany.com/metadata/"

# Assertion Consumer Service URL (where the identity provider sends assertions)
SAML_ASSERTION_CONSUMER_SERVICE_URL = "https://superset.yourcompany.com/acs"

# Single Logout Service URL
SAML_SINGLE_LOGOUT_SERVICE_URL = "https://superset.yourcompany.com/sls"

# Map SAML attributes to Superset user fields
SAML_ATTRIBUTE_MAPPING = {
    "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress": "email",
    "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/givenname": "first_name",
    "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/surname": "last_name",
    "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/groups": "groups",
}

# Map SAML groups to Superset roles
SAML_GROUP_MAPPING = {
    "analytics-team": "Analytics",
    "data-engineers": "Data Engineer",
    "finance-team": "Finance",
    "executives": "Admin",
}

# Allow users to be created on first login. AUTH_USER_REGISTRATION is
# Flask AppBuilder's switch for this; it is assumed here that your SAML
# integration honors it.
AUTH_USER_REGISTRATION = True

# Names of the admin and public roles
AUTH_ROLE_ADMIN = "Admin"
AUTH_ROLE_PUBLIC = "Public"

# Optional: default role for new users. The role must already exist;
# "Viewer" is assumed to be a custom role (Superset's built-in
# low-privilege role is "Gamma").
AUTH_USER_REGISTRATION_ROLE = "Viewer"
```

Step 3: Restart Superset
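Restart your Superset web server and workers so they load the new configuration. How you do that depends on your deployment (systemd, Docker, Kubernetes, and so on). If you also upgraded Superset or changed role definitions, apply migrations and re-sync roles and permissions:

```bash
superset db upgrade
superset init
```

Skip `superset load_examples` on production instances, since it only loads sample data and dashboards.

Step 4: Configure Your Identity Provider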
In your identity provider (Okta, Azure AD, etc.), add Superset as an application:
- Assertion Consumer Service (ACS) URL: https://superset.yourcompany.com/acs
- Entity ID: https://superset.yourcompany.com/metadata/

For Okta specifically, see SCIM User Provisioning with Okta for both SAML and SCIM configuration.
Step 5: Test the Configuration
Have a test user log into Superset. They should be redirected to your identity provider, authenticate, and be automatically created in Superset with the appropriate role.
If users aren't being created, check the Superset logs for SAML errors:
```bash
tail -f /var/log/superset/superset.log | grep -i saml
```

Both SCIM and JIT provisioning support automatic role assignment based on group membership. This is critical for scaling access control without manual intervention.
How It Works:
Example: Analytics Team Access
Let's say you want all members of the "analytics-team" group to have access to a specific set of datasets and dashboards.
In your identity provider:
Group: analytics-team
Members: [email protected], [email protected], [email protected]
In Superset configuration:
```python
SAML_GROUP_MAPPING = {
    "analytics-team": "Analytics"
}
```

In Superset's role management:
The "Analytics" role is configured with permissions to view specific datasets and dashboards. When users are provisioned, they're automatically assigned this role.
Nested Groups:
Some identity providers support nested groups (e.g., "company:analytics-team"). You can handle these with regex or explicit mapping:
```python
SAML_GROUP_MAPPING = {
    "company:analytics-team": "Analytics",
    "company:data-engineers": "Data Engineer",
    "company:executives": "Admin"
}
```

Dynamic Role Assignment:
For more complex scenarios, you might want to assign roles based on multiple attributes. For example, assign the "Finance" role only if a user is in the "finance-team" group AND has the "analyst" job title attribute.
This requires custom logic in Superset's security manager or in your SCIM endpoint. D23 handles this through custom attribute mapping and role assignment rules.
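Both the regex handling of prefixed group names and the multi-attribute rule (group AND job title) can be expressed as a small rule table. The following is a sketch under stated assumptions: `RoleRule`, `RULES`, and `resolve_roles` are hypothetical names, the colon-separated prefix format is taken from the "company:analytics-team" example, and the job title is assumed to arrive as a single string attribute.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class RoleRule:
    role: str                              # Superset role to grant
    group_pattern: re.Pattern              # must fully match one of the user's groups
    required_title: Optional[str] = None   # optional extra condition on job title


def group(name: str) -> re.Pattern:
    """Match `name` with any number of colon-separated prefixes, e.g. company:name."""
    return re.compile(rf"(?:[\w-]+:)*{re.escape(name)}")


RULES = [
    RoleRule("Analytics", group("analytics-team")),
    RoleRule("Data Engineer", group("data-engineers")),
    # Finance requires the group AND the "analyst" job title.
    RoleRule("Finance", group("finance-team"), required_title="analyst"),
]


def resolve_roles(groups: Iterable[str],
                  job_title: Optional[str] = None,
                  rules: Iterable[RoleRule] = RULES) -> list[str]:
    """Return the roles whose group pattern and title condition are both satisfied."""
    groups = list(groups)
    roles: list[str] = []
    for rule in rules:
        in_group = any(rule.group_pattern.fullmatch(g) for g in groups)
        title_ok = (rule.required_title is None
                    or (job_title or "").lower() == rule.required_title)
        if in_group and title_ok and rule.role not in roles:
            roles.append(rule.role)
    return roles


print(resolve_roles(["company:finance-team"], job_title="Analyst"))  # ['Finance']
print(resolve_roles(["company:finance-team"], job_title="Manager"))  # []
```

Using `fullmatch` rather than a substring search avoids accidentally granting a role to a group whose name merely contains a mapped name.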
Automating user provisioning introduces new security considerations. Here are the key things to think about:
SCIM Token Security:
Your SCIM endpoint must be protected by a strong authentication mechanism. Use:
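- Bearer tokens sent in the `Authorization` header

Example token validation:

```python
import hmac
import os


@scim_bp.before_request
def validate_scim_token():
    auth_header = request.headers.get('Authorization', '')
    scheme, _, token = auth_header.partition(' ')
    expected = os.getenv('SCIM_TOKEN', '')

    # Reject when no token is configured, and compare in constant time
    # so response timing does not leak how much of the token matched.
    if (scheme.lower() != 'bearer' or not expected
            or not hmac.compare_digest(token.encode(), expected.encode())):
        return {"error": "Unauthorized"}, 401
```

Deprovisioning Delays: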
When a user is deprovisioned (removed from your identity provider), there's often a delay before Superset is notified. During this window, the user can still access Superset.
Mitigation:
Attribute Injection:
If your identity provider is compromised, an attacker could provision themselves with admin privileges. Protect against this by:
For comprehensive security guidance, see Securing Your Superset Installation for Production.
Audit Logging:
Maintain detailed logs of all provisioning operations:
This is critical for compliance audits and incident investigation.
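A minimal sketch of structured audit logging for provisioning operations, using only the standard library. The logger name, the `audit_event` helper, and the field set (timestamp, action, username, source, actor, details) are assumptions for illustration; adapt them to whatever your compliance requirements specify.

```python
import json
import logging
from datetime import datetime, timezone
from typing import Any, Optional

# Hypothetical logger name; route it to durable storage in your logging config.
audit_logger = logging.getLogger("provisioning.audit")


def audit_event(action: str,
                username: str,
                source: str,
                actor: str = "identity-provider",
                details: Optional[dict[str, Any]] = None) -> str:
    """Emit one JSON audit record and return the serialized line."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "action": action,        # e.g. "user.create", "user.deactivate", "role.assign"
        "username": username,
        "source": source,        # "scim" or "jit"
        "actor": actor,
        "details": details or {},
    }
    line = json.dumps(record, sort_keys=True)
    audit_logger.info(line)
    return line


logging.basicConfig(level=logging.INFO, format="%(message)s")
audit_event("role.assign", "alice", "scim", details={"role": "Analytics"})
```

One JSON object per line keeps the log easy to ship to a search or SIEM tool and to filter by user or action during an investigation.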
Once user provisioning is live, you need to monitor it and handle failures gracefully.
Common Issues:
Issue 1: Users Not Being Created
Issue 2: Groups Not Mapping to Roles
- Check that the group names in `SAML_GROUP_MAPPING` exactly match the identity provider's group names (case-sensitive)

Issue 3: SCIM Token Failures
Monitoring Best Practices:
Health Check Endpoint:
Implement a health check endpoint that your monitoring system can poll:
```python
from sqlalchemy import text


# Any before_request hook registered on scim_bp also applies to this route.
@scim_bp.route('/health', methods=['GET'])
def health_check():
    try:
        # Check database connectivity (raw SQL must be wrapped in text())
        db.session.execute(text('SELECT 1'))
        return {"status": "healthy"}, 200
    except Exception as e:
        return {"status": "unhealthy", "error": str(e)}, 500
```

If you're running multiple Superset instances (for high availability or multi-region deployments), user provisioning becomes more complex.
Challenge: Consistency Across Instances
When a user is provisioned via SCIM, only one Superset instance receives the request. The other instances don't know about the new user until they query the shared database.
If you're using a shared PostgreSQL database (recommended for production), this isn't a problem—all instances read from the same user table. But if you're using local SQLite databases per instance, you need a synchronization mechanism.
Solution 1: Shared Database (Recommended)
Use a single PostgreSQL database shared by all Superset instances. This is the simplest approach and ensures consistency:
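```bash
# All instances connect to the same database.
# This only takes effect if your superset_config.py reads the variable,
# for example via os.environ.
export SQLALCHEMY_DATABASE_URI="postgresql://user:[email protected]/superset"
```

Solution 2: Event-Driven Sync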
If you must use local databases, implement event-driven synchronization:
This is more complex and introduces operational overhead, so we recommend the shared database approach.
To help you choose the right pattern, here's a comparison:
| Criterion | SCIM | JIT |
|---|---|---|
| Setup Complexity | High (requires SCIM endpoint) | Low (configure SAML) |
| Deprovisioning | Automatic | Manual |
| Group Sync | Real-time | On next login |
| Operational Overhead | Medium | Low |
| Audit Trail | Excellent | Good |
| Scalability | Excellent (for large teams) | Good (for small teams) |
| Identity Provider Support | Limited (major providers only) | Universal (any SAML provider) |
| Cost | Higher (more infrastructure) | Lower (simpler setup) |
Building and maintaining user provisioning infrastructure is complex. At D23, we handle this for you. Our managed Superset platform includes:
This means you can focus on building dashboards and insights, not managing identity infrastructure. We handle the operational complexity, security hardening, and scaling.
If you're evaluating Superset for embedded analytics or self-serve BI, user provisioning is a critical part of the decision. Contact our team to discuss your specific requirements and see how we can accelerate your Superset deployment.
User provisioning is a foundational part of any production Superset deployment. Whether you choose SCIM or JIT depends on your organization's size, complexity, and requirements:
Regardless of which pattern you choose, focus on:
The investment in automated user provisioning pays dividends as your organization grows. No more manual user management, faster onboarding, and better security. That's the promise of SCIM and JIT provisioning in Superset.
For more guidance on securing your Superset deployment and managing users at scale, explore our Privacy Policy and Terms of Service to understand how we handle user data. Or reach out to learn how D23 simplifies Superset operations for data teams.