Learn how to connect Apache Superset to Microsoft Fabric lakehouses via OneLake and Delta tables. Step-by-step integration guide for production analytics.
Microsoft Fabric lakehouses represent a modern approach to data architecture—combining the flexibility of data lakes with the queryability of data warehouses. If you're running Apache Superset as your analytics platform, connecting it directly to a Fabric lakehouse unlocks a powerful workflow: ingest raw data into OneLake, transform it with Delta tables, and surface insights through Superset dashboards without moving data between systems.
This guide walks through the technical setup, authentication patterns, and best practices for integrating Superset with Fabric lakehouses. Whether you're building embedded analytics for your product or centralizing BI across your organization, this integration eliminates the need for intermediate data exports and keeps your analytics layer lightweight and responsive.
Before diving into connection details, it helps to understand what each component does and how they fit together.
Apache Superset is an open-source business intelligence platform that sits on top of your data layer. It handles visualization, dashboard creation, SQL querying, and—when configured with AI—text-to-SQL query generation. Superset doesn't store data itself; it connects to external databases and data warehouses via database drivers and executes queries against them. This architecture makes Superset lightweight and agnostic to your underlying data platform.
Microsoft Fabric is an integrated analytics platform that combines data engineering, data science, and business analytics under one SaaS umbrella. Within Fabric, a lakehouse is a unified data store that supports both structured and unstructured data. Lakehouses in Fabric are built on Delta Lake format (the same open standard used by Databricks) and are accessible through a SQL analytics endpoint—a managed SQL interface that lets external tools query lakehouse tables as if they were in a traditional data warehouse.
OneLake is the underlying cloud storage layer in Microsoft Fabric. When you create a lakehouse, its data is stored in OneLake, Microsoft's tenant-wide, multi-cloud storage abstraction. OneLake uses Delta format by default, which provides ACID transactions, schema enforcement, and time-travel capabilities—all valuable for analytics workloads.
The integration pattern is straightforward: Superset connects to the SQL analytics endpoint of a Fabric lakehouse, executes queries against Delta tables, and renders results in dashboards. This approach avoids data duplication and keeps your single source of truth in Fabric.
Several scenarios make this integration compelling:
Cost Efficiency: Fabric's consumption-based pricing model (measured in Fabric Capacity Units) can be more predictable than per-seat BI tool licensing. By using Superset—an open-source tool—alongside Fabric, you avoid per-user seat costs from Looker, Tableau, or Power BI while maintaining enterprise-grade analytics infrastructure.
Embedded Analytics: If you're building analytics features into your product, Superset's lightweight architecture and API-first design make it ideal for embedding. Connecting it to Fabric lakehouses lets you scale analytics without managing separate databases or ETL pipelines. D23's managed Superset platform, for instance, specializes in exactly this use case—providing hosted Superset with AI integration and API endpoints for embedding dashboards and self-serve BI into applications.
Data Lakehouse Flexibility: Fabric lakehouses support both structured (via SQL endpoint) and unstructured data (via OneLake). Superset queries the structured layer, but you maintain the flexibility to run Apache Spark notebooks, Python scripts, or other data engineering workloads against the same data.
Reduced Data Movement: Without this integration, you might export data from Fabric to a separate analytics database. Direct connection eliminates that step, reducing latency, storage costs, and the risk of stale data.
Open-Source Advantage: Unlike Power BI (which is tightly integrated with Fabric), Superset remains vendor-neutral. You can query Fabric lakehouses, Snowflake, BigQuery, or any supported database from the same platform. This flexibility matters for organizations with heterogeneous data stacks or those evaluating multiple analytics tools.
Before connecting Superset to a Fabric lakehouse, ensure you have:
Fabric Environment:
Superset Deployment:
Authentication Credentials:
SQL Driver:
pyodbc library (Superset's default for SQL Server connections)pymssql libraryFor detailed guidance on setting up Fabric lakehouse connections, refer to Microsoft Fabric's official lakehouse connection documentation and the Matillion authentication guide for Fabric Lakehouses, which covers SQL analytics endpoint authentication patterns applicable to any SQL client.
The SQL analytics endpoint is the gateway for external tools to query a Fabric lakehouse. Here's how to find it:
In Microsoft Fabric:
<workspace-name>-<lakehouse-name>.<region>.fabric.microsoft.comExample Connection Details:
contoso-analytics-lakehouse.westus2.fabric.microsoft.comcontoso_analytics_lakehouse1433 (default SQL Server port)Keep these details handy—you'll need them when configuring Superset.
Fabric supports multiple authentication methods for SQL analytics endpoints. The most common for external tools like Superset are:
Service Principal Authentication (Recommended for Production)
A service principal is a non-human identity in Microsoft Entra ID that can be granted permissions to access Fabric resources. This approach is ideal for automated, unattended scenarios like analytics dashboards.
To set up a service principal:
For detailed steps, consult the Fabric OneLake ingestion documentation, which covers service principal setup in the context of Fabric data access.
User Account Authentication
If you're connecting from a single user's machine or a development environment, you can use your Microsoft Entra ID credentials directly. Superset will prompt for username and password or use integrated Windows authentication if available.
Token-Based Authentication
For advanced scenarios, you can obtain an access token from Microsoft Entra ID and pass it to the SQL connection. This is more complex but useful if you're building custom integrations.
For production deployments, service principal authentication is strongly recommended because it doesn't rely on individual user credentials and can be rotated independently.
Once you have your SQL analytics endpoint details and authentication credentials, add the connection to Superset.
Access the Superset Database Configuration:
Select the Database Type:
Look for "Microsoft SQL Server" or "SQL Server" in the list of supported databases. Superset uses the pyodbc or pymssql driver for SQL Server connections.
Enter Connection Details:
The connection form will ask for:
contoso_analytics_lakehouse)contoso-analytics-lakehouse.westus2.fabric.microsoft.com)1433 (default)Connection String (Advanced):
If Superset's UI doesn't provide all necessary fields, you can enter a raw connection string. For service principal authentication with pyodbc, use:
mssql+pyodbc://CLIENT_ID:CLIENT_SECRET@SERVER_NAME:1433/DATABASE_NAME?driver=ODBC+Driver+17+for+SQL+Server&Encrypt=yes&TrustServerCertificate=no&Connection+Timeout=30
Replace:
CLIENT_ID with your service principal's client IDCLIENT_SECRET with the client secretSERVER_NAME with your Fabric SQL analytics endpointDATABASE_NAME with your lakehouse nameFor detailed guidance on Superset database connections, refer to the official Superset database documentation, which covers connection string formats and driver configuration for SQL-based databases.
Enable Additional Options:
Test the Connection:
Before saving, click "Test Connection" to verify that Superset can reach the SQL analytics endpoint and authenticate successfully. If the test fails, check:
Once the test passes, save the connection.
After the database connection is established, Superset needs to know which tables to expose for dashboard building.
Refresh the Table List:
Superset will query the SQL analytics endpoint's metadata and list all Delta tables in your lakehouse.
Create Datasets (Optional but Recommended)
Datasets are Superset's abstraction layer over raw tables. They allow you to:
To create a dataset:
Datasets are optional—you can build dashboards directly against raw tables—but they provide governance and consistency, especially in self-serve BI environments.
With tables imported and datasets created, you can now build dashboards using Fabric lakehouse data.
Create a New Dashboard:
Add Charts:
Use SQL Lab for Complex Queries:
If you need to write custom SQL against Fabric Delta tables, use SQL Lab:
Example query against a Fabric lakehouse:
SELECT
DATE_TRUNC('month', order_date) AS month,
product_category,
SUM(revenue) AS total_revenue,
COUNT(DISTINCT order_id) AS order_count
FROM sales_fact
WHERE order_date >= '2024-01-01'
GROUP BY DATE_TRUNC('month', order_date), product_category
ORDER BY month DESC, total_revenue DESCThis query groups sales by month and product category, pulling data directly from Delta tables in your Fabric lakehouse.
Connecting Superset to Fabric lakehouses introduces some performance considerations. Here are strategies to optimize:
Index and Partition Delta Tables
Fabric's SQL analytics endpoint benefits from well-designed Delta tables. Partition large tables by date or region to reduce query scans:
# Example: Partitioned Delta table in Fabric notebook
df.write.format("delta") \
.mode("overwrite") \
.partitionBy("year", "month") \
.save("abfss://[email protected]/Tables/sales_fact")Partitioning allows the SQL analytics endpoint to prune partitions during queries, significantly reducing latency.
Caching and Materialized Views
For frequently accessed aggregations, consider creating materialized views or pre-aggregated tables in Fabric:
CREATE TABLE revenue_by_month AS
SELECT
DATE_TRUNC('month', order_date) AS month,
SUM(revenue) AS total_revenue
FROM sales_fact
GROUP BY DATE_TRUNC('month', order_date)Then query this aggregated table in Superset instead of the raw fact table. This reduces query time and Fabric capacity consumption.
Use Superset's Caching
Superset includes built-in query result caching. Configure it to cache dashboard queries:
This prevents redundant queries to Fabric when multiple users view the same dashboard.
Limit Query Complexity
Avoid overly complex queries with many joins or subqueries. If a query takes more than 30 seconds to execute, consider:
Monitor Fabric Capacity
Fabric charges for query execution and data movement. Monitor your capacity usage:
For architectural guidance on optimizing Fabric lakehouses, see the comprehensive guide to building lakehouse architecture with Microsoft Fabric.
One of the most powerful features of modern Superset deployments is AI-assisted query generation—text-to-SQL. This allows users to ask questions in natural language, and the system generates SQL automatically.
If you're using a managed Superset platform like D23, text-to-SQL is often pre-configured. For self-hosted Superset, you can enable this feature by integrating with an LLM provider (OpenAI, Anthropic, etc.).
Example Text-to-SQL Workflow:
User: "Show me total revenue by product category for the last quarter"
Superset's AI layer generates:
SELECT
product_category,
SUM(revenue) AS total_revenue
FROM sales_fact
WHERE order_date >= DATE_TRUNC('quarter', CURRENT_DATE) - INTERVAL '1 quarter'
GROUP BY product_category
ORDER BY total_revenue DESCThis query is then executed against your Fabric lakehouse and results are visualized. Text-to-SQL dramatically accelerates self-serve BI adoption because users don't need SQL skills to explore data.
For more on how Superset compares to other BI tools in the context of AI integration, see the Preset blog comparing Apache Superset to Power BI, which covers Superset's strengths in open-source flexibility and extensibility.
When connecting Superset to Fabric lakehouses, security is paramount.
Service Principal Isolation
Create a dedicated service principal for Superset's Fabric connection. This allows you to:
Row-Level Security (RLS)
Fabric supports row-level security on Delta tables. If you have sensitive data (e.g., sales data for specific regions), you can restrict access at the table level:
Note: This requires additional configuration and is more commonly used in Power BI. For Superset, consider applying filters at the dataset level instead.
Encryption and Network Security
Audit Logging
Both Superset and Fabric maintain audit logs:
Review these logs periodically to identify unusual activity or unauthorized access attempts.
"Connection Timeout" Error
This usually indicates a network issue:
"Authentication Failed" Error
Double-check credentials:
"Table Not Found" Error
This means Superset can connect but can't see tables:
Slow Query Performance
If dashboards load slowly:
If you're evaluating Superset for production use, D23 provides a managed platform that simplifies deployment and adds enterprise features.
D23 offers:
When using D23 with Fabric lakehouses, the Fabric connection is managed through D23's database configuration interface, but the underlying process is identical to self-hosted Superset. D23 handles scaling, security, and performance optimization, freeing your team to focus on analytics rather than infrastructure.
For teams at scale-ups and mid-market companies, D23 eliminates the operational overhead of managing Superset while providing the flexibility of open-source BI without the per-seat costs of Looker, Tableau, or Power BI. Visit D23's homepage to learn more about managed Superset and embedded analytics capabilities.
Let's walk through a concrete example: building a sales dashboard that pulls data from a Fabric lakehouse.
Scenario: A B2B SaaS company stores customer transactions in a Fabric lakehouse (via daily Dataflows from their transactional database). They want to build a dashboard showing revenue trends, top customers, and regional performance.
Step 1: Prepare Data in Fabric
In Fabric, they have Delta tables:
customers (customer_id, customer_name, region, industry)orders (order_id, customer_id, order_date, amount, product_category)Step 2: Connect Superset to Fabric
Following the steps above, they add their Fabric lakehouse as a database connection in Superset.
Step 3: Create Datasets
They create a dataset combining orders and customers via a join:
SELECT
o.order_id,
o.order_date,
o.amount,
o.product_category,
c.customer_name,
c.region,
c.industry
FROM orders o
JOIN customers c ON o.customer_id = c.customer_idThey define metrics: total_revenue (SUM of amount), order_count (COUNT of order_id), avg_order_value (AVG of amount).
Step 4: Build Charts
Step 5: Assemble Dashboard
They create a dashboard called "Sales Overview" and add all four charts. They add filters at the dashboard level:
Users can now slice the data dynamically without touching SQL.
Step 6: Enable Embedding (Optional)
If the company wants to embed this dashboard in their product (e.g., showing customer-specific revenue in their customer portal), they use Superset's embedding API. With D23's managed platform, this is simplified—the platform provides pre-built embedding SDKs and handles authentication.
For data engineering teams, Fabric notebooks allow you to run Apache Spark code directly against lakehouse data. While Superset can't directly execute Spark code, you can use notebooks to create pre-aggregated tables that Superset queries:
Example Spark Notebook in Fabric:
# Read Delta table from lakehouse
df = spark.read.table("sales_fact")
# Complex transformation
result = df.groupBy("product_category", "region") \
.agg({"revenue": "sum", "order_id": "count"}) \
.withColumnRenamed("sum(revenue)", "total_revenue") \
.withColumnRenamed("count(order_id)", "order_count")
# Write aggregated result back to lakehouse
result.write.format("delta").mode("overwrite").option("mergeSchema", "true").saveAsTable("sales_summary")Then in Superset, query sales_summary directly. This pattern allows data engineers to build complex transformations while analytics teams use Superset for visualization.
For more on Apache Superset's capabilities and setup, see the dbt blog post explaining Apache Superset features and database connectivity.
Connecting Apache Superset to Microsoft Fabric lakehouses creates a modern, cost-effective analytics stack. Superset's lightweight architecture and open-source flexibility pair well with Fabric's lakehouse model, giving you a single source of truth for data while maintaining the agility to build dashboards, embed analytics, and empower self-serve BI.
The integration is straightforward: configure SQL analytics endpoint access, add the connection to Superset, import tables, and build dashboards. Performance and security follow standard practices—partition your Delta tables, use service principals for authentication, enable caching, and monitor Fabric capacity.
For teams looking to avoid the per-seat costs of Looker, Tableau, or Power BI while maintaining enterprise-grade analytics, this approach delivers significant value. If managing Superset infrastructure feels like overhead, managed platforms like D23 handle deployment, scaling, and AI integration, letting you focus on analytics rather than operations.
Whether you're a data leader at a scale-up, an engineering team embedding analytics into your product, or a CTO evaluating open-source BI alternatives, Superset + Fabric lakehouses is a compelling, flexible, and cost-effective choice.