New: AI & text-to-SQL on your own SupersetBook a demo

Apache Superset18 Apr 2026

Connecting Apache Superset to Microsoft Fabric Lakehouses

Learn how to connect Apache Superset to Microsoft Fabric lakehouses via OneLake and Delta tables. Step-by-step integration guide for production analytics.

DTD23 Team

15 minutes read

Connecting Apache Superset to Microsoft Fabric Lakehouses

Microsoft Fabric lakehouses represent a modern approach to data architecture—combining the flexibility of data lakes with the queryability of data warehouses. If you're running Apache Superset as your analytics platform, connecting it directly to a Fabric lakehouse unlocks a powerful workflow: ingest raw data into OneLake, transform it with Delta tables, and surface insights through Superset dashboards without moving data between systems.

This guide walks through the technical setup, authentication patterns, and best practices for integrating Superset with Fabric lakehouses. Whether you're building embedded analytics for your product or centralizing BI across your organization, this integration eliminates the need for intermediate data exports and keeps your analytics layer lightweight and responsive.

Understanding the Architecture: Superset, Fabric, and OneLake

Before diving into connection details, it helps to understand what each component does and how they fit together.

Apache Superset is an open-source business intelligence platform that sits on top of your data layer. It handles visualization, dashboard creation, SQL querying, and—when configured with AI—text-to-SQL query generation. Superset doesn't store data itself; it connects to external databases and data warehouses via database drivers and executes queries against them. This architecture makes Superset lightweight and agnostic to your underlying data platform.

Microsoft Fabric is an integrated analytics platform that combines data engineering, data science, and business analytics under one SaaS umbrella. Within Fabric, a lakehouse is a unified data store that supports both structured and unstructured data. Lakehouses in Fabric are built on Delta Lake format (the same open standard used by Databricks) and are accessible through a SQL analytics endpoint—a managed SQL interface that lets external tools query lakehouse tables as if they were in a traditional data warehouse.

OneLake is the underlying cloud storage layer in Microsoft Fabric. When you create a lakehouse, its data is stored in OneLake, Microsoft's tenant-wide, multi-cloud storage abstraction. OneLake uses Delta format by default, which provides ACID transactions, schema enforcement, and time-travel capabilities—all valuable for analytics workloads.

The integration pattern is straightforward: Superset connects to the SQL analytics endpoint of a Fabric lakehouse, executes queries against Delta tables, and renders results in dashboards. This approach avoids data duplication and keeps your single source of truth in Fabric.

Why Connect Superset to Fabric Lakehouses?

Several scenarios make this integration compelling:

Cost Efficiency: Fabric's consumption-based pricing model (measured in Fabric Capacity Units) can be more predictable than per-seat BI tool licensing. By using Superset—an open-source tool—alongside Fabric, you avoid per-user seat costs from Looker, Tableau, or Power BI while maintaining enterprise-grade analytics infrastructure.

Embedded Analytics: If you're building analytics features into your product, Superset's lightweight architecture and API-first design make it ideal for embedding. Connecting it to Fabric lakehouses lets you scale analytics without managing separate databases or ETL pipelines. D23's managed Superset platform, for instance, specializes in exactly this use case—providing hosted Superset with AI integration and API endpoints for embedding dashboards and self-serve BI into applications.

Data Lakehouse Flexibility: Fabric lakehouses support both structured (via SQL endpoint) and unstructured data (via OneLake). Superset queries the structured layer, but you maintain the flexibility to run Apache Spark notebooks, Python scripts, or other data engineering workloads against the same data.

Reduced Data Movement: Without this integration, you might export data from Fabric to a separate analytics database. Direct connection eliminates that step, reducing latency, storage costs, and the risk of stale data.

Open-Source Advantage: Unlike Power BI (which is tightly integrated with Fabric), Superset remains vendor-neutral. You can query Fabric lakehouses, Snowflake, BigQuery, or any supported database from the same platform. This flexibility matters for organizations with heterogeneous data stacks or those evaluating multiple analytics tools.

Prerequisites and Setup Requirements

Before connecting Superset to a Fabric lakehouse, ensure you have:

Fabric Environment:

A Microsoft Fabric workspace with appropriate licensing (Fabric capacity or trial)
At least one lakehouse created in that workspace
Delta tables populated with data (via Dataflows, pipelines, notebooks, or direct ingestion)
Access to the lakehouse's SQL analytics endpoint URL

Superset Deployment:

A running instance of Apache Superset (self-hosted or managed via a provider like D23)
Admin access to add database connections
Network connectivity to the Fabric SQL analytics endpoint (ensure firewall rules permit outbound HTTPS to Microsoft Fabric endpoints)

Authentication Credentials:

A Microsoft Entra ID (Azure AD) service principal or user account with permission to query the lakehouse
Service principal credentials (client ID, client secret, and tenant ID) if using service principal authentication
Or, username and password if using SQL authentication (less common for Fabric)

SQL Driver:

The Microsoft ODBC Driver for SQL Server or the Python pyodbc library (Superset's default for SQL Server connections)
Alternatively, the pymssql library

For detailed guidance on setting up Fabric lakehouse connections, refer to Microsoft Fabric's official lakehouse connection documentation and the Matillion authentication guide for Fabric Lakehouses, which covers SQL analytics endpoint authentication patterns applicable to any SQL client.

Step 1: Locate Your Fabric Lakehouse SQL Analytics Endpoint

The SQL analytics endpoint is the gateway for external tools to query a Fabric lakehouse. Here's how to find it:

In Microsoft Fabric:

Navigate to your workspace and open the lakehouse
In the top-right corner, click "SQL analytics endpoint"
In the SQL analytics endpoint view, look for the connection string or server name. It typically follows this pattern: <workspace-name>-<lakehouse-name>.<region>.fabric.microsoft.com
Copy the full connection string or note the server name and database name (which is the lakehouse name)

Example Connection Details:

Server: contoso-analytics-lakehouse.westus2.fabric.microsoft.com
Database: contoso_analytics_lakehouse
Port: 1433 (default SQL Server port)

Keep these details handy—you'll need them when configuring Superset.

Step 2: Authenticate to Fabric Lakehouses

Fabric supports multiple authentication methods for SQL analytics endpoints. The most common for external tools like Superset are:

Service Principal Authentication (Recommended for Production)

A service principal is a non-human identity in Microsoft Entra ID that can be granted permissions to access Fabric resources. This approach is ideal for automated, unattended scenarios like analytics dashboards.

To set up a service principal:

In Azure AD, create a new app registration (or use an existing one)
Generate a client secret and note the client ID and tenant ID
In your Fabric workspace, grant the service principal access:
- Go to Workspace Settings → Manage Access
- Add the service principal with "Member" or "Admin" role
In the SQL analytics endpoint, ensure the service principal has query permissions (this is inherited from workspace access)

For detailed steps, consult the Fabric OneLake ingestion documentation, which covers service principal setup in the context of Fabric data access.

User Account Authentication

If you're connecting from a single user's machine or a development environment, you can use your Microsoft Entra ID credentials directly. Superset will prompt for username and password or use integrated Windows authentication if available.

Token-Based Authentication

For advanced scenarios, you can obtain an access token from Microsoft Entra ID and pass it to the SQL connection. This is more complex but useful if you're building custom integrations.

For production deployments, service principal authentication is strongly recommended because it doesn't rely on individual user credentials and can be rotated independently.

Step 3: Configure the Database Connection in Superset

Once you have your SQL analytics endpoint details and authentication credentials, add the connection to Superset.

Access the Superset Database Configuration:

Log into Superset as an admin user
Navigate to Settings → Database Connections (or Data → Databases, depending on your Superset version)
Click "+ Database" to add a new connection

Select the Database Type:

Look for "Microsoft SQL Server" or "SQL Server" in the list of supported databases. Superset uses the pyodbc or pymssql driver for SQL Server connections.

Enter Connection Details:

The connection form will ask for:

Database Name: The name of your lakehouse (e.g., contoso_analytics_lakehouse)
Host: The SQL analytics endpoint server name (e.g., contoso-analytics-lakehouse.westus2.fabric.microsoft.com)
Port: 1433 (default)
Username: For service principal auth, use the client ID; for user auth, use your email or UPN
Password: The client secret (for service principal) or your password (for user auth)

Connection String (Advanced):

If Superset's UI doesn't provide all necessary fields, you can enter a raw connection string. For service principal authentication with pyodbc, use:

mssql+pyodbc://CLIENT_ID:CLIENT_SECRET@SERVER_NAME:1433/DATABASE_NAME?driver=ODBC+Driver+17+for+SQL+Server&Encrypt=yes&TrustServerCertificate=no&Connection+Timeout=30

Replace:

CLIENT_ID with your service principal's client ID
CLIENT_SECRET with the client secret
SERVER_NAME with your Fabric SQL analytics endpoint
DATABASE_NAME with your lakehouse name

For detailed guidance on Superset database connections, refer to the official Superset database documentation, which covers connection string formats and driver configuration for SQL-based databases.

Enable Additional Options:

Allow DML: Uncheck this unless you need to run INSERT, UPDATE, or DELETE queries (usually not needed for analytics)
Allow CTAS: Leave unchecked to prevent accidental table creation
Allow CVAS: Leave unchecked similarly
Expose in SQL Lab: Check this to allow users to write custom SQL queries against the lakehouse

Test the Connection:

Before saving, click "Test Connection" to verify that Superset can reach the SQL analytics endpoint and authenticate successfully. If the test fails, check:

Network connectivity (firewall rules, VPN, IP allowlisting)
Authentication credentials (client ID, secret, or password)
Service principal workspace permissions
Lakehouse SQL analytics endpoint status (ensure it's not paused)

Once the test passes, save the connection.

Step 4: Import Tables and Create Datasets

After the database connection is established, Superset needs to know which tables to expose for dashboard building.

Refresh the Table List:

Go to Data → Databases
Click on your Fabric lakehouse connection
Click "Refresh" or "Sync Tables" to import all available tables from the lakehouse

Superset will query the SQL analytics endpoint's metadata and list all Delta tables in your lakehouse.

Create Datasets (Optional but Recommended)

Datasets are Superset's abstraction layer over raw tables. They allow you to:

Define virtual columns (computed fields)
Set default aggregations
Create metrics and dimensions
Control which columns users can see
Simplify the dashboard-building experience

To create a dataset:

Go to Data → Datasets
Click "+ Dataset"
Select your Fabric lakehouse database and choose a table
Define columns, metrics, and filters as needed
Save the dataset

Datasets are optional—you can build dashboards directly against raw tables—but they provide governance and consistency, especially in self-serve BI environments.

Step 5: Build Dashboards Against Fabric Data

With tables imported and datasets created, you can now build dashboards using Fabric lakehouse data.

Create a New Dashboard:

Go to Dashboards → + Dashboard
Give it a name and description
Click "Create"

Add Charts:

Click "+ Chart" to add a visualization
Select your Fabric dataset or table
Choose a chart type (bar, line, table, map, etc.)
Define dimensions (grouping columns) and metrics (aggregations like SUM, COUNT, AVG)
Configure filters, sorting, and drill-down behavior
Save the chart to the dashboard

Use SQL Lab for Complex Queries:

If you need to write custom SQL against Fabric Delta tables, use SQL Lab:

Go to SQL → SQL Lab
Select your Fabric lakehouse database
Write SQL queries against the lakehouse tables
Execute and visualize results
Save as a chart or dataset for reuse

Example query against a Fabric lakehouse:

SELECT 
  DATE_TRUNC('month', order_date) AS month,
  product_category,
  SUM(revenue) AS total_revenue,
  COUNT(DISTINCT order_id) AS order_count
FROM sales_fact
WHERE order_date >= '2024-01-01'
GROUP BY DATE_TRUNC('month', order_date), product_category
ORDER BY month DESC, total_revenue DESC

This query groups sales by month and product category, pulling data directly from Delta tables in your Fabric lakehouse.

Performance Optimization and Best Practices

Connecting Superset to Fabric lakehouses introduces some performance considerations. Here are strategies to optimize:

Index and Partition Delta Tables

Fabric's SQL analytics endpoint benefits from well-designed Delta tables. Partition large tables by date or region to reduce query scans:

# Example: Partitioned Delta table in Fabric notebook
df.write.format("delta") \
  .mode("overwrite") \
  .partitionBy("year", "month") \
  .save("abfss://[email protected]/Tables/sales_fact")

Partitioning allows the SQL analytics endpoint to prune partitions during queries, significantly reducing latency.

Caching and Materialized Views

For frequently accessed aggregations, consider creating materialized views or pre-aggregated tables in Fabric:

CREATE TABLE revenue_by_month AS
SELECT 
  DATE_TRUNC('month', order_date) AS month,
  SUM(revenue) AS total_revenue
FROM sales_fact
GROUP BY DATE_TRUNC('month', order_date)

Then query this aggregated table in Superset instead of the raw fact table. This reduces query time and Fabric capacity consumption.

Use Superset's Caching

Superset includes built-in query result caching. Configure it to cache dashboard queries:

Go to Settings → Advanced → Cache Configuration
Set a cache timeout (e.g., 1 hour)
Enable caching for dashboards

This prevents redundant queries to Fabric when multiple users view the same dashboard.

Limit Query Complexity

Avoid overly complex queries with many joins or subqueries. If a query takes more than 30 seconds to execute, consider:

Breaking it into multiple smaller charts
Pre-aggregating data in Fabric
Using Superset's drill-down filters to reduce result sets

Monitor Fabric Capacity

Fabric charges for query execution and data movement. Monitor your capacity usage:

In Fabric, go to Admin Portal → Capacity → Usage
Identify heavy queries and optimize them
Consider increasing capacity if usage is consistently high

For architectural guidance on optimizing Fabric lakehouses, see the comprehensive guide to building lakehouse architecture with Microsoft Fabric.

AI-Powered Analytics: Text-to-SQL with Superset

One of the most powerful features of modern Superset deployments is AI-assisted query generation—text-to-SQL. This allows users to ask questions in natural language, and the system generates SQL automatically.

If you're using a managed Superset platform like D23, text-to-SQL is often pre-configured. For self-hosted Superset, you can enable this feature by integrating with an LLM provider (OpenAI, Anthropic, etc.).

Example Text-to-SQL Workflow:

User: "Show me total revenue by product category for the last quarter"

Superset's AI layer generates:

SELECT 
  product_category,
  SUM(revenue) AS total_revenue
FROM sales_fact
WHERE order_date >= DATE_TRUNC('quarter', CURRENT_DATE) - INTERVAL '1 quarter'
GROUP BY product_category
ORDER BY total_revenue DESC

This query is then executed against your Fabric lakehouse and results are visualized. Text-to-SQL dramatically accelerates self-serve BI adoption because users don't need SQL skills to explore data.

For more on how Superset compares to other BI tools in the context of AI integration, see the Preset blog comparing Apache Superset to Power BI, which covers Superset's strengths in open-source flexibility and extensibility.

Security and Governance Considerations

When connecting Superset to Fabric lakehouses, security is paramount.

Service Principal Isolation

Create a dedicated service principal for Superset's Fabric connection. This allows you to:

Control exactly which lakehouse tables Superset can access
Rotate credentials independently
Audit queries to Fabric (all queries appear as coming from the service principal)
Revoke access without affecting other applications

Row-Level Security (RLS)

Fabric supports row-level security on Delta tables. If you have sensitive data (e.g., sales data for specific regions), you can restrict access at the table level:

In Fabric, configure RLS on the Delta table
Map Superset users to Fabric security roles
Queries automatically filter results based on the authenticated user

Note: This requires additional configuration and is more commonly used in Power BI. For Superset, consider applying filters at the dataset level instead.

Encryption and Network Security

Ensure SSL/TLS encryption for all connections between Superset and Fabric (HTTPS for API calls, encrypted SQL connections)
Use private endpoints if connecting from a private network
Enable IP allowlisting on the Fabric SQL analytics endpoint if possible

Audit Logging

Both Superset and Fabric maintain audit logs:

Superset logs all dashboard views, query executions, and user actions
Fabric logs all data access and queries

Review these logs periodically to identify unusual activity or unauthorized access attempts.

Troubleshooting Common Connection Issues

"Connection Timeout" Error

This usually indicates a network issue:

Verify the SQL analytics endpoint URL is correct
Check that your firewall allows outbound HTTPS to Microsoft Fabric endpoints
Ensure the lakehouse SQL analytics endpoint is not paused (it may auto-pause after inactivity)

"Authentication Failed" Error

Double-check credentials:

For service principal: Verify client ID, secret, and tenant ID
For user auth: Confirm username (email or UPN) and password
Ensure the service principal or user has workspace access in Fabric

"Table Not Found" Error

This means Superset can connect but can't see tables:

Click "Refresh" on the database connection to re-sync tables
Verify tables exist in the lakehouse SQL analytics endpoint (you can query them directly in Fabric)
Check that the service principal or user has SELECT permissions on the tables

Slow Query Performance

If dashboards load slowly:

Check Fabric capacity usage (may be at limit)
Optimize the underlying Delta table (add partitioning, remove unnecessary columns)
Enable Superset query caching
Break complex queries into multiple simpler queries

Integration with D23's Managed Superset Platform

If you're evaluating Superset for production use, D23 provides a managed platform that simplifies deployment and adds enterprise features.

D23 offers:

Pre-configured Superset with AI/LLM integration for text-to-SQL and natural language dashboards
API-first architecture for embedding analytics into products
Managed hosting with automatic scaling, backups, and security updates
Expert data consulting to help design schemas, optimize queries, and architect self-serve BI
MCP server integration for connecting Superset to various data sources and LLMs

When using D23 with Fabric lakehouses, the Fabric connection is managed through D23's database configuration interface, but the underlying process is identical to self-hosted Superset. D23 handles scaling, security, and performance optimization, freeing your team to focus on analytics rather than infrastructure.

For teams at scale-ups and mid-market companies, D23 eliminates the operational overhead of managing Superset while providing the flexibility of open-source BI without the per-seat costs of Looker, Tableau, or Power BI. Visit D23's homepage to learn more about managed Superset and embedded analytics capabilities.

Real-World Example: Building a Sales Dashboard

Let's walk through a concrete example: building a sales dashboard that pulls data from a Fabric lakehouse.

Scenario: A B2B SaaS company stores customer transactions in a Fabric lakehouse (via daily Dataflows from their transactional database). They want to build a dashboard showing revenue trends, top customers, and regional performance.

Step 1: Prepare Data in Fabric

In Fabric, they have Delta tables:

customers (customer_id, customer_name, region, industry)
orders (order_id, customer_id, order_date, amount, product_category)

Step 2: Connect Superset to Fabric

Following the steps above, they add their Fabric lakehouse as a database connection in Superset.

Step 3: Create Datasets

They create a dataset combining orders and customers via a join:

SELECT 
  o.order_id,
  o.order_date,
  o.amount,
  o.product_category,
  c.customer_name,
  c.region,
  c.industry
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id

They define metrics: total_revenue (SUM of amount), order_count (COUNT of order_id), avg_order_value (AVG of amount).

Step 4: Build Charts

Revenue by Month: Line chart, X-axis = order_date (truncated to month), Y-axis = total_revenue
Top 10 Customers: Bar chart, X-axis = customer_name, Y-axis = total_revenue, sorted descending
Revenue by Region: Pie chart, showing proportion of revenue by region
Product Category Performance: Table showing product_category, total_revenue, order_count, avg_order_value

Step 5: Assemble Dashboard

They create a dashboard called "Sales Overview" and add all four charts. They add filters at the dashboard level:

Date range filter (order_date)
Region filter (multi-select)
Industry filter (multi-select)

Users can now slice the data dynamically without touching SQL.

Step 6: Enable Embedding (Optional)

If the company wants to embed this dashboard in their product (e.g., showing customer-specific revenue in their customer portal), they use Superset's embedding API. With D23's managed platform, this is simplified—the platform provides pre-built embedding SDKs and handles authentication.

Advanced: Using Apache Spark with Superset and Fabric

For data engineering teams, Fabric notebooks allow you to run Apache Spark code directly against lakehouse data. While Superset can't directly execute Spark code, you can use notebooks to create pre-aggregated tables that Superset queries:

Example Spark Notebook in Fabric:

# Read Delta table from lakehouse
df = spark.read.table("sales_fact")
 
# Complex transformation
result = df.groupBy("product_category", "region") \
  .agg({"revenue": "sum", "order_id": "count"}) \
  .withColumnRenamed("sum(revenue)", "total_revenue") \
  .withColumnRenamed("count(order_id)", "order_count")
 
# Write aggregated result back to lakehouse
result.write.format("delta").mode("overwrite").option("mergeSchema", "true").saveAsTable("sales_summary")

Then in Superset, query sales_summary directly. This pattern allows data engineers to build complex transformations while analytics teams use Superset for visualization.

For more on Apache Superset's capabilities and setup, see the dbt blog post explaining Apache Superset features and database connectivity.

Conclusion

Connecting Apache Superset to Microsoft Fabric lakehouses creates a modern, cost-effective analytics stack. Superset's lightweight architecture and open-source flexibility pair well with Fabric's lakehouse model, giving you a single source of truth for data while maintaining the agility to build dashboards, embed analytics, and empower self-serve BI.

The integration is straightforward: configure SQL analytics endpoint access, add the connection to Superset, import tables, and build dashboards. Performance and security follow standard practices—partition your Delta tables, use service principals for authentication, enable caching, and monitor Fabric capacity.

For teams looking to avoid the per-seat costs of Looker, Tableau, or Power BI while maintaining enterprise-grade analytics, this approach delivers significant value. If managing Superset infrastructure feels like overhead, managed platforms like D23 handle deployment, scaling, and AI integration, letting you focus on analytics rather than operations.

Whether you're a data leader at a scale-up, an engineering team embedding analytics into your product, or a CTO evaluating open-source BI alternatives, Superset + Fabric lakehouses is a compelling, flexible, and cost-effective choice.