Build a production-grade analytics stack with BigQuery, dbt, and Apache Superset. Learn architecture, best practices, and why open-source BI wins.
You're evaluating analytics platforms and you keep hearing the same story: proprietary tools like Looker, Tableau, and Power BI lock you into expensive licensing, vendor-specific workflows, and limited flexibility. Meanwhile, your engineering team is already familiar with open-source tooling, version control, and infrastructure-as-code. You want something that plays well with your existing data stack, doesn't require a separate contract negotiation every year, and lets you own your analytics layer the same way you own your application code.
The combination of BigQuery, dbt, and Apache Superset addresses exactly that need. This isn't a theoretical architecture—it's the backbone of how data-driven organizations at scale-ups and mid-market companies are building modern analytics without platform overhead.
BigQuery serves as your data warehouse: a fully managed, serverless compute engine that scales to petabytes without you managing infrastructure. dbt transforms raw data into trusted, modeled datasets using SQL and version control. Apache Superset provides the visualization and exploration layer, letting teams query those modeled datasets without writing SQL. Together, they form a cohesive, open-source-friendly stack that separates concerns cleanly and keeps costs predictable.
This article walks you through why this combination works, how to architect it, and the real-world considerations that separate a proof-of-concept from a production system.
BigQuery is Google Cloud's enterprise data warehouse. Unlike traditional data warehouses that require you to provision and manage clusters, BigQuery is fully managed and serverless—you pay only for the data you scan and the compute you use.
Key characteristics that make it a natural fit for this stack:
For analytics specifically, BigQuery's columnar architecture means that a query aggregating a single column across a billion rows scans only that column's data, not the entire table. That efficiency translates directly to lower query costs and faster dashboard load times—critical when you're embedding analytics or serving hundreds of concurrent users.
The BigQuery Documentation provides comprehensive guides on setup, querying, and integration patterns, including how to optimize queries for cost and performance.
dbt (data build tool) is a command-line tool and development framework that lets you write data transformations in SQL, version control them, test them, and document them—all the practices you already use for application code.
Instead of writing one-off SQL scripts or using a visual ETL tool, you define transformations as dbt models (SQL SELECT statements) organized in a project structure. dbt handles the orchestration: it figures out the dependency graph, runs models in the right order, and materializes them as tables or views in your warehouse.
Why dbt matters for this stack:
The dbt Documentation covers modeling patterns, testing strategies, and integration with warehouses like BigQuery. For many teams, dbt becomes the single source of truth for how data is transformed and what it means.
Apache Superset is an open-source, modern data visualization and business intelligence tool. Unlike Tableau or Looker, which are proprietary and expensive, Superset is a community-driven project maintained under the Apache Software Foundation.
Superset lets you:
Superset's strength is flexibility combined with simplicity. It's not as polished as Tableau out of the box, but it's deeply customizable, and you're not locked into a vendor's design philosophy. You can D23 - Dashboards, Embedded Analytics & Self-Serve BI on Apache Superset™ to get a managed, production-ready version with expert support, or run it yourself.
The Apache Superset Documentation provides installation guides, feature overviews, and integration patterns for connecting to BigQuery and other data sources.
Understanding the flow of data through this stack clarifies why it's so powerful:
Raw data lands in BigQuery: Your data pipelines (Dataflow, Airflow, or other tools) ingest raw data into BigQuery tables. This might be transactional data from your application, logs from your infrastructure, or third-party APIs.
dbt transforms the raw data: dbt models read from those raw tables and apply business logic—joins, aggregations, filtering, calculations. The output is a set of clean, modeled tables (or views) that represent your business metrics, customer dimensions, and fact tables.
Superset queries the modeled data: Superset connects to BigQuery and queries the dbt-modeled tables. Users either build dashboards by selecting columns and metrics, or they write ad-hoc SQL queries in SQL Lab to explore the data.
Dashboards and insights flow to stakeholders: Whether embedded in a product, shared via URL, or accessed through a web interface, Superset serves the analytics to end users—data teams, executives, customers, or internal tools.
This separation of concerns is crucial. Your data engineers own the dbt project and ensure data quality. Your analytics engineers build Superset dashboards on top of trusted dbt models. Your business users explore data without needing to understand the underlying schema or write complex SQL.
Each layer has a clear responsibility, and changes propagate cleanly: if a dbt model is updated (e.g., a metric calculation is fixed), every Superset dashboard using that model automatically reflects the change.
Here's how this stack typically looks on Google Cloud:
Data ingestion layer: Cloud Pub/Sub, Dataflow, or Cloud Composer (managed Airflow) ingests data and lands it in BigQuery raw tables.
Transformation layer: dbt runs on Cloud Run or Compute Engine on a schedule (e.g., nightly) or triggered by data ingestion events. dbt reads raw tables, applies transformations, and materializes modeled tables back into BigQuery.
Analytics layer: Apache Superset runs on Cloud Run or Compute Engine (or via a managed service like D23 - Dashboards, Embedded Analytics & Self-Serve BI on Apache Superset™). It connects to BigQuery using a service account and serves dashboards to end users.
Storage: All data lives in BigQuery. dbt models are stored in a Cloud Source Repository or GitHub. Superset configurations (dashboards, charts, users) can be backed up to Cloud Storage or version-controlled separately.
This architecture has several advantages:
You have two paths:
Self-hosted: You run Apache Superset yourself on infrastructure you control. This gives you maximum flexibility but requires you to manage updates, security patches, backups, and scaling. For small teams, this overhead is manageable. For teams scaling to hundreds of users or embedding analytics in products, it becomes a burden.
Managed: Services like D23 - Dashboards, Embedded Analytics & Self-Serve BI on Apache Superset™ handle infrastructure, updates, security, and scaling. You focus on building dashboards and embedding analytics. This is particularly valuable if you're embedding Superset into a product or need high availability and support.
For BigQuery and dbt, the self-hosted vs. managed distinction is less relevant. BigQuery is already managed by Google. dbt is a command-line tool that runs wherever you want (local machine, CI/CD pipeline, cloud VM). The main decision is where dbt jobs run and who manages that infrastructure.
Let's walk through a concrete example: building a customer metrics dashboard.
You have a raw_customers table in BigQuery with columns like customer_id, created_at, email, plan_type, and mrr (monthly recurring revenue). You also have a raw_events table with customer_id, event_type, event_date.
In your dbt project, you create models:
models/
staging/
stg_customers.sql -- clean and standardize customer data
stg_events.sql -- clean and standardize event data
marts/
dim_customers.sql -- customer dimension with attributes
fct_customer_metrics.sql -- fact table with customer KPIs
The dim_customers.sql model might look like:
select
customer_id,
email,
plan_type,
created_at,
date_diff(current_date(), date(created_at), day) as days_since_signup
from {{ ref('stg_customers') }}
where deleted_at is nullThe fct_customer_metrics.sql might aggregate events:
select
c.customer_id,
c.email,
count(distinct e.event_id) as total_events,
count(distinct case when e.event_type = 'login' then e.event_id end) as login_count,
max(e.event_date) as last_activity_date
from {{ ref('dim_customers') }} c
left join {{ ref('stg_events') }} e on c.customer_id = e.customer_id
group by 1, 2You add tests to ensure data quality:
models:
- name: dim_customers
columns:
- name: customer_id
tests:
- unique
- not_nullYou run dbt run and dbt compiles these to BigQuery SQL, creates the tables, and validates the tests. Your team reviews the dbt PR in GitHub, and once merged, the models are part of your source of truth.
The dbt Documentation covers modeling best practices, testing patterns, and advanced features like snapshots and incremental models.
In Superset, you add a new database connection:
Superset discovers your BigQuery datasets and tables. You select the fct_customer_metrics table as your data source.
In Superset, you create charts by selecting columns and metrics:
customer_id)total_events)created_at)email and login_count)You arrange these charts on a dashboard, add filters for plan_type and date range, and publish. End users can now explore customer metrics without writing SQL.
The Apache Superset Documentation covers dashboard creation, chart types, filters, and interactive features.
Looker, Tableau, and Power BI charge per user (or per named user, or per viewer). If you have 100 users, that's a meaningful recurring cost. BigQuery + dbt + Superset flips the model:
For a team with 100 users, this stack might cost $500-2000/month. Looker or Tableau would cost $5000-15000+/month. That's not a small difference.
With Looker, you're building in Looker's modeling language (LookML). With Tableau, you're building in Tableau's visual paradigm. With this stack, you're building in SQL (dbt) and open standards (Superset). If you need to switch visualization tools later, your dbt models are portable. Your data transformations are just SQL.
You own your data, your transformations, and your analytics layer. No vendor lock-in.
Your engineering team already uses Git, CI/CD pipelines, code review, and testing. dbt brings those practices to analytics. Your analytics code is reviewed like application code. Changes are tracked. Tests run automatically. This is how serious organizations operate.
Superset's API and extensibility mean you can embed analytics in your product, integrate with your identity provider (SAML, OAuth), and customize the UI to match your brand.
BigQuery is fast, but poorly written queries can still be slow and expensive. A query that scans 100 GB when you only need 10 GB is wasting money.
Solutions:
The BigQuery Documentation includes optimization guides and cost analysis tools.
If your dbt transformations are wrong, every dashboard built on top of them is wrong. If your data is stale, dashboards show outdated metrics.
Solutions:
The dbt Documentation covers testing strategies and incremental models in detail.
As your analytics footprint grows, you might have hundreds of dashboards and thousands of users. This puts pressure on Superset.
Solutions:
As you scale, you need to control who can see what data, who can modify dashboards, and who can access the underlying tables.
Solutions:
This stack is part of the broader "modern data stack" movement. The The Modern Data Stack blog post explains the history and philosophy: instead of monolithic data warehouses and BI tools, the modern stack is modular, cloud-native, and open-source-friendly.
Other popular combinations include:
The The Modern Data Stack: Open-source Edition provides a comprehensive analysis of open-source tools in the modern data stack, including dbt, Superset, and warehouse options.
The Top Modern Data Stack Tools for 2025 offers an updated overview of tools gaining traction in 2025, many of which complement or extend this stack.
The Modern Data Stack guide by Simon Späti dives deep into open-source tools and analytics pipelines, with practical examples.
One of the most exciting developments in analytics is AI-assisted querying. Instead of writing SQL or clicking a UI, users ask questions in natural language: "What's our churn rate by plan type?" An LLM translates this to SQL, which runs against your data.
Superset has built-in support for this through integrations with LLMs. You can use OpenAI's GPT, Anthropic's Claude, or open-source models like Llama. The LLM sees your dbt-generated schema documentation and generates SQL based on that.
This requires:
The benefit is massive: self-serve analytics becomes truly self-serve. Users don't need to know SQL or even how the data is organized. They ask questions and get answers.
If you're a B2B SaaS company, you might want to embed analytics dashboards in your product so customers can see their own data. Superset's API makes this straightforward.
You can:
Combined with row-level security, this lets you give each customer a personalized view of their data without building a custom analytics system.
If you're convinced this stack is right for you, here's how to get started:
BigQuery + dbt + Apache Superset is not a revolutionary combination. Each component is mature, well-documented, and battle-tested. The revolution is in the philosophy: a modular, open-source, cost-effective alternative to monolithic proprietary platforms.
For data and analytics leaders at scale-ups and mid-market companies, this stack offers:
The stack does require more setup and operational knowledge than a fully managed platform like Looker. You need to understand BigQuery, SQL, dbt concepts, and Superset architecture. But for teams that already have these skills (or are willing to learn), the payoff is substantial.
If you're evaluating analytics platforms, don't dismiss open-source just because it's not as polished out of the box. The long-term benefits—cost, flexibility, and control—often outweigh the short-term friction of setup and learning.
For teams that want the benefits of this stack without the operational overhead, D23 - Dashboards, Embedded Analytics & Self-Serve BI on Apache Superset™ provides managed hosting, expert consulting, and support. But whether you self-host or use a managed service, the architectural principles remain the same: modular, open, and data-driven.
Start small, learn the fundamentals, and scale as your needs grow. That's the modern way.