Apache Superset · 18 Apr 2026

Apache Superset for Insurance Analytics: Underwriting to Claims

Build production-grade analytics dashboards for insurance underwriting, claims, and reinsurance ops with Apache Superset. Reduce time-to-insight.

D23 Team

18 minute read

Insurance is a data-intensive business. Every premium quoted, every claim processed, and every reinsurance contract negotiated hinges on your ability to analyze risk, detect patterns, and make decisions fast. Yet many insurers still rely on fragmented reporting systems—spreadsheets, legacy BI platforms, and custom SQL queries buried in email threads.

Apache Superset changes that equation. Built as a lightweight, API-first data visualization and exploration platform, Apache Superset gives insurance teams the ability to build self-serve dashboards, embed analytics into operational workflows, and query data in real time without the bloat (or cost) of traditional enterprise BI tools.

This guide walks you through how to architect and deploy Superset across the insurance analytics lifecycle—from underwriting risk assessment through claims adjudication and reinsurance operations. We'll focus on concrete patterns, real-world data models, and the technical decisions that matter when you're scaling analytics in a regulated, data-sensitive industry.

Why Apache Superset for Insurance Analytics

Insurance companies face a unique analytics challenge: speed and compliance must coexist. You need dashboards that update in seconds, not hours. You need role-based access controls that respect underwriting confidentiality. And you need to avoid the licensing costs that make Tableau or Looker prohibitive when you're embedding analytics across dozens of business units.

Apache Superset addresses all three constraints. It's open-source, so you control the codebase and can audit it for regulatory compliance. It's lightweight—you can run it on standard Kubernetes clusters without dedicated infrastructure teams. And it's API-first, meaning you can embed dashboards directly into operational systems (claims platforms, underwriting portals, reinsurance management tools) without building custom integration layers.

For insurance teams specifically, Superset excels at:

Real-time underwriting dashboards that surface risk metrics, application volumes, and approval rates to underwriters and managers
Claims analytics that track claim status, reserve adequacy, fraud signals, and processing timelines
Reinsurance reporting that aggregates portfolio performance, loss ratios, and treaty profitability across multiple carriers
Regulatory reporting that feeds compliance dashboards and audit trails with minimal manual intervention
Self-serve exploration that lets business users slice claims data by peril, geography, and policy holder without waiting on the analytics team

The platform integrates seamlessly with the data infrastructure insurance companies already use—data warehouses like Snowflake or Redshift, data lakes on cloud object storage, and streaming pipelines that ingest real-time claims or quote data.

The Insurance Analytics Data Model: From Quote to Claim

Before you build dashboards, you need to understand the data. Insurance analytics spans three interconnected domains: underwriting, claims, and reinsurance. Each has its own fact tables, dimensions, and operational rhythms.

Underwriting Data: Risk Assessment and Premium Pricing

Underwriting is where the insurance business starts. An applicant submits a quote request—for auto, property, or commercial liability coverage. Your underwriting system collects application data: driver history, property characteristics, industry classification, loss history. Your underwriting team (or an automated model) evaluates that data against underwriting guidelines and issues a decision: approve at standard rates, approve with surcharges, or decline.

The core underwriting fact table tracks each application:

Application ID (unique identifier)
Quote Date (when the application was received)
Decision Date (when underwriting was completed)
Decision (approve, decline, refer)
Premium (the quoted or issued premium)
Risk Score (output from your underwriting model or manual assessment)
Underwriter ID (who made the decision)
Product Line (auto, home, commercial, etc.)

Dimensions join to this fact table:

Applicant Dimension: age, location, occupation, credit score, prior loss history
Risk Dimension: coverage type, limits, deductibles, special endorsements
Market Dimension: agent/broker, distribution channel, campaign source

With this structure, you can build dashboards that answer operational questions: How many applications are in queue? What's the average decision time by underwriter? Which product lines have the highest decline rate? What's the relationship between risk score and issued premium?
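As a rough sketch of how those questions map onto the fact table, the example below builds a tiny in-memory version with Python's built-in sqlite3 module. The table name fact_application, the column names, and the sample rows are all illustrative assumptions, not a prescribed schema; in practice the same SQL would run against your warehouse.

```python
import sqlite3

# Hypothetical table and column names modeled on the fact table described above.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE fact_application (
        application_id INTEGER PRIMARY KEY,
        quote_date     TEXT NOT NULL,   -- ISO date the application was received
        decision_date  TEXT,            -- NULL while the application is in queue
        decision       TEXT,            -- 'approve', 'decline', or 'refer'
        premium        REAL,
        risk_score     REAL,
        underwriter_id TEXT,
        product_line   TEXT NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO fact_application VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "2026-04-01", "2026-04-03", "approve", 1200.0, 0.21, "uw_01", "auto"),
        (2, "2026-04-01", "2026-04-02", "decline", None, 0.87, "uw_01", "auto"),
        (3, "2026-04-02", None, None, None, 0.40, "uw_02", "home"),
        (4, "2026-04-02", "2026-04-06", "approve", 950.0, 0.33, "uw_02", "home"),
    ],
)

# Applications still in queue: no decision recorded yet.
in_queue = conn.execute(
    "SELECT COUNT(*) FROM fact_application WHERE decision_date IS NULL"
).fetchone()[0]

# Decline rate per product line, over decided applications only.
decline_rate = dict(
    conn.execute(
        """
        SELECT product_line,
               AVG(CASE WHEN decision = 'decline' THEN 1.0 ELSE 0.0 END)
        FROM fact_application
        WHERE decision_date IS NOT NULL
        GROUP BY product_line
        """
    )
)

# Average decision time in days per underwriter.
avg_decision_days = dict(
    conn.execute(
        """
        SELECT underwriter_id,
               AVG(julianday(decision_date) - julianday(quote_date))
        FROM fact_application
        WHERE decision_date IS NOT NULL
        GROUP BY underwriter_id
        """
    )
)

print(in_queue, decline_rate, avg_decision_days)
```

Each query here corresponds to one chart-ready metric; in Superset the same expressions would typically live as saved metrics on a dataset rather than in application code.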

As described in the Underwriting Analytics Reference Architecture for Insurance, modern insurers are layering machine learning models on top of these data structures to automate and accelerate underwriting decisions. Superset can visualize both the inputs (application features) and outputs (model predictions, confidence scores) in real time, giving underwriting teams visibility into how automated models are performing.

Claims Data: Adjudication and Reserve Adequacy

When a policyholder experiences a loss, they file a claim. Your claims system ingests that claim, assigns it to an adjuster, and tracks it through investigation, liability determination, reserve estimation, and eventual closure or litigation.

The claims fact table is more complex because a single claim can have multiple transactions:

Claim ID (unique identifier)
Policy ID (which policy the claim is under)
Claim Date (when the loss occurred)
Report Date (when the claim was reported)
Adjuster ID (who is handling the claim)
Claim Status (open, closed, litigated)
Reported Reserve (initial estimate of liability)
Paid Amount (cumulative payments to date)
Outstanding Reserve (estimated future payments)
Peril (fire, theft, collision, etc.)
Claim Type (property damage, bodily injury, medical payments)

Dimensions include:

Policyholder Dimension: demographics, claim history, total exposure
Loss Dimension: location, cause code, injury type
Adjuster Dimension: caseload, approval authority, historical accuracy

Claims analytics dashboards typically focus on operational efficiency and reserve adequacy. How many claims are aging (open beyond 30, 60, 90 days)? What's the average time-to-closure by claim type? Are reserves adequate relative to historical payout patterns? Which adjusters have the highest accuracy in reserve estimation?
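The aging question is easy to get subtly wrong when bucket edges overlap, so here is a minimal sketch of the bucketing logic. The function name, claim IDs, and dates are made up for illustration; the bucket boundaries are one reasonable reading of the 30/60/90-day thresholds.

```python
from collections import Counter
from datetime import date


def age_bucket(report_date: date, as_of: date) -> str:
    """Return the aging bucket for an open claim, using non-overlapping ranges."""
    days_open = (as_of - report_date).days
    if days_open <= 30:
        return "0-30"
    if days_open <= 60:
        return "31-60"
    if days_open <= 90:
        return "61-90"
    return "90+"  # more than 90 days open


# Hypothetical open claims: (claim_id, report_date).
open_claims = [
    ("C-1", date(2026, 4, 10)),
    ("C-2", date(2026, 2, 20)),
    ("C-3", date(2025, 12, 1)),
]
as_of = date(2026, 4, 18)

aging = Counter(age_bucket(reported, as_of) for _, reported in open_claims)
print(dict(aging))
```

In a warehouse this would usually be a CASE expression on the dataset, so that every chart shares the same bucket definition.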

As noted in research on Machine Learning–Augmented ETL Pipelines for Fraud–Resistant Insurance Claims, insurers are increasingly using machine learning to flag potentially fraudulent claims in real time. Superset can surface these signals—anomaly scores, network graphs of related claims, historical comparison metrics—directly to claims investigators, accelerating fraud detection and reducing payout leakage.

Reinsurance Data: Portfolio Performance and Treaty Management

Reinsurers and insurers managing reinsurance programs need visibility into how their portfolio is performing against treaty terms. This involves aggregating claims data across multiple underlying policies and comparing actual loss experience to expected loss (premium × expected loss ratio).

The reinsurance fact table is typically built as a summary or derived table:

Treaty ID (unique identifier for the reinsurance agreement)
Underlying Claim ID (links to the original claim)
Claim Date (when the loss occurred)
Loss Amount (the portion of the claim that falls under the treaty)
Recovery (what the reinsurer will pay)
Treaty Type (quota share, excess of loss, stop loss)
Attachment Point (threshold at which the treaty kicks in)
Limit (maximum the treaty will pay)

Reinsurance dashboards typically track:

Loss Ratio: actual losses divided by premium earned
Combined Ratio: loss ratio plus expense ratio (a key profitability metric)
Incurred But Not Reported (IBNR): estimated claims that have occurred but not yet been reported
Treaty Utilization: how much of the treaty limit has been consumed
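The arithmetic behind these metrics is simple enough to sketch directly. The functions below are illustrative helpers, not part of Superset. The recovery function assumes a plain per-claim excess-of-loss layer with no reinstatements or aggregate deductibles, and utilization assumes the treaty limit is an aggregate limit; real treaty terms are often more involved.

```python
def loss_ratio(losses_incurred: float, premium_earned: float) -> float:
    """Losses incurred (paid plus outstanding reserve) divided by premium earned."""
    return losses_incurred / premium_earned


def combined_ratio(loss_ratio_value: float, expense_ratio: float) -> float:
    """Loss ratio plus expense ratio; below 1.0 indicates an underwriting profit."""
    return loss_ratio_value + expense_ratio


def xol_recovery(loss_amount: float, attachment_point: float, limit: float) -> float:
    """Recovery under a simple excess-of-loss layer.

    The reinsurer pays the part of the loss above the attachment point,
    capped at the layer limit. Assumes no reinstatements or aggregate terms.
    """
    return min(max(loss_amount - attachment_point, 0.0), limit)


def treaty_utilization(recovered_to_date: float, limit: float) -> float:
    """Share of the treaty limit consumed so far (assumes an aggregate limit)."""
    return recovered_to_date / limit


# Hypothetical figures for a single treaty.
lr = loss_ratio(losses_incurred=650_000, premium_earned=1_000_000)
cr = combined_ratio(lr, expense_ratio=0.30)
recovery = xol_recovery(loss_amount=750_000, attachment_point=500_000, limit=1_000_000)
print(lr, cr, recovery, treaty_utilization(recovery, 1_000_000))
```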

These metrics feed into pricing models for renewal, reserve adequacy assessments, and capital allocation decisions. Because reinsurance data is often sourced from multiple insurers or subsidiaries, a central insurance data warehouse that consolidates and normalizes this data is critical (see Data Warehouse for Insurance Company). Superset connects directly to such a warehouse, which makes it a natural reporting layer on top of the consolidated data.

Building Underwriting Dashboards with Superset

Let's build a concrete underwriting dashboard that an insurance manager would use to monitor daily operations.

The Underwriting Operations Dashboard

This dashboard should answer: Are we hitting our SLAs? Are underwriters productive? Are we approving the right risks?

Key Metrics:

Applications in Queue: count of applications not yet decided, segmented by product line and age
Median Decision Time: median time from quote receipt to decision, trended by week (the median is less distorted than the mean by a few long-running referrals)
Approval Rate: percentage of applications approved (vs. declined or referred), with targets by product
Premium per Approval: average issued premium, to spot pricing drift
Risk Score Distribution: histogram of risk scores assigned, to detect model drift

Filters and Dimensions:

Date range (to compare week-over-week or month-over-month)
Product line (auto, home, commercial)
Underwriter (to see individual performance)
Agent/Broker (to identify high-quality sources)
Risk score band (to analyze decision-making by risk tier)

Implementation in Superset:

You'd create a dataset that joins the application fact table to the underwriter and product dimensions. Then you'd build individual charts:

A metric card showing applications in queue (count of applications where decision_date is null)
A line chart showing decision time trend (x-axis: week, y-axis: median days from quote to decision)
A pie chart or stacked bar showing approval rate by product line
A scatter plot showing the relationship between risk score and issued premium (to spot underpricing)
A histogram showing risk score distribution
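To make the decision time trend concrete, the sketch below computes the weekly median in plain Python. The sample dates are invented, and grouping by the ISO week of the decision date is an assumption; grouping by quote date would be an equally valid choice as long as it is applied consistently.

```python
from collections import defaultdict
from datetime import date
from statistics import median

# Hypothetical decided applications: (quote_date, decision_date).
decisions = [
    (date(2026, 3, 30), date(2026, 4, 1)),
    (date(2026, 3, 31), date(2026, 4, 4)),
    (date(2026, 4, 1), date(2026, 4, 2)),
    (date(2026, 4, 6), date(2026, 4, 9)),
    (date(2026, 4, 7), date(2026, 4, 12)),
]

# Group days-to-decision by the ISO (year, week) of the decision date.
days_by_week = defaultdict(list)
for quote_date, decision_date in decisions:
    iso = decision_date.isocalendar()
    days_by_week[(iso[0], iso[1])].append((decision_date - quote_date).days)

# One median per week: the series a line chart would plot.
weekly_median = {week: median(days) for week, days in sorted(days_by_week.items())}
print(weekly_median)
```

Most warehouses expose a median or percentile aggregate, so in Superset this would normally be a single metric expression grouped by a weekly time grain.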

Each chart supports cross-filtering: click on a product line in the approval rate chart, and the dashboard filters to show only that product's metrics. Click on a specific underwriter, and you see that person's queue and performance.

Superset's SQL Lab feature lets underwriting managers write ad hoc queries without leaving the platform. If a manager wants to know "How many applications from California are in queue, and what's the average decision time?", they can write that query directly, get results in seconds, and save the query as a reusable dataset for future dashboards.

Self-Serve Underwriting Analytics

One of Superset's most useful capabilities for insurance is self-serve analytics, which D23's managed Apache Superset offering is built around. Rather than forcing all underwriting questions through the data team, you can publish datasets (curated record-level or pre-aggregated views of the application data) and let underwriting managers explore them.

For underwriting, this means publishing datasets like:

Applications by Product and Decision: raw application records with key fields (quote date, decision date, decision, risk score, premium)
Underwriter Performance: aggregated metrics by underwriter (applications processed, approval rate, average decision time, average premium)
Agent/Broker Performance: aggregated metrics by distribution channel (applications sourced, approval rate, average premium, claim frequency)

Underwriting managers can then create their own charts and dashboards from these datasets. A regional manager might build a dashboard showing applications by state and approval rate. A product manager might compare approval rates across different underwriting models. The data team publishes the data; the business owns the analysis.

This self-serve model reduces the analytics team's workload while empowering the business to answer its own questions faster.

Building Claims Analytics Dashboards with Superset

Claims operations is where speed and accuracy matter most. A claim sitting in an adjuster's queue for 90 days ties up capital (through reserves) and damages customer relationships. Dashboards need to surface bottlenecks and drive action.

The Claims Operations Dashboard

This dashboard should answer: Which claims need attention? Are we processing claims efficiently? Are reserves adequate?

Key Metrics:

Claims by Status: count of open, closed, and litigated claims
Aging Claims: count of open claims by age bucket (0-30 days, 31-60 days, 61-90 days, 90+ days)
Median Days to Close: median time from report date to closure, trended by month
Reserve Accuracy: actual payout divided by initial reserve, to spot over- or under-reserving
Fraud Flags: count of claims flagged by the fraud detection model, and the confirmation rate of those flags

Filters and Dimensions:

Date range
Claim status (open, closed, litigated)
Peril (fire, theft, collision, etc.)
Adjuster
Policyholder segment (new vs. renewal, high-value vs. standard)

Implementation in Superset:

You'd create a dataset joining the claims fact table to adjuster and loss dimensions. Charts would include:

A stacked bar chart showing claims by status, with color coding (green for closed, yellow for open, red for litigated)
A horizontal bar chart showing aging claims by age bucket, sorted to highlight the 90+ day bucket
A line chart showing median days to close, trended by month, with a target line overlaid
A scatter plot showing reserve accuracy (x-axis: initial reserve, y-axis: actual payout), with a diagonal line representing perfect accuracy
A metric card showing total fraud flags and their confirmation rate
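The reserve accuracy and fraud flag metrics behind these charts reduce to two ratios. The sketch below uses hypothetical helper names and figures; the 10% tolerance is only an example threshold.

```python
def reserve_accuracy(actual_payout: float, initial_reserve: float) -> float:
    """Ratio of actual payout to initial reserve; 1.0 means the reserve was exact."""
    return actual_payout / initial_reserve


def within_tolerance(ratio: float, tolerance: float = 0.10) -> bool:
    """True when the payout landed within the tolerance band around the reserve."""
    return abs(ratio - 1.0) <= tolerance


def flag_confirmation_rate(confirmed_flags: int, total_flags: int) -> float:
    """Share of model-raised fraud flags that investigators confirmed."""
    return confirmed_flags / total_flags if total_flags else 0.0


# Hypothetical closed claim: reserved at 10,000, paid 12,000 (under-reserved by 20%).
ratio = reserve_accuracy(actual_payout=12_000, initial_reserve=10_000)
print(ratio, within_tolerance(ratio), flag_confirmation_rate(12, 40))
```

A ratio above 1.0 indicates under-reserving and a ratio below 1.0 indicates over-reserving, which is what the diagonal line on the scatter plot separates visually.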

Claims adjusters would also have access to a case-level dashboard. When they log in, they see their assigned cases sorted by age, with key metrics (reserve amount, days open, fraud score). They can click through to see detailed case notes, payment history, and related claims (to spot fraud rings).

Superset's ability to embed dashboards via API means you can embed this case-level view directly into your claims management system (Guidewire, Sapiens, or custom-built). Adjusters never leave their workflow; they get the analytics they need inline.

Real-Time Claims Monitoring

Many insurers now stream claims data in real time using platforms like Kafka or Confluent. As described in the article on Providing Real-Time Insurance Quotes via Data Streaming, real-time data pipelines enable faster decision-making in insurance operations.

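Superset does not read from a message broker directly; it queries SQL-speaking databases. To show streaming data, either land the stream in a real-time analytics database that Superset can query (for example Apache Druid or Apache Pinot), or load it into your data warehouse in small batches and have the dashboard refresh on a short interval. A dashboard showing claims filed in the last hour, sorted by potential severity (based on peril and limit), gives claims managers visibility into incoming volume and helps them allocate resources proactively.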

You can also layer in predictive models, of the kind outlined in Predictive Underwriting In A Nutshell, that estimate claim severity or likelihood of litigation based on early signals. Superset can surface these predictions, helping claims managers triage cases and allocate experienced adjusters to high-risk claims.

Reinsurance and Portfolio Analytics with Superset

Reinsurance teams operate at a higher level of abstraction than underwriting or claims. They need to see portfolio-level metrics—loss ratios, combined ratios, treaty utilization—across multiple underlying books of business.

The Reinsurance Portfolio Dashboard

This dashboard should answer: How is our portfolio performing? Are we on track to hit our profitability targets? Which treaties are at risk of hitting limits?

Key Metrics:

Premium Earned: cumulative premium earned to date, by treaty and by underlying line of business
Losses Incurred: cumulative losses incurred (paid + reserve) to date
Loss Ratio: losses incurred divided by premium earned (target: 60-70% for most lines)
Combined Ratio: loss ratio plus expense ratio (target: <100% for profitability)
IBNR Estimate: estimated claims incurred but not yet reported, based on historical development patterns
Treaty Utilization: percentage of treaty limit consumed to date
Renewal Pricing: estimated renewal rate change based on current loss experience

Filters and Dimensions:

Treaty (to compare performance across reinsurance agreements)
Underlying line of business (auto, property, casualty, specialty)
Peril (to spot concentration risk)
Accident year (to compare cohorts)
Geography (to identify regional loss patterns)

Implementation in Superset:

Reinsurance data is typically more aggregated and slower-moving than underwriting or claims data, so you'd likely build this dashboard on a summary table updated daily or weekly. Charts would include:

A heatmap showing loss ratio by treaty and peril (rows: treaties, columns: perils, color intensity: loss ratio)
A waterfall chart showing the bridge from premium earned to net income (premium earned → losses incurred → expenses → net income)
A bar chart showing combined ratio by treaty, with a target line at 100%
A line chart showing treaty utilization trend over time, with a warning line at 80% of limit
A table showing IBNR estimates by accident year, to help reserve adequacy assessments

Reinsurance teams often need to present these metrics to external stakeholders (capital providers, rating agencies, brokers). Superset's ability to export dashboards to PDF or email them on a schedule means you can automate portfolio reporting.

Master Data Management and Data Quality in Superset

Insurance analytics is only as good as the underlying data. As covered in Master Data Management for Insurance Underwriting Accuracy, insurers must maintain clean, consistent master data (policyholder records, agent records, product definitions) to ensure accurate underwriting and claims decisions.

Superset doesn't manage master data directly, but it can surface data quality issues and integrate with master data management (MDM) systems.

Data Quality Monitoring in Superset

You can build dashboards that monitor data quality:

Null/Missing Values: count of records with missing critical fields (e.g., missing risk score in underwriting, missing adjuster assignment in claims)
Duplicate Records: count of potential duplicates (same applicant, same claim, same policy)
Outliers: values outside expected ranges (premium > 10x the mean, decision time > 365 days)
Freshness: time since last data load, to alert if pipelines are stalled

These dashboards help your data engineering team identify and fix data issues before they propagate downstream.
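The four checks above can be sketched as small functions over a batch of records. Everything here is illustrative: the field names, the 10x-mean outlier rule taken from the example above, and the 24-hour freshness window are assumptions to adapt to your own pipelines. In production these checks would more likely be SQL against the warehouse, with Superset charting the results.

```python
from collections import Counter
from datetime import datetime, timedelta
from statistics import mean


def missing_count(records, field):
    """Number of records where a critical field is absent or None."""
    return sum(1 for r in records if r.get(field) is None)


def duplicate_keys(records, key_fields):
    """Key tuples that appear more than once (potential duplicates)."""
    counts = Counter(tuple(r[f] for f in key_fields) for r in records)
    return [key for key, n in counts.items() if n > 1]


def premium_outliers(records, multiple=10):
    """Records whose premium exceeds `multiple` times the mean premium."""
    premiums = [r["premium"] for r in records if r["premium"] is not None]
    threshold = multiple * mean(premiums)
    return [r for r in records if r["premium"] is not None and r["premium"] > threshold]


def is_stale(last_load, now, max_age=timedelta(hours=24)):
    """True when the last successful load is older than the allowed window."""
    return now - last_load > max_age


# Hypothetical batch: 19 ordinary applications, one extreme premium, one repeat.
records = [
    {"application_id": i, "applicant_id": f"A-{i}", "premium": 1000.0, "risk_score": 0.3}
    for i in range(19)
]
records.append({"application_id": 19, "applicant_id": "A-0", "premium": 50_000.0, "risk_score": None})

print(missing_count(records, "risk_score"))
print(duplicate_keys(records, ["applicant_id"]))
print([r["application_id"] for r in premium_outliers(records)])
print(is_stale(datetime(2026, 4, 16, 6, 0), datetime(2026, 4, 18, 6, 0)))
```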

Integrating with MDM Systems

If your organization uses a dedicated MDM system (like Informatica or Talend), Superset can query the MDM system's output. For example, instead of querying raw applicant records, you query the MDM-validated applicant dimension. This ensures that all analytics are built on a single source of truth for master data.

API-First Analytics and Embedded Dashboards

One of Superset's biggest advantages for insurance is its API-first architecture. Rather than forcing users to log into a separate BI portal, you can embed dashboards directly into operational systems.

Embedding Dashboards in Claims Systems

Your claims management system (Guidewire, Sapiens, etc.) has its own user interface and workflow. Rather than asking adjusters to switch to a separate BI tool, you can embed a Superset dashboard directly into the claims system.

For example, when an adjuster opens a claim, a Superset dashboard embedded in an iframe shows:

Related claims (other claims from the same policyholder, other claims in the same geographic area, other claims with similar characteristics)
Fraud risk score and related fraud signals
Historical settlement patterns for similar claims
Reserve adequacy metrics

Superset's REST API and embedded SDK let you:

Authenticate the adjuster through your claims system, then have your backend request a short-lived guest token from Superset on their behalf
Reference the dashboard to embed by its embed ID
Embed it in an iframe with pre-filtered data (claim ID applied as a filter or row-level security clause)
Refresh the dashboard on an interval as new data arrives
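A minimal sketch of the backend side of that flow is shown below. It only builds the HTTP request for a guest token and does not send it. The endpoint path and payload shape follow Superset's embedded-dashboard guest token API as commonly documented, but treat them as assumptions to verify against your Superset version. The host name, token, dashboard ID, and the claim_id column are placeholders, and depending on configuration a CSRF token may also be required.

```python
import json
import urllib.request


def build_guest_token_request(base_url, access_token, dashboard_id, username, claim_id):
    """Build (but do not send) a guest token request for one embedded dashboard.

    The row-level security clause restricts the dashboard to a single claim.
    `claim_id` is cast to int so that it cannot inject arbitrary SQL; this
    assumes numeric claim IDs, which is a hypothetical choice for this sketch.
    """
    payload = {
        "user": {"username": username, "first_name": "", "last_name": ""},
        "resources": [{"type": "dashboard", "id": dashboard_id}],
        "rls": [{"clause": f"claim_id = {int(claim_id)}"}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/security/guest_token/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; a real backend would obtain the access token by logging in
# with a service account and would send the request with urllib.request.urlopen.
request = build_guest_token_request(
    base_url="https://superset.example.com/",
    access_token="service-account-token",
    dashboard_id="00000000-0000-0000-0000-000000000000",
    username="adjuster_7",
    claim_id=42,
)
print(request.get_method(), request.full_url)
```

The returned guest token is then handed to the browser, where the embedding code uses it to load the dashboard inside the claims system's page. Keeping token creation on the server means the adjuster's browser never sees service account credentials.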

D23's managed Apache Superset platform handles the infrastructure and API management, so your engineering team doesn't have to build and maintain a custom Superset deployment.

Embedding Analytics in Agent Portals

Insurance agents and brokers often have access to an online portal where they can submit applications, track claims, and view their business metrics. You can embed Superset dashboards in this portal to show agents their own performance:

Applications submitted and approval rate
Claims filed and claims frequency
Premium written and loss ratio
Comparison to peer agents

This self-service visibility helps agents identify opportunities (e.g., "I should focus on home insurance; my approval rate is higher there") and builds engagement.

AI and Text-to-SQL for Insurance Analytics

Pairing Superset with a large language model (LLM) enables text-to-SQL: users ask questions in natural language, and the system generates the SQL query for them.

For insurance, this is powerful. An underwriting manager might ask: "How many applications from California are pending, and what's the average decision time?" The LLM translates that to SQL, runs the query, and returns results in seconds.

For claims, a manager might ask: "Show me claims from the last week that are over reserve, sorted by the biggest overages." The LLM generates the query, and the manager gets a results table.

This democratizes analytics—business users don't need SQL skills to explore data. They just ask questions in English.

D23's text-to-SQL capabilities are particularly useful for insurance because insurance data models are complex (many tables, many joins). Given your schema as context, the LLM can generate queries even for sophisticated questions, though generated SQL should still be spot-checked before it drives a decision.

Cloud-Native Architecture for Insurance Analytics

As described in Cloud-Native Microservice Architectures for Insurance, modern insurers are moving to cloud-native architectures that enable scalability, resilience, and rapid deployment.

Superset fits naturally into this architecture:

Data Layer: Your data warehouse (Snowflake, Redshift, BigQuery) or a data lake table format (Delta Lake, Apache Iceberg) on cloud object storage holds the data
Analytics Layer: Superset runs on Kubernetes, scaling horizontally as query load increases
Presentation Layer: Dashboards are served via REST API and embedded in web applications
Integration Layer: Superset connects to your data warehouse via standard SQL connectors; it integrates with identity providers (Okta, Azure AD) for authentication

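The Superset web tier is stateless (dashboard definitions live in its metadata database and query results in the cache), which makes it straightforward to scale and operate. You can spin up new Superset instances for different business units (underwriting, claims, reinsurance) without duplicating infrastructure.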

Governance, Compliance, and Security in Insurance Analytics

Insurance is a regulated industry. Your analytics platform must support:

Role-Based Access Control (RBAC): Underwriters can see underwriting data; claims adjusters can see claims data; only authorized users can see sensitive data like fraud scores or reserve estimates
Audit Logging: Every query, every dashboard view, every data export must be logged for compliance audits
Data Masking: Personally identifiable information (PII) like Social Security numbers or driver's license numbers should be masked in dashboards
Encryption: Data in transit (TLS) and at rest should be encrypted

Superset supports all of these through:

Database-level RBAC: Connect Superset to your data warehouse, and use the warehouse's native RBAC to control which users can access which tables
Superset RBAC: Assign users to roles (Admin, Alpha, Gamma) and control which dashboards and datasets they can view
Row-Level Security (RLS): Filter data based on the logged-in user (e.g., adjusters see only claims assigned to them)
Audit Logs: Superset logs all user actions; export these logs to your SIEM for compliance monitoring
Data Masking: Use SQL views or database-level masking to hide sensitive columns

D23 provides additional governance features—single sign-on (SSO) integration, advanced audit logging, and compliance templates for insurance.

Performance Optimization for Insurance Analytics

Insurance dashboards can be slow if not optimized. Here's why:

Large Datasets: Claims tables can have millions of rows; querying them without aggregation is slow
Complex Joins: Underwriting dashboards often join application, underwriter, product, and market dimensions; this multiplies query complexity
Real-Time Requirements: Managers want dashboards to update in seconds, not minutes

Superset optimization strategies for insurance:

Pre-Aggregation

Instead of querying raw fact tables, query pre-aggregated summary tables. For example, instead of querying all claims and grouping by status, query a table that's already aggregated by status and updated hourly.

Superset's dataset feature lets you define these summary tables as datasets, then build dashboards on top of them. This dramatically improves query performance.
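Here is a small sketch of the pre-aggregation pattern, using Python's built-in sqlite3 as a stand-in for the warehouse. The table names fact_claim and agg_claims_by_status are hypothetical, and in practice the refresh would be a scheduled job in your orchestration tool rather than a Python function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE fact_claim (
        claim_id            TEXT PRIMARY KEY,
        claim_status        TEXT NOT NULL,
        outstanding_reserve REAL NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO fact_claim VALUES (?, ?, ?)",
    [
        ("C-1", "open", 5000.0),
        ("C-2", "open", 12000.0),
        ("C-3", "closed", 0.0),
        ("C-4", "litigated", 40000.0),
    ],
)


def refresh_claims_by_status(connection):
    """Rebuild the summary table; a scheduler would call this on a fixed cadence."""
    with connection:
        connection.execute("DROP TABLE IF EXISTS agg_claims_by_status")
        connection.execute(
            """
            CREATE TABLE agg_claims_by_status AS
            SELECT claim_status,
                   COUNT(*)                 AS claim_count,
                   SUM(outstanding_reserve) AS outstanding_reserve
            FROM fact_claim
            GROUP BY claim_status
            """
        )


refresh_claims_by_status(conn)

# Dashboards read the small summary table instead of scanning every claim.
summary = {
    status: (count, reserve)
    for status, count, reserve in conn.execute(
        "SELECT claim_status, claim_count, outstanding_reserve FROM agg_claims_by_status"
    )
}
print(summary)
```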

Caching

Superset caches query results. If two users run the same query within the cache timeout, the second user gets cached results instead of re-running the query. For insurance dashboards that are viewed by many users (all underwriters viewing the same operations dashboard), caching can cut warehouse query load substantially, by as much as 10x when most views are served from the cache.
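Caching is configured in superset_config.py. The fragment below is a hedged example of a Redis-backed chart data cache; the Redis URL, key prefix, and the five-minute timeout are placeholder values, and the exact set of supported keys should be checked against the caching documentation for your Superset version.

```python
# superset_config.py (fragment)

# Cache for chart query results. A longer timeout lowers warehouse load but
# means dashboards can lag the underlying data by up to that many seconds.
DATA_CACHE_CONFIG = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_DEFAULT_TIMEOUT": 300,  # seconds; placeholder value
    "CACHE_KEY_PREFIX": "superset_data_",
    "CACHE_REDIS_URL": "redis://redis:6379/1",  # placeholder host and database
}
```

For an operations dashboard that many underwriters open at the start of a shift, a timeout close to the data refresh cadence is a sensible starting point.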

Database Indexing

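Work with your data warehouse team to optimize the physical layout for the columns used in dashboard filters and joins. On a row-store database that means indexes; on columnar warehouses it means the equivalent mechanism, such as clustering keys in Snowflake, sort keys in Redshift, or partitioning and clustering in BigQuery. For underwriting dashboards, target quote_date, decision_date, product_line, and underwriter_id. For claims dashboards, target report_date, claim_status, and peril.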

Incremental Queries

For real-time dashboards, use incremental queries that only fetch new data since the last refresh. Instead of querying all claims, query claims where report_date >= yesterday. Provided the table is partitioned or sorted on report_date, this keeps query time roughly constant even as the claims table grows.
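The incremental pattern amounts to filtering on a watermark. The sketch below shows it with sqlite3 and a hypothetical fact_claim table; the watermark is passed as a bound parameter, and ISO-formatted date strings are assumed so that string comparison matches date order.

```python
import sqlite3


def fetch_claims_since(connection, watermark):
    """Return only claims reported on or after the watermark (ISO date string)."""
    return connection.execute(
        """
        SELECT claim_id, report_date
        FROM fact_claim
        WHERE report_date >= ?
        ORDER BY report_date, claim_id
        """,
        (watermark,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_claim (claim_id TEXT PRIMARY KEY, report_date TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO fact_claim VALUES (?, ?)",
    [("C-1", "2026-04-16"), ("C-2", "2026-04-17"), ("C-3", "2026-04-18")],
)

# Only rows at or after the watermark are read; older history is skipped.
recent = fetch_claims_since(conn, "2026-04-17")
print(recent)
```

In a Superset dashboard the same effect usually comes from a default time range filter on the report date column, which the warehouse can then use to prune partitions.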

Implementation Roadmap: From Underwriting to Reinsurance

If you're implementing Superset across your insurance organization, here's a phased approach:

Phase 1: Underwriting (Months 1-3)

Build the underwriting operations dashboard (applications in queue, decision time, approval rate)
Publish underwriting datasets for self-serve analytics
Train underwriting managers to use Superset
Measure impact: reduction in average decision time, improvement in data-driven underwriting decisions

Phase 2: Claims (Months 3-6)

Build the claims operations dashboard (aging claims, reserve accuracy, fraud flags)
Embed case-level dashboards in your claims management system
Train claims adjusters to use embedded analytics
Integrate fraud detection model outputs into dashboards
Measure impact: reduction in days to close, improvement in reserve accuracy, fraud detection rate

Phase 3: Reinsurance (Months 6-9)

Build the reinsurance portfolio dashboard (loss ratio, combined ratio, treaty utilization)
Integrate with treaty management system
Automate portfolio reporting (daily/weekly email exports)
Train reinsurance team and external stakeholders to use dashboards
Measure impact: faster treaty renewal decisions, improved capital allocation

Phase 4: Advanced Analytics (Months 9-12)

Layer in predictive models (churn prediction, fraud prediction, reserve estimation)
Build text-to-SQL capabilities for ad hoc analysis
Expand self-serve analytics across the organization
Implement advanced governance and compliance features

Measuring Success: KPIs for Insurance Analytics

How do you know if your Superset implementation is working? Track these KPIs:

Underwriting:

Average decision time (target: reduce by 20%)
Approval rate (maintain or improve)
Premium per approval (ensure no pricing drift)
Underwriter productivity (applications processed per FTE)

Claims:

Average days to close (target: reduce by 15%)
Reserve accuracy (target: within 10% of actual payout)
Fraud detection rate (target: increase by 30%)
Customer satisfaction (claims processing speed)

Reinsurance:

Loss ratio accuracy (forecast vs. actual)
Treaty renewal cycle time (target: reduce by 25%)
Data-driven pricing decisions (percentage of renewals using analytics)

Organizational:

Analytics team efficiency (queries answered per analyst per week)
Self-serve adoption (percentage of dashboards created by business users)
Time to insight (from question to answer)

Conclusion: Superset as Your Insurance Analytics Foundation

Apache Superset is not a replacement for your data warehouse or your claims system. It's the visualization and exploration layer that sits on top of your data infrastructure and makes that data actionable.

For insurance companies, Superset solves a critical problem: the gap between data and decision-making. You have data—applications, claims, policies, transactions. But that data is locked in databases. Superset unlocks it, making it visible to underwriters, claims adjusters, reinsurance teams, and executives.

By building dashboards that span underwriting, claims, and reinsurance operations, you create a single source of truth for insurance analytics. You reduce decision time, improve data quality, and enable self-serve exploration. And because Superset is open-source and API-first, you avoid the licensing costs and integration headaches of traditional BI platforms.

If you're evaluating analytics platforms for your insurance organization, explore D23's managed Superset offering to see how you can accelerate your analytics roadmap without the platform overhead.