
AI Analytics | 18 Apr 2026

Why Agent Orchestration Beats Workflow Engines for AI-Native Analytics

Agent orchestration outperforms declarative DAGs for AI analytics. Learn why agentic systems beat workflow engines for real-time, adaptive data intelligence.

DTD23 Team

15-minute read

The Declarative DAG Era Is Ending

For a decade, directed acyclic graphs (DAGs) dominated data orchestration. Tools like Airflow, Prefect, and dbt defined workflows as static, declaratively mapped sequences: extract → transform → load → visualize. The model was clean, deterministic, and predictable. It solved a real problem: coordinating batch jobs across sprawling data stacks.

But AI-native analytics demands something different. When your analytics layer includes large language models (LLMs), real-time decision-making, dynamic schema discovery, and adaptive query optimization, a DAG becomes a bottleneck. You can't declare in advance what an AI agent will discover about your data, what questions users will ask, or how a text-to-SQL engine should route a query through your warehouse.

Agent orchestration—the runtime coordination of autonomous, goal-driven AI systems—is now the better primitive for analytics workloads that require intelligence, flexibility, and responsiveness. This isn't hype. It's a fundamental shift in how we build analytics infrastructure for teams that operate at scale.

Understanding the Core Difference: Declarative vs. Agentic

Before diving into why agents win, let's establish what we're comparing.

Declarative DAGs (the traditional model) define workflows as explicit, predetermined sequences. You specify every step, every dependency, every branching condition upfront. A DAG in Airflow is a Python script that says: "Task A runs, then Task B, then Task C, with retry logic here and notifications there." The entire execution path is known before the first task starts.

This approach has real strengths:

Predictability: You know exactly what will run and in what order.
Auditability: Every step is logged and traceable.
Resource planning: You can pre-allocate compute for known workloads.
Debugging: When something fails, the path is clear.

Agent orchestration inverts the model. Instead of declaring a fixed workflow, you define goals, constraints, and available tools. An agent—powered by an LLM or other decision-making system—autonomously decides which tools to use, in what order, and how to respond to runtime conditions. The agent observes the state of the system, reasons about what to do next, takes action, and repeats until the goal is reached.

A practical example: Instead of declaring "fetch user data from Postgres, join with events from Snowflake, aggregate by cohort, write to Redis," you give an agent access to a data catalog, query engine, and storage layer. The agent reads a user's natural language question, decides which tables to query, optimizes the joins based on cardinality, and streams results to the appropriate destination—all without a pre-written workflow.

Why DAGs Fail for AI Analytics

The friction between declarative workflows and AI-driven analytics shows up in three concrete ways.

1. Static Workflows Can't Adapt to Dynamic Queries

In a traditional BI stack, dashboards are pre-built. An analyst or engineer designs a dashboard in Tableau, Looker, or Power BI, and users interact with filters and drill-downs within that fixed structure. The data pipeline that feeds the dashboard is a DAG: it runs on a schedule, produces a known set of tables, and those tables power the visualizations.

But when you introduce AI—specifically, natural language query interfaces or text-to-SQL—the query surface explodes. A user asks, "Show me cohort retention by signup source for users acquired in Q3 2024." That's a question your dashboard might not have anticipated. A text-to-SQL agent needs to:

Parse the question and identify relevant tables.
Discover the schema and understand column relationships.
Generate and optimize the SQL.
Execute the query.
Format and return results.

None of these steps fit neatly into a pre-declared DAG. If you tried to encode every possible user question as a separate DAG task, you would need an unbounded number of workflows. An agent, by contrast, handles the reasoning and adaptation at runtime. It's built for variability.
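To make the five steps concrete, here is a minimal Python sketch against an in-memory SQLite database. `generate_sql` is a hypothetical, rule-based stand-in for the LLM call (it only understands "how many X by Y" questions) so the example runs offline, and the table matching is deliberately naive.

```python
import sqlite3


def discover_schema(conn: sqlite3.Connection) -> dict[str, list[str]]:
    """Step 2: read table and column names from SQLite's catalog."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    # Table names come from the catalog itself, so interpolating them is safe here.
    return {t: [col[1] for col in conn.execute(f"PRAGMA table_info({t})")]
            for t in tables}


def identify_tables(question: str, schema: dict[str, list[str]]) -> list[str]:
    """Step 1: naive relevance check, keeping tables whose singular name appears."""
    q = question.lower()
    return [t for t in schema if t.rstrip("s") in q]


def generate_sql(question: str, table: str, columns: list[str]) -> str:
    """Step 3 (hypothetical LLM stand-in): count rows, grouped by the column after 'by'."""
    q = question.lower().rstrip("?")
    group_col = next((c for c in columns if f"by {c.replace('_', ' ')}" in q), None)
    if group_col is None:
        return f"SELECT COUNT(*) AS n FROM {table}"
    return (f"SELECT {group_col}, COUNT(*) AS n FROM {table} "
            f"GROUP BY {group_col} ORDER BY n DESC")


def answer(conn: sqlite3.Connection, question: str):
    # Steps 1-2: the schema is read first so table matching has names to compare against.
    schema = discover_schema(conn)
    tables = identify_tables(question, schema)
    if not tables:
        return "No matching table found; ask the user to rephrase."
    sql = generate_sql(question, tables[0], schema[tables[0]])
    cursor = conn.execute(sql)                        # Step 4: execute
    headers = [d[0] for d in cursor.description]      # Step 5: format as records
    return [dict(zip(headers, row)) for row in cursor.fetchall()]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, signup_source TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "ads"), (2, "ads"), (3, "organic")])
print(answer(conn, "How many users by signup source?"))
```

In a real system, each of the helper functions would be a tool the agent can call, and the LLM would decide whether to retry, rephrase, or ask the user when a step fails.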

Tools like D23's managed Apache Superset platform integrate AI-powered analytics directly into the BI layer, allowing teams to ask questions in natural language without pre-building every query path. That flexibility is impossible with pure DAG orchestration.

2. Feedback Loops Require Real-Time Reasoning

DAGs are typically batch-oriented. They run on a schedule—every hour, every day—and produce static artifacts (tables, files, dashboards). The feedback loop is slow: you run a DAG, inspect the output, debug issues, and deploy a fix that runs in the next scheduled window.

AI agents operate in real-time feedback loops. An agent observes a state, acts, observes the result, and decides what to do next—all within a single execution. This is essential for analytics use cases like:

Anomaly detection with context gathering: An agent detects an anomaly in a metric. Rather than alerting a human and waiting for them to investigate, the agent autonomously queries related tables, computes breakdowns, and identifies the root cause—all in seconds.

Dynamic dashboard optimization: An agent monitoring query performance sees that a frequently used dashboard is slow. It doesn't wait for the next DAG run to reindex tables. It immediately recommends schema changes, materializes intermediate results, or rewrites the query, then reports back to the user.

Portfolio-wide KPI reconciliation: For PE and VC firms using D23 for portfolio analytics, an agent can detect inconsistencies in reported metrics across companies, automatically query source systems for clarification, and flag discrepancies—all without a pre-written workflow.

These workflows require decision-making at runtime. A DAG can't do that; an agent can.

3. Schema Discovery and Catalog Reasoning Are Inherently Agentic

When you embed analytics into your product—whether that's a SaaS platform, a data app, or an internal dashboard—users bring their own schemas, table structures, and naming conventions. A DAG-based approach forces you to pre-map these structures: write a transformation for each source, register each table, maintain a static catalog.

An agent-based approach treats schema discovery as an ongoing, autonomous task. An agent can:

Query information schema tables to discover available tables and columns.
Infer relationships by analyzing foreign keys and naming patterns.
Ask clarifying questions when ambiguity arises ("Did you mean users_created_at or users_updated_at?").
Adapt its understanding as new tables are added or schemas evolve.
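A minimal sketch of those four behaviors, again assuming SQLite: its `sqlite_master` table and `PRAGMA table_info` / `PRAGMA foreign_key_list` play the role of information schema tables here. The `<singular>_id` naming heuristic and all helper names are illustrative assumptions. Because the catalog is re-read on every call, newly added tables are picked up automatically.

```python
import sqlite3


def discover_catalog(conn):
    """Return {table: [columns]} from SQLite's catalog (re-read each call)."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    return {t: [c[1] for c in conn.execute(f"PRAGMA table_info({t})")] for t in tables}


def infer_relationships(conn, catalog):
    """Combine declared foreign keys with a '<singular>_id -> <plural>.id' heuristic."""
    edges = set()
    for table, columns in catalog.items():
        # Declared FKs: rows are (id, seq, parent_table, from_col, to_col, ...).
        # to_col is None when the FK omits the parent column (implicit primary key).
        for fk in conn.execute(f"PRAGMA foreign_key_list({table})"):
            edges.add((table, fk[3], fk[2], fk[4] or "<primary key>"))
        # Undeclared FKs guessed from naming, e.g. user_id -> users.id (naive plural).
        for col in columns:
            target = col[:-3] + "s"
            if col.endswith("_id") and target in catalog and "id" in catalog[target]:
                edges.add((table, col, target, "id"))
    return sorted(edges)


def ambiguous_columns(catalog, term):
    """Columns matching a vague term; more than one means the agent should ask."""
    return sorted(f"{t}.{c}" for t, cols in catalog.items() for c in cols if term in c)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, created_at TEXT, updated_at TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         user_id INTEGER REFERENCES users(id), amount REAL);
    CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, name TEXT);
""")
catalog = discover_catalog(conn)
print(infer_relationships(conn, catalog))
print(ambiguous_columns(catalog, "_at"))
```

Here `events.user_id` has no declared foreign key, so only the naming heuristic finds that join; the two `_at` matches are exactly the situation where the agent should ask the user which timestamp they meant.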

This is why guidance on agentic AI systems, such as Anthropic's guide to building effective AI agents, recommends agents for open-ended problems where the required steps can't be predicted or hard-coded in advance. An agent (or a small set of cooperating agents) handling catalog discovery, query generation, and result formatting can adapt far more gracefully to change than a static DAG.

Agent Orchestration: The Technical Advantages

Beyond the conceptual shift, agent orchestration delivers concrete technical wins for analytics.

Tool Use and Dynamic Routing

An agent is fundamentally a system that reasons about available tools and decides which to use. In analytics, tools might include:

A query engine (Postgres, Snowflake, BigQuery).
A search index for semantic table discovery.
A caching layer (Redis, Memcached).
A data catalog API.
A cost estimation service.
A materialization engine for expensive queries.

When a user asks a question, the agent doesn't blindly execute a pre-written query. It reasons: "This question requires a join across three tables. One of those tables has 100M rows. I should check if a materialized view exists. If not, I'll estimate the cost and decide whether to materialize on-the-fly or suggest a simpler query."

This routing is dynamic. The agent's decision changes based on data volume, query complexity, and system load. A DAG can't adapt this way without becoming a tangled mess of conditional branches.
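The reasoning quoted above can be written out as an explicit routing policy. In a real agent the LLM would weigh these signals itself; this sketch hard-codes them only to show which inputs drive the decision. The thresholds, the `QueryPlan` fields, and the materialized-view registry are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

LARGE_TABLE_ROWS = 100_000_000      # assumed: above this, look for a shortcut first
MAX_ON_THE_FLY_ROWS = 500_000_000   # assumed: largest scan worth materializing live
BUSY_LOAD = 0.7                     # assumed: cluster utilization (0-1) that counts as busy


@dataclass
class QueryPlan:
    tables: tuple[str, ...]
    estimated_rows: int             # hypothetical output of an estimate_cost tool


def route(plan: QueryPlan, materialized_views: dict,
          system_load: float) -> tuple[str, Optional[str]]:
    """Return (strategy, detail) from data volume, existing views, and current load."""
    view = materialized_views.get(frozenset(plan.tables))
    if view is not None:
        return "use_materialized_view", view
    if plan.estimated_rows < LARGE_TABLE_ROWS:
        return "run_directly", None
    if plan.estimated_rows <= MAX_ON_THE_FLY_ROWS and system_load < BUSY_LOAD:
        return "materialize_then_run", None
    return "suggest_simpler_query", "narrow the date range or drop a join"


views = {frozenset({"orders", "users"}): "mv_orders_users"}
big_join = QueryPlan(("events", "users", "orders"), 150_000_000)
print(route(big_join, views, system_load=0.2))
print(route(big_join, views, system_load=0.9))
```

The same plan gets a different route when system load changes; expressing that in a DAG means adding and maintaining a conditional branch for every such signal.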

Error Recovery and Graceful Degradation

When a DAG task fails, its downstream tasks don't run and the workflow run is marked as failed. You can configure retries, but once they are exhausted, the outcome is still failure. An agent, by contrast, is designed to recover gracefully.

Example: A user asks for a report that requires joining data from Postgres and Snowflake. The Snowflake query times out. A well-designed agent doesn't fail; it adapts. It might:

Retry with a simpler query (fewer columns, narrower date range).
Suggest a pre-aggregated view that's available in Postgres.
Return partial results with a note that some data is unavailable.
Recommend a scheduled materialization for future queries of this type.

This resilience is critical for production analytics, especially when embedded in user-facing products. Users expect responsiveness, not timeouts.
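The fallback options listed above amount to an ordered ladder of strategies. A minimal sketch, assuming a hypothetical warehouse client that raises `QueryTimeout`; the strategy labels and the simulated queries and results are illustrative only.

```python
from typing import Callable


class QueryTimeout(Exception):
    """Hypothetical error raised by a warehouse client when a query runs too long."""


def answer_with_fallbacks(attempts: list[tuple[str, Callable[[], list]]]) -> dict:
    """Try each (label, query) in order; return the first success plus what was skipped."""
    notes = []
    for label, run_query in attempts:
        try:
            return {"strategy": label, "rows": run_query(), "notes": notes}
        except QueryTimeout:
            notes.append(f"{label} timed out")
    # Every option failed: degrade to an empty answer plus a recommendation, not a crash.
    notes.append("recommend a scheduled materialization for this query shape")
    return {"strategy": "none", "rows": [], "notes": notes}


def full_join():
    raise QueryTimeout("snowflake query exceeded 30s")   # simulated timeout


def narrower_range():
    return [("2024-09", 1280)]                            # simulated smaller result


result = answer_with_fallbacks([
    ("postgres_snowflake_join", full_join),
    ("last_30_days_only", narrower_range),
    ("postgres_preaggregated_view", lambda: [("2024-Q3", 3900)]),
])
print(result["strategy"], result["notes"])
```

The `notes` list is what lets the agent tell the user which data was left out, rather than silently returning a narrower answer.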

Human-in-the-Loop Workflows

DAGs can include human approval steps, but these are typically blocking checkpoints: the workflow pauses and waits. Agents can integrate humans more fluidly. An agent can:

Ask clarifying questions when ambiguous.
Propose a query and ask the user to confirm before executing (useful for expensive queries).
Explain its reasoning ("I'm joining these tables because...").
Learn from feedback ("That result was wrong because I misunderstood column X").
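Proposing a query and confirming it before execution can be a small gate in front of the execution tool. A sketch under stated assumptions: `confirm` and `execute` are injected callables (a chat prompt and a warehouse client in practice), and the $5 threshold is arbitrary. Corrections go into a feedback log that later prompts could include as context; nothing here retrains a model.

```python
from typing import Callable

CONFIRM_ABOVE_USD = 5.0          # assumed threshold for an "expensive" query
feedback_log: list[dict] = []    # hypothetical store of user corrections


def run_with_confirmation(sql: str, estimated_cost_usd: float, reasoning: str,
                          confirm: Callable[[str], bool],
                          execute: Callable[[str], list]) -> dict:
    """Explain the plan, ask before expensive queries, and execute only if approved."""
    if estimated_cost_usd > CONFIRM_ABOVE_USD:
        prompt = (f"{reasoning}\nEstimated cost: ${estimated_cost_usd:.2f}. "
                  f"Run this query?\n{sql}")
        if not confirm(prompt):
            return {"status": "cancelled", "rows": []}
    return {"status": "executed", "rows": execute(sql)}


def record_feedback(question: str, sql: str, correction: str) -> None:
    """Store a user correction so future prompts can include it."""
    feedback_log.append({"question": question, "sql": sql, "correction": correction})
```

Cheap queries run without interruption, so the human is only pulled in where their judgment actually changes the outcome.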

For analytics teams, this means agents can support collaborative workflows where humans and AI work together, rather than humans waiting for pre-scripted automation.

Real-World Analytics Use Cases Where Agents Win

Let's ground this in concrete scenarios where agent orchestration outperforms DAGs.

Embedded Analytics in SaaS Products

Imagine you're building a SaaS platform for e-commerce companies. You want to embed analytics so your customers can analyze their own data without leaving your product. Each customer has a different schema, different metrics, and different questions.

With a DAG approach, you'd need to:

Onboard each customer's data warehouse.
Map their schema to a canonical data model.
Build customer-specific transformations.
Pre-build dashboards or query templates.

This is slow and doesn't scale. With agent orchestration, you:

Onboard the connection once.
Let an agent discover and understand the schema.
Empower users to ask questions in natural language.
The agent generates queries, optimizes them, and returns results.

No pre-built dashboards. No static transformations. Just an agent that understands the data and responds to questions. This is the model D23 enables through managed Apache Superset with AI-powered query generation and self-serve BI.

Portfolio Analytics for PE/VC Firms

Private equity and venture capital firms manage dozens or hundreds of portfolio companies, each with its own data infrastructure. They need consistent KPI reporting, but each company reports metrics differently.

A DAG approach requires building a centralized data warehouse and ETL pipelines for each company—months of work per integration. An agent-based approach is faster: deploy an agent to each company's data environment, give it access to their systems, and let it autonomously discover metrics, reconcile definitions, and report back to the fund's central dashboard.

When metrics don't match, the agent investigates. When a new company joins the portfolio, the agent onboards it automatically. When a metric definition changes, the agent adapts.

Real-Time Anomaly Detection with Context

A DAG can detect anomalies (a metric drops below a threshold), but it can't explain them. An agent can. When an agent detects an anomaly, it:

Queries related metrics to understand the scope (is it just this metric, or a broader issue?).
Breaks down the metric by dimension (geography, segment, user cohort) to isolate the problem.
Compares to historical patterns to assess severity.
Suggests potential root causes based on recent changes in the data.
Escalates to a human with a full context report.

All of this happens automatically, in seconds, without a pre-written workflow.
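The first two investigation steps (checking scope, then breaking the metric down by dimension) fit in a few lines of Python. The row shape, the 10% drop threshold, and the sample data are assumptions; historical comparison and root-cause suggestions would be layered on top, for example by having an LLM summarize the resulting report.

```python
from collections import defaultdict


def totals_by(rows: list[dict], dimension: str) -> dict:
    """Sum the metric per value of one dimension."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row["value"]
    return dict(totals)


def investigate(baseline, current, dimensions, drop_threshold=0.10):
    """Check whether the total dropped, then find the worst segment per dimension."""
    base_total = sum(r["value"] for r in baseline)
    cur_total = sum(r["value"] for r in current)
    change = (cur_total - base_total) / base_total
    report = {"total_change": round(change, 3),
              "is_anomaly": change <= -drop_threshold, "drivers": {}}
    if not report["is_anomaly"]:
        return report
    for dim in dimensions:
        before, after = totals_by(baseline, dim), totals_by(current, dim)
        deltas = {k: after.get(k, 0.0) - before.get(k, 0.0)
                  for k in before.keys() | after.keys()}
        worst = min(deltas, key=deltas.get)   # segment with the largest absolute drop
        report["drivers"][dim] = (worst, deltas[worst])
    return report


def rows(values):
    """Build illustrative rows for four region/channel segments."""
    keys = [("EU", "web"), ("EU", "app"), ("US", "web"), ("US", "app")]
    return [{"region": r, "channel": c, "value": v} for (r, c), v in zip(keys, values)]


baseline = rows([100, 100, 100, 100])
current = rows([100, 100, 40, 100])
print(investigate(baseline, current, ["region", "channel"]))
```

The report localizes the drop to US web traffic, which is the kind of context a human needs when the agent escalates.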

The Architecture: How Agent Orchestration Works

Understanding the technical architecture clarifies why agents are better suited for analytics.

The Agent Loop

At its core, an agent runs a loop:

Observe state → Reason about goal → Choose action → Execute → Observe new state → Repeat until goal reached

In analytics, this might look like:

Observe: User asks, "What was our churn rate by cohort last quarter?"
Reason: "I need to find churn definitions, identify cohorts, filter for last quarter, aggregate."
Choose: "I'll query the events table, join with user cohorts, compute churn by group."
Execute: Run the query.
Observe: Results returned. Is the answer complete? Did the query succeed?
Repeat: If the answer is incomplete or the query failed, try a different approach.

This loop is the opposite of a DAG. A DAG declares the entire path upfront; an agent discovers the path at runtime.
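A minimal Python sketch of that loop follows. `choose_action` stands in for the LLM's reason-and-choose step; here it is a hypothetical rule-based policy so the example runs offline, and the table names are made up. Note the `max_steps` cap: an agent loop needs an explicit stopping condition besides "goal reached".

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class AgentState:
    goal: str
    observations: list = field(default_factory=list)   # (action, status, payload) history
    answer: Any = None
    done: bool = False


def run_agent(state: AgentState, choose_action: Callable, tools: dict,
              max_steps: int = 5) -> AgentState:
    for _ in range(max_steps):
        action, arg = choose_action(state)                       # reason + choose
        if action == "finish":
            state.answer, state.done = arg, True
            break
        try:
            result = tools[action](arg)                          # execute
            state.observations.append((action, "ok", result))   # observe new state
        except Exception as exc:                                 # failures are observations too
            state.observations.append((action, "error", str(exc)))
    return state


def toy_policy(state: AgentState):
    """Hypothetical stand-in for an LLM: try another table after an error."""
    if not state.observations:
        return "query", "SELECT cohort, churn_rate FROM churn"   # wrong first guess
    _, status, payload = state.observations[-1]
    if status == "error":
        return "query", "SELECT cohort, churn_rate FROM churn_by_cohort ORDER BY cohort"
    return "finish", payload


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE churn_by_cohort (cohort TEXT, churn_rate REAL)")
conn.executemany("INSERT INTO churn_by_cohort VALUES (?, ?)",
                 [("2024-07", 0.12), ("2024-08", 0.09)])
tools = {"query": lambda sql: conn.execute(sql).fetchall()}
state = run_agent(AgentState("churn rate by cohort last quarter"), toy_policy, tools)
print(state.answer)
```

Because the failed query is recorded as an observation rather than aborting the run, the policy can react to it on the next iteration, which a DAG could only express through a pre-declared retry branch.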

Multi-Agent Orchestration

For complex analytics workloads, multiple agents can coordinate. Research on AI orchestration platforms suggests that multi-agent systems can outperform single-agent systems on complex workflows. In analytics, you might have:

A schema discovery agent that understands the data catalog.
A query generation agent that writes SQL.
An optimization agent that rewrites queries for performance.
An explanation agent that describes results to users.

Each agent is specialized, and they communicate through a shared state and message passing. This is far more modular and resilient than a monolithic DAG.
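A minimal sketch of that division of labor: four hypothetical specialist functions share one state dictionary, and a supervisor routes to whichever specialist's output is still missing. This is similar in spirit to conditional routing in frameworks such as LangGraph, but it is plain Python, not LangGraph's API. The schema and SQL are hard-coded stand-ins for tool and LLM calls.

```python
def schema_agent(state: dict) -> str:
    # Hypothetical: a real agent would call list_tables() / get_table_schema().
    state["schema"] = {"orders": ["id", "user_id", "amount", "created_at"]}
    return "found table orders"


def query_agent(state: dict) -> str:
    # Hypothetical stand-in for LLM SQL generation: a naive first draft.
    state["sql"] = "SELECT * FROM orders"
    return "drafted SQL"


def optimization_agent(state: dict) -> str:
    # Select only the needed columns and cap the result size.
    state["sql"] = state["sql"].replace("SELECT *", "SELECT user_id, amount") + " LIMIT 1000"
    state["optimized"] = True
    return "narrowed columns, added LIMIT"


def explanation_agent(state: dict) -> str:
    state["explanation"] = f"Ran {state['sql']!r} against {', '.join(state['schema'])}."
    return "wrote explanation"


AGENTS = {"schema": schema_agent, "query": query_agent,
          "optimizer": optimization_agent, "explainer": explanation_agent}


def supervisor(state: dict):
    """Route on shared state rather than a fixed sequence; None means the goal is met."""
    if "schema" not in state:
        return "schema"
    if "sql" not in state:
        return "query"
    if not state.get("optimized"):
        return "optimizer"
    if "explanation" not in state:
        return "explainer"
    return None


def orchestrate(state: dict, max_turns: int = 10):
    messages = []                          # (agent, message) log shared across turns
    for _ in range(max_turns):
        name = supervisor(state)
        if name is None:
            break
        messages.append((name, AGENTS[name](state)))
    return state, messages


state, messages = orchestrate({"question": "Revenue by user last month"})
print(state["sql"])
```

Because routing depends on state, a request that arrives with a cached schema simply skips the schema agent, and any specialist can be replaced without touching the others.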

Tool Use and Function Calling

Agents interact with external systems through tools—defined APIs that the agent can call. In analytics, tools include:

query_warehouse(sql_query): Execute a SQL query.
estimate_cost(sql_query): Estimate query cost without running it.
list_tables(): Discover available tables.
get_table_schema(table_name): Retrieve column definitions.
materialize_view(query, name): Create a cached view for expensive queries.

The agent selects the tools appropriate for each task and chains them together at runtime. A DAG, by contrast, has predetermined tool sequences.
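Three of the tools listed above can be sketched as plain Python functions over SQLite, with a registry and a dispatcher. The `{"name": ..., "arguments": "<JSON string>"}` call shape resembles what common function-calling APIs return, but the exact envelope varies by provider, so treat it as an assumption; `estimate_cost` and `materialize_view` are omitted for brevity. Errors are returned to the model as data, which is what lets it chain a corrective call.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, name TEXT)")


def list_tables() -> list[str]:
    return [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]


def get_table_schema(table_name: str) -> list[dict]:
    if table_name not in list_tables():          # never interpolate an unknown identifier
        raise ValueError(f"unknown table: {table_name}")
    return [{"name": c[1], "type": c[2]}
            for c in conn.execute(f"PRAGMA table_info({table_name})")]


def query_warehouse(sql_query: str) -> list[tuple]:
    # Crude read-only guard for the sketch; production systems should use a read-only role.
    if not sql_query.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    return conn.execute(sql_query).fetchall()


def _params(**props):
    """JSON Schema for an object whose properties are all required strings."""
    return {"type": "object", "required": list(props),
            "properties": {k: {"type": "string", "description": v} for k, v in props.items()}}


TOOLS = {
    "list_tables": (list_tables, _params()),
    "get_table_schema": (get_table_schema, _params(table_name="Table to describe")),
    "query_warehouse": (query_warehouse, _params(sql_query="A single read-only SELECT")),
}
# Tool descriptions to send to the model (envelope format is provider-specific).
TOOL_SPECS = [{"name": n, "parameters": schema} for n, (_, schema) in TOOLS.items()]


def dispatch(call: dict) -> dict:
    """Run one model-emitted tool call and return a result the model can read."""
    if call["name"] not in TOOLS:
        return {"ok": False, "error": f"unknown tool: {call['name']}"}
    fn, _ = TOOLS[call["name"]]
    try:
        return {"ok": True, "result": fn(**json.loads(call.get("arguments") or "{}"))}
    except Exception as exc:
        return {"ok": False, "error": str(exc)}
```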

Overcoming Agent Orchestration Challenges

Agent orchestration isn't a silver bullet. It introduces new challenges that you need to address.

Determinism and Reproducibility

DAGs are deterministic: given the same input, they produce the same output every time. Agents, especially those powered by LLMs, can be non-deterministic. An agent might generate different SQL for the same question on different runs.

To address this in production analytics:

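Reduce sampling randomness: Set the temperature to 0 and pin the model version to reduce run-to-run variation (this narrows nondeterminism but does not fully eliminate it). For structured tasks like SQL generation, you often don't need cutting-edge LLMs; smaller models frequently work well and cost less.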
Add validation: Have the agent validate its own outputs. Did the generated query run successfully? Do the results make sense?
Log reasoning: Capture the agent's reasoning process so you can debug non-deterministic behavior.
Cache decisions: For repeated queries, cache the agent's decision (the generated SQL) rather than re-generating it.
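The validation, logging, and caching points can be combined into one small wrapper. A sketch, assuming SQLite and a hypothetical `generate` callable standing in for the LLM: generated SQL is checked with `EXPLAIN` (which compiles the statement against the live schema without running the query), logged, and cached under a normalized form of the question so repeat questions get identical SQL.

```python
import re
import sqlite3
from typing import Callable


class SqlDecisionCache:
    """Validate, log, and reuse generated SQL so repeat questions get identical queries."""

    def __init__(self, conn: sqlite3.Connection, generate: Callable[[str], str]):
        self.conn = conn
        self.generate = generate                     # hypothetical LLM call: question -> SQL
        self.cache: dict[str, str] = {}
        self.log: list[tuple[str, str, str]] = []    # (question, sql, source) for debugging

    @staticmethod
    def normalize(question: str) -> str:
        return re.sub(r"\s+", " ", question.strip().lower().rstrip("?")).strip()

    def sql_for(self, question: str) -> str:
        key = self.normalize(question)
        if key in self.cache:
            self.log.append((question, self.cache[key], "cache"))
            return self.cache[key]
        sql = self.generate(question)
        # EXPLAIN compiles the statement against the live schema and raises on
        # syntax errors or unknown tables/columns, without scanning any data.
        self.conn.execute(f"EXPLAIN {sql}").fetchall()
        self.cache[key] = sql
        self.log.append((question, sql, "generated"))
        return sql
```

Invalid SQL raises before it is cached, so a bad generation is never replayed; the log records whether each answer came from the model or the cache.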

Research on agentic AI systems highlights the importance of deterministic orchestration, especially in high-stakes applications. Analytics is high-stakes—incorrect metrics can drive wrong business decisions.

Cost Control

Agents can be expensive. An agent that explores multiple query approaches, calls the cost estimation tool repeatedly, or iterates on results can rack up API costs and compute charges quickly.

Strategies for cost control:

Set budgets and limits: Define a maximum cost per query and have the agent respect it.
Prefer cached results: Check if a similar query has been run recently and reuse results.
Batch and schedule: For non-urgent queries, batch them and run during off-peak hours.
Use cheaper models: Reserve expensive LLMs for complex reasoning; use cheaper models for routine tasks.
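The first and last strategies can be enforced in code rather than left to the model's judgment. A sketch with hypothetical model names and made-up per-call prices; in practice, costs would come from the provider's token usage and the warehouse's billing metadata.

```python
class BudgetExceeded(Exception):
    pass


# Hypothetical model tiers and per-call prices in USD; real costs depend on tokens used.
MODEL_PRICE_USD = {"small-model": 0.001, "large-model": 0.02}


def pick_model(needs_complex_reasoning: bool) -> str:
    """Reserve the expensive tier for steps that need it (e.g. multi-table planning)."""
    return "large-model" if needs_complex_reasoning else "small-model"


class QueryBudget:
    """Tracks spend for one user question across LLM calls and warehouse queries."""

    def __init__(self, max_usd: float):
        self.max_usd = max_usd
        self.spent_usd = 0.0

    def charge(self, label: str, cost_usd: float) -> None:
        if self.spent_usd + cost_usd > self.max_usd:
            raise BudgetExceeded(
                f"{label} (${cost_usd:.3f}) would exceed the ${self.max_usd:.2f} budget; "
                f"${self.spent_usd:.3f} already spent")
        self.spent_usd += cost_usd
```

When `BudgetExceeded` is raised, the agent can fall back to cached results or ask the user to narrow the question instead of failing outright.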

Latency

Agents add latency because they reason about what to do before doing it. For interactive analytics, this can be a problem. A user asks a question and expects results in seconds, not minutes.

To minimize latency:

Optimize the reasoning loop: Use smaller, faster models for initial reasoning, then escalate to larger models if needed.
Parallelize tool calls: If an agent needs to call multiple tools (query two tables, estimate cost, check cache), do these in parallel.
Use streaming: Return partial results as they become available, rather than waiting for the complete result.
Precompute common patterns: For frequently asked questions, precompute results and have the agent recognize when a new query matches a precomputed pattern.
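Independent tool calls are a natural fit for a thread pool, since warehouse and cache clients mostly wait on I/O. A sketch with hypothetical tools whose latency is simulated with `time.sleep`; only calls that don't depend on each other's results should be parallelized this way.

```python
import time
from concurrent.futures import ThreadPoolExecutor

SIMULATED_LATENCY_S = 0.2


# Hypothetical tools; time.sleep stands in for network and warehouse latency.
def check_cache(question: str):
    time.sleep(SIMULATED_LATENCY_S)
    return None                                   # simulated cache miss


def estimate_cost(question: str) -> dict:
    time.sleep(SIMULATED_LATENCY_S)
    return {"usd": 0.04}


def get_table_schema(table: str) -> list[str]:
    time.sleep(SIMULATED_LATENCY_S)
    return ["user_id", "signup_source", "created_at"]


def gather_context(question: str, table: str) -> dict:
    """Run independent lookups concurrently: total wait is roughly the slowest call."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        cached = pool.submit(check_cache, question)
        cost = pool.submit(estimate_cost, question)
        schema = pool.submit(get_table_schema, table)
        return {"cached": cached.result(), "cost": cost.result(), "schema": schema.result()}


start = time.perf_counter()
context = gather_context("signups by source last week", "users")
print(f"{time.perf_counter() - start:.2f}s", context["cost"])
```

Run sequentially, the three lookups would take about three times as long; in parallel, the agent's pre-query overhead is bounded by its slowest dependency.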

Comparing Agent Orchestration to DAG Alternatives

Let's directly compare agent orchestration to the leading DAG-based alternatives for analytics.

Agent Orchestration vs. Airflow

Airflow is the dominant DAG orchestration tool. It's battle-tested, has a huge ecosystem, and scales to thousands of tasks.

For static, batch-oriented workflows, Airflow wins: predictability, auditability, resource planning.

For dynamic, AI-driven analytics, agent orchestration wins: adaptability, real-time reasoning, schema discovery.

In practice, many teams use both. Airflow handles the stable, batch ETL pipelines. Agent orchestration handles the dynamic, user-facing analytics layer. Tools like LangGraph provide stateful multi-agent orchestration that complements Airflow.

Agent Orchestration vs. dbt

dbt is a declarative transformation tool. You define data transformations as SQL, and dbt orchestrates the execution.

dbt excels at: version control for analytics code, testing, documentation, lineage tracking.

dbt doesn't handle: dynamic schema discovery, real-time query optimization, natural language interfaces, multi-step reasoning.

Again, teams often use both. dbt for the transformation layer (turning raw data into clean analytics tables), agent orchestration for the query and reasoning layer (turning tables into answers).

Agent Orchestration vs. BI Tools (Tableau, Looker, Power BI)

Traditional BI tools are excellent for pre-built dashboards and static reports. But they're not agents. They don't reason, adapt, or explore autonomously.

When you layer agent orchestration on top of a BI tool—as D23 does with Apache Superset—you get the best of both worlds: beautiful visualizations and interactive dashboards, plus AI-driven query generation and autonomous exploration.

Building Analytics Agents: Practical Guidance

If you're considering agent orchestration for your analytics stack, here's how to start.

Define Your Use Case

Agent orchestration isn't the answer for everything. Start with use cases where:

Queries are dynamic: Users ask questions you didn't anticipate.
Adaptation is valuable: The system benefits from real-time reasoning and feedback.
Automation is possible: The agent can autonomously gather information and make decisions.

Good candidates:

Natural language query interfaces.
Embedded analytics in SaaS products.
Anomaly detection with automated investigation.
Portfolio analytics with autonomous reconciliation.

Poor candidates:

Scheduled batch jobs with fixed outputs.
Simple metric calculations with no reasoning required.
One-off reports that don't need adaptation.

Choose Your Orchestration Framework

Several frameworks exist for building agentic systems. Research on agent orchestration platforms identifies key players:

LangGraph: A stateful orchestration framework from LangChain. Excellent for multi-agent systems and human-in-the-loop workflows.
AutoGen: Microsoft's framework for multi-agent conversations. Good for collaborative scenarios.
CrewAI: A newer framework focused on role-based agents. Useful for domain-specific agents (analyst, engineer, manager).

For analytics specifically, LangGraph is arguably the most mature of the three: it handles state management, tool use, and multi-agent coordination well.

Integrate with Your BI Platform

If you're using a managed BI platform like D23's Apache Superset offering, the integration is straightforward. The platform provides APIs for query generation, catalog access, and result formatting. Your agent orchestration layer sits on top, using these APIs as tools.

If you're building from scratch, you need:

A query engine (Postgres, Snowflake, BigQuery).
A catalog API (custom or from your data warehouse).
A caching layer (Redis, Memcached).
An LLM API (OpenAI, Anthropic, open-source).

Wire these together through an orchestration framework, and you have an agent-based analytics system.

Start Small and Iterate

Don't try to build a fully autonomous, multi-agent system on day one. Start with a single agent handling one task—e.g., natural language to SQL translation. Get it working reliably, measure latency and cost, then expand.

Common expansion paths:

Add a schema discovery agent.
Add query optimization.
Add anomaly detection.
Add multi-agent coordination.

Each step adds complexity, but you're building on proven foundations.

The Future: Agent Orchestration as the Standard

The shift from declarative DAGs to agent orchestration mirrors earlier transitions in software. Imperative programming gave way to declarative approaches (SQL, configuration management) because declarative systems are easier to reason about. Now, as workloads become more dynamic and intelligent, agentic approaches are winning because they're more adaptive.

For analytics, this shift is already underway. Teams building embedded analytics, natural language interfaces, and autonomous insights are adopting agent orchestration. Teams managing traditional batch ETL are sticking with DAGs. Both are right for their use cases.

But if you're building analytics for the next generation of users—users who expect AI-driven insights, natural language interfaces, and adaptive systems—agent orchestration is the foundation you need. It's not about replacing DAGs; it's about building a layer on top that brings intelligence and adaptability to your analytics.

Platforms like D23 are designed for this world. They combine the stability and power of Apache Superset with modern AI integration, API-first architecture, and agent-ready orchestration. The result is analytics infrastructure that scales with your team, adapts to your data, and responds intelligently to your questions.

The declarative DAG era was necessary and valuable. But for AI-native analytics, agent orchestration is the better primitive. The data and engineering leaders who recognize this shift early will build more responsive, more intelligent, and more scalable analytics systems.

Conclusion: Choosing the Right Tool for Your Analytics Stack

Declarative DAGs and agent orchestration are both valuable tools. The question isn't which is better in absolute terms—it's which is better for your specific use case.

Use DAGs for:

Scheduled, batch-oriented workflows.
Transformations with fixed inputs and outputs.
Data pipelines where predictability and auditability are paramount.

Use agent orchestration for:

Dynamic, user-driven queries.
Real-time reasoning and decision-making.
Adaptive systems that learn and improve over time.
Multi-step workflows with uncertain paths.

For modern analytics—especially analytics embedded in products or serving dynamic user queries—agent orchestration is the better choice. It's faster to build, more flexible to adapt, and more intelligent in execution.

If you're evaluating analytics platforms, look for those that support agent orchestration natively. If you're building your own, invest in orchestration frameworks that make agentic systems easy to develop and operate. The future of analytics is agentic, and the sooner you adopt these patterns, the faster you'll move.