Agent orchestration outperforms declarative DAGs for AI analytics. Learn why agentic systems beat workflow engines for real-time, adaptive data intelligence.
For a decade, directed acyclic graphs (DAGs) dominated data orchestration. Tools like Airflow, Prefect, and dbt defined workflows as static, declaratively mapped sequences: extract → transform → load → visualize. The model was clean, deterministic, and predictable. It solved a real problem: coordinating batch jobs across sprawling data stacks.
But AI-native analytics demands something different. When your analytics layer includes large language models (LLMs), real-time decision-making, dynamic schema discovery, and adaptive query optimization, a DAG becomes a bottleneck. You can't declare in advance what an AI agent will discover about your data, what questions users will ask, or how a text-to-SQL engine should route a query through your warehouse.
Agent orchestration—the runtime coordination of autonomous, goal-driven AI systems—is now the better primitive for analytics workloads that require intelligence, flexibility, and responsiveness. This isn't hype. It's a fundamental shift in how we build analytics infrastructure for teams that operate at scale.
Before diving into why agents win, let's establish what we're comparing.
Declarative DAGs (the traditional model) define workflows as explicit, predetermined sequences. You specify every step, every dependency, every branching condition upfront. A DAG in Airflow is a Python script that says: "Task A runs, then Task B, then Task C, with retry logic here and notifications there." The entire execution path is known before the first task starts.
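To make "the entire execution path is known before the first task starts" concrete, here is a minimal sketch using only Python's standard library rather than Airflow itself. The `PIPELINE` mapping and the `plan` and `run` helpers are illustrative names, not part of any orchestration tool.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "visualize": {"load"},
}


def plan(dag):
    """Return the complete execution order before any task has started."""
    return list(TopologicalSorter(dag).static_order())


def run(dag, tasks):
    """Run tasks in the precomputed order and return the names that ran."""
    completed = []
    for name in plan(dag):
        tasks[name]()  # an exception here stops the run; later tasks never start
        completed.append(name)
    return completed
```

Calling `plan(PIPELINE)` yields the whole path up front, which is the property that makes DAGs easy to audit and hard to adapt.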
This approach has real strengths:
Agent orchestration inverts the model. Instead of declaring a fixed workflow, you define goals, constraints, and available tools. An agent—powered by an LLM or other decision-making system—autonomously decides which tools to use, in what order, and how to respond to runtime conditions. The agent observes the state of the system, reasons about what to do next, takes action, and repeats until the goal is reached.
A practical example: Instead of declaring "fetch user data from Postgres, join with events from Snowflake, aggregate by cohort, write to Redis," you give an agent access to a data catalog, query engine, and storage layer. The agent reads a user's natural language question, decides which tables to query, optimizes the joins based on cardinality, and streams results to the appropriate destination—all without a pre-written workflow.
The friction between declarative workflows and AI-driven analytics shows up in three concrete ways.
In a traditional BI stack, dashboards are pre-built. An analyst or engineer designs a dashboard in Tableau, Looker, or Power BI, and users interact with filters and drill-downs within that fixed structure. The data pipeline that feeds the dashboard is a DAG: it runs on a schedule, produces a known set of tables, and those tables power the visualizations.
But when you introduce AI—specifically, natural language query interfaces or text-to-SQL—the query surface explodes. A user asks, "Show me cohort retention by signup source for users acquired in Q3 2024." That's a question your dashboard might not have anticipated. A text-to-SQL agent needs to:
None of these steps fit neatly into a pre-declared DAG. If you tried to encode every possible user question as a separate DAG task, you'd have infinite workflows. An agent, by contrast, handles the reasoning and adaptation at runtime. It's built for variability.
Tools like D23's managed Apache Superset platform integrate AI-powered analytics directly into the BI layer, allowing teams to ask questions in natural language without pre-building every query path. That flexibility is impossible with pure DAG orchestration.
DAGs are typically batch-oriented. They run on a schedule—every hour, every day—and produce static artifacts (tables, files, dashboards). The feedback loop is slow: you run a DAG, inspect the output, debug issues, and deploy a fix that runs in the next scheduled window.
AI agents operate in real-time feedback loops. An agent observes a state, acts, observes the result, and decides what to do next—all within a single execution. This is essential for analytics use cases like:
Anomaly detection with context gathering: An agent detects an anomaly in a metric. Rather than alerting a human and waiting for them to investigate, the agent autonomously queries related tables, computes breakdowns, and identifies the root cause—all in seconds.
Dynamic dashboard optimization: An agent monitoring query performance sees that a frequently used dashboard is slow. It doesn't wait for the next DAG run to reindex tables. It immediately recommends schema changes, materializes intermediate results, or rewrites the query, and reports back to the user.
Portfolio-wide KPI reconciliation: For PE and VC firms using D23 for portfolio analytics, an agent can detect inconsistencies in reported metrics across companies, automatically query source systems for clarification, and flag discrepancies—all without a pre-written workflow.
These workflows require decisions at runtime about which step to take next. A DAG can branch only along paths declared in advance; an agent can choose steps that nobody scripted.
When you embed analytics into your product—whether that's a SaaS platform, a data app, or an internal dashboard—users bring their own schemas, table structures, and naming conventions. A DAG-based approach forces you to pre-map these structures: write a transformation for each source, register each table, maintain a static catalog.
An agent-based approach treats schema discovery as an ongoing, autonomous task. An agent can:
users_created_at or users_updated_at?").
This is why research on agentic AI systems, such as the work covered in Anthropic's guide to building effective AI agents, emphasizes multi-agent orchestration for complex, evolving environments. A single agent handling catalog discovery, query generation, and result formatting can adapt far more gracefully to change than a static DAG.
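Runtime schema discovery can be sketched in a few lines. This example assumes a SQLite database as a stand-in for a customer warehouse. `discover_schema` and `timestamp_candidates` are hypothetical helpers, and the `_at` suffix rule is a deliberately naive heuristic for spotting columns an agent might need to ask about.

```python
import sqlite3


def discover_schema(conn):
    """Map each table name to its (column, declared type) pairs, read at runtime."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    schema = {}
    for table in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [(col[1], col[2]) for col in cols]
    return schema


def timestamp_candidates(schema):
    """Naive heuristic: columns ending in '_at' may hold timestamps."""
    return {
        table: [name for name, _ in cols if name.endswith("_at")]
        for table, cols in schema.items()
    }
```

When `timestamp_candidates` returns more than one column for a table, an agent has a concrete reason to ask the user which one they mean instead of guessing.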
Beyond the conceptual shift, agent orchestration delivers concrete technical wins for analytics.
An agent is fundamentally a system that reasons about available tools and decides which to use. In analytics, tools might include:
When a user asks a question, the agent doesn't blindly execute a pre-written query. It reasons: "This question requires a join across three tables. One of those tables has 100M rows. I should check if a materialized view exists. If not, I'll estimate the cost and decide whether to materialize on-the-fly or suggest a simpler query."
This routing is dynamic. The agent's decision changes based on data volume, query complexity, and system load. A DAG can't adapt this way without becoming a tangled mess of conditional branches.
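The reasoning described above can be approximated with a small routing function. This is a sketch under stated assumptions: `QueryContext`, `route_query`, the 100M-row threshold, and the cost budget are all illustrative, and a real agent would get these inputs from its cost estimator and monitoring tools.

```python
from dataclasses import dataclass


@dataclass
class QueryContext:
    largest_table_rows: int      # row count of the biggest table in the join
    has_materialized_view: bool  # whether a precomputed view already exists
    estimated_cost: float        # arbitrary units from a hypothetical estimator
    system_load: float           # 0.0 (idle) to 1.0 (saturated)


def route_query(ctx, row_threshold=100_000_000, cost_budget=50.0):
    """Pick an execution strategy from data volume, cost, and current load."""
    if ctx.has_materialized_view:
        return "use_materialized_view"
    if ctx.largest_table_rows < row_threshold:
        return "run_directly"
    # Shrink the budget as the system gets busier.
    effective_budget = cost_budget * (1.0 - ctx.system_load)
    if ctx.estimated_cost <= effective_budget:
        return "materialize_on_the_fly"
    return "suggest_simpler_query"
```

The same question can route differently from one minute to the next, because `system_load` changes the effective budget.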
When a DAG task fails, its downstream tasks typically do not run, and the workflow stops short of its goal. You can configure retries, but a task that keeps failing still ends the run in failure. An agent, by contrast, can be designed to recover gracefully.
Example: A user asks for a report that requires joining data from Postgres and Snowflake. The Snowflake query times out. A well-designed agent doesn't fail; it adapts. It might:
This resilience is critical for production analytics, especially when embedded in user-facing products. Users expect responsiveness, not timeouts.
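One way to express this kind of recovery is an ordered list of fallback strategies. The sketch below is an assumption about how such a chain might look, not a prescribed design: `QueryTimeout` and `run_with_fallbacks` are hypothetical names, and the strategies themselves (a live query, then a cached snapshot) are examples only.

```python
class QueryTimeout(Exception):
    """Raised by a strategy when its query takes too long (hypothetical)."""


def run_with_fallbacks(strategies):
    """Try each (name, callable) in order; return the first one that succeeds."""
    errors = []
    for name, attempt in strategies:
        try:
            return name, attempt()
        except QueryTimeout as exc:
            errors.append(f"{name}: {exc}")  # remember why this path failed
    raise RuntimeError("all strategies failed: " + "; ".join(errors))
```

Returning the strategy name alongside the rows lets the agent tell the user which path produced the answer, for example that the numbers come from a cached snapshot rather than a live query.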
DAGs can include human approval steps, but these are typically blocking checkpoints: the workflow pauses and waits. Agents can integrate humans more fluidly. An agent can:
For analytics teams, this means agents can support collaborative workflows where humans and AI work together, rather than humans waiting for pre-scripted automation.
Let's ground this in concrete scenarios where agent orchestration outperforms DAGs.
Imagine you're building a SaaS platform for e-commerce companies. You want to embed analytics so your customers can analyze their own data without leaving your product. Each customer has a different schema, different metrics, and different questions.
With a DAG approach, you'd need to:
This is slow and doesn't scale. With agent orchestration, you:
No pre-built dashboards. No static transformations. Just an agent that understands the data and responds to questions. This is the model D23 enables through managed Apache Superset with AI-powered query generation and self-serve BI.
Private equity and venture capital firms manage dozens or hundreds of portfolio companies, each with its own data infrastructure. They need consistent KPI reporting, but each company reports metrics differently.
A DAG approach requires building a centralized data warehouse and ETL pipelines for each company—months of work per integration. An agent-based approach is faster: deploy an agent to each company's data environment, give it access to their systems, and let it autonomously discover metrics, reconcile definitions, and report back to the fund's central dashboard.
When metrics don't match, the agent investigates. When a new company joins the portfolio, the agent onboards it automatically. When a metric definition changes, the agent adapts.
A DAG can detect anomalies (a metric drops below a threshold), but it can't explain them. An agent can. When an agent detects an anomaly, it:
All of this happens automatically, in seconds, without a pre-written workflow.
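As a rough illustration of detection followed by investigation, the sketch below flags an anomaly with a z-score and then finds the segment that moved the most. Both `is_anomaly` and `largest_contributor` are hypothetical helpers, and the z-score threshold of 3.0 is an assumed default, not a recommendation from any specific tool.

```python
from statistics import mean, stdev


def is_anomaly(history, latest, z_threshold=3.0):
    """Flag `latest` if it sits more than `z_threshold` deviations from the mean."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > z_threshold


def largest_contributor(baseline, current):
    """Return the segment with the largest absolute change and its delta."""
    deltas = {seg: current.get(seg, 0) - baseline[seg] for seg in baseline}
    segment = max(deltas, key=lambda seg: abs(deltas[seg]))
    return segment, deltas[segment]
```

An agent would call `is_anomaly` on the top-level metric first, then run breakdown queries by dimension and pass the results to something like `largest_contributor` to decide where to dig next.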
Understanding the technical architecture clarifies why agents are better suited for analytics.
At its core, an agent runs a loop:
Observe state → Reason about goal → Choose action → Execute → Observe new state → Repeat until goal reached
In analytics, this might look like:
This loop is the opposite of a DAG. A DAG declares the entire path upfront; an agent discovers the path at runtime.
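The loop itself is small. In this sketch a rule-based `choose_action` stands in for the LLM, and the three tools are stubs that return canned values; every name here is illustrative. What matters is the shape: the sequence of actions is not written down anywhere, it falls out of the state at each step.

```python
def run_agent(goal_reached, choose_action, tools, state, max_steps=10):
    """Observe, reason, act, and repeat until the goal is met or steps run out."""
    trace = []
    for _ in range(max_steps):
        if goal_reached(state):              # observe
            break
        action = choose_action(state)        # reason about what to do next
        state = tools[action](state)         # execute and observe the new state
        trace.append(action)
    return state, trace


def choose_action(state):
    """Rule-based stand-in for an LLM policy."""
    if "tables" not in state:
        return "list_tables"
    if "sql" not in state:
        return "write_sql"
    return "run_query"


# Stub tools: each takes the current state and returns an updated copy.
TOOLS = {
    "list_tables": lambda s: {**s, "tables": ["orders"]},
    "write_sql": lambda s: {**s, "sql": "SELECT COUNT(*) FROM orders"},
    "run_query": lambda s: {**s, "result": 42},
}
```

The `max_steps` cap is a simple guard against an agent that never reaches its goal.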
For complex analytics workloads, multiple agents can coordinate. Research on AI orchestration platforms suggests that multi-agent systems can outperform single-agent systems on complex workflows. In analytics, you might have:
Each agent is specialized, and they communicate through a shared state and message passing. This is far more modular and resilient than a monolithic DAG.
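Shared state plus message passing can be sketched with a queue and a dictionary. The three agents below (catalog, SQL, formatter) are assumed roles with stubbed behavior, and `dispatch` is a hypothetical router, not the API of any orchestration framework.

```python
from collections import deque


def catalog_agent(msg, state):
    state["tables"] = ["orders"]                 # stubbed catalog discovery
    return {"to": "sql", "question": msg["question"]}


def sql_agent(msg, state):
    table = state["tables"][0]                   # reads what the catalog agent wrote
    state["sql"] = f"SELECT COUNT(*) FROM {table}"
    return {"to": "formatter", "rows": [(3,)]}   # stubbed query execution


def formatter_agent(msg, state):
    state["answer"] = f"{msg['rows'][0][0]} rows"
    return None                                  # no further messages to send


AGENTS = {"catalog": catalog_agent, "sql": sql_agent, "formatter": formatter_agent}


def dispatch(first_message):
    """Deliver messages until the inbox is empty; return state and the route taken."""
    state, inbox, route = {}, deque([first_message]), []
    while inbox:
        msg = inbox.popleft()
        route.append(msg["to"])
        reply = AGENTS[msg["to"]](msg, state)
        if reply is not None:
            inbox.append(reply)
    return state, route
```

Because each agent only reads messages and shared state, one of them can be swapped out or retried without touching the others.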
Agents interact with external systems through tools—defined APIs that the agent can call. In analytics, tools include:
query_warehouse(sql_query): Execute a SQL query.
estimate_cost(sql_query): Estimate query cost without running it.
list_tables(): Discover available tables.
get_table_schema(table_name): Retrieve column definitions.
materialize_view(query, name): Create a cached view for expensive queries.
The agent learns which tools are appropriate for different tasks and chains them together. A DAG, by contrast, has predetermined tool sequences.
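The tool names above can be backed by very little code. The sketch below assumes SQLite as a stand-in warehouse, so two simplifications apply: `estimate_cost` just counts `EXPLAIN QUERY PLAN` steps as a crude proxy, and `materialize_view` uses `CREATE TABLE ... AS` because SQLite has no materialized views. The `WarehouseTools` class is an illustrative wrapper.

```python
import sqlite3


class WarehouseTools:
    """Illustrative tool set for an agent, backed by a SQLite connection."""

    def __init__(self, conn):
        self.conn = conn

    def query_warehouse(self, sql_query):
        return self.conn.execute(sql_query).fetchall()

    def estimate_cost(self, sql_query):
        # Crude proxy: number of plan steps; the query itself is not executed.
        plan = self.conn.execute("EXPLAIN QUERY PLAN " + sql_query).fetchall()
        return len(plan)

    def list_tables(self):
        rows = self.conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
        return [row[0] for row in rows]

    def get_table_schema(self, table_name):
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        cols = self.conn.execute(f'PRAGMA table_info("{table_name}")').fetchall()
        return [(col[1], col[2]) for col in cols]

    def materialize_view(self, query, name):
        # Stand-in for a materialized view: persist the query result as a table.
        self.conn.execute(f'CREATE TABLE "{name}" AS {query}')
```

In production these methods would also need input validation, since table names and SQL here are interpolated or executed as given.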
Agent orchestration isn't a silver bullet. It introduces new challenges that you need to address.
DAGs are deterministic: given the same input, they produce the same output every time. Agents, especially those powered by LLMs, can be non-deterministic. An agent might generate different SQL for the same question on different runs.
To address this in production analytics:
Research on agentic AI systems highlights the importance of deterministic orchestration, especially in high-stakes applications. Analytics is high-stakes—incorrect metrics can drive wrong business decisions.
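One way to limit run-to-run variation, offered here as an assumption rather than a prescribed method, is to cache the SQL generated for each normalized question so that repeat questions reuse the same query. `SqlCache` and its `generate_sql` callback are hypothetical; the callback stands in for whatever LLM call produces the SQL.

```python
import hashlib


class SqlCache:
    """Reuse the first SQL generated for a question on every later request."""

    def __init__(self, generate_sql):
        self._generate = generate_sql  # stand-in for a non-deterministic LLM call
        self._cache = {}

    @staticmethod
    def _key(question):
        # Normalize case and whitespace so trivial variations share one entry.
        normalized = " ".join(question.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def sql_for(self, question):
        key = self._key(question)
        if key not in self._cache:
            self._cache[key] = self._generate(question)
        return self._cache[key]
```

This makes repeated questions reproducible, but it does not validate the SQL itself; a reviewed or tested query still has to be what gets cached in the first place.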
Agents can be expensive. An agent that explores multiple query approaches, calls the cost estimation tool repeatedly, or iterates on results can rack up API costs and compute charges quickly.
Strategies for cost control:
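One such strategy, sketched here under assumed names and limits, is a hard budget that every tool call must pass through. `ToolBudget`, `BudgetExceeded`, and `guarded_call` are illustrative; the per-call cost would come from your own pricing or cost-estimation logic.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a tool call would exceed the agent's budget."""


class ToolBudget:
    """Track calls and spend for one agent run, refusing calls past the limits."""

    def __init__(self, max_calls, max_cost):
        self.max_calls = max_calls
        self.max_cost = max_cost
        self.calls = 0
        self.spent = 0.0

    def charge(self, cost):
        if self.calls + 1 > self.max_calls:
            raise BudgetExceeded("call limit reached")
        if self.spent + cost > self.max_cost:
            raise BudgetExceeded("cost limit reached")
        self.calls += 1
        self.spent += cost


def guarded_call(budget, tool, cost, *args):
    """Charge the budget first, then invoke the tool only if the charge succeeded."""
    budget.charge(cost)
    return tool(*args)
```

Charging before the call means an over-budget request never reaches the warehouse or the LLM API.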
Agents add latency because they reason about what to do before doing it. For interactive analytics, this can be a problem. A user asks a question and expects results in seconds, not minutes.
To minimize latency:
Let's directly compare agent orchestration to the leading DAG-based alternatives for analytics.
Airflow is the dominant DAG orchestration tool. It's battle-tested, has a huge ecosystem, and scales to thousands of tasks.
For static, batch-oriented workflows, Airflow wins: predictability, auditability, resource planning.
For dynamic, AI-driven analytics, agent orchestration wins: adaptability, real-time reasoning, schema discovery.
In practice, many teams use both. Airflow handles the stable, batch ETL pipelines. Agent orchestration handles the dynamic, user-facing analytics layer. Tools like LangGraph provide stateful multi-agent orchestration that complements Airflow.
dbt is a declarative transformation tool. You define data transformations as SQL, and dbt orchestrates the execution.
dbt excels at: version control for analytics code, testing, documentation, lineage tracking.
dbt doesn't handle: dynamic schema discovery, real-time query optimization, natural language interfaces, multi-step reasoning.
Again, teams often use both. dbt for the transformation layer (turning raw data into clean analytics tables), agent orchestration for the query and reasoning layer (turning tables into answers).
Traditional BI tools are excellent for pre-built dashboards and static reports. But they're not agents. They don't reason, adapt, or explore autonomously.
When you layer agent orchestration on top of a BI tool—as D23 does with Apache Superset—you get the best of both worlds: beautiful visualizations and interactive dashboards, plus AI-driven query generation and autonomous exploration.
If you're considering agent orchestration for your analytics stack, here's how to start.
Agent orchestration isn't the answer for everything. Start with use cases where:
Good candidates:
Poor candidates:
Several frameworks exist for building agentic systems. Research on agent orchestration platforms identifies key players:
For analytics specifically, LangGraph is a mature option. It handles state management, tool use, and multi-agent coordination well.
If you're using a managed BI platform like D23's Apache Superset offering, the integration is straightforward. The platform provides APIs for query generation, catalog access, and result formatting. Your agent orchestration layer sits on top, using these APIs as tools.
If you're building from scratch, you need:
Wire these together through an orchestration framework, and you have an agent-based analytics system.
Don't try to build a fully autonomous, multi-agent system on day one. Start with a single agent handling one task—e.g., natural language to SQL translation. Get it working reliably, measure latency and cost, then expand.
Common expansion paths:
Each step adds complexity, but you're building on proven foundations.
The shift from declarative DAGs to agent orchestration mirrors earlier transitions in software. Imperative programming gave way to declarative approaches (SQL, configuration management) because declarative systems are easier to reason about. Now, as workloads become more dynamic and intelligent, agentic approaches are winning because they're more adaptive.
For analytics, this shift is already underway. Teams building embedded analytics, natural language interfaces, and autonomous insights are adopting agent orchestration. Teams managing traditional batch ETL are sticking with DAGs. Both are right for their use cases.
But if you're building analytics for the next generation of users—users who expect AI-driven insights, natural language interfaces, and adaptive systems—agent orchestration is the foundation you need. It's not about replacing DAGs; it's about building a layer on top that brings intelligence and adaptability to your analytics.
Platforms like D23 are designed for this world. They combine the stability and power of Apache Superset with modern AI integration, API-first architecture, and agent-ready orchestration. The result is analytics infrastructure that scales with your team, adapts to your data, and responds intelligently to your questions.
The declarative DAG era was necessary and valuable. But for AI-native analytics, agent orchestration is the better primitive. The data and engineering leaders who recognize this shift early will build more responsive, more intelligent, and more scalable analytics systems.
Declarative DAGs and agent orchestration are both valuable tools. The question isn't which is better in absolute terms—it's which is better for your specific use case.
Use DAGs for:
Use agent orchestration for:
For modern analytics—especially analytics embedded in products or serving dynamic user queries—agent orchestration is the better choice. It's faster to build, more flexible to adapt, and more intelligent in execution.
If you're evaluating analytics platforms, look for those that support agent orchestration natively. If you're building your own, invest in orchestration frameworks that make agentic systems easy to develop and operate. The future of analytics is agentic, and the sooner you adopt these patterns, the faster you'll move.