
Data Strategy · 18 Apr 2026

Multi-Agent Forecasting: Coordinating Time-Series Specialists

Learn how multi-agent forecasting systems coordinate specialized time-series models with AI routing for accurate predictions and reduced latency.

D23 Team

14 minute read

Understanding Multi-Agent Forecasting Architecture

Multi-agent forecasting represents a fundamental shift in how organizations approach time-series prediction. Instead of relying on a single monolithic model to handle all forecasting scenarios, a multi-agent system deploys multiple specialized forecasting agents—each optimized for specific data patterns, seasonality characteristics, or business domains—and uses an intelligent router or supervisor agent to coordinate their efforts.

The core insight is simple: no single forecasting algorithm excels at every problem. ARIMA (AutoRegressive Integrated Moving Average) performs well on series that are stationary, or become stationary after differencing, and that have clear autocorrelation structure. Prophet, developed by Meta, shines when you have strong seasonal patterns and holiday effects. Machine learning-based approaches like XGBoost or neural networks capture complex nonlinear relationships. By deploying these as independent agents and routing each prediction task to the most appropriate specialist, you can gain both accuracy and operational resilience.

This architecture mirrors real-world organizational structures. A data science team doesn't assign every forecasting problem to one person; instead, they route demand forecasting to the supply chain specialist, financial projections to the FP&A analyst, and infrastructure capacity planning to the platform engineer. Multi-agent systems codify this specialization into software.

At D23, we've seen this pattern emerge consistently across mid-market and scale-up organizations building embedded analytics and self-serve BI platforms. When teams integrate AI-powered forecasting into dashboards or products, the ability to route queries intelligently—rather than force-fitting all time-series data through one model—becomes the difference between a system that scales and one that fails on edge cases.

The Routing Supervisor: Intelligence at the Center

The routing supervisor (or orchestrator) is the decision-making brain of a multi-agent forecasting system. Its job is to examine incoming data and determine which specialist agent should handle the prediction task. This routing decision directly impacts accuracy, latency, and computational cost.

A well-designed supervisor typically evaluates:

Temporal characteristics: Does the series exhibit strong seasonality, trend, or stationarity? A series with clear 52-week seasonality (retail sales) might route to Prophet, while a white-noise-like series routes to exponential smoothing.
Data volume and frequency: High-frequency data (minute-level stock prices) may route to neural network agents, while sparse quarterly data routes to statistical models that don't require massive training sets.
Forecast horizon: Short-term predictions (1-7 days) often benefit from autoregressive models, while long-term forecasts (1-2 years) may route to trend-based or causal models.
Domain context: A supervisor with access to metadata knows that "revenue" often has different patterns than "website traffic," and can apply domain-specific routing rules.
Historical performance: If you've previously forecasted this exact series, the supervisor can route to the agent that performed best last time.

Supervisors can be implemented as rule-based systems (if ACF > 0.7, route to ARIMA) or as learned models themselves. The most sophisticated implementations use large language models (LLMs) as supervisors. An LLM can read a time-series description in natural language—"quarterly SaaS revenue with increasing growth rate and a Q4 spike"—and route intelligently without explicit rules.

Google's approach to multi-agent forecasting, detailed in their multi-agent system for superior business forecasting, demonstrates this principle at scale. They combine data agents that prepare and analyze series characteristics with prediction agents that execute forecasts, coordinated by a supervisor that learns which agent performs best for each data pattern.

Specialist Agents: ARIMA, Prophet, and Machine Learning

Each specialist agent in a multi-agent forecasting system is a focused, optimized implementation of a forecasting algorithm. Let's examine the three most common specialists and when they excel.

ARIMA: The Statistical Foundation

AutoRegressive Integrated Moving Average (ARIMA) has been the workhorse of time-series forecasting for decades. It models a series as a linear combination of its own past values (the autoregressive component), differences of the series to achieve stationarity (the integrated component), and past forecast errors (the moving average component).

ARIMA excels when:

Your series is stationary or becomes stationary after differencing
You have clear autocorrelation structure that you can exploit
You need interpretability (the coefficients tell you how much past values influence the future)
You have limited data (ARIMA is parameter-efficient)
You're forecasting short to medium horizons with stable patterns

ARIMA struggles with:

Strong seasonality (though SARIMA adds seasonal components)
Exogenous variables or causal relationships
Nonlinear patterns
Multiple structural breaks or regime changes

In a multi-agent system, ARIMA typically handles baseline forecasts for well-behaved, stationary series. It's fast to train, interpretable, and provides a reliable fallback.
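
To make the autoregressive and integrated components concrete, here is a minimal sketch of an ARIMA(1,1,0)-style forecast in plain Python. It is an illustration only: it omits the moving average term and the intercept, estimates the single coefficient by least squares, and the function names are hypothetical. A production agent would more likely wrap an established library.

```python
def difference(series):
    """First difference: the 'integrated' step with d=1."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(values):
    """Least-squares estimate of phi in x[t] = phi * x[t-1] + noise (no intercept)."""
    num = sum(prev * cur for prev, cur in zip(values, values[1:]))
    den = sum(prev * prev for prev in values[:-1])
    return num / den if den else 0.0


def forecast_arima_110(series, horizon):
    """Forecast with AR(1) on first differences, then undo the differencing."""
    diffs = difference(series)
    phi = fit_ar1(diffs)
    last_level, last_diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(horizon):
        last_diff = phi * last_diff          # autoregressive step on the differences
        last_level = last_level + last_diff  # integrate back to the original scale
        forecasts.append(last_level)
    return forecasts


history = [10.0, 12.0, 14.0, 16.0, 18.0]
next_three = forecast_arima_110(history, 3)
```

The single fitted coefficient is what makes this family of models interpretable: it states directly how much of the last change carries into the next one.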

Prophet: Seasonality and Holidays at Scale

Prophet, released by Meta (formerly Facebook), was specifically designed for business time-series forecasting at scale. It decomposes a series into trend, seasonality, and holiday effects, making it particularly effective for retail, e-commerce, and SaaS metrics.

Prophet excels when:

Your series has strong, regular seasonal patterns (daily, weekly, yearly)
You have holidays or special events that create predictable spikes
You need to forecast multiple related series consistently
You have some missing data or outliers (Prophet is robust to these)
Your stakeholders need to understand and adjust forecasts manually

Prophet struggles with:

Series without clear seasonality
Very short histories (Prophet needs at least a few seasonal cycles)
Capturing complex nonlinear relationships
Handling multiple interacting causal factors

In a multi-agent system, Prophet is your go-to agent for consumer-facing metrics: website traffic, app downloads, subscription churn, and seasonal product demand. It's also valuable when business users need to understand why a forecast looks the way it does.

Machine Learning Agents: Capturing Complexity

Machine learning approaches—XGBoost, LightGBM, neural networks, and ensemble methods—excel at capturing complex, nonlinear patterns that statistical models miss. They can incorporate exogenous variables (like marketing spend, weather, or competitor pricing) directly into the model.

ML agents excel when:

Your series has nonlinear relationships or regime changes
You have exogenous features that causally influence the target
You have large amounts of historical data to train on
You're forecasting derived metrics that aggregate multiple signals
You need to capture interactions between multiple factors

ML agents struggle with:

Extrapolation far beyond the training data distribution
Interpretability (why did the model make that prediction?)
Overfitting to noise in small datasets
Cold-start problems (new products or series with little history)

In a multi-agent system, ML agents handle complex, multivariate forecasting. They're particularly valuable when you have rich feature sets: e-commerce platforms routing to ML agents for product-level demand forecasting, SaaS companies using ML agents for logo churn with account health signals, and financial services using ML agents for credit risk or fraud prediction.

Routing Logic and Decision Frameworks

The effectiveness of a multi-agent forecasting system depends heavily on its routing logic. Poor routing, meaning a series sent to the wrong agent, undermines the benefits of specialization.

Statistical Feature Extraction

A practical routing supervisor begins by extracting statistical features from the incoming time series:

Autocorrelation Function (ACF) and Partial ACF: Measure how strongly a series correlates with its own past values. High autocorrelation at low lags suggests ARIMA has structure to exploit.
Seasonality strength: Computed via STL decomposition or spectral analysis. Strong seasonality (above 0.7 on a 0-to-1 strength scale) suggests Prophet.
Trend strength: Linear trend component extracted via decomposition. Strong uptrend or downtrend may suggest ML models that can capture acceleration.
Stationarity tests: Augmented Dickey-Fuller (ADF) test. Non-stationary series may need differencing (ARIMA's domain) or trend-aware models.
Entropy and complexity: Measures of randomness in the series. High entropy suggests you need flexible models; low entropy suggests simple models suffice.
Missing data percentage: High missingness may route to Prophet (which handles gaps) rather than ARIMA.
Forecast horizon: Longer horizons may route to trend-based models; shorter horizons to autoregressive models.

These features feed a decision tree or classifier that routes each series to the appropriate agent. A simple rule might be:

If seasonality_strength > 0.7 AND has_exogenous_features == False:
  Route to Prophet
Else if ACF[1] > 0.8 AND stationarity == True:
  Route to ARIMA
Else:
  Route to ML_ensemble
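
The rule above can be sketched as runnable Python. This is a rough illustration under stated assumptions: the seasonality measure is a crude stand-in for an STL-based strength score (it ignores trend), the stationarity flag is passed in rather than computed with an ADF test, and all function names are hypothetical.

```python
from statistics import mean, pvariance


def acf(series, lag):
    """Sample autocorrelation of the series at the given lag."""
    mu = mean(series)
    den = sum((x - mu) ** 2 for x in series)
    if den == 0:
        return 0.0
    num = sum((series[t] - mu) * (series[t - lag] - mu)
              for t in range(lag, len(series)))
    return num / den


def seasonality_strength(series, period):
    """Share of variance explained by per-position seasonal means, in [0, 1]."""
    total = pvariance(series)
    if total == 0:
        return 0.0
    seasonal_means = [mean(series[i::period]) for i in range(period)]
    residual = [x - seasonal_means[t % period] for t, x in enumerate(series)]
    return max(0.0, 1.0 - pvariance(residual) / total)


def route(series, period, has_exogenous_features=False, is_stationary=True):
    """Apply the thresholds from the rule above and return an agent name."""
    if seasonality_strength(series, period) > 0.7 and not has_exogenous_features:
        return "prophet"
    if acf(series, 1) > 0.8 and is_stationary:
        return "arima"
    return "ml_ensemble"


weekly = [1, 2, 3, 4, 5, 6, 20] * 8   # a perfectly repeating weekly pattern
agent = route(weekly, period=7)        # routes to "prophet"
```

Keeping feature extraction separate from the routing function makes it easier to log the features alongside each decision, which helps when you later audit why a series went to a given agent.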

LLM-Based Routing

More sophisticated systems use language models as supervisors. An LLM can read metadata about a time series—"quarterly revenue for a SaaS company with strong Q4 seasonality and accelerating growth"—and reason about which agent to use without explicit rules.

Research on agentic frameworks like MoiraiAgent, which integrates contextual signals and uses LLM for expert selection in time-series forecasting, demonstrates that LLMs can effectively route between specialized forecasting agents. The LLM acts as a reasoning layer, weighing the characteristics of the data against the known strengths and weaknesses of each agent.

LLM-based routing offers several advantages:

Flexibility: You can add new agents or change routing rules without rewriting code; you just update the LLM's instructions.
Interpretability: The LLM can explain its routing decision in natural language.
Generalization: The LLM can handle novel data patterns it hasn't seen before by reasoning from first principles.
Integration with dashboards: An LLM supervisor can be queried via natural language, making it suitable for embedding in self-serve BI platforms like D23.

Real-World Implementation Patterns

Ensemble Aggregation

Once each specialist agent generates a forecast, the supervisor must decide how to combine them. Simple averaging often underperforms; instead, practical systems use weighted averaging or meta-learning.

Weighted averaging assigns higher weights to agents with proven track records. If Prophet has historically forecasted retail revenue with 15% MAPE (Mean Absolute Percentage Error) and ARIMA with 22% MAPE, the final forecast might weight Prophet at roughly 60% and ARIMA at roughly 40%.
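
One common way to arrive at weights like these is inverse-error weighting, where each agent's weight is proportional to one over its historical MAPE. The article does not say which scheme produced the split in the example, so treat the following as one plausible sketch with hypothetical function names.

```python
def inverse_error_weights(mape_by_agent):
    """Weight each agent by 1 / MAPE, normalized so the weights sum to 1."""
    inverse = {name: 1.0 / mape for name, mape in mape_by_agent.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def combine(forecasts, weights):
    """Weighted average of one point forecast per agent."""
    return sum(weights[name] * value for name, value in forecasts.items())


weights = inverse_error_weights({"prophet": 0.15, "arima": 0.22})
# weights are roughly 0.595 for prophet and 0.405 for arima
blended = combine({"prophet": 100.0, "arima": 110.0}, weights)
```

With MAPEs of 15% and 22%, this scheme lands close to a 60/40 split. The formula divides by the error, so a zero MAPE needs to be floored before use.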

Meta-learning trains a secondary model to learn optimal weights for each agent based on the characteristics of the current series. This is more sophisticated but requires historical forecast accuracy data.

Confidence intervals from each agent can be combined using Bayesian methods. If one agent produces a very narrow interval (high confidence) and another produces a wide interval (low confidence), the combination gives the confident agent more weight, so the final interval sits closer to its forecast while still reflecting the other agent's uncertainty.
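
As a minimal sketch of one such combination, the code below uses precision weighting. It assumes each agent's forecast error is Gaussian and that agents' errors are independent. The independence assumption is optimistic when agents are trained on the same history, so the combined interval will tend to be too narrow in practice. The function name is hypothetical.

```python
from statistics import NormalDist


def combine_intervals(intervals, confidence=0.95):
    """Precision-weighted combination of symmetric (lower, upper) intervals.

    All intervals are assumed to be stated at the same confidence level
    and to have non-zero width.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    precision_sum = 0.0
    weighted_mean = 0.0
    for lower, upper in intervals:
        center = (lower + upper) / 2
        sigma = (upper - lower) / (2 * z)   # implied standard deviation
        precision = 1.0 / sigma ** 2
        precision_sum += precision
        weighted_mean += precision * center
    center = weighted_mean / precision_sum
    sigma = (1.0 / precision_sum) ** 0.5
    return center - z * sigma, center + z * sigma


# A confident agent (90 to 110) and an uncertain one (80 to 120)
lower, upper = combine_intervals([(90.0, 110.0), (80.0, 120.0)])
```

The narrow interval dominates: the combined half-width is about 8.9, tighter than the 10 of the confident agent alone.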

Cold-Start and Sparse Data Handling

New products, new markets, or sparse data present a challenge: you don't have enough history to train complex models reliably. A well-designed multi-agent system routes these cases strategically.

When data is sparse (< 50 observations):

Route to ARIMA, which is parameter-efficient
Route to Prophet if seasonality is known from domain knowledge
Avoid ML agents unless you have exogenous features
Consider external data: if you're forecasting a new product, use similar product history

When data is completely new (< 10 observations):

Use domain-based priors (e.g., "new SaaS products typically grow 10-15% month-over-month")
Route to statistical models that can incorporate prior beliefs
Combine with judgmental forecasts from domain experts
Plan to retrain as more data arrives
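
These guidelines can be expressed as a small pre-routing check that runs before any statistical feature extraction. The sketch below uses the thresholds from the lists above. The order in which the sparse-data options are tried, and the returned labels, are assumptions made for illustration.

```python
def cold_start_route(n_observations, known_seasonality=False,
                     has_exogenous_features=False):
    """Pick a strategy for short histories before normal routing applies."""
    if n_observations < 10:
        # Too little history to fit anything: rely on priors and expert judgment
        return "prior_based"
    if n_observations < 50:
        if known_seasonality:
            return "prophet"
        if has_exogenous_features:
            return "ml_ensemble"
        return "arima"
    # Enough history: hand off to the regular feature-based router
    return "standard_routing"


strategy = cold_start_route(n_observations=24, known_seasonality=True)
```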

Incremental Learning and Adaptation

Time-series forecasting is never "done." As new data arrives, models degrade, and agents must adapt. A production multi-agent system includes mechanisms for incremental learning:

Periodic retraining: Retrain all agents weekly, daily, or in real-time depending on data velocity and stability.
Performance monitoring: Track each agent's accuracy on recent data. If Prophet's MAPE rises from 15% to 25%, investigate why (regime change? missing seasonality adjustment?).
Adaptive routing: Adjust routing weights based on recent performance, not just historical averages.
Anomaly detection: Flag when a series exhibits unusual behavior that might require manual review or special handling.
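
Performance monitoring and adaptive routing can share one data structure: a rolling window of recent errors per agent. The following is a minimal sketch. The class name, the window size, and the choice of MAPE as the score are all assumptions.

```python
from collections import defaultdict, deque


class AgentScoreboard:
    """Keeps the most recent absolute percentage errors for each agent."""

    def __init__(self, window=30):
        self._errors = defaultdict(lambda: deque(maxlen=window))

    def record(self, agent, actual, predicted):
        # Skip zero actuals, where percentage error is undefined
        if actual != 0:
            self._errors[agent].append(abs((actual - predicted) / actual))

    def recent_mape(self, agent):
        errors = self._errors.get(agent, ())
        return sum(errors) / len(errors) if errors else float("inf")

    def best_agent(self):
        """Agent with the lowest recent MAPE, or None if nothing is recorded."""
        return min(self._errors, key=self.recent_mape, default=None)


board = AgentScoreboard(window=2)
board.record("prophet", actual=100, predicted=90)
board.record("prophet", actual=100, predicted=80)
board.record("prophet", actual=100, predicted=50)   # oldest error drops out
board.record("arima", actual=100, predicted=95)
preferred = board.best_agent()
```

Because the window is bounded, an agent that was accurate months ago but has degraded recently loses its routing priority quickly.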

Integration with Analytics Platforms

Multi-agent forecasting systems are most valuable when integrated into analytics and BI platforms where business users can access predictions without building models themselves. This is where platforms like D23, built on Apache Superset, become critical.

Embedding Forecasts in Dashboards

A self-serve BI platform can expose multi-agent forecasting as a dashboard feature. Users select a metric, choose a forecast horizon, and the platform automatically:

Extracts the time series from the data warehouse
Routes to the appropriate specialist agent
Generates a forecast with confidence intervals
Displays the result alongside historical data

This requires tight integration between the forecasting system and the BI platform's query engine. The D23 platform architecture supports this through API-first design and MCP (Model Context Protocol) server integration, allowing forecasting agents to be called directly from dashboard queries.

Text-to-SQL and Natural Language Routing

Advanced platforms use text-to-SQL capabilities combined with multi-agent routing. A user asks: "Forecast next quarter's revenue with uncertainty bounds." The system:

Parses the request to identify the metric (revenue) and horizon (next quarter)
Generates SQL to extract historical revenue data
Analyzes the time series to route to the appropriate agent
Executes the forecast
Returns results in natural language and visualization

This pattern aligns with research on TimeSeriesScientist, an LLM-driven agentic framework for time-series forecasting with tool-augmented reasoning. By combining language understanding with forecasting agents, platforms enable non-technical users to access sophisticated predictions.

API-First Architecture

For embedded analytics use cases—forecasts embedded directly in product UIs—a multi-agent system must expose forecasting as an API. A SaaS platform might call:

POST /api/forecast
{
  "metric": "churn_rate",
  "series": [0.02, 0.019, 0.021, ...],
  "horizon": 30,
  "confidence_level": 0.95
}

The API routes internally to the appropriate agent, returns a forecast with confidence intervals, and logs performance for future routing optimization. This enables embedded analytics—forecasts shown directly to customers in your product—without exposing the complexity of the underlying system.
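
On the server side, a handler for a request like this might look roughly as follows. The response shape, the validation rules, and the placeholder "mean_baseline" agent are hypothetical and chosen only to keep the sketch self-contained. A real handler would call the routing supervisor in place of the baseline.

```python
from statistics import NormalDist, mean, stdev


def handle_forecast(payload):
    """Validate a forecast request and return point forecasts with intervals."""
    series = payload["series"]
    horizon = payload["horizon"]
    confidence = payload.get("confidence_level", 0.95)
    if len(series) < 2 or horizon < 1 or not 0 < confidence < 1:
        raise ValueError("invalid forecast request")

    # Placeholder agent: a flat forecast at the historical mean
    point = mean(series)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(series)
    return {
        "metric": payload["metric"],
        "agent": "mean_baseline",
        "forecast": [point] * horizon,
        "lower": [point - half_width] * horizon,
        "upper": [point + half_width] * horizon,
    }


# Sample request with a short, made-up series
response = handle_forecast({
    "metric": "churn_rate",
    "series": [0.02, 0.019, 0.021],
    "horizon": 30,
    "confidence_level": 0.95,
})
```

Returning the name of the agent that produced the forecast gives callers and logs a record of the routing decision without exposing how it was made.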

Advanced Techniques: Foundation Models and Multimodal Agents

The frontier of multi-agent forecasting incorporates foundation models—large, pre-trained models that have learned patterns across millions of time series.

TimesFM and Foundation Models

TimesFM, Google's decoder-only foundation model for time-series forecasting, represents a new class of specialist agent. Unlike ARIMA or Prophet, which are trained on your specific data, foundation models are pre-trained on diverse time-series data and can generalize to new domains with minimal fine-tuning.

In a multi-agent system, TimesFM (or a similar pre-trained model such as Chronos) serves as a universal "catch-all" agent that can handle most time series reasonably well. The routing logic becomes:

Route domain-specific patterns (strong seasonality, known holidays) to specialized agents (Prophet)
Route stationary, autocorrelated series to ARIMA
Route everything else to the foundation model

Foundation models excel at:

Transfer learning: They've already learned temporal patterns from diverse data
Few-shot forecasting: They can forecast series with minimal history
Multivariate forecasting: Many can handle multiple related series simultaneously
Robustness: They're less prone to overfitting than ML models trained on small datasets

Multimodal Agents with Contextual Signals

The most sophisticated multi-agent systems incorporate contextual signals beyond the time series itself. Research on FinVision, a multi-modal multi-agent framework for financial prediction, demonstrates this approach.

Contextual signals might include:

Structured metadata: Product category, geography, customer segment
News and events: Earnings announcements, competitor launches, regulatory changes
Causal features: Marketing spend, pricing changes, inventory levels
Market indicators: Macro conditions, competitor activity, seasonal indices

A multimodal agent system routes not just on time-series characteristics but on the full context. For example:

"Revenue for a new product launch" → Route to ML agent with launch event features
"Steady-state subscription revenue with no major changes" → Route to Prophet
"Volatile stock price with breaking news" → Route to foundation model with news embeddings

Measuring and Optimizing Multi-Agent Performance

A multi-agent system is only as good as its accuracy. Measuring performance and continuously optimizing routing and agent selection is essential.

Accuracy Metrics

Standard forecasting accuracy metrics include:

MAPE (Mean Absolute Percentage Error): Average absolute error as a percentage of the actual value, useful for comparing across different scales. A MAPE of 15% means forecasts are off by 15% on average. It is undefined when an actual value is zero and unstable when actuals are close to zero.
RMSE (Root Mean Squared Error): Penalizes large errors more heavily than small errors. Useful when outliers are costly.
MAE (Mean Absolute Error): Simple average absolute error, less sensitive to outliers than RMSE.
Coverage of confidence intervals: What percentage of actual values fall within the predicted confidence interval? Should be close to the stated confidence level (e.g., 95%).
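
For reference, the four metrics above can be computed in a few lines of plain Python. This is a minimal sketch with hypothetical function names. The MAPE function skips zero actuals, which is one of several possible conventions.

```python
from math import sqrt


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction, skipping zero actuals."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    return sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


def rmse(actual, predicted):
    """Root mean squared error."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def coverage(actual, lower, upper):
    """Fraction of actual values that fall inside their predicted interval."""
    hits = sum(lo <= a <= hi for a, lo, hi in zip(actual, lower, upper))
    return hits / len(actual)


actual = [100, 200, 400]
predicted = [110, 180, 400]
scores = (mape(actual, predicted), rmse(actual, predicted), mae(actual, predicted))
```

On this sample, RMSE (about 12.9) exceeds MAE (10) because squaring gives the single 20-unit miss more influence.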

For multi-agent systems, also track:

Per-agent accuracy: How does each specialist perform on the series it was routed to?
Routing accuracy: Did we route to the best agent, or would a different agent have performed better?
Ensemble benefit: Does combining agents beat the best single agent?

Backtesting and Walk-Forward Validation

Proper evaluation requires walk-forward validation: simulate real-world usage by forecasting historical periods, comparing predictions to actual outcomes, and measuring accuracy.

A robust backtesting process:

Split data: Use the first 80% for training, last 20% for testing
Walk forward: For each test period, train on all data up to that point, forecast the next period, observe the actual value
Track routing decisions: Log which agent was routed to for each forecast
Compute metrics: Calculate MAPE, RMSE, coverage for each agent and the ensemble
Analyze failures: When accuracy drops, investigate why. Did the series characteristics change? Did a new agent perform better?
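
The split and walk-forward steps can be sketched as a short loop. Here the forecaster is any function that takes the history so far and returns a one-step-ahead prediction. The naive last-value forecaster is a stand-in for a real agent, and the names are hypothetical.

```python
def walk_forward(series, forecaster, train_fraction=0.8):
    """One-step-ahead walk-forward validation over the final part of a series.

    Returns a list of (index, prediction, actual) tuples.
    """
    start = int(len(series) * train_fraction)
    results = []
    for t in range(start, len(series)):
        prediction = forecaster(series[:t])   # only data available before time t
        results.append((t, prediction, series[t]))
    return results


def naive_last_value(history):
    """Baseline forecaster: repeat the most recent observation."""
    return history[-1]


series = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
results = walk_forward(series, naive_last_value)
# results == [(8, 17, 18), (9, 18, 19)]
```

The key property is that the forecaster never sees the value it is asked to predict. Slicing the history at each step enforces that, at the cost of refitting on every iteration.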

Continuous Monitoring in Production

Once deployed, monitor forecasting performance continuously:

Automated alerts: If a metric's forecast error exceeds a threshold, alert the data team
Drift detection: Monitor whether time-series characteristics are changing (e.g., seasonality weakening, trend accelerating)
Agent performance tracking: Log accuracy for each agent on recent forecasts
Retraining triggers: Automatically retrain agents when performance degrades

Practical Considerations and Pitfalls

Computational Cost vs. Accuracy Tradeoff

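Running multiple agents and aggregating forecasts costs more than a single model. As rough, illustrative orders of magnitude for a production system (actual figures depend on series length, hardware, and implementation):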

ARIMA: ~10ms inference time, minimal memory
Prophet: ~50-100ms inference time, moderate memory
ML ensemble: ~100-500ms inference time, significant memory
Foundation model: ~500ms-2s inference time, high memory

For a dashboard with 1000 metrics forecasted daily, this compounds. A practical system might:

Use ARIMA as the default for all series
Route only "important" metrics (top 10% by business impact) to more expensive agents
Cache forecasts for stable series, recompute only when data changes significantly
Use approximate routing (sample 10% of series to determine agent selection, apply to all similar series)
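
To see how tiering changes the daily bill, here is a back-of-the-envelope calculation. It uses the upper ends of the latency ranges quoted above and assumes forecasts run sequentially on a single worker. The function name and the two-tier setup are simplifications for illustration.

```python
# Upper ends of the per-forecast latency ranges quoted above, in seconds
LATENCY_SECONDS = {"arima": 0.010, "foundation": 2.0}


def daily_cost_seconds(n_metrics, expensive_share,
                       cheap="arima", expensive="foundation"):
    """Total sequential inference time when a share of metrics uses the costly agent."""
    n_expensive = round(n_metrics * expensive_share)
    n_cheap = n_metrics - n_expensive
    return n_cheap * LATENCY_SECONDS[cheap] + n_expensive * LATENCY_SECONDS[expensive]


all_cheap = daily_cost_seconds(1000, 0.0)      # about 10 seconds
tiered = daily_cost_seconds(1000, 0.10)        # about 209 seconds
all_expensive = daily_cost_seconds(1000, 1.0)  # 2000 seconds
```

Under these assumptions, routing only the top 10% of metrics to the foundation model costs about a tenth of running it on everything.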

Interpretability and Explainability

When forecasts drive business decisions (inventory, hiring, budgeting), stakeholders need to understand why. A multi-agent system can provide interpretability at multiple levels:

Agent level: Why did we route to Prophet? Because the series has strong weekly seasonality.
Model level: Prophet decomposes the forecast into trend (growing 5% monthly) and seasonality (20% boost on Fridays).
Ensemble level: The final forecast is 60% Prophet, 40% ARIMA, because Prophet has historically performed better on this metric.

LLM-based supervisors excel here because they can explain reasoning in natural language.

Handling Structural Breaks and Regime Changes

Time series often experience structural breaks: a product pivot, market disruption, or business model change that fundamentally alters the pattern. Multi-agent systems can detect and handle these:

Changepoint detection: Algorithms like PELT or binary segmentation identify when the series behavior changes
Adaptive routing: After a changepoint, retrain all agents on post-break data and re-evaluate routing
Regime-aware agents: Some agents model trend changes explicitly; Prophet, for example, fits changepoints in its trend component
Manual intervention: Alert stakeholders to review forecasts after detected changes
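
As an illustration of changepoint detection, the sketch below performs the core step of binary segmentation: it finds the single split that most reduces the sum of squared errors around segment means. It only detects shifts in the mean, and the function names are hypothetical. Full binary segmentation would recurse on each segment and stop when the gain falls below a penalty threshold.

```python
def sse(values):
    """Sum of squared deviations from the mean."""
    if not values:
        return 0.0
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values)


def best_split(series, min_size=2):
    """Return (index, gain) for the split that most reduces total SSE.

    The index is the first position of the second segment. Returns
    (None, 0.0) when no split improves on the unsplit series.
    """
    base = sse(series)
    best_index, best_gain = None, 0.0
    for i in range(min_size, len(series) - min_size + 1):
        gain = base - (sse(series[:i]) + sse(series[i:]))
        if gain > best_gain:
            best_index, best_gain = i, gain
    return best_index, best_gain


series = [1, 1, 1, 1, 5, 5, 5, 5]
index, gain = best_split(series)   # index 4, where the level shifts
```

Once a break is found, retraining agents on `series[index:]` is the simplest form of the adaptive routing step described above.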

Conclusion: The Future of Forecasting

Multi-agent forecasting represents the maturation of time-series prediction from a specialized skill to an operational capability. By deploying specialized agents and routing intelligently, organizations can achieve accuracy that a single model often cannot match, while maintaining interpretability and operational resilience.

The convergence of three trends accelerates adoption:

Foundation models and LLMs: Pre-trained models that generalize across domains, reducing the need for extensive domain-specific tuning
Agentic AI frameworks: Tools and libraries that make building and coordinating multi-agent systems straightforward
Integration with analytics platforms: BI and analytics platforms that embed forecasting as a native capability

Organizations building self-serve BI systems—whether through platforms like D23 or custom implementations—can now expose forecasting as a first-class citizen. Users ask questions in natural language, the system routes to the appropriate agent, and predictions appear in dashboards with confidence intervals and explanations.

For data leaders evaluating managed analytics solutions, the ability to deploy multi-agent forecasting without building custom infrastructure is a significant advantage. Instead of hiring specialists to implement ARIMA, Prophet, and ML models separately, you can configure a system that coordinates them automatically.

The future of forecasting isn't a single powerful model; it's an ensemble of specialists, intelligently routed, continuously learning, and deeply integrated into the analytics workflows where decisions are made.