Learn agent memory patterns for persistent context in long-running workflows. Explore storage, retrieval, and architectural best practices for production AI systems.
When you deploy an AI agent to solve complex, multi-step problems—whether it's querying your data warehouse, generating dashboards, or orchestrating analytics workflows—the agent needs to remember what it has already learned, what decisions it has made, and what context it discovered along the way. Without persistent memory, each step starts from scratch. The agent forgets the user's intent, loses track of intermediate results, and cannot build on previous discoveries.
This is where agent memory patterns become critical. Unlike traditional stateless APIs that process a single request and discard context, long-running agents operate across multiple interactions, sometimes spanning hours or days. They need to maintain a coherent understanding of the problem domain, user preferences, data schemas, and the results of prior computations.
Think of it like a data analyst working on a complex project. On day one, they explore the schema, run test queries, and document findings. On day two, they build on that knowledge to create dashboards. Without notes, they'd have to re-explore everything. Agent memory is that notebook—a persistent store that lets the agent pick up where it left off, avoid redundant work, and make better decisions because it understands the full context.
Research from the AI agent community, including work on "AI Agents Need Memory Control Over More Context," demonstrates that bounded, controllable memory systems are essential for agents operating beyond the limits of a single context window. Modern approaches like "OpenSearch as an Agentic Memory Solution" show how to architect memory systems that scale with agent complexity and interaction volume.
Large language models (LLMs) that power modern AI agents operate within fixed context windows, typically 4K, 8K, 32K, 100K, or 200K tokens depending on the model. A token is roughly a word or a fragment of one. Even with the largest models, a long-running workflow can accumulate far more context than fits in the context window at once.
Consider a practical scenario: An AI agent embedded in your analytics platform needs to:
If you try to stuff all of this into the context window alongside the current user request, you'll quickly hit the limit. The agent will either lose critical information, respond slowly due to token overhead, or fail entirely.
Persistent memory solves this by externalizing context. Instead of keeping everything in the LLM's context, you store structured information in a database or vector store, and the agent retrieves only what it needs for the current task. This pattern is discussed extensively in "Building Effective AI Agents: Architecture Patterns and Implementation Frameworks," which outlines how to manage state beyond context windows in dynamic workflows.
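As a rough illustration of "retrieve only what it needs," the sketch below selects stored items that share words with the request and stops when a token budget is reached. The `MemoryItem` type, the `select_context` helper, and the one-token-per-word estimate are assumptions made for this example; a real system would use the model's tokenizer and a proper retrieval backend.

```python
from dataclasses import dataclass


@dataclass
class MemoryItem:
    key: str
    text: str


def estimate_tokens(text):
    # Crude assumption: one token per whitespace-separated word.
    return len(text.split())


def select_context(items, query, budget):
    """Pick the items most related to the query without exceeding the budget."""
    query_words = set(query.lower().split())
    scored = []
    for item in items:
        overlap = len(query_words & set(item.text.lower().split()))
        if overlap:  # ignore items with nothing in common with the request
            scored.append((overlap, item))
    scored.sort(key=lambda pair: pair[0], reverse=True)

    selected, used = [], 0
    for _, item in scored:
        cost = estimate_tokens(item.text)
        if used + cost <= budget:
            selected.append(item)
            used += cost
    return selected


store = [
    MemoryItem("schema.revenue", "revenue table columns region quarter amount"),
    MemoryItem("schema.users", "users table columns id email signup_date"),
    MemoryItem("note.q4", "Q4 revenue uses fiscal quarter boundaries"),
]
context = select_context(store, "revenue by region", budget=8)
print([item.key for item in context])
```

Everything else stays in the external store and costs nothing in the prompt until a later request needs it.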
Agent memory is not monolithic. Different types of information require different storage and retrieval strategies. Understanding these categories helps you design systems that are both performant and maintainable.
Working memory is ephemeral context used within a single conversation or task. It includes:
Working memory typically lives in the LLM's context window or in a short-lived session store (Redis, in-memory cache, or a database with a TTL). It's fast to access, small enough to fit in context, and discarded when the task completes or the session ends.
Example: An agent is generating a dashboard. The user asks, "Show me revenue by region for Q4." The agent retrieves the revenue table schema (stored elsewhere), notes that it needs to filter by quarter, and keeps the current request in working memory while it constructs the SQL. Once the dashboard is generated, that working memory is no longer needed.
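A short-lived session store can be approximated in a few lines. The sketch below is an in-process stand-in, not Redis; the `SessionStore` class and its injectable clock are hypothetical names chosen so that expiry can be shown without sleeping.

```python
import time


class SessionStore:
    """Tiny in-process stand-in for a session store with a per-key TTL."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._data = {}

    def set(self, key, value):
        self._data[key] = (value, self._clock() + self._ttl)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazily evict expired entries on read
            return default
        return value


# A fake clock keeps the example deterministic.
now = [0.0]
session = SessionStore(ttl_seconds=60, clock=lambda: now[0])
session.set("current_request", "revenue by region for Q4")
fresh = session.get("current_request")

now[0] = 61.0  # simulate the session going idle past the TTL
expired = session.get("current_request")
```

After the TTL passes, the request is simply gone, which matches the idea that working memory is discarded once the task or session ends.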
Episodic memory captures the sequence of events within a session—what the user asked, what the agent did, what the results were. This is essential for:
Episodic memory is typically stored in a database with a session or conversation ID. It persists longer than working memory (days or weeks) and is searchable. Many teams use PostgreSQL, MongoDB, or specialized conversation stores for this.
Example: A user asks the agent to create a KPI dashboard. The agent runs three queries, encounters an error on the second, corrects it, and succeeds. All of this—the queries, the error, the correction—is logged in episodic memory. If the user returns a week later and asks about that dashboard, the agent can retrieve the history and understand what was built and why.
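The logging described in this example might look like the following sketch, which uses SQLite from the Python standard library in place of PostgreSQL or MongoDB. The table layout and the `log_event` and `session_history` helpers are illustrative assumptions, not a prescribed schema.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE episodes (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        session_id TEXT NOT NULL,
        kind TEXT NOT NULL,        -- e.g. 'query', 'error', 'correction'
        payload TEXT NOT NULL,     -- JSON-encoded details
        created_at TEXT DEFAULT CURRENT_TIMESTAMP
    )
    """
)
conn.execute("CREATE INDEX idx_episodes_session ON episodes (session_id, id)")


def log_event(session_id, kind, payload):
    """Append one event to the session's history."""
    conn.execute(
        "INSERT INTO episodes (session_id, kind, payload) VALUES (?, ?, ?)",
        (session_id, kind, json.dumps(payload)),
    )
    conn.commit()


def session_history(session_id):
    """Return the session's events in the order they were recorded."""
    rows = conn.execute(
        "SELECT kind, payload FROM episodes WHERE session_id = ? ORDER BY id",
        (session_id,),
    )
    return [(kind, json.loads(payload)) for kind, payload in rows]


log_event("s1", "query", {"sql": "SELECT 1"})
log_event("s1", "error", {"message": "column not found"})
log_event("s1", "correction", {"sql": "SELECT 2"})
log_event("s2", "query", {"sql": "SELECT 3"})
history = session_history("s1")
```

Keying every row by session ID is what lets the agent pull back one conversation a week later without scanning everyone else's history.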
Semantic memory is stable, domain-specific knowledge that transcends individual sessions. It includes:
Semantic memory is long-lived, shared across users, and rarely changes. It's often stored in a vector database (for semantic search) or a relational database with indexing for fast retrieval. This is the kind of knowledge that makes an agent smarter over time.
Example: Your agent learns that the customers table has a signup_date field that is sometimes NULL for legacy accounts, and that queries should filter these out. This knowledge is stored in semantic memory. Every future agent instance benefits from this learning without having to rediscover it.
Procedural memory captures patterns, strategies, and methods that work. It includes:
Procedural memory is often implicit in the agent's behavior, but making it explicit (storing it as templates, examples, or decision rules) improves consistency and performance. "Building Agentic AI Workflows with MCP, AgentCore, and Bedrock" discusses how to structure tool definitions and memory systems to enable effective procedural learning.
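One simple way to make procedural memory explicit is a named library of query templates. The sketch below assumes a hypothetical `PROCEDURES` registry and a `metric_by_dimension` template. Note that substituting identifiers into SQL like this is only safe when the values come from a trusted allow-list, never directly from user input.

```python
from string import Template

# Hypothetical registry of query shapes that have worked before.
PROCEDURES = {
    "metric_by_dimension": Template(
        "SELECT $dimension, SUM($metric) AS total FROM $table GROUP BY $dimension"
    ),
}


def render(name, **params):
    """Fill in a stored procedure template; raises KeyError if a field is missing."""
    return PROCEDURES[name].substitute(**params)


sql = render(
    "metric_by_dimension",
    dimension="region",
    metric="amount",
    table="revenue",
)
print(sql)
```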
Once you've categorized your memory, you need to store it somewhere. Different backends have different tradeoffs.
Relational databases are reliable, queryable, and familiar. They're excellent for:
Tradeoffs: Relational queries are precise but can be slow for unstructured text search. You'll need to add full-text search indexes or use an external search engine for semantic queries.
Vector databases excel at semantic search—finding similar concepts even if the exact words don't match. They're ideal for:
Tradeoffs: Vector databases require embedding your text (converting it to numerical vectors), which adds latency and cost. They're less precise for structured queries and harder to update without re-embedding.
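The store-then-search-by-similarity mechanics can be shown without any external service. In the sketch below, `embed` is a deliberately naive bag-of-words stand-in for a real embedding model, so it only matches shared words and does not capture meaning the way learned embeddings do. `VectorMemory` is a hypothetical name for this example.

```python
import math
from collections import Counter


def embed(text):
    # Stand-in for a real embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorMemory:
    def __init__(self):
        self._items = []  # list of (text, vector) pairs

    def add(self, text):
        # Embedding happens at write time, which is where the extra cost comes from.
        self._items.append((text, embed(text)))

    def search(self, query, k=1):
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


memory = VectorMemory()
memory.add("signup_date is NULL for legacy customers")
memory.add("revenue is reported in USD by region")
memory.add("orders table joins to customers on customer_id")
top = memory.search("revenue per region", k=1)
```

Changing a stored note means recomputing its vector, which is the re-embedding cost mentioned above.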
Graph databases are powerful for representing relationships—between tables, metrics, users, and concepts. They're useful for:
Tradeoffs: Graph databases require careful schema design and can be overkill for simple use cases. Query performance depends heavily on graph traversal patterns.
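To make the traversal idea concrete, the sketch below models table relationships as an adjacency map and uses breadth-first search to find the shortest chain of joins between two tables. The `JOINS` map and table names are invented for illustration; a graph database would express the same lookup in its own query language.

```python
from collections import deque

# Hypothetical join graph: each table lists the tables it can join to directly.
JOINS = {
    "orders": ["customers", "order_items"],
    "customers": ["orders", "regions"],
    "order_items": ["orders", "products"],
    "regions": ["customers"],
    "products": ["order_items"],
}


def join_path(start, goal):
    """Breadth-first search for the shortest join chain from start to goal."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in JOINS.get(path[-1], []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None  # the two tables are not connected


path = join_path("products", "regions")
print(" -> ".join(path))
```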
Search engines combine the best of relational and vector approaches. As noted in "OpenSearch as an Agentic Memory Solution," modern search platforms like OpenSearch 3.3 provide native support for agentic memory systems, enabling both full-text and semantic search alongside structured metadata.
Tradeoffs: Search engines require operational overhead (indexing, cluster management) but provide excellent flexibility for hybrid queries.
For working memory and frequently accessed data, in-memory caches are unbeatable. They're fast, simple, and ideal for:
Tradeoffs: In-memory caches are volatile (data is lost on restart) and limited by RAM. They require careful eviction policies to avoid memory bloat.
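Least-recently-used (LRU) eviction is one common policy for keeping a cache within a fixed size. The sketch below is a minimal version built on `collections.OrderedDict`; the `LRUCache` class and the cache keys are assumptions for this example.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # drop the oldest entry


cache = LRUCache(capacity=2)
cache.put("schema:revenue", "region, quarter, amount")
cache.put("schema:users", "id, email, signup_date")
cache.get("schema:revenue")                   # touch revenue so users becomes oldest
cache.put("schema:orders", "id, customer_id")  # evicts schema:users
```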
With storage backends in mind, let's look at common architectural patterns for agent memory. These patterns determine how your agent stores, retrieves, and updates memory.
RAG is the most common pattern for adding persistent memory to agents. The flow is:
This pattern is straightforward and works well for read-heavy workloads (analytics queries, dashboard generation). The challenge is deciding what to retrieve and ensuring you retrieve enough context without overwhelming the LLM.
Example in D23: When a user asks your embedded analytics agent to "Show me revenue by region," the agent retrieves the relevant schema definitions, past queries about revenue, and business logic around revenue calculations. It augments the request with this context, generates a better SQL query, and stores the new query in episodic memory for future reference.
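The retrieve, augment, generate, and store steps in this example can be sketched end to end. Everything here is a stand-in: `retrieve` uses keyword overlap rather than a vector index, `fake_llm` returns a canned query instead of calling a model, and the memory is a plain Python list.

```python
def retrieve(memory, query, k=2):
    """Rank stored notes by words shared with the query (keyword stand-in)."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(note.lower().split())), note) for note in memory]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [note for _, note in scored[:k]]


def build_prompt(query, context):
    """Prepend the retrieved notes to the user's request."""
    lines = ["Context:"] + [f"- {note}" for note in context] + ["", f"Request: {query}"]
    return "\n".join(lines)


def fake_llm(prompt):
    # Placeholder for a real model call; returns a canned query.
    return "SELECT region, SUM(amount) FROM revenue GROUP BY region"


def answer(memory, query, llm):
    context = retrieve(memory, query)       # 1. retrieve relevant memory
    prompt = build_prompt(query, context)   # 2. augment the request
    response = llm(prompt)                  # 3. generate
    memory.append(f"past query: {query}")   # 4. store for later sessions
    return context, response


notes = [
    "revenue table columns region quarter amount",
    "users table columns id email",
    "revenue excludes refunds",
]
context, response = answer(notes, "show revenue by region", fake_llm)
```

The `k` parameter is where the "how much to retrieve" tradeoff shows up: too small and the model lacks context, too large and the prompt fills with marginal notes.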
RAG assumes memory is read-only within a task. But agents often need to update memory as they learn. The memory update pattern adds explicit write operations:
This requires careful design to avoid memory corruption and inconsistency. You need:
Not all memory is equally important. Hierarchical memory organizes information by relevance and specificity:
The agent retrieves from Level 1 (hot) first, falls back to Level 2 (warm) if needed, and rarely accesses Level 3 (cold). This balances performance with cost.
Example: Current session history is in cache (hot). Schema definitions and recent insights are in a vector database (warm). Historical conversation logs from six months ago are in a data warehouse (cold).
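A tiered lookup like this can be sketched with plain dictionaries standing in for the cache, the vector database, and the warehouse. The `TieredMemory` class and the promote-on-read behavior are assumptions for this example; real systems may prefer not to promote cold data automatically.

```python
class TieredMemory:
    """Look up a key tier by tier, hottest first."""

    def __init__(self, tiers):
        # tiers: list of (name, dict-like store), ordered hottest first
        self.tiers = tiers

    def get(self, key):
        for index, (name, store) in enumerate(self.tiers):
            if key in store:
                value = store[key]
                if index > 0:
                    # Promote to the hottest tier so the next read is cheap.
                    self.tiers[0][1][key] = value
                return value, name
        return None, None


hot = {}
warm = {"schema:revenue": "region, quarter, amount"}
cold = {"log:2024-01": "archived session"}
memory = TieredMemory([("hot", hot), ("warm", warm), ("cold", cold)])

first = memory.get("schema:revenue")   # served from the warm tier
second = memory.get("schema:revenue")  # now served from the hot tier
```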
Instead of storing only the final state, transcript replay stores the full sequence of events and reconstructs state on demand. This is inspired by event sourcing and is discussed in "AI Agents Need Memory Control Over More Context."
This pattern is robust (you never lose information) but can be slow (replaying a long transcript takes time). It's often combined with periodic snapshots: store the full transcript, but also store compressed summaries at intervals.
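The sketch below shows replay with an optional snapshot. The event shapes (`set` and `delete`) and the snapshot format are invented for illustration; the point is that replaying from a snapshot yields the same state as replaying the whole transcript while touching fewer events.

```python
def apply(state, event):
    """Pure reducer: return a new state with one event applied."""
    new_state = dict(state)
    if event["type"] == "set":
        new_state[event["key"]] = event["value"]
    elif event["type"] == "delete":
        new_state.pop(event["key"], None)
    return new_state


def replay(events, snapshot=None):
    """Rebuild state from a snapshot (if any) plus the events recorded after it."""
    if snapshot is None:
        state, start = {}, 0
    else:
        state, start = dict(snapshot["state"]), snapshot["offset"]
    for event in events[start:]:
        state = apply(state, event)
    return state


transcript = [
    {"type": "set", "key": "table", "value": "revenue"},
    {"type": "set", "key": "filter", "value": "Q3"},
    {"type": "set", "key": "filter", "value": "Q4"},
    {"type": "delete", "key": "table"},
]

full = replay(transcript)  # replays all four events

# Snapshot taken after the first two events; only the last two are replayed.
snapshot = {"offset": 2, "state": replay(transcript[:2])}
fast = replay(transcript, snapshot)
```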
Long-running workflows can accumulate unbounded memory. The bounded context pattern explicitly limits memory size:
This prevents memory bloat but requires careful policy design to avoid losing critical information. "AI Agents Need Memory Control Over More Context" explores bounded memory systems for exactly this reason.
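One possible bounding policy keeps the most recent entries verbatim and folds older ones into a running summary. In the sketch below, `BoundedMemory` is a hypothetical class and `truncate_summarizer` is a trivial stand-in for an LLM summarizer, which is where information loss would need the most scrutiny.

```python
from collections import deque


class BoundedMemory:
    """Keep the newest entries verbatim; fold evicted ones into a summary."""

    def __init__(self, max_items, summarize):
        self.max_items = max_items
        self.summarize = summarize
        self.summary = ""
        self.recent = deque()

    def add(self, entry):
        self.recent.append(entry)
        while len(self.recent) > self.max_items:
            oldest = self.recent.popleft()
            self.summary = self.summarize(self.summary, oldest)

    def context(self):
        parts = [f"Summary: {self.summary}"] if self.summary else []
        return parts + list(self.recent)


def truncate_summarizer(summary, entry):
    # Stand-in for an LLM summarizer: keep the first three words of each entry.
    head = " ".join(entry.split()[:3])
    return f"{summary}; {head}" if summary else head


bounded = BoundedMemory(max_items=2, summarize=truncate_summarizer)
for step in [
    "user asked for revenue by region",
    "agent ran query on revenue table",
    "query failed on missing column",
    "agent fixed column name and retried",
]:
    bounded.add(step)
```

Because `max_items` and the summarizer are constructor arguments, the policy is explicit and can be tuned without changing the class.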
Moving from theory to practice requires attention to reliability, security, and performance.
When multiple agent instances or users interact with the same memory, consistency becomes critical. You need to decide:
For analytics workflows, eventual consistency is often acceptable (a dashboard built from slightly stale schema knowledge is still useful). But for critical business logic, strong consistency may be necessary.
Memory can contain sensitive information: query results, user preferences, data schema details. You need:
The OWASP Top 10 for Agentic Applications 2026 highlights memory and context poisoning as a critical risk in agentic systems. An attacker who can corrupt memory can cause agents to make bad decisions. Implement validation, versioning, and audit trails to mitigate this.
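One way to build an audit trail that makes tampering detectable is a hash chain, where each entry's hash covers the previous entry's hash. This is a swapped-in illustration, not a technique prescribed by OWASP, and the `append_entry` and `verify` helpers are hypothetical. It only helps if the latest hash is also kept somewhere an attacker cannot rewrite.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry


def _digest(prev_hash, record):
    body = json.dumps(record, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log, record):
    """Append a memory write to the log, chaining it to the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev": prev_hash, "hash": _digest(prev_hash, record)})


def verify(log):
    """Return True only if no entry has been altered, removed, or reordered."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


audit_log = []
append_entry(audit_log, {"key": "customers.signup_date", "value": "NULL for legacy accounts"})
append_entry(audit_log, {"key": "revenue.currency", "value": "USD"})
intact = verify(audit_log)
```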
Memory retrieval adds latency to every agent action. Strategies to minimize impact:
For embedded analytics in D23, latency matters. Users expect dashboards to load quickly. Memory retrieval should be sub-second, which means careful indexing and caching.
Persistent memory systems have costs: storage, retrieval queries, vector embeddings, and infrastructure. To manage costs:
Vector embeddings are particularly expensive (they require API calls or local compute). Consider whether every piece of memory needs to be embedded, or if keyword search is sufficient for some data.
As your system scales, you'll encounter challenges with multiple agents, multiple users, and distributed systems.
When multiple agents work on the same problem, they need to coordinate. Shared memory enables this:
This requires strong consistency (all agents must see the same view) and careful conflict resolution. LangGraph, as described in "Memory in LangGraph," provides patterns for multi-actor agent workflows with persistent state.
In large organizations, you might have:
This requires federation—a way to query and update memory across boundaries while maintaining security and consistency. It's complex but necessary for scaling agent systems across large organizations.
Long-running workflows generate enormous amounts of memory. Compression keeps it manageable:
The challenge is doing this without losing critical information. LLMs are good at summarization, but they can miss nuances. Hybrid approaches (LLM summarization + human review for critical memory) are common in production systems.
Let's walk through a concrete example of agent memory in an analytics context, relevant to how D23 and similar platforms operate.
Scenario: A startup uses an embedded analytics agent to help non-technical users explore their product data. The agent needs to:
Memory Architecture:
Workflow:
If the user returns a week later:
This example shows how agent memory reduces latency, improves consistency, and enables learning. For platforms like D23 that embed analytics into products, this matters enormously—every second of latency affects user experience.
Based on research and real-world deployment, here are key best practices:
Don't build a complex multi-tier memory system from day one. Start with episodic memory (conversation logs) and working memory (session state). Add semantic memory and vector search once you understand your use case. Add hierarchical tiering and compression once you hit performance or cost problems.
Memory should be inspectable. Users and developers should be able to ask:
This requires storing memory in queryable formats (not just embeddings) and providing tools to inspect it.
When agents update memory, validate the update and track versions. This enables rollback if something goes wrong and helps debug agent behavior.
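A minimal sketch of validated, versioned updates follows. The `VersionedMemory` class, its version-check rule (a write must name the version it was based on), and the length-based validator are all assumptions for this example rather than an established API.

```python
class VersionedMemory:
    """Keep every accepted value per key so updates can be audited and rolled back."""

    def __init__(self, validate):
        self._validate = validate
        self._history = {}  # key -> list of values, oldest first

    def get(self, key):
        versions = self._history.get(key)
        return (versions[-1], len(versions)) if versions else (None, 0)

    def update(self, key, value, expected_version):
        _, current = self.get(key)
        if expected_version != current:
            # Another writer got there first; the caller should re-read and retry.
            raise RuntimeError(f"stale write: expected v{expected_version}, store is at v{current}")
        if not self._validate(key, value):
            raise ValueError(f"rejected update for {key!r}")
        self._history.setdefault(key, []).append(value)
        return current + 1

    def rollback(self, key):
        versions = self._history.get(key, [])
        if len(versions) > 1:
            versions.pop()  # discard the latest version, keep the one before it
        return self.get(key)


def is_short_text(key, value):
    # Example validation rule: non-empty strings of at most 200 characters.
    return isinstance(value, str) and 0 < len(value) <= 200


memory = VersionedMemory(validate=is_short_text)
v1 = memory.update("customers.signup_date", "may be NULL", expected_version=0)
v2 = memory.update("customers.signup_date", "NULL for legacy accounts", expected_version=1)
current = memory.get("customers.signup_date")
```

If a bad update slips through, `rollback` restores the previous value, and the retained history shows what the agent believed at each point.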
Memory can degrade over time. Implement monitoring to catch:
Set maximum memory sizes and define eviction policies. Make these policies explicit and tunable, not hidden in code.
Rare is the system that uses only one memory type. Most production systems combine episodic (conversation logs), semantic (knowledge base), and working memory (session state) in a carefully orchestrated way.
If you're building agents, you likely use frameworks and APIs such as LangChain, LlamaIndex, or Anthropic's APIs. These increasingly support memory patterns:
"Agent Capabilities" from Anthropic covers building agents with memory systems for maintaining context and state across extended interactions. Similarly, "Memory in LangGraph" provides specific patterns for persistent state in multi-actor workflows.
When evaluating frameworks, check:
Frameworks that treat memory as a first-class citizen (not an afterthought) are easier to scale and debug.
Agent memory is not optional—it's foundational infrastructure for long-running, intelligent systems. Whether you're building autonomous analytics agents, embedding BI into your product, or orchestrating complex data workflows, you need to think carefully about how your agents store, retrieve, and update context.
The patterns discussed here—RAG, memory updates, hierarchical storage, bounded contexts, and multi-agent coordination—are proven approaches used in production systems. Start with the simplest pattern that solves your problem, measure performance and costs, and evolve as you learn.
For organizations using platforms like D23, which manages Apache Superset with AI and API-first design, understanding these patterns helps you architect better analytics workflows. When you embed self-serve BI or AI-powered analytics into your product, persistent agent memory becomes the difference between a system that learns and improves and one that starts from scratch every time.
The research community continues to advance memory systems for agents—Agent-Memory-Paper-List on GitHub provides a curated list of papers on this topic if you want to dive deeper. As these systems mature, they'll enable agents to solve increasingly complex problems, maintain richer context, and operate more reliably in production environments.
The key insight: in a world of bounded context windows and long-running workflows, memory is not a luxury—it's the engine that makes intelligent agents practical and reliable.