Every user who interacts with your AI agent more than once runs into the same wall. They've already told the agent their preferences, their constraints, their context. Now they're telling it again. And again. Because the agent doesn't remember.
Stateless agents, which treat every session as the first, are the norm today. There are good reasons for this: memory introduces complexity, compliance risk, and infrastructure requirements that are genuinely hard. But statelessness avoids that complexity only by not solving the problem, and I think the industry has been too comfortable with that avoidance.
In enterprise HR tech, the memory problem is concrete and consequential:
- A recruiter has preferences about evaluation criteria, communication style, role-specific benchmarks.
- A candidate has a history of interactions with the platform.
- The organization has context — what this team values, how they've calibrated similar roles — accumulated over time.
None of that should be re-entered every session. And none of it should be forgotten.
The Four Types of Agent Memory
A useful taxonomy, borrowed from cognitive science and adapted for LLM systems:
| Memory Type | What It Stores | Persistence | Implementation |
|---|---|---|---|
| Working | Current context window contents | Session only | Already in use everywhere |
| Episodic | Record of past interactions and events | Cross-session | Conversation history retrieval |
| Semantic | Learned facts and accumulated knowledge | Long-term | Vector DB + structured metadata |
| Procedural | Learned workflows and process patterns | Long-term | Distilled from repeated episodic experience |
Working memory (in-context) is the contents of the current context window. Fast, directly accessible, and temporary — it disappears when the session ends. This is the memory type everyone is already using. The challenge: it's bounded, expensive, and non-persistent.
Episodic memory is the record of past interactions and events. What happened in prior sessions? What did the user say, what did the agent do, what were the outcomes? Episodic memory is the foundation of continuity.
Semantic memory is the store of learned facts and accumulated knowledge. Not "what happened" (episodic) but "what is true." A recruiter's preference for a certain evaluation approach is a semantic memory — a fact learned from experience that should persist.
Procedural memory is the record of learned workflows and processes. How does this team handle a certain class of role? What's the standard evaluation sequence? Procedural memories are the distillation of repeated episodic experience into generalized process knowledge.
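The four types above can be sketched as typed records. This is an illustrative data model, not a standard API; the `MemoryType` and `MemoryRecord` names, the tenant tag, and the confidence field are assumptions for the sketch.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum

class MemoryType(Enum):
    WORKING = "working"        # current context window; session-scoped
    EPISODIC = "episodic"      # what happened in past sessions
    SEMANTIC = "semantic"      # learned facts: "what is true"
    PROCEDURAL = "procedural"  # workflows distilled from repeated episodes

@dataclass
class MemoryRecord:
    memory_type: MemoryType
    content: str
    tenant_id: str                       # isolation, see the compliance section
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    confidence: float = 1.0              # decays if unconfirmed, see below

# A recruiter preference is semantic: a fact learned from experience
# that should persist across sessions.
pref = MemoryRecord(
    memory_type=MemoryType.SEMANTIC,
    content="Prefers structured rubric scoring over free-form notes",
    tenant_id="acme-hr",
)
```

Only working memory lives in the context window; the other three types live in the persistent store described next.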
Why Stateless Agents Fail at Enterprise Tasks
In consumer contexts, the failure is obvious — users find it frustrating to repeat themselves. In enterprise contexts, the failure is structural.
Enterprise tasks are multi-session. A recruiting workflow unfolds over days or weeks: JD refinement, sourcing criteria calibration, batch screening with iterative feedback, final evaluation, scheduling. An agent that starts fresh each session can't participate in a workflow that spans sessions.
Calibration is cumulative. The recruiter's feedback ("this score was too high," "this criterion doesn't capture what we care about") is a valuable training signal. An agent that can't remember calibration feedback can't improve; every batch starts from scratch.
Organizational context is implicit. Much of what makes an evaluation correct for a specific company lives in the minds of people who've been doing this work for years. It surfaces in offhand comments, corrections, and preferences during sessions. An agent that can't accumulate this implicit context is structurally limited.
The Retrieval-Augmented Memory Pattern
The architecture that works at scale: a persistent memory store that the agent retrieves from at session start and writes to at session end.
The memory store has two components:
A vector database that holds embedded representations of past interactions, learned facts, and accumulated context. When a new session starts, the agent retrieves the most relevant memories based on current context — role, organization, task type — and loads them into working memory as a structured context block.
A structured metadata store that holds explicit, typed facts: user preferences, calibration settings, organizational parameters. These are retrieved by direct lookup rather than embedding similarity. If there's a stored preference for a specific rubric format, you don't need similarity search to find it.
The Memory Corruption Problem
The hardest unsolved problem in agent memory: how do you invalidate stale memories?
Facts change. A recruiter's preference might shift after hiring manager feedback. A job description might evolve between posting rounds. An organizational calibration from six months ago might be outdated.
Approaches that work, with tradeoffs:
TTL-based expiration. Memories have a defined time-to-live. After expiration, they're flagged as potentially stale and require confirmation. Fast to implement but crude — some memories should be permanent, some should expire quickly, and a uniform TTL doesn't capture this.
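A minimal TTL sketch, with the uniform time-to-live that makes the approach crude; the 90-day value is an arbitrary assumption:

```python
from datetime import datetime, timedelta, timezone

TTL = timedelta(days=90)  # one uniform TTL: the crudeness described above

def is_stale(created_at, now=None):
    """Flag a memory as potentially stale once its TTL has elapsed.
    Stale memories aren't deleted; they just require reconfirmation."""
    now = now or datetime.now(timezone.utc)
    return now - created_at > TTL

old = datetime.now(timezone.utc) - timedelta(days=120)    # stale
fresh = datetime.now(timezone.utc) - timedelta(days=10)   # still trusted
```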
Confidence decay. Memories accumulate confidence scores based on recency and confirmation. Older, unconfirmed memories have lower confidence and are less aggressively applied. More nuanced than TTL, requires more infrastructure.
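Confidence decay can be modeled as exponential decay since last confirmation. The half-life and threshold values here are assumptions for illustration:

```python
from datetime import datetime, timedelta, timezone

HALF_LIFE_DAYS = 60    # assumed: confidence halves every 60 unconfirmed days
APPLY_THRESHOLD = 0.4  # below this, surface the memory instead of applying it

def confidence(base, last_confirmed, now=None):
    """Exponential decay since the memory was last confirmed.
    Each confirmation resets the clock (and could also raise `base`)."""
    now = now or datetime.now(timezone.utc)
    age_days = (now - last_confirmed).total_seconds() / 86400
    return base * 0.5 ** (age_days / HALF_LIFE_DAYS)
```

A memory confirmed yesterday is applied at nearly full confidence; one last confirmed 120 days ago has decayed to a quarter of its base and falls below the apply threshold.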
Explicit invalidation on contradiction. When a session produces information that directly contradicts a stored memory, the contradiction is detected, the old memory is flagged, and the agent surfaces the conflict for resolution. The most accurate approach but requires building contradiction detection — a non-trivial NLP problem.
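For structured, typed facts, contradiction detection reduces to a value comparison; the hard NLP problem applies to free-text memories and is deliberately left out of this sketch. The function names and the `confirm` callback are illustrative:

```python
def find_contradictions(stored_facts, session_facts):
    """Detect direct contradictions between stored typed facts and facts
    produced in the current session. (Free-text contradiction detection
    would need an NLI model or LLM judgment and is not shown here.)"""
    conflicts = []
    for key, new_value in session_facts.items():
        old_value = stored_facts.get(key)
        if old_value is not None and old_value != new_value:
            conflicts.append((key, old_value, new_value))
    return conflicts

def resolve(stored_facts, conflicts, confirm):
    """Flag the conflict and surface it for resolution rather than
    silently overwriting; `confirm` is whatever UX asks the user."""
    for key, old, new in conflicts:
        stored_facts[key] = new if confirm(key, old, new) else old
```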
We use a combination: confidence decay for general memories and explicit invalidation for high-consequence memories (calibration parameters, organizational evaluation criteria).
Privacy and Compliance: The Enterprise Non-Negotiable
For enterprise customers, the memory architecture is a compliance conversation before it's a product conversation.
Enterprise customers do not want their organizational data stored in a shared memory system accessible to other customers. Tenant isolation at the memory layer is table stakes:
- Every memory record tagged to its owning tenant
- Retrieval scoped to that tenant's data only
- Separate namespace in the vector store per tenant
- Memory operations gated behind tenant-scoped authentication
- Configurable retention policies per tenant
- Verified data purge on customer request
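The isolation requirements above can be enforced by construction: every operation takes a tenant identifier and there is simply no code path that queries across namespaces. A minimal sketch, with illustrative names (a real system would back each namespace with the vector store's own namespace feature and tie `tenant_id` to authentication):

```python
class TenantScopedMemory:
    """Every memory operation is scoped to one tenant's namespace."""

    def __init__(self):
        self._namespaces = {}  # tenant_id -> that tenant's records

    def _ns(self, tenant_id):
        return self._namespaces.setdefault(tenant_id, [])

    def write(self, tenant_id, record):
        self._ns(tenant_id).append(record)

    def search(self, tenant_id, predicate):
        # Retrieval is scoped to the caller's namespace by construction;
        # there is no cross-tenant query to misuse.
        return [r for r in self._ns(tenant_id) if predicate(r)]

    def purge(self, tenant_id):
        # Verified purge: drop the namespace and confirm nothing remains.
        self._namespaces.pop(tenant_id, None)
        assert tenant_id not in self._namespaces
        return True
```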
For SOC2 compliance, you also need to answer: who has access to the memory store? What is the retention policy? Can tenant data be completely purged on request? These are contract requirements, not engineering afterthoughts.
Procedural Memory as a Competitive Advantage
Episodic and semantic memories are the raw material; accumulated over many sessions, they distill into procedural memory.
Concretely: over time, an AI screening agent that accumulates procedural memories develops a distilled model of how a specific customer approaches hiring — what signals matter, what disqualifiers are common, where calibration shifts for different role types. That model improves the agent's default behavior in ways that don't require explicit teaching every session.