N°04 · Architecture · Memory systems

The Four-Tier Memory Model

How a personal AI should actually remember you.

The flattening problem

Most production AI memory in 2026 is one bag of vectors. A user says something interesting; an embedding model produces a 1,536-dimensional vector; the vector goes into a single table; later, a similarity search retrieves the top-k. This is the architecture you’ll find in nearly every “memory feature” shipped this year, including the headline products.

It works, until it doesn’t. Three failure modes show up at scale, and they show up faster than people expect:

Identity drift. The user’s name, role, and deepest values get stored at the same weight as a passing comment about preferring oat milk. After a few hundred memories, similarity search starts surfacing the trivia and burying the identity. The AI gets worse at knowing who you are over time, not better.
Overwrite collisions. “I love Vue” said three years ago and “I now use Svelte” said this week occupy the same flat space. Without a mechanism to express durability, the model has no way to know which one is current. It picks whichever embedding happens to win the cosine race.
Procedural amnesia. “How I write commit messages” is fundamentally different in shape from “what happened on Tuesday.” A flat memory store treats them identically. The how-to gets recalled in conversational contexts where it’s irrelevant; the conversational beat gets recalled when you’re trying to do work.

The fix is not better embeddings. It is separating memories by kind before similarity ever happens. We use four tiers.
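Here is a minimal sketch of what “separating by kind before similarity” can mean in code. It assumes a Python store where each memory carries a tier label. The Tier enum, Memory dataclass, and recall function are illustrative names, not iCog’s API, and the toy 2-dimensional vectors stand in for real 1,536-dimensional embeddings. The point is the order of operations: the tier filter runs first, and cosine ranking only sees what survives it.

```python
import math
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    FOUNDATIONAL = "foundational"
    EPISODIC = "episodic"
    SEMANTIC = "semantic"
    PROCEDURAL = "procedural"


@dataclass
class Memory:
    text: str
    tier: Tier
    embedding: list[float]  # toy vectors here; production uses 1,536 dimensions


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recall(memories: list[Memory], query: list[float], tiers: set[Tier], k: int = 3) -> list[Memory]:
    """Filter by tier first; only the survivors are ranked by cosine similarity."""
    candidates = [m for m in memories if m.tier in tiers]
    candidates.sort(key=lambda m: cosine(m.embedding, query), reverse=True)
    return candidates[:k]


memories = [
    Memory("User's name is Parsa.", Tier.FOUNDATIONAL, [1.0, 0.0]),
    Memory("User prefers oat milk.", Tier.SEMANTIC, [1.0, 0.0]),
]
# An identity query restricted to the foundational tier never competes with trivia,
# even when the two embeddings are identical.
identity_hits = recall(memories, [1.0, 0.0], {Tier.FOUNDATIONAL})
```

In a flat store, those two memories would tie on cosine and either could win; with the tier filter in front, the tie never happens.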

Tier 1 — Foundational

What it stores. Identity. Values. Core beliefs. Durable facts about the user that should almost never change without an explicit signal.

Examples from a real iCog account:

“User’s name is Parsa.” (foundational, identity anchor)
“User identifies as a spiritual seeker with a mission for Earth’s spiritual ascension.” (foundational, deepest stake)
“User runs three companies — Lexaplus, MoltJobs, CognitiveX.” (foundational, role)

Recall policy. Highest tier weight in blended retrieval for identity-relevant queries (“who am I?” “tell me about the user”), where it almost always surfaces.

Write policy. Promote-only. A foundational memory is created either explicitly (the user calls the identify MCP method, or makes a strong assertion that the cognition layer detects) or by the consolidation job after a fact has been reinforced repeatedly across tiers. Once foundational, it can be replaced but not casually overwritten: replacement needs a signal of equal strength.
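The promote-only rule can be sketched as a small guard in front of the foundational tier. This is an illustration under stated assumptions. Facts are keyed by a hypothetical slot such as "name" or "role", and signal strength is a number between 0 and 1. PROMOTION_REINFORCEMENTS = 3 is an invented threshold, because the article does not publish iCog’s actual values.

```python
from dataclasses import dataclass

# Hypothetical threshold: the article does not publish iCog's value.
PROMOTION_REINFORCEMENTS = 3


@dataclass
class Fact:
    key: str                 # hypothetical slot, e.g. "name" or "role"
    value: str
    strength: float          # strength of the signal behind this fact, 0..1
    reinforcements: int = 0  # times consolidation has seen it reinforced


class FoundationalStore:
    """Promote-only: facts enter explicitly or after repeated reinforcement."""

    def __init__(self) -> None:
        self.facts: dict[str, Fact] = {}

    def write(self, fact: Fact, explicit: bool) -> bool:
        """Return True if the fact now lives in the foundational tier."""
        promotable = explicit or fact.reinforcements >= PROMOTION_REINFORCEMENTS
        if not promotable:
            return False  # stays in its lower tier until reinforced further
        current = self.facts.get(fact.key)
        if current is not None and fact.strength < current.strength:
            return False  # a weaker signal cannot replace an identity anchor
        self.facts[fact.key] = fact
        return True
```

A signal of equal strength is allowed to replace an existing fact, which matches “a signal of equal strength.” A weaker signal is refused, however often it has been reinforced.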

The cost of getting this wrong. If foundational memories drift, the AI loses the user. Every identity-relevant question becomes a confabulation risk. A flat memory store has no such tier and is structurally vulnerable.

Tier 2 — Episodic

What it stores. Events. Sessions. The conversations themselves. Time-stamped, narrative.

Examples:

“On 2026-05-11 at 18:11, the user committed three tracked changes including the gitignore for sibling repos.”
“The user pushed back on the Seeker/Builder section in the founder essay during the May 11 review.”

Recall policy. Time-aware. Recency-biased but not absolute: a significance score (computed by the cognition layer) can promote an old episodic memory over a recent trivial one. When the user asks “what did I do last week,” this tier is the substrate; semantic memory should not appear unless explicitly relevant.

Write policy. Append-only. Episodic memories are immutable once written. They can be deduplicated by the consolidation job (collapsing five identical commits into “the user shipped 5 versions of the same fix”) but not edited.
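An append-only episodic log can be sketched with frozen records and a store that exposes no update or delete path. Deduplication then builds a new, collapsed view rather than editing entries in place. The class and function names and the “(xN)” summary format are hypothetical. The source does not say how iCog renders collapsed episodes or whether raw entries are retained after consolidation.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Episode:
    at: datetime
    text: str


class EpisodicLog:
    """Append-only: there is deliberately no update or delete method."""

    def __init__(self) -> None:
        self._entries: list[Episode] = []

    def append(self, episode: Episode) -> None:
        self._entries.append(episode)

    def entries(self) -> tuple[Episode, ...]:
        return tuple(self._entries)  # read-only snapshot


def consolidate(entries: tuple[Episode, ...]) -> list[Episode]:
    """Build a collapsed view: identical events become one episode at their latest time."""
    counts = Counter(e.text for e in entries)
    latest: dict[str, datetime] = {}
    for e in entries:
        if e.text not in latest or e.at > latest[e.text]:
            latest[e.text] = e.at
    collapsed = [
        Episode(latest[text], text if n == 1 else f"{text} (x{n})")
        for text, n in counts.items()
    ]
    return sorted(collapsed, key=lambda e: e.at)
```

The frozen dataclass makes the immutability concrete: assigning to an Episode field raises an error. The collapsed list is derived output, so the original log is still intact.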

The cost of getting this wrong. Without episodic separation, you cannot answer “when did this happen” or “what changed since last time.” The AI loses temporal grounding.

Tier 3 — Semantic

What it stores. Facts the user has come to believe. Decisions made. Preferences expressed. Conclusions drawn from episodic experience.

Examples:

“User has decided that the iCog launch positioning is ‘memory that travels with you,’ not the depth-first companion frame.”
“User uses Fraunces for display typography on premium publications.”

Recall policy. Strong on relevance, moderate on recency. Semantic memories are the current state of what the user thinks, so they should usually beat episodic memories in non-historical queries.

Write policy. Update-in-place when new evidence contradicts old. The cognition layer flags contradictions; the user (or the consolidation job) resolves them. Past versions are not deleted — they become episodic memories of the prior state.
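The following is a sketch of update-in-place that still preserves history. It assumes the contradiction has already been flagged and resolved, and that beliefs are keyed by a hypothetical topic string. The SemanticStore name and the wording of the archived episodic entry are illustrative.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class SemanticStore:
    current: dict[str, str] = field(default_factory=dict)  # topic -> current belief
    episodic: list[tuple[datetime, str]] = field(default_factory=list)  # superseded states

    def update(self, topic: str, belief: str, at: datetime) -> None:
        """Apply a resolved update; the prior belief is archived, never deleted."""
        previous = self.current.get(topic)
        if previous is not None and previous != belief:
            self.episodic.append((at, f"Until {at:%Y-%m-%d}, user held: {previous}"))
        self.current[topic] = belief


store = SemanticStore()
store.update("frontend", "User loves Vue.", datetime(2023, 5, 1))
store.update("frontend", "User now uses Svelte.", datetime(2026, 5, 11))
```

After the second update, the Vue/Svelte collision from the flattening problem has a clear answer. The current state is Svelte, and “used to love Vue” survives as a dated episodic record instead of competing on cosine.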

The cost of getting this wrong. Without semantic separation, the AI’s view of “what the user currently thinks” gets polluted by ancient preferences. The user has to keep correcting the AI back to current state.

Tier 4 — Procedural

What it stores. How-tos. Workflows. Coding patterns. The user’s methods.

Examples:

“When committing, the user prefers conventional-commit prefixes (feat / fix / chore / docs / ux) and HEREDOC for multi-line messages.”
“User likes the cross-AI session-start pattern: identify → recall recent work → confirm task before starting.”

Recall policy. Surfaces in task contexts, not conversational ones. The cognition layer’s decision tier flags tasks vs. conversation and gates procedural recall accordingly.

Write policy. Append-with-versioning. Procedural memories accumulate variations (different ways the user has done a thing) and the consolidation job clusters them. The most-used variation wins recall ties.
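This sketch combines append-with-versioning with the task gate from the recall policy. Two simplifications are assumptions, not iCog’s method. First, variations are clustered by exact text match, where a production consolidation job would more likely cluster near-duplicates by embedding similarity. Second, the is_task flag stands in for the cognition layer’s task-versus-conversation decision. ProceduralStore and Variation are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variation:
    text: str
    uses: int = 0
    version: int = 1


class ProceduralStore:
    def __init__(self) -> None:
        self._by_skill: dict[str, list[Variation]] = defaultdict(list)

    def record(self, skill: str, text: str) -> None:
        """Append a new versioned variation, or count another use of an identical one."""
        variations = self._by_skill[skill]
        for v in variations:
            if v.text == text:
                v.uses += 1
                return
        variations.append(Variation(text, uses=1, version=len(variations) + 1))

    def recall(self, skill: str, is_task: bool) -> Optional[str]:
        if not is_task:
            return None  # procedural recall is gated to task contexts
        variations = self._by_skill.get(skill)
        if not variations:
            return None
        # Variations of the same skill tie on relevance; the most-used one wins.
        return max(variations, key=lambda v: v.uses).text
```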

The cost of getting this wrong. Without procedural separation, the user has to re-explain how they like things done in every conversation, even after twenty examples. This is the most-felt failure of flat memory; it is what breaks the “AI that knows me” promise in practice.

[Figure: The recall blend: cosine similarity, tier weight, recency decay, significance.]

Where the math actually happens

Each tier uses the same underlying primitive: 1,536-dimensional Gemini Embedding 2 Preview vectors, indexed via HNSW on pgvector. The intelligence is in the blending function across tiers, not in the embeddings themselves.

A simplified version of the recall blend iCog uses:

score(memory, query) =
    cosine(memory.embedding, query.embedding)
  * tier_weight[memory.tier]
  * recency_decay(memory.timestamp, now, memory.tier)
  * significance[memory.id]

Where:

tier_weight favors foundational > semantic > procedural > episodic for identity-relevant queries; the order flips for historical queries
recency_decay is steep for episodic, gentle for semantic, almost flat for foundational
significance is updated by the consolidation job based on how often a memory is recalled and reinforced
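The blend can be made concrete as a runnable sketch. Only the orderings come from the text, and every number here is an assumption:

- the tier weights
- the exponential half-life form of recency_decay
- the per-tier half-lives, including procedural, which the article does not characterize
- the step size in the hypothetical reinforce update path

Memories are plain dicts for brevity.

```python
import math
from datetime import datetime, timedelta

# Assumed numbers; the article specifies only the orderings.
IDENTITY_WEIGHTS = {"foundational": 1.0, "semantic": 0.8, "procedural": 0.6, "episodic": 0.4}
HISTORICAL_WEIGHTS = {"episodic": 1.0, "procedural": 0.8, "semantic": 0.6, "foundational": 0.4}
# Steep for episodic, gentle for semantic, almost flat for foundational; procedural is a guess.
HALF_LIFE_DAYS = {"episodic": 7.0, "procedural": 90.0, "semantic": 180.0, "foundational": 3650.0}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recency_decay(timestamp, now, tier):
    """Exponential decay with a per-tier half-life (an assumed functional form)."""
    age_days = max((now - timestamp).total_seconds() / 86400.0, 0.0)
    return 0.5 ** (age_days / HALF_LIFE_DAYS[tier])


def score(memory, query_embedding, now, significance, historical=False):
    weights = HISTORICAL_WEIGHTS if historical else IDENTITY_WEIGHTS
    return (
        cosine(memory["embedding"], query_embedding)
        * weights[memory["tier"]]
        * recency_decay(memory["timestamp"], now, memory["tier"])
        * significance.get(memory["id"], 1.0)  # neutral until consolidation adjusts it
    )


def reinforce(significance, memory_id, step=0.1, cap=2.0):
    """Hypothetical significance update: each reinforcement nudges the multiplier up, capped."""
    significance[memory_id] = min(significance.get(memory_id, 1.0) + step, cap)


now = datetime(2026, 5, 12)
name = {"id": 1, "tier": "foundational", "embedding": [1.0, 0.0], "timestamp": now - timedelta(days=730)}
commit = {"id": 2, "tier": "episodic", "embedding": [1.0, 0.0], "timestamp": now - timedelta(days=1)}
sig = {}
```

With these numbers, a two-year-old foundational memory outranks a day-old episodic memory with the same embedding on an identity query. On a historical query, the day-old episodic memory wins.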

iCog’s URL ingest pipeline sorts each incoming chunk into one of six classes. Five of them (already_known, reinforces, extends, novel_relevant, novel_orphan) are assigned by pure cosine math against this scoring function and require no LLM call; only the sixth, conflicts, invokes a model. This is the LCM principle in practice: structure first, language last.
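This is a sketch of cosine-band classification under several assumptions:

- The cut-off values are invented.
- The best match uses raw cosine rather than the full blend, for brevity.
- novel_relevant versus novel_orphan is decided by similarity to a hypothetical user-profile vector.
- Conflict detection is supplied by the caller. The article does not say how conflicts are found without a model, so this sketch takes a contradiction flag as input, reflecting the semantic tier’s note that the cognition layer flags contradictions.

The function reports whether the chosen label needs a model call.

```python
import math

# Hypothetical cosine cut-offs: the article names the classes but not the numbers.
ALREADY_KNOWN_MIN = 0.95  # near-duplicate of a stored memory
REINFORCES_MIN = 0.85     # same claim, new evidence
EXTENDS_MIN = 0.70        # same topic, new detail
RELEVANT_MIN = 0.50       # unmatched, but close to the (hypothetical) user-profile vector


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify_chunk(chunk_vec, memory_vecs, profile_vec, flagged_contradiction=False):
    """Return (label, needs_model). Only the conflicts label asks for a model call."""
    if flagged_contradiction:
        # Assumption: the flag arrives from upstream; this sketch does not compute it.
        return "conflicts", True
    best = max((cosine(chunk_vec, m) for m in memory_vecs), default=0.0)
    if best >= ALREADY_KNOWN_MIN:
        label = "already_known"
    elif best >= REINFORCES_MIN:
        label = "reinforces"
    elif best >= EXTENDS_MIN:
        label = "extends"
    elif cosine(chunk_vec, profile_vec) >= RELEVANT_MIN:
        label = "novel_relevant"
    else:
        label = "novel_orphan"
    return label, False
```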

What it costs to ignore the tiers

Building this is not free. A four-tier model adds:

Schema complexity (tier column, tier-specific indexes, write-policy code per tier)
A consolidation job that runs nightly on a dedicated connection pool (we got bitten by this — the dream consolidator initially shared the user pool and starved API connections)
A blending function that has to be tuned per query type
Significance scoring that needs its own update path
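One way to express the tier column and tier-specific indexes in pgvector is a partial HNSW index per tier. The Python below only emits the DDL strings. The table name, the columns, and the m = 16, ef_construction = 64 values (pgvector’s defaults) are illustrative assumptions, not iCog’s schema.

```python
TIERS = ("foundational", "episodic", "semantic", "procedural")


def memory_schema_sql(dims: int = 1536, m: int = 16, ef_construction: int = 64) -> list[str]:
    """Emit DDL for a tiered memory table with one partial HNSW index per tier."""
    tier_list = ", ".join(f"'{t}'" for t in TIERS)
    statements = [
        "CREATE EXTENSION IF NOT EXISTS vector;",
        (
            "CREATE TABLE IF NOT EXISTS memories ("
            "id bigserial PRIMARY KEY, "
            "user_id bigint NOT NULL, "
            f"tier text NOT NULL CHECK (tier IN ({tier_list})), "
            "content text NOT NULL, "
            f"embedding vector({dims}) NOT NULL, "
            "significance real NOT NULL DEFAULT 1.0, "
            "created_at timestamptz NOT NULL DEFAULT now());"
        ),
    ]
    for tier in TIERS:
        # Partial index: only rows of this tier are in the graph.
        statements.append(
            f"CREATE INDEX IF NOT EXISTS memories_{tier}_hnsw ON memories "
            f"USING hnsw (embedding vector_cosine_ops) "
            f"WITH (m = {m}, ef_construction = {ef_construction}) "
            f"WHERE tier = '{tier}';"
        )
    return statements


for stmt in memory_schema_sql():
    print(stmt)
```

Postgres uses a partial index only when the query’s WHERE clause matches its predicate. A foundational recall would therefore filter on tier = 'foundational' and order by embedding <=> the query vector, where <=> is pgvector’s cosine-distance operator.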

The temptation to flatten is real. Every team has a “we’ll add tiers later” comment in their memory-system PR. Later doesn’t come; the flat store accumulates noise, and by the time you want to add tiers, you have to migrate years of data.

We made the call early. Eight months in, the four-tier separation is the single architectural decision we would not undo.

What you’re signing up for if you build it

The schema is the easy part. What you don’t see until you ship to real users:

Vector DB operations at scale. pgvector with HNSW is fine until you’re carrying 10M memories per million users and your reindex windows start eating write throughput. Your DBA conversations get very specific. Your maintenance_work_mem, your HNSW m and ef_construction, your read-replica routing — every one of them becomes a tuning surface.
Embedding model migrations. When the dimensionality of your embedding model changes — and it will, every model generation — you re-embed every memory you have, on a dedicated pipeline, with shadow writes and a cutover plan. The first time we did this it ran for weeks. There will be other times.
Cross-vendor MCP integration. Claude Desktop, ChatGPT, Cursor, Continue, Zed, Windsurf — each speaks MCP in slightly different ways and each breaks the integration in its own way on minor releases. Keeping “works in every major AI” alive is a full-time engineering posture, not a checkbox.
Enterprise integration. SOC 2, GDPR, residency, BYOK encryption, audit logging on every recall, role-based access on the memory tiers themselves. Each of these is a quarter of work before you can sell to a buyer over $500 a month. Add another quarter for the security questionnaire that follows.
Adversarial recall. Once your user memory is valuable, it becomes an attack surface — prompt injection trying to extract or pollute it. The cross-user cosine threshold protecting against memory leaks has been re-tuned three times in eight months, each after an attack class we hadn’t predicted.
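The shadow-write and cutover pattern from the embedding-migration item has three moving parts. New writes get both embeddings, a backfill re-embeds old rows in batches, and readers switch only once every row has a new vector. Everything in this sketch is hypothetical and in-memory: the class name, the Row shape, the batch size, and the toy embedders. A real migration would run against the database on a dedicated pipeline, as the text describes.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Embedder = Callable[[str], List[float]]


@dataclass
class Row:
    text: str
    embedding_old: List[float]
    embedding_new: Optional[List[float]] = None  # filled by a shadow write or by backfill


class EmbeddingMigration:
    """Shadow writes, batched backfill, then a single cutover for readers."""

    def __init__(self, rows: List[Row], embed_old: Embedder, embed_new: Embedder, batch_size: int = 1000) -> None:
        self.rows = rows
        self.embed_old = embed_old
        self.embed_new = embed_new
        self.batch_size = batch_size
        self.cursor = 0        # backfill position
        self.cut_over = False  # readers use the old vectors until this flips

    def write(self, text: str) -> None:
        # Shadow write: memories created mid-migration get both embeddings.
        self.rows.append(Row(text, self.embed_old(text), self.embed_new(text)))

    def backfill_batch(self) -> int:
        """Re-embed the next batch of rows; returns how many rows were visited."""
        batch = self.rows[self.cursor:self.cursor + self.batch_size]
        for row in batch:
            if row.embedding_new is None:
                row.embedding_new = self.embed_new(row.text)
        self.cursor += len(batch)
        return len(batch)

    def try_cutover(self) -> bool:
        """Flip readers to the new vectors only once every row has one."""
        if all(row.embedding_new is not None for row in self.rows):
            self.cut_over = True
        return self.cut_over

    def query_vector(self, row: Row) -> List[float]:
        return row.embedding_new if self.cut_over else row.embedding_old


def old_model(text: str) -> List[float]:
    return [float(len(text))]  # toy 1-d "old" model


def new_model(text: str) -> List[float]:
    return [float(len(text)), 1.0]  # toy 2-d "new" model: the dimensionality changed


rows = [Row(t, old_model(t)) for t in ("a", "bb", "ccc")]
migration = EmbeddingMigration(rows, old_model, new_model, batch_size=2)
migration.write("dddd")
while migration.backfill_batch():
    pass
```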

The hard part is not the schema. It is the eight months of operational learning between the schema and a system you would let an enterprise buyer touch.

iCog has paid those bills. We aren’t going to say don’t try this at home. But if you are past prototype and putting memory in front of real users, that is the difference you are paying for.

Part of an ongoing series on the iCog cognitive architecture. See also: From LLM to LCM and The State of AI Memory — Q2 2026.

§ end