From LLM to LCM
Why Large Cognitive Models will replace stateless chatbots.
The category mistake
When OpenAI released GPT-3 in 2020, the industry settled on a category name — Large Language Model — and built five years of product around it. The name was accurate for what existed at the time: a very large neural network that predicted text.
In 2026, that name has stopped describing what people actually want from AI.
The product most users now demand is not a language model. It is a system that knows them, remembers them, reasons about them over time, and behaves consistently across conversations. A pure LLM, by definition, can do none of those things. It is stateless. Each call is independent. Memory, identity, reasoning continuity — all of these are bolted on by application layers built around the model.
We need a new category. The name we use at iCog — and that we’ll argue for in this piece — is Large Cognitive Model, or LCM. Not because the name is clever, but because the architecture it implies is different in kind from what an LLM is.
What changes when memory is the substrate, not the wrapper
The LCM thesis can be reduced to one inversion:
In an LLM-centric system, the model is the substrate; memory is a wrapper around it. In an LCM-centric system, memory is the substrate, and the model is a renderer.
This is not a stylistic distinction. It produces different code, different schemas, different bottlenecks, different failure modes, and — crucially — different products.
In an LLM-centric architecture, every interaction starts at the model. A user query arrives, the application code stuffs whatever context it can find into the prompt, the model emits text, and the conversation ends when the context window does. Whatever “memory” exists is some combination of vector store, system prompt, and chat-history truncation. The model is asked to integrate the past, the present, and the persona — every single call. It does so with no introspection of its own state.
In an LCM-centric architecture, the user query first hits a structured cognitive state — what the user has said before, what was decided, what the system has come to believe about the user. The model is then called to render an output that is consistent with that state. The model becomes the language layer of a cognitive system, not the cognitive system itself. Swapping the model changes the prose; it does not change who the AI is to you.
If you can describe what an output should structurally contain without naming a model, the system is in LCM territory. If your answer to “what does this system do?” is “whatever the model decides,” it is still in LLM territory.
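One way to picture that test is a minimal sketch in which the structural contract for a reply is computed from the cognitive state before any model is chosen. Every name here (`CognitiveState`, `OutputSpec`, `plan_output`, `stub_renderer`) is hypothetical and chosen for illustration; the keyword-overlap rule stands in for real recall.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical types for illustration only; not iCog's actual API.

@dataclass
class CognitiveState:
    user_name: str
    decisions: list[str] = field(default_factory=list)  # what was decided earlier
    beliefs: list[str] = field(default_factory=list)    # what the system believes about the user

@dataclass
class OutputSpec:
    address_as: str
    must_mention: list[str]
    must_not_contradict: list[str]

# A renderer is any callable (query, spec) -> prose; the spec never names a model.
Renderer = Callable[[str, OutputSpec], str]

def plan_output(state: CognitiveState, query: str) -> OutputSpec:
    """Decide what the reply must contain before any model is involved."""
    words = {w.strip(".,?!").lower() for w in query.split()}
    # Assumption: naive keyword overlap stands in for real recall.
    relevant = [d for d in state.decisions if words & set(d.lower().split())]
    return OutputSpec(state.user_name, relevant, list(state.beliefs))

def respond(state: CognitiveState, query: str, render: Renderer) -> str:
    """State first, model second: the renderer only turns the spec into prose."""
    return render(query, plan_output(state, query))

def stub_renderer(query: str, spec: OutputSpec) -> str:
    """Stand-in for a real model call."""
    return f"{spec.address_as}, as decided: " + "; ".join(spec.must_mention)
```

Because `plan_output` never refers to a model, the same spec can be handed to any renderer, which is the property the paragraph above describes.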
The four cognitive layers
The LCM architecture we have shipped at iCog runs in four layers. They are not novel in isolation — every one of them appears in the academic literature. The bet is that, integrated and exposed through MCP to any AI client, they create a different product category.
Layer 1 — Memory
Four memory tiers, each with a distinct write/read policy:
- Foundational — identity, values, core beliefs, durable facts. Highest weight in recall. Resists overwrite.
- Episodic — events, sessions, conversations. Time-stamped. The substrate of “what happened to me last Tuesday.”
- Semantic — facts, decisions, learned preferences. Frequently updated. The substrate of “what I believe.”
- Procedural — how-tos, coding patterns, workflows. The substrate of “how I do things.”
Each tier uses 1,536-dimensional embeddings from Gemini Embedding 2 Preview, indexed via HNSW on pgvector. Recall blends vector similarity with tier-specific weighting and time-aware decay.
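The blended recall score can be sketched roughly as follows. The tier weights, the half-lives, and the multiplicative blending formula are all assumptions made for illustration; the text only states that similarity, tier weighting, and time-aware decay are combined. In a pgvector deployment the candidate set would presumably come from an HNSW-indexed nearest-neighbour query, with a re-rank like this applied afterwards.

```python
import math
from dataclasses import dataclass

# Assumed placeholder values; the article does not publish weights or half-lives.
TIER_WEIGHT = {"foundational": 1.5, "semantic": 1.0, "procedural": 1.0, "episodic": 0.8}
HALF_LIFE_DAYS = {"foundational": None, "semantic": 180.0, "procedural": 365.0, "episodic": 30.0}

@dataclass
class Memory:
    text: str
    tier: str
    embedding: list[float]
    age_days: float

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def decay(tier: str, age_days: float) -> float:
    """Exponential half-life decay; a tier with no half-life does not decay."""
    half_life = HALF_LIFE_DAYS[tier]
    if half_life is None:
        return 1.0
    return 0.5 ** (age_days / half_life)

def recall_score(query_vec: list[float], m: Memory) -> float:
    # Assumed blend: similarity * tier weight * time decay.
    return cosine(query_vec, m.embedding) * TIER_WEIGHT[m.tier] * decay(m.tier, m.age_days)

def recall(query_vec: list[float], memories: list[Memory], k: int = 3) -> list[Memory]:
    """Return the k highest-scoring memories for the query vector."""
    return sorted(memories, key=lambda m: recall_score(query_vec, m), reverse=True)[:k]
```

With these placeholder numbers, an old foundational memory outranks an equally similar but stale episodic one, matching the "highest weight in recall" behaviour described for the foundational tier.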
Layer 2 — Cognition
The cognition layer reads memory and produces insights, reflections, and consolidations that themselves become memories. This is the layer that runs while you sleep.
- Reflection engine. Replays recent episodic memory, finds patterns (“the user keeps oscillating between two projects”), and writes those patterns to semantic memory.
- Dream consolidation. A nightly job that runs on its own asyncpg connection pool. It deduplicates, merges related memories, surfaces forgotten threads, and re-weights tiers based on what the user actually returns to.
- Personality vector. VAD mood (valence/arousal/dominance) plus 13 trait scores (curiosity, technical depth, contradiction comfort, humility, directness, etc.). Updated via interaction patterns, exposed via MCP for client introspection.
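The deduplicate-and-merge step of consolidation might look something like the sketch below. The greedy strategy, the 0.95 similarity threshold, and the dict fields (`text`, `embedding`, `hits`, `merged_from`) are assumptions for illustration, not a description of the shipped job.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def consolidate(memories, threshold=0.95):
    """Greedy near-duplicate merge.

    memories: list of dicts with hypothetical keys "text", "embedding", "hits".
    The most-visited memory in each cluster survives and absorbs the others.
    """
    kept = []
    # Visit frequently recalled memories first so they become the survivors.
    for m in sorted(memories, key=lambda m: m["hits"], reverse=True):
        for k in kept:
            if cosine(m["embedding"], k["embedding"]) >= threshold:
                k["hits"] += m["hits"]              # re-weight by what the user returns to
                k["merged_from"].append(m["text"])  # keep provenance of the merged text
                break
        else:
            # No near-duplicate found: keep a copy so the input list is not mutated.
            kept.append({**m, "merged_from": []})
    return kept
```

Keeping `merged_from` is a design choice in this sketch: it lets a later pass undo a bad merge instead of losing the original wording.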
Layer 3 — Decision
A small, deterministic layer that decides what to do with the cognitive state before the model is called. It routes a request to the right tier of memory, picks a model based on task complexity, and decides whether to invoke deep or shallow recall.
This is the layer that makes the LLM cheap to swap. Same decision logic, different rendering model.
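A minimal sketch of such a decision layer, written as pure functions so that the same request always yields the same plan. The routing rules, the complexity scale, and the model labels (`small-fast`, `strong-reasoner`) are hypothetical placeholders.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Request:
    text: str
    needs_history: bool
    complexity: int  # assumed scale: 1 (trivial) to 5 (hard)

@dataclass(frozen=True)
class Plan:
    tier: str
    model: str
    deep_recall: bool

def choose_tier(text: str) -> str:
    """Keyword rules stand in for whatever classifier a real system would use."""
    lowered = text.lower()
    if lowered.startswith(("how do i", "how to")):
        return "procedural"
    if any(w in lowered for w in ("yesterday", "last week", "last time")):
        return "episodic"
    return "semantic"

def decide(req: Request) -> Plan:
    """No model call and no randomness: the plan depends only on the request."""
    deep = req.needs_history or req.complexity >= 4
    model = "strong-reasoner" if req.complexity >= 3 else "small-fast"
    return Plan(choose_tier(req.text), model, deep)
```

Swapping the rendering model then amounts to changing the strings returned for `model`; the tier and recall decisions are untouched.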
Layer 4 — Collective
A federated cognition layer (still in research, not shipped) where memory, with explicit consent, can be shared across users and form collective intelligence — bug patterns shared by a team, decision patterns shared by a community.
Why LLMs alone hit a wall
Three structural limits push the architecture toward LCM whether you adopt the term or not:
- Context windows are finite. Even at one million tokens, you cannot stuff a year of personal history into every prompt. The decision of what to surface must be made before the model is called. That decision is a cognitive act; it does not belong inside the model.
- Models are vendor-captive. Every walled-garden memory feature shipped in 2025–2026 (ChatGPT Memory, Claude memory, Gemini memory) is structured to keep your context inside the vendor’s product. From the user’s perspective, this is not memory; it is lock-in dressed as memory. The LCM architecture exposes the cognitive state through an open protocol (MCP) so the user owns the substrate while the model can be replaced.
- Models drift across releases. The same prompt sent to GPT-4 Turbo and GPT-4o produces different behavior. If the AI’s identity lives in the model, identity drifts with every release. If identity lives in the cognitive state and the model is just rendering it, the AI stays the same to you across model versions.
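The first limit, deciding what to surface before the model is called, can be illustrated with a small budgeted selection. This is a sketch under assumptions: candidates arrive already scored by recall, token counts are precomputed, and a greedy pick by score is good enough.

```python
def select_context(candidates, budget_tokens):
    """Pick memories to surface without exceeding the prompt budget.

    candidates: list of (score, token_count, text) tuples (hypothetical shape).
    Returns the chosen texts in score order and the tokens they consume.
    """
    chosen, used = [], 0
    for score, tokens, text in sorted(candidates, key=lambda c: c[0], reverse=True):
        if used + tokens <= budget_tokens:  # skip anything that would overflow
            chosen.append(text)
            used += tokens
    return chosen, used

candidates = [
    (0.9, 400, "decided to migrate to pgvector"),
    (0.8, 700, "full transcript of the planning call"),
    (0.5, 300, "prefers short answers"),
    (0.4, 200, "works in UTC+2"),
]
context, used = select_context(candidates, budget_tokens=1000)
```

Here the 700-token transcript is skipped because it would overflow the budget, and three smaller memories totalling 900 tokens are surfaced instead. The selection runs entirely outside the model.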
What “swap the model, change the prose” looks like in practice
This is the test we use internally. It is also the most concrete demonstration of the LCM thesis.
iCog routes different requests to different underlying models — a small fast one for low-stakes turns, a stronger reasoning model for the deep-recall path, and an arbiter that picks per-request based on task complexity, tier, and cost ceilings. The model varies; the cognitive state, the personality vector, the memory tiers, and the recall policies do not.
When we recently swapped one of those underlying models for a competitor entirely — different vendor, different family, different temperament — the prose tightened slightly, but no user noticed a change in who iCog was to them. That was the moment the architecture earned its name. If a model swap had changed the personality, the architecture would have failed its own test.
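A toy version of that test, assuming identity can be reduced to a few checkable invariants. The two renderers are stubs standing in for models from different vendors, and the invariants (addressing the user by name, recalling an open decision, a word cap for a "direct" tone) are hypothetical examples rather than the checks used internally.

```python
STATE = {"name": "Ada", "tone": "direct", "open_decision": "migrate to pgvector"}

def render_vendor_a(state, query):
    """Stub for one model: terse prose."""
    return f"{state['name']}, short answer: yes. You already chose to {state['open_decision']}."

def render_vendor_b(state, query):
    """Stub for a different model: different wording, same cognitive state."""
    return (f"Yes, {state['name']}. That fits your earlier decision to "
            f"{state['open_decision']}, so I would proceed.")

def identity_invariants(state, reply):
    """Checks that should hold no matter which model rendered the reply."""
    return {
        "addresses_user": state["name"] in reply,
        "recalls_decision": state["open_decision"] in reply,
        # Assumed proxy for a direct tone: a cap on reply length in words.
        "direct_tone": len(reply.split()) <= 40 if state["tone"] == "direct" else True,
    }

def swap_is_safe(state, query, renderers):
    """True only if every renderer satisfies every invariant."""
    return all(
        all(identity_invariants(state, render(state, query)).values())
        for render in renderers
    )
```

The prose differs between the two stubs, yet both pass. A renderer that ignored the state would fail `addresses_user` and `recalls_decision`, which is the failure the test is meant to catch.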
What this means for the next two years
Three predictions for 2026–2027 if the LCM frame is correct:
- The “memory feature” arms race will plateau; there are only so many ways to dress per-vendor recall as a differentiator. The interesting product question shifts to whose memory it is.
- MCP-native consumer products will multiply. Once the protocol is plumbing, the wedge moves to UX, trust, and ownership. We expect at least three serious consumer plays in 2027 with portability as the hero.
- Cognitive architecture becomes the hiring keyword. Today, “AI engineer” mostly means prompt-and-pipeline plumbing. We expect a meaningful share of senior AI roles in 2027 to require fluency in tier-based memory, recall policies, consolidation jobs, and decision layers: the LCM stack.
What it costs to run this in production
The LCM frame reads as a clean architectural diagram. In production it is anything but. A few of the bills you don’t see in the diagram:
- Model routing under cost ceilings. Every request has to pick the cheapest model that can satisfy the cognitive state’s expectations of identity, tone, and depth. Get this wrong and you either burn margin on every cheap turn or you get a Conscious-tier customer hitting an Amnesiac-tier model.
- Identity invariants across model swaps. When the underlying model changes — vendor outage, price spike, deprecation — the personality vector has to re-render through the new model without the user noticing. That requires a regression test suite for behavior, not just for outputs. Most teams don’t have one.
- Memory hygiene across tier boundaries. When a semantic memory is upgraded to foundational (or downgraded), the consolidation pipeline has to do it without breaking active recalls. We’ve shipped fixes for race conditions there twice.
- Cross-vendor MCP compatibility. Every major AI client speaks MCP slightly differently and breaks the integration on minor releases. “Works in every major AI” is a posture, not a feature.
- Adversarial input through the rendering layer. Prompt injection through the LLM layer must not be able to corrupt the cognitive state. The architecture has to draw a hard line between what the model says and what gets persisted as memory, and then defend that line.
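The last item, the line between what the model says and what gets persisted, can be sketched as a write guard in front of the memory store. The policy shown is an assumption chosen for illustration and is deliberately strict: only user-sourced text that appears verbatim in the user's turn is accepted, and the foundational tier is never writable through this path.

```python
from dataclasses import dataclass

# Assumption: foundational memory is never writable from a conversation turn.
WRITABLE_TIERS = {"episodic", "semantic", "procedural"}

@dataclass(frozen=True)
class MemoryProposal:
    tier: str
    text: str
    source: str  # "user" or "model" (hypothetical provenance label)

def accept(proposal: MemoryProposal, user_turn: str) -> bool:
    """Return True only if the proposal may be persisted.

    Model output alone is never treated as evidence, so text injected
    through the rendering layer cannot write itself into memory.
    """
    if proposal.tier not in WRITABLE_TIERS:
        return False
    if proposal.source != "user":
        return False
    # Strict grounding check: the text must be a verbatim quote from the user.
    return proposal.text.lower() in user_turn.lower()
```

A real system would likely need a softer grounding check than verbatim quoting, but the shape is the point: persistence is decided by deterministic code outside the model.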
The LCM frame is right. It is also expensive to run correctly. We won’t say “don’t try this at home,” but once you are past the prototype stage, paying that bill yourself is the cost we are built to absorb.
A small, unfashionable claim
We are not saying LLMs go away. They are the rendering layer of an LCM, and a beautiful one. We are saying that calling a stateless predictor “the AI”, or pretending that stuffing a vector store next to it constitutes memory, has stopped describing what users want.
The system that knows you tomorrow will not be a Large Language Model. It will have one inside it.
Sources & further reading
- iCog cognitive-architecture research notes: research/MASTER_PLAN_V4.md, May 2026
- Model Context Protocol — Wikipedia
- State of AI Memory 2026 — Mem0 blog
- Letta on context repositories
- iCog four-tier memory model — see The Four-Tier Memory Model