Every developer using AI coding agents has experienced this:
You spend 20 minutes in session A teaching your agent a bug pattern — a global (`/g`) regex carries `lastIndex` state across calls, so `.test()` alternates between true and false in a loop. The agent gets it, fixes it, you’re happy.
Next day, new session. Same bug. The agent remembers nothing.
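For the curious, that bug is easy to reproduce. A minimal sketch (not the code from the original session):

```typescript
// A regex with the /g flag is stateful: each successful test() advances
// its internal lastIndex, and the next call resumes matching from there.
const re = /bug/g;

const results: boolean[] = [];
for (let i = 0; i < 4; i++) {
  results.push(re.test("bug")); // alternates: true, false, true, false
}

// Fix 1: drop the /g flag when you only need a yes/no answer.
// Fix 2: reset the state explicitly before each call.
re.lastIndex = 0;
const fixed = re.test("bug"); // true every time now
```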
AI agents have no memory. Every conversation starts on a blank slate.
This isn’t a bug in any particular product. It’s a structural flaw in the entire paradigm.
The Problem Isn’t Retrieval — It’s Structure
There are plenty of “AI memory” solutions out there. Most follow the same playbook: embedding → vector DB → semantic search → top-K retrieval.
I tried them. Three problems:
1. One-shot retrieval is fragile — cosine similarity only finds superficially similar things. “Global regex `lastIndex`” and “`.test()` alternating in loops” are two faces of the same problem, but embeddings may not connect them.
2. Not explainable — vector similarity tells you “these two things are 0.87 similar” but not why. The agent gets a pile of similar documents with no context to judge which is actually relevant.
3. Too much infrastructure — a coding agent’s memory system needs Qdrant/Pinecone/ChromaDB? For what is essentially “help me remember lessons learned,” that’s over-engineering.
What I needed wasn’t a search engine — it was a notebook.
Zettelkasten: A 300-Year-Old Solution
Niklas Luhmann was a German sociologist who wrote 70 books and 400 papers in his lifetime. His secret weapon was a card box system — the Zettelkasten.
The core rules are simple:
- One idea per card (atomic)
- Cards connected via explicit links (bidirectional)
- Find entry points via keyword index, explore context along links
This method has been validated for decades in human knowledge management. My key insight: it maps almost directly to AI agents.
Three equivalent substitutions make this work:
| Human Zettelkasten | Agent Zettelkasten |
|---|---|
| Human brain understands card meaning | LLM understands card meaning |
| Manually write bilinks `[[related card]]` | Agent auto-links after writing cards |
| Brain’s fuzzy memory: “I think there’s a card about this” | `memex search` CLI as entry point |
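To make this concrete, a card might look like the following. The filename, frontmatter fields, and link targets here are illustrative assumptions, not necessarily memex’s actual on-disk format:

```markdown
---
id: js-global-regex-lastindex-trap
tags: [javascript, regex, debugging]
---
# Global regex lastIndex trap

A `/g` regex object keeps `lastIndex` state across calls, so
`re.test(str)` inside a loop can alternate between true and false.
Fix: drop the `g` flag, or reset `re.lastIndex = 0` before each call.

Links: [[js-regex-flags]], [[stateful-api-pitfalls]]
```

One idea, one card, with explicit links an agent (or a human in Obsidian) can follow.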
The key trade-off: spend more LLM tokens for explainability and zero infrastructure.
LLMs generate multiple search keywords (like using a search engine) — stronger than vector similarity’s one-shot top-K because the LLM can adjust the next query based on previous results. Bilinks explicitly state “A and B are related because X” — more precise and human-readable than cosine similarity.
Tokens only get cheaper over time, so this trade-off becomes increasingly favorable.
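The retrieval loop described above can be sketched in a few lines. Everything here is a toy stand-in: the card shape, the `search` helper, and the hard-coded queries (which a real agent would generate with the LLM):

```typescript
// Iterative retrieval over a tiny in-memory card index (names hypothetical).
// Instead of one-shot top-K, the agent issues several keyword queries and
// follows explicit [[bilinks]] from each hit to pull in related cards.
type Card = { id: string; text: string; links: string[] };

const cards: Record<string, Card> = {
  "js-global-regex-lastindex-trap": {
    id: "js-global-regex-lastindex-trap",
    text: "Global /g regex keeps lastIndex state; test() alternates in loops.",
    links: ["stateful-api-pitfalls"],
  },
  "stateful-api-pitfalls": {
    id: "stateful-api-pitfalls",
    text: "APIs with hidden state surprise callers; prefer pure functions.",
    links: [],
  },
};

function search(keyword: string): Card[] {
  return Object.values(cards).filter((c) =>
    c.text.toLowerCase().includes(keyword.toLowerCase())
  );
}

// An LLM would propose and refine these queries; hard-coded for the sketch.
const queries = ["lastIndex", "regex"];
const found = new Set<string>();
for (const q of queries) {
  for (const hit of search(q)) {
    found.add(hit.id);
    // Explicit links pull in context that a keyword match alone would miss.
    hit.links.forEach((l) => found.add(l));
  }
}
```

Note that `stateful-api-pitfalls` never matches either keyword; it is reached only through the bilink, which is exactly the behavior cosine top-K cannot guarantee.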
Architecture: Two-Layer Separation
```
┌─────────────────────────────────┐
│ Smart Layer (Skills/Prompts)    │ ← LLM logic lives here
│   recall / retro / organize     │
├─────────────────────────────────┤
│ Protocol Layer (CLI)            │ ← Pure data ops, no LLM
│   search / read / write / links │
├─────────────────────────────────┤
│ Storage                         │
│   ~/.memex/cards/*.md           │ ← Plain markdown files
└─────────────────────────────────┘
```
The CLI is the data layer. Skills are the intelligence layer. The CLI has zero LLM dependencies — all LLM logic lives in skills.
Why not let skills manipulate files directly? The CLI exists not because the operations are complex, but because it’s the protocol between memory and agent. Claude Code calls the CLI through skills, Cursor calls it through MCP, humans call it directly from the terminal — all accessing the same ~/.memex/cards/ directory.
One memory store, accessible by any system. No lock-in.
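The protocol layer really is just file operations. A sketch of what a `search` operation could reduce to, using a temp directory as a stand-in for `~/.memex/cards/` (the card layout and function name are assumptions, not memex’s implementation):

```typescript
// Pure data ops over a cards directory: no LLM, no database.
import { mkdtempSync, writeFileSync, readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";
import { tmpdir } from "node:os";

// Stand-in for ~/.memex/cards/ so the sketch is self-contained.
const cardsDir = mkdtempSync(join(tmpdir(), "memex-cards-"));
writeFileSync(
  join(cardsDir, "js-global-regex-lastindex-trap.md"),
  "# lastIndex trap\n\nGlobal /g regex alternates in loops. [[stateful-api-pitfalls]]\n"
);

// "search": a grep-style keyword scan over plain markdown files.
function searchCards(dir: string, keyword: string): string[] {
  return readdirSync(dir)
    .filter((f) => f.endsWith(".md"))
    .filter((f) =>
      readFileSync(join(dir, f), "utf8").toLowerCase().includes(keyword.toLowerCase())
    );
}

const hits = searchCards(cardsDir, "lastindex");
```

Because the storage is this dumb, anything that can read files can be a client.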
Workflow: Three Automated Moments
1. Before a task — Recall
When the agent receives a new task, it automatically searches for relevant memory cards. Not stuffing all history into the context window, but exploring along bilinks to pull only truly relevant context.
2. After a task — Retro
The agent automatically distills lessons from the current task into atomic cards. Not dumping the entire conversation, but extracting insight — a reusable unit of cognition.
3. Periodically — Organize
Detect orphan cards, discover missing links, merge duplicates. A knowledge network needs maintenance, just like code needs refactoring.
The entire process can be fully automatic (via hooks) or manually triggered (memex recall, memex retro).
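As one example of what the organize pass does mechanically, orphan detection is a pure graph check over the link syntax. A sketch, assuming `[[id]]` bilinks and an in-memory card map for brevity:

```typescript
// Find orphan cards: cards that neither link out nor are linked to.
const cards: Record<string, string> = {
  "recall-before-tasks": "Pull context first. See also [[atomic-cards]].",
  "atomic-cards": "One idea per card keeps retrieval precise.",
  "stale-note": "Nobody links here, and this links nowhere.",
};

const linkRe = /\[\[([^\]]+)\]\]/g;
const linkedTo = new Set<string>();
const linksOut = new Set<string>();
for (const [id, body] of Object.entries(cards)) {
  for (const m of body.matchAll(linkRe)) {
    linkedTo.add(m[1]);
    linksOut.add(id);
  }
}

const orphans = Object.keys(cards).filter(
  (id) => !linkedTo.has(id) && !linksOut.has(id)
);
// "stale-note" is the only orphan: it is a candidate for linking or merging.
```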
Cross-Platform: One MCP Server Covers All Clients
Distributing agent memory to different products requires distinguishing three layers:
- Tool protocol (MCP) — makes tools callable by the LLM. One MCP server covers Cursor, VS Code/Copilot, Windsurf, Codex.
- Instruction injection — teaches the LLM when to use tools and how to chain them. Key discovery: tool descriptions themselves are the most universal instruction carrier — every MCP client reads tool descriptions, no extra per-project config files needed. Zero config, works out of the box.
- Claude Code Plugin — the deepest integration. SessionStart hook for auto-recall, slash commands to trigger retro, skills to orchestrate the full workflow.
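To illustrate the “tool descriptions as instruction carrier” idea: an MCP tool declaration bundles its own usage instructions in the `description` field, which every MCP client surfaces to the LLM. The tool name, wording, and schema below are hypothetical, though the overall shape follows the MCP tool declaration format (`name`, `description`, `inputSchema`):

```typescript
// A tool description that teaches the LLM *when* and *how* to use the tool,
// with no per-project config file required.
const searchTool = {
  name: "memex_search",
  description:
    "Search the user's memory cards BEFORE starting a coding task. " +
    "Issue 2-3 short keyword queries rather than one long sentence, then " +
    "follow the [[links]] in each hit to pull in related context.",
  inputSchema: {
    type: "object",
    properties: {
      query: { type: "string", description: "A keyword query, not a sentence." },
    },
    required: ["query"],
  },
};
```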
Why Markdown
Because markdown is the lingua franca of the developer world.
- Open `~/.memex/cards/` in Obsidian to see the full knowledge graph
- Search with `grep`
- Sync to any device with `git`
- Edit with any editor
- If memex dies tomorrow, your knowledge survives
No lock-in. No vendor dependency. Your knowledge is yours.
What I Learned
Building memex refreshed several of my assumptions:
1. LLMs are better search engines than embeddings (for this use case)
Embeddings do one-shot semantic matching. LLMs do iterative exploration — search, read, judge, adjust query, search again. The latter is clearly stronger, at the cost of a few hundred extra tokens. With token prices dropping exponentially, this trade-off becomes more favorable over time.
2. Explicit links >> implicit similarity
The explicit link [[js-global-regex-lastindex-trap]] tells the agent: this card relates to a regex bug. That carries an order of magnitude more information than “cosine similarity 0.87.”
3. The best infrastructure is no infrastructure
No database to maintain, no embedding model to deploy, no vector store to pay for. The filesystem + git is the infrastructure.
Try It
```shell
# Claude Code
npx memex@latest

# VS Code / Cursor / Codex / Windsurf (MCP)
npx @anthropic/memex-mcp
```
GitHub: github.com/iamtouchskyer/memex
Web Viewer: memra.vercel.app
The structure of the next era won’t be discovered — it’ll be built.