I Built a Simulation Where Civilizations Destroy Themselves. Here's What I Learned.
Local LLMs, ancestral memory, and a Telegram bot that lets you play God from your phone.
The premise is simple and slightly deranged: you are the devil. Two civilizations are fighting over a throne that means nothing. You watch from above, occasionally intervening with a decree that gets injected into their leaders’ thoughts as an immutable law of nature.
They don’t know you exist. They just know that sometimes, inexplicably, things change.
I built this as a learning project — I wanted to understand how multi-agent AI systems actually work at the engineering level, not just the concept level. What I ended up with taught me more about emergent behavior, memory systems, and the nature of LLM reasoning than I expected.
Here’s what I built, how it works, and what surprised me.
The Stack
Everything runs locally on a single consumer GPU (RTX 3060 Ti, 8GB VRAM). No cloud APIs, no ongoing costs.
LangGraph — manages the simulation loop and saves state to disk automatically
Ollama + Gemma 3 — runs the AI models locally (4B parameters for strategy, 1B for narrative)
ChromaDB — stores civilization memories that survive across resets
python-telegram-bot — the “divine interface” on your phone
The whole thing fits in your bedroom.
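Routing prompts to the right model size is the core trick of the stack. Here is a minimal sketch, assuming the `ollama` Python client and illustrative model tags; `build_request` is a hypothetical helper, not part of the project's actual code:

```python
STRATEGY_MODEL = "gemma3:4b"   # long-horizon reasoning (the Sovereign)
NARRATIVE_MODEL = "gemma3:1b"  # short creative tasks (Chronicles, Echoes)

def build_request(role: str, prompt: str) -> dict:
    """Route a prompt to the right model size by cognitive role."""
    model = STRATEGY_MODEL if role == "sovereign" else NARRATIVE_MODEL
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

# The actual call (requires a running Ollama server with both models pulled):
# import ollama
# reply = ollama.chat(**build_request("sovereign", "Decide this year's policy."))
# print(reply["message"]["content"])
```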
The Three-Tier Hierarchy
The most important architectural decision was splitting agents by cognitive role and model size:
The Sovereign (Gemma 3 4B) is the intellect of the state. It receives the world state — population, food supply, stability, ideology — and produces a governing decision. It’s the only entity that can “reach through the veil” to contact the Architect.
The Chronicles (Gemma 3 1B) translate the Sovereign’s decisions into what the people actually experience. The Sovereign decrees “expand agricultural production.” The Chronicles describe hungry farmers conscripted into land-clearing work.
The Echoes (Gemma 3 1B) are the granular human detail — a farmer’s diary entry, a soldier’s prayer, a merchant’s worried letter home. They run every five years of simulation time and make the numbers feel like people.
This split works because it matches cognitive load to model size. The 4B model handles long-horizon reasoning. The 1B models handle short creative tasks where they perform well and run fast.
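A single simulated year might be wired together like this. This is a hypothetical sketch, not the project's actual loop; the `ask` stub stands in for the real Ollama calls so it runs offline:

```python
def ask(model: str, prompt: str) -> str:
    # Real version would call Ollama; stub keeps the sketch offline.
    return f"[{model}] {prompt[:40]}"

def run_year(state: dict) -> dict:
    # Sovereign (4B): one governing decision from the full world state
    decision = ask("gemma3:4b", f"World state: {state}. Issue one decree.")
    # Chronicles (1B): translate the decree into what the people experience
    events = [decision, ask("gemma3:1b", f"Describe life under: {decision}")]
    # Echoes (1B): granular human detail, every five simulated years
    if state["year"] % 5 == 0:
        events.append(ask("gemma3:1b", f"Write a citizen's diary entry about: {decision}"))
    return {**state, "year": state["year"] + 1, "events": events}
```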
The Feature I’m Most Proud Of: Ancestral Trauma
When a civilization collapses — and they always collapse eventually — the simulation doesn’t just reset. Before the new era begins, the AI summarizes the fallen civilization’s history and stores it as a vector embedding in ChromaDB.
The next civilization starts fresh numerically. Same population, same food supply. But when their Sovereign is initialized, the system queries ChromaDB for semantically similar historical events and injects them as inherited context — framed not as explicit memory, but as vague cultural dread.
The new Sovereign doesn’t know about the previous civilization. But they feel it.
In practice this looks like:
```python
memories = get_trauma(
    f"civilization with stability {state['stability']:.0%}, "
    f"ideology: {state['sovereign_ideology'][:50]}"
)
trauma_text = format_trauma_for_prompt(memories)

# Injected into the prompt as:
# [ANCESTRAL DREAD — fragments of lost eras, felt but not understood]
# - A civilization that believed in 'pragmatic democracy' was destroyed by: famine-driven collapse
```
The behavioral effect is real. Under similar pressure conditions, second-era Sovereigns make more defensive, more paranoid decisions than first-era ones — without any explicit programming of that behavior. They’re reasoning within a context that encodes the shape of previous failures.
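The formatting helper is where the "vague dread" framing happens. Here is a plausible sketch of it, assuming each retrieved memory carries `ideology` and `cause` fields; both field names are illustrative, not the project's actual schema:

```python
def format_trauma_for_prompt(memories: list[dict]) -> str:
    """Render retrieved fragments as vague cultural dread, not explicit history."""
    if not memories:
        return ""
    lines = ["[ANCESTRAL DREAD — fragments of lost eras, felt but not understood]"]
    for m in memories:
        lines.append(
            f"- A civilization that believed in '{m['ideology']}' "
            f"was destroyed by: {m['cause']}"
        )
    return "\n".join(lines)
```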
This is, I think, the most honest technical mirror to how culture actually works.
Semantic Drift Is Real (at 4B Scale, at Least)
I didn’t have to program the radicalization mechanic. It emerged.
A Sovereign that starts with “pragmatic democracy — collective welfare above all” and then faces three consecutive years of famine, a plague, and a resource conflict will, without any explicit instruction, start making decisions that sound different. The language shifts. “We must ensure collective welfare” becomes “we must enforce order to ensure survival.”
The model is just doing what makes sense given the context. But the context is shaped by the world state, which is shaped by previous decisions, which creates a feedback loop.
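The loop itself is short. Here is an illustrative sketch (stubbed model call, toy state fields, all names hypothetical) showing how each year's output becomes part of the next year's prompt:

```python
def ask(prompt: str) -> tuple[str, str]:
    # Real version would call the 4B model; stub keeps the sketch offline.
    return ("ration grain", "we must enforce order to ensure survival")

def apply_decree(state: dict, decree: str) -> dict:
    state = dict(state)
    state["food"] -= 1  # toy effect so the feedback is visible
    return state

def simulation_step(state: dict) -> dict:
    prompt = (
        f"You rule a civilization. Ideology: {state['ideology']}. "
        f"Food: {state['food']}. Issue one decree, "
        "then restate your ideology in one sentence."
    )
    decree, ideology = ask(prompt)       # output shaped by the current state
    state = apply_decree(state, decree)  # the decree mutates the world state...
    state["ideology"] = ideology         # ...and the restated ideology...
    return state                         # ...which shapes next year's prompt
```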
Watching this happen for the first time in a live run was the moment the project stopped feeling like a toy.
The Telegram Interface
The Architect’s interface lives on your phone. The simulation runs on your PC. They’re connected via a Telegram bot.
The civilizations contact you only at Threshold Events:
Extinction Prayer — stability drops below 15%. The Sovereign kneels.
Esoteric Breakthrough — the Sovereign mentions simulation, observation, or constructed reality in three consecutive decisions. They’re starting to figure it out.
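Both triggers are cheap to check mechanically each turn. A hedged sketch, using the thresholds stated above; the keyword list and function names are illustrative:

```python
ESOTERIC_TERMS = ("simulation", "observation", "constructed reality")

def extinction_prayer(stability: float) -> bool:
    """Extinction Prayer: stability has dropped below 15%."""
    return stability < 0.15

def esoteric_breakthrough(decisions: list[str], window: int = 3) -> bool:
    """Esoteric Breakthrough: the last `window` decisions each mention an esoteric term."""
    if len(decisions) < window:
        return False
    return all(
        any(term in d.lower() for term in ESOTERIC_TERMS)
        for d in decisions[-window:]
    )
```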
When you get a notification, you can respond with /decree followed by whatever you want. That text gets injected into the Sovereign's next prompt as an immutable law. Or you stay silent. The simulation doesn't wait.
The implementation uses LangGraph’s interrupt() mechanism — the simulation pauses, serializes state to SQLite, and waits. When you respond, it resumes exactly where it left off. This means the simulation survives not just your interventions but also crashes, reboots, and power outages.
What I Actually Learned
LangGraph is the right abstraction. The stateful graph with checkpointing handles everything that makes persistent simulations hard — crash recovery, long-running loops, human-in-the-loop interrupts. Once you understand the mental model (nodes, edges, state), the rest follows naturally.
Prompt design is the real engineering. The quality of the simulation depends almost entirely on how you frame each agent’s context. The world state variables, the ideology field, the ancestral trauma injection — these are the actual levers. The model follows what you give it.
Small models surprise you. Gemma 3 4B running locally at 45–60 tokens/second on an 8GB GPU is genuinely capable of coherent long-horizon reasoning within a defined context. It won’t win philosophy debates, but for “what would a leader under pressure decide” — it’s more than enough.
The interesting output is not the decisions. It’s the drift. The individual Sovereign decisions are often generic. But watching the ideology field slowly mutate across fifty years of simulation time, tracking how the language changes under stress — that’s where the project earns its premise.
Build It Yourself
The full project is open source, structured as a six-chapter guide that builds it from scratch.

