An Open-Source Framework for Emergent Multi-Agent Social Simulations
"This serves as a complex mirror to real-world systems of power, belief, and emergent social behavior."
The architecture leverages LangGraph for cycle management and ChromaDB for persistent storage. It runs locally on consumer-grade hardware using Gemma models, chosen for their high intelligence-to-parameter ratio.
Interventions are managed asynchronously through a Human-in-the-Loop (HITL) interface, enabling real-time state injections that influence the simulation's trajectory via a Telegram-based command protocol.
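The Telegram wiring itself is beyond this excerpt, but the core HITL pattern — operator commands queued asynchronously and applied between simulation steps — can be sketched with stdlib asyncio alone. The `{"op": "set", ...}` command schema below is purely illustrative, not the framework's actual protocol:

```python
import asyncio

async def simulation_loop(state: dict, commands: asyncio.Queue, ticks: int) -> dict:
    """Advance the world `ticks` steps, applying queued interventions between steps."""
    for _ in range(ticks):
        # Drain pending HITL commands before the next step
        # (e.g. a hypothetical /set command relayed from Telegram).
        while not commands.empty():
            cmd = commands.get_nowait()
            if cmd["op"] == "set":
                state[cmd["key"]] = cmd["value"]
        state["tick"] = state.get("tick", 0) + 1
        await asyncio.sleep(0)  # yield so a listener task could enqueue more commands

    return state

async def main() -> dict:
    commands: asyncio.Queue = asyncio.Queue()
    # An operator injects a world-state change mid-run.
    await commands.put({"op": "set", "key": "weather", "value": "storm"})
    return await simulation_loop({}, commands, ticks=3)

print(asyncio.run(main()))  # {'weather': 'storm', 'tick': 3}
```

Because interventions are only applied at step boundaries, the simulation state never mutates mid-inference, which keeps injections deterministic relative to the tick count.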
Persistence and Long-Context Management
To support indefinitely long runs, the framework implements hierarchical recursive summarization: older events are compressed into summaries, and those summaries are themselves summarized as history accumulates. This keeps the "World Engine" historically aware without exceeding context limits, while LangGraph checkpointers provide durability across system reboots.
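A minimal sketch of such a summarization hierarchy, with a stub standing in for the actual LLM summarization call (the fanout and formatting here are illustrative, not the framework's real parameters):

```python
def summarize(chunks: list[str]) -> str:
    # Stand-in for an LLM summarization call.
    return "SUMMARY(" + "; ".join(chunks) + ")"

class HierarchicalMemory:
    """Keeps at most `fanout` items per level; overflow is summarized upward."""

    def __init__(self, fanout: int = 4):
        self.fanout = fanout
        self.levels: list[list[str]] = [[]]  # levels[0] holds raw events

    def add(self, event: str) -> None:
        self.levels[0].append(event)
        level = 0
        # Cascade: a full level is compressed into one summary one level up,
        # which may in turn overflow that level, and so on recursively.
        while len(self.levels[level]) > self.fanout:
            chunk = self.levels[level][: self.fanout]
            del self.levels[level][: self.fanout]
            if level + 1 == len(self.levels):
                self.levels.append([])
            self.levels[level + 1].append(summarize(chunk))
            level += 1

    def context(self) -> list[str]:
        # Most-compressed (oldest) history first, recent raw events last.
        out: list[str] = []
        for level in reversed(self.levels):
            out.extend(level)
        return out
```

The prompt assembled from `context()` grows logarithmically with history rather than linearly, which is what keeps the loop within a fixed context budget.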
Orchestration: LangGraph and the Router Pattern
The backbone of this infinite loop is LangGraph, which facilitates a stateful, cyclic graph architecture. It allows nodes to represent distinct agents—such as the Logic Engine and Narrative agents—while edges define the flow of information through a persistent Observe-Think-Act loop.
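A dependency-free sketch of that Observe-Think-Act cycle: nodes are functions over shared state, fixed edges chain them, and a router function plays the role of LangGraph's conditional edges. The node bodies are placeholders, not the framework's real agents:

```python
def observe(state: dict) -> dict:
    state["percept"] = f"world at tick {state['tick']}"
    return state

def think(state: dict) -> dict:
    # In the real system this is an LLM call; here, a placeholder plan.
    state["plan"] = f"respond to {state['percept']}"
    return state

def act(state: dict) -> dict:
    state["tick"] += 1
    return state

def route(state: dict) -> str:
    # Conditional edge: loop back to observe until the tick budget is spent.
    return "end" if state["tick"] >= state["max_ticks"] else "observe"

NODES = {"observe": observe, "think": think, "act": act}
FIXED_EDGES = {"observe": "think", "think": "act"}  # "act" falls through to route()

def run(state: dict) -> dict:
    node = "observe"
    while node != "end":
        state = NODES[node](state)
        node = FIXED_EDGES.get(node) or route(state)
    return state

final = run({"tick": 0, "max_ticks": 3})
print(final["tick"])  # 3
```

In the actual LangGraph build, `NODES` and `FIXED_EDGES` correspond to `add_node`/`add_edge` calls on a `StateGraph`, and checkpointing the state dict between steps is what the checkpointer provides for free.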
To work within the strict VRAM limits of consumer cards, the framework uses llama.cpp's router mode. This implementation applies a Least Recently Used (LRU) eviction policy, dynamically swapping models between GPU VRAM and system RAM.
The Quantization Standard
By reducing model weights from 16-bit to 4-bit precision, we cut the VRAM footprint roughly fourfold: a 9B model that would normally require 18GB fits into just 5.8GB of VRAM, maintaining high reasoning capability on a local workstation.
This precision reduction preserves the vast majority of the model's emergent behavior while leaving critical headroom for the KV cache, which stores the context of the current conversation. For extended simulations, we additionally quantize the KV cache to 4-bit, roughly halving the memory cost of long contexts and enabling deeper historical awareness within the hardware's physical limits.
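The savings are straightforward to estimate: each cached token stores keys and values for every layer, so cache size scales linearly with bit width. The architecture numbers below are hypothetical round figures for a 9B-class model, not official Gemma specifications:

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bits: int) -> int:
    # K and V each hold n_layers * n_kv_heads * head_dim values per cached token.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bits // 8

# Hypothetical 9B-class configuration: 42 layers, 8 KV heads, head_dim 256, 8K context.
fp16 = kv_cache_bytes(42, 8, 256, 8192, 16)
q4 = kv_cache_bytes(42, 8, 256, 8192, 4)
print(fp16 / 2**30, q4 / 2**30)  # 2.625 0.65625 -> the 4-bit cache is 4x smaller
```

Note the linear dependence on `seq_len`: with the cache quantized, the same VRAM budget holds a proportionally longer simulation history.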
VRAM Reference by Model and Quantization
These figures are planning ranges for local use. Real usage will be slightly higher once the runtime, context window, and KV cache are active.
| Model | Q2 | Q4 (Default) | Q5 | FP16 | Practical Application |
| --- | --- | --- | --- | --- | --- |
| E2B | ~0.8 GB | ~1.5 GB | ~1.8 GB | ~4 GB | Minimal agents; CPU-heavy setups. |
| E4B | ~2 GB | ~3.5 GB | ~4.2 GB | ~8 GB | Standard starting point for local agents. |
| 26B A4B | ~9 GB | ~14 GB | ~17 GB | ~52 GB | High-tier social complexity; requires a 24 GB GPU. |
| 31B | ~12 GB | ~20 GB | ~24 GB | ~62 GB | Maximum local reasoning; 24 GB VRAM required. |
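For models not in the table, a back-of-envelope weights-only estimate is parameters × bits ÷ 8, plus some runtime overhead. Real quantization formats mix bit widths per tensor, so measured figures come out somewhat higher than this idealized calculation; the 10% overhead factor below is an assumption, not a measured constant:

```python
def model_vram_gb(params_b: float, bits: float, overhead: float = 1.1) -> float:
    """Weights-only VRAM estimate: parameters * bits / 8, plus ~10% runtime overhead.
    Excludes the context window and KV cache, per the table's caveat."""
    return params_b * 1e9 * bits / 8 * overhead / 1e9

print(model_vram_gb(9, 4))   # about 4.95 GB of weights for a 9B model at 4-bit
print(model_vram_gb(9, 16))  # about 19.8 GB at FP16
```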
Join the Experiment.
This is an invitation to build your own scenarios—whether they be utopian social experiments or harrowing studies of civilizational collapse.
https://soundcloud.com/mihai-vancea-909027674/building-persistent-ai

