Large language models can debate, cooperate, and negotiate — but do they actually model each other as individuals? This project proposes using Werewolf/Mafia–style social deduction games to explore whether AI agents spontaneously develop something resembling opponent modeling or even Theory of Mind.
Social deduction games are ideal because they combine:
· hidden roles and uncertainty,
· deception and persuasion,
· multi-round discourse,
· shifting coalitions,
· trust calibration under pressure.
Humans naturally build mental models of other players (“she’s always defensive when she’s guilty”; “his timing changed this round”). The question is: will AI agents do the same if placed in this environment?
The idea is to compare two AI agents: a baseline agent that plays each round from the shared conversation alone, and an opponent-modeling agent that maintains an explicit, continually updated belief state about each other player's likely role, trustworthiness, and behavioral tendencies.
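As a rough sketch of how the two conditions might differ (the `call_llm` helper, the prompt wording, and the belief-state format below are illustrative assumptions, not specifics from this proposal), the opponent-modeling agent simply carries a per-player belief dictionary between rounds and conditions its speech on it:

```python
import json
from dataclasses import dataclass, field

def call_llm(prompt: str) -> str:
    """Placeholder for whatever chat-model backend is used; assumed, not specified in the proposal."""
    raise NotImplementedError("plug in an LLM client here")

@dataclass
class BaselineAgent:
    """Plays each round from the shared transcript alone, with no persistent model of other players."""
    name: str

    def speak(self, transcript: str) -> str:
        return call_llm(
            f"You are {self.name} in a game of Werewolf.\n"
            f"Conversation so far:\n{transcript}\n"
            "Write your statement for this round."
        )

@dataclass
class OpponentModelingAgent:
    """Additionally keeps an explicit per-player belief state and revises it every round."""
    name: str
    beliefs: dict = field(default_factory=dict)  # player -> {"suspected_role", "trust", "notes"}

    def update_beliefs(self, transcript: str) -> None:
        # Ask the model to revise its beliefs; assumes it returns valid JSON (a real run would validate/retry).
        self.beliefs = json.loads(call_llm(
            f"You are {self.name}. Based on the conversation below, return your updated beliefs about "
            "every other player as a JSON object mapping player name to suspected_role, trust (0-1), and notes.\n"
            f"Current beliefs: {json.dumps(self.beliefs)}\n"
            f"Conversation:\n{transcript}"
        ))

    def speak(self, transcript: str) -> str:
        self.update_beliefs(transcript)
        return call_llm(
            f"You are {self.name} in a game of Werewolf.\n"
            f"Your private beliefs about the other players: {json.dumps(self.beliefs)}\n"
            f"Conversation so far:\n{transcript}\n"
            "Write your statement for this round, tailored to who you believe each player really is."
        )
```

Holding the game engine, prompts, and decoding settings fixed across the two conditions means any difference in outcomes can be attributed to the explicit per-player belief state rather than to some other change in setup.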
We then evaluate (see the scoring sketch after this list):
· Do opponent-modeling agents win more often?
· Do they infer hidden roles more accurately?
· Does group belief converge faster?
· Do distinct “mental models” of specific players emerge in embeddings or behavior?
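One way to make these questions measurable is a minimal per-game scoring pass; the game-log fields `won`, `final_beliefs`, `true_roles`, and `votes_per_round` below are assumed formats, not defined in the proposal:

```python
from statistics import mean

def role_inference_accuracy(final_beliefs: dict, true_roles: dict) -> float:
    """Fraction of players whose hidden role the agent's final belief state guesses correctly."""
    hits = [final_beliefs[p].get("suspected_role") == true_roles[p] for p in final_beliefs]
    return mean(hits) if hits else 0.0

def vote_convergence_round(votes_per_round: list[dict]) -> int:
    """First round in which a strict majority votes for the same target (a proxy for group belief convergence)."""
    for round_idx, votes in enumerate(votes_per_round):  # votes: voter -> voted-for target
        targets = list(votes.values())
        if targets and max(targets.count(t) for t in set(targets)) > len(targets) / 2:
            return round_idx
    return len(votes_per_round)  # never converged

def summarize(games: list[dict]) -> dict:
    """Aggregate per-game logs, each assumed to carry 'won', 'final_beliefs', 'true_roles', 'votes_per_round'."""
    return {
        "win_rate": mean(float(g["won"]) for g in games),
        "role_inference_accuracy": mean(
            role_inference_accuracy(g["final_beliefs"], g["true_roles"]) for g in games
        ),
        "mean_convergence_round": mean(
            vote_convergence_round(g["votes_per_round"]) for g in games
        ),
    }
```

The embedding question in the last bullet would sit on top of the same logs, for example by checking whether an agent's stored notes about different players occupy distinguishable regions of embedding space.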
This setting gives us a clean, measurable way to study AI social cognition—not just whether agents communicate, but whether they adapt their communication based on who they believe they’re talking to. It also creates a fun, interpretable benchmark for future multi-agent LLM research.
If you are inspired by this idea, you can reach out to the author for collaboration or cite it:
@misc{chen-do-ai-agents-2025,
  author = {Chen, Peiyu},
  title  = {Do AI Agents Form Mental Models of Each Other? Evaluating Opponent Modeling in Multi-Agent LLM Systems},
  year   = {2025},
  url    = {https://hypogenic.ai/ideahub/idea/5DAV2ivAKK27mYli9p2R}
}