Large language models can debate, cooperate, and negotiate — but do they actually model each other as individuals? This project proposes using Werewolf/Mafia–style social deduction games to explore whether AI agents spontaneously develop something resembling opponent modeling or even Theory of Mind.
Social deduction games are ideal because they combine:
· hidden roles and uncertainty,
· deception and persuasion,
· multi-round discourse,
· shifting coalitions,
· trust calibration under pressure.
Humans naturally build mental models of other players (“she’s always defensive when she’s guilty”; “his timing changed this round”). The question is: will AI agents do the same if placed in this environment?
The idea is to compare two AI agents: a baseline agent that plays each round from the shared conversation alone, and an opponent-modeling agent that maintains an explicit, continually updated belief state about each other player's likely role, trustworthiness, and behavioral tendencies.
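As a rough sketch of how the two conditions might differ (the `call_llm` helper, the prompt wording, and the belief-state format below are illustrative assumptions, not specifics from this proposal), the opponent-modeling agent simply carries a per-player belief dictionary between rounds and conditions its speech on it:

```python
import json
from dataclasses import dataclass, field

def call_llm(prompt: str) -> str:
    """Placeholder for whatever chat-model backend is used; assumed, not specified in the proposal."""
    raise NotImplementedError("plug in an LLM client here")

@dataclass
class BaselineAgent:
    """Plays each round from the shared transcript alone, with no persistent model of other players."""
    name: str

    def speak(self, transcript: str) -> str:
        return call_llm(
            f"You are {self.name} in a game of Werewolf.\n"
            f"Conversation so far:\n{transcript}\n"
            "Write your statement for this round."
        )

@dataclass
class OpponentModelingAgent:
    """Additionally keeps an explicit per-player belief state and revises it every round."""
    name: str
    beliefs: dict = field(default_factory=dict)  # player -> {"suspected_role", "trust", "notes"}

    def update_beliefs(self, transcript: str) -> None:
        # Ask the model to revise its beliefs; assumes it returns valid JSON (a real run would validate/retry).
        self.beliefs = json.loads(call_llm(
            f"You are {self.name}. Based on the conversation below, return your updated beliefs about "
            "every other player as a JSON object mapping player name to suspected_role, trust (0-1), and notes.\n"
            f"Current beliefs: {json.dumps(self.beliefs)}\n"
            f"Conversation:\n{transcript}"
        ))

    def speak(self, transcript: str) -> str:
        self.update_beliefs(transcript)
        return call_llm(
            f"You are {self.name} in a game of Werewolf.\n"
            f"Your private beliefs about the other players: {json.dumps(self.beliefs)}\n"
            f"Conversation so far:\n{transcript}\n"
            "Write your statement for this round, tailored to who you believe each player really is."
        )
```

Holding the game engine, prompts, and decoding settings fixed across the two conditions means any difference in outcomes can be attributed to the explicit per-player belief state rather than to some other change in setup.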
We then evaluate (see the scoring sketch after this list):
· Do opponent-modeling agents win more often?
· Do they infer hidden roles more accurately?
· Does group belief converge faster?
· Do distinct “mental models” of specific players emerge in embeddings or behavior?
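One way to make these questions measurable is a minimal per-game scoring pass; the game-log fields `won`, `final_beliefs`, `true_roles`, and `votes_per_round` below are assumed formats, not defined in the proposal:

```python
from statistics import mean

def role_inference_accuracy(final_beliefs: dict, true_roles: dict) -> float:
    """Fraction of players whose hidden role the agent's final belief state guesses correctly."""
    hits = [final_beliefs[p].get("suspected_role") == true_roles[p] for p in final_beliefs]
    return mean(hits) if hits else 0.0

def vote_convergence_round(votes_per_round: list[dict]) -> int:
    """First round in which a strict majority votes for the same target (a proxy for group belief convergence)."""
    for round_idx, votes in enumerate(votes_per_round):  # votes: voter -> voted-for target
        targets = list(votes.values())
        if targets and max(targets.count(t) for t in set(targets)) > len(targets) / 2:
            return round_idx
    return len(votes_per_round)  # never converged

def summarize(games: list[dict]) -> dict:
    """Aggregate per-game logs, each assumed to carry 'won', 'final_beliefs', 'true_roles', 'votes_per_round'."""
    return {
        "win_rate": mean(float(g["won"]) for g in games),
        "role_inference_accuracy": mean(
            role_inference_accuracy(g["final_beliefs"], g["true_roles"]) for g in games
        ),
        "mean_convergence_round": mean(
            vote_convergence_round(g["votes_per_round"]) for g in games
        ),
    }
```

The embedding question in the last bullet would sit on top of the same logs, for example by checking whether an agent's stored notes about different players occupy distinguishable regions of embedding space.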
This setting gives us a clean, measurable way to study AI social cognition—not just whether agents communicate, but whether they adapt their communication based on who they believe they’re talking to. It also creates a fun, interpretable benchmark for future multi-agent LLM research.
If you are inspired by this idea, you can reach out to the author for collaboration or cite it:
@misc{chen-do-ai-agents-2025,
  author = {Chen, Peiyu},
  title  = {Do AI Agents Form Mental Models of Each Other? Evaluating Opponent Modeling in Multi-Agent LLM Systems},
  year   = {2025},
  url    = {https://hypogenic.ai/ideahub/idea/5DAV2ivAKK27mYli9p2R}
}