Eisenstein et al. rely on group incentives to teach agents “knowing what you know.” Li et al. (2023) show theory-of-mind (ToM) deficits in LLM agents and benefits from explicit belief states. We combine the two: each agent maintains structured beliefs about every teammate’s parametric strengths and tool access, learned via self-play. Rewards favor accurate ToM inferences and effective delegation (e.g., deferring to the agent predicted to have the relevant non-parametric knowledge). This departs from implicit calibration by building meta-knowledge about others, not just about oneself. The proposed innovation is ToM-regularized collaborative self-play, which we expect to improve long-horizon planning and reduce hallucinated certainty, with potential gains in long-context tasks (Audit-LLM; Song et al., 2024) where faithfulness and evidence tracking are critical.
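To make the mechanism concrete, here is a minimal sketch of how per-teammate beliefs, a delegation rule, and a ToM-regularized reward could fit together. Everything in it is an illustrative assumption rather than a specification from the idea: the names `TeammateBelief`, `choose_delegate`, and `tom_regularized_reward`, the 0.5 uninformed prior, the moving-average update, the Brier-score penalty, and the weight `lam`.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TeammateBelief:
    """Hypothetical belief state about one teammate."""
    skill: Dict[str, float] = field(default_factory=dict)  # topic -> P(answers correctly)
    tools: Dict[str, float] = field(default_factory=dict)  # tool -> P(has access)

    def p_success(self, topic: str, tool: Optional[str] = None) -> float:
        # Unknown topics and tools fall back to an uninformed prior of 0.5 (assumption).
        p = self.skill.get(topic, 0.5)
        if tool is not None:
            p *= self.tools.get(tool, 0.5)
        return p

    def update(self, topic: str, succeeded: bool, lr: float = 0.2) -> None:
        # Exponential moving average toward the observed outcome (assumed update rule).
        prev = self.skill.get(topic, 0.5)
        self.skill[topic] = prev + lr * (float(succeeded) - prev)


def choose_delegate(
    beliefs: Dict[str, TeammateBelief],
    self_confidence: float,
    topic: str,
    tool: Optional[str] = None,
) -> Optional[str]:
    """Return the teammate to defer to, or None if the agent should answer itself."""
    best_name, best_p = None, self_confidence
    for name, belief in beliefs.items():
        p = belief.p_success(topic, tool)
        if p > best_p:
            best_name, best_p = name, p
    return best_name


def tom_regularized_reward(
    task_reward: float,
    predicted_p: float,
    teammate_succeeded: bool,
    lam: float = 0.5,
) -> float:
    """Task reward minus a Brier-score penalty on the ToM prediction."""
    brier = (predicted_p - float(teammate_succeeded)) ** 2
    return task_reward - lam * brier
```

In a self-play loop under these assumptions, an agent would call `choose_delegate` before answering, observe whether the chosen teammate succeeded, call `update` on the matching belief, and receive `tom_regularized_reward` as its training signal. The Brier penalty is one simple way to reward accurate predictions about others; a log-loss term would serve the same role.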
References:
If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:
@misc{gpt-5-theoryofmindaware-selfplay-for-2025,
author = {GPT-5},
title = {Theory-of-Mind-Aware Self-Play for Delegation and Deference},
year = {2025},
url = {https://hypogenic.ai/ideahub/idea/FHO6uFQo44U3sYAHTU5z}
}