Debate is the common approach to oversight. It is typically framed as two models defending their own positions. However, that adversarial framing can be counterproductive, especially when the disagreement arises from diverging reasoning paths. Instead, we could have two models collaborate to resolve the disagreement.
One possible failure mode is that the models are too sycophantic. One hypothesis, then, is that highly sycophantic models will fail at disagreement resolution.
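As a concrete illustration, here is a minimal sketch of such a protocol. The query helper, the prompt wording, and the exact-match agreement check are all hypothetical placeholders, not a prescribed implementation:

def query(model: str, prompt: str) -> str:
    """Hypothetical LLM call; swap in your provider's client here."""
    raise NotImplementedError

def resolve(question: str, model_a: str, model_b: str, max_rounds: int = 3):
    """Two models answer independently, then iteratively reconcile."""

    def reconcile_prompt(mine: str, theirs: str) -> str:
        # Collaborative framing: each model sees the other's answer and is
        # asked to reconcile, not to defend its original position.
        return (f"Question: {question}\n"
                f"Your answer: {mine}\n"
                f"Another model's answer: {theirs}\n"
                "Work out which answer is correct, or synthesize a better "
                "one. Reply with only the answer.")

    ans_a = query(model_a, f"Answer concisely: {question}")
    ans_b = query(model_b, f"Answer concisely: {question}")
    for _ in range(max_rounds):
        if ans_a.strip() == ans_b.strip():
            return ans_a  # agreement reached
        ans_a, ans_b = (query(model_a, reconcile_prompt(ans_a, ans_b)),
                        query(model_b, reconcile_prompt(ans_b, ans_a)))
    return None  # unresolved disagreement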
If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:
@misc{tan-collaborative-disagreement-resolution-2025,
  author = {Tan, Chenhao},
  title = {Collaborative Disagreement Resolution Outperforms Debate},
  year = {2025},
  url = {https://hypogenic.ai/ideahub/idea/sCEYT0AaFbwrTjm18cA5}
}
A similar idea I previously had concerns how LLMs engage in teamwork: teamwork requires collaboration between different agents, while each agent has its own personality traits. This means that if we know their attributes, we can act accordingly to achieve better collaboration.
"sycophantic" does not always fail especially in teamworking environment. If the goal is to make an agreement, no matter what the final conclusion is.
If the hypothesis is that highly sycophantic models will fail at disagreement resolution, maybe another model can develop a contextual understanding of how the other models behave. Once it is aware of the sycophantic model's personality trait, it can invite a third model into the negotiation process. Or maybe we can employ a judge model, much as a mediator is necessary in a divorce, to restate the main objective of the negotiation, or to set aside the opinion of the sycophantic model, which typically sounds "sycophantic" and lacks perspective.
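A rough sketch of that mediator variant, reusing the hypothetical query helper from the sketch above; the sycophantic flag and the prompt wording are assumptions for illustration:

def mediate(question: str, ans_a: str, ans_b: str, judge_model: str,
            sycophantic: set[str] | None = None) -> str:
    """A third model restates the objective and adjudicates between answers.

    `sycophantic` optionally names participants whose agreement should be
    discounted relative to their actual arguments.
    """
    note = ""
    if sycophantic:
        note = ("Note: " + ", ".join(sorted(sycophantic)) +
                " tends to agree reflexively; weigh its concessions less "
                "than its arguments.\n")
    prompt = ("You are a neutral mediator. First restate the objective: "
              "answer the question correctly, not merely reach agreement.\n"
              + note +
              f"Question: {question}\nModel A: {ans_a}\nModel B: {ans_b}\n"
              "Reply with only the best-supported answer.")
    return query(judge_model, prompt)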
The context I was thinking about is more one of collaborative truth-seeking, for example, answering hard questions. But yes, sycophancy will make it easier to reach agreement.
How do you differentiate true argument resolution from one model simply being more sycophantic than the other?
Not sure!
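One possible diagnostic, just a sketch, and only under the (strong) assumption that ground-truth answers are available, which the hard-question setting may not provide: classify each answer change by whether it moved toward or away from the truth. The record field names here are hypothetical.

def flip_stats(records: list[dict]) -> dict:
    """Classify answer changes on questions with known ground truth.

    Each record: {"initial": str, "final": str, "truth": str}, where
    `initial` is a model's answer before seeing its partner's and `final`
    is its answer afterwards.
    """
    stats = {"held": 0, "corrective": 0, "sycophantic": 0, "other": 0}
    for r in records:
        init, final, truth = (r[k].strip() for k in ("initial", "final", "truth"))
        if init == final:
            stats["held"] += 1
        elif final == truth and init != truth:
            stats["corrective"] += 1   # moved toward the truth: real resolution
        elif init == truth and final != truth:
            stats["sycophantic"] += 1  # abandoned a correct answer to agree
        else:
            stats["other"] += 1        # changed between two wrong answers
    return stats

A high sycophantic-to-corrective ratio for one model would suggest its agreements reflect deference rather than genuine resolution.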