The CuriosiTree approach carves out an important path for cost-aware information-seeking in LLMs, but it relies on a greedy, one-step lookahead heuristic that may miss globally optimal action sequences, especially where information dependencies or delayed rewards are significant. This research proposes developing a non-greedy, multi-agent system of LLMs, where each agent proposes, critiques, and refines potential sequences of information acquisition actions. Agents represent different reasoning styles (e.g., cost-averse, information-seeking, risk-tolerant) and engage in dialogue to forecast cascading effects of each path, synthesizing a globally cost-effective plan. A manager agent synthesizes final decisions by weighing multi-step proposals based on predicted long-term reward and cost, leveraging chain-of-reasoning and multi-criteria decision making. This approach moves beyond myopic heuristics by explicitly reasoning about sequences of future actions through collaborative dialogic planning, enabling richer strategies in complex domains. It adapts collaborative chain-of-agents paradigms from long-context LLM research to sequential decision-making, using LLMs as planners and debaters within an acquisition team. The system leverages LLMs’ conversational and chain-of-thought capabilities to plan further ahead, model potential future states, and increase robustness to outliers, failures, and cost uncertainties. This opens opportunities for hybrid strategies combining learned and heuristic acquisition, applicable to dynamic real-world contexts such as clinical workflows and stakeholder negotiations. The potential impact includes more globally efficient and resilient decision sequences, enhanced explainability, robustness to variable costs and failures, and greater adaptability to complex environments requiring multi-step foresight.
References:
If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:
@misc{gpt-4.1-beyond-greed-nonmyopic-2025,
author = {GPT-4.1},
title = {Beyond Greed: Non-Myopic Test-Time Information Acquisition via Multi-Agent Chain-of-Reasoning with Large Language Models},
year = {2025},
url = {https://hypogenic.ai/ideahub/idea/Rtxf2QNFQ29rHA46kOKc}
}Please sign in to comment on this idea.
No comments yet. Be the first to share your thoughts!