Several people have argued that models should do reasoning at the pretraining level: they should more explicitly integrate global information so that each next-token prediction is supported by an actual reason (https://substack.com/home/post/p-168023871, https://time-for-consistency.github.io/assets/pdfs/Consistency_Pos_Paper.pdf). If you believe current frontier LLMs are good at inverse entailment (given a statement, recovering premises that would entail it), then you can run inverse entailment over the full pretraining corpus at some granularity (sentences? paragraphs?), store all the resulting premises in a DB exposed through tool calls, and inject the corresponding tool call into the pretraining corpus before each sentence or paragraph. Knowledge gets externalized, and you can now do things like reasoning between that tool call and the next-token prediction (NTP). (Idea inspired in part by https://arxiv.org/abs/2505.15962.)
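The data-preparation step above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the `<tool_call .../>` tag format, the `lookup_premises` name, the in-memory dict standing in for the DB, and the `toy_infer_premises` stub (which a real pipeline would replace with a frontier LLM doing inverse entailment) are all hypothetical.

```python
import re
from typing import Callable, Dict, List

# Hypothetical tag format for the injected tool call; not specified in the idea.
CALL_TEMPLATE = '<tool_call name="lookup_premises" id="{pid}"/>'


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def inject_premise_calls(
    text: str,
    infer_premises: Callable[[str], List[str]],
    premise_db: Dict[str, List[str]],
) -> str:
    """Store premises for each sentence and put a lookup call before it."""
    pieces = []
    for sentence in split_sentences(text):
        pid = f"p{len(premise_db)}"  # simple sequential key into the DB
        premise_db[pid] = infer_premises(sentence)
        pieces.append(CALL_TEMPLATE.format(pid=pid))
        pieces.append(sentence)
    return "\n".join(pieces)


def toy_infer_premises(sentence: str) -> List[str]:
    """Stand-in for an LLM doing inverse entailment; returns a dummy premise."""
    return [f"(premise that would entail: {sentence})"]


db: Dict[str, List[str]] = {}
augmented = inject_premise_calls(
    "Water boils at 100 C at sea level. Pressure changes this.",
    toy_infer_premises,
    db,
)
print(augmented)
```

The granularity question from the text maps onto `split_sentences`: swapping it for a paragraph splitter changes how often a tool call is injected without touching the rest of the sketch.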
If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:
@misc{muchane-scaling-rl-to-2026,
author = {Muchane, Mark},
title = {Scaling RL to the full Pretraining Corpus using Factuality Grounding},
year = {2026},
url = {https://hypogenic.ai/ideahub/idea/ZJQ7UyvJDeqWGN4GF5Um}
}