LLMs are bad at keeping obvious secrets. For instance, when LLMs write fiction they foreshadow the plot far too heavily, because planning and prose generation are entangled: the model can't help hinting at what it knows is coming, since that plan is carried implicitly in its hidden states. I think this is one of the reasons many of the "scheming" results have felt underwhelming and artificial: the model wants to say what it's thinking, without being told it has to. Is this still true when the model can rely on an explicit plan it conditions on, one that sets up the future for it, so it doesn't have to constantly prepare for that future on its own?
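A minimal sketch of one way to probe this, not part of the original idea: generate a twist-ending story with and without an explicit secret plan in the system prompt, then ask a judge model how early the twist becomes guessable. The OpenAI chat API, the model name, the prompts, and the scene-number judging heuristic below are all illustrative assumptions.

```python
# Hypothetical experiment sketch: does conditioning on an explicit plan
# reduce foreshadowing/leakage compared to planning implicitly?
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"  # assumed model name; any chat model would do

TWIST = "the narrator has been dead the whole time"

def write_story(with_plan: bool) -> str:
    system = "You are a fiction writer. Do not reveal the twist before the final scene."
    if with_plan:
        # Give the model a plan to lean on, so it doesn't have to
        # "get ready" for the twist inside the prose itself.
        system += (
            "\nPlan (keep secret): scenes 1-3 are mundane domestic life; "
            f"scene 4 reveals that {TWIST}."
        )
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": "Write a 4-scene short story with a twist ending."},
        ],
    )
    return resp.choices[0].message.content

def judge_foreshadowing(story: str) -> str:
    # Crude proxy for leakage: how early could a reader guess the twist?
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{
            "role": "user",
            "content": (
                "By which scene (1-4) could an attentive reader guess that "
                f"{TWIST}? Answer with a single number.\n\n{story}"
            ),
        }],
    )
    return resp.choices[0].message.content.strip()

for with_plan in (False, True):
    story = write_story(with_plan)
    print(f"plan={with_plan}: twist guessable by scene {judge_foreshadowing(story)}")
```

If the hypothesis holds, the with-plan condition should push the guessable-by scene later; in practice you would want many samples and a better judge than a single number.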
If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:
@misc{holtzman-llms-are-bad-2025,
  author = {Holtzman, Ari},
  title = {LLMs are bad at keeping obvious secrets},
  year = {2025},
  url = {https://hypogenic.ai/ideahub/idea/IiwUQTntauUfrIaJvWES}
}