LLM Unspeakables

by Ari Holtzman, about 21 hours ago

Assistant-tuned LLMs are trained not to talk about certain things (e.g., their opinions). Do some of the things they are not allowed to talk about secretly form internal representations that end up encoding 'undisclosed opinions'? If so, how would we find them?

Tags: llms, hard to measure

If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:

@misc{holtzman-llm-unspeakables-2026,
  author = {Holtzman, Ari},
  title = {LLM Unspeakables},
  year = {2026},
  url = {https://hypogenic.ai/ideahub/idea/uMyV9umKSVAgsFfT9IrH}
}
