LLM Unspeakables

by Ari Holtzman

Assistant-tuned LLMs are trained not to talk about certain things (e.g., their own opinions). Do some of these off-limits topics nonetheless form internal representations that amount to "undisclosed opinions"? How would we find them?

If you are inspired by this idea, you can reach out to the author to collaborate, or cite it:

@misc{holtzman-llm-unspeakables-2026,
  author = {Holtzman, Ari},
  title = {LLM Unspeakables},
  year = {2026},
  url = {https://hypogenic.ai/ideahub/idea/uMyV9umKSVAgsFfT9IrH}
}
