Non-linear Diction

Since steering vectors work, there appear to be directions associated with different styles (in addition to other aspects of text). One question is: if it's hard to get a steering vector for a particular style, does that also mean the style will be hard to prompt for? Does the model represent any style it 'knows' a priori (rather than one it constructs online, e.g., if I ask it to always end every sentence with 🥕) non-linearly, or are they all linear?🥕
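To make the linear vs. non-linear distinction concrete, here is a toy sketch. It assumes a steering vector is built as a difference of mean activations (one common construction, not necessarily the one intended here), and it uses synthetic 2-D points as stand-ins for model activations. All function names and both datasets are hypothetical and only for illustration.

```python
import math
import random


def mean(vectors):
    """Coordinate-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def steering_vector(styled, plain):
    """Difference of means: mean(styled) - mean(plain)."""
    return [s - p for s, p in zip(mean(styled), mean(plain))]


def projection_accuracy(styled, plain, direction):
    """Classify by projecting onto `direction` and thresholding at the
    midpoint of the two projected class means."""
    threshold = (dot(mean(styled), direction) + dot(mean(plain), direction)) / 2
    correct = sum(dot(v, direction) > threshold for v in styled)
    correct += sum(dot(v, direction) <= threshold for v in plain)
    return correct / (len(styled) + len(plain))


def cluster(rng, center, n, std):
    """n Gaussian points around `center` (synthetic, not real activations)."""
    return [[c + rng.gauss(0, std) for c in center] for _ in range(n)]


def linear_style(rng, n=200):
    """Hypothetical style encoded as a shift along one axis."""
    return cluster(rng, (2.0, 0.0), n, 0.3), cluster(rng, (0.0, 0.0), n, 0.3)


def xor_style(rng, n=200):
    """Hypothetical style encoded XOR-like: no single direction separates it."""
    styled = cluster(rng, (1, 1), n, 0.2) + cluster(rng, (-1, -1), n, 0.2)
    plain = cluster(rng, (1, -1), n, 0.2) + cluster(rng, (-1, 1), n, 0.2)
    return styled, plain


if __name__ == "__main__":
    rng = random.Random(0)
    for name, make in [("linear", linear_style), ("xor", xor_style)]:
        styled, plain = make(rng)
        v = steering_vector(styled, plain)
        acc = projection_accuracy(styled, plain, v)
        print(f"{name}: |v| = {math.hypot(*v):.3f}, projection accuracy = {acc:.3f}")
```

In the linearly encoded case the difference-of-means vector is large and separates the two classes; in the XOR-like case the class means nearly coincide, so the vector is close to zero and projecting onto it is roughly chance. The open question above is whether any style a model 'knows' a priori looks like the second case, and whether such a style would also be hard to prompt for.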

llms lrh mechinterp steering

If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:

@misc{holtzman-nonlinear-diction-2026,
  author = {Holtzman, Ari},
  title = {Non-linear Diction},
  year = {2026},
  url = {https://hypogenic.ai/ideahub/idea/Tb1IIhOnXoYhPPqMsGcg}
}
