Manipulation of Hard to Describe Directions

I believe there are some directions in the residual stream that are hard to get the model to manipulate by referring to a specific feature by name. You can get a model to rhyme, but getting it to sound less like an AI takes more than a simple request. I hypothesize, though, that anything linearly represented in the residual stream should be fairly easy to reference with a newly finetuned token, which would make it easier to manipulate.
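One way to make the hypothesis concrete is a toy sketch in pure Python. It is not a real experiment: `frozen_map` is a hypothetical stand-in for the frozen model (a fixed, near-identity linear map from a token embedding to the residual stream), `direction` is an assumed unit vector for the hard-to-name feature, and `train_token_embedding` is an illustrative helper. Only the new token's embedding is trained, by plain gradient descent, so that the residual lands at a chosen strength along the direction.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(m, x):
    return [dot(row, x) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def train_token_embedding(frozen_map, direction, target, steps=200, lr=0.1):
    """Fit one new embedding so that frozen_map @ embedding ~= target * direction.

    The map itself is never updated; only the new token's embedding moves.
    """
    embedding = [0.0] * len(direction)
    frozen_map_t = transpose(frozen_map)
    goal = [target * d for d in direction]
    for _ in range(steps):
        residual = matvec(frozen_map, embedding)
        error = [r - g for r, g in zip(residual, goal)]
        # Gradient of the squared error ||frozen_map @ e - goal||^2 w.r.t. e.
        grad = [2.0 * g for g in matvec(frozen_map_t, error)]
        embedding = [e - lr * g for e, g in zip(embedding, grad)]
    return embedding


random.seed(0)
DIM = 8

# Assumed feature direction: a random unit vector (hypothetical).
raw = [random.gauss(0, 1) for _ in range(DIM)]
norm = math.sqrt(dot(raw, raw))
direction = [x / norm for x in raw]

# Assumed frozen model: identity plus small noise (hypothetical stand-in).
frozen_map = [
    [(1.0 if i == j else 0.0) + random.uniform(-0.05, 0.05) for j in range(DIM)]
    for i in range(DIM)
]

embedding = train_token_embedding(frozen_map, direction, target=2.0)

# Measure how much of the resulting residual lies along vs. off the direction.
residual = matvec(frozen_map, embedding)
projection = dot(residual, direction)
off_direction = [r - projection * d for r, d in zip(residual, direction)]
off_norm = math.sqrt(dot(off_direction, off_direction))
print(f"projection={projection:.4f}, off-direction norm={off_norm:.2e}")
```

In a real model the analogous setup would presumably add one token to the vocabulary, freeze every other weight, and train only the new embedding row against a probe for the direction. Whether that is as easy as this linear toy suggests is exactly the open question.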

Tags: llms, linear representation, mechinterp

If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:

@misc{holtzman-manipulation-of-hard-2026,
  author = {Holtzman, Ari},
  title = {Manipulation of Hard to Describe Directions},
  year = {2026},
  url = {https://hypogenic.ai/ideahub/idea/lTAqrX1qHahRbw9F6d3v}
}
