Manipulation of Hard to Describe Directions

by Ari Holtzman, 3 months ago

I believe there are some directions in the residual stream that are hard to get a model to manipulate simply by referring to the corresponding feature by name. You can get a model to rhyme, but getting it to sound less like an AI requires more than a simple request. I hypothesize, however, that for anything linearly represented in the residual stream, it should be fairly easy to finetune a new token that references it, which would make that direction easier to manipulate.
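To make the hypothesis concrete, here is a minimal toy sketch in Python (NumPy only), not an implementation on a real language model. It assumes a frozen linear map `W_frozen` standing in for whatever the frozen model does to a token embedding before it lands in the residual stream, and a unit vector `target` standing in for a hard-to-name direction. Both are hypothetical, as are the function names and hyperparameters. Only the new token's embedding is trained, by plain gradient descent on a squared-error loss.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 16  # toy residual-stream width (assumption)

# Hypothetical frozen map from a token embedding to its residual-stream
# contribution; a stand-in for the frozen model weights.
W_frozen = np.eye(d_model) + 0.05 * rng.standard_normal((d_model, d_model))

# Hypothetical linearly represented feature: a unit direction in the stream.
target = rng.standard_normal(d_model)
target /= np.linalg.norm(target)


def train_new_token(W, direction, steps=500, lr=0.1):
    """Fit only the new token's embedding; W stays frozen."""
    e = np.zeros(W.shape[1])
    for _ in range(steps):
        residual_error = W @ e - direction
        # Gradient of ||W e - direction||^2 with respect to e.
        e -= lr * 2.0 * (W.T @ residual_error)
    return e


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


new_token_embedding = train_new_token(W_frozen, target)
alignment = cosine(W_frozen @ new_token_embedding, target)
print(f"cosine(residual contribution, target) = {alignment:.4f}")
```

In this linear toy the new token trivially learns to point at the target direction. The open question in the idea is whether the same holds in a real transformer, where the token's effect on the stream is nonlinear and the direction must first be identified.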

If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:

@misc{holtzman-manipulation-of-hard-2026,
  author = {Holtzman, Ari},
  title = {Manipulation of Hard to Describe Directions},
  year = {2026},
  url = {https://hypogenic.ai/ideahub/idea/lTAqrX1qHahRbw9F6d3v}
}
