I believe there are some directions in the residual stream that are hard to get a model to manipulate just by referring to a specific feature by name. You can get a model to rhyme simply by asking, but getting it to sound less like an AI takes more than a simple request. However, I hypothesize that anything linearly represented in the residual stream should be fairly easy to finetune a new token to reference, which would make that direction much easier to manipulate.
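A minimal toy sketch of the hypothesis, under a strong simplifying assumption: the model is replaced by a frozen linear map `W` from a token embedding into a small residual stream, and a unit vector `direction` stands in for a linearly represented but hard-to-name feature. Only the embedding of a hypothetical new token is trained, so that `W` maps it to `alpha` times the direction. All names here (`W`, `direction`, `alpha`, `train_new_token_embedding`) are illustrative assumptions, not part of the original idea.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def cosine(a, b):
    return dot(a, b) / (norm(a) * norm(b))


def matvec(W, x):
    # Frozen toy "model" (an assumption): maps a token embedding into the residual stream.
    return [dot(row, x) for row in W]


def train_new_token_embedding(W, direction, alpha=1.0, lr=0.1, steps=2000, seed=0):
    """Gradient descent on ONLY the new token's embedding e; W stays frozen.

    Loss: 0.5 * ||W e - alpha * direction||^2, whose gradient w.r.t. e is W^T (W e - target).
    """
    rng = random.Random(seed)
    d_embed = len(W[0])
    e = [rng.gauss(0.0, 0.1) for _ in range(d_embed)]  # small random init for the new token
    target = [alpha * v for v in direction]
    for _ in range(steps):
        err = [r - t for r, t in zip(matvec(W, e), target)]
        grad = [sum(W[i][j] * err[i] for i in range(len(W))) for j in range(d_embed)]
        e = [ej - lr * gj for ej, gj in zip(e, grad)]
    return e


# Usage: a 3-dimensional toy residual stream with an arbitrary "hard to name" direction.
W = [[1.0, 0.2, 0.0], [0.0, 1.0, 0.3], [0.1, 0.0, 1.0]]
raw = [1.0, -1.0, 0.5]
direction = [x / norm(raw) for x in raw]
e = train_new_token_embedding(W, direction, alpha=2.0)
r = matvec(W, e)
print(cosine(r, direction))  # should be very close to 1.0
print(dot(r, direction))     # should be very close to alpha = 2.0
```

In a real transformer the analogous step would be adding a token to the vocabulary, freezing every other parameter, and training only that token's embedding row against a loss defined on the chosen direction; whether this works as easily as in the linear toy is exactly the open question the idea raises.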
If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:
@misc{holtzman-manipulation-of-hard-2026,
author = {Holtzman, Ari},
title = {Manipulation of Hard to Describe Directions},
year = {2026},
url = {https://hypogenic.ai/ideahub/idea/lTAqrX1qHahRbw9F6d3v}
}