Steering Vector Decay

by Ari Holtzman, 3 months ago

Activation steering is awesome and simple: use contrastive pairs to build a vector v that captures a behavior the model encodes, add v to the hidden state at every generation step, and you've got a model that acts a certain way. But if you add v only for the first three tokens of generation and then stop, how quickly do the hidden states stop looking like v? And how does this change if you teacher-force the tokens that were generated with v, but never add v to the hidden state?
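To make the two conditions concrete, here is a minimal sketch on a toy model. Everything in it is an assumption for illustration: the "model" is a small random recurrent network in NumPy rather than a transformer, the contrastive activations are synthetic, and "looking like v" is measured as the projection of the steered-minus-baseline hidden state onto the unit vector along v. A real experiment would swap in actual residual-stream activations and might prefer a different metric.

```python
import numpy as np

rng = np.random.default_rng(0)
d, vocab, n_steps, k = 16, 20, 12, 3  # k = number of steered steps

# Toy stand-in for a language model (assumption: NOT a transformer).
W = rng.normal(scale=0.5 / np.sqrt(d), size=(d, d))  # recurrence weights
E = rng.normal(size=(vocab, d))                      # token embeddings
U = rng.normal(size=(vocab, d))                      # unembedding

# Synthetic "contrastive pairs": activations with and without a behavior.
direction = rng.normal(size=d)
pos = rng.normal(size=(32, d)) + direction
neg = rng.normal(size=(32, d))
v = pos.mean(axis=0) - neg.mean(axis=0)  # mean-difference steering vector
v_hat = v / np.linalg.norm(v)

def run(steer_steps=0, forced_tokens=None, start_token=0):
    """Greedy generation; returns hidden states and emitted tokens.

    steer_steps: add v to the hidden state for the first steer_steps steps.
    forced_tokens: if given, emit these tokens instead of the argmax
    (teacher forcing), so later inputs match a previous run.
    """
    h = np.zeros(d)
    tok = start_token
    states, tokens = [], []
    for t in range(n_steps):
        h = np.tanh(W @ h + E[tok])
        if t < steer_steps:
            h = h + v
        states.append(h.copy())
        if forced_tokens is None:
            tok = int(np.argmax(U @ h))
        else:
            tok = forced_tokens[t]
        tokens.append(tok)
    return np.array(states), tokens

base_states, base_toks = run()
steered_states, steered_toks = run(steer_steps=k)
# Same tokens as the steered run, but v is never added to the hidden state.
forced_states, forced_toks = run(forced_tokens=steered_toks)

# Per-step projection onto v of the difference from the unsteered baseline.
proj_steered = (steered_states - base_states) @ v_hat
proj_forced = (forced_states - base_states) @ v_hat

for t in range(n_steps):
    print(t, round(float(proj_steered[t]), 3), round(float(proj_forced[t]), 3))
```

Plotting `proj_steered` against the step index after step k gives one decay curve; `proj_forced` shows how much of that signal is carried by the tokens alone. In this toy the numbers say nothing about real models; the point is only the shape of the comparison.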

If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:

@misc{holtzman-steering-vector-decay-2026,
  author = {Holtzman, Ari},
  title = {Steering Vector Decay},
  year = {2026},
  url = {https://hypogenic.ai/ideahub/idea/h0MZpeE5daQUHo8dYkxt}
}
