Steering Vector Decay

Activation steering is awesome and simple: use contrastive pairs to build a vector v that captures a behavior the model encodes, add it to the hidden state at every generation step, and you've got a model that acts a certain way. But if you add v only for the first three tokens of generation and then stop, how quickly do the hidden states stop looking like v? And how does this change if you teacher-force the tokens that were generated with v but never add v to the hidden state?
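
One way the measurement could be set up is sketched below. This is a minimal illustration, not a prescribed method: it assumes v is a difference of means over contrastive activations (one common recipe) and that "looking like v" is scored as the cosine similarity between v and the gap between a steered run and an unsteered baseline at each step. The function names (`mean_difference`, `cosine`, `decay_curve`) and all numbers are hypothetical toy values; in a real experiment the hidden states would be read from a model at a chosen layer.

```python
import math

def mean_difference(pos, neg):
    """Steering vector as mean(pos) - mean(neg) over contrastive activations."""
    dim = len(pos[0])
    mean_pos = [sum(x[i] for x in pos) / len(pos) for i in range(dim)]
    mean_neg = [sum(x[i] for x in neg) / len(neg) for i in range(dim)]
    return [p - n for p, n in zip(mean_pos, mean_neg)]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def decay_curve(run_states, baseline_states, v):
    """Per-step cosine between (run - baseline) hidden states and v."""
    return [
        cosine([r - b for r, b in zip(h_run, h_base)], v)
        for h_run, h_base in zip(run_states, baseline_states)
    ]

# Toy contrastive activations (hypothetical, 2-dimensional for readability).
pos = [[2.0, 1.0], [4.0, 1.0]]
neg = [[1.0, 1.0], [1.0, 1.0]]
v = mean_difference(pos, neg)  # [2.0, 0.0]

# Hand-written stand-ins for hidden states at four steps after steering stops.
baseline = [[0.0, 1.0]] * 4
steered = [[2.0, 1.0], [1.0, 1.0], [0.5, 1.5], [0.0, 2.0]]

curve = decay_curve(steered, baseline, v)
print([round(c, 3) for c in curve])
```

The same `decay_curve` call would cover the second condition: pass in hidden states from a run that is teacher-forced on the steered tokens but never has v added, and compare the two curves.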

llms steering mechinterp

If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:

@misc{holtzman-steering-vector-decay-2026,
  author = {Holtzman, Ari},
  title = {Steering Vector Decay},
  year = {2026},
  url = {https://hypogenic.ai/ideahub/idea/h0MZpeE5daQUHo8dYkxt}
}
