Is there a direction in the residual stream for every phrase, e.g., 'washing machine', or is there some kind of arithmetic, e.g., 'washing' + 'machine'? If there is arithmetic, how does the model handle odd cases where compositions collide, e.g., something like "wash" + "dirty washing machine" = "clean washing machine"? How does the model represent these concepts?
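One way to make the two hypotheses concrete is to compare a phrase's vector against the sum of its parts' vectors. The sketch below is only an illustration: the 4-dimensional vectors are hand-written stand-ins rather than real residual-stream activations, and the names `cosine` and `additivity_score` are hypothetical helpers, not part of any library. A score near 1 is what pure arithmetic would predict, and a score near 0 is what a dedicated phrase direction, orthogonal to its parts, would predict.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def additivity_score(phrase_vec, part_vecs):
    """Cosine between a phrase vector and the elementwise sum of its parts."""
    summed = [sum(components) for components in zip(*part_vecs)]
    return cosine(phrase_vec, summed)


# Hypothetical toy vectors standing in for residual-stream activations.
washing = [1.0, 0.0, 0.0, 0.0]
machine = [0.0, 1.0, 0.0, 0.0]

# Case A (assumed): the phrase is exactly 'washing' + 'machine'.
phrase_additive = [1.0, 1.0, 0.0, 0.0]
# Case B (assumed): the phrase has its own direction, orthogonal to both parts.
phrase_dedicated = [0.0, 0.0, 1.0, 0.0]

score_additive = additivity_score(phrase_additive, [washing, machine])
score_dedicated = additivity_score(phrase_dedicated, [washing, machine])
print(round(score_additive, 3), round(score_dedicated, 3))
```

With real activations the score would presumably fall somewhere in between, and how it varies across phrases and layers is the kind of thing the question above asks about.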
If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:
@misc{holtzman-phrase-directions-or-2026,
author = {Holtzman, Ari},
title = {Phrase Directions or Phrase Arithmetic?},
year = {2026},
url = {https://hypogenic.ai/ideahub/idea/ciZcEwXLwZz72vHHLjZm}
}