One thing I don't really buy about so-called linear representations in the residual stream of Transformer LLMs is that they form a linear space in which you can freely compose directions. However, I bet that related vectors (e.g., the vector for red and the vector for blue) do compose naturally. Can we trawl through the linear directions found so far and map out which ones span coherent linear subspaces and which are isolated directions that are probably not truly linear, just approximately linear in the regions where we've tested them?
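One way to make "do these two directions compose?" concrete is an additivity check: shift a state by direction a, by direction b, and by a + b, then compare the joint effect on some readout with the sum of the separate effects. The sketch below is only an illustration on synthetic data, not a method taken from this idea. The function name `additivity_gap`, the random matrix `W`, and the two toy readouts are all hypothetical stand-ins; in a real study the readout would be a probe or downstream behavior measured on a model's residual stream.

```python
import numpy as np

def additivity_gap(readout, h, a, b):
    """Relative mismatch between the joint effect of a + b and the sum of
    the separate effects of a and b on `readout`, starting from state h.
    A gap near 0 means the two directions compose additively at h."""
    base = readout(h)
    effect_a = readout(h + a) - base
    effect_b = readout(h + b) - base
    effect_ab = readout(h + a + b) - base
    # Normalize by the size of the individual effects; the small constant
    # guards against division by zero.
    denom = np.linalg.norm(effect_a) + np.linalg.norm(effect_b) + 1e-12
    return float(np.linalg.norm(effect_ab - (effect_a + effect_b)) / denom)

# Synthetic stand-ins (assumptions): a random "state" h and two random
# "directions" a and b in a 16-dimensional space.
rng = np.random.default_rng(0)
d = 16
W = rng.normal(size=(4, d)) / np.sqrt(d)  # hypothetical readout weights
h, a, b = rng.normal(size=(3, d))

# A purely linear readout composes exactly, up to floating-point error.
linear_gap = additivity_gap(lambda x: W @ x, h, a, b)

# A saturating readout is only locally linear, so the gap is nonzero.
saturating_gap = additivity_gap(lambda x: np.tanh(W @ x), h, a, b)

print(f"linear readout gap:     {linear_gap:.2e}")
print(f"saturating readout gap: {saturating_gap:.2e}")
```

Running the same check over many base states and many pairs of directions would give a rough map of which pairs behave additively and which only look linear near the points where they were found. The threshold for "additive enough" is a judgment call that this sketch does not settle.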
If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:
@misc{holtzman-which-linear-directions-2026,
author = {Holtzman, Ari},
title = {Which linear directions are compositional?},
year = {2026},
url = {https://hypogenic.ai/ideahub/idea/DQvf9rozqmC2H3nxft7J}
}