Let's say we have a bijection f: V -> V over the vocabulary V that shuffles which tokens represent which surface forms. If we feed text encoded with this shuffle into a normal LM, it will likely have no idea what's going on for the first few words, helplessly trying to interpret them under the original mapping. Then it will likely start to represent the shuffle as it attunes to the code. After a lot of text, will the residual stream look kind of normal, modulo the shuffle? Or will the model never recover, because its embeddings don't get remapped properly?
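To make the setup concrete, here is a minimal sketch of the shuffle itself. It assumes f is a uniformly random permutation of token ids, which is only one possible choice of bijection. The names `make_token_bijection` and `apply_mapping`, the toy vocabulary size, and the example ids are all hypothetical and chosen for illustration.

```python
import random


def make_token_bijection(vocab_size, seed=0):
    """Return (f, f_inv): a random permutation of token ids and its inverse.

    Assumption: f is drawn uniformly at random; the idea itself only
    requires that f be a bijection on the vocabulary.
    """
    rng = random.Random(seed)
    f = list(range(vocab_size))
    rng.shuffle(f)
    # Build the inverse so that f_inv[f[i]] == i for every token id i.
    f_inv = [0] * vocab_size
    for original_id, shuffled_id in enumerate(f):
        f_inv[shuffled_id] = original_id
    return f, f_inv


def apply_mapping(mapping, token_ids):
    """Map each token id through the given lookup table."""
    return [mapping[t] for t in token_ids]


# Toy vocabulary of 10 ids; a real tokenizer has tens of thousands.
vocab_size = 10
f, f_inv = make_token_bijection(vocab_size, seed=0)

token_ids = [3, 1, 4, 1, 5]                         # hypothetical tokenized text
shuffled_ids = apply_mapping(f, token_ids)          # what the LM would be fed
recovered_ids = apply_mapping(f_inv, shuffled_ids)  # undoing the shuffle
```

The shuffled ids are what the model would see as input. One way to probe "normal, modulo the shuffle" might be to run the same text with and without the shuffle and compare hidden states at matching positions, though the idea leaves the exact comparison open.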
If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:
@misc{holtzman-the-residual-stream-2026,
author = {Holtzman, Ari},
title = {The Residual Stream After Bijective Token Transformation},
year = {2026},
url = {https://hypogenic.ai/ideahub/idea/cQGkntqDtSeMPCouGPZB}
}