Cross-component Representations

Has anyone tried freezing all weights of an LM except for one component X, finetuning it for various choices of X, and then comparing how this changes shared representations such as the residual stream? Recent work suggests that even very small, arbitrary-seeming groups of parameters can learn tasks (https://arxiv.org/html/2502.11447v1). But what does a task look like when learned in an early MLP vs. a late MLP vs. only the O matrix of attention? Could we use this comparison to figure out the underlying encoding of information, for instance by running PCA on the residual stream right before a prediction, across the different components X we finetune?
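To make the proposed comparison concrete, here is a minimal sketch of its two mechanical pieces: choosing which parameters stay trainable for a given X, and fitting one PCA on residual-stream vectors pooled across all finetuned variants so their projections share a basis. Everything here is an assumption for illustration: the parameter names, the helper names `select_trainable` and `pooled_pca`, and above all the activations, which are synthetic random stand-ins rather than outputs of any real model.

```python
import numpy as np

def select_trainable(param_names, pattern):
    """Names of parameters left trainable for component X; the rest would be frozen."""
    return [name for name in param_names if pattern in name]

def pooled_pca(acts_by_component, k=2):
    """Fit PCA on activations pooled over all variants, then project each variant.

    acts_by_component maps a component label to an array of shape
    (n_prompts, d_model) holding residual-stream vectors taken right
    before the prediction.
    """
    labels = list(acts_by_component)
    pooled = np.concatenate([acts_by_component[l] for l in labels], axis=0)
    mean = pooled.mean(axis=0)
    # SVD of the centered data: rows of vt are the principal directions.
    _, s, vt = np.linalg.svd(pooled - mean, full_matrices=False)
    basis = vt[:k]
    explained = (s[:k] ** 2) / np.sum(s ** 2)
    projections = {l: (acts_by_component[l] - mean) @ basis.T for l in labels}
    return projections, explained

# Hypothetical parameter names, not taken from any specific model.
param_names = [
    "layers.0.mlp.up_proj.weight",
    "layers.0.attn.o_proj.weight",
    "layers.11.mlp.up_proj.weight",
    "layers.11.attn.o_proj.weight",
]
trainable = select_trainable(param_names, "layers.0.mlp")

# Synthetic stand-in data: each "variant" is the same base activations
# shifted along its own random direction. Real data would come from
# running each finetuned model on a shared prompt set.
rng = np.random.default_rng(0)
d_model, n_prompts = 16, 50
base = rng.normal(size=(n_prompts, d_model))
acts = {
    label: base + rng.normal(size=d_model)
    for label in ["early_mlp", "late_mlp", "attn_o"]
}

projections, explained = pooled_pca(acts, k=2)
for label, proj in projections.items():
    print(label, proj.shape, proj.mean(axis=0))
print("explained variance ratio:", explained)
```

Fitting the PCA on the pooled activations, rather than separately per variant, is a design choice: separate fits would give each variant its own axes, and the projections would not be directly comparable. In a framework such as PyTorch, the freezing step would amount to setting `requires_grad = False` on every parameter outside the selected list.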

LLMs mechinterp

If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:

@misc{holtzman-crosscomponent-representations-2026,
  author = {Holtzman, Ari},
  title = {Cross-component Representations},
  year = {2026},
  url = {https://hypogenic.ai/ideahub/idea/6qL8hGgjPpIS10z99jVG}
}
