Has anyone tried freezing all weights of an LM except for a single component X, finetuning it for various choices of X, and then comparing how this changes shared representations such as the residual stream? Recent work suggests that even very small, arbitrary-seeming groups of parameters can learn tasks (https://arxiv.org/html/2502.11447v1). But what does a task look like when it is learned in an early MLP vs. a late MLP vs. only the O matrix of attention? Could we use this comparison to figure out the underlying encoding of information, for instance by doing a PCA of the residual stream right before a prediction, across the different components X we finetune?
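To make the proposed comparison a bit more concrete, here is a rough sketch of just the analysis step, not the finetuning itself. It assumes you have already collected residual-stream activations right before the prediction for each finetuned component X, as an array of shape (n_examples, d_model). Everything here is an illustrative assumption: the synthetic data stands in for real activations, the run names (`early_mlp`, `late_mlp`, `attn_o`) and helper names are hypothetical, and the overlap score (mean squared cosine of the principal angles between the top-k PCA subspaces) is one possible way to compare runs, not something the idea prescribes.

```python
import numpy as np


def top_pcs(acts: np.ndarray, k: int) -> np.ndarray:
    """Return the top-k principal directions, shape (k, d), of acts, shape (n, d)."""
    centered = acts - acts.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k]


def subspace_overlap(pcs_a: np.ndarray, pcs_b: np.ndarray) -> float:
    """Mean squared cosine of the principal angles between two k-dim subspaces.

    1.0 means the subspaces coincide; 0.0 means they are orthogonal.
    """
    cosines = np.linalg.svd(pcs_a @ pcs_b.T, compute_uv=False)
    return float(np.mean(cosines ** 2))


def synthetic_residuals(rng, basis, n=500, noise=0.05):
    """Stand-in for real activations: points in span(basis) plus small noise."""
    coeffs = rng.normal(size=(n, basis.shape[0]))
    return coeffs @ basis + noise * rng.normal(size=(n, basis.shape[1]))


rng = np.random.default_rng(0)
d, k = 64, 3  # hypothetical residual width and number of PCs to compare

# Two orthogonal k-dim subspaces: one shared by both MLP runs, one for attn_o.
q, _ = np.linalg.qr(rng.normal(size=(d, 2 * k)))
shared, other = q[:, :k].T, q[:, k:].T

# In a real experiment each entry would hold activations from a model
# finetuned with only that component unfrozen.
runs = {
    "early_mlp": synthetic_residuals(rng, shared),
    "late_mlp": synthetic_residuals(rng, shared),
    "attn_o": synthetic_residuals(rng, other),
}
pcs = {name: top_pcs(acts, k) for name, acts in runs.items()}

for a in pcs:
    for b in pcs:
        score = subspace_overlap(pcs[a], pcs[b])
        print(f"{a:>9} vs {b:<9} overlap = {score:.3f}")
```

On real activations, high overlap between two runs would hint that finetuning different components ends up writing the task into a similar residual-stream subspace, while low overlap would suggest component-specific encodings. The choice of k and of which token position to read from would both need care.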
If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:
@misc{holtzman-crosscomponent-representations-2026,
author = {Holtzman, Ari},
title = {Cross-component Representations},
year = {2026},
url = {https://hypogenic.ai/ideahub/idea/6qL8hGgjPpIS10z99jVG}
}