Take a base model and an instruct model. Look at the divergence between their next-token distributions, token by token. Train a model to predict that divergence (KL or some other measure) at every token. Then look at the places where the divergence predictor is wrong. I think that's where the interesting stuff will be.
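The pipeline above can be sketched in a few dozen lines. This is a minimal, hypothetical illustration, not the author's method, and it rests on several assumptions. First, you already have per-position next-token probability vectors from both models on the same text; running the real models is out of scope. Second, the divergence is KL(instruct || base), which is only one possible choice. Third, the "divergence predictor" is a deliberately crude stand-in: a least-squares line that predicts KL from the base model's entropy at that position. All function names (`per_token_kl`, `fit_divergence_predictor`, `surprising_tokens`) are made up for this sketch.

```python
import math
from typing import List, Sequence, Tuple

Dist = Sequence[float]  # one next-token probability vector over the vocabulary


def kl_divergence(p: Dist, q: Dist, eps: float = 1e-12) -> float:
    """KL(p || q) in nats; eps keeps the log finite if q gives zero mass where p does not."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def entropy(p: Dist) -> float:
    """Shannon entropy of p in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def per_token_kl(base: List[Dist], instruct: List[Dist]) -> List[float]:
    """Divergence at every position; here KL(instruct || base), one choice among many."""
    return [kl_divergence(pi, pb) for pb, pi in zip(base, instruct)]


def fit_divergence_predictor(base: List[Dist], instruct: List[Dist]) -> Tuple[float, float]:
    """Hypothetical stand-in predictor: least-squares fit of KL ~ a + b * entropy(base).

    A real predictor would more plausibly be a learned model over context or hidden states.
    """
    xs = [entropy(pb) for pb in base]
    ys = per_token_kl(base, instruct)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mx) ** 2 for x in xs)
    cov_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = cov_xy / var_x if var_x > 0 else 0.0  # flat line if the feature never varies
    return my - b * mx, b


def surprising_tokens(base, instruct, predictor, top_k=3):
    """Rank positions by |actual KL - predicted KL|, largest error first."""
    a, b = predictor
    actual = per_token_kl(base, instruct)
    predicted = [a + b * entropy(pb) for pb in base]
    residuals = [y - yhat for y, yhat in zip(actual, predicted)]
    order = sorted(range(len(actual)), key=lambda i: abs(residuals[i]), reverse=True)
    return [(i, actual[i], residuals[i]) for i in order[:top_k]]


if __name__ == "__main__":
    # Toy 3-token vocabulary; real use would take softmax outputs from both models.
    train_base = [[0.5, 0.3, 0.2], [0.6, 0.2, 0.2], [0.8, 0.1, 0.1]]
    train_inst = [[0.4, 0.4, 0.2], [0.6, 0.2, 0.2], [0.7, 0.2, 0.1]]
    test_base = [[0.5, 0.3, 0.2], [0.9, 0.05, 0.05]]
    test_inst = [[0.5, 0.3, 0.2], [0.05, 0.9, 0.05]]
    predictor = fit_divergence_predictor(train_base, train_inst)
    for pos, kl, resid in surprising_tokens(test_base, test_inst, predictor, top_k=2):
        print(f"position {pos}: KL={kl:.3f}, residual={resid:+.3f}")
```

The predictor is fit on one set of positions and scored on another. If it were scored on the same tokens it was trained on, it could partly absorb the very outliers you are looking for. The residual is signed, so you can tell apart tokens where the models diverge more than expected from those where they diverge less than expected. Both cases may be worth inspecting.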
If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:
@misc{holtzman-whats-surprisingly-different-2026,
author = {Holtzman, Ari},
title = {What's surprisingly different between base and instruct models?},
year = {2026},
url = {https://hypogenic.ai/ideahub/idea/WKCcIy2RGSNxGF87KZip}
}