Say you visualized text in a 2D PCA projection. In that space, LLM generations would probably land well inside any boundary you could draw around 'likely' or 'reasonable' text, far from its edge. But my guess is that further down the list of principal components, in the directions that explain little variance, LLMs would be more willing to cross that boundary. So I think LLM extrapolation largely comes from low-variation dimensions of text. We can get them to combine incongruent high-variation dimensions (e.g. styles and formalities that don't normally work together) if we force them, but they'll tend to mix and match choppily, integrating stereotypical aspects of both. True extrapolation, I predict, comes from low-volume regions where LLMs naturally must extrapolate.
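One way this prediction could be checked is sketched below, under several assumptions that are not part of the original idea: text has already been turned into fixed-size embedding vectors by some encoder, the 'boundary' along each principal component is taken to be the 99th percentile of the reference corpus's absolute projection, and the function names `fit_pca` and `boundary_crossing_rate` are made up for illustration. The data here is synthetic and deliberately constructed to show the predicted pattern, so it demonstrates the measurement only and is not evidence for the hypothesis.

```python
import numpy as np


def fit_pca(reference):
    """PCA via SVD. Components come back sorted by decreasing variance."""
    mean = reference.mean(axis=0)
    _, singular_values, components = np.linalg.svd(
        reference - mean, full_matrices=False
    )
    variance = singular_values**2 / (len(reference) - 1)
    return mean, components, variance


def boundary_crossing_rate(reference, generated, quantile=0.99):
    """For each principal component, return the fraction of generated points
    whose absolute projection exceeds the reference corpus's `quantile`
    of absolute projection (the assumed 'boundary'), plus the variances."""
    mean, components, variance = fit_pca(reference)
    ref_proj = (reference - mean) @ components.T
    gen_proj = (generated - mean) @ components.T
    threshold = np.quantile(np.abs(ref_proj), quantile, axis=0)
    rate = (np.abs(gen_proj) > threshold).mean(axis=0)
    return rate, variance


# Synthetic stand-ins for embeddings (assumption: real ones would come from
# an encoder applied to human-written and LLM-generated text).
rng = np.random.default_rng(0)
scales = np.array([8.0, 6.0, 4.0, 3.0, 1.0, 0.5, 0.3, 0.2])

# "Human" text: Gaussian cloud with a few high-variance directions.
reference = rng.normal(size=(2000, 8)) * scales

# "LLM" text, built by hand to match the hypothesis: conservative along
# high-variance directions, more spread out along low-variance ones.
stretch = np.array([0.5, 0.5, 0.5, 0.5, 1.5, 1.5, 1.5, 1.5])
generated = rng.normal(size=(500, 8)) * scales * stretch

rate, variance = boundary_crossing_rate(reference, generated)
for i, (v, r) in enumerate(zip(variance, rate)):
    print(f"PC{i + 1}: variance={v:8.3f}  crossing_rate={r:.3f}")
```

With real embeddings, the hypothesis would predict that the crossing rate rises as the component's variance falls. The percentile threshold is only one possible definition of the boundary; a density estimate or a convex hull in a subspace would be alternatives worth comparing.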
If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:
@misc{holtzman-lowvariation-extrapolation-2026,
author = {Holtzman, Ari},
title = {Low-Variation Extrapolation},
year = {2026},
url = {https://hypogenic.ai/ideahub/idea/IZklu3O4fDPT90daWR23}
}