It's well known that aligning a model causes it to lose many of the capabilities of the base model, at least insofar as the aligned model can produce a less diverse set of distributions on demand. But does also including a KL term on certain distributions during alignment let you interpolate them later? For instance, aligned models are much worse at stylistically impersonating famous people and famous authors, or at capturing the tone of certain stories, even when those texts were in their training data. Base models can often be coaxed into doing this, even though they are harder to instruct in general. If we include a KL term on these distributions, do they remain available at the end of alignment, or are they fundamentally difficult to unlock once a model has been aligned?
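One way to make the proposal concrete is to add a penalty to the alignment objective. The penalty is beta times the average KL divergence between the frozen base model's next-token distribution and the aligned policy's next-token distribution, measured on a set of "anchor" prompts (for example, prompts asking for a famous author's style). This is a minimal sketch under stated assumptions, not the author's method. The function names, the coefficient `beta`, the anchor-prompt setup, and the toy logit lists standing in for real model outputs are all hypothetical.

By default the sketch uses the forward, mode-covering direction KL(base || policy), which penalizes the policy for dropping probability mass that the base model places on an output. Standard RLHF-style penalties use the reverse direction KL(policy || base), which is available here via `mode_covering=False`.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over one vector of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions.

    Terms with p_i == 0 contribute 0; q is assumed strictly positive
    wherever p is positive (true for softmax outputs without underflow).
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def anchored_alignment_loss(
    alignment_loss: float,
    policy_logits: Sequence[Sequence[float]],
    base_logits: Sequence[Sequence[float]],
    beta: float,
    mode_covering: bool = True,
) -> float:
    """Hypothetical objective: alignment loss + beta * mean KL on anchor prompts.

    policy_logits[i] and base_logits[i] are next-token logits from the
    policy being aligned and from the frozen base model at the same anchor
    position (e.g. a prompt asking for a famous author's style).
    """
    if len(policy_logits) != len(base_logits) or not policy_logits:
        raise ValueError("need matching, non-empty lists of anchor logits")
    kls = []
    for pol, base in zip(policy_logits, base_logits):
        p_pol, p_base = softmax(pol), softmax(base)
        if mode_covering:
            # Forward KL(base || policy): penalizes the policy for dropping
            # mass the base model puts on an output.
            kls.append(kl_divergence(p_base, p_pol))
        else:
            # Reverse KL(policy || base): the usual RLHF-style penalty.
            kls.append(kl_divergence(p_pol, p_base))
    return alignment_loss + beta * sum(kls) / len(kls)


# Usage with toy numbers: one anchor position, three-token vocabulary.
policy = [[2.0, 0.1, -1.0]]  # sharpened distribution after alignment
base = [[0.5, 0.4, 0.3]]     # flatter base-model distribution
print(anchored_alignment_loss(1.0, policy, base, beta=0.1))
```

In a real training loop the logits would come from the two models on the same anchor inputs, and only the policy would receive gradients. Whether such a term actually keeps these distributions unlockable after alignment is exactly the open question posed above.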
If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:
@misc{holtzman-keeping-base-model-2025,
author = {Holtzman, Ari},
title = {Keeping Base Model Distributions},
year = {2025},
url = {https://hypogenic.ai/ideahub/idea/uUTBl6Y349nMOvPSyBhi}
}