Could we do something like meta-learning during pre-training? The idea is to regularly cause distribution shifts while randomizing everything else, so that the model mostly experiences normal pre-training but has to get good at generalizing to a new distribution each time. I guess the tricky part is figuring out what that distribution should be. Time seems like a reasonable variable: hide pockets of time that you don't share with the model at first. Another option is to hold out certain subsets of documents, such as, I don't know, one narrow kind of legal document like maritime law. Maybe the model ends up not as good at maritime law overall as if you had trained on everything at once, but it gets good at extrapolating fast.
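One way to picture the data schedule is the minimal sketch below. It is an illustration under stated assumptions, not a tested training recipe: the names `shift_curriculum`, `key`, `held_out`, and `steps_per_phase` are hypothetical, documents are assumed to carry a grouping field such as a year or a topic, and a released group is simply mixed into the sampling pool.

```python
import random
from collections import defaultdict


def shift_curriculum(docs, key, held_out, steps_per_phase, batch_size, seed=0):
    """Yield (phase, batch) pairs; each new phase releases one held-out group.

    Hypothetical sketch: `key` maps a document to its group (e.g. year or topic),
    and `held_out` lists the groups to hide, in the order they are released.
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for doc in docs:
        by_group[key(doc)].append(doc)

    # Phase 0 pool: everything that is not held out (ordinary pre-training).
    pool = [d for g, ds in by_group.items() if g not in held_out for d in ds]

    for phase, group in enumerate([None] + list(held_out)):
        if group is not None:
            # The distribution shift: a previously hidden group becomes visible.
            pool.extend(by_group[group])
        for _ in range(steps_per_phase):
            # Everything else stays randomized: uniform sampling from the pool.
            yield phase, rng.sample(pool, min(batch_size, len(pool)))
```

For example, `shift_curriculum(docs, lambda d: d["year"], [2020, 2021], 1000, 32)` would train on everything outside 2020 and 2021 first, then reveal 2020, then 2021. Measuring loss on the first few batches after each release would be one way to check whether the model is getting faster at adapting, though that evaluation is not part of this sketch.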
If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:
@misc{holtzman-distribution-shift-curriculum-2026,
author = {Holtzman, Ari},
title = {Distribution Shift Curriculum},
year = {2026},
url = {https://hypogenic.ai/ideahub/idea/lYWbO3Ko8Vml3ORT5pgV}
}