Scaling Laws for Shallow Finetuning


Larger models seem to gain far more from brief finetuning than smaller ones. Can we quantify this with a scaling law? Can we predict how finetunable a model is from its performance after just one step, or after 10 steps? Does finetuning capability vary much across models of the same size? And does that variation persist under shallow finetuning, which intuitively might be more invariant across models?
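As one possible way to make the first question concrete, the sketch below fits a power law, gain ≈ a · N^b, to the improvement a model shows after a fixed small number of finetuning steps, where N is parameter count. This is an illustrative assumption rather than a proposed method: the power-law form, the function name `fit_power_law`, and the numbers are all hypothetical, and the data is synthetic, generated from a known power law only to show that the fit recovers its parameters.

```python
import math


def fit_power_law(sizes, gains):
    """Fit gain ≈ a * size**b by ordinary least squares in log-log space.

    Hypothetical helper: assumes every size and gain is strictly positive.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(g) for g in gains]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    # Slope of the log-log regression line is the power-law exponent b.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    # Intercept of the regression line is log(a).
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic placeholder data, NOT measurements: parameter counts and the
# gain after k finetuning steps, generated from gain = 0.02 * N**0.15.
sizes = [1e8, 1e9, 1e10, 1e11]
gains = [0.02 * n ** 0.15 for n in sizes]

a, b = fit_power_law(sizes, gains)
print(f"a = {a:.4f}, b = {b:.4f}")
```

With real measurements, repeating the fit for gains recorded after 1 step, 10 steps, and a full finetuning run would show whether the exponent b is already visible early. The spread of residuals among models of the same size would speak to the question of variation at fixed scale.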

Tags: llms, finetuning, scaling laws


If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:

@misc{holtzman-scaling-laws-for-2025,
  author = {Holtzman, Ari},
  title = {Scaling Laws for Shallow Finetuning},
  year = {2025},
  url = {https://hypogenic.ai/ideahub/idea/6tUaZ3S3sHnsLwSj5MOT}
}
