Overtrain on the midtraining data


Overtraining on midtraining data is better than training on pretraining data

What if training for 100 epochs on the midtraining data is better than training for less than one epoch on the pretraining data?

Maybe it works better on math and code than on QA? Math and code data are usually less token-intensive.
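One way to make the comparison fair is to give both arms the same token budget and let each dataset's size determine how many epochs that budget buys. The sketch below is only illustrative: the helper `epochs_for_budget`, the `DataSource` type, and the dataset sizes (1B midtraining tokens, 1T pretraining tokens) are hypothetical assumptions, not figures from this idea.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    name: str
    unique_tokens: int  # dataset size in tokens (hypothetical values below)


def epochs_for_budget(source: DataSource, token_budget: int) -> float:
    """Return how many passes over `source` a budget of `token_budget` tokens buys."""
    if source.unique_tokens <= 0:
        raise ValueError("unique_tokens must be positive")
    return token_budget / source.unique_tokens


# Hypothetical sizes, chosen only to illustrate the two arms of the comparison.
midtrain = DataSource("midtrain", unique_tokens=1_000_000_000)      # 1B tokens
pretrain = DataSource("pretrain", unique_tokens=1_000_000_000_000)  # 1T tokens

# Fix the budget at exactly 100 epochs of the midtraining data.
budget = 100 * midtrain.unique_tokens

for src in (midtrain, pretrain):
    print(f"{src.name}: {epochs_for_budget(src, budget):.2f} epochs")
```

Under these assumed sizes, the same budget that covers 100 epochs of midtraining data covers only 0.1 epochs of pretraining data, which is the "less than one epoch" regime the question contrasts.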

Tags: language models, LLMs, Artificial Intelligence

If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:

@misc{heineman-overtrain-on-the-2025,
  author = {Heineman, David},
  title = {Overtrain on the midtraining data},
  year = {2025},
  url = {https://hypogenic.ai/ideahub/idea/QA29uVbXYAHMKzrXxhE0}
}
