Q: Can we speed up generation with early exiting?
Idea: Pre-train models with a classification head after every layer
Notes:
I really feel that this has been done, but I could not find an example
In CV, such early exits were used before we had residual streams (see the auxiliary classifiers in Figure 3 of GoogLeNet, https://arxiv.org/abs/1409.4842)
There's a connection to multi-token prediction (MTP, https://arxiv.org/abs/2404.19737), which was first scaled up in DeepSeek-V3 (https://arxiv.org/abs/2412.19437v2)
Could be a nice connection to continuous CoTs
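To make the idea concrete, here is a minimal sketch of the inference side only: each layer has its own classification head, and decoding stops at the first layer whose head is confident enough. Everything named here is an assumption for illustration, not something from the papers above: the toy residual block, the random untrained weights, the function names, and the exit rule (top probability >= threshold).

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def early_exit_forward(hidden, layers, heads, threshold=0.9):
    """Return (token, layers_used), exiting at the first confident head.

    `layers` and `heads` are hypothetical per-layer weight matrices; the
    confidence rule (top probability >= threshold) is an assumption.
    """
    if not layers or len(layers) != len(heads):
        raise ValueError("need one head per layer and at least one layer")
    for depth, (layer, head) in enumerate(zip(layers, heads), start=1):
        # Toy residual block: hidden <- hidden + tanh(W @ hidden).
        update = matvec(layer, hidden)
        hidden = [h + math.tanh(u) for h, u in zip(hidden, update)]
        # Per-layer classification head over the vocabulary.
        probs = softmax(matvec(head, hidden))
        if max(probs) >= threshold:
            break  # confident enough: skip the remaining layers
    token = max(range(len(probs)), key=probs.__getitem__)
    return token, depth


def random_matrix(rng, rows, cols):
    """Build a rows x cols matrix of uniform random weights (untrained)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
dim, vocab, num_layers = 4, 5, 6
layers = [random_matrix(rng, dim, dim) for _ in range(num_layers)]
heads = [random_matrix(rng, vocab, dim) for _ in range(num_layers)]
hidden = [rng.uniform(-1.0, 1.0) for _ in range(dim)]

token, used = early_exit_forward(hidden, layers, heads, threshold=0.6)
print(f"token={token}, layers used={used}/{num_layers}")
```

Because the weights are untrained, the exit depth here is arbitrary. The pre-training part of the idea would train all heads jointly (for example by summing a cross-entropy loss over the per-layer heads, which is one possible choice), so that confident early exits are also correct ones.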
If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:
@misc{heineman-early-exits-for-2025,
author = {Heineman, David},
title = {Early exits for LLMs},
year = {2025},
url = {https://hypogenic.ai/ideahub/idea/NAjgyTEHZWbgWBRdPslf}
}