Retrofitting tokenizers for efficient reasoning


Generation time is the bottleneck for test-time scaling.

Can we make chains of thought (CoTs) more efficient by introducing new tokens that have no human-interpretable meaning but can be used to compress the reasoning trace?

or

Can a language model expand or contract its own tokenizer at test time, based on how much compute it wants to spend on a thought trace?
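To make the two questions above concrete, here is a minimal sketch of one possible mechanism, not the author's method: a BPE-style procedure that learns new tokens from frequent adjacent token pairs in reasoning traces. The `budget` parameter is an assumed stand-in for "how much the tokenizer is expanded": a larger budget applies more merges and yields shorter traces. All names (`learn_merges`, `compress`, `expand`, the `<r0>`-style token strings) are hypothetical, and whitespace-split words stand in for real tokenizer output.

```python
from collections import Counter

def most_frequent_pair(traces):
    """Return the most frequent adjacent token pair, or None if no pair repeats."""
    counts = Counter()
    for trace in traces:
        counts.update(zip(trace, trace[1:]))
    if not counts:
        return None
    pair, freq = counts.most_common(1)[0]
    return pair if freq > 1 else None

def merge_pair(trace, pair, new_token):
    """Replace each non-overlapping occurrence of `pair` with `new_token`."""
    out, i = [], 0
    while i < len(trace):
        if i + 1 < len(trace) and (trace[i], trace[i + 1]) == pair:
            out.append(new_token)
            i += 2
        else:
            out.append(trace[i])
            i += 1
    return out

def learn_merges(traces, num_new_tokens):
    """Greedily learn up to `num_new_tokens` merges, as in BPE training."""
    traces = [list(t) for t in traces]
    merges = []
    for k in range(num_new_tokens):
        pair = most_frequent_pair(traces)
        if pair is None:
            break
        # Assumption: "<r{k}>" does not collide with any existing token.
        new_token = f"<r{k}>"
        merges.append((pair, new_token))
        traces = [merge_pair(t, pair, new_token) for t in traces]
    return merges

def compress(trace, merges, budget=None):
    """Apply the first `budget` merges (all of them if budget is None)."""
    trace = list(trace)
    for pair, new_token in merges[:budget]:
        trace = merge_pair(trace, pair, new_token)
    return trace

def expand(trace, merges):
    """Invert `compress`: recursively unfold new tokens into base tokens."""
    table = {new_token: pair for pair, new_token in merges}
    out, stack = [], list(reversed(trace))
    while stack:
        tok = stack.pop()
        if tok in table:
            left, right = table[tok]
            stack.append(right)  # pushed first so `left` is popped first
            stack.append(left)
        else:
            out.append(tok)
    return out

# Toy reasoning traces; a real experiment would use tokenized model CoTs.
toy_traces = [
    "so we have x = 2 so we have y = 3".split(),
    "so we have x + y = 5".split(),
]
merges = learn_merges(toy_traces, num_new_tokens=3)
for budget in range(len(merges) + 1):
    print(budget, len(compress(toy_traces[0], merges, budget)))
```

This only shows the tokenizer side. Getting a model to actually generate and condition on the new tokens would additionally require new embedding rows and some form of fine-tuning, which the sketch does not cover.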

Relevant position paper: https://arxiv.org/abs/2502.07586

Tags: language models, LLMs, Artificial Intelligence


If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:

@misc{heineman-retrofitting-tokenizers-for-2025,
  author = {Heineman, David},
  title = {Retrofitting tokenizers for efficient reasoning},
  year = {2025},
  url = {https://hypogenic.ai/ideahub/idea/Ehci9fhm113f9s8fVJN1}
}
