There are now many signs that LLMs have an implicit vocabulary: items that aren't strictly part of the vocabulary matrices, but act almost as if they were. (See: https://arxiv.org/abs/2406.20086) Can we make a logit lens that reads out these implicit tokens alongside the regular ones?
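One way to picture the question is a minimal sketch, not a proposed method. A standard logit lens multiplies an intermediate hidden state by the unembedding matrix. The version below stacks extra rows for implicit tokens under that matrix and takes a softmax over the combined set. Everything here is an assumption made for illustration: the function name `extended_logit_lens`, the toy matrices, and above all `implicit_dirs`, since how to obtain one direction per implicit token is exactly the open part of the idea. The final layer norm that real models apply before unembedding is also left out.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array of logits.
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()

def extended_logit_lens(hidden, unembed, implicit_dirs,
                        vocab, implicit_labels, top_k=3):
    """Score a hidden state against explicit and implicit vocabulary rows.

    hidden:          (d,) hidden state from some intermediate layer
    unembed:         (V, d) ordinary unembedding matrix
    implicit_dirs:   (M, d) HYPOTHETICAL directions, one per implicit token
    vocab:           V labels for the explicit rows
    implicit_labels: M labels for the implicit rows
    """
    full_matrix = np.vstack([unembed, implicit_dirs])  # (V + M, d)
    labels = list(vocab) + list(implicit_labels)
    probs = softmax(full_matrix @ hidden)
    order = np.argsort(-probs)[:top_k]
    return [(labels[i], float(probs[i])) for i in order]

# Toy setup: three explicit sub-word tokens and one implicit multi-token word.
vocab = ["north", "west", "ern"]
implicit_labels = ["northwestern"]
unembed = np.eye(3, 4)                             # made-up explicit rows
implicit_dirs = np.array([[0.0, 0.0, 0.0, 1.0]])   # made-up implicit row
hidden = np.array([0.1, 0.0, 0.2, 2.0])            # made-up hidden state

top = extended_logit_lens(hidden, unembed, implicit_dirs,
                          vocab, implicit_labels, top_k=3)
```

In this toy case the hidden state lies mostly along the implicit row, so the implicit label ranks first. With a real model, a single softmax over both sets assumes the implicit directions are scaled comparably to the unembedding rows, which would need to be checked.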
If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:
@misc{holtzman-logit-lens-with-2026,
author = {Holtzman, Ari},
title = {Logit Lens with Implicit Tokens},
year = {2026},
url = {https://hypogenic.ai/ideahub/idea/tO8okrersuj6MqmTj1VG}
}