How is the notion that "X is a Pokémon" stored in an LLM? Probably as a linear direction in the residual stream that is checked by an MLP? Whatever the mechanism is, once we discover it, can we see which things that aren't Pokémon come closest (in a given LLM's opinion) to being Pokémon?
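One way to make the second question concrete is sketched below. It assumes we already have residual-stream activations (say, at one chosen layer, for the last token of each name) for known Pokémon, known non-Pokémon, and a pool of non-Pokémon candidates. It fits a difference-of-means probe, a simple stand-in for whatever the real mechanism turns out to be (the MLP "checking" step is not modeled), and then ranks candidates by their projection onto that direction. The 3-D vectors and the labels `candidate_A` etc. are made-up toys, not real model data. Extracting actual activations from a model is not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def mean(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def fit_mean_difference_probe(pos: Sequence[Vector], neg: Sequence[Vector]) -> Tuple[Vector, Vector]:
    """Return (unit direction, midpoint) of a difference-of-means probe.

    The direction points from the mean non-Pokemon activation toward the mean
    Pokemon activation; the midpoint between the two class means acts as the
    decision threshold.
    """
    mu_pos, mu_neg = mean(pos), mean(neg)
    diff = [p - q for p, q in zip(mu_pos, mu_neg)]
    norm = math.sqrt(dot(diff, diff))
    if norm == 0:
        raise ValueError("class means coincide; no direction to extract")
    direction = [d / norm for d in diff]
    midpoint = [(p + q) / 2 for p, q in zip(mu_pos, mu_neg)]
    return direction, midpoint


def pokemon_score(v: Vector, direction: Vector, midpoint: Vector) -> float:
    """Signed distance along the probe direction; > 0 means 'looks like a Pokemon'."""
    return dot([x - m for x, m in zip(v, midpoint)], direction)


def rank_candidates(candidates: Dict[str, Vector], direction: Vector, midpoint: Vector) -> List[Tuple[str, float]]:
    """Sort non-Pokemon candidates from most to least Pokemon-like."""
    scored = [(name, pokemon_score(v, direction, midpoint)) for name, v in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy, made-up activations (hypothetical; a real run would use model hidden states).
pokemon_acts = [[2.0, 0.1, 0.0], [1.8, -0.1, 0.2]]
non_pokemon_acts = [[-1.0, 0.0, 0.1], [-1.2, 0.2, -0.1]]
candidates = {
    "candidate_A": [1.5, 0.0, 0.0],
    "candidate_B": [0.2, 0.3, 0.0],
    "candidate_C": [-1.1, 0.1, 0.0],
}

direction, midpoint = fit_mean_difference_probe(pokemon_acts, non_pokemon_acts)
ranking = rank_candidates(candidates, direction, midpoint)
for name, score in ranking:
    print(f"{name}: {score:+.3f}")
```

A difference-of-means direction is used here only because it needs no training loop; a logistic-regression probe on the same activations would be a natural alternative, and the ranking step stays the same. The interesting cases for the original question would be non-Pokémon whose score lands above zero.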
If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:
@misc{holtzman-do-any-nonpokemon-2026,
author = {Holtzman, Ari},
title = {Do any non-pokemon have non-trivial pokemon quality?},
year = {2026},
url = {https://hypogenic.ai/ideahub/idea/fM0g0UQh6YJ5b09tlSEm}
}