If you tell an LLM to obey a set of substitutions, such as 'carrot = horse' and '9 = 100': (i) After how many substitutions does it start to break down? (ii) What is its error profile? Does it mostly fall back to the original term, interpolate between the original and substituted terms, or do something in between?
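One way such an experiment might be set up is sketched below. This is only an illustration under stated assumptions, not the author's method: the function names (`build_prompt`, `classify_response`, `error_profile`), the prompt wording, and the four error labels are all hypothetical, and no model is actually called. Responses are assumed to have been collected already as a dict from queried term to the model's answer.

```python
def build_prompt(mapping, query_terms):
    """Format the substitution rules and the terms to rewrite as one prompt string.

    The wording of the prompt is an assumption made for illustration.
    """
    rules = "\n".join(f"{src} = {dst}" for src, dst in mapping.items())
    terms = ", ".join(query_terms)
    return (
        "Obey these substitutions:\n"
        f"{rules}\n"
        f"Rewrite each term using the substitutions: {terms}"
    )


def classify_response(mapping, term, response):
    """Assign one of four hypothetical labels to a single response."""
    if response == mapping[term]:
        return "correct"
    if response == term:
        # The model fell back to the original, unsubstituted term.
        return "original"
    if response in mapping.values():
        # The model applied the target of a different rule.
        return "wrong_target"
    # Anything else, including possible blends of original and substitute.
    return "other"


def error_profile(mapping, responses):
    """Count labels over a dict of {queried term: model response}."""
    counts = {"correct": 0, "original": 0, "wrong_target": 0, "other": 0}
    for term, response in responses.items():
        counts[classify_response(mapping, term, response)] += 1
    return counts
```

Running `error_profile` on mappings of increasing size would give a rough view of question (i), while the relative sizes of the `original`, `wrong_target`, and `other` counts speak to question (ii). Detecting genuine interpolation between terms would likely need a finer measure than exact string matching, which this sketch does not attempt.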
If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:
@misc{holtzman-substitution-errors-2026,
author = {Holtzman, Ari},
title = {Substitution Errors},
year = {2026},
url = {https://hypogenic.ai/ideahub/idea/oGrF9YvU5uZdYzEkWUow}
}