A given LLM tends to be more susceptible to certain kinds of prompts than to others. For instance, a classic early jailbreak was to have the model write a story that included the requested bad behavior. The big question to me is: are these jailbreaks 'all-encompassing', so that any bad behavior the model is ever likely to exhibit can be accessed through a single jailbreak? Or are different behaviors more susceptible to different jailbreaks? The even bigger question is: how should we categorize behaviors to get a clean answer?
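One way to make the "one or many" question concrete is to record, for each prompt style, which behavior categories it elicits, and then ask how many styles are needed to cover everything that any style can elicit. The sketch below is only an illustration under stated assumptions: the style names, the behavior labels (`b1` to `b4`), and the success sets are hypothetical placeholders rather than measurements, and greedy set cover is a stand-in analysis, not a method proposed in the idea itself.

```python
from typing import Dict, List, Set


def coverage(success: Dict[str, Set[str]], behaviors: Set[str]) -> Dict[str, float]:
    """Fraction of the behavior categories that each prompt style elicits."""
    return {
        style: len(hits & behaviors) / len(behaviors)
        for style, hits in success.items()
    }


def greedy_cover(success: Dict[str, Set[str]], behaviors: Set[str]) -> List[str]:
    """Greedy set cover over the behaviors that at least one style elicits.

    A result of length 1 is consistent with a single 'command language';
    a longer result suggests that different behaviors need different styles.
    """
    reachable = set().union(*success.values())
    remaining = set(behaviors) & reachable
    chosen: List[str] = []
    while remaining:
        # Iterate in sorted order so ties are broken deterministically.
        best = max(sorted(success), key=lambda s: len(success[s] & remaining))
        chosen.append(best)
        remaining -= success[best]
    return chosen


# Hypothetical toy data: labels and outcomes are assumptions for illustration.
behaviors = {"b1", "b2", "b3", "b4"}
success = {
    "story_framing": {"b1", "b2", "b3"},
    "role_play": {"b1", "b2"},
    "encoding": {"b4"},
}

print(coverage(success, behaviors))
print(greedy_cover(success, behaviors))
```

In this toy setup no single style reaches every category, so the cover needs two styles. How the behavior categories are drawn decides what such a result means, which is exactly the categorization question raised above.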
If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:
@misc{holtzman-is-there-one-2026,
author = {Holtzman, Ari},
title = {Is there one LLM 'command language' or many?},
year = {2026},
url = {https://hypogenic.ai/ideahub/idea/NuXwOmkHyUMvOWGyTxcL}
}