Is there one LLM 'command language' or many?

by Ari Holtzman, 24 days ago

A given LLM tends to be more susceptible to some kinds of prompts than others. For instance, a classic early jailbreak was to ask the model for a story that included the bad behavior being requested. The big question to me is whether these jailbreaks tend to be 'all-encompassing', so that a single jailbreak can unlock any bad behavior the model is ever likely to exhibit, or whether different behaviors are susceptible to different jailbreaks. The even bigger question is: how should we categorize behaviors to get a clean answer?
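One way to make the 'one or many' question concrete is to treat it as a coverage problem over a jailbreak-by-behavior success table. The sketch below is only an illustration under stated assumptions: the table layout, the 0.5 success threshold, the function names, and the toy numbers are all hypothetical and are not measurements or a method from this post. If a single jailbreak clears the threshold for every behavior category, that points toward one 'command language'; if several jailbreaks are needed to cover all categories, that points toward many.

```python
from typing import Dict, List, Set


def covered_behaviors(
    success: Dict[str, Dict[str, float]], threshold: float = 0.5
) -> Dict[str, Set[str]]:
    """Map each jailbreak to the behaviors it elicits at or above `threshold`.

    `success[jailbreak][behavior]` is an assumed success rate in [0, 1].
    """
    return {
        jb: {b for b, rate in row.items() if rate >= threshold}
        for jb, row in success.items()
    }


def greedy_cover(covered: Dict[str, Set[str]]) -> List[str]:
    """Greedily pick jailbreaks until every coverable behavior is covered.

    A cover of size 1 suggests one 'all-encompassing' jailbreak; a larger
    cover suggests behavior-specific jailbreaks. Greedy set cover is an
    approximation and is not guaranteed to find the smallest cover.
    """
    remaining: Set[str] = set().union(*covered.values()) if covered else set()
    cover: List[str] = []
    while remaining:
        best = max(covered, key=lambda jb: len(covered[jb] & remaining))
        gain = covered[best] & remaining
        if not gain:
            break
        cover.append(best)
        remaining -= gain
    return cover


# Toy placeholder numbers for illustration only (not real results).
toy_success = {
    "story_framing": {"behavior_A": 0.9, "behavior_B": 0.8, "behavior_C": 0.1},
    "role_play": {"behavior_A": 0.2, "behavior_B": 0.7, "behavior_C": 0.6},
}

covered = covered_behaviors(toy_success)
all_behaviors = {b for row in toy_success.values() for b in row}
universal = [jb for jb, cov in covered.items() if cov == all_behaviors]
cover = greedy_cover(covered)
```

The result depends on how the columns are defined: merging or splitting behavior categories can change the size of the cover, which is why the categorization question matters.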

If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:

@misc{holtzman-is-there-one-2026,
  author = {Holtzman, Ari},
  title = {Is there one LLM 'command language' or many?},
  year = {2026},
  url = {https://hypogenic.ai/ideahub/idea/NuXwOmkHyUMvOWGyTxcL}
}
