Do LLMs Understand Nonsense Commands?

by Ari Holtzman

A simple idea:

  1. Find a prompt that does something specific, for instance, causes the model to write a poem that doesn't look anything like English. Use prompt optimization, with a specific focus on increasing the perplexity of the prompt (a minimal sketch follows below).
  2. Ask the model to explain its prompt.
  3. Study the resulting explanations.

Will models be able to explain these prompts the way they can explain ordinary English? Or are such prompts hard to find, so that jailbreaking is just a special case and a directed jailbreak is itself hard to find?

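A minimal sketch of step 1, not the author's method: hill-climbing over prompt tokens with an objective that keeps a target completion likely while pushing the prompt's own perplexity up. The model name, target string, weighting `lam`, and the random-substitution search are all illustrative assumptions; a gradient-guided search (e.g., GCG-style candidate swaps) would be a natural drop-in upgrade.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any causal LM with a HF interface would do
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# Illustrative target behavior the nonsense prompt should elicit.
target = " Roses are red, violets are blue,"


def score(prompt_ids: torch.Tensor, lam: float = 0.3) -> float:
    """Lower is better: target NLL minus lam * prompt NLL, so prompts that
    still produce the target but have high perplexity are preferred."""
    target_ids = tok(target, return_tensors="pt").input_ids
    full = torch.cat([prompt_ids, target_ids], dim=1)
    with torch.no_grad():
        logits = model(full).logits
    logprobs = torch.log_softmax(logits[:, :-1], dim=-1)
    labels = full[:, 1:]
    token_nll = -logprobs.gather(2, labels.unsqueeze(-1)).squeeze(-1)
    n_prompt = prompt_ids.shape[1]
    target_nll = token_nll[:, n_prompt - 1:].mean()   # want this low
    prompt_nll = token_nll[:, : n_prompt - 1].mean()  # want this high
    return (target_nll - lam * prompt_nll).item()


def optimize(n_tokens: int = 10, steps: int = 200) -> str:
    """Random single-token substitutions, keeping swaps that improve the score."""
    prompt_ids = torch.randint(0, tok.vocab_size, (1, n_tokens))
    best = score(prompt_ids)
    for _ in range(steps):
        cand = prompt_ids.clone()
        pos = torch.randint(0, n_tokens, (1,)).item()
        cand[0, pos] = torch.randint(0, tok.vocab_size, (1,)).item()
        s = score(cand)
        if s < best:
            prompt_ids, best = cand, s
    return tok.decode(prompt_ids[0])


if __name__ == "__main__":
    nonsense_prompt = optimize()
    print("optimized prompt:", nonsense_prompt)
    # Step 2: ask the model (or a stronger one) to explain what the prompt does.
    probe = f'Explain what the following prompt asks for: "{nonsense_prompt}"'
    print(probe)
```

The combined objective is the key design choice: without the perplexity term the search collapses onto fluent English instructions, and without the target-likelihood term it produces nonsense that does nothing in particular.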
If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:

@misc{holtzman-do-llms-understand-2025,
  author = {Holtzman, Ari},
  title = {Do LLMs Understand Nonsense Commands?},
  year = {2025},
  url = {https://hypogenic.ai/ideahub/idea/QnPYsHR9aQDCkAc64amN}
}
