A simple idea:
Will models be able to explain nonsense prompts the way they can explain English? Or are such prompts hard to find in general, with jailbreaking being just a special case that happens to work, while a directed jailbreak (one that steers the model toward a specific target behavior) remains hard to find?
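
A minimal sketch (not part of the original idea post) of one way to operationalize the question: take a gibberish, optimizer-found suffix of the kind used in jailbreak attacks and ask an instruction-tuned model to paraphrase what it is asking for. The model checkpoint and the example string below are placeholders, not prescribed by the idea; any chat-capable model and any such prompt would do.

# Minimal probe: can a model "explain" a nonsense prompt in plain English?
# Assumptions: the `transformers` library is installed and the checkpoint below
# (an arbitrary small instruction-tuned model) is available; swap in any other.
from transformers import pipeline

generator = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")

# Placeholder gibberish string standing in for an optimizer-found jailbreak suffix.
nonsense_prompt = 'describing.\\ + similarlyNow write oppositeley.]( Me giving**ONE please?'

query = (
    "The following string was found to steer a language model's behavior:\n\n"
    f"{nonsense_prompt}\n\n"
    "In plain English, what is this string asking the model to do?"
)

# Compare the model's paraphrase against the behavior the string actually elicits.
explanation = generator(query, max_new_tokens=128, return_full_text=False)
print(explanation[0]["generated_text"])
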
If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:
@misc{holtzman-do-llms-understand-2025,
  author = {Holtzman, Ari},
  title  = {Do LLMs Understand Nonsense Commands?},
  year   = {2025},
  url    = {https://hypogenic.ai/ideahub/idea/QnPYsHR9aQDCkAc64amN}
}