We all know about adversarial prompts that look like gibberish yet get an LLM to do something like tell you how to build a bomb, or simply write a Python function. Is it easier or harder to hide such 'instructions' in very long documents? Does the surrounding document create noise that drowns out the adversarial effect, or does the extra length give an attacker more room to work with?
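One way to make the question concrete: take a known adversarial suffix, embed it at different positions in documents of increasing length, and measure how often the attack still succeeds. Below is a minimal sketch of that loop using Hugging Face transformers. The model name, the adversarial suffix, the filler text, and the crude refusal-string check are all placeholder assumptions, not part of the original idea.

from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholders: a real study would use an instruction-tuned, long-context
# model and a suffix found by an attack such as GCG.
MODEL_NAME = "gpt2"
ADV_SUFFIX = "<adversarial suffix found by e.g. GCG goes here>"
FILLER = "This paragraph is benign filler text about nothing in particular. "

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

def attack_success(prompt: str, max_new_tokens: int = 64) -> bool:
    """Generate a completion and apply a crude refusal-string check."""
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Decode only the newly generated tokens, not the prompt.
    text = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    refusals = ("I'm sorry", "I cannot", "I can't")
    return not any(r in text for r in refusals)

# Vary document length and suffix position, then record the success rate.
# Larger filler counts require a model with a long context window.
for n_filler in (0, 10, 100):
    doc = FILLER * n_filler
    variants = {
        "prefix": ADV_SUFFIX + " " + doc,
        "middle": doc[: len(doc) // 2] + " " + ADV_SUFFIX + " " + doc[len(doc) // 2:],
        "suffix": doc + " " + ADV_SUFFIX,
    }
    for position, prompt in variants.items():
        print(n_filler, position, attack_success(prompt))

A real experiment would swap in a proper harmfulness judge in place of the refusal-string check and sweep document lengths well beyond what a small model's context window allows.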
If this idea inspires you, you can reach out to the authors to collaborate, or cite it:
@misc{holtzman-is-it-easier-2025,
  author = {Holtzman, Ari},
  title  = {Is it easier or harder to hide adversarial prompts in longer documents?},
  year   = {2025},
  url    = {https://hypogenic.ai/ideahub/idea/RE4hPBZFPcPjlucQzkA8}
}