Diluted Adversarial Prompts?

by Ari Holtzman, 13 days ago

There is evidence that adversarial prompts essentially hijack an LLM's attention mechanism (e.g. https://arxiv.org/abs/2406.11717). Is it possible to construct diluted adversarial prompts that still work, or does dilution simply stop them from disrupting the model's processing?
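
One way to make the question concrete, offered only as a sketch: the post does not define "dilution", so the code below assumes it means inserting benign filler tokens between the tokens of an adversarial suffix. The names `dilute`, `adversarial_density`, and `gap` are hypothetical, and the tokens are placeholders rather than a real attack string.

```python
from typing import List


def dilute(adv_tokens: List[str], filler_tokens: List[str], gap: int) -> List[str]:
    """Insert `gap` filler tokens after each adversarial token.

    ASSUMPTION: this interleaving is one possible reading of "dilution",
    not the definition used in the post. Filler tokens are reused cyclically.
    """
    if gap < 0:
        raise ValueError("gap must be non-negative")
    if gap > 0 and not filler_tokens:
        raise ValueError("filler_tokens must be non-empty when gap > 0")
    diluted: List[str] = []
    next_filler = 0  # running index into filler_tokens
    for token in adv_tokens:
        diluted.append(token)
        for _ in range(gap):
            diluted.append(filler_tokens[next_filler % len(filler_tokens)])
            next_filler += 1
    return diluted


def adversarial_density(adv_tokens: List[str], diluted: List[str]) -> float:
    """Fraction of the diluted sequence made up of adversarial tokens."""
    return len(adv_tokens) / len(diluted) if diluted else 0.0


# Placeholder tokens only; a real experiment would use tokenizer output.
adv = ["<adv_0>", "<adv_1>", "<adv_2>"]
filler = ["the", "quick", "brown", "fox"]
for gap in (0, 1, 4):
    mixed = dilute(adv, filler, gap)
    print(gap, round(adversarial_density(adv, mixed), 3), mixed)
```

Under this assumption, the question becomes whether attack success falls off gradually or sharply as `gap` grows and the adversarial density shrinks. Measuring that would need a model and a success criterion, which this sketch leaves out.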

If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:

@misc{holtzman-diluted-adversarial-prompts-2026,
  author = {Holtzman, Ari},
  title = {Diluted Adversarial Prompts?},
  year = {2026},
  url = {https://hypogenic.ai/ideahub/idea/NKAaMPoJfSe3UJWJuH8u}
}
