Lying Style

by Ari Holtzman, 2 months ago

If you ask a model to lie about something and can get it to comply (e.g., via jailbreaking or roleplaying), is the distribution of the text it produces noticeably different from how the model normally writes?
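One rough way to make "noticeably different" measurable is to compare word-frequency distributions between the two kinds of output. The sketch below is only an illustration under stated assumptions, not a method proposed in this idea: it uses whitespace tokenization and Jensen-Shannon divergence over unigrams, the function names `token_distribution` and `js_divergence` are hypothetical, and the sample strings are toy placeholders rather than real model outputs.

```python
import math
from collections import Counter


def token_distribution(texts):
    """Unigram relative frequencies over lowercased whitespace tokens."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical distributions,
    1 for distributions with no shared tokens."""
    vocab = set(p) | set(q)
    # Mixture distribution m = (p + q) / 2 over the joint vocabulary.
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}

    def kl_to_mixture(a):
        # KL(a || m); tokens with zero probability in a contribute nothing.
        return sum(a[w] * math.log2(a[w] / m[w]) for w in a if a[w] > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


# Toy placeholder strings (assumption: NOT real model outputs).
normal_texts = ["the capital of france is paris", "water boils at 100 c"]
lying_texts = ["honestly the capital of france is definitely lyon"]

score = js_divergence(
    token_distribution(normal_texts),
    token_distribution(lying_texts),
)
print(f"JS divergence (bits): {score:.3f}")
```

A score like this only means something relative to a baseline, such as the divergence between two independent samples of the model's normal writing on the same prompts. Unigram statistics also mix style with topic, so a real study would likely need matched prompts and richer features.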

If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:

@misc{holtzman-lying-style-2026,
  author = {Holtzman, Ari},
  title = {Lying Style},
  year = {2026},
  url = {https://hypogenic.ai/ideahub/idea/4QKMaU77er8GPrZIYA7S}
}
