Personally, I think personas are a discrete and recognizable feature in LLMs. I don't think anyone has proven this yet, but that's my hypothesis. Different personas like and dislike different things, but here's the big question: do LLMs model the fact that different personas would dislike each other? It would be one thing for LLMs to say that one persona likes high literature, another likes pulp novels, and each hates the other's preferences. But are all of those preferences captured individually in the persona, as likes and dislikes for certain objects or entities? Or does the model actually capture the fact that, because of the way persona space is structured in humans, things that one persona likes are more likely to be disliked by another persona that is far enough away? There are of course "mashup" exceptions in human persona space. But as a general rule, I think human personas are created more out of distinction than out of individual intrinsic likes and dislikes. So if LLMs are genuinely capturing this, and if I'm right about human personas, this should be a salient feature in models.
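One rough way to make the question measurable, offered only as a sketch: given some vector representation for each persona and a matrix of like/dislike scores elicited from the model under each persona, check whether preference agreement falls as persona distance grows. The function name, the input format, and the choice of Euclidean distance and Pearson correlation are all assumptions for illustration; how to obtain persona vectors or preference scores from an actual LLM is left open.

```python
import numpy as np

def distance_agreement_correlation(persona_vecs, prefs):
    """Correlate pairwise persona distance with pairwise preference agreement.

    persona_vecs: (n_personas, d) array of hypothetical persona representations.
    prefs: (n_personas, n_items) array of like/dislike scores in [-1, 1].
    Returns the Pearson correlation over all unordered persona pairs.
    """
    persona_vecs = np.asarray(persona_vecs, dtype=float)
    prefs = np.asarray(prefs, dtype=float)
    n = persona_vecs.shape[0]
    i, j = np.triu_indices(n, k=1)  # each unordered pair exactly once
    # Euclidean distance between the two personas in each pair (an assumed metric).
    dist = np.linalg.norm(persona_vecs[i] - persona_vecs[j], axis=1)
    # Agreement: mean product of scores, negative when one persona
    # tends to like what the other dislikes.
    agree = (prefs[i] * prefs[j]).mean(axis=1)
    return np.corrcoef(dist, agree)[0, 1]
```

A clearly negative correlation would be consistent with the "distinction" picture, but it would not by itself show whether the model stores each persona's preferences individually or represents the structure of persona space directly; that would need a further probe.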
If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:
@misc{holtzman-do-llms-model-2026,
author = {Holtzman, Ari},
title = {Do LLMs model persona-persona dislike?},
year = {2026},
url = {https://hypogenic.ai/ideahub/idea/bmJFacRgxhVuuDUD4wnA}
}