Repulsive Memories in LLMs

by Ari Holtzman, about 2 months ago

I bet that for a given LLM X there is some kind of data Y, drawn from a distribution D, such that finetuning X on Y makes it perform worse on D. The model misgeneralizes because of its priors, as we all do. Are there any interesting cases of this that don't feel totally adversarial or artificial?
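One way to make the question measurable is to compare held-out loss on D before and after finetuning on Y. The sketch below is only an illustration under stated assumptions: `repulsion_gap`, `mse`, and `sgd_finetune` are hypothetical names, the "model" is a single number rather than an LLM, and the toy case shows ordinary small-sample overfitting, not the prior-driven effect asked about here. A positive gap means finetuning on data from D made the model worse on D.

```python
from typing import Callable, Sequence


def repulsion_gap(
    model: float,
    finetune: Callable[[float, Sequence[float]], float],
    loss: Callable[[float, Sequence[float]], float],
    train_sample: Sequence[float],
    heldout: Sequence[float],
) -> float:
    """Held-out loss after finetuning minus held-out loss before.

    train_sample plays the role of Y and heldout stands in for D.
    A positive value means finetuning hurt performance on D.
    """
    before = loss(model, heldout)
    tuned = finetune(model, train_sample)
    after = loss(tuned, heldout)
    return after - before


def mse(model: float, xs: Sequence[float]) -> float:
    """Mean squared error of a constant prediction (toy stand-in for LM loss)."""
    return sum((x - model) ** 2 for x in xs) / len(xs)


def sgd_finetune(model: float, xs: Sequence[float],
                 lr: float = 0.1, steps: int = 50) -> float:
    """Full-batch gradient descent on the MSE (toy stand-in for finetuning)."""
    for _ in range(steps):
        grad = sum(2 * (model - x) for x in xs) / len(xs)
        model -= lr * grad
    return model


# Toy data, assumed for illustration: held-out points centred on 0,
# and a small, skewed training sample from the same range of values.
heldout = [0.0, 1.0, -1.0, 0.5, -0.5]
train_sample = [1.0, 1.0, 0.5]

# The starting model (0.0) is already optimal on the held-out set.
gap = repulsion_gap(0.0, sgd_finetune, mse, train_sample, heldout)
print(f"repulsion gap: {gap:.3f}")
```

For a real LLM, the same harness shape would apply with `loss` replaced by held-out perplexity or task accuracy on D and `finetune` replaced by an actual training run. The interesting cases are the ones where the gap stays positive even when Y is large and representative of D.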

Tags: LLMs, finetuning, misgeneralization

If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:

@misc{holtzman-repulsive-memories-in-2026,
  author = {Holtzman, Ari},
  title = {Repulsive Memories in LLMs},
  year = {2026},
  url = {https://hypogenic.ai/ideahub/idea/vNApT4dvkxmJTkvyoMEX}
}
