Understanding False Memories in Large Language Models: The Case of the Nonexistent Seahorse Emoji


Large Language Models (LLMs) occasionally exhibit “false memories”: they confidently recall information that does not exist, such as asserting that there is a seahorse emoji. What drives these errors? One hypothesis is that when LLMs are asked about items in a category (e.g., which emojis exist), they rely on pre-grouped sets of related concepts such as “common sea creatures.” Because many members of that group do have emojis, the model may infer that another plausible-sounding member (like a seahorse) has one too, even though it does not. Alternatively, are other mechanisms in the training data or in memory retrieval at play? Investigating this phenomenon can shed light on how LLMs organize knowledge and make inferences, and on why they sometimes “remember” things that never existed.
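To make the category hypothesis concrete, the gap between a plausible concept group and the real emoji inventory can be checked against the Unicode character database itself. The sketch below is a minimal illustration, assuming Python's standard-library `unicodedata` module and a hand-picked, hypothetical list of sea-creature names. The function `category_prior_guess` is an invented toy, not a model of any actual LLM: it "guesses" that a queried member exists with probability equal to the fraction of the other group members that do, which is one way the hypothesized shortcut could produce a confident false memory.

```python
import unicodedata

# Hypothetical "common sea creatures" concept group, written as Unicode
# character names. The selection is illustrative only.
SEA_CREATURES = ["TROPICAL FISH", "DOLPHIN", "OCTOPUS", "CRAB", "SHARK", "SEAHORSE"]


def has_unicode_character(name: str) -> bool:
    """Return True if this Python build's Unicode database has a character named `name`."""
    try:
        unicodedata.lookup(name)
        return True
    except KeyError:
        return False


def split_by_existence(names):
    """Partition category members into those with and without a Unicode character."""
    present = [n for n in names if has_unicode_character(n)]
    missing = [n for n in names if not has_unicode_character(n)]
    return present, missing


def category_prior_guess(names, query):
    """Hypothetical toy inference: estimate that `query` exists with probability
    equal to the fraction of the *other* category members that exist."""
    others = [n for n in names if n != query]
    return sum(has_unicode_character(n) for n in others) / len(others)


if __name__ == "__main__":
    present, missing = split_by_existence(SEA_CREATURES)
    print("Unicode database version:", unicodedata.unidata_version)
    for name in present:
        print(f"{unicodedata.lookup(name)}  {name}")
    print("No character found for:", ", ".join(missing))
    # The toy prior is maximally confident about SEAHORSE, yet the lookup fails.
    print("Toy prior for SEAHORSE:", category_prior_guess(SEA_CREATURES, "SEAHORSE"))
```

The result depends on the Unicode version bundled with the running Python, which the script prints. The contrast it shows, a category-based guess of certainty against a failed lookup, is the kind of mismatch the hypothesis predicts, but it does not test whether LLMs actually reason this way.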

MechInterp MI Hallucination


If you are inspired by this idea, you can reach out to the author for collaboration or cite it:

@misc{holtzman-understanding-false-memories-2025,
  author = {Holtzman, Ari},
  title = {Understanding False Memories in Large Language Models: The Case of the Nonexistent Seahorse Emoji},
  year = {2025},
  url = {https://hypogenic.ai/ideahub/idea/Y8fqtRA0TGY6TECEfTIo}
}
