A recent paper on memorization in LLMs (https://arxiv.org/abs/2505.24832) relied on training models on random bitstrings (symbols drawn from {0, 1}). Can we instead train them on alphanumeric sequences (symbols drawn from [a-zA-Z0-9])? That way, we could compare them to actual language models trained on alphanumeric sequences.
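To make the proposal concrete, here is a minimal sketch of how such a dataset might be generated and how much information it would carry. It assumes each character is drawn uniformly and independently from the 62-symbol alphabet, so each character carries log2(62) ≈ 5.95 bits, against exactly 1 bit per symbol for a bitstring. The names `ALPHABET`, `sample_sequences`, and `dataset_entropy_bits` are hypothetical and do not come from the paper.

```python
import math
import random
import string

# Assumed alphabet: a-z, A-Z, 0-9 (62 symbols).
ALPHABET = string.ascii_letters + string.digits


def sample_sequences(num_sequences, length, seed=0):
    """Draw uniform i.i.d. random alphanumeric strings (hypothetical helper)."""
    rng = random.Random(seed)  # seeded for reproducibility
    return ["".join(rng.choices(ALPHABET, k=length)) for _ in range(num_sequences)]


def dataset_entropy_bits(num_sequences, length, alphabet_size=len(ALPHABET)):
    """Total entropy in bits of a uniform random dataset of the given shape."""
    return num_sequences * length * math.log2(alphabet_size)


if __name__ == "__main__":
    for s in sample_sequences(num_sequences=4, length=16):
        print(s)
    # Same dataset shape, two alphabets: 62 symbols versus bits.
    print(dataset_entropy_bits(1000, 64))
    print(dataset_entropy_bits(1000, 64, alphabet_size=2))
```

Because the entropy of a uniform random dataset is known exactly, the same dataset shape can be compared across the two alphabets in a common unit (bits), which is what any comparison against the bitstring setup would need.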
If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:
@misc{heineman-memorization-vs-generalization-2025,
author = {Heineman, David},
title = {memorization vs. generalization in LLMs using alphanumeric strings},
year = {2025},
url = {https://hypogenic.ai/ideahub/idea/EwGDU8KlpXMgNCVcrUSe}
}