Dilution Bench

What if we took examples of tasks that LLMs can do (plot summarization, bug finding) and purposefully diluted them (e.g., by adding more description, more verbose dialogue, or more boilerplate code)? Could we use the level of dilution d that an LLM can tolerate while still maintaining performance above some threshold as a measure of the model's ability to find true patterns in data?
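One way to make the measurement concrete is sketched below. It is only an illustration under stated assumptions, not a proposed implementation: dilution is modeled as inserting filler units (sentences, dialogue lines, boilerplate statements) between the original units without reordering them, d is taken to be the ratio of filler units to original units, and the tolerated level is the largest tested d at which the score stays at or above the threshold. The names `dilute`, `dilution_tolerance`, and `score_at` are hypothetical, and `score_at` stands in for whatever evaluation harness actually runs the model on the diluted task.

```python
import random
from typing import Callable, List, Optional, Sequence


def dilute(signal: Sequence[str], filler: Sequence[str], d: float, seed: int = 0) -> List[str]:
    """Insert round(d * len(signal)) filler units at random positions.

    Assumption: d is the ratio of filler units to original units.
    The original units keep their relative order.
    """
    rng = random.Random(seed)
    out = list(signal)
    for _ in range(round(d * len(signal))):
        position = rng.randint(0, len(out))  # randint is inclusive on both ends
        out.insert(position, rng.choice(filler))
    return out


def dilution_tolerance(
    score_at: Callable[[float], float],
    levels: Sequence[float],
    threshold: float,
) -> Optional[float]:
    """Return the largest tested level at which the score is still >= threshold.

    Levels are scanned in increasing order and the scan stops at the first
    failure, so a later recovery at a higher level is ignored.
    Returns None if even the lowest level falls below the threshold.
    """
    tolerated = None
    for d in sorted(levels):
        if score_at(d) < threshold:
            break
        tolerated = d
    return tolerated
```

In use, `score_at(d)` would build diluted inputs with `dilute(...)`, run the model, and return a task metric such as bug-finding accuracy. The tolerated d could then be compared across models on the same task and filler source.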

llms benchmarking dilution

If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:

@misc{holtzman-dilution-bench-2026,
  author = {Holtzman, Ari},
  title = {Dilution Bench},
  year = {2026},
  url = {https://hypogenic.ai/ideahub/idea/fepAz30xGyn0Eqv5cKJD}
}
