Hypothesis: the information content of a generative model's output is bounded by the information content of its input. To test this for LLMs, measure the complexity of both input and output as the compression ratio achieved by a reference LLM. For image generators, measure prompt complexity the same way and output complexity with a learned image compressor. Then plot output complexity against input complexity. If outputs consistently exceed inputs in information content, the hypothesis is wrong. If they track or fall below inputs, we've learned something about what these models can and can't do.
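One way to make the measurement concrete is sketched below. It assumes a reference LLM has already scored each text and returned per-token natural-log probabilities; how those are obtained is left open. The function names (`total_bits`, `compression_ratio`, `fraction_exceeding`) are hypothetical helpers for illustration, not part of any library, and this is not necessarily the author's intended procedure.

```python
import math
from typing import Iterable, Sequence, Tuple


def total_bits(token_logprobs: Sequence[float]) -> float:
    """Code length in bits of a text under a reference model.

    `token_logprobs` are natural-log probabilities, one per token,
    assumed to come from whatever reference LLM is used.
    """
    return -sum(token_logprobs) / math.log(2)


def compression_ratio(text: str, token_logprobs: Sequence[float]) -> float:
    """Raw UTF-8 size in bits divided by the model's code length in bits."""
    bits = total_bits(token_logprobs)
    if bits <= 0:
        raise ValueError("code length must be positive")
    return 8 * len(text.encode("utf-8")) / bits


def fraction_exceeding(pairs: Iterable[Tuple[float, float]]) -> float:
    """Fraction of (input_bits, output_bits) pairs where output > input.

    A value near 1.0 across many prompts would count against the hypothesis.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one pair")
    return sum(1 for inp, out in pairs if out > inp) / len(pairs)
```

For example, four tokens that each had probability 0.5 cost 4 bits in total, so a 4-byte text scored that way has a compression ratio of 8. Plotting `total_bits` of the output against `total_bits` of the input, per prompt, gives the relationship the idea asks for.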
Caveat: gibberish is high-entropy, but it doesn't make aligned LLMs do anything interesting; they just ask for clarification. So the relevant quantity is not raw information content but something like usable information, whatever that means. Probably: information the model can actually route through its learned representations. Gibberish doesn't compress well, but it also doesn't land anywhere specific.
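The first half of the caveat, that gibberish scores high on raw information content, can be illustrated with a rough sketch. It uses `zlib` from the Python standard library as a crude stand-in for the reference LLM, which is an assumption made only so the example runs without a model. The sample texts and the helper `bits_per_char` are made up for illustration. It says nothing about usable information, which is the open part of the idea.

```python
import random
import string
import zlib


def bits_per_char(text: str) -> float:
    """Compressed size in bits per character, using zlib as a proxy compressor."""
    if not text:
        raise ValueError("text must be non-empty")
    compressed = zlib.compress(text.encode("utf-8"), 9)
    return 8 * len(compressed) / len(text)


# Illustrative English sample (made up for this sketch).
english = (
    "The information content of a generative model's output may be bounded "
    "by the information content of its input. A short prompt that names a "
    "familiar topic gives the model very little to work with, while a long "
    "and specific prompt carries many more constraints that the output has "
    "to respect. Measuring both sides with the same compressor lets us "
    "compare them on a common scale."
)

# Gibberish of the same length: uniform random printable characters.
alphabet = string.ascii_letters + string.digits + string.punctuation
rng = random.Random(0)  # fixed seed so the run is repeatable
gibberish = "".join(rng.choice(alphabet) for _ in range(len(english)))

print(f"english:   {bits_per_char(english):.2f} bits/char")
print(f"gibberish: {bits_per_char(gibberish):.2f} bits/char")
```

The gibberish string costs more bits per character than the English one, yet it is the input least likely to steer a model anywhere. That gap is what a measure of usable information would need to account for.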
If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:
@misc{holtzman-does-information-in-2025,
author = {Holtzman, Ari},
title = {Does information in ≈ information out in modern GenAI?},
year = {2025},
url = {https://hypogenic.ai/ideahub/idea/Zs55gEaURmPn94YOsXv8}
}