Week of 04/20/26-04/26/26: More agents won't save you from collapse

By Haokun Liu

Welcome to another weekly entry! Thank you to everyone who submitted and voted on ideas.

This week we measured how similar adjacent layers are inside a neural network, tested whether adding more AI models to a self-training loop prevents collapse, and tried to estimate how much of an LLM's total capacity is spent on storing basic facts.

Winning ideas and generated repos here:

How similar are MLP hidden states to each other? by Ari Holtzman

A neural network transforms data one layer at a time. Each layer can only change the representation by so much before passing it on. If that's true, adjacent layers should end up looking similar to each other. How similar are they in practice, and does the answer change depending on how wide or deep the network is?

How Many Agents to Avoid Collapse? by Ari Holtzman

When AI models generate text and future AI models train on that text, the outputs tend to degrade over generations. This is called model collapse. Is there a minimum number of distinct AI models in this loop that prevents collapse? And what even counts as "distinct" for a model?

How much volume is standard knowledge in LLMs? by Ari Holtzman

LLMs know a lot of basic facts, like "Paris is the capital of France" or "peanuts are legumes." Storing these facts takes up some portion of the model's capacity. How much of an LLM's total parameter budget is actually devoted to this kind of standard, encyclopedic knowledge?


TL;DR for ideas

  1. Adjacent layers in a neural network look similar to each other. Wider networks show more diversity between layers because each layer has more room to do something different. At depth 16, the middle layers converge to nearly the same representation and stop changing much. This block structure appears even when the model is trained on shuffled labels.

  2. Adding more AI models to a self-training loop only slows collapse. The thing that actually protects an ecosystem is using models from different training lineages. Two models from different families can stay stable when they read shared outputs as context instead of training on them. Different prompts, different data sources, and different random seeds give almost no protection. Three identical models sharing a data pool collapsed faster than a single model alone.

  3. Standard factual knowledge takes up about one millionth of an LLM's storage capacity. A library of 10 million common facts would use at most 7% of an 8-billion-parameter model. Most of a model's capacity goes to language patterns, reasoning, and the long tail of obscure information.

Verdicts

| Idea | Verdict | Next Question |
| --- | --- | --- |
| MLP layer similarity | Supported. Adjacent layers are more similar than distant ones, with the degree set by depth and width | Could you prune the middle layers of a very deep MLP with little accuracy loss, since they all look alike? |
| Minimum viable agent population | Partially supported. Collapse can be avoided with architecturally distinct agents that read shared outputs instead of training on them | Does combining diverse agents with a small ongoing injection of real human content eliminate collapse entirely? |
| Standard knowledge volume | Supported. Standard facts occupy a tiny fraction of model parameters across all three measurement approaches | Does standard factual knowledge degrade faster than other kinds of knowledge when a model is compressed or quantized? |

Findings from the Ideas

How Similar Are MLP Layers to Each Other?

The question. A feedforward neural network (the kind used inside most AI systems) transforms data one layer at a time. Each layer applies a linear operation followed by a nonlinearity and passes the result to the next layer. If each layer can only change the representation a little, adjacent layers should look similar. How similar are they in practice, and does making a network wider or deeper change the answer?

What the agents tried.

  • Claude trained networks at different depths (4, 8, and 16 layers) and widths (128, 512, and 2048 neurons per layer) on image classification tasks. They measured similarity across all pairs of layers using several metrics. They also tested what happens with shuffled labels and after removing high-activation samples.
  • Codex ran a similar study with four-layer networks on MNIST, Fashion-MNIST, and CIFAR-10. They compared layer similarity within the same trained model and across models trained from different starting points.
  • Gemini trained 10-layer networks with varying widths on MNIST and CIFAR-10. They computed the full grid of pairwise similarities before and after training, then compared trained networks to randomly initialized ones.
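
To make the measurement concrete, here is a minimal sketch of one way to compute the full grid of pairwise layer similarities, using linear CKA on a small MLP. The depth, width, and random inputs are illustrative assumptions, not the agents' actual configurations or code.

```python
import torch
import torch.nn as nn

def linear_cka(x, y):
    """Linear CKA between two (n_samples, dim) activation matrices."""
    x = x - x.mean(dim=0, keepdim=True)
    y = y - y.mean(dim=0, keepdim=True)
    cross = torch.linalg.norm(y.T @ x) ** 2
    return (cross / (torch.linalg.norm(x.T @ x) * torch.linalg.norm(y.T @ y))).item()

depth, width, in_dim = 8, 512, 784            # illustrative settings
blocks = []
for i in range(depth):
    blocks += [nn.Linear(in_dim if i == 0 else width, width), nn.ReLU()]
model = nn.Sequential(*blocks)

# Collect the hidden state after each ReLU for one batch of inputs.
h = torch.randn(256, in_dim)                  # stand-in for a real data batch
acts = []
with torch.no_grad():
    for layer in model:
        h = layer(h)
        if isinstance(layer, nn.ReLU):
            acts.append(h)

# Full grid of pairwise similarities; adjacent pairs sit just off the diagonal.
sim = [[linear_cka(a, b) for b in acts] for a in acts]
for i in range(depth - 1):
    print(f"layers {i}->{i + 1}: CKA = {sim[i][i + 1]:.3f}")
```

The same grid, computed before and after training and across different widths and depths, is the kind of object the comparisons below are drawn from.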

What happened.

All three agents confirmed the main prediction: adjacent layers are more similar to each other than layers that are far apart. Codex saw this in 12 out of 12 combinations of datasets and similarity measures, with statistically clear differences in 10 of them.

The interesting finding is what width does to similarity. You might guess that a wider network produces layers that all look alike because there is more overlap. The opposite is true. In Claude's experiments, wider networks showed less similarity between adjacent layers at depths 4 and 8 (the correlation between width and similarity was -0.95). A narrow layer can only squeeze out a small transformation before passing its output on, so consecutive layers end up looking alike. A wider layer has more room to do something different at each step. Gemini saw the same pattern: networks with 1024 neurons per layer had lower adjacent similarity than networks with 128 neurons.

Claude also found something unexpected at depth 16. The middle layers, roughly layers 2 through 14 out of 16, converged to nearly identical representations with similarity scores above 0.9. The first and last layers stayed distinct, but the vast middle was almost homogeneous. This block structure appeared even when the model was trained on randomly shuffled labels. The architecture and optimization process drive it, not the task. Training reduces layer similarity in shallow networks and preserves it in deep ones.

What we learned.

Adjacent MLP layers share a lot of structure because each layer can only change things incrementally. The narrower the network, the more this constraint bites. Wider layers have more room to differentiate. Very deep networks hit a different regime: past a certain depth, the middle layers stop changing the representation and just pass it through. A large fraction of the layers in a very deep network could probably be removed with little accuracy loss, since they mostly mirror their neighbors.
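
As a hypothetical follow-up (not something the agents ran), the pruning question could be checked with a sketch like this: drop the middle Linear+ReLU blocks of a trained constant-width MLP and compare accuracy before and after. It assumes a model built like the earlier sketch and an evaluation helper of your own.

```python
import torch.nn as nn

def prune_middle(model: nn.Sequential, keep_first=2, keep_last=2):
    """Drop the middle Linear+ReLU blocks of a constant-width MLP."""
    blocks = list(model)                              # [Linear, ReLU, Linear, ReLU, ...]
    pairs = [blocks[i:i + 2] for i in range(0, len(blocks), 2)]
    kept = pairs[:keep_first] + pairs[-keep_last:]
    return nn.Sequential(*[m for pair in kept for m in pair])

# pruned = prune_middle(trained_model)                # hypothetical trained model
# print(evaluate(trained_model), evaluate(pruned))    # your own accuracy helper
```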


Does Adding More AI Models Prevent Collapse?

The question. When AI models generate text that future AI models train on, the quality degrades over time. This is called model collapse: outputs get repetitive, the model loses rare patterns, and the whole loop converges to a narrow low-quality regime. The question is whether having more distinct models in this loop changes the outcome, and what "distinct" has to mean for it to help.

What the agents tried.

  • Claude ran two kinds of experiments. In the first, copies of a small model kept retraining on each other's outputs over six generations, with the number of models ranging from 1 to 16. In the second, models with different training lineages shared a growing pool of text without retraining on it. They just read it as context. Claude also compared four types of diversity directly: different random seeds, different prompt personas (such as "physicist" or "children's storyteller"), different portions of the training data, and different base model families.
  • Codex ran the fine-tuning version with up to 4 models and measured how much collapse happened under each condition.
  • Gemini ran a fine-tuning loop with 1 and 3 models. The 3 models used slightly different generation settings (temperatures of 0.5, 0.7, and 0.9) to test whether that was enough to protect the ecosystem.
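
For intuition about the loop structure, here is a toy simulation. It is a deliberately crude stand-in for the agents' fine-tuning experiments, capturing only one driver of collapse: rare patterns getting lost when each generation is a finite sample of the last. Models are reduced to categorical distributions that "retrain" by re-estimating counts from the shared pool; vocabulary size, sample counts, and generations are arbitrary.

```python
import random
from collections import Counter

VOCAB = list(range(1000))       # illustrative vocabulary of "patterns"
SAMPLES_PER_MODEL = 600         # each model's contribution per generation
GENERATIONS = 6

def run(n_models, seed=0):
    rng = random.Random(seed)
    # Every model starts from the same "human" distribution: uniform over VOCAB.
    models = [Counter({t: 1 for t in VOCAB}) for _ in range(n_models)]
    for gen in range(GENERATIONS):
        pool = []
        for counts in models:
            tokens, weights = zip(*counts.items())
            pool += rng.choices(tokens, weights=weights, k=SAMPLES_PER_MODEL)
        # Each model "retrains" on the shared pool of model-generated samples.
        models = [Counter(pool) for _ in range(n_models)]
        print(f"n_models={n_models} gen={gen + 1}: {len(set(pool))} distinct patterns")

run(1)
run(4)   # a larger population slows the loss of rare patterns but does not restore them
```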

What happened.

Adding more models does not prevent collapse by default. Codex found that 3 identical models sharing a data pool collapsed faster than a single model, since each model reinforced the others' biases. The shared pool accumulated the same distortions faster, hitting 47% unique sentences by generation 3 compared to 74% for a single model. Gemini confirmed that slightly varied temperatures gave only a small improvement. Claude's fine-tuning experiments showed that even 16 agents still collapsed every generation, just more slowly: a single model's perplexity grew 7.5 times over 6 generations, while 16 models grew only 2.1 times.

The clean result came from Claude's second experiment. When models read the shared pool as context instead of training on it, two models from different training lineages maintained stable diversity for all 12 iterations with zero detectable collapse. When those two models shared the same base training and only differed in prompt style or data source, diversity still decayed.

The axis comparison was direct: model family (different base training) preserved about 1.9 times more inter-agent diversity at iteration 10 than any other axis tested. Different prompts, different data slices, and different random seeds all performed at roughly the level of no diversity at all.

What we learned.

The risk of model collapse depends on whether models retrain on AI-generated content. If they do, more models slows the collapse but nothing we tested stopped it within the population sizes and time horizons used. The axis that buys real protection is architectural diversity from different training lineages. Prompt diversity, data diversity, and seed diversity all give very little. If models read shared content as context instead of training on it, even two models from different families can keep an ecosystem stable. The thing to watch is whether the producers of AI content are themselves trained on that content, more than how many of them there are.


How Much of an LLM Is Spent on Basic Facts?

The question. LLMs know a lot of things. They know that France has a capital city, that peanuts are legumes, that there are seven continents. These facts are encoded somewhere in the model's parameters. How much of the model's total capacity goes to this kind of standard encyclopedic knowledge, compared to language patterns, reasoning, and other capabilities?

What the agents tried.

  • Claude measured how many bits of factual information four Pythia models (ranging from 160 million to 2.8 billion parameters) had actually stored. They used two benchmarks of standard facts covering country capitals, heads of government, and category membership. They converted accuracy scores into bits of stored information and divided by the model's theoretical maximum storage capacity to get a fraction.
  • Codex took a causal approach. They identified the small set of neurons most responsible for factual answers, turned those neurons off, and measured how much factual accuracy dropped. For comparison they turned off the same number of randomly chosen neurons.
  • Gemini used an information-theoretic estimate. They computed how many bits it takes to represent 10 million common facts and how many parameters that would require given known LLM storage rates.

What happened.

All three approaches landed on small numbers.

Claude found that standard factual knowledge takes up between 0.00006% and 0.00022% of the model's bit budget. That is roughly one part in a million. The fraction gets even smaller as models get bigger, since the number of parameters grows faster than the amount of standard knowledge being stored. The facts models know best are the everyone-knows-this ones: taxonomic categories, country capitals, ownership. Relational facts like "the capital of X is Y" keep improving with scale. Categorical facts like "a Siamese is a kind of cat" plateau around 1 billion parameters.

Codex found that turning off just 300 neurons in GPT-2 XL (0.06% of the model's parameters) produced the same drop in factual accuracy as turning off the top 4 full layers (5.3% of parameters). Turning off 300 random neurons had almost no effect. Factual knowledge is concentrated in a small identifiable part of the model. The surrounding circuitry that supports retrieval is broader, but the core storage is sparse.
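
To illustrate the knockout idea, here is a sketch with heavy simplifications: GPT-2 small rather than XL, neurons scored by activation magnitude rather than whatever attribution method Codex actually used, and a single prompt instead of a benchmark.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

prompt = "The capital of France is"
ids = tok(prompt, return_tensors="pt").input_ids

# Score the MLP neurons of one middle block by mean |activation| on the prompt
# (a crude stand-in for a proper causal attribution method).
mlp = model.transformer.h[6].mlp
acts = {}
def record(module, inputs, output):
    acts["a"] = output.detach()
hook = mlp.c_fc.register_forward_hook(record)
with torch.no_grad():
    model(ids)
hook.remove()
top = acts["a"].abs().mean(dim=(0, 1)).topk(300).indices   # 300 candidate neurons

# Knock them out by zeroing their activations, then compare the next-token prediction.
def ablate(module, inputs, output):
    output = output.clone()
    output[..., top] = 0
    return output

with torch.no_grad():
    base = model(ids).logits[0, -1].argmax()
    hook = mlp.c_fc.register_forward_hook(ablate)
    knocked = model(ids).logits[0, -1].argmax()
    hook.remove()
print(tok.decode([int(base)]), "->", tok.decode([int(knocked)]))
```

A real version would repeat this over many factual prompts, compare against ablating the same number of random neurons, and score accuracy rather than a single next-token prediction.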

Gemini's upper-bound calculation said encoding 10 million standard facts would take about 537 million parameters at known LLM storage rates, or about 6.7% of an 8-billion-parameter model. Real facts share structure (knowing "dogs are mammals" implies many other properties), so the actual fraction used is likely lower.
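
As a sanity check on that arithmetic, here is the same style of upper-bound estimate with round numbers. The two rates are assumptions, not Gemini's exact inputs: a storage rate of roughly 2 bits per parameter, as reported in capacity studies, and about 100 bits per encoded fact, chosen only to land in the same ballpark as the figures above.

```python
facts = 10_000_000              # library of common facts
bits_per_fact = 100             # assumed cost of one encoded fact
bits_per_param = 2.0            # assumed LLM storage rate (bits per parameter)
model_params = 8_000_000_000    # an 8-billion-parameter model

params_needed = facts * bits_per_fact / bits_per_param
share = 100 * params_needed / model_params
print(f"{params_needed / 1e6:.0f}M parameters, {share:.1f}% of the model")
# -> 500M parameters, 6.2% of the model: the same order of magnitude as the
#    ~537M / 6.7% upper bound reported above.
```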

What we learned.

Standard factual knowledge is a tiny fraction of what an LLM stores. Most of the model's capacity goes to language structure, reasoning patterns, long-tail entities, and procedural knowledge. This has a few practical implications. Aggressive compression is unlikely to destroy basic factual knowledge specifically, since that knowledge occupies such a small fraction of parameters. Better factual accuracy in larger models probably comes from the model getting better at expressing what it already knows, more than from devoting more capacity to facts. And the neurons responsible for factual recall are concentrated enough that targeted editing of specific facts is likely feasible, which is the intuition behind fact-editing research.


Next Week's Competition

The twenty-fifth weekly competition is now open! Voting closes Friday, May 1 at 11:59 PM AOE.

Check out this week's ideas and upvote the ones that excite you. Submit your own ideas to enter the next round!

This week we found that MLP layers look similar to their neighbors, that wider networks show more diversity between layers, and that deep networks develop a large middle block where almost nothing changes. Adding more AI models to a self-training loop only slows collapse; avoiding it took models from genuinely different training lineages reading shared outputs as context rather than retraining on them. And standard factual knowledge takes up roughly one millionth of a model's storage capacity.

If you have thoughts on these findings, please feel free to reach out at haokunliu@uchicago.edu. We welcome collaborations and contributions! Check out our NeuriCo repo to see how the experiments are run.


If you are interested in citing this blog, use this bibtex:

@misc{liu-week-of-04-21-2026, author = {Liu, Haokun}, title = {Week of 04/21/26-04/27/26}, year = {2026}, month = {April}, day = {28}, url = {https://hypogenic.ai/blog/weekly-entry-260421} }