Necessity, Sufficiency, and Selectivity


In MechInterp I see a lot of talk about whether a component is necessary or sufficient for a behavior, based on interventions.

The current standard appears to be:

A component X is necessary for behavior Y if, when you ablate it, Y goes away.
A component X is sufficient for behavior Y if, when you amp it up (e.g. for activations, you add a steering vector), Y appears where it otherwise wouldn't.
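
To make the two working definitions concrete, here is a minimal sketch on a toy model. Everything in it is an assumption for illustration: the weights `W_IN` and `W_OUT`, the helper names, and the choice of zero-ablation and additive steering as the interventions. It is not a recipe from any particular paper or library.

```python
import numpy as np

# Toy stand-in for a model: hidden = relu(W_IN @ x), behavior score = W_OUT @ hidden.
# All names and weights are hypothetical, chosen only for illustration.
W_IN = np.array([[1.0, 0.0],
                 [0.0, 1.0],
                 [0.5, 0.5]])
W_OUT = np.array([2.0, 0.0, 0.5])  # readout for behavior Y


def behavior_score(x, ablate=None, steer=None):
    """Score for behavior Y, optionally zero-ablating or steering hidden units."""
    hidden = np.maximum(W_IN @ x, 0.0)
    if ablate is not None:
        hidden = hidden.copy()
        hidden[ablate] = 0.0       # zero-ablation of component X
    if steer is not None:
        hidden = hidden + steer    # add a steering vector
    return float(W_OUT @ hidden)


def necessity_effect(x, unit):
    """Drop in Y when `unit` is ablated on an input where Y is present."""
    return behavior_score(x) - behavior_score(x, ablate=unit)


def sufficiency_effect(x, unit, scale=1.0):
    """Rise in Y when `unit` is amplified on an input where Y is absent."""
    steer = np.zeros(W_IN.shape[0])
    steer[unit] = scale
    return behavior_score(x, steer=steer) - behavior_score(x)


x_present = np.array([1.0, 0.0])  # input where Y occurs
x_absent = np.array([0.0, 1.0])   # input where Y would not otherwise occur

print("necessity effect:", necessity_effect(x_present, unit=0))
print("sufficiency effect:", sufficiency_effect(x_absent, unit=0))
```

Note that both functions only ever look at the readout for Y. Nothing in either number says what else moved when unit 0 was ablated or amplified.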

But I feel like most such evidence doesn't capture selectivity: does X affect other stuff? Is Y just downstream of that other stuff? Many evals include something like "does model perplexity go up on unrelated data" or "does the model still do well on MMLU". This seems entirely insufficient (no pun intended).
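
One way to picture a selectivity check is to measure the same intervention against several behaviors at once and ask what share of the total effect lands on the target. The sketch below is only that, a sketch: the readouts `Y`, `Z1`, `Z2`, their weights, and the ratio used as a score are all assumptions I made up for illustration, not an established metric.

```python
import numpy as np

# Hypothetical toy setup: three hidden units, one target readout (Y) and
# two off-target readouts (Z1, Z2). Weights are made up for illustration.
READOUTS = {
    "Y":  np.array([2.0, 0.0, 0.0]),
    "Z1": np.array([1.0, 1.0, 0.0]),
    "Z2": np.array([0.0, 0.0, 1.0]),
}


def ablation_effects(hidden, unit):
    """Absolute change in every readout when `unit` is zero-ablated."""
    ablated = hidden.copy()
    ablated[unit] = 0.0
    return {name: abs(float(w @ hidden) - float(w @ ablated))
            for name, w in READOUTS.items()}


def selectivity(effects, target="Y"):
    """Share of the total effect that lands on the target (1.0 = only the target moved)."""
    on_target = effects[target]
    off_target = sum(v for k, v in effects.items() if k != target)
    total = on_target + off_target
    return on_target / total if total > 0 else float("nan")


hidden = np.array([1.0, 1.0, 1.0])
effects = ablation_effects(hidden, unit=0)
score = selectivity(effects)

print(effects)
print("selectivity:", score)
```

In this toy case, ablating unit 0 removes Y but also halves Z1, so a necessity claim about Y alone would miss that the component is shared. A per-behavior breakdown like `effects` is the point here; the single ratio is just a convenient summary.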

We should fix this, but I'm also trying to work out what causes the leakage in our current working definitions of necessity and sufficiency. One cause is that behaviors (Y) are often so poorly defined that we are really seeing necessity or sufficiency for some subset of Y, and often not the same subset for each.
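
The subset mismatch can be made explicit by breaking Y into facets and recording which facets each experiment actually touched. The facet labels and the two outcome sets below are placeholders I invented for illustration; in practice they would come from whatever finer-grained evaluation of Y is available.

```python
# Hypothetical facets of a loosely defined behavior Y; the labels are placeholders.
Y_FACETS = {"y1", "y2", "y3", "y4"}

# Assumed outcomes of two separate experiments on the same component X.
removed_by_ablation = {"y1", "y2"}   # facets that vanish when X is ablated
induced_by_steering = {"y2", "y3"}   # facets that appear when X is amplified


def jaccard(a, b):
    """Overlap between two sets: 1.0 if identical, 0.0 if disjoint."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


both = removed_by_ablation & induced_by_steering
untested = Y_FACETS - (removed_by_ablation | induced_by_steering)
overlap = jaccard(removed_by_ablation, induced_by_steering)

print("necessary and sufficient for:", sorted(both))
print("covered by neither claim:", sorted(untested))
print("overlap between the two claims:", overlap)
```

Here "X is necessary and sufficient for Y" would only hold for one facet out of four, even though each experiment looks like a clean result on its own.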

How can we clean this all up?

llms mechinterp scientific methodology


If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:

@misc{holtzman-necessity-sufficiency-and-2026,
  author = {Holtzman, Ari},
  title = {Necessity, Sufficiency, and Selectivity},
  year = {2026},
  url = {https://hypogenic.ai/ideahub/idea/lFLYN3AqEPxjiygIyvt9}
}
