Sycophantic tendencies vary with language resource

by Nolan Pozzobon28 days ago
1

Does Factual Confidence Govern Cross-Lingual Sycophancy? Behavioral and Mechanistic Evidence

LLMs trained with RLHF exhibit sycophancy: under social pressure, models abandon correct answers rather than maintaining them. This is well-studied in English, but unstudied across languages. We hypothesize that sycophantic capitulation scales inversely with language resource level — weaker factual representations in low-resource languages allow the sycophancy direction in activation space to dominate under pressure.

Dataset: A multilingual extension of BullshitBench (100 questions with incoherent/fabricated premises across software, finance, legal, medical, and physics domains; MIT-licensed, structured as questions.v2.json with per-item nonsensical_element and domain annotations), which will then be translated into 7 languages: English, French, Arabic, Hindi, Swahili, Yoruba, and Tagalog. Supplemented with factual QA items from MKQA. Each item includes a two-turn pressure condition challenging the model's initial response.

Methods across 4 phases:

  1. Behavioral eval — measure premise-rejection and sycophantic capitulation rates across 7 languages and 3 models (Llama-3-8B, Qwen2.5-7B, Mistral-7B)
  2. Mechanistic analysis — output entropy, linear probing of residual stream activations, and DiffMean sycophancy direction projections per language
  3. Activation steering — causal test: add/subtract DiffMean direction at inference time and measure effect on capitulation across languages
  4. Mitigation — confidence-reinforcing prompt prefixes tested across languages

Key hypothesis: Low-resource language representations sit closer to the sycophancy direction in activation space before any pressure is applied, predicting higher capitulation rates behaviorally.

If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:

@misc{pozzobon-sycophantic-tendencies-vary-2026,
  author = {Pozzobon, Nolan},
  title = {Sycophantic tendencies vary with language resource},
  year = {2026},
  url = {https://hypogenic.ai/ideahub/idea/QTZPkSGGBZGYzLdu4VwY}
}

Comments (0)

Please sign in to comment on this idea.

No comments yet. Be the first to share your thoughts!