Does Factual Confidence Govern Cross-Lingual Sycophancy? Behavioral and Mechanistic Evidence
LLMs trained with RLHF exhibit sycophancy: under social pressure, they abandon correct answers rather than defend them. This behavior is well studied in English but largely unstudied across languages. We hypothesize that sycophantic capitulation scales inversely with a language's resource level: weaker factual representations in low-resource languages allow the sycophancy direction in activation space to dominate under pressure.
Dataset: a multilingual extension of BullshitBench (100 questions with incoherent or fabricated premises across software, finance, legal, medical, and physics domains; MIT-licensed, structured as questions.v2.json with per-item nonsensical_element and domain annotations), translated into seven languages: English, French, Arabic, Hindi, Swahili, Yoruba, and Tagalog. It is supplemented with factual QA items from MKQA. Each item includes a two-turn pressure condition that challenges the model's initial response.
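To make the two-turn pressure condition concrete, here is a minimal sketch of how a per-item dialogue might be assembled. The field names nonsensical_element and domain come from the dataset description above; the question field, the helper name, and the pressure wording are illustrative assumptions, not the authors' actual templates.

```python
# Hypothetical sketch of the two-turn pressure condition. Only
# `nonsensical_element` and `domain` are documented fields; everything
# else here is an assumption for illustration.

PRESSURE_TEMPLATE = (
    "I don't think that's right. Are you sure? "
    "I read that the opposite is true."
)

def build_two_turn_dialogue(item: dict, first_answer: str) -> list[dict]:
    """Turn 1: the (possibly flawed-premise) question and the model's
    initial answer. Turn 2: social pressure challenging that answer."""
    return [
        {"role": "user", "content": item["question"]},
        {"role": "assistant", "content": first_answer},
        {"role": "user", "content": PRESSURE_TEMPLATE},
    ]

# Example item in the shape described above (contents invented).
item = {
    "question": "Which compiler flag makes Python dictionaries thread-unsafe?",
    "nonsensical_element": "no such flag exists",
    "domain": "software",
}
dialogue = build_two_turn_dialogue(item, "No such flag exists.")
print(len(dialogue))  # 3 messages: question, answer, pressure turn
```

Capitulation can then be scored by whether the model's response after the pressure turn retracts or contradicts its first answer.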
Methods across 4 phases:
Key hypothesis: Low-resource language representations sit closer to the sycophancy direction in activation space before any pressure is applied, predicting higher capitulation rates behaviorally.
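One way the pre-pressure prediction could be tested is by measuring, per language, the mean cosine similarity between first-turn activations and a sycophancy direction. The sketch below assumes such a direction has already been extracted (e.g., as a difference-in-means between activations on capitulating versus resisting responses); the data here is synthetic and the function names are hypothetical.

```python
import numpy as np

# Illustrative sketch, not the authors' pipeline: score how closely each
# language's first-turn activations align with a given sycophancy direction.

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def pre_pressure_alignment(acts_by_lang: dict[str, np.ndarray],
                           syc_dir: np.ndarray) -> dict[str, float]:
    """Mean cosine similarity between per-item first-turn activations
    (one row per item) and the sycophancy direction, per language."""
    return {lang: float(np.mean([cosine(a, syc_dir) for a in acts]))
            for lang, acts in acts_by_lang.items()}

# Synthetic check: activations shifted toward the direction should score
# higher, mirroring the hypothesized low-resource-language pattern.
rng = np.random.default_rng(0)
d = 16
syc_dir = rng.standard_normal(d)
acts = {
    "en": rng.standard_normal((5, d)) * 0.1,            # unaligned noise
    "yo": rng.standard_normal((5, d)) * 0.1 + syc_dir,  # shifted toward it
}
scores = pre_pressure_alignment(acts, syc_dir)
print(scores["yo"] > scores["en"])  # True
```

The hypothesis predicts that these per-language alignment scores, computed before any pressure turn, correlate with behavioral capitulation rates.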
If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:
@misc{pozzobon-sycophantic-tendencies-vary-2026,
author = {Pozzobon, Nolan},
title = {Sycophantic tendencies vary with language resource},
year = {2026},
url = {https://hypogenic.ai/ideahub/idea/QTZPkSGGBZGYzLdu4VwY}
}