Large language models are trained predominantly on English data, yet they are increasingly deployed at national scale in non-English-speaking countries (e.g., xAI's partnership with Venezuela, OpenAI's with Estonia).
Despite this, model capability evaluations and training data are overwhelmingly English-centric.
It may be interesting to host an evaluation benchmark/leaderboard tracking LLM performance across languages. Is it possible that these models rely on some implicit internal translation mechanism?
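As a rough illustration of what such a leaderboard could compute, here is a minimal sketch of a per-language evaluation loop. All names (`evaluate_by_language`, the item format, the dummy model) are assumptions for illustration, not an existing benchmark's API; a real harness would plug in an actual LLM call and a more robust answer-matching scheme than exact string comparison.

```python
from collections import defaultdict

def evaluate_by_language(items, model_answer):
    """Score a model on QA items grouped by language.

    items: list of dicts with 'lang', 'question', and 'answer' keys.
    model_answer: callable mapping a question string to a predicted answer.
    Returns {lang: accuracy}, i.e. one leaderboard row per language.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for item in items:
        total[item["lang"]] += 1
        predicted = model_answer(item["question"]).strip().lower()
        if predicted == item["answer"].strip().lower():
            correct[item["lang"]] += 1
    return {lang: correct[lang] / total[lang] for lang in total}

# Toy usage: a dummy "model" that only answers in English,
# so it scores 1.0 on the English item and 0.0 on the Estonian one.
items = [
    {"lang": "en", "question": "Capital of France?", "answer": "Paris"},
    {"lang": "et", "question": "Prantsusmaa pealinn?", "answer": "Pariis"},
]
scores = evaluate_by_language(items, lambda q: "Paris")
print(scores)  # → {'en': 1.0, 'et': 0.0}
```

Comparing such per-language scores on semantically parallel items is one concrete way to probe whether performance degrades gracefully across languages or collapses outside English.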
If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:
@misc{lee-evaluating-linguistic-performance-2026,
  author = {Lee, Kai},
  title = {Evaluating Linguistic Performance in LLMs},
  year = {2026},
  url = {https://hypogenic.ai/ideahub/idea/zPs0N3qLLeBtMWy4f4I9}
}
A related paper: LingGym: How Far Are LLMs from Thinking Like Field Linguists? (https://aclanthology.org/2025.emnlp-main.69/)
Are you a PhD?