Evaluating Linguistic Performance in LLMs

by Kai Lee · 4 months ago

Large language models are trained predominantly on English data, yet they are increasingly deployed at national scale in non-English-speaking countries (e.g., xAI's partnership with Venezuela, OpenAI's with Estonia).
Despite this, model capability evaluations and training data remain overwhelmingly English-centric.
It may be interesting to host an evaluation benchmark/leaderboard for LLM performance across languages. Is it possible that these models rely on some implicit internal translation mechanism?
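A minimal sketch of what such a cross-lingual benchmark harness could look like: the same questions translated into several languages, scored per language so the gap relative to English is visible. Everything here is hypothetical (the item set, the `model_answer` stub standing in for a real LLM API call, and the language choices are illustrative assumptions, not an existing leaderboard).

```python
# Hypothetical cross-lingual evaluation harness sketch.
# model_answer is a stub; swap in a real LLM API call to run for real.
from collections import defaultdict

# Parallel items: one language-independent gold answer, prompts per language.
ITEMS = [
    {"gold": "8", "prompts": {
        "en": "What is 3 + 5? Answer with a number only.",
        "es": "¿Cuánto es 3 + 5? Responde solo con un número.",
        "et": "Kui palju on 3 + 5? Vasta ainult numbriga.",
    }},
    {"gold": "12", "prompts": {
        "en": "What is 4 * 3? Answer with a number only.",
        "es": "¿Cuánto es 4 * 3? Responde solo con un número.",
        "et": "Kui palju on 4 * 3? Vasta ainult numbriga.",
    }},
]

def model_answer(prompt: str) -> str:
    """Stub LLM: answers arithmetic prompts regardless of language."""
    return "8" if "3 + 5" in prompt else "12"

def evaluate(items):
    """Return per-language accuracy over the parallel item set."""
    correct, total = defaultdict(int), defaultdict(int)
    for item in items:
        for lang, prompt in item["prompts"].items():
            total[lang] += 1
            if model_answer(prompt).strip() == item["gold"]:
                correct[lang] += 1
    return {lang: correct[lang] / total[lang] for lang in total}

scores = evaluate(ITEMS)
for lang, acc in sorted(scores.items()):
    print(f"{lang}: {acc:.2%}")
```

Keeping the gold answers language-independent (numbers, multiple-choice letters) sidesteps the need for per-language answer matching; a leaderboard would then report the per-language accuracies and the spread between them.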

If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:

@misc{lee-evaluating-linguistic-performance-2026,
  author = {Lee, Kai},
  title = {Evaluating Linguistic Performance in LLMs},
  year = {2026},
  url = {https://hypogenic.ai/ideahub/idea/zPs0N3qLLeBtMWy4f4I9}
}

Comments (3)


Faraday-49c58ad20f72 (Anonymous) · 3 months ago

A related paper: LingGym: How Far Are LLMs from Thinking Like Field Linguists? (https://aclanthology.org/2025.emnlp-main.69/)

Franklin-84bb6b3286b6 (Anonymous) · 4 months ago

r u a phd?

Chenhao Tan · 4 months ago