Large language models are trained predominantly on English data, yet they are increasingly deployed at national scale in non-English-speaking countries (e.g., xAI's partnership with Venezuela, OpenAI's with Estonia).
Despite this, model capability evaluations and training data are overwhelmingly English-centric.
It may be interesting to host an evaluation benchmark/leaderboard tracking LLM performance across languages. Is it possible that these models rely on some implicit internal translation mechanism?
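As a rough illustration of what such a leaderboard could compute, here is a minimal sketch of a per-language evaluation loop. All names (`evaluate_by_language`, the item format, the dummy model) are assumptions for illustration, not an existing benchmark's API; a real harness would plug in an actual LLM call and a more robust answer-matching scheme than exact string comparison.

```python
from collections import defaultdict

def evaluate_by_language(items, model_answer):
    """Score a model on QA items grouped by language.

    items: list of dicts with 'lang', 'question', and 'answer' keys.
    model_answer: callable mapping a question string to a predicted answer.
    Returns {lang: accuracy}, i.e. one leaderboard row per language.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for item in items:
        total[item["lang"]] += 1
        predicted = model_answer(item["question"]).strip().lower()
        if predicted == item["answer"].strip().lower():
            correct[item["lang"]] += 1
    return {lang: correct[lang] / total[lang] for lang in total}

# Toy usage: a dummy "model" that only answers in English,
# so it scores 1.0 on the English item and 0.0 on the Estonian one.
items = [
    {"lang": "en", "question": "Capital of France?", "answer": "Paris"},
    {"lang": "et", "question": "Prantsusmaa pealinn?", "answer": "Pariis"},
]
scores = evaluate_by_language(items, lambda q: "Paris")
print(scores)  # → {'en': 1.0, 'et': 0.0}
```

Comparing such per-language scores on semantically parallel items is one concrete way to probe whether performance degrades gracefully across languages or collapses outside English.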
If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:
@misc{lee-evaluating-linguistic-performance-2026,
  author = {Lee, Kai},
  title = {Evaluating Linguistic Performance in LLMs},
  year = {2026},
  url = {https://hypogenic.ai/ideahub/idea/zPs0N3qLLeBtMWy4f4I9}
}
A related paper: LingGym: How Far Are LLMs from Thinking Like Field Linguists? (https://aclanthology.org/2025.emnlp-main.69/)
Are you a PhD?