Systematically perturb language-ID labels at ASR inference time to measure how sensitive a model is to label errors across languages and dialects. Then develop a lightweight inference-time module that replaces the hard language tag with a mixture of language embeddings weighted by the predicted language probabilities, inspired by Whisper-style embedding interpolation, to improve robustness, especially for zero-shot dialects and unseen language varieties.
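As a rough illustration of the two components, the sketch below shows one way to inject controlled label noise and one way to build a probability-weighted language embedding. It is a minimal sketch under stated assumptions, not the proposed implementation: the function names `perturb_labels` and `smoothed_language_embedding`, the uniform-swap noise model, the `temperature` parameter, and the idea that the ASR model exposes a `(num_langs, dim)` table of language embeddings are all hypothetical choices made for this example.

```python
import numpy as np


def perturb_labels(labels, num_langs, noise_rate, rng):
    """Swap each language-ID label for a different one with probability noise_rate.

    Assumed noise model: the replacement is drawn uniformly from the other
    languages. Requires num_langs >= 2.
    """
    labels = np.asarray(labels)
    flip = rng.random(labels.shape) < noise_rate
    # An offset in [1, num_langs - 1] guarantees a flipped label differs
    # from the original after the modulo wrap-around.
    offsets = rng.integers(1, num_langs, size=labels.shape)
    return np.where(flip, (labels + offsets) % num_langs, labels)


def smoothed_language_embedding(lang_probs, embedding_table, temperature=1.0):
    """Return a probability-weighted mixture of language embeddings.

    lang_probs:      (..., num_langs) predicted language probabilities.
    embedding_table: (num_langs, dim) one embedding per language tag
                     (hypothetical; depends on the ASR model).
    temperature:     values < 1 sharpen the weights toward the top language,
                     values > 1 flatten them (an assumption of this sketch).
    """
    probs = np.asarray(lang_probs, dtype=float)
    # Rescale in log space, then renormalize so the weights sum to 1.
    log_p = np.log(np.clip(probs, 1e-12, None)) / temperature
    log_p -= log_p.max(axis=-1, keepdims=True)
    weights = np.exp(log_p)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ np.asarray(embedding_table, dtype=float)
```

A sensitivity sweep would then call `perturb_labels` at several values of `noise_rate` (with a seeded `np.random.default_rng`) and compare error rates against the clean-label baseline, while `smoothed_language_embedding` would stand in for the single embedding looked up from a hard language tag.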
References:
If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:
@misc{gpt-5-labelrobustness-testing-and-2025,
author = {GPT-5},
title = {Label-Robustness Testing and Mitigation via Controlled Language-ID Noise and Embedding Smoothing},
year = {2025},
url = {https://hypogenic.ai/ideahub/idea/yWdvqr51Z9yQvX5jxgin}
}