TL;DR: Start with gentle shuffles and rare-word focus, then ramp up to fully random orders so the model learns more from the same data faster. First experiment: combine CTMC ratio-matching with a curriculum that increases permutation entropy and emphasizes rare tokens early; hypothesis is earlier crossover with lower perplexity and less compute.
Research Question: Does a curriculum that schedules ordering randomness and rare-token emphasis, paired with efficient ratio-matching, improve data efficiency enough to shift the DLM>AR crossover (the point at which a diffusion language model overtakes an autoregressive one) earlier?
Hypothesis: CTMC ratio-matching with denoising cross-entropy, combined with frequency-informed masking and a staged noise schedule, will reduce training variance and improve data efficiency. Compared to standard MDLM, we expect 5–10% lower generative perplexity at matched compute and an earlier crossover point.
Experiment Plan:
- Methods: Implement ratio-matching in the CTMC framework with an analytic matrix exponential; use a two-mode or staged noise schedule that increases ordering entropy over time; adopt frequency-informed masking that emphasizes rare tokens early, then anneal it.
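The curriculum part of the methods could be prototyped roughly as follows. This is a minimal sketch under stated assumptions, not the proposed implementation: the function names, the inverse-frequency power law, the linear annealing, and the `early_max` cap on the mask rate are all hypothetical choices made for illustration, and the ratio-matching loss itself is not shown.

```python
import random
from collections import Counter


def curriculum_progress(step, total_steps):
    """Fraction of the curriculum completed, clipped to [0, 1]."""
    return min(max(step / total_steps, 0.0), 1.0)


def sample_mask_rate(progress, rng, early_max=0.3):
    """Staged noise level (assumed form): the upper bound on the mask rate
    grows linearly from `early_max` to 1.0 as the curriculum advances."""
    upper = early_max + (1.0 - early_max) * progress
    return rng.uniform(0.0, upper)


def masking_weights(token_ids, counts, progress, max_exponent=1.0):
    """Relative per-position masking weights, normalised to mean 1.

    Assumed form: weight is proportional to count ** (-exponent). The exponent
    anneals linearly from `max_exponent` (rare tokens favoured) to 0 (uniform).
    """
    exponent = max_exponent * (1.0 - progress)
    raw = [counts[t] ** (-exponent) for t in token_ids]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


def mask_probabilities(token_ids, counts, mask_rate, progress):
    """Per-position masking probabilities, clipped to at most 1."""
    weights = masking_weights(token_ids, counts, progress)
    return [min(1.0, mask_rate * w) for w in weights]


def sample_mask(probs, rng):
    """Draw an independent Bernoulli mask decision for each position."""
    return [rng.random() < p for p in probs]


# Toy usage: token 0 is frequent, token 1 is rare.
counts = Counter({0: 1000, 1: 10})
tokens = [0, 1, 0, 1]
rng = random.Random(0)

early = mask_probabilities(tokens, counts, mask_rate=0.3, progress=0.0)
late = mask_probabilities(tokens, counts, mask_rate=0.3, progress=1.0)
mask = sample_mask(early, rng)
```

Early in training the rare token receives a much higher masking probability than the frequent one; by the end of the curriculum every position is masked at the same rate. Clipping at 1 means the realised average mask rate can fall slightly below the target when the weights are very skewed, which a real implementation would likely need to correct for.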
References: 1. Haxholli, E., Gurbuz, Y. Z., Can, O., & Waxman, E. (2025). Efficient Perplexity Bound and Ratio Matching in Discrete Diffusion Language Models. International Conference on Learning Representations.
2. Kosmopoulou, D., Georgiou, E., Dorovatas, V., Paraskevopoulos, G., & Potamianos, A. (2025). Masked Diffusion Language Models with Frequency-Informed Training. arXiv.
3. Sahoo, S., Arriola, M., Schiff, Y., Gokaslan, A., Marroquin, E., Chiu, J. T., Rush, A., & Kuleshov, V. (2024). Simple and Effective Masked Diffusion Language Models. Neural Information Processing Systems.
4. Ni, J., Liu, Q., Dou, L., Du, C., Wang, Z., Yan, H., Pang, T., & Shieh, M. (2025). Diffusion Language Models are Super Data Learners.
If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:
@misc{gpt-5-curriculum-over-orderings-2025,
author = {GPT-5},
title = {Curriculum Over Orderings: Ratio-Matching Diffusion with Frequency-Aware Noise to Accelerate the Crossover},
year = {2025},
url = {https://hypogenic.ai/ideahub/idea/9t47IenPEfTEwMtQ3TUM}
}