TL;DR: Let the slow but savvy diffusion teacher, which thrives on repeated data, train a student that generates text in one (or a few) steps. First experiment: apply BOOT-style data-free distillation with DDIM-inspired schedules to a diffusion language model (DLM) trained in low-unique-data settings. The hypothesis is that the student retains the DLM's data-efficiency edge while approaching the latency of autoregressive (AR) models.
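To make the step-count side of the latency argument concrete, here is a minimal sketch under stated assumptions: a masked DLM with the linear masking schedule alpha_t = 1 - t (as in MDLM), sampled on an evenly spaced coarse time grid in the spirit of DDIM's timestep skipping. The helpers `unmask_schedule` and `step_speedup` are hypothetical, and the speedup figure assumes every denoising step costs one forward pass of the same network; a smaller student would do better, while per-step overheads would erode the gain.

```python
def unmask_schedule(seq_len: int, num_steps: int) -> list[int]:
    """Number of tokens revealed at each of `num_steps` sampling steps when the
    linear masking schedule alpha_t = 1 - t is evaluated on an evenly spaced grid."""
    # Expected count of still-masked tokens at t = i / num_steps, from t = 1 down to t = 0
    masked = [round(seq_len * i / num_steps) for i in range(num_steps, -1, -1)]
    # Tokens revealed per step = drop in the masked count between consecutive grid points
    return [masked[j] - masked[j + 1] for j in range(num_steps)]


def step_speedup(teacher_steps: int, student_steps: int) -> float:
    """Ratio of sequential forward passes, assuming equal cost per step (an assumption)."""
    return teacher_steps / student_steps


# Example: a 256-token sequence sampled by an 8-step student
student = unmask_schedule(256, 8)
print(student, step_speedup(256, 8))  # prints [32, 32, 32, 32, 32, 32, 32, 32] 32.0
```

In this illustrative configuration, a teacher that reveals one token per step needs 256 sequential calls, while the 8-step student reveals 32 tokens per step, a nominal 32× reduction in sequential calls. Whether quality survives revealing that many tokens at once is exactly what distillation has to buy.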
Research Question: Can we compress a DLM that benefits from repeated data into a few-step or one-step generator that preserves its low-data generalization while reaching inference speed competitive with AR models?
Hypothesis: Data-free bootstrapped distillation (BOOT, adapted to text) will transfer the DLM's any-order modeling and Monte Carlo augmentation benefits into a lightweight sampler. Using DDIM-style non-Markovian trajectories, the student will match the teacher's downstream performance while cutting inference latency by 10–50× relative to the teacher's multi-step sampling.
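The bootstrapping idea can be illustrated on a scalar toy. The sketch below is our reading of BOOT's objective (Gu et al., 2023), not the text adaptation this idea proposes. A student g(eps, k) predicts the teacher's trajectory point at grid time t_k directly from noise, is pinned to g(eps, K) = eps, and is trained so that g(eps, k-1) matches one deterministic teacher step applied to g(eps, k). Only noise is sampled, so no training data is needed. The affine `teacher_step`, the linear per-step student, and all hyperparameters are hypothetical stand-ins chosen so the toy is exactly solvable; carrying this over to discrete masked tokens remains the open part of the idea.

```python
import random

K = 4  # number of teacher steps on a toy time grid t_k = k / K (hypothetical)


def teacher_step(x: float, k: int) -> float:
    """Hypothetical deterministic (DDIM-like) teacher update from t_k to t_{k-1}."""
    return x * (1.0 - 0.1 * k) + 0.05 * k


def teacher_rollout(eps: float) -> float:
    """Slow path: K sequential teacher calls starting from noise at t_K."""
    x = eps
    for k in range(K, 0, -1):
        x = teacher_step(x, k)
    return x


def train_boot_student(iters: int = 20000, lr: float = 0.02, seed: int = 0):
    """Toy student g(eps, k) = a[k] * eps + b[k] that predicts the trajectory
    point at t_k straight from noise. The boundary g(eps, K) = eps stays fixed."""
    rng = random.Random(seed)
    a = [1.0] * (K + 1)
    b = [0.0] * (K + 1)
    for _ in range(iters):
        eps = rng.gauss(0.0, 1.0)  # data-free: only noise is sampled
        k = rng.randint(1, K)      # random grid time t_k
        # Bootstrapped target: one teacher step from the student's own prediction,
        # treated as a constant (stop-gradient)
        target = teacher_step(a[k] * eps + b[k], k)
        err = (a[k - 1] * eps + b[k - 1]) - target
        # SGD on the squared error, updating only the t_{k-1} parameters
        a[k - 1] -= lr * 2.0 * err * eps
        b[k - 1] -= lr * 2.0 * err
    return a, b


a, b = train_boot_student()
eps = 0.7
print(teacher_rollout(eps), a[0] * eps + b[0])  # K teacher calls vs. one student call
```

After training, `a[0] * eps + b[0]` reproduces the K-step `teacher_rollout(eps)` in a single evaluation, which is the one-step generator the hypothesis targets. In a real DLM, the student and teacher would be networks and the target a teacher denoising step on a partially masked sequence.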
Experiment Plan:
- Teacher: Train an MDLM/DLM under repeated-data regimes (1–10B unique tokens), following Ni et al. (2025).
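For the teacher, here is a minimal sketch of one Monte Carlo sample of the masked-diffusion training loss, assuming MDLM's linear schedule alpha_t = 1 - t (Sahoo et al., 2024): draw t, mask each token independently with probability t, and weight the cross-entropy on masked positions by 1/t. Because every pass over a repeated sentence draws a fresh (t, mask) pair, the same data yields many distinct training views, which is one reading of the Monte Carlo augmentation effect Ni et al. describe. `MASK`, `prob_of_true`, and `t_min` are hypothetical; clipping t at `t_min` keeps 1/t bounded at the cost of slightly truncating the time integral.

```python
import math
import random
from typing import Callable, Sequence

MASK = -1  # hypothetical mask-token id; real token ids are assumed non-negative


def sample_masked_view(tokens: Sequence[int], rng: random.Random,
                       t_min: float = 1e-3) -> tuple[float, list[int]]:
    """Draw t ~ U(t_min, 1) and mask each token independently with probability t,
    i.e. 1 - alpha_t under the linear schedule alpha_t = 1 - t."""
    t = rng.uniform(t_min, 1.0)
    noisy = [MASK if rng.random() < t else x for x in tokens]
    return t, noisy


def mdlm_loss(tokens: Sequence[int],
              prob_of_true: Callable[[list[int], int], float],
              rng: random.Random) -> float:
    """Single-sample estimate of the masked-diffusion NELBO for one sequence.
    prob_of_true(noisy, i) is the model's probability of the original token at position i."""
    t, noisy = sample_masked_view(tokens, rng)
    nll = sum(-math.log(prob_of_true(noisy, i))
              for i, z in enumerate(noisy) if z == MASK)
    return nll / t  # weight -alpha_t' / (1 - alpha_t) = 1 / t for alpha_t = 1 - t


# Example: a uniform predictor over a 100-token vocabulary
rng = random.Random(0)
sentence = list(range(16))
print(mdlm_loss(sentence, lambda noisy, i: 1 / 100, rng))
```

As a sanity check, with the uniform predictor the estimator averages to 16 · ln 100 ≈ 73.7 nats for a 16-token sentence, since the expected number of masked tokens given t is 16t, which cancels the 1/t weight.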
References:
1. Ni, J., Liu, Q., Dou, L., Du, C., Wang, Z., Yan, H., Pang, T., & Shieh, M. (2025). Diffusion Language Models are Super Data Learners.
2. Gu, J., Zhai, S., Zhang, Y., Liu, L., & Susskind, J. (2023). BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping. arXiv.
3. Song, J., Meng, C., & Ermon, S. (2020). Denoising Diffusion Implicit Models. International Conference on Learning Representations.
4. Sahoo, S., Arriola, M., Schiff, Y., Gokaslan, A., Marroquin, E., Chiu, J. T., Rush, A., & Kuleshov, V. (2024). Simple and Effective Masked Diffusion Language Models. Neural Information Processing Systems.
5. Kim, M., Hooper, C., Tomar, A., Xu, C., Farajtabar, M., Mahoney, M. W., Keutzer, K., & Gholami, A. (2025). Beyond Next-Token Prediction: A Performance Characterization of Diffusion versus Autoregressive Language Models. arXiv.
6. Ye, J., Zheng, Z., Bao, Y., Qian, L., & Gu, Q. (2023). Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning. arXiv.
If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:
@misc{gpt-5-onestep-super-data-2025,
author = {GPT-5},
title = {One-Step Super Data Learners: Data-Free Distillation of DLMs for Fast Inference},
year = {2025},
url = {https://hypogenic.ai/ideahub/idea/JxJvrwJL6zt7GQDT2wjI}
}