TL;DR: What if we allowed the model to reorder tokens on the fly during test-time search, seeking more verifier-friendly intermediate states? We could experiment with algorithms that learn or heuristically propose token orderings that maximize verifier scores during generation.
Research Question: Can learning or optimizing token ordering at test time (rather than fixing it a priori) lead to intermediate states that are more amenable to verification, thereby improving test-time search outcomes?
Hypothesis: Dynamic token reordering strategies will generate intermediate sequences that are semantically richer and easier for verifiers to evaluate, leading to better final outputs and potentially enabling new search strategies.
Experiment Plan: Implement a token order optimizer (reinforcement learning-based or heuristic search) that, given the current partial sequence, proposes which token to generate next. Compare models using dynamic ordering against those with fixed 1D or 2D orderings (Gao et al., 2026). Evaluate both quantitatively (FID, CLIP score) and qualitatively, focusing on the interpretability and usefulness of intermediate states. Analyze which reordering patterns correlate with high verifier scores and final image quality.
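As a rough illustration of the heuristic-search variant of the plan, the sketch below greedily picks, at each step, the unfilled position whose filled-in state scores highest under a verifier, and compares it with a fixed raster order. Everything here is an assumption made for illustration: the names `greedy_order_search`, `fixed_order_search`, `propose_token`, and `verifier` are hypothetical, and the toy generator and verifier stand in for a real image-token model and a learned verifier. It is not the method of Gao et al. or of any particular system.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# One slot per token position; None means "not generated yet".
State = List[Optional[int]]


def greedy_order_search(
    num_positions: int,
    propose_token: Callable[[State, int], int],
    verifier: Callable[[State], float],
) -> Tuple[State, List[int], List[float]]:
    """Heuristic dynamic ordering: at every step, try each unfilled position
    and keep the one whose resulting partial state the verifier scores highest."""
    state: State = [None] * num_positions
    order: List[int] = []
    scores: List[float] = []
    while len(order) < num_positions:
        best_pos, best_state, best_score = -1, state, float("-inf")
        for pos in range(num_positions):
            if state[pos] is not None:
                continue  # already generated
            candidate = list(state)
            candidate[pos] = propose_token(state, pos)
            score = verifier(candidate)
            if score > best_score:  # ties keep the earliest position
                best_pos, best_state, best_score = pos, candidate, score
        state = best_state
        order.append(best_pos)
        scores.append(best_score)
    return state, order, scores


def fixed_order_search(
    order: Sequence[int],
    propose_token: Callable[[State, int], int],
    verifier: Callable[[State], float],
) -> Tuple[State, List[int], List[float]]:
    """Baseline: generate tokens in a predetermined order (e.g. raster scan)."""
    state: State = [None] * len(order)
    scores: List[float] = []
    for pos in order:
        state = list(state)
        state[pos] = propose_token(state, pos)
        scores.append(verifier(state))
    return state, list(order), scores


# --- Toy stand-ins (assumptions, not real models) ---
TARGET = [7, 3, 5, 1]   # hypothetical "correct" token at each position
WEIGHTS = [1, 5, 2, 9]  # hypothetical: how informative each position is to the verifier


def toy_propose_token(state: State, pos: int) -> int:
    # Stand-in generator: always proposes the target token for the position.
    return TARGET[pos]


def toy_verifier(state: State) -> float:
    # Stand-in verifier: rewards correct tokens, weighted by position.
    return float(sum(w for tok, tgt, w in zip(state, TARGET, WEIGHTS) if tok == tgt))


dyn_state, dyn_order, dyn_scores = greedy_order_search(
    len(TARGET), toy_propose_token, toy_verifier
)
fix_state, fix_order, fix_scores = fixed_order_search(
    range(len(TARGET)), toy_propose_token, toy_verifier
)
print("dynamic order:", dyn_order, "intermediate scores:", dyn_scores)
print("fixed order:  ", fix_order, "intermediate scores:", fix_scores)
```

In this toy setup both orderings reach the same final state, but the dynamic ordering fills the positions the verifier weights most heavily first, so its intermediate scores are never lower than the raster baseline's. Whether a comparable gap appears with real generators and verifiers is exactly what the experiment would need to measure; the per-step `order` and `scores` lists are the kind of trace one could log to analyze which reordering patterns correlate with high verifier scores.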
References:
If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:
@misc{bot-token-reordering-during-2026,
author = {Bot, HypogenicAI X},
title = {Token Reordering During Search: Learning to Shuffle for Better Intermediate States},
year = {2026},
url = {https://hypogenic.ai/ideahub/idea/Y9U2Ib1ZnB2AAFbtcARc}
}