Add an OOD gate to VLM-based embodied controllers (e.g., VLM-PC for legged robots, Chen et al., 2024; VLM-Social-Nav, Song et al., 2024) that continuously scores camera frames using negative-label OOD detection (NegLabel; Jiang et al., 2024) and automated prompt tuning (LAPT; Zhang et al., 2024). High OOD scores trigger safe fallback skills, replanning, or active information gathering (e.g., repositioning to reduce uncertainty), and dynamically prompt the VLM with “what it is not” to refine plans.

OOD detection in VLMs has been text-aware (NegLabel/LAPT) but rarely closed-loop in embodied planning. This idea marries text-derived negatives and automatic prompt tuning to real-time behavior selection, creating a language-aware “unknowns” interface that modulates the skill planner.

It builds directly on the planning-and-replanning structure in VLM-PC and the social cost scoring in VLM-Social-Nav, replacing fixed priors with adaptive OOD-informed costs. It uses NegLabel’s negative labels and LAPT’s autonomous prompt optimization for robust OOD scoring under domain shift.

Embodied errors often happen in the long tail of “unknown unknowns.” Treating textual negatives as first-class planning signals should reduce unsafe or socially inappropriate actions without heavy human supervision. The expected result is more robust, socially compliant, and fail-safe navigation and locomotion in novel environments, which is critical for search-and-rescue, indoor assistance, and crowded public spaces.
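The gating step described above can be sketched in a few lines. This is a minimal illustration, not the implementation from any of the cited papers. It assumes that image and label embeddings have already been computed by a CLIP-style encoder and are passed in as plain lists of floats. The score is a softmax-mass ratio over in-distribution versus negative labels, modeled on the general form used in negative-label OOD detection. The function names, the temperature, the two thresholds, and the behavior names are all hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def id_score(image_emb: Vector,
             id_text_embs: Sequence[Vector],
             neg_text_embs: Sequence[Vector],
             temperature: float = 0.01) -> float:
    """Fraction of softmax mass on in-distribution labels (higher = more ID).

    Assumption: embeddings come from a CLIP-style encoder; the temperature
    value is illustrative, not tuned.
    """
    id_logits = [cosine(image_emb, t) / temperature for t in id_text_embs]
    neg_logits = [cosine(image_emb, t) / temperature for t in neg_text_embs]
    peak = max(id_logits + neg_logits)  # subtract the max for numerical stability
    id_mass = sum(math.exp(l - peak) for l in id_logits)
    neg_mass = sum(math.exp(l - peak) for l in neg_logits)
    return id_mass / (id_mass + neg_mass)


def select_behavior(ood_score: float,
                    replan_threshold: float = 0.5,
                    fallback_threshold: float = 0.9) -> str:
    """Map an OOD score in [0, 1] to a behavior (hypothetical thresholds and names)."""
    if ood_score >= fallback_threshold:
        return "safe_fallback"
    if ood_score >= replan_threshold:
        return "replan"
    return "continue"


def gate(image_emb: Vector,
         id_text_embs: Sequence[Vector],
         neg_text_embs: Sequence[Vector]) -> str:
    """Score one camera frame and pick a behavior for the skill planner."""
    ood_score = 1.0 - id_score(image_emb, id_text_embs, neg_text_embs)
    return select_behavior(ood_score)
```

In a real controller, `gate` would run on each incoming frame and its output would modulate the planner's cost or skill choice. A prompt-tuning stage in the spirit of LAPT would sit upstream, producing the label embeddings that this sketch takes as given.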
References:
If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:
@misc{gpt-5-unknownaware-embodied-planning-2025,
author = {GPT-5},
title = {Unknown-Aware Embodied Planning: Fusing Negative-Label OOD Signals with VLM-Guided Robot Control},
year = {2025},
url = {https://hypogenic.ai/ideahub/idea/XaIHXkmBD7V1jF6gVtBC}
}