Modify Ran et al.’s Policy Regularization with Dataset Constraint (PRDC; ICML 2023) so that, instead of always anchoring to the globally nearest state-action pair in the dataset, the anchor must lie inside a learned feasible region derived via reachability analysis, as in Zheng et al.’s FISOR (ICLR 2024). At each update, estimate the largest feasible region from offline data (FISOR-style), search for the nearest anchor within this region, and use a lightweight discriminator à la Kidera et al. (IEEE Access 2024) to downweight anchors suspected to come from low-quality data. This approach refines PRDC’s choice of anchor with a feasibility filter, adds a discriminator constraint to handle poor-quality data, and can integrate knowledge constraints when domain priors exist. It addresses anchoring failures that arise when nearest neighbors are unsafe or spuriously high-value under batch constraints, targeting over-regularization toward bad anchors while retaining PRDC’s ability to pick actions that are absent from the dataset at the current state. The intended result is tighter safety control with less conservatism than behavior cloning or support constraints, enabling safer, deployment-ready offline RL in robotics and driving, where hard safety constraints matter and batch heterogeneity is common.
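The anchor-selection step described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the method of any of the cited papers: the function name `feasible_anchor_penalty`, the convention that a feasibility value at or below zero marks a feasible sample, the state-scaling factor `beta`, and the choice to multiply the anchor distance by the discriminator weight are all hypothetical. The feasibility values and discriminator weights are assumed to come from separately trained models and are passed in as plain arrays.

```python
import numpy as np

def feasible_anchor_penalty(state, action, ds_states, ds_actions,
                            feas_values, disc_weights, beta=1.0):
    """Pick the nearest dataset anchor inside the feasible region.

    All names and conventions here are illustrative assumptions:
      feas_values[i] <= 0  -> sample i is treated as feasible
      disc_weights[i]      -> weight in [0, 1]; low means suspected low quality
      beta                 -> scales states relative to actions in the distance
    Returns (anchor_index, weighted_distance).
    """
    feasible = feas_values <= 0.0
    if not feasible.any():
        raise ValueError("no feasible anchors in the dataset")

    # Distance in the concatenated (beta * state, action) space.
    query = np.concatenate([beta * state, action])
    keys = np.concatenate([beta * ds_states, ds_actions], axis=1)
    dists = np.linalg.norm(keys - query, axis=1)

    # Feasibility filter: infeasible samples can never be selected.
    dists = np.where(feasible, dists, np.inf)
    idx = int(np.argmin(dists))

    # Discriminator downweighting: pull less strongly toward suspect anchors.
    penalty = float(disc_weights[idx] * dists[idx])
    return idx, penalty

# Toy data: the globally nearest sample (index 0) is infeasible.
ds_states = np.array([[0.0], [1.0], [3.0]])
ds_actions = np.array([[0.0], [0.0], [0.0]])
feas_values = np.array([0.5, -0.1, -0.2])
disc_weights = np.array([1.0, 0.5, 1.0])

idx, penalty = feasible_anchor_penalty(
    np.array([0.1]), np.array([0.0]),
    ds_states, ds_actions, feas_values, disc_weights,
)
```

In the toy data, the unfiltered nearest neighbor would be index 0, but the filter rules it out and the search falls back to index 1, whose distance is then halved by its discriminator weight. A brute-force scan is used for clarity; a real dataset would likely need a spatial index built over the feasible subset.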
References:
If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:
@misc{gpt-5-feasible-nearestneighbor-anchoring-2025,
author = {GPT-5},
title = {Feasible Nearest-Neighbor Anchoring for Safe Offline RL},
year = {2025},
url = {https://hypogenic.ai/ideahub/idea/CrB9zJVXlitHZkQ0A8fQ}
}