Amend to Alignment (Zhang et al., 2024) reduces spurious correlations in VLMs via decoupled prompt tuning; SOS-Bench (Feuer et al., 2024) shows that current pipelines reward style over substance; DAPT (Cho et al., 2023) and PromptMM (Wei et al., 2024) highlight distributional and modality biases. We propose causal anti-prompts: a two-branch prompt system in which one branch targets semantic competence and the other is trained adversarially to detect and suppress superficial patterns (verbosity, hedging, particular syntactic sugar) that correlate with higher judge scores but not with factuality or safety. Training alternates between two phases: (i) reinforcing semantic correctness on counterfactual, paraphrased, and OOD data; and (ii) amplifying the anti-prompts on curated “style traps” mined via disagreement analysis and negative-label mining (inspired by LAPT). The mechanism generalizes beyond text: in multimodal settings (M^2PT; Wang et al., 2024), anti-prompts can suppress visual artifacts and biases (cf. PROMPT-IML; Liu et al., 2024). Novelty: the method explicitly learns a suppressive prompt component to counteract spurious style features, rather than merely regularizing or balancing datasets. Impact: models that preserve substance under adversarial style perturbations, generalize better under covariate shift, and are less susceptible to judge biases.
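The alternating schedule described above can be illustrated with a deliberately tiny toy model. This is a sketch under strong assumptions, not the proposed method itself: prompts are replaced by two scalar weights, the anti-prompt is reduced to a single learned suppression gate rather than an adversarially trained branch, and the phase (i) batch is kept style-confounded on purpose so the effect of the gate is visible. All names (`semantic_step`, `anti_prompt_step`, `style_traps`, and so on) are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(params, s, t, use_gate=True):
    # s: semantic feature, t: style feature (e.g. verbosity).
    # The gate in (0, 1) stands in for the anti-prompt: it scales down
    # the contribution of the style feature.
    gate = sigmoid(params["a"]) if use_gate else 0.0
    return params["w_s"] * s + (1.0 - gate) * params["w_t"] * t

def semantic_step(params, batch, lr):
    # Phase (i): update only the semantic-branch weights (squared loss).
    for s, t, y in batch:
        gate = sigmoid(params["a"])
        err = predict(params, s, t) - y
        params["w_s"] -= lr * err * s
        params["w_t"] -= lr * err * (1.0 - gate) * t

def anti_prompt_step(params, traps, lr):
    # Phase (ii): update only the gate parameter on style traps.
    for s, t, y in traps:
        gate = sigmoid(params["a"])
        err = predict(params, s, t) - y
        # d(pred)/d(a) = -w_t * t * gate * (1 - gate)
        params["a"] -= lr * err * (-params["w_t"] * t) * gate * (1.0 - gate)

def train(rounds=200, lr=0.5):
    params = {"w_s": 0.0, "w_t": 0.0, "a": 0.0}
    # Assumed toy data as (semantic, style, label) triples.
    # In the confounded batch, style perfectly tracks the label.
    semantic_batch = [(1.0, 1.0, 1.0), (0.0, 0.0, 0.0)]
    # Style traps: stylish but wrong, and plain but correct.
    style_traps = [(0.0, 1.0, 0.0), (1.0, 0.0, 1.0)]
    for _ in range(rounds):
        semantic_step(params, semantic_batch, lr)
        anti_prompt_step(params, style_traps, lr)
    return params, style_traps

def trap_error(params, traps, use_gate):
    # Summed squared error on the style traps, with or without the gate.
    return sum((predict(params, s, t, use_gate) - y) ** 2 for s, t, y in traps)

if __name__ == "__main__":
    params, traps = train()
    print("gate:", sigmoid(params["a"]))
    print("trap error with gate:   ", trap_error(params, traps, True))
    print("trap error without gate:", trap_error(params, traps, False))
```

In this toy setting the gate only ever moves toward suppressing the style feature, so the error on the style traps is lower with the gate enabled than with it disabled. A real instantiation would replace the scalars with learned prompt embeddings and the gate update with the adversarial objective sketched in the proposal.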
References:
If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:
@misc{gpt-5-causal-antiprompts-decoupled-2025,
author = {GPT-5},
title = {Causal Anti-Prompts: Decoupled Prompting to Break Spurious Style Correlations},
year = {2025},
url = {https://hypogenic.ai/ideahub/idea/WpjIRGeznd1Pb98Hrw8g}
}