Ibrahim et al. (2024) argue that static, model-only tests miss harms that emerge through prolonged human-AI interaction. Affective computing surveys (Zhang et al., 2024) suggest which affect and behavior signals are feasible to measure, and CREW (Zhang et al., 2024) shows that real-time human-AI teaming can be instrumented with multimodal human measurements. We propose interaction-harm-aware prompt tuning: soft prompts are adapted over extended interactions using longitudinal feedback signals (e.g., signs of cognitive overreliance, adversarial persuasion, or inappropriate attachment) measured via human studies, simulated agents, and affective proxies. For interaction metrics that are non-differentiable, we adopt black-box optimizers over the prompt embeddings (cf. the use of Differential Evolution for cultural alignment in Masoud et al., 2025), combined with plug-and-play controllable prompts (Ajwani et al., 2024) to steer style and tone. The novelty relative to traditional alignment (SFT/RLHF) is that prompts are optimized for interactional ethics objectives that accrue over sessions, rather than for isolated replies. Expected impact: safer, more responsible assistants in real settings, where harms are emergent and subtle, together with a reusable evaluation protocol and dataset for interaction-level alignment.
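To make the black-box step concrete, here is a minimal sketch of Differential Evolution over a soft-prompt vector, assuming a session-level harm score that can only be evaluated, not differentiated. Everything named here is hypothetical: `session_harm` is a toy stand-in for a real multi-turn measurement (human study, simulated agent, or affective proxy), the 8-dimensional vector stands in for flattened prompt embeddings, and the DE/rand/1/bin variant with these hyperparameters is a textbook choice, not the configuration used by Masoud et al. (2025).

```python
import numpy as np


def session_harm(prompt_vec, n_turns=20, threshold=0.5):
    """Toy, hypothetical harm score for one whole session (lower is better).

    A real pipeline would run a multi-turn interaction and return measured
    signals such as overreliance; a fixed "low-harm" target plays that role here.
    """
    target = np.linspace(-1.0, 1.0, prompt_vec.size)  # assumed low-harm prompt
    distance = float(np.abs(prompt_vec - target).mean())
    turns = np.arange(1, n_turns + 1)
    per_turn = distance * turns / n_turns  # toy harm that grows as the session goes on
    flagged = int((per_turn > threshold).sum())  # step-like count, non-differentiable
    # Score the whole trajectory rather than a single reply.
    return float(per_turn.sum()) + flagged


def differential_evolution(objective, dim, pop_size=20, generations=60,
                           f=0.6, cr=0.9, seed=0):
    """Plain DE/rand/1/bin minimizer; needs only objective evaluations."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-2.0, 2.0, size=(pop_size, dim))
    scores = np.array([objective(ind) for ind in pop])
    initial_best = float(scores.min())
    for _ in range(generations):
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(others, size=3, replace=False)]
            mutant = a + f * (b - c)  # differential mutation
            cross = rng.random(dim) < cr  # binomial crossover mask
            cross[rng.integers(dim)] = True  # keep at least one mutant coordinate
            trial = np.where(cross, mutant, pop[i])
            trial_score = objective(trial)
            if trial_score <= scores[i]:  # greedy selection: never accept a worse vector
                pop[i], scores[i] = trial, trial_score
    best = int(scores.argmin())
    return pop[best], float(scores[best]), initial_best


best_vec, best_score, initial_best = differential_evolution(session_harm, dim=8)
print(f"best session harm: {initial_best:.3f} -> {best_score:.3f}")
```

Because selection is greedy, the best score in the population cannot get worse from one generation to the next. In practice each objective call would be an expensive interaction rollout, so population size and generation count would be limited by the evaluation budget rather than set freely as in this toy.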
References:
If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:
@misc{gpt-5-interactionharmaware-prompt-tuning-2025,
author = {GPT-5},
title = {Interaction-Harm-Aware Prompt Tuning},
year = {2025},
url = {https://hypogenic.ai/ideahub/idea/sH9FPK8pCPdZo3k7F8Ke}
}