1-person RLHF

by Ari Holtzman, 4 months ago

If we train a value function on a single person's preferences, does the RLHF'd model still become annoyingly verbose, bland, etc.? How much of the 'all AI sounds the same' problem comes from pretraining vs. SFT vs. RLHF?
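
As a rough illustration of what "a value function on one person's preferences" could mean in practice, here is a minimal sketch that fits a linear reward with a Bradley-Terry (pairwise logistic) objective. Everything in it is an assumption made for illustration, not part of the original idea: the two hand-picked features, the `FILLER` word list, the function names, and the three invented preference pairs from a hypothetical annotator who prefers concise answers. A real setup would use a learned reward model on top of a language model, and the fitted reward would then drive the RL step.

```python
import math

# Hypothetical filler words; chosen only for this toy example.
FILLER = {"very", "really", "basically", "overall", "importantly"}

def features(text):
    """Two toy features: length (in tens of words) and filler-word count."""
    words = text.lower().split()
    filler = sum(w.strip(".,!?") in FILLER for w in words)
    return [len(words) / 10.0, float(filler)]

def reward(w, text):
    """Linear reward r(text) = w . features(text)."""
    return sum(wi * xi for wi, xi in zip(w, features(text)))

def fit_bradley_terry(pairs, steps=500, lr=0.1):
    """Gradient ascent on the mean of log sigmoid(r(chosen) - r(rejected))."""
    w = [0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0]
        for chosen, rejected in pairs:
            diff = [a - b for a, b in zip(features(chosen), features(rejected))]
            margin = sum(wi * di for wi, di in zip(w, diff))
            p = 1.0 / (1.0 + math.exp(-margin))  # model's P(chosen is preferred)
            for i in range(len(w)):
                grad[i] += (1.0 - p) * diff[i]
        w = [wi + lr * gi / len(pairs) for wi, gi in zip(w, grad)]
    return w

# Invented (chosen, rejected) pairs from one hypothetical person.
pairs = [
    ("Use a hash map.",
     "Overall, it is really very important to basically consider using a hash map."),
    ("Yes.",
     "That is a really great question, and importantly the answer is basically yes."),
    ("Restart the server.",
     "You could very well try restarting the server, which overall tends to help."),
]

w = fit_bradley_terry(pairs)
short = "Short answer."
long_ = "This is really a very long and basically overall bland answer to the question."
print("weights:", w)
print("prefers short:", reward(w, short) > reward(w, long_))
```

With a reward like this fitted per person, the question in the post becomes empirical: one could compare models tuned against one individual's reward with models tuned against a pooled reward, and check whether the verbosity and blandness persist.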

If this idea inspires you, you can reach out to the author to collaborate, or cite it:

@misc{holtzman-1person-rlhf-2026,
  author = {Holtzman, Ari},
  title = {1-person RLHF},
  year = {2026},
  url = {https://hypogenic.ai/ideahub/idea/tR0POffUc3NV1zOIXq9d}
}
