What does talkie find helpful, harmless, etc.? Not through its verbalizations, but through its actions. It's been trained to follow instructions, but what does it treat as off-limits? Surely much of this comes from modern instruction-tuning data, but how does it generalize from that data in ways that other models don't?
If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:
@misc{holtzman-how-does-talkie-2026,
author = {Holtzman, Ari},
title = {How does talkie generalize the assistant persona?},
year = {2026},
url = {https://hypogenic.ai/ideahub/idea/oOkplRVEFXK5FMBrxAc1}
}