San Roman et al. (ICASSP 2024) watermark latent audio generative models by watermarking their training data, so that generated outputs remain detectable even after fine-tuning. Vyas, Kakade, and Barak (ICML 2023) formalize near access-freeness (NAF), which bounds the probability that a trained model emits content similar to a protected example C relative to a safe model trained without access to C. This project proposes a single training-time procedure for VAEs and GANs that:
- Embeds an “orthogonal” watermark subspace in the latent or feature layers (extending San Roman et al.’s latent watermarking from audio to images, tabular data, and time series), with decoder-agnostic detection.
- Constrains learning to satisfy k-NAF with respect to a known set of potentially copyrighted samples (Vyas et al.), using a black-box wrapper that replaces or reweights loss terms so that the trained model’s output distribution stays close to that of a reference (“safe”) model trained without those samples.
- Provides statistical guarantees: watermark detection at target false-positive rates (FPRs), together with a certified bound on the probability mass placed near copyrighted works, even after fine-tuning or domain adaptation.
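The two guarantees in the last item can be made concrete on toy inputs. The following is a minimal Python sketch, not the proposed method. It assumes (hypothetically) that the watermark detector decodes an n-bit key and that unwatermarked outputs match each bit independently with probability 1/2, so the detection threshold for a target FPR follows from a binomial tail. It also assumes small discrete output distributions for the trained and safe models, so that the max-divergence form of k-NAF from Vyas et al. (with logs in base 2) can be evaluated exactly. All function names and toy values are illustrative.

```python
import math

def binomial_tail(n_bits: int, t: int) -> float:
    """P[X >= t] for X ~ Binomial(n_bits, 1/2): the chance that an
    unwatermarked sample matches at least t key bits (assumed null model)."""
    return sum(math.comb(n_bits, i) for i in range(t, n_bits + 1)) / 2 ** n_bits

def detection_threshold(n_bits: int, target_fpr: float) -> int:
    """Smallest match count whose null tail probability is <= target_fpr.
    A return value of n_bits + 1 means no threshold meets the target."""
    for t in range(n_bits + 2):
        if binomial_tail(n_bits, t) <= target_fpr:
            return t
    raise ValueError("target_fpr must be non-negative")

def max_divergence_bits(p: dict, safe: dict) -> float:
    """Delta_max(p || safe) = max over outcomes with p(y) > 0 of
    log2(p(y) / safe(y)); infinite if safe assigns zero mass to such a y."""
    worst = -math.inf
    for y, p_y in p.items():
        if p_y == 0:
            continue
        q_y = safe.get(y, 0.0)
        if q_y == 0:
            return math.inf
        worst = max(worst, math.log2(p_y / q_y))
    return worst

def naf_event_bound(k_bits: float, safe_mass: float) -> float:
    """Under k-NAF with max divergence, p(E) <= 2**k * safe(E) for any event E."""
    return 2 ** k_bits * safe_mass

# Toy watermark test: 10-bit key, target FPR of 1%.
threshold = detection_threshold(10, 0.01)

# Toy output distributions (hypothetical numbers) for the trained model
# and for a safe model trained without the protected sample.
p_model = {"novel_a": 0.5, "novel_b": 0.4, "near_copy": 0.1}
p_safe = {"novel_a": 0.5, "novel_b": 0.45, "near_copy": 0.05}

k = max_divergence_bits(p_model, p_safe)
bound = naf_event_bound(k, p_safe["near_copy"])
print(f"match threshold: {threshold} of 10 bits")
print(f"k = {k:.3f} bits, P[near_copy] = {p_model['near_copy']} <= {bound:.3f}")
```

In this toy case k is 1 bit, so any event, such as emitting the outcome closest to a protected work, is at most twice as likely under the trained model as under the safe model. For real VAEs and GANs neither output distribution is available in closed form, so k would have to be bounded or estimated rather than computed this way.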
The novelty is pairing traceability with provable copyright protection in one training-time pipeline, instead of treating watermarking and IP safety as separate add-ons. This would be particularly valuable for open-source models, which must both attribute provenance and minimize legal exposure from reproducing protected content.
References:
- Latent Watermarking of Audio Generative Models. Robin San Roman, Pierre Fernandez, Antoine Deleforge, Yossi Adi, Romain Serizel (2024). IEEE International Conference on Acoustics, Speech, and Signal Processing.
- On Provable Copyright Protection for Generative Models. Nikhil Vyas, Sham Kakade, Boaz Barak (2023). International Conference on Machine Learning.