Detecting Finetuned Models

by Freda Shi, 2 months ago

Suppose we have a model A trained on dataset D, and the following two models with exactly the same architecture as A:

Model B (finetuning): obtained by finetuning A on dataset E.
Model C (training from scratch): obtained by training a randomly initialized model (with a different random seed from A) from scratch on D + E.

Research Questions:

Can we reliably tell which one is obtained through finetuning?
If so, which metrics are the most informative signals (e.g., metrics that examine one or more of model internals, output probability distributions, and decoding results)?
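
To make the second question concrete, here is a minimal sketch of two candidate signals: a weight-space comparison against A (model internals) and a mean KL divergence between output distributions on shared probe inputs (probability distribution). The post does not propose these particular metrics. The function names, the array shapes, and the synthetic stand-ins for A, B, and C (B as a small perturbation of A, C as an independent random draw) are all assumptions made for illustration, not trained models. Whether real finetuned and from-scratch models separate this cleanly is exactly the open question.

```python
import numpy as np

def flatten_params(params):
    """Concatenate a list of weight arrays into one 1-D vector."""
    return np.concatenate([np.asarray(p).ravel() for p in params])

def relative_l2_distance(base, candidate):
    """||candidate - base|| / ||base|| over all parameters."""
    a, c = flatten_params(base), flatten_params(candidate)
    return float(np.linalg.norm(c - a) / np.linalg.norm(a))

def cosine_similarity(base, candidate):
    """Cosine similarity between the flattened parameter vectors."""
    a, c = flatten_params(base), flatten_params(candidate)
    return float(a @ c / (np.linalg.norm(a) * np.linalg.norm(c)))

def log_softmax(z):
    """Numerically stable log-softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def mean_kl(base_logits, candidate_logits):
    """Mean KL(p_base || p_candidate), one distribution per row of logits."""
    lp, lq = log_softmax(base_logits), log_softmax(candidate_logits)
    return float((np.exp(lp) * (lp - lq)).sum(axis=-1).mean())

# Synthetic stand-ins (assumption): these are NOT trained models.
rng = np.random.default_rng(0)
shapes = [(64, 32), (32, 10)]  # hypothetical layer shapes
params_a = [rng.normal(size=s) for s in shapes]
params_b = [w + 0.01 * rng.normal(size=w.shape) for w in params_a]  # "finetuned"
params_c = [rng.normal(size=s) for s in shapes]                     # "from scratch"

dist_b = relative_l2_distance(params_a, params_b)
dist_c = relative_l2_distance(params_a, params_c)
cos_b = cosine_similarity(params_a, params_b)
cos_c = cosine_similarity(params_a, params_c)

# Hypothetical logits on 100 shared probe inputs over a 10-token vocabulary.
logits_a = rng.normal(size=(100, 10))
logits_b = logits_a + 0.05 * rng.normal(size=logits_a.shape)
logits_c = rng.normal(size=(100, 10))

kl_b = mean_kl(logits_a, logits_b)
kl_c = mean_kl(logits_a, logits_c)

print(f"relative L2 to A:  B={dist_b:.4f}  C={dist_c:.4f}")
print(f"cosine with A:     B={cos_b:.4f}  C={cos_c:.4f}")
print(f"mean KL from A:    B={kl_b:.4f}  C={kl_c:.4f}")
```

A caveat on the weight-space signal: independently trained networks can implement similar functions with very different parameters (for example, through permuted hidden units), so a large distance from A does not by itself show that C behaves differently from A. The output-level comparison has the opposite weakness, since C is trained on D + E and may end up with predictions close to B's.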

Tags: Watermarking, Language Models, Finetuning

If you are inspired by this idea, you can reach out to the authors for collaboration or cite it:

@misc{shi-detecting-finetuned-models-2026,
  author = {Shi, Freda},
  title = {Detecting Finetuned Models},
  year = {2026},
  url = {https://hypogenic.ai/ideahub/idea/CbE22wwXlhAgrnij7ylI}
}
