All the examples of non-breaking spaces that they showed were arguably places where someone nicely typesetting might well do the same thing. For example, in "FY 2025", or "$8.7 billion". (I've even done this a lot myself in the past.)
I wouldn't call this a watermark, but more a sign of likely copy&paste, if students' word processors weren't currently doing that.
A "watermark" that invisibly identifies the text origin using Unicode tricks sounds possible.
And maybe you could do some things with statistical patterns.
Or you could, as some have done in the past, stego the identifying information in a way that's hard to spot but can't be denied later (e.g., the first letter of each word clearly spells out "john smith is a cheater who copied this from chatgpt").
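To make the "Unicode tricks" idea concrete, here is a minimal sketch of one way it could work: hiding a short tag as zero-width characters after the first word. The function names and the choice of U+200B/U+200C are my own assumptions for illustration, not something any vendor is known to do.

```python
# Hypothetical scheme: U+200B encodes a 0 bit, U+200C encodes a 1 bit.
ZERO, ONE = "\u200b", "\u200c"  # zero-width space, zero-width non-joiner


def embed(text: str, tag: str) -> str:
    """Hide `tag` as invisible characters right after the first word."""
    bits = "".join(f"{b:08b}" for b in tag.encode("utf-8"))
    hidden = "".join(ONE if bit == "1" else ZERO for bit in bits)
    head, sep, tail = text.partition(" ")
    return head + hidden + sep + tail


def extract(text: str) -> str:
    """Recover the hidden tag by reading back the zero-width characters."""
    bits = "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))
    usable = len(bits) - len(bits) % 8  # ignore a trailing partial byte
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
    return data.decode("utf-8", errors="replace")


marked = embed("The budget grew.", "u42")
print(extract(marked))
```

Anything that normalizes or strips zero-width characters (many editors and sanitizers do) would destroy a mark like this, which is part of why it's closer to a copy&paste tell than a robust watermark.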
> And maybe you could do some things with statistical patterns.
Fascinating, and now that you mention it, this does seem kinda inevitable. Naturally the same people that think IP/copyright for everyone else is fake, irrelevant, or old fashioned will be desperate to be able to conclusively prove to investors and shareholders that someone else's work is built on theirs via model distillation, and suddenly IP is important again.
What are the known cases or examples of stego? This sounds interesting if it's at the level of model training. Anyway I guess you can get pretty far with stuff like this just with simple system prompts, encouraging shibboleths along the lines of "Always phrase your response so that it has exactly 14 copies of the letter J".
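Checking for a shibboleth like the letter-count one is trivial on the detection side. A rough sketch, where the function name, the case-insensitive matching, and the defaults are all assumptions of mine:

```python
def has_shibboleth(text: str, letter: str = "j", expected: int = 14) -> bool:
    """Return True if `letter` occurs exactly `expected` times (case-insensitive)."""
    return text.lower().count(letter.lower()) == expected


print(has_shibboleth("Just a jolly jumble of jargon."))
```

A single check like this would false-positive on ordinary text now and then, so in practice you'd presumably want several independent constraints before calling anything proof.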
These don't appear to be intended as watermarks. They're merely a valid use of non-breaking space for tightly coupled elements like "2.5 billion" and "Title I".
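If you want to see for yourself where the non-breaking spaces sit in a piece of text, something like this would list the token pairs they join. It's only a sketch; `nbsp_pairs` is a made-up helper name.

```python
import re

NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE


def nbsp_pairs(text: str) -> list[tuple[str, str]]:
    """Return (left, right) token pairs that are joined by a non-breaking space."""
    # \S does not match U+00A0 in Python's Unicode-aware regexes,
    # so each group stops at the non-breaking space itself.
    return re.findall(rf"(\S+){NBSP}(\S+)", text)


sample = f"FY{NBSP}2025 saw $8.7{NBSP}billion in spending"
print(nbsp_pairs(sample))
```

If the pairs all look like number-plus-unit or abbreviation-plus-number, that fits the "just good typesetting" reading rather than a deliberate watermark.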
Sure, a human author would almost never do that, but they could. I could imagine a Markdown syntax that did that – it could be done similar to how `code` is marked up in most blogs.
Software engineers should know that source code can be encoded in a string of whitespace characters and then run through a compiler function to produce undetectable functionality.
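The encoding half of that is only a few lines. A harmless sketch, assuming space = 0 and tab = 1 (my choice, not a standard), that only round-trips a string and deliberately never executes the decoded result:

```python
def to_whitespace(payload: str) -> str:
    """Encode a string as spaces (0 bits) and tabs (1 bits)."""
    bits = "".join(f"{b:08b}" for b in payload.encode("utf-8"))
    return bits.replace("0", " ").replace("1", "\t")


def from_whitespace(blob: str) -> str:
    """Decode a space/tab string produced by to_whitespace."""
    bits = "".join("1" if ch == "\t" else "0" for ch in blob if ch in " \t")
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


blob = to_whitespace("print(1)")
print(repr(from_whitespace(blob)))
```

The blob looks like trailing whitespace in a diff, which is why it's worth having linters or review tooling flag long runs of mixed spaces and tabs.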
Which is also done https://www.pillar.security/blog/new-vulnerability-in-github...
Or you can simply do it using this follow-up prompt. No external tools are needed. Worked for me:
Alternatively, if you have access to the original prompt, just append this:
I remember when $EMPLOYER was caught sending all-employee emails with individualized Unicode homoglyph watermarks, to try to identify leaks.
For those unaware of the reference, a famous example was Tesla. HN discussion: https://news.ycombinator.com/item?id=33621562 (Tesla has used space characters in internal emails to identify leaks), see also a general discussion of related techniques: https://en.wikipedia.org/wiki/Canary_trap
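For anyone curious how a per-recipient canary could work mechanically, here's a minimal sketch that encodes a recipient number in which of the first few spaces get swapped for a look-alike. This is not a reconstruction of what any company actually did; `mark_copy`, `identify`, the 8-bit width, and the use of U+00A0 are all assumptions for illustration.

```python
ALT = "\u00a0"  # look-alike space used for 1 bits (an assumption)


def mark_copy(text: str, recipient_id: int, width: int = 8) -> str:
    """Encode recipient_id in the first `width` spaces of text."""
    if not 0 <= recipient_id < 2 ** width:
        raise ValueError("recipient_id does not fit in width bits")
    parts = text.split(" ")
    if len(parts) - 1 < width:
        raise ValueError("text has too few spaces to carry the id")
    bits = f"{recipient_id:0{width}b}"
    out = [parts[0]]
    for i, word in enumerate(parts[1:]):
        sep = ALT if i < width and bits[i] == "1" else " "
        out.append(sep + word)
    return "".join(out)


def identify(text: str, width: int = 8) -> int:
    """Read the recipient id back from the first `width` space characters."""
    seps = [ch for ch in text if ch in (" ", ALT)][:width]
    return int("".join("1" if ch == ALT else "0" for ch in seps), 2)


memo = "Please do not share this memo outside the team before Friday at noon"
print(identify(mark_copy(memo, 37)))
```

Retyping the text or normalizing whitespace defeats it, so this only catches leaks that are pasted or forwarded verbatim.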