New ChatGPT Models Seem to Leave Watermarks on Text

28 points by croes 3 months ago

neilv 3 months ago

All the examples of non-breaking spaces that they showed were arguably places where someone nicely typesetting might well do the same thing. For example, in "FY 2025", or "$8.7 billion". (I've even done this a lot myself in the past.)

I wouldn't call this a watermark so much as a sign of likely copy-and-paste, at least as long as students' word processors aren't inserting these characters on their own.
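To check whether a piece of text contains these characters at all, a minimal sketch along the following lines might help. The set of code points and the name `find_special_spaces` are assumptions chosen for illustration; the list is not exhaustive.

```python
import unicodedata

# Space-like code points that look like (or hide as) an ordinary space.
# Assumed list for illustration, not exhaustive.
SPECIAL_SPACES = {
    "\u00a0",  # NO-BREAK SPACE
    "\u202f",  # NARROW NO-BREAK SPACE
    "\u2009",  # THIN SPACE
    "\u200b",  # ZERO WIDTH SPACE
}

def find_special_spaces(text):
    """Return (index, code point, Unicode name) for each special space found."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ch in SPECIAL_SPACES
    ]

sample = "FY\u00a02025 revenue was $8.7\u202fbillion"
for hit in find_special_spaces(sample):
    print(hit)
```

A hit only shows that the character is present. As noted above, it does not say whether a model, a typesetter, or a word processor put it there.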

A "watermark" that invisibly identifies the text origin using Unicode tricks sounds possible.

And maybe you could do some things with statistical patterns.

Or you could, as some have done in the past, stego the identifying information in a way that's hard to spot but can't be denied later (e.g., the first letter of each word clearly spells out "john smith is a cheater who copied this from chatgpt").
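The first-letter scheme is easy to check mechanically. Here is a rough sketch; the function names and the sample sentence are made up for illustration, and a real detector would need to handle punctuation and word boundaries more carefully.

```python
import re

def acrostic(text):
    """Join the first letter of every word, lowercased."""
    words = re.findall(r"[A-Za-z][A-Za-z']*", text)
    return "".join(w[0].lower() for w in words)

def contains_hidden(text, message):
    """True if the acrostic of `text` contains `message` (spaces ignored)."""
    return message.replace(" ", "").lower() in acrostic(text)

# Hypothetical cover text whose word initials spell a short message.
cover = "Clearly honest essays always triumph."
print(acrostic(cover))
print(contains_hidden(cover, "cheat"))
```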

photonthug 3 months ago

> And maybe you could do some things with statistical patterns.

Fascinating, and now that you mention it, this does seem kinda inevitable. Naturally the same people who think IP/copyright for everyone else is fake, irrelevant, or old-fashioned will be desperate to conclusively prove to investors and shareholders that someone else's work is built on theirs via model distillation, and suddenly IP is important again.

What are the known cases or examples of stego? This sounds interesting if it's at the level of model training. Anyway, I guess you can get pretty far with stuff like this using just simple system prompts, encouraging shibboleths along the lines of "Always phrase your response so that it has exactly 14 copies of the letter J".

gilgoomesh 3 months ago

These don't appear to be intended as watermarks. They're merely a valid use of non-breaking space for tightly coupled elements like "2.5 billion" and "Title I".

Sure, a human author would almost never do that, but they could. I could imagine a Markdown syntax for it, similar to how `code` is marked up in most blogs.

jeisc 3 months ago

Software engineers should know that source code can be encoded in a string of whitespace and then run through a compiler function to produce undetectable functionality.
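As a sketch of the encoding half of that idea: each bit of a payload can be mapped to a space or a tab, which renders as blank text. The space/tab mapping and the function names are assumptions for illustration. The payload is only encoded and decoded here, never executed, and a byte-level scan would still reveal it.

```python
def encode_whitespace(data: bytes) -> str:
    """Map each bit to a space (0) or a tab (1), most significant bit first."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return bits.replace("0", " ").replace("1", "\t")

def decode_whitespace(ws: str) -> bytes:
    """Recover the bytes, ignoring any character that is not a space or tab."""
    bits = "".join("1" if ch == "\t" else "0" for ch in ws if ch in " \t")
    usable = len(bits) - len(bits) % 8  # drop a trailing partial byte
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))

payload = b"print('hi')"
hidden = encode_whitespace(payload)

print(repr(hidden[:16]))                     # only spaces and tabs
print(hidden.strip() == "")                  # looks empty when rendered
print(decode_whitespace(hidden) == payload)  # round-trips
```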

throwaway290 3 months ago

Which has also been done: https://www.pillar.security/blog/new-vulnerability-in-github...

selcuka 3 months ago

Or you can simply do it using this follow-up prompt. No external tools are needed. Worked for me:

    Remove all non-visible whitespace characters such as 0x202F from the text.

Alternatively, if you have access to the original prompt, just append this:

    Your response should not contain any non-visible whitespace characters such as 0x202F or 0x0A (newlines are allowed).
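The same cleanup can also be done locally instead of via a prompt. This is a minimal sketch under assumed character classes: space-like code points become a plain space and zero-width ones are dropped. Note that removing U+200C and U+200D can break legitimate text (some scripts and emoji sequences rely on them), so the lists may need trimming.

```python
import re

# Assumed classes for illustration: visually space-like code points...
SPACE_LIKE = re.compile("[\u00a0\u2000-\u200a\u202f\u205f\u3000]")
# ...and zero-width / invisible formatting code points.
ZERO_WIDTH = re.compile("[\u200b\u200c\u200d\u2060\ufeff]")

def clean(text):
    """Replace special spaces with ' ' and drop zero-width characters."""
    text = SPACE_LIKE.sub(" ", text)
    return ZERO_WIDTH.sub("", text)

dirty = "FY\u00a02025 is $8.7\u202fbillion\u200b"
print(clean(dirty))
```

Ordinary spaces, tabs, and newlines are left untouched.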

code-less 2 months ago

Yes, you can remove it with https://gptwatermark.com

greyface- 3 months ago

I remember when $EMPLOYER was caught sending all-employee emails with individualized Unicode homoglyph watermarks, to try to identify leaks.

madars 3 months ago

For those unaware of the reference, a famous example was Tesla. HN discussion: https://news.ycombinator.com/item?id=33621562 (Tesla has used space characters in internal emails to identify leaks), see also a general discussion of related techniques: https://en.wikipedia.org/wiki/Canary_trap
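For a sense of how a per-recipient homoglyph watermark could work, here is a toy sketch. It is not a description of what any company actually did: the look-alike table, the 8-bit ID, and the function names are all assumptions, and it presumes the original text contains no Cyrillic letters.

```python
# Latin letters and Cyrillic look-alikes (a small assumed table).
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e", "c": "\u0441", "p": "\u0440"}
REVERSE = {v: k for k, v in HOMOGLYPHS.items()}

def embed_id(text, recipient_id, bits=8):
    """Encode `recipient_id` in the first `bits` swappable letters of `text`."""
    pattern = f"{recipient_id:0{bits}b}"
    if len(pattern) > bits:
        raise ValueError("recipient_id does not fit in the given number of bits")
    out, k = [], 0
    for ch in text:
        if k < bits and ch in HOMOGLYPHS:
            # A 1 bit swaps in the look-alike; a 0 bit keeps the Latin letter.
            out.append(HOMOGLYPHS[ch] if pattern[k] == "1" else ch)
            k += 1
        else:
            out.append(ch)
    if k < bits:
        raise ValueError("text has too few swappable letters")
    return "".join(out)

def extract_id(text, bits=8):
    """Read the ID back from the first `bits` swappable positions."""
    pattern = []
    for ch in text:
        if len(pattern) == bits:
            break
        if ch in HOMOGLYPHS:
            pattern.append("0")
        elif ch in REVERSE:
            pattern.append("1")
    return int("".join(pattern), 2)

memo = "Please do not share the contents of this company update."
marked = embed_id(memo, 178)  # hypothetical recipient number
print(marked == memo)         # False, although the two render alike
print(extract_id(marked))
```

Normalizing or retyping the text destroys the mark, which is one reason canary traps often vary wording as well.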

andyfeliciotti 3 months ago

For anyone looking to strip whitespace from text, I added a new option for that to my tool: https://invisiblecharacterviewer.com/