Right, that success story is only because there was "organic" (for lack of a bet...

bigthymer · 2026-03-03T15:04:36 1772550276

> Can the source database for an LLM be one-way, in that it does not contain output from itself, or other LLMs?

I think, for public internet data, we can only be reasonably confident about content published before the big release of ChatGPT.

nsvd2 · 2026-03-03T18:20:48 1772562048

Yes, people have likened pre-LLM Internet content to low-background steel.

If, in some hypothetical future, the continual learning problem gets solved, the AI could just learn from the real world instead of from publications, and retain that data.

nprateem · 2026-03-03T20:54:50 1772571290

That's one reason why Google made that algorithm to watermark AI output.

black_puppydog · 2026-03-03T14:46:17 1772549177

That's exactly why text written before the first LLMs has a premium on it these days. So no, all major models suffer from slop in their training data.