This is like trying to apply the "labor theory of value" to datasets. It doesn't work any better there than it does in economics in general.

It doesn't matter how many human hours went into making a Twitter shitpost. What matters is how much value it adds to a pre-training run, and how easily another data source could replace it.

"Cheap data" has low training value and is easy to replace. Twitter shitposts are worthless except in aggregate. "Expensive data" is what has high training value and is hard to replace. Things like SFT traces, domain expert RLHF guidance, RLVR bits - that's what the "moat" is.