2 Comments
RC:

Makes me wonder if anyone's tried to estimate the proportion of contributions whose withdrawal would make a significant difference to a given model's outputs. I'm pretty sure I'm not the first to think of doing this on a small scale (locally) and extrapolating from the results. Seems like a fun personal project to satisfy my curiosity; dunno if it has any real-world value 😅
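A minimal sketch of the small-scale version of that experiment: withdraw each training contribution in turn, retrain, and count how many withdrawals change the output by more than some threshold. Everything here is an assumption for illustration: the "model" is a toy centroid summarizer rather than a real model, and the data, query point, and 5% significance threshold are made up.

```python
# Toy leave-one-out experiment: what fraction of contributions, if
# withdrawn, noticeably change a model's output? The "model" is a
# deliberately tiny stand-in (a centroid of the contributed points);
# the data, query, and threshold are invented for illustration.
import random

random.seed(0)

def train(points):
    """'Train' by computing the centroid of the contributed points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def predict(model, query):
    """Model output: squared distance from the query to the centroid."""
    return (query[0] - model[0]) ** 2 + (query[1] - model[1]) ** 2

# Synthetic "contributions": mostly near the origin, plus two outliers
# that should have outsized influence on the learned centroid.
data = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(50)]
data += [(10.0, 10.0), (-12.0, 9.0)]

query = (0.0, 0.0)
baseline = predict(train(data), query)

THRESHOLD = 0.05  # "significant" = more than a 5% relative output change

significant = 0
for i in range(len(data)):
    withheld = data[:i] + data[i + 1:]  # withdraw one contribution
    delta = abs(predict(train(withheld), query) - baseline)
    if delta / baseline > THRESHOLD:
        significant += 1

proportion = significant / len(data)
print(f"{significant}/{len(data)} contributions ({proportion:.0%}) are 'significant'")
```

For a real model, the retrain-per-withdrawal loop is the expensive part, which is why this only works at small scale and the large-scale version has to be extrapolated or approximated (e.g. with influence-function-style estimates).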

Nick Vincent:

Absolutely -- I think this kind of experiment is going to be critical to upcoming content deals, and ultimately should inform the new set of rules about content/info/knowledge use in the "post-web scale AI world".

One thing that makes me hopeful here is that these kinds of experiments are also interesting from a safety / explainability / interpretability lens -- so I think a lot of these experiments will happen (and the estimation methods will get better with more technical work). One probably-not-so-great outcome would be if all the major AI operators have really accurate estimates of data impact but none of the creators / public do, but this is where academic research incentives (publishing, etc.) can play a big role. And I do think public bodies have a role to play here in doing some "public valuation" as well (more on this to come...)
