Tipping Points for Content Ecosystems
Our AI design choices in 2024 could preclude "Powerful AI" in 2030.
This post was co-written by Nick Vincent, Jacob Thebault-Spieker, and Johanna Desprez.
This is an early draft — we welcome your feedback and are eager to build more coalitions around these ideas.
The choices and policy paradigms shaping the design of AI products may create "tipping points for content ecosystems" (analogous to tipping points in natural ecosystems) that preclude humankind from ever seeing the full benefits of AI, from near-term gains to the AI abundance promoted by figures in the tech industry.
A photo of thawing permafrost. From Boris Radosavljevic (posted to Flickr, hosted on Wikimedia Commons), licensed under CC BY 2.0.
Recently, Dario Amodei, the CEO of Anthropic, wrote an extensive blog post highlighting ways that AI, in particular what he calls powerful AI and what others might call Artificial General Intelligence (the term Sam Altman of OpenAI uses in his "three observations" blog post, which takes a similar overall perspective), can make the world better. Amodei's post naturally saw a lot of traction within typical AI circles and was also covered in the tech media (see e.g. Robinson's coverage of the fund-raising angle in The Verge and discussion on the podcast Hard Fork). Most of the essay hinges on a key preliminary assumption: powerful AI will come to exist.
Discussions about AI progress often center on "technical" questions about modeling data and constructing AI systems. The key questions in these discussions tend to be:
Are we (humanity) allocating enough people and resources towards the "right" modeling approaches? For instance: should we spend more resources on deep learning or on other approaches; if we use deep learning, which architectures should we use; what other disciplines, such as cognitive science, should we look towards?
What industrial and engineering challenges need to be overcome to enable a given modeling approach? For instance: how should we design data centers; how can we improve chip manufacturing; how can we handle the energy demands of new AI systems?
These discussions, technocratic in nature, involve a relatively small set of actors: the tech industry and those who self-identify as AI researchers. Answering these questions certainly matters for determining whether or not we build powerful AI.
However, there's an entirely separate debate to be had about the production and sustenance of the data flow underlying AI. Data is, in one sense, further upstream on the mountain (see e.g. Delacroix on data rivers) than the technocratic modeling and engineering discussions. After all, if our sensor technologies or record-keeping technologies were to falter, all the modeling in the world would bear little fruit. The most elegant and cathedral-like data center we humans might build would simply act as a vessel for noise.
Here, we want to argue that it is entirely possible that we will fail to build "powerful AI" or "AGI" because we trigger one or more Tipping Points for Content Ecosystems.
The Setup
To introduce this concept, let us consider a hypothetical future in which we keep building new foundation models and, to track their progress, we invent a new index of AI Utility that aggregates existing benchmarks with a new set of more dynamic benchmarks. We'll hand-wave here and assume a number of open challenges with benchmarks are solved; the key idea is that we can cleanly compare how much utility humanity gets out of AI in 2024 versus 2030.
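To make the hand-waving slightly more concrete, one could imagine this index as nothing fancier than a weighted aggregate of normalized benchmark scores. The sketch below is purely illustrative; the benchmark names and weights are invented placeholders, not real benchmarks.

```python
# Purely illustrative "AI Utility" index: a weighted aggregate of normalized
# benchmark scores. Benchmark names and weights are invented placeholders.

STATIC_WEIGHTS = {"code_eval": 0.3, "science_qa": 0.3}
DYNAMIC_WEIGHTS = {"fresh_tasks": 0.4}  # imagined "dynamic" benchmarks

def ai_utility(scores: dict[str, float]) -> float:
    """Scores are assumed to be normalized to [0, 1]; returns a single index."""
    weights = {**STATIC_WEIGHTS, **DYNAMIC_WEIGHTS}
    return sum(weights[name] * scores.get(name, 0.0) for name in weights)

# Example: compare a hypothetical 2024 system and 2030 system on the same index.
print(ai_utility({"code_eval": 0.8, "science_qa": 0.7, "fresh_tasks": 0.5}))
print(ai_utility({"code_eval": 0.6, "science_qa": 0.6, "fresh_tasks": 0.3}))
```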
For simplicity, let’s discuss coding and science-focused AI systems (much of Amodei’s powerful AI blog post centers on AI’s contribution to scientific progress). Imagine a model trained in January 2025 using the full historical records from Stack Exchange, Wikipedia, and arXiv as pre-training data, alongside fine-tuning data from a set of, say, 10,000 contractors paid for “preference data”. Let’s say that this model is able to achieve 4 “units” of intelligence — we might call it a “2024 Level 4 AI”. After an exhaustive set of data valuation experiments and exploration of counterfactual data ablations, we identify the key pieces of content that are most responsible for those units of intelligence. Perhaps we can even train a very good model on a small set of “core” documents — the 100,000 (to pick a clean number again) documents containing the pearls of wisdom in our pile of human-generated token sequences (perhaps leveraging so-called “synthetic data”).
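To make the data-valuation step concrete, here is a minimal sketch of a leave-one-group-out ablation loop. The `train_model` and `measure_utility` callables are stand-ins for an expensive training run and the imagined AI Utility index; they are not real library functions, and in practice one would use cheaper approximations (e.g., influence functions or Shapley-value estimates) rather than retraining per ablation.

```python
# Hypothetical counterfactual-ablation sketch: rank document groups by how much
# "utility" the model loses when each group is removed from the pre-training corpus.
# train_model and measure_utility are stand-ins supplied by the caller.

def ablation_scores(document_groups, train_model, measure_utility):
    """Return {group_name: utility drop when that group is ablated}."""
    full_corpus = [doc for group in document_groups.values() for doc in group]
    baseline = measure_utility(train_model(full_corpus))

    scores = {}
    for name in document_groups:
        # Rebuild the corpus with this group removed and retrain.
        ablated_corpus = [doc for other, group in document_groups.items()
                          if other != name for doc in group]
        scores[name] = baseline - measure_utility(train_model(ablated_corpus))
    return scores

# The "core" documents would be the groups with the largest utility drop, e.g.:
# sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
```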
This story, so far, gets us to "2024 Level 4 AI" (on the back of 100,000 volunteer-created documents and 10,000 contractor-created outputs). But to understand where the upstream data needed to move beyond "2024 Level 4 AI" will come from, we can think through some other clean numbers.
Below, we’re going to do some inline napkin math. The key goal here is to start to outline how sophisticated ecosystems-style modeling could help avoid tipping points in AI. (One planned follow-up for this post is to create an entry in the “Data Napkin Math Project” that makes the below scenario into an interactive explorable).
Let's say we need to get 100 people to contribute something to an imaginary platform like Stack Exchange — let's call it "StackipediaXiv" — to get one especially good document that we can include in our high-quality pre-training set. Furthermore, based on Stack Exchange's real ratio of pageviews to posts (roughly 800M to 8M) and the old "1% rule" of Internet communities, we need to get 100 people to visit StackipediaXiv to get 1 person to make a contribution.
That’s 100 visitors for every “document” (i.e., contribution, post), and 100 contributors for every “especially good” document, which gives us a conversion rate of sorts: to get 1 “especially good” document per month, we need 10,000 monthly visits (we’ll ignore the distinction between individual visits and monthly active users for now and just focus on visits). For reference, the actual activity of Stack Overflow, the programming-specific section of Stack Exchange, in 2022 was 60k posts per week (see work on the topic here), or about 240k posts per month (let’s assume this came from 240k * 100 = 24M visits, although actual unique visitor estimates are higher).
With these clean numbers, over a 10-month period, if our StackipediaXiv had 10M visits a month (so 100M total visits across the whole period and 1M total contributions), we could get a nice 10k-document dataset. If each document is around 600 tokens (using the typical 1 token = 3/4 word rule of thumb, that's roughly 450 words, a bit shorter than the average Wikipedia article of 692 words), this gets us 6M tokens. If we can 1000x our monthly visits (from 10M to 10B per month, or 100B visits over the 10-month period, far higher than Stack Overflow's reported 257M monthly visits), we'd get 6B tokens, nearly the amount used for Microsoft's Phi-1 and Phi-1.5.
(Summarizing the quick math above: we're assuming one "high-quality contribution" requires 10k visits and yields 600 tokens, or 0.06 tokens per visit. To get 6B tokens under these assumptions, we need 6e9 / 0.06 = 1e11, i.e., 100B visits.)
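For readers who want to poke at these assumptions, here is the napkin math as a small Python sketch. Every constant is an illustrative assumption from this section, not a measured value.

```python
# Napkin math from this section; all constants are illustrative assumptions.
VISITS_PER_CONTRIBUTION = 100      # "1% rule", rounded to a clean number
CONTRIBUTIONS_PER_GOOD_DOC = 100   # 1 in 100 posts is "especially good"
TOKENS_PER_GOOD_DOC = 600          # roughly a short article

visits_per_good_doc = VISITS_PER_CONTRIBUTION * CONTRIBUTIONS_PER_GOOD_DOC  # 10,000
tokens_per_visit = TOKENS_PER_GOOD_DOC / visits_per_good_doc                # 0.06

# Scenario: 10M visits/month to "StackipediaXiv" for 10 months.
total_visits = 10_000_000 * 10
good_docs = total_visits / visits_per_good_doc   # 10,000 documents
tokens = good_docs * TOKENS_PER_GOOD_DOC         # 6,000,000 tokens

# How many visits would a Phi-scale 6B-token corpus need under the same assumptions?
target_tokens = 6_000_000_000
visits_needed = target_tokens / tokens_per_visit  # 100,000,000,000 (100B)

print(f"{good_docs:,.0f} docs, {tokens:,.0f} tokens; "
      f"{visits_needed:,.0f} visits needed for 6B tokens")
```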
Ok, but Tipping Points?
The key argument we're working towards is that AI's reliance on contributions to the open web creates the risk of a tipping point. In natural ecosystems, tipping points are thresholds: points that, once crossed, lead to harmful outcomes that are difficult or impossible to come back from. The choices made in deploying and governing this "2024 Level 4 AI" may push us past such a tipping point.
Naively deploying AI may cause reduced contribution on some parts of the open web and more AI-assisted action on other parts. Fewer people may visit sites like Stack Exchange (a trend that’s already seemingly begun — “…after the release of ChatGPT…posting activity decreased sharply, with the weekly average falling from around 60,000 posts to 40,000 within 6 months”), resulting in fewer contributors and fewer helpful posts. People may start submitting more AI-generated content to Wikipedia (this has also begun) and to arXiv (likewise), affecting the underlying training data for future models. In our above example, we needed to achieve extremely high monthly visit counts to our imaginary knowledge creation site to produce even a very “small” dataset. So what are we to do if monthly visits are actually decreasing rather than increasing?
Moreover, even short-term increases in economic inequality (jobs displaced, intellectual property dispossessed, power concentrated in the organizations that operate AI) affect the content ecosystem too. People who once had decent jobs and a bit of time to participate in Stack Exchange or to fix maps on OpenStreetMap may start to feel more economic pressure, causing the amount of "self-directed knowledge contribution" in the world to go down.
Fast Forward to 2030
In 2030, the "Level 4 AI" we deployed in 2024 may have led to (1) reduced human participation in peer production due to direct substitution (people visit Claude or ChatGPT instead of Stack Exchange), (2) reduced human participation in peer production due to increased economic inequality that leaves people less time for self-directed knowledge contributions, and (3) an increase in published AI-generated content that reduces the signal-to-noise ratio of the web.
Now, maybe traffic to online knowledge-production websites is halved and we have half the high-quality tokens we needed to achieve some capabilities jump. Or maybe it now takes 200 visitors to yield one contributor (a halving of the conversion rate). This toy example gives us a first pass at a kind of "ecological" model, against which we could simulate a much more robust set of experiments. In all of these example outcomes, we start to see our "2030 AI" being weaker than the "2024 Level 4" AI of yesteryear.
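As a first pass at that "ecological" model, here is a toy simulation of such a feedback loop. The decay rates are made-up parameters chosen only to show the shape of the dynamics (a shrinking supply of high-quality tokens year over year), not calibrated to any real platform data.

```python
# Toy content-ecosystem feedback loop, 2024 -> 2030. All parameters are
# invented for illustration; the point is the shape of the dynamics.

monthly_visits = 10_000_000    # visits to our imaginary StackipediaXiv
conversion_rate = 1 / 100      # visitors who become contributors
good_doc_rate = 1 / 100        # contributions that are "especially good"
TOKENS_PER_GOOD_DOC = 600

def yearly_tokens(visits_per_month, conversion, good_rate):
    contributions = visits_per_month * 12 * conversion
    return contributions * good_rate * TOKENS_PER_GOOD_DOC

for year in range(2024, 2031):
    tokens = yearly_tokens(monthly_visits, conversion_rate, good_doc_rate)
    print(f"{year}: {tokens / 1e6:.1f}M high-quality tokens")

    # Feedback from deploying AI trained on this ecosystem: substitution pulls
    # visitors away, and a smaller share of visitors convert to contributors.
    monthly_visits *= 0.85     # assumed 15% annual decline in visits
    conversion_rate *= 0.90    # assumed 10% annual decline in conversion
```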
This is the opposite of recursive self-improvement. Instead, deploying "2024 Level 4" AI creates feedback loops driven by AI product and policy choices. Does Stack Exchange continue to be the place people go for coding help, with some subset of them turning into active contributors, as Claude becomes increasingly useful for programmers? Do people turn to Wikipedia as a reliable resource, with some subset of them helping add knowledge, as Deep Research summarizes the web for them? Is arXiv a trusted vehicle for novel research as AI-generated content proliferates on that platform? This is the crux of our argument: there may be no turning back from this tipping point, and the belief that "powerful AI will come to exist" hinges on the product and policy choices of today.
Maybe clever new applications of reinforcement learning can help mitigate some content-ecosystem issues, and maybe companies can get some equivalent data by paying people directly. But we must recognize that it is a serious possibility that we permanently block, or seriously delay, certain capabilities from arising because those capabilities depended on certain knowledge artifacts coming into being.
What do we do? Pie-in-the-sky and Practical Solutions
What would the opposite scenario entail? If, starting tomorrow, we could achieve global coordination on a number of economic-development and policy questions, what could we achieve? Tipping points are stopped or mitigated by finding a balance before we hit the point of no return. There's often a natural ebb and flow before a tipping point is hit (warmer years might melt some glacier ice, but colder years help build it back up). So some reduction in data contribution does not mean we're completely doomed, but it does suggest we should start thinking about the positive vision here.
First, some kind of redistributive program that ensures AI profits are shared with the public at large could increase, on average, participation in self-directed knowledge sharing. Of course, if we wave a magic wand and send everybody a check tomorrow, many people will not suddenly become Wikipedia super-editors or dedicated programming tutors. Many might, however, engage in other forms of knowledge contribution (mapping their local areas in OpenStreetMap, reviewing products, answering questions online). And we do know that at least in some contexts, editing activity is going down now, so this is not entirely hypothetical.
The above magic-wand scenario has some viability issues, as such scenarios often do (where does the money come from? At the moment, actual AI profits wouldn't be enough to sustain something like this). Another avenue might involve using worker power, pulling policy levers, or making explicit technology-design choices that ensure future data-creation jobs lean towards the more self-directed end of the spectrum.
The combination of worker bargaining power, policy, and design could lead to an outcome in which people do day-to-day tasks that look like peer production (they write text, verify claims, organize references, answer questions) but as part of a well-paid, reasonably empowered job (emphatically not a gig work dystopia). Perhaps we can imagine an AI future in which much of the population is effectively employed to maintain knowledge ecosystems.
Beyond generally improving wealth and standards of living to make self-directed knowledge contribution possible, we might make targeted product decisions that make contribution easier and more likely (e.g., echoing the now-"classic" issues around search engines providing attribution to Wikipedia). Just moving toward AI products that promote "prosocial behavior" (acknowledge, and maybe even help clean up, the knowledge commons you draw on; avoid using the products to lower the signal-to-noise ratio of the web; etc.) could be enough to avoid these content tipping points.
Ultimately, while this argument involves some speculation and some of the empirical trends about online activity observed so far could reverse (or become less relevant because of other unexpected developments — new training paradigms emerge, new platforms emerge, etc.), we believe it’s an important perspective to include in AI discussions, even those focused on the immense benefits of powerful AI / AGI.