The WGA Strike is a Canary in the Coal Mine for AI Labor Concerns
Could Upcoming Data Legislation Enable a "Right to Data Strike"?
In a nutshell: Film, television, radio, and media writers are on strike, and their demands include restrictions on AI use. Studios have responded by threatening to use AI systems — trained using the fruits of the labor of the very writers on strike — as AI Strikebreakers. This scenario paints a bleak picture for a wide variety of professions that produce data as a byproduct of a day’s work.
About the Strike
The Writers Guild of America (WGA) is currently on strike. Some of their asks are typical of labor strikes, including better pay and working conditions, but the demands also have a distinctly 2023 flavor: a section titled “Artificial Intelligence”. This section states:
“Regulate use of artificial intelligence on MBA-covered projects: AI can’t write or rewrite literary material; can’t be used as source material; and MBA-covered material can’t be used to train AI.”
WGA members want restrictions on AI use. At the very same time, studios appear to be interested (tweet | article) in deploying AI systems as strikebreakers during the strike.
On one hand, some AI scripts might be truly terrible (a silver lining in the bleakness: we might see a renaissance in “so-bad-it’s-good” content, though it certainly won’t top my favorite). On the other hand, if you throw enough prompts at ChatGPT, you’re likely to get something decent, especially for procedural shows (as Cullins and Kilkenny note in the article linked above). That possibility alone is likely to lower strike leverage.
Why Would ChatGPT Be Able to Write Scripts?
In the cases where AI scripts are good, we can explain why: the past scripts written by writers were also good! There’s little doubt that past labor from WGA members will directly help produce any AI-generated scripts.
In addition to actual script content that may have found its way into large collections of pirated data (alongside books like Lord of the Rings and Dune), tweets, blogs, and Reddit posts from writers are also likely responsible for “teaching” ChatGPT. Technically, the exact training data for models like ChatGPT (the models studios are most likely to employ) remains secret, though we can make some decent guesses.
Just this week, we’ve seen a preprint on arXiv showing strong evidence that GPT-4 has memorized many famous copyrighted books, such as Harry Potter, The Hunger Games, Lord of the Rings, and Dune. Given this new addition to an emerging body of evidence around memorization of copyrighted materials, it’s probably best to start taking a guilty-until-proven-innocent approach: if an LLM has secret training data, it probably leans more towards “gray area” data sources than unambiguously legal ones.
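To make that kind of evidence concrete, here is a minimal sketch of a “name cloze” probe in the spirit of the preprint linked above: mask a proper name in an exact passage from a book and see whether the model recovers it. This is my own illustrative sketch, not the paper’s code; it assumes the 2023-era openai Python package with an API key configured, and the example passage is invented.

```python
# A minimal "name cloze" memorization probe, in the spirit of the preprint
# linked above (an illustrative sketch, not the authors' code).
# Assumes the 2023-era `openai` package with OPENAI_API_KEY set.
import openai

def name_cloze_probe(masked_passage: str, true_name: str, model: str = "gpt-4") -> bool:
    """Ask the model to fill in a single masked proper name.

    If a model reliably recovers rare names from a book's exact text,
    that's evidence the text appeared in its training data.
    """
    prompt = (
        "Fill in the [MASK] in the passage below with the proper name from "
        "the original text. Reply with the name only.\n\n" + masked_passage
    )
    resp = openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # near-deterministic decoding for a cleaner test
    )
    guess = resp["choices"][0]["message"]["content"].strip()
    return true_name.lower() in guess.lower()

# Hypothetical usage (the passage here is invented for illustration):
# name_cloze_probe("'You shall not pass!' shouted [MASK].", "Gandalf")
```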
Machines and Labor
The idea that machines might benefit capitalists at the expense of laborers is not particularly new. However, the machines of “The Second Machine Age” have an especially perverse characteristic: generative AI uses a worker’s past efforts against them, in a way that’s even more extreme than a Jacquard loom (though it’s interesting to note that both the loom and generative AI are really just maps over collective experience that produce streams of tokens with patterns). While the comparison with past machines is subjective, it is simply a descriptive fact that generative AI systems enable a firm to use a worker’s past work against them.
A technology that directly uses the efforts of writers (alongside efforts from you and me) to reduce their labor power, with no form of recourse right now, should concern everyone. It’s simply bad for labor power, which has been instrumental in securing better working conditions across the world. Even in fields in which unions play a smaller role, we can imagine a parallel situation in which a generative AI system lowers your individual labor leverage (the basic logic of using a strike as leverage for higher wages is no different from an individual threatening to leave in order to get a raise).
More generally, the notion that we’re all being co-opted into intervening in labor disputes on the side of The Man is one that I think feels “icky” across the political spectrum. But without a fairer playing field in terms of data transparency (we should know when we contributed to an AI system) and data agency (we should only be contributing to AI systems we want to contribute to, at least going forward), this is the nature of generative AI.
Can We Just Shift Some Jobs Around to Address These Concerns?
There’s a position (see e.g. this op-ed) that downplays economic concerns, typically highlighting that past technological revolutions simply resulted in change (some jobs go away, and new jobs spring up). One version of the argument goes, “Cameras didn’t kill painting”. In the generative AI context, people have argued that workers will just need to learn to use AI as part of their job.
These counterpoints may hold water (though I’m inclined to believe the general effect of AI and data-dependent computing, sans intervention, will be increased power concentration), but they offer very little reassurance in the context of a human striker vs. AI strikebreaker dispute. Even if things level off in the long run, so to speak, in the short run generative AI may seriously disrupt the lives of writers and the broader production of films and TV.
Yes, maybe in five years former writers will have great new jobs creating AI-assisted PowerPoints and spreadsheets (though maybe not!), but even if that comes to pass, in the immediate future we’re going to see serious disruption that’s worth mitigating.
Implications for Other Fields
Writing will not be the last field to see this play out. Already, we’ve seen major battles (legal and otherwise) in the area of visual art. The WGA Strike is a canary in the coal mine for a large number of content-based jobs, which includes a decent set of “email and spreadsheet” jobs.
But more generally, any job in which a worker produces tokens that capture important elements of knowledge, skill, and human decision-making may be at some risk from generative AI. OpenAI themselves have published research examining which jobs have the most “exposure” (estimating that around 80% of the U.S. workforce could have at least 10% of their work tasks affected).
Ultimately, we’re all going to have to ask ourselves: if you were to go on strike (or even just ask your boss for a raise), how much of your work could your boss try, perhaps unsuccessfully, to replace with generative AI outputs?
There’s no doubt in my mind that in the long run, wholesale job replacement will work very poorly. Eventually, models need to be touched up or retrained, and each touch-up requires a new set of data. In the context of creative work in particular, I’m very confident that a model with no new training data will turn “soulless” very quickly, not to mention be completely disconnected from news and trends.
Someone out there in the world needs to write scripts and send emails, or generative AI will stagnate. Different fields probably have creative outputs that “go stale” at different rates, and the faster outputs go stale, the more leverage laborers have (I think a comparison with map-making is useful here).
A Specific Policy Intervention: The Right to Data Strike
In my previous posts and papers, I’ve written at length about a variety of policy and design interventions I’d like to see in the data space. Here, I want to briefly touch on what we might do specifically about AI Strikebreakers.
Workers who produce creative outputs should have the right to withdraw those outputs from training sets. Expanding further, workers who produce any kind of data tokens should have some degree of agency to control where that data goes (though the right to withdraw and delete might be more complicated outside of creative work contexts, when we consider outputs like company emails and spreadsheets).
This would mean a group like the WGA, intending to strike at some point, might refuse to include any writing outputs in training data unless AI operators pay for them (this payment might be quite large, and could go toward strike funds so that the strike can last long enough for AI outputs to “go stale”).
In other words, just as many workers across the world have a right to strike, workers who create data as a byproduct of their job should have a right to data strike. If we cannot implement such a right, we must recognize that this fundamentally erodes labor power (and therefore we might want to take other actions to account for the societal downstream effects of weakened labor). Without proper regulatory responses, the nature of generative AI (giant pretrained models with crystallized knowledge from across domains) is likely to create barriers to data strikes.
In the long run, most training data should be collected under explicit contracts that make it clear that, in the event of a strike, an AI operator might have to delete one of their models. This would directly solve the AI Strikebreaker problem (though perhaps some groups may negotiate contracts that do allow AI use during a strike). In the short term, though, this probably isn’t realistic.
Something that is possible now: we can ensure a right for individuals and groups to control where the data they produce, starting tomorrow, flows.
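As a thought experiment, here is a minimal sketch of what honoring such a right could look like inside a training pipeline: records are checked against a machine-readable withdrawal registry before they ever reach a training run. Everything here (the record fields, the registry format, the function names) is hypothetical; no standard registry like this exists today, and a real implementation would hinge on the transparency infrastructure discussed below.

```python
# A hypothetical sketch: filter training records against a machine-readable
# "data strike" registry before training. All names and formats are invented
# for illustration.
from dataclasses import dataclass

@dataclass
class Record:
    author_id: str  # stable identifier for the data's creator
    text: str       # the creative output itself

def load_withdrawal_registry() -> set[str]:
    # In practice this might be a signed, union-maintained list of author IDs
    # (or content hashes) whose owners have withdrawn consent for training.
    return {"wga-member-0042", "wga-member-1337"}

def filter_training_data(records: list[Record]) -> list[Record]:
    """Drop any record whose author is currently on data strike."""
    withdrawn = load_withdrawal_registry()
    return [r for r in records if r.author_id not in withdrawn]

# Usage sketch:
# corpus = [Record("wga-member-0042", "EXT. DESERT - DAY ..."), ...]
# trainable = filter_training_data(corpus)
```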
This intervention would be easier to implement if built on top of transparency requirements, like those being proposed in the EU. It’s also highly connected to The Right to Be Forgotten and The Right to Delete.
Given this groundwork, the Right to Data Strike may not be so far off. In a best case scenario, this may help not only creative workers, but workers in a wide variety of jobs that all may face some AI exposure and the threat of AI Strikebreakers and AI Competition.
Update, May 13:
See this LA Times op-ed from Acemoglu and Johnson for additional arguments in support of data unions.