In recent years, models like Google's BERT and OpenAI's GPT-3 have achieved impressive performance gains on language tasks through ever larger scale, culminating in models with hundreds of billions of parameters trained on massive text datasets. However, the authors argue that the environmental, financial, and social costs of this approach outweigh the benefits, and that more focus should go toward better understanding these models rather than simply making them bigger.
On the environmental front, training these models requires enormous amounts of computing power, producing a substantial carbon footprint. This compounds inequality: the benefits accrue mainly to wealthy organizations and nations, while the environmental consequences are borne globally. The financial cost of training also concentrates progress in a few well-resourced labs.
The authors also highlight problems with training data. Web-scale datasets amplify dominant viewpoints and encode harmful biases against marginalized groups. Attempting to filter out toxic content is insufficient and risks suppressing minority voices. More investment is needed in thoughtful data curation rather than in simply amassing unfathomably large datasets.
Additionally, while larger models post impressive scores on NLP leaderboards, high benchmark performance does not demonstrate genuine language understanding. Their inner workings remain opaque, and they often succeed by exploiting spurious statistical cues in the data. This risks misdirecting research effort away from real progress on interpretability and accountability in AI.
When deployed, huge models can generate remarkably fluent text that nonetheless carries no underlying meaning or intent. The authors liken them to "stochastic parrots": systems that haphazardly stitch together linguistic patterns observed in their training data, without any grounding in meaning or communicative intent, and that can repeat and amplify the toxic patterns in that data. If people interpret such outputs as credible despite this lack of grounding, the models become vehicles for misinformation and abuse.
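To make the parroting idea concrete, here is a minimal, purely illustrative sketch (not from the paper) of a bigram text generator: it reproduces locally fluent word sequences it has observed while having no representation of meaning or intent. The tiny corpus and the parrot function are invented for this example; real language models are vastly more sophisticated, but the paper's argument is that added scale does not by itself supply grounding.

```python
import random
from collections import defaultdict

# Toy "parrot" (illustration only, not from the paper): learn which word
# follows which in a tiny corpus, then generate text by sampling observed
# continuations. The output can look locally fluent, but nothing in the
# model represents meaning or communicative intent.
corpus = (
    "large language models learn statistical patterns from text . "
    "large text datasets encode the viewpoints of their sources . "
    "models trained on such datasets repeat those patterns ."
).split()

# Record every observed continuation for each word (a bigram table).
continuations = defaultdict(list)
for word, next_word in zip(corpus, corpus[1:]):
    continuations[word].append(next_word)

def parrot(seed: str, max_words: int = 10) -> str:
    """Stitch together a sequence by repeatedly sampling a seen continuation."""
    words = [seed]
    for _ in range(max_words):
        followers = continuations.get(words[-1])
        if not followers:
            break
        words.append(random.choice(followers))
    return " ".join(words)

print(parrot("large"))
# Possible output: "large text datasets encode the viewpoints of their sources ."
```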
Given these downsides, the authors advocate rethinking the pursuit of ever-larger models. They recommend prioritizing energy efficiency, curating training data carefully, engaging stakeholders to shape ethical systems, and exploring research directions that do not depend on ever-growing quantities of data.
While large models can benefit applications such as automatic speech recognition, the authors argue that these gains must be weighed against the risks and paired with harm-mitigation measures, such as watermarking synthetic text so it can be identified. Overall, the paper argues compellingly that continuing to scale up without reflection carries severe risks that demand urgent attention.
The paper became controversial because several of its authors worked at Google Research at the time. Google reportedly asked the Google-affiliated authors to withdraw the paper or remove their names following an internal review, and the ensuing dispute led to the departures of two of them, including well-known AI ethics researcher Timnit Gebru. The incident highlighted the risks of speaking out against dominant research paradigms, especially when a paper critiques an employer's technology direction, and it intensified scrutiny of research freedom and ethics in AI.
Sources:
Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. 2021. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT '21).