In 2015 a cohort of well-known scientists and entrepreneurs including Stephen Hawking, Elon Musk, and Steve Wozniak issued a public letter urging technologists developing artificial intelligence systems to “research how to reap its benefits while avoiding potential pitfalls.” To that end, they wrote, “We recommend expanded research aimed at ensuring that increasingly capable AI systems are robust and beneficial: our AI systems must do what we want them to do.”
More than eight thousand people have now signed that letter. While most are academics, the signers also include researchers at Palantir, the secretive surveillance firm that helps ICE round up undocumented immigrants; the leaders of Vicarious, an industrial robotics company that boasts reductions for its clients of more than 50 percent in labor hours—which is to say, work performed by humans; and the founders of Sentient Technologies, who had previously developed the language-recognition technology used by Siri, Apple’s voice assistant, and whose company has since been folded into Cognizant, a corporation that provided some of the underpaid, overly stressed workforce tasked with “moderating” content on Facebook.
Musk, meanwhile, is pursuing more than just AI-equipped self-driving cars. His brain-chip company, Neuralink, aims to merge the brain with artificial intelligence, not only to develop life-changing medical applications for people with spinal cord injuries and neurological disorders, but, eventually, for everyone, to create a kind of hive mind. The goal, according to Musk, is a future “controlled by the combined will of the people of Earth—[since] that’s obviously gonna be the future that we want.”
It turns out, then, that the most significant takeaway from a letter warning of the potential dangers of artificial intelligence might be its insistence that AI systems “must do what we want them to do.” And what is that? Even now, just six years later, that list is too long to catalog. Most of us have encountered scripted, artificially intelligent customer service bots whose main purpose seems to be forestalling conversations with actual humans. We have relied on AI to tell us what television shows to watch and where to dine. AI has helped people with brain injuries operate robotic arms and decipher verbal thoughts into audible words. AI delivers the results of our Google searches, as well as serving us ads based on those searches. AI is shaping the taste profile of plant-based burgers. AI has been used to monitor farmers’ fields, compute credit scores, kill an Iranian nuclear scientist, grade papers, fill prescriptions, diagnose various kinds of cancers, write newspaper articles, buy and sell stocks, and decide which actors to cast in big-budget films in order to maximize the return on investment. By now, AI is as ambient as the Internet itself. In the words of the computer scientist Andrew Ng, artificial intelligence is “the new electricity.”
In 2017 Ng summarized his vision in a valedictory post on the blogging platform Medium announcing his resignation from the Chinese technology company Baidu. “The industrial revolution freed humanity from much repetitive physical drudgery,” he wrote. “I now want AI to free humanity from repetitive mental drudgery, such as driving in traffic.” If freeing people from that sort of mental drudgery seems trivial in the face of, say, climate change and other current and impending global calamities, its real value will be to stakeholders in a global autonomous car market that is expected to grow to more than $809 billion this year and $1.38 trillion by 2025. Overall, according to a report from PriceWaterhouseCoopers, AI could add up to $15.7 trillion to the global economy by 2030.
Such unbridled growth is not without other, less compensatory consequences. As Kate Crawford’s trenchant Atlas of AI demonstrates again and again, artificial intelligence does not come to us as a deus ex machina but, rather, through a number of dehumanizing extractive practices, of which most of us are unaware. Crawford, a senior researcher at Microsoft and a cofounder of the AI Now Institute at NYU, begins her tour of the AI universe in Silver Peak, Nevada, looking at the “open, iridescent green ponds” of brine pumped out of North America’s largest lithium mine. Lithium—the “li” in “li-ion” batteries—is an essential ingredient in our digital lives. Without it there are no laptop computers, no smart watches, no cell phones.
“The term ‘artificial intelligence’ may invoke ideas of algorithms, data, and cloud architectures,” Crawford writes, “but none of that can function without the minerals and resources that build computing’s core components.” She adds:
Many aspects of modern life have been moved to “the cloud” with little consideration of these material costs. Our work and personal lives, our medical histories, our leisure time, our entertainment, our political interests—all of this takes place in the world of networked computing architectures that we tap into from devices we hold in one hand, with lithium at their core.
Calling those networked computers “the cloud” is a perfect example of what Crawford sees as “the strategic amnesia that accompanies stories of technological progress.” While the metaphor invokes an image of data floating weightlessly in the sky, the reality is that the cloud takes up hundreds of thousands of acres of terrestrial real estate, typically located where electricity is cheap. (The world’s largest data center, as of 2018, in Langfang, China, covers 6.3 million square feet, the equivalent of 110 football fields.) Cheap, of course, is a relative term. A study from researchers at McMaster University found that, if unchecked, the computing industry as a whole could account for 14 percent of all greenhouse emissions by 2040—“about half of the entire transportation sector worldwide.”
Some of this carbon intensity has been driven by the belief that ever-bigger datasets are essential to train machine learning algorithms in order to create workable AI systems. (Machine learning is a kind of artificial intelligence, in which algorithms sort through enormous amounts of data using statistical methods to make classifications and predictions; the assumption is that more data delivers more accurate outcomes.) When researchers from the University of Massachusetts Amherst calculated the carbon emissions required to build and train a single natural language processing system—which teaches computers to interpret and use everyday language—they determined that it was around five times the lifetime emissions of the average American car.
In the early days of what we now think of as digital computing, natural language processing was the holy grail of artificial intelligence. It is central to what has become known as the Turing test, a method of determining if a machine has achieved human-level cognition, derived from the British mathematician Alan Turing’s 1950 paper “Computing Machinery and Intelligence.” In its simplest formulation, the test posits that we will know that machines have achieved real intelligence once people are unable to figure out if they are conversing with a human or a machine.
Leaving aside the inadequacy of the Turing test to actually determine intelligence, as well as its reductive understanding of what intelligence is, it’s indisputable that natural language processing systems have made tremendous strides, especially in the past few years. It’s now possible to have a rudimentary exchange with Amazon’s Alexa device, though there is a good chance that Alexa’s answers will be wildly off the mark or inane. (Also, Alexa has begun to initiate conversations, almost always to promote some aspect of Amazon-related commerce; to wit: “You may be running low on Stash Irish breakfast tea. Would you like to reorder?”) Google Translate can take words and phrases from Hmong, say, and switch them into Serbian—a triumph, but again, one with varying degrees of success.
Recently, the OpenAI research institute released GPT-3, an updated iteration of its natural language processor. The acronym stands for Generative Pre-trained Transformer. It is “pre-trained” because its algorithms have already sorted through something like 570 gigabytes of text, finding the most statistically significant clusters of words. With only a few prompts, GPT-3 is able to write short stories and essays. Not long after it was released, I asked it to compose an essay with the title “The Future of Humanity.” If one did not read the result too closely, it appeared to address the subject with an uncanny degree of sophistication. That’s because it was, essentially, a collection of words and phrases one might expect to see in such an essay. Strung together, though, they were vacuous:
There was a time when the future was certain. That time is now reaching its conclusion. The present, like everything else, will soon come to an end…. We are on the brink of a technological revolution that has the potential to eradicate human suffering while simultaneously bringing an end to our existence as a species.
Natural language processors like GPT are trained on millions of documents and datasets scraped from the Internet, including Wikipedia and the entire cache of emails seized from Enron employees during the bankruptcy proceedings against the company, which were later released online by the Federal Energy Regulatory Commission. Like pretty much everything on the Internet, they became fair game for machine learning research. In addition to raising questions about the privacy implications of sharing personal correspondence without consent, Crawford asks readers to consider other, sometimes subtle, ramifications of training AI systems in this way, since those systems will reflect the linguistic norms of their sources. “Text archives were seen as neutral collections of language, as though there was a general equivalence between the words in a technical manual and how people write to colleagues via email,” she writes.
If a language model is based on the kinds of words that are clustered together, it matters where those words come from. There is no neutral ground for language, and all text collections are also accounts of time, place, culture, and politics.
It’s a crucial point, and one that begins to get at the ways that AI training models can replicate entrenched social and cultural biases.
Bias is a complicated term, and it’s a useful one to keep in mind when trying to understand how AI systems operate. For developers building machine learning systems, “bias” refers to the task they are building the AI to address, such as playing chess or making restaurant reservations. In that situation, it’s neutral. More typically (and colloquially), it not only describes AI systems that perpetuate prejudices and trade on stereotypes but also suggests how they got this way. Machines only know what they know from the data they have been given.
Historical data, for example, has the built-in problem of reflecting and reinforcing historical patterns. A good example of this is a so-called talent management system built a few years ago by developers at Amazon. Their goal was to automate the hiring of potential software engineers with an AI system that could sort through hundreds of résumés and score them the way Amazon shoppers rate products. The AI selected the highest scorers and rejected the rest. But when the developers looked at the results, they found that the system was only recommending men. This was because the AI system had been trained on a dataset of Amazon résumés from employees the company had hired in the past ten years, almost all of whom were men.
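The mechanism can be sketched in a few lines of code. What follows is a toy illustration under stated assumptions, not a description of Amazon's actual system: the function names, the scoring rule, and the five-entry hiring history are all hypothetical. The scorer learns one weight per résumé word, namely the share of past résumés containing that word that led to a hire.

```python
from collections import Counter

def train_keyword_scorer(past_resumes):
    """Learn one weight per word: the fraction of past résumés
    containing that word that ended in a hire."""
    seen, hired = Counter(), Counter()
    for words, was_hired in past_resumes:
        for w in set(words):
            seen[w] += 1
            if was_hired:
                hired[w] += 1
    return {w: hired[w] / seen[w] for w in seen}

def score(weights, words, default=0.5):
    """Average the learned weights; unseen words get a neutral default."""
    vals = [weights.get(w, default) for w in set(words)]
    return sum(vals) / len(vals) if vals else default

# Hypothetical hiring history in which every past hire listed a men's team.
history = [
    (["python", "mens_rugby"], True),
    (["python", "mens_rugby"], True),
    (["java", "mens_rugby"], True),
    (["python", "womens_rugby"], False),
    (["java", "womens_rugby"], False),
]

weights = train_keyword_scorer(history)

# Two new candidates with identical technical skills.
candidate_a = score(weights, ["python", "mens_rugby"])
candidate_b = score(weights, ["python", "womens_rugby"])
```

Nothing in this sketch mentions gender, yet the second candidate scores lower than the first because of a single word that happened to be absent from the résumés of past hires. The scorer has learned the pattern in the history, not the quality of the applicant.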
In his surprisingly lively examination of AI regulation, We, the Robots?, the legal scholar Simon Chesterman points to an audit of another résumé-screening program that found that “the two most important factors indicative of job performance…were being named Jared and having played high school lacrosse.” Bias can be inadvertently introduced into AI systems in other ways, too. A study that looked at the three major facial recognition systems found that they failed to identify gender just 1 percent of the time when the subject was a white male. When the subject was a darker-skinned female, however, the error rate was nearly 35 percent for two of the companies, and 21 percent for the third. This was not a mistake. The algorithm builders trained their algorithms on datasets composed primarily of people who looked like them. In so doing, they introduced bias into the system.
The consequences of these kinds of errors can be profound. They have caused Facebook to label Black men as primates, they could cause autonomous vehicles to fail to recognize a woman with dark skin crossing the street, and they could lead the police to arrest the wrong man. In fact, last year The New York Times reported on the case of Robert Williams, a Black man who got a call from the Detroit police while he was at work, telling him to report to the police station to be arrested. At the station, Williams was taken into an interrogation room, where detectives showed him three grainy photographs. These turned out to be surveillance photos from a store where someone had stolen nearly $4,000 of merchandise. The person in question was a heavyset Black man, like Williams. But that is where the similarities ended. How had the police department’s computer identified Williams? Through a match between the grainy surveillance photos and Williams’s driver’s license photo. In this case, a badly trained facial recognition system was used to arrest an innocent man and toss him in jail, even though there was no physical evidence connecting him to the crime.
Databases used by law enforcement include about 641 million driver’s license and ID photos from twenty-one states. In many states, personal information collected by municipal agencies like the Department of Motor Vehicles is for sale to third parties and can then be incorporated into commercial facial recognition systems. Crawford points out that mug shots, too, have been fair game: “A person standing in front of a camera in an orange jumpsuit, then, is dehumanized as just more data,” she writes.
And like a tightening ratchet, the faces of deceased persons, suspects, and prisoners are harvested to sharpen the police and border surveillance facial recognition systems that are then used to monitor and detain more people.
Artificial intelligence systems are now a staple of the criminal justice system. In some jurisdictions, like Los Angeles, AI helped determine where the police should patrol, a determination that is often made on the basis of where the most crimes are committed. That might sound reasonable, but sending more police to patrol those neighborhoods has resulted in more people being arrested for nonviolent, minor offenses. It becomes a self-reinforcing loop: the more crime, the more police; the more police, the more crime—and on and on. Then, once someone is arrested, a judge may use risk assessment software in deciding if they should go to jail and for how long. If the arrestee lives in a high-crime neighborhood, they are much more likely to get jail time, because the algorithm is not simply assessing their propensity to commit another crime—which, of course, it cannot know—it is looking at the criminal records of an aggregate of people with similar backgrounds and characteristics.*
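The self-reinforcing loop can be made concrete with a small simulation. This is a minimal sketch under simplifying assumptions, not a model of any real police department's software: the function name, the two neighborhoods, and all the numbers are hypothetical, and both neighborhoods are assumed to have exactly the same underlying rate of offenses.

```python
def simulate_patrols(recorded, arrests_per_patrol, total_patrols, rounds):
    """Send patrols in proportion to recorded arrests, then add the
    arrests those patrols produce back into the record."""
    history = [list(recorded)]
    for _ in range(rounds):
        total = sum(recorded)
        # Patrols follow the existing record, not underlying behavior.
        patrols = [total_patrols * r / total for r in recorded]
        # The same arrest rate per patrol applies in every neighborhood.
        new_arrests = [p * arrests_per_patrol for p in patrols]
        recorded = [r + a for r, a in zip(recorded, new_arrests)]
        history.append(list(recorded))
    return history

# Hypothetical starting record: neighborhood A has 60 arrests, B has 40.
history = simulate_patrols([60, 40], arrests_per_patrol=0.5,
                           total_patrols=100, rounds=3)

# Gap in recorded arrests between A and B after each round.
gaps = [a - b for a, b in history]
```

Even though the two neighborhoods behave identically in this sketch, the first never loses its 60 percent share of patrols, and the gap in recorded arrests widens every round. The record the system consults is largely a record of where the police were sent.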
Judges, prosecutors, and parole boards who use these kinds of risk assessment tools often believe that they are fairer than decisions made by humans, failing to see that, in reality, the assessments have been made by the humans who designed these AI systems in the first place. Additionally, as Chesterman notes, a Canadian study
of lawyers and judges…found that many regarded [risk assessment] software…as an improvement over subjective judgment: though risk assessment tools were not deemed especially reliable predictors of future behaviour, they were also favoured because using them minimized the risk that the lawyers and judges themselves would be blamed for the consequences of their decisions.
Other kinds of bias are even more subtle. Many AI systems are proprietary. Shielded behind intellectual property laws, they are often opaque, even to the people employing them. They are similarly inscrutable to the population at large. Consider credit scores: for most of us, this is a number lurking in the background, not just of our financial lives but of what our financial lives lead to, like mortgages and spending limits on credit cards. In the past, a credit score was typically a reflection of how conscientiously one paid bills and settled debts. Now there is talk of enhancing this with “alternative” data, culled from social media and the Internet.
There is no way to know where all the data is coming from, if it’s accurate, how it’s weighted, or whether the algorithmic engine powering the system is relying on data that itself replicates historical prejudices, like where someone lives or where they went to college. Moreover, while it may be illegal in certain circumstances to ask for some personal information, like gender, algorithms can be riddled with assumptions—made by their human authors—such as that an elementary school teacher is female or a commercial pilot is male. They may also use one variable as a proxy for another, such as zip code for wealth or surname for race and ethnicity.
Not long ago, the online insurance company Lemonade posted a series of tweets “explaining” how the company’s algorithms assess claims. As reported by the website Recode, Lemonade maintained that it collected more than 1,600 data points on each user, but
didn’t say what those data points are or how and when they’re collected, simply that they produce “nuanced profiles” and “remarkably predictive insights” which help Lemonade determine, in apparently granular detail, its customers’ level of risk.
Risk, here, actually refers to the company’s level of risk, which it aimed to mitigate by requiring customers making insurance claims to submit videos that its AI “carefully analyzes…for signs of fraud,” including “non-verbal cues.” The Twitter thread concluded that Lemonade’s AI was responsible for the company making more money in premiums than it had to pay out in claims.
Lemonade’s use of video to assess a client’s truthfulness is part of a new trend involving the use of AI to “read” human emotions. Ostensibly, “affective AI” can scan a face and “know” how a person is feeling. One of the leaders in this field, a company called Affectiva, claims it is “humanizing technology.” Pitching its service to companies that hope to gauge consumer interest in their products, Affectiva says that it can measure a person’s moment-by-moment micro-facial expressions as they view an advertisement, using “the world’s largest emotion database,” and correlate these microscopic twitches with human attributes like trustworthiness and attentiveness. Affective AI systems are now used by airport screeners to “identify” terrorists, universities to assess student engagement, and corporations to weed out job candidates.
How does the AI know if someone is bored or grief-stricken or euphoric? Affective AI is rooted, first, in an assumption that there is a shared taxonomy of facial expressions, and second, in the idea that that taxonomy can be translated into a numerical system. Is this specious? At least one study, from the University of Maryland, has shown that Black faces are more likely than white faces to be classified by AI as angry. And, of course, there are the surveillance issues that this raises, and the many ways surveillance leads to self-censorship and the curtailment of self-expression.
This, though, is not usually what people fear about artificial intelligence. More often it’s replacement—that AI will supersede us intellectually, or that it will take our jobs. The concern about employment is not misplaced. According to a team of economists from MIT and Boston University, automation has been subsuming jobs faster than it is creating them. Another study, from the forecasting firm Oxford Economics, predicts the loss of 20 million jobs to automation by 2030. The prospect that we will soon work for our machines, and not the other way around, is already the norm at Amazon warehouses, where humans are, in Crawford’s words, “there to complete the specific, fiddly tasks that robots cannot.” But even before Amazon’s unprecedented dehumanization project, AI developers were reliant on legions of underpaid scut workers to tag audio clips and label images, among other things. “Exploitative forms of work exist at all stages of the AI pipeline,” Crawford writes,
from the mining sector…to the software side, where distributed workforces are paid pennies per microtask…. Workers do the repetitive tasks that backstop claims of AI magic—but they rarely receive credit for making the systems function.
AI is cannibalizing the white-collar sector, too. A study from Wells Fargo estimates that as many as 200,000 finance jobs will disappear in the next decade. AI now reads legal documents with a speed and accuracy unmatched by its human counterparts, generates corporate reports, and is responsible for hiring, assessing, and firing workers. AI is also moving into creative fields like musical composition. Aiva (Artificial Intelligence Virtual Artist) has learned music theory from a database of classical compositions, produces its own sheet music, contributes to movie soundtracks, and is the first AI to be officially designated as a composer, with its own copyright under the France and Luxembourg authors’ rights society.
But the years ahead won’t all be about loss, as Kevin Roose points out in his ultimately genial assessment of the prospect of our coexistence with automated and artificially intelligent machines, Futureproof: 9 Rules for Humans in the Age of Automation. Aside from the potential savings from the added efficiencies and reduced labor costs associated with automation, which may or may not be passed along to consumers but will certainly accrue to corporations and their owners (among them, the richest person in the world, Amazon founder Jeff Bezos), there will be new jobs in fields that don’t yet exist. Cognizant, the company that supplied Facebook with content moderators, imagines some of them to be “personal data brokers,” “augmented reality journey builders,” and “juvenile cybercrime rehabilitation counselors.” And then there’s this: The Wall Street Journal recently reported that Pepper, a humanoid robot created by the SoftBank group in Japan, was so incompetent at the various jobs for which it was tasked, among them Buddhist priest and nursing home attendant, that the company stopped making it, which suggests that some humans may not be obsolete yet.
The other fear—that AI systems will acquire human-level intelligence and eventually outwit us—remains, thus far, the stuff of science fiction. True, AI can perform certain functions more quickly and accurately than people, but that is hardly a measure of intelligence. In the estimation of the computer scientist and AI entrepreneur Erik J. Larson, “As we successfully apply simpler, narrow versions of intelligence that benefit from faster computers and lots of data, we are not making incremental progress, but rather picking low-hanging fruit.” His thoughtful new book, The Myth of Artificial Intelligence: Why Computers Can’t Think the Way We Do, makes a convincing case that artificial general intelligence—machine-based intelligence that matches our own—is beyond the capacity of algorithmic machine learning because there is a mismatch between how humans and machines know what they know. Human knowledge is diverse, as are our capacities. Our intelligence derives from the range of our experiences and thrives, at times, on the irrational, the serendipitous, the spiritual, the whimsical. In the estimation of the French machine learning scientist François Chollet, Larson writes, “Your brain is one piece in a broader system which includes your body, your environment, other humans, and culture as a whole.”
By contrast, even machines that master the tasks they are trained to perform can’t jump domains. Aiva, for example, can’t drive a car even though it can write music (and wouldn’t even be able to do that without Bach and Beethoven). AlphaGo can best the most accomplished Go player in the world, but it can’t play chess, let alone write music or drive a car. Machine learning systems, moreover, are trained on datasets that are, by definition, limited. (If they weren’t, they would not be datasets.) As Larson notes, the real world—the one we inhabit and interact with—generates data all day long: “Common sense goes a long way toward understanding the limitations of machine learning: it tells us life is unpredictable.” AI can’t account for the qualitative, nonmeasurable, idiosyncratic, messy stuff of life. The danger ahead, then, is not that artificially intelligent systems will get smarter than their human creators. It’s that by valorizing these systems without reservation, humans will voluntarily cede the very essence of ourselves—our curiosity, our compassion, our autonomy, our creativity—to a narrow, algorithmically driven vision of what counts.
October 21, 2021
* A 2016 ProPublica study of more than seven thousand arrests in Broward County, Florida, found that the county’s AI system was twice as likely to label Black defendants as future criminals as it was white defendants. One of the questions it used to assess risk was “Was one of your parents ever sent to jail or prison?,” thus perpetuating mass incarceration.