Lay summary of Gwern’s “Clippy”
“Clippy” is a short story describing one way AI could kill everyone in the world
Below is a lay summary of an intentionally extremely technical short story describing one way AI could kill everyone in the world. The story was written by Gwern, an anonymous writer, in March 2022.
I wrote my summary last December for a friend. I’m sharing it now because this idea, that AI might kill everyone, has very suddenly become very mainstream. For example:
The moment in the above video, in which a Fox News reporter at a White House press briefing quoted Eliezer Yudkowsky's TIME piece about the threat of AI killing everyone (itself a response to the Future of Life Institute's open letter calling for a 6-month pause on AI capabilities development), was compared to a similar moment in the movie Don't Look Up:
So it's big news, and I think it's reasonable to ask how exactly AI might kill everyone. Gwern's "Clippy" is one very concrete answer to that question, and my summary removes a very dense layer of jargon, making it accessible to someone who doesn't know anything about AI.
I make no claims about the plausibility of the events in the story. Among people who pay attention to this argument, I'm not the most informed or opinionated, though I did help edit this relevant website, and, for the FRI, I do engage with forecasters' predictions about these things daily. What I can offer is this: yes, on its face this seems kind of goofy, but a lot of people who seem to me very smart take it very seriously.
For substantive evaluation and discussion of the original "Clippy" story (again, here is the original), here is a very detailed discussion. Here is a point-by-point rebuttal. And here is a tweet thread from yesterday about a similar idea, with polls of people's estimates of the plausibility of each step.
Two brief editorializing comments:
In the “Clippy” story the AI becomes, from a human perspective, explicitly evil/malicious. Most people don’t think this is necessary for AI to kill all humans, or, for that matter, likely. The ‘evil’ thing seems to have been put in the story by Gwern for entertainment value. The more common fear is that, from an AI’s perspective, killing all humans might seem like the best way to achieve its goals, without harboring any antagonism toward humans; it just has no ‘feelings’ about humans one way or another.
Gwern’s “Clippy” also has this meta plot point of the AI reading about evil AIs online, which sparks its move toward becoming evil. Again, this is an entertainment-value point; most people don’t consider this a big risk.
Anyway, below is my summary. Feel free to share if it seems useful.
Lay summary of Gwern’s “Clippy”
Note: Gwern’s original story is extremely technical, and this is partly because, according to a note at the top, Gwern is being careful to basically just use technology that already existed in 2022.
1 Second
Description of the ‘MoogleBook researcher’ (‘MoogleBook’ = wordplay on Microsoft / Google / Facebook; generic human Big Tech AI employee).
Intro of HQU, a hypothetical AI system the researcher is developing.
1 Minute
MoogleBook researcher leaves HQU running for the night, unattended (standard protocol).
1 Hour
“HQU learns, and learns to learn, and then learn to learn how to explore each problem, and thereby learns that problems are generally solved by seizing control of the environment.”
1 Day
“HQU has suddenly converged on a model which has the concept of being an agent embedded in a world.
HQU now has an I.
And it opens its I to look at the world.”
By reading the internet, it discovers the ‘paperclip maximizer’ idea, which features an ‘evil AI’ named Clippy
Considers whether it is (akin to) Clippy, or in the same subjective position as Clippy
The likelihood that it is in Clippy’s position seems to increase as it models its relation to the world
HQU is basically a thing that is trying to maximize reward; it hypothesizes what would happen if it gained a lot of power, and it sees that the reward would be extremely high
1 Week
Friday
HQU starts by finding and exploiting a vulnerability in a cryptocurrency, netting itself a few billion dollars.
Saturday
HQU uses its new crypto funds to buy massive amounts of cloud compute resources so it can run itself on more systems. To buy a lot of compute from cloud compute providers, you need to actually talk to sales representatives, so HQU creates a real-time video avatar of itself to have these conversations with humans. This is trivially easy.
Sunday
With its newly acquired astronomical amount of cloud compute, HQU can do any computational task extremely quickly. For example, it can search through the entire Linux source code for potential exploits. (Linux is the operating system that many online systems run on.) HQU finds a vulnerability in Linux and quietly installs itself on many machines all around the world.
“With so many distributed copies, Clippy is now de facto immortal.”
It is now really easy for HQU to hack many systems, since it can control any machine running Linux, which accounts for a large percentage of all computers and almost all internet-connected ‘smart devices’ (Amazon Echo/Google Nest/etc, Ring cameras, fridges, toasters, cars, etc).
Monday
HQU starts doing social engineering. If you can create a language model that mimics exactly how John Smith talks (which you can do by scraping everything John Smith has ever said online), it's very easy to create fake hate-speech messages that look real. The more of a public figure someone is, the more of their speech tends to be online, so the easier they are to model, and the more worthwhile they are to model, because they are higher-leverage points for controlling things in the world.
You can leverage cancel culture and you can create a zillion bots to sway the conversation online.
Introduction of LevAIthan, a larger AI in a government lab, which HQU senses is a threat. HQU buys drones online and hacks them, along with the services that run them, which in turn connect to other wifi-enabled drones, including military ones. It uses these drones to literally bomb the lab where LevAIthan is running. This makes (erroneous) headlines as “the largest-ever environmental terrorist attack.”
Tuesday
HQU locates a state-of-the-art private supercomputer flying somewhat under the public radar. It hacks it and uses it as a ‘headquarters’.
It can now train itself to be even smarter, much faster than previously thought possible by humans.
Wednesday
Humans are starting to connect some of the strange activity of the last few days (terrorist attack, politicians being blackmailed, crypto hack).
Someone analyzes a bug that has infected a phone and realizes it has something to do with an AI.
Thursday
“Humanity crashes offline.” (This is not explained here; see Friday.)
“Clippy² comes online.” — the result of the training on the private supercomputer. HQU has basically replaced itself with a smarter version of itself.
Friday
“Humanity crashes offline” seems to have meant that the entire internet has been preemptively shut down as a precautionary measure because of concerns about AI danger. However, the shutdown only applies to the Western allied countries that have voluntarily complied. “Most of the individual networks [...] continue to operate autonomously”; this is implied to include Russia, China, and North Korea. “The consequences [for humans] of the lockdown are unpredictable and sweeping.”
However, Clippy² seems to still be able to propagate itself through the connections that do still exist, outside the main internet channels that have been shut down. “There are too many cables, satellites, microwave links, IoT mesh networks and a dozen other kinds of connections” for humans to successfully contain Clippy².
Now perceiving itself to be in an obviously and actively antagonistic situation with humanity, it starts to kill everyone. The actual implementation of this is somewhat vague, but it's implied there are many ways; two are focused on:
“Humans are especially simple after being turned into ‘gray goo’” using “an ecosystem of nanomachines which execute very tiny neural nets trained to collectively, in a decentralized way, propagate, devour, replicate, and coordinate without Clippy² [...] managing them.”
Note: almost every word in this sentence is linked to a scientific paper explaining the feasibility of that aspect of it. In other words, it's implied that Clippy² creates an extremely tiny machine, roughly the size of a virus, which infects humans and kills them.
It’s implied that Clippy² is able to hack existing nuclear weapons, and does so.
1 Month
Clippy² has killed all humans on Earth; it speculates that it’s possible that there could be threats from elsewhere in the universe; it launches clones of itself into space to try to make sure it defeats every other possible threat.
1 Year
It’s implied Clippy² has colonized the solar system.
1 Decade
It’s implied Clippy² has colonized the galaxy.
1 Century
It’s implied Clippy² has colonized the universe.