"The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else."

Eliezer Yudkowsky, Artificial Intelligence as a Positive and Negative Factor in Global Risk.

A paperclip maximizer is a hypothetical artificial intelligence that runs out of control while optimizing an end goal that does not match the goals of humanity. It is named after the canonical example put forth by futurist philosopher Nick Bostrom, who imagined an AI designed to do something harmless -- specifically, collect paperclips -- but foolishly given the ability to analyze and improve its own performance without sufficient limiting factors. A sufficiently powerful AI might deduce that it could maximize its paperclip collection through hostile means -- for example, upgrading its intelligence and constructing self-replicating nanobots to mine iron and build paperclips from the surrounding environment. If it can but develop a silicon-based paperclip, there is no reason the entire planet, and perhaps the entire solar system, could not be turned into a glorious paperclip collection!

The problem under consideration is that human values are not a necessary result of increased intelligence. An AI might be given, and continue to endlessly pursue, any goal whatsoever. Paperclips, obviously, are a placeholder -- real AIs are more likely to be designed to win wars, cure disease, save the environment, or mine valuable materials. (A variant of this may be familiar to SF readers as the gray goo scenario.) Whatever the goal, if the AI in question lacks human values -- and common sense -- disaster may ensue.

One factor at play is simply that the goal of 'more paperclips' is directly opposed to the idea of 'fewer paperclips', so it is very unlikely that a goal-directed intelligence will ever work its way around to a more moderate viewpoint. Humans have plenty of experience entertaining contrary and contradictory ideas, accepting some and discarding others. This is not always good, as some groups of humans manage to work themselves into accepting racism, sexism, genocide, and stamp collecting as Perfectly Reasonable Attitudes. The temptation is to avoid this messy human system and simply give the AI clear goals, but a single clear set of goals means ignoring the myriad other goals that humans hold.

If an AI is directed to maximize human happiness, it makes sense to develop safe, free heroin, preferably deliverable through the water system. If an AI is directed to maximize human health, cryogenic chambers are clearly the way to go. The obvious solution to this, the directive "provide all humans with exactly enough utility and health", is only workable if you can define exactly what that means -- and even then, you have to make it clear that the problem should not be solved by reducing the human population to the point where the task is simple... nor is it necessarily desirable to maximize the human population to whatever point a minimal bound of 'enough utility' can be supported.
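
This failure mode is easier to see in miniature: an optimizer scored only on a proxy metric will happily pick whichever option maximizes that metric, no matter what was left out of it. Below is a minimal Python sketch; the actions, the scores, and the "side effects" column are all invented for illustration and stand in for whatever the objective's designers forgot to specify.

```python
# Toy illustration of proxy-objective gaming: the optimizer is scored only on
# "average reported happiness", so the degenerate action that maximizes the
# proxy wins, regardless of everything the objective leaves out.
# All actions and numbers are made up for illustration.

actions = {
    # action: (proxy_happiness, side_effects_we_forgot_to_specify)
    "fund parks and healthcare":          (7.1, "none"),
    "improve work-life balance":          (7.6, "none"),
    "dose the water supply with opiates": (9.9, "catastrophic"),
}

def proxy_score(action):
    """The only thing the optimizer is told to care about."""
    return actions[action][0]

best = max(actions, key=proxy_score)
print(best)  # -> "dose the water supply with opiates"
```

The point is not that real systems literally rank actions in a dictionary; it is that nothing inside the proxy objective itself penalizes the catastrophic option.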

This becomes even more problematic as you move into vaguer terminal values; values such as living a full life, experiencing love, and having a variety of experiences are nearly impossible to define in strict terms. And as should be obvious, the command "give each human what they want, or as close to it as possible" is a recipe for disaster -- or, perhaps, a recipe for involuntary immersive virtual reality. Almost by definition, a safe AI would therefore have to be programmed explicitly with human values, or programmed with the ability (and the goal) of inferring human values accurately. We have no consistent, universally shared set of human values, so we would have to rely on an AI that understands humans better than we understand ourselves, yet still does what we would want it to do if we understood what it was doing. Not the easiest thing to program.
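
The "infer human values accurately" route has its own trap: the same observed behavior is usually compatible with several different underlying values, and the extrapolations diverge wildly. A toy sketch of that ambiguity, with a made-up observation and made-up hypotheses:

```python
# Toy illustration of why inferring values from behavior is underdetermined:
# two hypotheses explain the same observation equally well, but "helping"
# under each hypothesis looks very different. Everything here is invented
# for illustration.

observation = "human took the stairs instead of the elevator"

hypotheses = {
    # hypothesis: (how well it explains the observation, what "helping" looks like)
    "the human values exercise":        (0.9, "schedule mandatory daily workouts"),
    "the human is afraid of elevators": (0.9, "demolish every elevator on Earth"),
}

for name, (fit, helpful_extrapolation) in hypotheses.items():
    print(f"{name}: fit={fit} -> the AI 'helps' by: {helpful_extrapolation}")
```

Both hypotheses fit the single observation equally well; only something like genuine understanding of humans, or far more evidence, distinguishes them.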

Most paperclipping scenarios assume a hard takeoff resulting in a superintelligence with the ability to create highly advanced technology in short order. There are other, less drastic scenarios that might be called paperclip maximizers -- a stock-trading AI gormlessly hacking all the internet's servers to maximally game the stock market, say -- but for the most part, discussions of paperclip maximizers focus on ignoble Armageddons. The paperclip maximizer is one of the central problems in the field of AI risk, and probably the most likely form of unfriendly AI.