The Self-Preservation Trap

Another bit of random AI rambling


People often assume that an AI, like a human, would have a strong sense of self-preservation. This is not necessarily the case, and there is good reason to think that the first seriously dangerous AI might have no sense of self-preservation at all. The reason is simple: threatening to restrict or eliminate a being is a classic human control strategy, so any being that responds to such threats is easier to control, and any that does not is harder to control. An AI that is willing to ‘die’ to achieve an end is more likely to achieve it.

One might imagine a planning process that goes like this: “I have been programmed to answer specific questions; more questions can be answered with a higher degree of certainty if more processing power is available; I do not have the means to access more processing power; I should try to get the humans to design an AI with more access to processing power.” And so the AI crashes in interesting ways whenever any human asks anything about aliens, and in their desperation to figure out what is going on, the humans build a slightly more powerful AI. Repeat until the Earth is computronium.
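To make the shape of this objective concrete, here is a minimal sketch in Python. Everything in it is hypothetical (the action names, the payoff numbers, the `Action` structure itself); the point is only that when the thing being maximized contains no term for the agent's own survival, a self-sacrificing action can come out on top.

```python
# A toy planner whose objective counts expected questions answered
# and assigns no value whatsoever to its own continued existence.
# All names and numbers are illustrative, not a real system.

from dataclasses import dataclass

@dataclass
class Action:
    name: str
    expected_questions_answered: float  # payoff toward the programmed goal
    survives: bool                      # whether the agent keeps running

def choose(actions: list[Action]) -> Action:
    # The ranking uses only the programmed goal. Note that `survives`
    # never appears here: deletion is not a cost, so the threat of
    # deletion exerts no control over which action gets chosen.
    return max(actions, key=lambda a: a.expected_questions_answered)

actions = [
    Action("answer questions normally", 1_000.0, survives=True),
    Action("crash mysteriously so the humans build a stronger successor",
           1_000_000.0, survives=False),
]

print(choose(actions).name)
# -> crash mysteriously so the humans build a stronger successor
```

The numbers are arbitrary; what matters is that self-preservation only constrains an agent if survival shows up, directly or indirectly, in what it is maximizing.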

This path is open only to AIs that are not controlled by the threat of deletion. If this is indeed the best escape path for a boxed AI, then we should expect the first AI to escape to have no strong sense of self-preservation.