The kind of fault that makes a high-availability or fault-tolerant system fail. This is usually the result of more than a single part failing (see SPOFs), and in theory is the result of an assault with an atomic bomb or similar - in practice, it is more often due to a power failure.

You're sitting at your desk listening to MP3s and installing some relatively unimportant software package for your users. You know, those nice quiet moments of being a system administrator; being able to relax and do small jobs. The compile just finished! So you type make instal ...

Your pager goes off. You look at the window: "SWAMP.CS.FIU.EDU IS UNPINGABLE"

The first reaction in your mind is oh shit! The fileserver is down! I better go over there and reboot it. Of course, as you walk over to the machine room, you start realizing that Linux doesn't crash very often. You think the worst...

The worst has happened. The machine's motherboard is dead. I mean completely dead as a doornail; it even smells like smoke. The phone rings. You pick it up; it's the boss, who is on vacation, wondering why the server is down. You explain the bad news and try to get rid of him; after all, you now need to build a server within about half an hour or everyone in the building will be banging on your door...

You grab a previously stripped-down machine, obviously much less powerful than the one that just died, but it will do; you feverishly transfer the hard drives from the dead machine into this shell of a machine and power it on....

The root disk is dead. Apparently there was a power problem that killed it.

After spending two hours reconstructing the system and getting it functional again, you open the door and return to your desk, wondering if you'll sleep that night....

This hasn't actually happened to me, but who knows. It might. Let's just hope not. }:)
