As an experimentalist, I feel bound to let experiment guide me into any train of thought which it may justify; being satisfied that experiment, like analysis, must lead to strict truth if rightly interpreted; and believing also, that it is in its nature far more suggestive of new trains of thought and new conditions of natural power.
The F Man
So. E = mc². Pretty cool huh? Here's how it works.
Einstein's proof is very short, and in the grand scheme of things is rather simple. Let there be two systems of coordinates, K and κ, of values K = (x, y, z, t) and κ = (ξ, η, ζ, τ), such that the x and ξ axes line up, and let them move relative to one another at some speed v, such that a point at rest in κ (which we'll call the 'moving' system) has for its coordinates in K (the 'resting' system) ξ = x − vt.
Get that? It's complicated at first. Basically:
   K          κ
   y          η
   |          |     v -->
   |          |
   |          |
   |__________|__________.___
              x          ξ
You can see (I hope) the κ system is moving with the velocity v in the direction of increase along the X axis (following Einstein, lowercase letters, except for K, will denote the "resting" frame, Greek letters the "moving" frame, and capital X, Y, Z the axial directions). So a point at rest in κ will be moving in K with the velocity v. Because of this, the value of the x coordinate will always be increasing. However, the value of ξ stays the same, since the object is at rest in κ. Thus, the transformation equation from K to κ along the X axis is ξ = x − vt. This is known as a Galilean transformation equation, since it doesn't take Special Relativity into account.
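If it helps to see numbers, here is a minimal sketch of the Galilean transformation just described. The function name, the speed, and the sample point are all made up for illustration; only the formula ξ = x − vt (with τ = t) comes from the text.

```python
def galilean(x, t, v):
    """Map resting-frame coordinates (x, t) to moving-frame (xi, tau).

    Galilean version: positions shift by v*t, time is shared by everyone.
    """
    xi = x - v * t
    tau = t
    return xi, tau


v = 3.0    # hypothetical speed of kappa relative to K (arbitrary units)
xi0 = 5.0  # a point sitting still at xi = 5 in kappa

for t in (0.0, 1.0, 2.0):
    x = xi0 + v * t  # where K sees that point at time t
    # x keeps growing, but the transformed xi stays put at 5.0
    print(t, x, galilean(x, t, v))
```

The x coordinate climbs while ξ stays fixed, which is all the equation is saying.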
This is the general setup of the thought experiment. It turns out that the equation for ξ is really a little different, but with no loss of generality we can assume that a body is at the origin of both systems for an instant, and we can set the clocks of both systems at that instant, so that in that instant t = τ = x = ξ = 0.
Now, in that instant, let that body, which we'll consider at rest in K, emit a sudden burst of energetic light in some direction, and a burst of the same amount of energy (as measured in K) in the opposite direction.
   K, κ
   y, η
   |           /
   |          /   < light emitted
   |         /
___|________./φ________   (φ is pronounced, whatever your math professors say, 'fee')
   |      φ /        x, ξ
   |       /
   |      /
   |     /

Now, in the resting system, the light emitted in each burst will have some energy 0.5L, so the total energy lost by the body, as measured in the resting system K, will be L. What about an observer in κ, does he measure the same drop in energy? No. The reasons for this will consume the body of this writeup, but the observer moving relative to the body will measure, for the burst of light moving in the direction of the body's motion,
E_{f} = 0.5L * (1 − (v/c)cos(φ)) / √(1 − v²/c²),
and, in the other direction,
E_{r} = 0.5L * (1 + (v/c)cos(φ)) / √(1 − v²/c²).
Fortunately, the cosine terms cancel when the two are added, leaving
E_{t} = E_{f} + E_{r} = L / √(1 − v²/c²).
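A quick numerical check of that cancellation, as a sketch. The function name and the sample values (L = 1, v/c = 0.6, a few angles) are my own choices for illustration; the two formulas are the ones given above.

```python
import math


def burst_energies(L, beta, phi):
    """Energies of the two bursts as measured in the moving system.

    L    : total energy emitted, measured in the body's rest system
    beta : v/c
    phi  : emission angle relative to the x axis, in radians
    """
    gamma = 1.0 / math.sqrt(1.0 - beta ** 2)
    e_f = 0.5 * L * (1.0 - beta * math.cos(phi)) * gamma
    e_r = 0.5 * L * (1.0 + beta * math.cos(phi)) * gamma
    return e_f, e_r


L, beta = 1.0, 0.6  # hypothetical values; gamma is 1.25 here
for phi in (0.0, 0.7, math.pi / 2):
    e_f, e_r = burst_energies(L, beta, phi)
    # The individual energies depend on phi, but the sum does not.
    print(round(phi, 3), e_f, e_r, e_f + e_r)
```

Whatever angle you pick, the total comes out as L times the same factor.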
Now this is what I want to explore. Those of you who just want the rest of the proof can skip to the end. For the rest of you, who I hope are the majority of you, I present:
A Natural History of Light
by Kurin
A philosophic inquiry into where the poop did √(1 − v²/c²) come from?
The term √(1 − v²/c²), or its inverse 1/√(1 − v²/c²), or γ as the latter is shortened to, is called the Lorentz factor. It turns out to be the amount by which time dilates, length contracts, and mass increases, as a function of the velocity. Actually, the proper equation for a moving body is not E = mc², but E = γmc², because the relativistic mass of an object increases with the relative velocity.
But where does this come from? Einstein didn't pull this relation out of his butt (though at times I think Lorentz did), so where in nature was it suggested to him? To answer this question, I think, you'll have to go all the way back to
Michael Faraday
Actually, I lied. This section isn't really about the history of light. A lot of people were trying to get into the theory of light, including such notables as Newton (who was convinced light must be made of particles) and Huygens (who thought maybe light was made of waves), and I don't talk about any of them here. Further, Faraday wasn't really investigating light. He was looking at electricity. But I still think it is in his researches that the modern (non-quantum) theory of light is founded.
I'm really not sure how much of a mad scientist Faraday was. These guys who worked with electricity were, to put it bluntly, nuts. They have passages like "and then such a potential was collected in the Leyden jar, that when touched to the tongue or to the hand, a violent spark was produced, and the senses for many tens of minutes were rendered altogether inoperative" and stuff like that. It's insane. Actually most of the old-time "natural philosophers" were like that: drinking acids and shocking assistants. Faraday, I'm sure, must have shared in their spirit.
However, there is also an unimpeachable scrupulousness to Faraday that one wouldn't think compatible with the mad scientist archetype. It is his quote which opens this writeup. He is such an honest scientist. He refuses to speculate more than is necessary, or to bind himself to a theory that hasn't been completely suggested by physical observation. He didn't keep his mind closed; on the contrary he gave more to the subject than any other scientist. But he kept his eyes open, and refused to let his reason run ahead of his senses, so that what his reason did ultimately take up and examine were not half-shadows of speculative theory, but unassailable pieces of the physical world.
Anyway, I like the guy.
So Faraday wrote this three-volume set called Experimental Researches in Electricity, which is a collection of 29 series (think chapters) and some papers (think appendices), each dealing with an aspect of electric phenomena.
Faraday begins the Researches by taking up electromagnetic induction, a phenomenon that follows on the then fairly recent discovery (by Örsted) that an electric current can move a magnetic needle. Induction is where an electric current in one wire seems, under certain conditions, to produce an electric current in another wire.
Faraday tries all sorts of amazing and bizarre experiments to get to the heart of the phenomenon. He has solenoids of many hundreds of feet of copper, and some of the currents he uses are pretty intense. In one example, he has two intertwining helices of wire, and through one he is sustaining a fairly severe current. In the other, he says, "I could obtain no evidence by the tongue, by spark, or by heating the fine wire or charcoal, of the electricity passing through the wire under induction."
All who have studied basic physics in high school or college know the law that Faraday is heading (not stumbling!) for: that it is change in the electric current that causes a current in an adjacent wire. So he gives the law: if a wire pass near a magnet in such a way as to cut the magnetic curves, then the current will flow in a predictable direction.
This talk of 'magnetic curves' warrants closer attention.
When Faraday was writing, Newton was the God of natural philosophy. The Principia had come out about a hundred and fifty years earlier. His method, of centers of force, and his mathematics, the calculus, had proven themselves brilliantly, and shed much light on many other aspects of the world, and their applicability to nearly every worldly problem was well known.
So there were many scientists, such as I think Ampère, looking at magnets, and at electricity, and trying to apply a centers of force model. This seemed, at first, pretty straightforward. It had been shown that the magnetic and electrostatic forces of repulsion and attraction obeyed the inverse square law, just like gravity. In fact, if you assume that each pole of a magnet is a center of force, and that a body is attracted to one pole, and repulsed by the other, in an inverse square relationship, you can exactly explain the phenomena. Then, it looks as if both electrostatic action and magnetic action are of the same kind as Newton's gravity.
So what's a magnetic curve?
The footnote where the term "magnetic curve" is first used reads: "By magnetic curves I mean the lines of magnetic forces, however modified by the juxtaposition of poles, which would be depicted by iron filings; or those to which a very small magnetic needle would form a tangent." He does not mean to say that magnetic curves are real, that they have being. He is just saying that if you traced an imaginary line in space, along which, say, a compass needle would point, then you have an imaginary line which, when cut by a copper wire, will give the law according to which current will arise in the wire.
The point is that it's not real. Yet. But as Faraday kept experimenting, in that gentle juggernaut way of his, he became more and more convinced of the physical being of the magnetic curves. Since he can never find incontrovertible proof that they are real, and are not some phantom shadows of other phenomena, he never says in the Researches that they do in fact exist, but only treats them as tokens of actual magnetic force.
After the Researches, though, he wrote a more speculative paper called "On the Physical Character of the Lines of Magnetic Force." In the opening paragraph, where he explains his treatment of lines of force in the Researches, he says,
The definition then given had no reference to the physical nature of the force at the place of action, and will apply with equal accuracy whatever that may be; and this being very thoroughly understood, I am now about to leave the strict line of reasoning for a time, and enter upon a few speculations respecting the physical character of the lines of force, and the manner in which they may be supposed to be continued through space.
So Faraday believed that lines of magnetic and electric force were real (and was able to give an account of the electric), but left the matter undecided. Lines of force, real or not, were good ways to give an accounting of the investigated phenomena.
Faraday's Experimental Researches are fairly exhaustive regarding phenomena, but they are not at all quantitative. Ever since Newton and Descartes, people had really loved to put the world under the lens of geometry and calculus, and here is this wonderful collection of investigative analysis that doesn't do that at all. Probably some men tried to fit this or that Series into mathematical form, but the real mind that went to work on this matter belonged to
James Clerk Maxwell
What did Maxwell do? Why, he spoke Faraday in the language of mathematics. But how did he do this?
As with the Researches above, I'm going to avoid as much as possible talking about the content of Maxwell's "A Treatise on Electricity and Magnetism" in favor of discussing how I think that content got there in the first place. However, for some reason (probably for a very, very interesting reason) as I go from Faraday through Einstein, it gets more and more difficult for me to separate the content from the method. So I apologize if I get pointlessly technical.
The heart of Maxwell's Treatise is, of course, the mathematics which he brought to the subject. He gives Faraday to us in terms of what today we would call the vector calculus. This calculus fits almost exactly on top of Faraday's lines of force.
When Maxwell defines a line of potential, and shows that the integral along it in such-and-such a manner is equal to such-and-such a quantity, it is plain that he is integrating along a line of force, magnetic or electric. And when he shows that it is possible to move from point A to B without changing potential, we have Maxwell's equipotential surfaces, which play such a prominent role in his theory.
If Faraday saw a world full of lines of magnetic and electric force, Maxwell saw a world full of energy and potentialities. This way of thinking lends itself more or less directly to the language of calculus.
Now, and this is where I think the kicker kicks the kicked, Maxwell took phenomena that were known to Faraday and mathematized them, coming up with equations for the curl of the electric and magnetic fields. Of course, Maxwell was able to give accounts of all of the Maxwell-Hertz equations, but these are the two we're specifically concerned with. Those of you familiar with this story will recognize these as Ampère's Law and Faraday's Law of Induction.
These equations are derived, remember, from the content in Faraday's Researches. In the Treatise, they are derived in and around Articles 531 and 585 (although Maxwell does add a term to complete them).
Maxwell is able to put these two equations together in such a way that out falls the wave equation. This is the equation that says light travels at the speed c, where c is the ratio of the electrostatic units to the electromagnetic units.
What are the electrostatic and electromagnetic units? If you're examining an electric system, such as charged pith balls suspended from wire, and you want to quantify the situation, you will probably apply numbers to the relative charges on the pith balls. Then you can get the forces exerted on the balls, and work from there. These are electrostatic units. If, instead, you are looking at currents in wire, you might want to assign a value to that current, and work from that. These are electromagnetic units.
If you do the former, your fundamental unit is 'charge', and current is 'charges per second.' If you do the latter, your fundamental unit is current and charge is 'current times time.' It turns out that the two quantities of charge are numerically different, and that the units on them are different. The same is true, and in the same way, for the quantities of current and all other effects. The electrostatic unit of charge, say, is called the statcoulomb, while the electromagnetic unit of charge is the abcoulomb. If you divide the abcoulomb by the statcoulomb, you get a constant, which just happens to be a speed, and is equal to about 3.0 × 10^{8} m/s. Physicists usually call this value 'c'.
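To put numbers on that ratio, here is a small sketch. The conversion factors are the standard textbook ones and are not from the text above: one abcoulomb is 10 coulombs, and one coulomb corresponds to 2,997,924,580 statcoulombs. The variable names are my own.

```python
# Standard conversion factors (assumed here, not taken from the writeup).
COULOMBS_PER_ABCOULOMB = 10
STATCOULOMBS_PER_COULOMB = 2_997_924_580

# Speed of light in cm/s, the natural length unit of both old systems.
C_CM_PER_S = 29_979_245_800

# How many statcoulombs fit in one abcoulomb.
statcoulombs_per_abcoulomb = COULOMBS_PER_ABCOULOMB * STATCOULOMBS_PER_COULOMB

print(statcoulombs_per_abcoulomb)        # same digits as c in cm/s
print(statcoulombs_per_abcoulomb / 100)  # the same number expressed in m/s
```

The ratio comes out in centimeters per second because both systems were built on centimeters, grams, and seconds; dividing by 100 gives the familiar figure in meters per second.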
Maxwell has, I swear to God this is so cool, a thought experiment explaining how this ratio is a speed, and what that means. Imagine two plates in space, separated by some distance, and both positively or both negatively charged. Now, two plates of the same charge will repel each other. BUT if the plates are moving, then they can be treated as two currents moving in the same direction. Two currents moving in the same direction attract each other. So at some speed the electromagnetic attractive force will exactly cancel the electrostatic repulsive force. According to Maxwell, the speed at which this happens is 'c'.
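Here is a rough sketch of that balance in modern SI terms. Be warned that this is my assumption about how to model it, not Maxwell's own calculation: I treat the plates as two large parallel sheets of charge σ per unit area, moving side by side at speed v, and use the usual textbook results that the electric repulsion per unit area is σ²/(2ε₀) and the magnetic attraction per unit area is μ₀(σv)²/2. The constants are the usual published values, and the function name is made up.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m
MU0 = 1.25663706212e-6   # vacuum permeability, H/m


def magnetic_to_electric_ratio(v):
    """Ratio of magnetic attraction to electric repulsion for the two sheets.

    The charge density cancels out of the ratio, leaving mu0 * eps0 * v**2.
    """
    return MU0 * EPS0 * v ** 2


# Speed at which the two forces are equal in this model.
balance_speed = 1.0 / math.sqrt(MU0 * EPS0)

for v in (3.0e5, 3.0e7, balance_speed):
    print(v, magnetic_to_electric_ratio(v))
```

At everyday speeds the magnetic pull is a tiny fraction of the electric push, and the two only meet at the speed the unit ratio picked out.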
You may have some objections to this experiment. You might think, "But what if there is an observer on those plates, and that observer considers himself at rest?" It is clear that Maxwell did not believe such an observer could do that. For Maxwell, there was an absolute rest in the ocean of the ether, and experiments such as this one measured speed relative to that.
Another man who believed in absolute rest was
Hendrik Antoon Lorentz
Lorentz came into the party on the heels of the Michelson-Morley experiment, in which it was shown that the ether picture behind Maxwell's equations conflicted with reality. Specifically, the speed at which the ether is predicted to be moving past the earth should be measurable, but it is not. Lorentz postulates that moving bodies may shrink along their direction of motion in the ratio 1:√(1 − v²/c²). This is where the term first pops up. He shows that if his postulate is true, the phenomena are explained. I could say more but this writeup's getting real long now and he's not central to its story.
The next and final character is, of course, the man himself,
Albert Einstein
Now, I'm not going to run wild with mindless praise for the big man. I think that as far as Special Relativity goes, he wasn't the first to see the entire picture. But 1905 was a big year for him, and he did put it all together.
Einstein took Maxwell's equations and said, "They predict light to travel at one speed, and one speed only. That means they can only be valid when light actually travels that speed. But we've done experiments, and it looks like light always travels that speed. What if the Maxwell-Hertz equations are always valid?"
And those are the ONLY two postulates in the Special Theory of Relativity:
 The speed of light is a constant, and
 The laws of physics are covariant in any unaccelerated frame of motion.
What this means is that Maxwell's equations are good for any observer who feels no acceleration.
This requires a change of the familiar Galilean transformation equations. Whereas before we had τ = t, now we have τ = γ(t − vx/c²); where we had ξ = x − vt, now it is ξ = γ(x − vt), where γ = 1/√(1 − v²/c²).
Where do these equations come from? They are not too difficult to derive, but I will still only touch the surface. Given that light goes the same speed all the time, if you are at a point A and want to know what is going on at B (at rest relative to A), at a given time, you can do so easily by observing the light that B gives off, and then calculating what was happening at B at the time in question. That is, if the distance from A to B is r, then if you send a signal from A to B and it bounces back, you can tell how long the trip took with the easy formula,
t = 2 * r/c.
So you know that it took half the time to get there, and half the time to get back. This can be done with any two points in a given system of reference, K. So we say that the clocks in K are synchronized, because if you give me a value for a time at a point in K, it is very easy for me to tell you what the face of any other clock in K reads. The speed of light may limit how soon after the event we can do this, but eventually it is possible. For example, I can't tell you what is going on at the sun right now at t = 0, but in eight minutes at t = 8 I will be able to say what was happening on the sun and right here at t = 0.
What if the two points are not at rest in our system of coordinates, but moving through it with speed v? Well, it turns out that the formula has to change, and
t = r/(c + v) + r/(c − v).
Since c is constant for us, we see the light take more time to go the same distance when the distance itself is moving in the direction of light's propagation, and similarly it takes less time to go the other way. When light goes orthogonal to the direction of motion, the time taken is
t = 2 * r/√(c² − v²),
because the light is moving along a diagonal.
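The three round-trip formulas are easy to compare numerically. This is just a sketch with made-up values (c = 1, v = 0.6, r = 1) and my own function names; the formulas themselves are the ones above.

```python
import math


def round_trip_at_rest(r, c):
    """Out and back over distance r when the points are at rest."""
    return 2 * r / c


def round_trip_along_motion(r, c, v):
    """Out and back when the path lies along the direction of motion."""
    return r / (c + v) + r / (c - v)


def round_trip_across_motion(r, c, v):
    """Out and back when the path is orthogonal to the motion."""
    return 2 * r / math.sqrt(c ** 2 - v ** 2)


c, v, r = 1.0, 0.6, 1.0  # hypothetical values, units where c = 1
shrink = math.sqrt(1 - (v / c) ** 2)  # Lorentz's ratio, 0.8 here

print(round_trip_at_rest(r, c))
print(round_trip_along_motion(r, c, v))
print(round_trip_across_motion(r, c, v))

# If the path along the motion is shortened by Lorentz's ratio,
# the two moving round trips take the same time.
print(round_trip_along_motion(r * shrink, c, v))
```

The along-the-motion trip is longer than the across-the-motion trip by exactly 1/√(1 − v²/c²), which is why shrinking the moving length by Lorentz's ratio makes the two times agree.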
What Einstein does is make τ a function of x, y, z, and t. Having found it, he uses the fact that a ray of light sent along the ξ axis must satisfy ξ = cτ, and similarly for η and ζ. The final transformation equations, as we have seen above, are
τ = γ(t − vx/c²)
ξ = γ(x − vt)
η = y
ζ = z,
where γ = 1/√(1 − v²/c²).
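As a sanity check, here is a minimal sketch of these transformation equations, restricted to x and t. The function name and sample numbers are mine. It checks the one property the whole construction was built around: a light ray with x = ct in K also has ξ = cτ in κ.

```python
import math


def lorentz(x, t, v, c=1.0):
    """Transform an event (x, t) in K to (xi, tau) in kappa."""
    gamma = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    tau = gamma * (t - v * x / c ** 2)
    xi = gamma * (x - v * t)
    return xi, tau


c, v = 1.0, 0.6  # hypothetical values, units where c = 1

# An event on a light ray sent from the origin: x = c * t.
t = 2.0
xi, tau = lorentz(c * t, t, v, c)
print(xi, tau, xi - c * tau)  # the difference should be (nearly) zero

# For any event, x**2 - (c*t)**2 comes out the same in both systems.
x, t = 3.0, 1.0
xi, tau = lorentz(x, t, v, c)
print(x ** 2 - (c * t) ** 2, xi ** 2 - (c * tau) ** 2)
```

Both observers agree that the ray moves at c, even though they disagree about the individual values of position and time.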
Finally, given these he is able to show the equations for E_{f}, E_{r}, and E_{t} from (way, way) above. So now we can complete
The proof
So we have the expression for the energy given off as measured in both systems. (Remember, the body is at rest in K(x, y, z, t), the "resting" system, and κ(ξ, η, ζ, τ) is the "moving" system, whose observer sees the body moving with speed v.) Now what? Well! Let the energy the body started with, as measured in K, be denoted by E_{0} and the energy it ended up with be E_{1}, and similarly for κ, H_{0} and H_{1}. So
E_{0} = E_{1} + L, and H_{0} = H_{1} + γL.
Then,
(H_{0} − E_{0}) − (H_{1} − E_{1}) = L(γ − 1).
But H and E are measures of energy of the same body, and the only difference between them is the relative motion of K and κ. So the only way the quantity (H − E) can differ from the kinetic energy of the body is by some constant C which can depend on spin, for example, or heat, but not on anything that the emission of light would affect. So, since C is constant, and if V is kinetic energy,
H_{0} − E_{0} = V_{0} + C
H_{1} − E_{1} = V_{1} + C
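Subtracting, the constant drops out and V_{0} − V_{1} = L(γ − 1). But how could the kinetic energy of the body change? K and κ have not changed their relative motion, and the body is still at rest in the system it started in. For speeds small compared with c, γ − 1 is very nearly v²/2c², so V_{0} − V_{1} = ½(L/c²)v². Since V = ½mv², and v has not changed, only a change in mass can account for the difference V_{0} − V_{1}, and that change must be L/c². Thus, if a body gives off the energy E, measured in its own system, its mass decreases by E/c². That is, E = mc².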
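The last step leans on γ − 1 being close to v²/2c² at low speeds, which is easy to check numerically. This is only a sketch: the function names and sample speeds are mine, and L is set to 1 joule for convenience.

```python
import math

C = 299_792_458.0  # speed of light, m/s


def kinetic_energy_drop(L, v):
    """V0 - V1 = L * (gamma - 1), the drop in kinetic energy."""
    gamma = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
    return L * (gamma - 1.0)


def implied_mass_loss(L, v):
    """Mass change that would explain the drop if V = (1/2) m v**2."""
    return kinetic_energy_drop(L, v) / (0.5 * v ** 2)


L = 1.0  # hypothetical emitted energy, joules
for beta in (0.1, 0.01, 0.001):
    v = beta * C
    # Compare the implied mass loss with L / c**2; the ratio tends to 1.
    print(beta, implied_mass_loss(L, v) * C ** 2 / L)
```

The slower the relative motion, the closer the implied mass loss gets to L/c², which is the sense in which the result holds.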
So there you go.
Much thanks to Swap for a pre-post reading! He caught some painful typos, and helped me clarify myself. Also wrinkly for post-post error catching. Also! jrn helped find many links, typos and such.