Let M be a manifold in R^n defined by a vector equation f(x)=0 for some differentiable function f:R^n→R^m. (The solution set of a smooth vector equation is generally a manifold.) Let g:M→R be a differentiable function, and let v be a point of M at which g attains a local extremum.
The Lagrange multiplier rule says that at v, the system of m+1 vectors
(∇ f)(v)
(∇ g)(v)
has less than full rank (i.e. is linearly dependent).
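The linear dependence can be checked numerically on a small case. The sketch below is only an illustration under assumed choices that are not part of the text above: the constraint f(x, y) = x^2 + y^2 - 1 (the unit circle), the function g(x, y) = x + y, and the helper name gradient_rank. It stacks the gradients as rows and compares the rank at the maximizer of g with the rank at an ordinary point of the circle.

```python
import numpy as np

# Assumed example (not from the text): M is the unit circle,
# f(x, y) = x^2 + y^2 - 1, and g(x, y) = x + y.
def grad_f(p):
    return 2.0 * p

def grad_g(p):
    return np.array([1.0, 1.0])

def gradient_rank(p):
    """Rank of the system of m+1 = 2 gradient vectors at the point p."""
    system = np.vstack([grad_f(p), grad_g(p)])
    return int(np.linalg.matrix_rank(system, tol=1e-9))

v = np.array([1.0, 1.0]) / np.sqrt(2.0)  # g attains its maximum on M here
w = np.array([1.0, 0.0])                 # a point of M that is not an extremum

rank_at_v = gradient_rank(v)
rank_at_w = gradient_rank(w)
print(rank_at_v, rank_at_w)
```

At v the two gradients are parallel, so the rank drops to 1; at w they are independent and the rank is 2.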
If we write f=(f_1,...,f_m), with each f_i:R^n→R, the system of vectors is the perhaps more familiar
(∇ f_1)(v)
...
(∇ f_m)(v)
(∇ g)(v),
and one possible linear dependence is given by (∇ g)(v) being a linear combination of the (∇ f_i)(v)'s (this is the only possibility when the (∇ f_i)(v)'s are themselves linearly independent); writing
(∇ g)(v) = a_1 (∇ f_1)(v) + ... + a_m (∇ f_m)(v)
in this case shows you why the a_i's are called Lagrange multipliers.
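To see the multipliers come out as actual numbers, here is a minimal sketch with two constraints in R^3. Everything concrete in it is an assumption made for illustration: f_1(x, y, z) = x^2 + y^2 + z^2 - 1, f_2(x, y, z) = z, g(x, y, z) = x + y + z, and the point v, which is where g is largest on the resulting circle. The multipliers are recovered by solving the linear combination above in the least-squares sense.

```python
import numpy as np

# Assumed example (not from the text): M is the circle cut out by
# f_1 = x^2 + y^2 + z^2 - 1 = 0 and f_2 = z = 0, with g = x + y + z.
def grad_f1(p):
    return 2.0 * p

def grad_f2(p):
    return np.array([0.0, 0.0, 1.0])

def grad_g(p):
    return np.array([1.0, 1.0, 1.0])

v = np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)  # maximizer of g on M

# Columns are the constraint gradients; solve A @ a = (grad g)(v).
A = np.column_stack([grad_f1(v), grad_f2(v)])
a, _, _, _ = np.linalg.lstsq(A, grad_g(v), rcond=None)

# A residual of (numerically) zero means (grad g)(v) really is a
# linear combination of the constraint gradients.
residual = float(np.linalg.norm(A @ a - grad_g(v)))
print(a, residual)
```

Here the multipliers work out to a_1 = 1/√2 and a_2 = 1. Running the same solve at a point of M that is not an extremum would leave a nonzero residual.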
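If you think about it, the Lagrange multiplier rule is merely telling you that if you constrain one end of a rubber band to a curved surface and pull the other end in some direction, the constrained end will come to rest where the rubber band is perpendicular to the surface. The pull of the band plays the role of (∇ g)(v), and the directions perpendicular to the surface are exactly the linear combinations of the gradients of the constraint functions.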
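The rubber band picture can be tried out directly. In this rough sketch the surface is assumed to be the unit circle x^2 + y^2 = 1 and the free end of the band is held at a made-up point p = (3, 4); both are illustrative choices, as is the brute-force search. The constrained end is taken to settle where the band is shortest, and the code then checks that the band is parallel to the normal of the circle at that point.

```python
import math

# Assumed setup (not from the text): the surface is the unit circle
# f(x, y) = x^2 + y^2 - 1 = 0, and the free end of the band is held at p.
p = (3.0, 4.0)

def band_length(theta):
    """Length of the band from p to the circle point at angle theta."""
    x, y = math.cos(theta), math.sin(theta)
    return math.hypot(p[0] - x, p[1] - y)

# Crude search: sample the circle densely and keep the shortest band.
steps = 100_000
best_theta = min((2.0 * math.pi * k / steps for k in range(steps)),
                 key=band_length)
rest = (math.cos(best_theta), math.sin(best_theta))

band = (p[0] - rest[0], p[1] - rest[1])
normal = (2.0 * rest[0], 2.0 * rest[1])  # gradient of f at the rest point

# The 2D cross product vanishes exactly when band and normal are parallel.
cross = band[0] * normal[1] - band[1] * normal[0]
print(rest, cross)
```

The rest point comes out close to (0.6, 0.8), the point of the circle on the ray towards p, and the cross product is zero up to the resolution of the search.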