Jacobian Matrix (idea) by xq

Notation: We will choose all vectors to be column vectors, and symbols denoting vectors or matrices will be written as capital letters (F, X, J), with their scalar components in lower case.

The Jacobian is defined for a vector function of multiple variables. Consider a function F(X), where X is an n-element vector and F(X) is an m-element vector. We can write it as

    / / x₁  \ \     / f₁(x₁, ..., x_n)  \
    | | x₂  | |     | f₂(x₁, ..., x_n)  |
   F| |  .  | |  =  |         .         |
    | |  .  | |     |         .         |
    | |  .  | |     |         .         |
    \ \ x_n / /     \ f_m(x₁, ..., x_n) /

Here we have explicitly written the m-dimensional vector function F as a vector of m real-valued functions, one to provide each component of the vector. We have also split the input vector X into its n components.

Formally, the Jacobian J of F is defined as the m by n matrix whose entry in row i and column j is

            ∂f_i
    J_ij =  ----
            ∂x_j
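
To make the definition concrete, here is a rough Python sketch that fills in the m by n matrix one column at a time using central differences. The helper name numerical_jacobian, the step size h, and the example function example_F are all assumptions made up for illustration; the partial derivatives are only approximated numerically, not computed exactly.

```python
def numerical_jacobian(F, x, h=1e-6):
    """Approximate the m-by-n Jacobian of F at x by central differences."""
    n = len(x)
    m = len(F(x))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        # Perturb only x_j, holding every other variable constant.
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus = F(x_plus)
        f_minus = F(x_minus)
        for i in range(m):
            # J[i][j] approximates the partial derivative of f_i by x_j.
            J[i][j] = (f_plus[i] - f_minus[i]) / (2.0 * h)
    return J


def example_F(x):
    # Hypothetical example with n = 2 inputs and m = 3 outputs.
    x1, x2 = x
    return [x1 * x2, x1 + x2 ** 2, 3.0 * x1]


J = numerical_jacobian(example_F, [2.0, 3.0])
for row in J:
    print(row)
```

For this example the exact Jacobian at (2, 3) has rows (3, 2), (1, 6) and (3, 0), so the printed rows should agree with those values up to rounding error.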

The Jacobian can be thought of as a vector and multi-variable extension of the derivative of a real-valued function of a single variable. For example, given an arbitrary real-valued function f(x), we might know the value of the function at one point x = a and want to determine the value of f at other points very close to a (a is constant):

    f(a + Δx) = f(a) + Δf

Single-variable calculus tells us that

   Δf   df
   -- ≈ --
   Δx   dx

for a small change Δx, which means that

           df
   Δf ≈ Δx --
           dx

That means that

 
                         df |
   f(a + Δx) ≈ f(a) + Δx -- |
                         dx | x = a
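
As a quick numerical sanity check of this approximation, here is a small Python sketch. The choice of f(x) = sin(x), the point a = 1 and the step Δx = 0.01 are arbitrary assumptions for illustration only.

```python
import math


def f(x):
    # Hypothetical example function.
    return math.sin(x)


def df_dx(x):
    # Its exact derivative.
    return math.cos(x)


a = 1.0
dx = 0.01

exact = f(a + dx)               # the true value f(a + Δx)
linear = f(a) + dx * df_dx(a)   # the estimate f(a) + Δx df/dx at x = a
error = abs(exact - linear)

print(exact, linear, error)
```

The error shrinks roughly in proportion to Δx squared, which is why the approximation is good only for small changes.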

Now imagine a different function f(x, y); this is a function of two variables. We are interested in the value of f near (x, y) = (a, b). If y is held constant while x varies, then f is really just a function of a single variable x, so that as before

 
                               ∂f |
   f(a + Δx, b) ≈ f(a, b) + Δx -- |
                               ∂x | (x, y) = (a, b)

and

       |              ∂f |
    Δf |         ≈ Δx -- |
       | Δy = 0       ∂x | (x, y) = (a, b)

exactly as before. (I wrote ∂f instead of df to indicate that all derivatives are now partial derivatives, which are what you need when you deal with a function of multiple variables. A partial derivative is obtained by differentiating a function of multiple variables with respect to a single variable, while treating all the other variables as constants.)

Similarly, if x is held constant while y varies then

 
       |              ∂f |
    Δf |         ≈ Δy -- |
       | Δx = 0       ∂y | (x, y) = (a, b)

So if we change x, holding y constant, and then we change y, holding x constant (or if we do it the other way around), then as long as the changes are small enough that the partial derivatives stay roughly constant we can write the total change in f as

         /    ∂f      ∂f  \ |
    Δf ≈ | Δx -- + Δy --  | |
         \    ∂x      ∂y  / | (x, y) = (a, b)
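
Here is a small Python sketch comparing this estimate with the true change. The function f(x, y) = x²y + y³, the point (a, b) = (1, 2) and the step sizes are assumptions picked purely for illustration.

```python
def f(x, y):
    # Hypothetical example function of two variables.
    return x ** 2 * y + y ** 3


def df_dx(x, y):
    # Partial derivative by x, treating y as a constant.
    return 2.0 * x * y


def df_dy(x, y):
    # Partial derivative by y, treating x as a constant.
    return x ** 2 + 3.0 * y ** 2


a, b = 1.0, 2.0
dx, dy = 0.01, -0.02

actual_change = f(a + dx, b + dy) - f(a, b)
estimated_change = dx * df_dx(a, b) + dy * df_dy(a, b)

print(actual_change, estimated_change)
```

The two printed numbers should be close (about -0.218 and -0.22), and they get closer as Δx and Δy shrink.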

This can be extended to a function of arbitrarily many variables. But what does any of this have to do with the Jacobian? Consider our function f. From the definition above, the Jacobian will be a 1x2 matrix, which is almost degenerate but will do for the sake of example. Then

        / ∂f   ∂f \
    J = | --   -- |
        \ ∂x   ∂y /

So that we can use vector notation for everything let

          / Δx \
     ΔX = |    |
          \ Δy /

which is 2x1, and let ΔF = [ Δf ], which is 1x1 (like I said, almost degenerate). Then

           / ∂f   ∂f \ / Δx \       ∂f      ∂f
    J ΔX = | --   -- | |    |  = Δx -- + Δy -- ≈ ΔF
           \ ∂x   ∂y / \ Δy /       ∂x      ∂y

So that means that

    ΔF ≈ J ΔX

which is the equivalent of the single-variable relation

        df
   Δf ≈ -- Δx
        dx

with the Jacobian standing in for the single-variable derivative. The simplest possible multi-variable case was shown here, but this generalizes to a vector of functions f₁, f₂, ..., f_m of a vector of variables x₁, x₂, ..., x_n.
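
To see the general case in action, here is a Python sketch with m = 2 and n = 3 that compares J ΔX with the true change ΔF. The function F, its hand-derived Jacobian, the point X and the step ΔX are all hypothetical choices for illustration.

```python
def F(x):
    # Hypothetical example: n = 3 inputs, m = 2 outputs.
    x1, x2, x3 = x
    return [x1 * x2 * x3, x1 ** 2 + x3]


def jacobian(x):
    # Rows are outputs f_i, columns are inputs x_j (derived by hand for F).
    x1, x2, x3 = x
    return [
        [x2 * x3, x1 * x3, x1 * x2],
        [2.0 * x1, 0.0, 1.0],
    ]


def mat_vec(M, v):
    # Plain matrix-vector product.
    return [sum(M_ij * v_j for M_ij, v_j in zip(row, v)) for row in M]


X = [1.0, 2.0, 3.0]
dX = [0.01, -0.01, 0.02]

X_new = [x_j + d_j for x_j, d_j in zip(X, dX)]
actual_dF = [f_new - f_old for f_new, f_old in zip(F(X_new), F(X))]
estimated_dF = mat_vec(jacobian(X), dX)

print(actual_dF)
print(estimated_dF)
```

Both components of the estimate should agree with the true change to within about 1e-4 for steps of this size.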

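Many single-variable results generalize to the multi-variable case simply by replacing the derivative with the Jacobian. For example, Newton's method for solving a single non-linear equation generalizes to the Newton-Raphson method for systems of non-linear equations, where dividing by the derivative becomes solving a linear system with the Jacobian, and chain rules are pretty much exactly what you'd guess: the Jacobian of a composition is the matrix product of the Jacobians.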
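
As an illustration of the Newton-Raphson generalization, here is a minimal Python sketch for a system of two equations in two unknowns. The example system (x² + y² = 4 and x = y), the starting guess, the tolerance and the use of Cramer's rule for the 2x2 linear solve are all assumptions chosen to keep the sketch short; a real solver would use a general linear-algebra routine and guard against a singular Jacobian.

```python
import math


def F(x, y):
    # Hypothetical system: we want both components to be zero.
    return [x ** 2 + y ** 2 - 4.0, x - y]


def jacobian(x, y):
    # Hand-derived Jacobian of F.
    return [[2.0 * x, 2.0 * y],
            [1.0, -1.0]]


def solve_2x2(M, rhs):
    # Solve M d = rhs by Cramer's rule (fine for a 2x2 sketch).
    (a, b), (c, d) = M
    r, s = rhs
    det = a * d - b * c
    return [(r * d - b * s) / det, (a * s - c * r) / det]


x, y = 1.0, 0.5  # arbitrary starting guess
for _ in range(20):
    f1, f2 = F(x, y)
    if max(abs(f1), abs(f2)) < 1e-12:
        break
    # Single-variable Newton: dx = -f / f'. Here: solve J d = -F.
    step = solve_2x2(jacobian(x, y), [-f1, -f2])
    x += step[0]
    y += step[1]

print(x, y, math.sqrt(2.0))
```

From this starting guess the iteration should settle on x = y = √2, which is the positive solution of the example system.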

None of this is rigorous.
