Notation: We will choose all vectors to be column vectors, and symbols denoting vectors or matrices will be written in bold.

The Jacobian is defined for a vector function of multiple variables. Consider a function F(X), where X is an n-element vector and F(X) is an m-element vector. We can write it as

$$
\mathbf{F}\left(\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}\right) = \begin{pmatrix} f_1(x_1, \ldots, x_n) \\ f_2(x_1, \ldots, x_n) \\ \vdots \\ f_m(x_1, \ldots, x_n) \end{pmatrix}
$$

Here we have explicitly written the m-dimensional vector function F as a vector of m real-valued functions, one to provide each component of the vector. We have also split the input vector X into its n components.
Formally, the Jacobian is the m by n matrix J whose entries are

$$
J_{ij} = \frac{\partial f_i}{\partial x_j}
$$

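The definition can be made concrete numerically: entry (i, j) measures how output fi responds when only input xj is nudged. The sketch below is a minimal illustration, not a production routine. It assumes central differences with a hypothetical step `h = 1e-6`, and the helper `numerical_jacobian` and the example function `F` (which maps 2 inputs to 3 outputs, so J is 3 by 2) are made up for this example.

```python
import math
from typing import Callable, List

Vector = List[float]


def numerical_jacobian(F: Callable[[Vector], Vector], X: Vector, h: float = 1e-6) -> List[List[float]]:
    """Approximate J[i][j] = dF_i/dx_j at X using central differences (hypothetical helper)."""
    m = len(F(X))
    n = len(X)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        # Nudge component j up and down; every other component is held fixed.
        X_plus = list(X)
        X_minus = list(X)
        X_plus[j] += h
        X_minus[j] -= h
        F_plus = F(X_plus)
        F_minus = F(X_minus)
        for i in range(m):
            J[i][j] = (F_plus[i] - F_minus[i]) / (2 * h)
    return J


def F(X: Vector) -> Vector:
    # Example function (an assumption for illustration): R^2 -> R^3.
    x, y = X
    return [x * y, x + y ** 2, math.sin(x)]


J = numerical_jacobian(F, [2.0, 3.0])
# Worked out by hand, the Jacobian at (2, 3) is [[3, 2], [1, 6], [cos(2), 0]].
for row in J:
    print(row)
```

The result has one row per output and one column per input, matching the m by n shape in the definition.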
The Jacobian can be thought of as a vector, multi-variable extension of the derivative of a real-valued function of a single variable.
For example, given an arbitrary real-valued function f(x), we might know the value
of the function at one point x = a and want to determine the value of
f at other points very close to a (a is constant):
f(a + Δx) = f(a) + Δf
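Single-variable calculus tells us that

$$
\frac{\Delta f}{\Delta x} \approx \frac{df}{dx}
$$

for a small change Δx, which means that

$$
\Delta f \approx \Delta x \, \frac{df}{dx}
$$

and therefore

$$
f(a + \Delta x) \approx f(a) + \Delta x \left. \frac{df}{dx} \right|_{x = a}
$$
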
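A quick numerical check of this tangent-line estimate. The function f(x) = x² and the point a = 3 are arbitrary choices for illustration, and the helper names `df_dx` and `linear_estimate` are hypothetical.

```python
def f(x):
    return x ** 2


def df_dx(x):
    # Derivative of f(x) = x**2, worked out by hand.
    return 2 * x


def linear_estimate(a, dx):
    # f(a + dx) ~= f(a) + dx * f'(a): follow the tangent line at x = a.
    return f(a) + dx * df_dx(a)


a = 3.0
for dx in (0.1, 0.01, 0.001):
    exact = f(a + dx)
    approx = linear_estimate(a, dx)
    print(f"dx = {dx}: exact = {exact:.8f}, estimate = {approx:.8f}, error = {exact - approx:.2e}")
```

For this particular f, expanding (a + Δx)² shows the error is exactly Δx², so each tenfold reduction in Δx shrinks the error a hundredfold.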
Now imagine a different function f(x, y); this is a function of two variables. We are interested in the value of f near (x, y) = (a, b). If y is held constant while x varies, then f is really just a function of a single variable x, so that, as before,
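
$$
f(a + \Delta x, b) \approx f(a, b) + \Delta x \left. \frac{\partial f}{\partial x} \right|_{(x, y) = (a, b)}
$$

and

$$
\left. \Delta f \right|_{\Delta y = 0} \approx \Delta x \left. \frac{\partial f}{\partial x} \right|_{(x, y) = (a, b)}
$$
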
exactly as before. (I wrote ∂f instead of df to indicate that all derivatives are now partial derivatives, which are what you need when you deal with a function of multiple variables. A partial derivative is obtained by differentiating a function of multiple variables with respect to a single variable, while treating all the other variables as constants.)
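The idea that "holding y constant turns f into a function of x alone" can be expressed directly with closures: fix one argument, then apply an ordinary single-variable derivative. This is a sketch under assumptions: the example f(x, y) = x²y, the point (2, 3), and the central-difference helper `derivative` with step `h = 1e-6` are all chosen for illustration.

```python
def derivative(g, x, h=1e-6):
    """Central-difference estimate of dg/dx for a single-variable function g (hypothetical helper)."""
    return (g(x + h) - g(x - h)) / (2 * h)


def f(x, y):
    # By hand: df/dx = 2*x*y and df/dy = x**2.
    return x ** 2 * y


a, b = 2.0, 3.0

# Hold y at b: f becomes a function of x only, so an ordinary derivative applies.
df_dx = derivative(lambda x: f(x, b), a)

# Hold x at a: f becomes a function of y only.
df_dy = derivative(lambda y: f(a, y), b)

print(df_dx, df_dy)  # approximately 12 and 4
```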
Similarly, if x is held constant while y varies, then

$$
\left. \Delta f \right|_{\Delta x = 0} \approx \Delta y \left. \frac{\partial f}{\partial y} \right|_{(x, y) = (a, b)}
$$

So if we change x while holding y constant, and then change y while holding x constant (or do it the other way around), then, as long as the changes are small enough that the partial derivatives stay roughly constant, we can write the total change in f as

$$
\Delta f \approx \left. \left( \Delta x \, \frac{\partial f}{\partial x} + \Delta y \, \frac{\partial f}{\partial y} \right) \right|_{(x, y) = (a, b)}
$$

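To see the "small enough changes" condition at work, the sketch below compares the exact change in f with the two-term estimate for shrinking steps. The example f(x, y) = xy, the point (2, 3), and the step sizes are assumptions made for illustration.

```python
def f(x, y):
    return x * y


# Partial derivatives of f(x, y) = x*y, worked out by hand.
def df_dx(x, y):
    return y


def df_dy(x, y):
    return x


a, b = 2.0, 3.0
rows = []
for dx, dy in [(0.02, -0.04), (0.01, -0.02), (0.005, -0.01)]:
    exact = f(a + dx, b + dy) - f(a, b)
    # Total change: change in x times df/dx plus change in y times df/dy, both at (a, b).
    estimate = dx * df_dx(a, b) + dy * df_dy(a, b)
    rows.append((dx, dy, exact, estimate))
    print(f"dx={dx}, dy={dy}: exact={exact:.6f}, estimate={estimate:.6f}, error={exact - estimate:.1e}")
```

For f = xy the leftover error is exactly Δx Δy, so halving both steps cuts the error by a factor of four.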
This can be extended to a function of arbitrarily many variables. But what does any of this have to do with the Jacobian? Consider our function f. It has one output and two inputs (m = 1, n = 2), so by the definition above its Jacobian is a 1×2 matrix, which is almost degenerate but will do for the sake of example. Then

$$
\mathbf{J} = \begin{pmatrix} \dfrac{\partial f}{\partial x} & \dfrac{\partial f}{\partial y} \end{pmatrix}
$$

So that we can use vector notation for everything, let

$$
\Delta\mathbf{X} = \begin{pmatrix} \Delta x \\ \Delta y \end{pmatrix}
$$

which is 2×1, and let ΔF = [ Δf ], which is 1×1 (like I said, almost degenerate). Then
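
$$
\mathbf{J}\,\Delta\mathbf{X} = \begin{pmatrix} \dfrac{\partial f}{\partial x} & \dfrac{\partial f}{\partial y} \end{pmatrix} \begin{pmatrix} \Delta x \\ \Delta y \end{pmatrix} = \Delta x \, \frac{\partial f}{\partial x} + \Delta y \, \frac{\partial f}{\partial y} \approx \Delta\mathbf{F}
$$
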
This means that

$$
\Delta\mathbf{F} \approx \mathbf{J}\,\Delta\mathbf{X}
$$

which is the equivalent of the single-variable relation

$$
\Delta f \approx \frac{df}{dx}\,\Delta x
$$

with the Jacobian standing in for the single-variable derivative. The simplest possible multi-variable case was shown here, but this generalizes to a vector of functions f1, f2, ..., fm of a vector of variables x1, x2, ..., xn.
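Written out component by component, ΔF ≈ J ΔX says $\Delta f_i \approx \sum_{j=1}^{n} J_{ij}\,\Delta x_j$: each output's change is a weighted sum of the input changes, with one row of the Jacobian supplying the weights. The sketch below checks this for a 2-output, 2-input case. The function F(x, y) = (xy, x + y²), its hand-derived Jacobian, the point, the step, and the helper `mat_vec` are all assumptions chosen for illustration.

```python
def F(X):
    # Example vector function (assumed for illustration): R^2 -> R^2.
    x, y = X
    return [x * y, x + y ** 2]


def jacobian_F(X):
    # Hand-derived: row i holds the partials of f_i with respect to x and y.
    x, y = X
    return [[y, x],
            [1.0, 2 * y]]


def mat_vec(A, v):
    # Plain matrix-vector product: entry i is sum over j of A[i][j] * v[j].
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in A]


X = [2.0, 3.0]
dX = [0.01, -0.02]

X_new = [xi + dxi for xi, dxi in zip(X, dX)]
dF_exact = [new - old for new, old in zip(F(X_new), F(X))]
dF_linear = mat_vec(jacobian_F(X), dX)

print("exact change: ", dF_exact)   # close to [-0.0102, -0.1096]
print("J times dX:   ", dF_linear)  # close to [-0.01, -0.11]
```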
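
Many single-variable results generalize easily to the multi-variable case simply by replacing the derivative with the Jacobian. For example, Newton's method for solving a single non-linear equation generalizes to the Newton-Raphson method for systems of non-linear equations, where dividing by the derivative becomes solving a linear system involving the Jacobian. Chain rules are pretty much exactly what you'd guess: the Jacobian of a composition F(G(X)) is the matrix product of the Jacobian of F (evaluated at G(X)) and the Jacobian of G.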
None of this is rigorous.
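As an illustration of the Newton-Raphson generalization, the single-variable update x ← x − f(x)/f′(x) becomes "solve J(X) ΔX = −F(X), then X ← X + ΔX". The sketch below is a minimal version under stated assumptions: the example system (x² + y² = 4 and x = y, whose positive solution is x = y = √2), the starting guess, the tolerance, and the helper names `solve_2x2` and `newton_raphson` are all made up for illustration, and the 2×2 linear solve uses Cramer's rule rather than a general solver.

```python
import math


def F(X):
    # Example system (assumed): x^2 + y^2 - 4 = 0 and x - y = 0.
    x, y = X
    return [x ** 2 + y ** 2 - 4.0, x - y]


def J(X):
    # Hand-derived Jacobian of F.
    x, y = X
    return [[2 * x, 2 * y],
            [1.0, -1.0]]


def solve_2x2(A, b):
    # Cramer's rule for a 2x2 system A v = b.
    (p, q), (r, s) = A
    det = p * s - q * r
    if det == 0:
        raise ZeroDivisionError("Jacobian is singular")
    return [(b[0] * s - q * b[1]) / det,
            (p * b[1] - r * b[0]) / det]


def newton_raphson(F, J, X, tol=1e-12, max_iter=50):
    for _ in range(max_iter):
        # Multi-variable analogue of dx = -f(x) / f'(x): solve J(X) dX = -F(X).
        dX = solve_2x2(J(X), [-v for v in F(X)])
        X = [xi + di for xi, di in zip(X, dX)]
        if max(abs(d) for d in dX) < tol:
            return X
    raise RuntimeError("Newton-Raphson did not converge")


root = newton_raphson(F, J, [1.0, 2.0])
print(root, math.sqrt(2.0))
```

As in one dimension, convergence depends on the starting guess; a poor guess or a singular Jacobian can make the iteration fail.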