inverse function theorem

The inverse function theorem is the foundation stone of calculus on manifolds, that is, of multivariable calculus done properly. It says that if f: Rⁿ → Rⁿ is continuously differentiable, and the derivative Df(x) at a point x is an invertible matrix, then f itself is actually invertible near x, and the inverse is also continuously differentiable. Succinctly put, when a function is smooth enough, infinitesimal invertibility implies local invertibility. The chain rule then forces the derivative of f^-1 to be the right thing, namely D(f^-1)(f(x)) = Df(x)^-1. You may remember from one-variable calculus a rule of the form (f^-1)′(y) = 1 / f′(f^-1(y)). The inverse function theorem is the correct generalization of that rule to several variables.
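
To make the one-variable rule concrete, here is a small numerical sanity check in Python. It is only a sketch: the sample function f(x) = x³ + x, the bisection helper f_inverse, and the step size eps are illustrative assumptions, not anything the theorem prescribes.

```python
def f(x):
    return x**3 + x

def f_prime(x):
    return 3 * x**2 + 1

def f_inverse(y, lo=-10.0, hi=10.0, iters=200):
    """Invert the strictly increasing function f on [lo, hi] by bisection."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

y = 10.0                      # f(2) = 10, so f_inverse(10) should be close to 2
x = f_inverse(y)
eps = 1e-6

# Central difference quotient of the inverse function at y ...
numeric = (f_inverse(y + eps) - f_inverse(y - eps)) / (2 * eps)
# ... compared with the rule (f^-1)'(y) = 1 / f'(f^-1(y)).
predicted = 1.0 / f_prime(x)

print(x, numeric, predicted)
```

The two derivative values should agree to many decimal places; the rule gets the slope of the inverse without ever differentiating the inverse itself.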

Here is a setup for a formal statement and proof of the inverse function theorem. You don't need to understand every word to get the proof, but (except for Banach spaces) the following notions should at least be familiar.

X and Y are Banach spaces. (If you don't know about Banach spaces just read both X and Y as Rⁿ. The inverse function theorem holds for maps of Banach spaces using exactly the same proof as for Rⁿ, so we might as well use that generality.)
U is an open neighborhood of x₀ ∈ X; f: U → Y is a function, and y₀ = f(x₀).
The derivative Df(x₀) of f at x₀, if it exists, is a member of the Banach space L(X, Y) of continuous linear operators from X to Y. If there exists T ∈ L(X, Y) such that
||f(x) - f(x₀) - T(x - x₀)||_Y = o(||x - x₀||_X) --- that is, ||f(x) - f(x₀) - T(x - x₀)||_Y / ||x - x₀||_X → 0 as x → x₀ ---
for x in a neighborhood of x₀, then we say that f is differentiable at x₀ and T is the derivative Df(x₀). (In the case X = Y = Rⁿ, every linear map from X to Y is continuous, and L(X, Y) is just the space of all n-by-n matrices.)
Since the derivative Df takes values in a Banach space, we can ask whether it is continuous. If Df: U → L(X, Y) is continuous, we say that f is continuously differentiable, or C¹ for short.
Of course, Df may also be differentiable, and we may get a continuous D²f: U → L(X, L(X, Y)), in which case f is said to be C². (Actually, D²f always lies in the subspace S of L(X, X; Y) := L(X, L(X, Y)) consisting of bilinear maps which are symmetric in their two arguments; you may know this fact as the equality of mixed partial derivatives.) In general the kth derivative of f at a point is a symmetric k-multilinear map on X with values in Y. We say that f is C^∞ or smooth if it is C^k for every natural number k. (There are a few contexts in which "smooth" means only C¹ rather than C^∞.)

Now we can give the statement:

Inverse function theorem. Suppose f: U → Y is C¹. Say that g: V → X is a local inverse for f at x₀ if

V is an open neighborhood of y₀ = f(x₀), and g is C¹ on V;
there is a smaller neighborhood x₀ ∈ U' ⊂ U so that f(U') ⊂ V and (g o f)|_U' is the identity map 1_U' (g is a left inverse of f near x₀);
there is a smaller neighborhood y₀ ∈ V' ⊂ V so that g(V') ⊂ U and (f o g)|_V' is the identity map 1_V' (g is a right inverse of f near y₀).

Then for such a local inverse g to exist, it is necessary and sufficient that the derivative Df(x₀) ∈ L(X, Y) be bijective (a linear homeomorphism); and in this case g is unique.

A pedant might insist that g is only unique in the "sheaf-theoretic" sense that any two choices g₁ and g₂ coincide when restricted to the intersection of their domains --- since f winds up having a local inverse over any sufficiently small neighborhood of x₀ and pedantically speaking two functions with different domains are unequal. This is strictly true but it's morally not the point. If you don't understand the significance of this remark, ignore it.

In fact the two conditions can be separated: f has a local left inverse at x₀ iff Df(x₀) has a left inverse A ∈ L(Y, X) (that is, A Df(x₀) = 1_X), and f has a local right inverse at x₀ iff Df(x₀) has a right inverse B ∈ L(Y, X) (Df(x₀) B = 1_Y). (In case X is finite dimensional --- and only in this case --- A exists iff Df(x₀) is injective, and B exists iff Df(x₀) is surjective.) However uniqueness no longer holds in the one-sided case.
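
A toy example may help with the failure of uniqueness. Take the curve f: R → R², f(t) = (t, t²); this is a hypothetical example chosen for illustration, not one from the reference. Here Df(0) is injective with left inverse A = (1 0), and the sketch below checks that two genuinely different functions g1 and g2 (names made up for this example) are both left inverses of f.

```python
def f(t):
    """The curve t -> (t, t^2) in the plane."""
    return (t, t * t)

def g1(p):
    """Project onto the first coordinate."""
    u, v = p
    return u

def g2(p):
    """Same as g1 plus a term that vanishes exactly on the image of f."""
    u, v = p
    return u + (v - u * u)

samples = [k / 10 for k in range(-10, 11)]

# Both functions undo f on every sample point ...
left_ok = all(abs(g(f(t)) - t) < 1e-12 for g in (g1, g2) for t in samples)
# ... but they disagree away from the image of f.
differ = g1((0.0, 1.0)) != g2((0.0, 1.0))

print(left_ok, differ)
```

The point is that a left inverse is only pinned down on the image of f, which here is a parabola rather than an open set, so there is room to change it elsewhere.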

Proof of the theorem.

There are an awful lot of words in this proof because I'm trying to explain the motivation for what we do. If you want the concise and elegant version, read the reference I'm expanding on for this writeup, that is Theorem 1.1.7 of The analysis of linear partial differential operators by Lars Hörmander.

1. Necessity is obvious from the chain rule: If g is a local inverse for f at x₀, then the equations

(g o f)|_U' = 1_U' and (f o g)|_V' = 1_V'

imply (taking derivatives) that

Dg(y₀) Df(x₀) = 1_X and Df(x₀) Dg(y₀) = 1_Y

and this says exactly that Df(x₀) is invertible in L(X, Y), with inverse Dg(y₀). --- The other direction is the meat of the theorem:

2. (If you get lost skip to 3 below.) First let's simplify the problem a bit. Notice that if g_L is a local left inverse and g_R is a local right inverse for f at x₀, then for y in the intersection of their domains,

g_L(y) = (g_L o f o g_R)(y) = g_R(y);

hence g_L = g_R on a smaller neighborhood of y₀, and this function is a local two-sided inverse for f. Thus it's enough to prove separately that each local one-sided inverse exists.

Next, observe that if A is a left inverse for Df(x₀) and we set F = A o f, then by the chain rule

DF(x₀) = DA(f(x₀)) Df(x₀) = A Df(x₀) = 1_X

since the derivative of a continuous linear map is itself. Now if F has a local left inverse G near x₀ (F and G are both maps X → X) then G o F = G o A o f = 1 in a neighborhood of x₀; thus defining g = G o A gives a local left inverse for f itself. Similarly, if B is a right inverse for Df(x₀), put

F = f o B; DF(x₀) = Df(x₀) B = 1_Y;

and if G is a local right inverse for F near y₀ (now F and G are maps Y → Y) then F o G = f o B o G = 1 shows that g = B o G is a local right inverse for f. What we have done is reduce the problem of constructing a local left or right inverse for f, to that of constructing a local left or right inverse for a map F whose derivative is known to be the identity (on either X or Y, it works the same).
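
The normalization F = A o f is easy to check numerically in the finite-dimensional case. The following Python sketch assumes a sample map f on R² with x₀ = (0, 0) (both made up for this illustration), takes A to be the inverse matrix of Df(x₀), and estimates DF(x₀) by central differences.

```python
def f(p):
    """Sample map with Df(0, 0) = [[2, 1], [1, 3]] (an assumed example)."""
    x, y = p
    return (2 * x + y + x * x, x + 3 * y + x * y)

# Inverse of [[2, 1], [1, 3]]; the determinant is 5.
A = ((3 / 5, -1 / 5), (-1 / 5, 2 / 5))

def apply(M, v):
    """Multiply the 2x2 matrix M by the vector v."""
    return (M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1])

def F(p):
    """The normalized map F = A o f."""
    return apply(A, f(p))

def jacobian(fn, p, eps=1e-6):
    """Central-difference estimate of the 2x2 Jacobian of fn at p."""
    rows = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(2):
        plus, minus = list(p), list(p)
        plus[j] += eps
        minus[j] -= eps
        fp, fm = fn(tuple(plus)), fn(tuple(minus))
        for i in range(2):
            rows[i][j] = (fp[i] - fm[i]) / (2 * eps)
    return rows

J = jacobian(F, (0.0, 0.0))
print(J)   # should be close to the 2x2 identity matrix
```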

3. Now let's adjust our notation a little bit to the simplified situation: we have a C¹ function F: Z → Z, with F(x₁) = y₁ ∈ Z, and DF(x₁) = 1_Z. Here Z is either X or Y; x₁ is either x₀ or a point of Y with Bx₁ = x₀ (translate so that x₀ = 0 and take x₁ = 0, if you like); and y₁ is either Ay₀ or y₀, according as we chose F = A o f to get a local left inverse for f, or F = f o B to get a local right inverse for f. By the last paragraph, we are reduced to proving that in this case F has a local two-sided inverse at x₁. Any norm || || without a subscript is the norm on Z, || ||_Z.

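To get a local inverse we first need F to be locally injective near x₁. (To keep the formulas light I'll write f for F, and g for its local inverse, from here to the end of the proof.) Because Df(x₁) = 1_Z, and Df is continuous (f is C¹), there must be a small neighborhood of x₁ where Df is almost 1_Z: choose δ > 0 such that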

||Df(x) - 1_Z||_{L(Z; Z)} < 1/2 when ||x - x₁|| ≤ δ.

Suppose x and y are two points in this ball B(x₁; δ). Then applying the mean value theorem (in its inequality form) to the function φ(x) = f(x) - x gives

||f(y) - f(x) - (y - x)|| ≤ ||y - x|| sup_{0<t<1} ||Dφ(x + t(y - x))||_{L(Z; Z)}.

Since Dφ(x) = Df(x) - 1_Z, and we just said that ||Df(x) - 1_Z||_{L(Z; Z)} < 1/2 for every point in B(x₁; δ), what this says is that

||f(y) - f(x) - (y - x)|| ≤ ||y - x|| / 2, hence (by the triangle inequality) ||f(y) - f(x)|| ≥ ||y - x|| / 2.

In particular, for x, y ∈ B(x₁; δ), if x ≠ y then f(x) ≠ f(y). That is, f is locally injective near x₁. This pattern of argument may seem complicated but is quite fundamental.

Now we can attempt to solve the equation f(x) = y for x, given y near y₁; the local injectivity of f tells us that if we find one solution for x it's the only solution. We do this by iterative approximation. Fix y ∈ B(y₁; δ/2), and define x₂, x₃, ... ∈ B(x₁; δ) by

x_{k+1} = x_k + y - f(x_k).

We show by induction that ||x_{k+1} - x_k|| < 2^-k δ, and consequently (by the triangle inequality) x_{k+1} ∈ B(x₁; δ) for each k. First of all

||x₂ - x₁|| = ||y - y₁|| < 2^-1 δ,

and then by the mean value theorem inequality above,

||x_{k+1} - x_k|| = ||x_k - f(x_k) - (x_{k-1} - f(x_{k-1}))|| ≤ ||x_k - x_{k-1}|| / 2 < 2^-k δ.

But this tells us that {x_k} is a Cauchy sequence, and since Z is complete there is a limit x_∞ ∈ B(x₁; δ). By continuity of f, x_∞ is a fixed point of our iteration:

x_∞ = x_∞ + y - f(x_∞), i.e., f(x_∞) = y.

So we have constructed a function g(y) = x_∞, defined for ||y - y₁|| < δ/2, with f(g(y)) = y. Combined with the local injectivity of f, this makes g a local inverse for f.

If you have recently studied metric spaces you may recognize that I have essentially repeated the proof of the contraction mapping theorem. The construction, not the theorem per se, is what's important.
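
The iteration of step 3 is short enough to run. Here is a Python sketch under stated assumptions: the sample map F on R², the radius delta = 0.9, the target y, and the helper names norm and solve are all made up for this illustration. F is chosen so that F(0, 0) = (0, 0) and DF(0, 0) is the identity, as in the simplified situation.

```python
import math

def F(p):
    """Sample map with F(0, 0) = (0, 0) and DF(0, 0) = identity (an assumed example)."""
    x, y = p
    return (x + 0.25 * y * y, y + 0.25 * x * x)

def norm(v):
    return math.hypot(v[0], v[1])

def solve(F, y, x1, steps=30):
    """Run x_{k+1} = x_k + y - F(x_k) from x1; return the last iterate and the step sizes."""
    x = x1
    step_sizes = []
    for _ in range(steps):
        Fx = F(x)
        x_next = (x[0] + y[0] - Fx[0], x[1] + y[1] - Fx[1])
        step_sizes.append(norm((x_next[0] - x[0], x_next[1] - x[1])))
        x = x_next
    return x, step_sizes

# DF(x, y) - 1 = [[0, y/2], [x/2, 0]] has operator norm max(|x|, |y|)/2,
# which stays below 1/2 on the ball of radius delta = 0.9 around x1 = (0, 0).
delta = 0.9
x1 = (0.0, 0.0)
y = (0.3, -0.2)               # ||y - y1|| is about 0.36, less than delta/2

x_inf, step_sizes = solve(F, y, x1)
residual = norm((F(x_inf)[0] - y[0], F(x_inf)[1] - y[1]))
print(x_inf, residual)
```

The list step_sizes holds ||x_{k+1} - x_k|| for k = 1, 2, ..., so it can be compared directly against the bound 2^-k δ from the induction. For this particular F the steps shrink much faster than the bound requires.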

4. It remains to prove that g is actually C¹ near y = y₁ (it would be no good if our smooth function had a rough inverse). Choose two points y, y + k ∈ B(y₁; δ/2), and write g(y) = x, g(y + k) = x + h. We know that f is differentiable at x, so that

k = f(x + h) - f(x) = Df(x)h + o(||h||).

What we really want is the reverse, where Dg(y) ought to be Df(x)^-1:

h = g(y + k) - g(y) = Df(x)^-1 k + o(||k||).

The first equation is equivalent to

h = Df(x)^-1 k - o(||Df(x)^-1||_{L(Z; Z)} ||h||);

since we know ||Df(x)^-1||_{L(Z; Z)} < 2 for every x ∈ B(x₁; δ) (from ||Df(x) - 1_Z||_{L(Z; Z)} < 1/2), it suffices to prove that a function which is o(||h||) is also o(||k||). But again our mean value theorem relation gives

||k - h|| ≤ ||h||/2, hence ||h||/2 ≤ ||k|| ≤ 2||h||

which shows that h and k have the same asymptotic order near zero (in particular h → 0 as k → 0, so g is continuous), thus that

g(y + k) - g(y) = Df(x)^-1 k + o(||k||), i.e., Dg(y) = Df(g(y))^-1.

Since Df(g(y))^-1 is continuous in y (f is C¹, g is continuous, and the inversion map is smooth), this shows that g is C¹ near y₁, which completes the proof. ///
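
The formula Dg(y) = Df(g(y))^-1 can be tested on a map whose local inverse is known in closed form. The Python sketch below uses f(x, y) = (e^x cos y, e^x sin y), the complex exponential written in real coordinates, with local inverse g(u, v) = (log of the modulus, argument). The choice of example, the base point (1, 1), and the helper names are assumptions made for illustration only.

```python
import math

def f(p):
    x, y = p
    return (math.exp(x) * math.cos(y), math.exp(x) * math.sin(y))

def g(q):
    """Local inverse of f near points with u > 0: (log of modulus, argument)."""
    u, v = q
    return (0.5 * math.log(u * u + v * v), math.atan2(v, u))

def Df(p):
    """Exact Jacobian of f at p."""
    x, y = p
    c, s = math.exp(x) * math.cos(y), math.exp(x) * math.sin(y)
    return ((c, -s), (s, c))

def inverse_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))

def jacobian(fn, q, eps=1e-6):
    """Central-difference estimate of the 2x2 Jacobian of fn at q."""
    rows = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(2):
        plus, minus = list(q), list(q)
        plus[j] += eps
        minus[j] -= eps
        fp, fm = fn(tuple(plus)), fn(tuple(minus))
        for i in range(2):
            rows[i][j] = (fp[i] - fm[i]) / (2 * eps)
    return rows

y = (1.0, 1.0)
numeric = jacobian(g, y)               # Dg(y), estimated numerically
predicted = inverse_2x2(Df(g(y)))      # Df(g(y))^-1, from the theorem
max_error = max(abs(numeric[i][j] - predicted[i][j])
                for i in range(2) for j in range(2))
print(max_error)
```

This example is also a reminder that the theorem is local: Df is invertible everywhere, yet f is not globally injective, since shifting y by 2π gives the same value.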

Reference: Lars Hörmander, The analysis of linear partial differential operators, volume 1, theorem 1.1.7. Springer-Verlag 1983, 1990.
