The goal of this writeup will be to take the quantitative concept of curvature, concretely defined in three-dimensional space, and abstract the quantity further and further, retaining as much of the meaning as possible, but allowing us to eventually generalize it to describe incredibly abstract spaces and physical phenomena. As the concrete visual picture slowly becomes an abstract mathematical one, the terminology will eventually become much more technical and specialized. I apologize for this, but hope that you realize that it's inescapable; abstract formalism is necessary to describe abstract concepts, and I can't (and shouldn't try to) spend this entire node explaining my notation. In any case, I've tried to use enough hardlinks to help guide the curious mind.
Radius of Curvature
We start in (what should be) familiar waters. The object in question is a curve. At each point on the curve, the curve is generally "curving", as curves are wont to do. The radius of curvature at a given point is just the radius of the circle that's the closest approximation to the curve at that point. If the curve is a straight line (or at least has constant velocity at that point), the radius of curvature becomes infinite.
How would we calculate this quantity? It should come as little surprise that curvature is directly associated with acceleration. A straight line has zero acceleration; that is, its velocity vector is constant. A circle has acceleration of constant magnitude, always directed toward its center. This can be confirmed directly. We parameterize a circle as follows:
β(t) = (Rcos(t), Rsin(t), 0)
(We assume we are working in three-dimensional space, though we won't yet use the third coordinate)
Since we can reparameterize this curve any way we like by t' = f(t), our notions of "velocity" and "acceleration" will be ambiguous. We can resolve this ambiguity by always choosing a parameterization so that the curve is parameterized by arclength.
dβ/dt = (-Rsin(t), Rcos(t), 0)
In our case, our arclength parameter will be s = Rt.
β(s) = ( Rcos(s/R), Rsin(s/R), 0)
dβ/ds = ( -sin(s/R), cos(s/R), 0)
Since we are always parameterizing curves by arclength, the velocity will always be normalized:
|dβ/ds|² = 1
Now, the acceleration of the curve will just be its second derivative with respect to arclength:
d²β/ds² = -(1/R) (cos(s/R), sin(s/R), 0)
The acceleration of the circle is a vector pointing towards the center of curvature, whose magnitude is 1/(Radius). Note that since we used purely local quantities to derive this relationship, it is going to remain true when β is just a curve being approximated by a circle at some point. Thus,
Radius of Curvature = 1/(magnitude of acceleration) for any curve parameterized by arclength.
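As a quick numerical check of this relation, here is a minimal Python sketch. The helper names (curvature, circle, helix) and the finite-difference step are illustrative choices of mine, not part of the derivation above. The helix is included only as a second arclength-parameterized curve whose radius of curvature, (r² + c²)/r, is known.

```python
import math

def curvature(beta, s, h=1e-3):
    """Magnitude of the second derivative of an arclength-parameterized curve,
    estimated with a central finite difference of step h."""
    p0, p1, p2 = beta(s - h), beta(s), beta(s + h)
    accel = [(a - 2.0 * b + c) / h**2 for a, b, c in zip(p0, p1, p2)]
    return math.sqrt(sum(x * x for x in accel))

def circle(R):
    """Circle of radius R in the xy-plane, parameterized by arclength s."""
    return lambda s: (R * math.cos(s / R), R * math.sin(s / R), 0.0)

def helix(r, c):
    """Helix of radius r and pitch parameter c, parameterized by arclength s."""
    L = math.hypot(r, c)  # speed of the unnormalized parameterization
    return lambda s: (r * math.cos(s / L), r * math.sin(s / L), c * s / L)

# Radius of curvature = 1 / (magnitude of acceleration)
radius_circle = 1.0 / curvature(circle(2.0), 0.7)
radius_helix = 1.0 / curvature(helix(1.0, 1.0), 0.3)

print(radius_circle)  # close to the circle's radius, 2
print(radius_helix)   # close to (r^2 + c^2) / r = 2
```

Note that the helix has the same radius of curvature as the circle here even though its own radius is 1: the approximating circle is not the circle you see looking down the axis.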
Now, I suppose it's good news that we can now calculate the radius of curvature for an arbitrary curve. However, the acceleration will generally be the more useful quantity; at this point, we can even state "acceleration = curvature" for arbitrary curves. Using this definition, we get zero curvature for a straight line, and large curvature for really bendy curves which have a small radius of curvature. This is exactly the kind of intuitive description we'd want for this quantity. Therefore, at this point we informally state our first definition for curvature:
Curvature is Acceleration
This is incredibly satisfactory, if we only want to describe one-dimensional curves moving through three-dimensional space. However, many find that there are other objects, like spheres and donuts, which we will also want to quantitatively characterize with some notion of "curvature". Thus, we leave our small one-dimensional world, and abstract the definition one dimension further.
From One to Two Dimensions
A two-dimensional surface is certainly a step more abstract than a simple one-dimensional curve. However, we can still concretely embed these surfaces into three dimensions, so our notions of curvature will still carry some amount of visual bias, i.e. objects which have curvature will tend to "look curvy".
Now, what is meant mathematically by a "surface"? For now, we can define a surface to be a smoothly varying vector function of two parameters. For example, the sphere can be parameterized by the polar and azimuthal angles, θ and φ:
f(θ,φ) = (R sinθ cosφ, R sinθ sinφ, R cosθ)
The surface of a donut can also be expressed in terms of two angles:
g(θ,φ) = (( R + a sinθ) cosφ, ( R + a sinθ) sinφ, a cosθ)
So, for our current purposes, a surface can be considered a smoothly varying three-dimensional function of two parameters. There are some requirements that the parameters give independent directions on the surface, but we'll worry about that later. For now, this definition will do.
It is useful to think of a surface as being built from the set of all possible curves defined on the surface. Since we already know how to measure the curvature of curves, we can use this knowledge to develop notions of curvature in the surface f(s,t). Since the curves are restricted to lie on the surface, this forces them to have a certain amount of curvature. A curve defined on a sphere is going to curve at least as much as the sphere itself. Curvature for a surface, then, might be defined as the extent to which curves within the surface are forced to accelerate, simply due to the fact that they are restricted to the surface.
Now, it is entirely possible for a curve to have greater acceleration than the surface requires. In other words, curves can accelerate within the surface in addition to their acceleration as a result of the surface's curvature. For example, a great circle on a sphere accelerates only due to the sphere's curvature, but any other curve on the sphere has an additional component to its acceleration. It will be important for us to mathematically separate these two types of acceleration, which we can do.
At each point p on the surface, there exists the useful notion of a tangent plane to the surface. The velocity of any curve β in the surface through p will lie in the tangent plane. Now, there is a special type of curve known as a geodesic, whose acceleration is always perpendicular to the tangent plane. In this case, its acceleration is always due to the curvature of the surface itself; it has no component of acceleration along the surface. Thus, the notion of a geodesic separates these two types of acceleration for us. It is a fact that for each point p in the surface, given a velocity in the tangent plane, there is a unique geodesic which passes through that point with that velocity. Since its acceleration is always perpendicular to the tangent plane, it has no acceleration from the perspective of the two-dimensional universe that is the surface. Geodesics also have the property that they are, at least locally, the paths of minimal distance between points along them (a long arc of a great circle is a geodesic, but not the shortest route between its endpoints). We will see more on this later.
Geodesics are useful because they help us to formulate our notion of curvature in a surface much more clearly. We can now define a kind of directional curvature, given a point p and a direction in the tangent plane, V. For a smooth surface M, given a point p and a tangent vector V, we will tentatively define "directional curvature" k(p,V) to be the acceleration of the geodesic with velocity V at p.
This is a well-defined quantity, but not very useful. We'd rather have some quantity which doesn't require us to input a direction; we'd like some number K(p) which just gives us "curvature" at each point p. Hold your horses; we'll get there.
The tangent plane can be spanned by two orthonormal basis vectors, e1 and e2. There also exists the unit normal to the surface, defined by taking the cross product of basis vectors.
U = e1 × e2
Note that the unit normal is not unique; U → -U gives an equally reasonable unit normal, which we would have constructed had we reversed our choices for basis vectors. Since e1 and e2 are orthonormal, U is a unit vector, and moreover is orthogonal to the tangent plane. e1, e2 and U provide what is known as an orthonormal frame at p.
So how does this help us? Well, it helps us to define what is known as the shape operator S = -∇U. The ∇ means rate of change in a given direction within the surface (since the unit normal is only defined on the surface), so the shape operator requires the input of a direction V in the tangent plane. Notice that since U is a unit vector, ∇_V U must output a vector perpendicular to U (because ∇_V (U • U) = 2 U • ∇_V U = 0). In other words, S also outputs vectors in the tangent plane. So, S is a linear map from vectors in the tangent plane to vectors in the tangent plane. Thus, S can be thought of as a 2 × 2 matrix. Let's look at its operation more closely.
First, we look at our orthonormal basis in the context of velocities of geodesics. We can always choose a geodesic β1(s) whose velocity at p is e1, and a geodesic β2(t) whose velocity is e2. Let's say we choose our direction for ∇ to be e1. Then ∇ = ∂/∂s, the rate of change along the curve β1.
S(e1) = -∂/∂s (e1 × e2)
      = -∂e1/∂s × e2 - e1 × ∂e2/∂s
Now, the first term here is orthogonal to e2, and the second term is orthogonal to e1. Since this quantity must lie in the tangent plane, the first term must be in the e1 direction, and the second in the e2 direction. Let's now express e1 and e2 as velocities ∂β1/∂s and ∂β2/∂t.
S(e1) = -∂²β1/∂s² × ∂β2/∂t - ∂β1/∂s × ∂²β2/∂s∂t
Since the first term is in the e1 direction, and the second in the e2 direction, we can explicitly write down the matrix coefficients, assuming that we can write down similar terms for ∇ acting in the e2 direction. I omit writing down the explicit matrix, as it is cumbersome.
Now, the matrix entries for S are dependent on our choice of basis. Moreover, we know from linear algebra that it is possible to diagonalize S if we choose an appropriate basis. Such a basis is known as a principal basis for the tangent plane, and the corresponding matrix just looks like:
S = | k1  0  |
    | 0   k2 |
Where k1 and k2 are known as the principal curvatures of the surface at p. We have broken down the problem of defining curvature at a point in an arbitrary direction in two dimensions by defining the curvatures of two principal directions. It is fairly easy to check that k1 and k2 are just the curvatures of the principal geodesics given by the principal directions at p. In the direction of e1, U is rotating into e1; the plane spanned by these two vectors is rotating about e2. Since they are both unit vectors, e1 is changing at exactly the same rate as U, meaning the rate of change of velocity is the same as the rate of change of the unit normal, for a principal geodesic. Thus, the shape operator can be associated with our original intuitions about the acceleration of curves, if we choose to evaluate it in a principal basis.
Independence of basis choice
We want to define curvature independent of our choice of basis. Given a point p in the manifold, we know that the shape operator can be diagonalized if we choose a special basis, but we should be able to draw a basis-independent connection between the shape operator and some notion of curvature. This is not too hard.
Given a square matrix S, there are generally a couple of basis-independent quantities we can define; namely, the trace and the determinant. For a 2 × 2 matrix,
tr S = S11 + S22
det S = S11S22 - S21S12
We could prove that these two quantities are basis-independent, but our time would be better spent defining the two notions of curvature that these two quantities represent.
Mean Curvature H is the trace of the shape operator.
Gaussian Curvature K is the determinant of the shape operator.
Note that in terms of principal curvatures,
H = k1 + k2
K = k1k2
Also note that H is dependent upon the choice of unit normal, ±U, but K is independent of this choice. If K is positive, it means k1 and k2 have the same sign, which means they are "curving" in the same direction, and the surface locally looks like a "bump". If K is negative, it means k1 and k2 have opposite signs, meaning they are curving in opposite directions, and the surface locally looks like a "saddle".
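The basis independence of H and K can be illustrated numerically. The following is a minimal Python sketch under assumptions of my own: the principal curvatures 2 and -0.5 and the rotation angle 0.4 are arbitrary example values, and the helper names are hypothetical. It follows this writeup's convention that H is the full trace of S (some references use half the trace).

```python
import math

def matmul(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def change_basis(S, angle):
    """Matrix of the same linear map in an orthonormal basis rotated by `angle`."""
    c, s = math.cos(angle), math.sin(angle)
    Q = [[c, -s], [s, c]]    # columns are the new basis vectors
    Qt = [[c, s], [-s, c]]   # transpose of Q, which is also its inverse
    return matmul(matmul(Qt, S), Q)

def mean_curvature(S):
    """H as defined in this writeup: the trace of the shape operator."""
    return S[0][0] + S[1][1]

def gaussian_curvature(S):
    """K: the determinant of the shape operator."""
    return S[0][0] * S[1][1] - S[1][0] * S[0][1]

# Shape operator in a principal basis (example values: a saddle point)
S_principal = [[2.0, 0.0], [0.0, -0.5]]
# The same operator written in a rotated, non-principal basis
S_rotated = change_basis(S_principal, 0.4)

print(S_rotated)  # no longer diagonal
print(mean_curvature(S_principal), mean_curvature(S_rotated))
print(gaussian_curvature(S_principal), gaussian_curvature(S_rotated))
```

The rotated matrix has nonzero off-diagonal entries, yet its trace and determinant agree with those of the diagonal form up to rounding.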
A minimal surface is one in which the mean curvature H = 0. This, of course, requires that the gaussian curvature K ≤ 0. Essentially, this means that it's curving equally and oppositely in the two respective principal directions at any given point, hence it looks like a symmetric saddle (assuming K ≠ 0). A flat plane is naturally a minimal surface. A nontrivial example of a minimal surface is a catenoid.
At this point, it would do us some good to look at an example or two to see how all of these notions of curvature play out.
Curvature on the Sphere
It is fairly easy to verify that geodesics on the sphere are just great circles, i.e. circles which divide the sphere into two equal hemispheres. The curvature of any such circle is exactly 1/R, where R is the radius of the sphere. At a given point, all directions are principal directions, because of the maximal symmetry of the sphere. So, we can arbitrarily choose two orthogonal directions at any point, and the shape operator will just be the identity times 1/R. Thus, at any point on a sphere, the mean curvature is 2/R, while the gaussian curvature is 1/R².
We can verify this explicitly, given the formulae we have written down for the sphere and the shape operator.
f(θ,φ) = R(sinθ cosφ, sinθ sinφ, cosθ)
At a point given by (θ0, φ0), we can reparameterize θ and φ so that they give us unit velocity vectors.
f(s,t) = R(sin(s/R) cos(t/(R sinθ0)), sin(s/R) sin(t/(R sinθ0)), cos(s/R))
∂f/∂s = (cos(s/R) cos(t/(R sinθ0)), cos(s/R) sin(t/(R sinθ0)), -sin(s/R))
∂f/∂t = (1/sinθ0) (-sin(s/R) sin(t/(R sinθ0)), sin(s/R) cos(t/(R sinθ0)), 0)
You can check that these vectors are orthonormal.
We could evaluate U by taking the cross product of these vectors, but it is easier to note that it is simply f/R.
U = (sin(s/R) cos(t/(R sinθ0)), sin(s/R) sin(t/(R sinθ0)), cos(s/R))
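Now, we calculate the shape operator from ∇U. (Here we drop the overall minus sign in S = -∇U, which amounts to using the inward normal -U instead; this choice affects the sign of H but not K.) It is very easy to write this operator in terms of the velocities ∂f/∂s and ∂f/∂t, since U = f/R: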
∂U/∂s = 1/R ∂f/∂s (no component along ∂f/∂t)
∂U/∂t = 1/R ∂f/∂t (no component along ∂f/∂s)
Thus, the shape operator is, in fact, the 2 × 2 identity matrix divided by R.
H = 2/R
K = 1/R²
The Curvature of a Torus
Since the torus (commonly called a "donut" in pastry circles) has less symmetry than the sphere, the curvature will vary from point to point, giving us a more interesting example of a surface with curvature. As stated above, the torus can be explicitly represented with the following function:
g(θ,φ) = (( R + a sinθ) cosφ, ( R + a sinθ) sinφ, a cosθ)
We will eventually show that the θ-direction and φ-direction are principal directions at any given point on the torus. For now, they are just arbitrary orthogonal coordinates. We again need to normalize them so that we have an orthonormal basis (which is what we use to define the shape operator).
At a given point, (θ0, φ0), we will get unit velocity vectors if we reparameterize to (s,t) = (aθ, (R + a sinθ0)φ). This notation will be cumbersome, so we will instead remember to divide by the appropriate normalization constant when taking θ and φ derivatives.
e1 = (1/a) ∂g/∂θ = (cosθ cosφ, cosθ sinφ, -sinθ)
e2 = (1/(R + a sinθ)) ∂g/∂φ = (-sinφ, cosφ, 0)
U = e1 × e2 = (sinθ cosφ, sinθ sinφ, cosθ)
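Now, the shape operator (computed from +∇U, i.e. dropping the overall minus sign in S = -∇U, which only affects the sign of H):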
(1/a) ∂U/∂θ = (1/a)(cosθ cosφ, cosθ sinφ, -sinθ) = (1/a) ∂g/∂s
(1/(R + a sinθ)) ∂U/∂φ = (1/(R + a sinθ)) (-sinθ sinφ, sinθ cosφ, 0) = (sinθ/(R + a sinθ)) ∂g/∂t
We now see that θ and φ do indeed provide principal directions for our shape operator; the rate of change of U in the θ direction is proportional to the θ-velocity vector, and likewise for φ. To put it another way, our shape operator is already diagonal again:
S = | 1/a   0                 |
    | 0     sinθ/(R + a sinθ) |
And our mean and gaussian curvatures are:
H = (R + 2a sinθ)/(a(R + a sinθ))
K = sinθ/(a(R + a sinθ))
Note, for points on the outer and inner circles (θ = π/2 and θ = 3π/2), at which points the θ and φ curves are both geodesics, the principal curvatures are, up to sign, exactly 1/(radius of curvature) for the θ and φ curves, respectively: 1/a for the θ curve, and 1/(R + a) or 1/(R - a) for the φ curve.
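The torus formulas can be checked numerically without using the closed-form normal. The sketch below is a rough finite-difference check, not a careful numerical method: the radii R = 3, a = 1, the sample point, and the step sizes are assumed example values, and the helper names are hypothetical. It follows the sign used in the worked example (entries taken from +∇U).

```python
import math

R, a = 3.0, 1.0  # example torus radii (assumed values, with R > a)

def g(theta, phi):
    """The torus parameterization used in the text."""
    rho = R + a * math.sin(theta)
    return (rho * math.cos(phi), rho * math.sin(phi), a * math.cos(theta))

def partial(f, theta, phi, which, h):
    """Central difference of f in theta (which=0) or phi (which=1)."""
    if which == 0:
        plus, minus = f(theta + h, phi), f(theta - h, phi)
    else:
        plus, minus = f(theta, phi + h), f(theta, phi - h)
    return tuple((p - m) / (2.0 * h) for p, m in zip(plus, minus))

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def unit(u):
    n = norm(u)
    return tuple(x / n for x in u)

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def unit_normal(theta, phi):
    """U = e1 x e2, with e1, e2 the normalized theta and phi velocities."""
    e1 = unit(partial(g, theta, phi, 0, 1e-5))
    e2 = unit(partial(g, theta, phi, 1, 1e-5))
    return unit(cross(e1, e2))

def shape_matrix(theta, phi):
    """Entries e_i . (dU/ds_j), where s_j is arclength along direction j."""
    velocities = [partial(g, theta, phi, w, 1e-5) for w in (0, 1)]
    frame = [unit(v) for v in velocities]
    S = [[0.0, 0.0], [0.0, 0.0]]
    for j in (0, 1):
        dU = partial(unit_normal, theta, phi, j, 1e-4)
        dU = tuple(x / norm(velocities[j]) for x in dU)  # per unit length
        for i in (0, 1):
            S[i][j] = dot(frame[i], dU)
    return S

theta, phi = 1.0, 0.5
S = shape_matrix(theta, phi)
H = S[0][0] + S[1][1]
K = S[0][0] * S[1][1] - S[0][1] * S[1][0]
H_formula = (R + 2 * a * math.sin(theta)) / (a * (R + a * math.sin(theta)))
K_formula = math.sin(theta) / (a * (R + a * math.sin(theta)))
print(H, H_formula)
print(K, K_formula)
```

The off-diagonal entries come out numerically zero, which is the statement that the θ and φ directions are principal.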
Shedding the Ambient Space
Eventually, we will want to generalize to higher-dimensional spaces. Before we do, there is one further level of abstraction necessary. This step will be mostly a conceptual shift in viewpoint, but it will require us to reformulate the way we do calculations. Up to this point, we have been embedding our two-dimensional surface in three-dimensional space. This is not always convenient, nor is it always possible. We wish to formulate an intrinsic notion of curvature, from the perspective of a person living in the two-dimensional space, unaware of any higher-dimensional space it's sitting in. The 2-D observer only knows about distances and angles of curves along the surface. How would you go about calculating the curvature of the space in which you exist, when your measurements are restricted to calculations within the space itself?
As it turns out, of all the quantities we've written down that describe curvature of a surface, the only quantity which is intrinsic to the surface is Gaussian Curvature. Gaussian Curvature measures whether your surface locally looks like a sphere or a saddle, and how small a sphere or sharp a saddle. These notions seem extrinsic at first, but notice they can be inferred from the trajectories of geodesics; specifically, geodesics have a tendency to converge on a sphere, and diverge on a saddle. This causes differences in quantities like the sum of angles in a triangle (whose sides are geodesics). In euclidean geometry, the sum of angles in a triangle is just 180° (or π radians), but this changes on a curved surface. Explicitly,
θ1 + θ2 + θ3 = π + ∫ K dA
The difference from our euclidean 180° total is just the total curvature over the area of the triangle. The proof requires a lot of math, and is thus omitted.
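While the general proof is omitted, the formula is easy to check in one special case. The sketch below assumes a unit sphere (K = 1) and the "octant" triangle with vertices on the three coordinate axes, whose sides are quarter great circles and whose area is one eighth of the sphere's. The helper names are my own.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def tangent(p, q):
    """Unit tangent at p of the great-circle arc from p toward q (unit sphere)."""
    d = dot(p, q)
    t = [qi - d * pi for pi, qi in zip(p, q)]  # remove the component along p
    n = math.sqrt(dot(t, t))
    return [x / n for x in t]

def angle_at(p, q, r):
    """Interior angle at vertex p between the geodesic arcs p->q and p->r."""
    return math.acos(dot(tangent(p, q), tangent(p, r)))

# Vertices of the octant triangle on the unit sphere
A, B, C = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)

angle_sum = angle_at(A, B, C) + angle_at(B, C, A) + angle_at(C, A, B)

K = 1.0                       # Gaussian curvature of the unit sphere
area = 4.0 * math.pi / 8.0    # one eighth of the sphere's total area

print(angle_sum)              # three right angles
print(math.pi + K * area)     # pi plus the integral of K over the triangle
```

Each angle is a right angle, so the sum exceeds π by exactly the curvature integrated over the triangle.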
Gaussian curvature is intrinsic to the surface, derivable directly from a distance function on the surface (the formula being really big and not all that enlightening). Thus, if two surfaces have the same distance function, but sit in three dimensions differently, they will have the same Gaussian Curvature. For example, the plane has the same distance function as a cylinder; you can curl a piece of paper into a cylinder. Note you can't curl a piece of paper into a sphere (without ruining your piece of paper). This is because bending the paper without deforming it preserves distances along the paper, and hence must preserve its gaussian curvature (which is equal to zero). Examples of surfaces of the same negative curvature are the catenoid and the helicoid.
There are many nice theorems which relate the local concept of curvature to the surface's global properties. For example, all compact surfaces embedded in three dimensions must have some point of positive curvature. The most amazing is the Gauss-Bonnet theorem, which relates the total curvature of a compact surface without boundary to its global topology, characterized by the Euler characteristic χ(M). (Gauss' Theorema Egregium is the statement, used above, that Gaussian curvature is intrinsic to the surface.) χ is a number which can be thought of as characterizing the number of "holes" or "handles" in the surface. In the case of the sphere and the torus, χ(S²) = 2, and χ(T²) = 0. No matter how you contort or deform a surface, its Euler characteristic remains the same. Thus, no matter how much you contort or deform the surface, its total curvature remains the same:
∫ K dA = 2π χ(M)
You can check that these relations hold for the sphere and the torus, independent of the values of R and a.
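Here is one way to carry out that check numerically, as a minimal sketch. It uses the Gaussian curvatures derived above and the area elements dA = R² sinθ dθ dφ for the sphere and dA = a(R + a sinθ) dθ dφ for the torus; the radii, the midpoint rule, and the grid size are assumptions of the example.

```python
import math

def total_curvature(K, area_density, theta_range, n=200):
    """Midpoint-rule estimate of the integral of K dA over
    theta in theta_range and phi in [0, 2*pi]."""
    t0, t1 = theta_range
    dt = (t1 - t0) / n
    dp = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        theta = t0 + (i + 0.5) * dt
        for j in range(n):
            phi = (j + 0.5) * dp
            total += K(theta, phi) * area_density(theta, phi) * dt * dp
    return total

# Sphere of radius Rs: K = 1/Rs^2, dA = Rs^2 sin(theta) dtheta dphi
Rs = 2.0
sphere_total = total_curvature(
    lambda th, ph: 1.0 / Rs**2,
    lambda th, ph: Rs**2 * math.sin(th),
    (0.0, math.pi))

# Torus with radii Rt > at: K and dA as in the torus section
Rt, at = 3.0, 1.0
torus_total = total_curvature(
    lambda th, ph: math.sin(th) / (at * (Rt + at * math.sin(th))),
    lambda th, ph: at * (Rt + at * math.sin(th)),
    (0.0, 2.0 * math.pi))

print(sphere_total, 2.0 * math.pi * 2)  # chi(sphere) = 2
print(torus_total, 2.0 * math.pi * 0)   # chi(torus) = 0
```

In both integrands the radii cancel between K and dA, which is why the totals do not depend on R or a.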
From two to n Dimensions
Sadly, when we move from two to three and higher dimensions, we no longer have such a nice scalar quantity as Gaussian curvature. We will need yet another way of formulating curvature, similar to what we have done so far, but generalizable to any arbitrary number of dimensions. We have seen that our notions of curvature mathematically amount to the way our orthonormal frame transforms as we move around the surface. The rate of change of the frame is tantamount to acceleration for a unit-velocity curve, and the rate of change of the frame is also how we defined the shape operator. If we can describe the frame without any reference to an ambient space, it will then be possible to formulate a notion of curvature associated with how a frame transforms, which is independent of how the surface sits in three dimensions. The curvature will still only be dependent on local measurements of distances and angles.
In order to formulate curvature in this manner, we need to first overhaul our formalism for everything we've been studying, including the surface itself. If we continued to define a surface as an embedding into higher dimensions, our formalism for curvature on the surface would not be intrinsic. Thus, in the move from two to n dimensions, we also move from the language of surfaces to the language of manifolds. At this point, I'll be making references to tensors, tangent spaces, and metrics, without explaining what they are. If you haven't heard about these things but would like to continue on, it would be good to read related nodes first.
Our surface has a metric, g. The metric is a (0,2) tensor which tells us how to take dot products of vectors on the surface. From this information we can derive distances on the surface. For if we have a curve β(t) with velocity dβ/dt, we can measure the length of the curve by integrating the magnitude of the velocity,
S = ∫ [√(dβ/dt • dβ/dt)]dt
Since we have a distance function on the surface, we can define geodesics intrinsically, in that they are the paths of minimal distance between two given points. This amounts to a specific mathematical formula, which we will write down momentarily.
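To make the length integral concrete, here is a minimal Python sketch that measures a curve using only a metric in coordinates. The choice of example (the round sphere of radius 2 in (θ, φ) coordinates, with metric diag(R², R² sin²θ), and the latitude circle θ = π/3) is an assumption for illustration, as are the function names and step sizes.

```python
import math

def curve_length(metric, curve, t0, t1, n=2000, h=1e-6):
    """Integrate sqrt(g_ij dx^i/dt dx^j/dt) dt with the midpoint rule.
    `metric(x)` returns the matrix g_ij at coordinates x;
    `curve(t)` returns the coordinates x^i(t)."""
    dt = (t1 - t0) / n
    total = 0.0
    for step in range(n):
        t = t0 + (step + 0.5) * dt
        x = curve(t)
        # coordinate velocity dx^i/dt by central difference
        xp, xm = curve(t + h), curve(t - h)
        v = [(p - m) / (2.0 * h) for p, m in zip(xp, xm)]
        g = metric(x)
        dim = len(v)
        speed_sq = sum(g[i][j] * v[i] * v[j]
                       for i in range(dim) for j in range(dim))
        total += math.sqrt(speed_sq) * dt
    return total

R = 2.0  # sphere radius (example value)

def sphere_metric(x):
    """Metric of the round sphere in (theta, phi) coordinates."""
    theta = x[0]
    return [[R * R, 0.0], [0.0, (R * math.sin(theta)) ** 2]]

theta0 = math.pi / 3

def latitude(t):
    """Circle of constant polar angle theta0, traversed once for t in [0, 2*pi]."""
    return (theta0, t)

length = curve_length(sphere_metric, latitude, 0.0, 2.0 * math.pi)
print(length, 2.0 * math.pi * R * math.sin(theta0))
```

No embedding into three dimensions is used anywhere; the answer agrees with the circumference 2πR sinθ0 one would compute extrinsically.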
In order to define curvature, we need an intrinsic notion of parallel transport. Previously, we defined curvature as the extrinsic change in the unit normal. We did not need to compare tangent vectors in the tangent space; we simply compared unit normals as they sat in three dimensions. Our new approach will measure the rate of change of the frame, and this rate of change must be intrinsically defined, meaning we need to compare vectors in different tangent spaces. Since we have a metric, this is possible, but it is nontrivial. Parallel transport can be well-defined in terms of geodesics. Specifically,
Geodesics parallel transport their velocity vectors.
Thinking about this intuitively, it should make sense. Since geodesics were earlier defined to have acceleration perpendicular to the tangent plane, when we redefine them intrinsically they become the curves with zero acceleration, i.e. their velocity is constant.
We will define parallel transport using geodesics, then define curvature using parallel transport. The procedure vaguely goes like this:
- Find general equations of motion of geodesics along our coordinate system
- Determine quantitatively how geodesics transport their velocity vectors
- Define parallel transport by the equations derived
- Use parallel transport to determine the rate of change of the frame along a path
- Define curvature based on the results.
A geodesic can be found by minimizing distance between two points. This amounts to minimizing the interval,
S = ∫ [√(dβ/dt • dβ/dt)]dt
We write β in component form, $\{x^i(t)\}$, and explicitly write in the metric:
$S = \int \sqrt{g_{ij}\,\frac{dx^i}{dt}\frac{dx^j}{dt}}\;dt$
We seek to find the curve $x^i(t)$ which minimizes this integral. This problem falls into the realm of variational calculus. The integral is minimized if we can vary the curve $x^i \to x^i + \delta x^i$ and find to first order,
δS = 0
This eventually leads to the following equations of motion (I choose not to derive them explicitly, as it involves many lines of calculation, none of which is enlightening):
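$g_{ij}\,\frac{d^2x^j}{dt^2} = \left(\tfrac12\,\partial_i g_{jk} - \partial_k g_{ij}\right)\frac{dx^j}{dt}\frac{dx^k}{dt}$
We can rewrite this, defining the inverse matrix $g^{ij} = (g_{ij})^{-1}$ and symmetrizing the last term in j and k:
$\frac{d^2x^i}{dt^2} = g^{in}\left(\tfrac12\,\partial_n g_{jk} - \tfrac12\,\partial_k g_{nj} - \tfrac12\,\partial_j g_{kn}\right)\frac{dx^j}{dt}\frac{dx^k}{dt}$
This is known as the geodesic equation (with t taken to be arclength):
$\frac{d^2x^i}{dt^2} + \Gamma^i{}_{jk}\,\frac{dx^j}{dt}\frac{dx^k}{dt} = 0$
Where $\Gamma^i{}_{jk} = \tfrac12\,g^{in}\left(\partial_k g_{nj} + \partial_j g_{kn} - \partial_n g_{jk}\right)$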
Γ is a triple-indexed object known as the Levi-Civita connection. It gives us the equations of motion for geodesics in space with a metric, and it will also give us our rule for parallel transport. In this way, it literally provides a "connection" between different tangent spaces; parallel transport tells us how to compare vectors from one tangent space to another.
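The formula for Γ in terms of the metric translates almost line for line into code. The following sketch evaluates it by finite differences for the round sphere in (θ, φ) coordinates, an example of my choosing with metric diag(R², R² sin²θ). The function names and step size are hypothetical. The known closed forms for this metric are Γ^θ_φφ = -sinθ cosθ and Γ^φ_θφ = Γ^φ_φθ = cotθ.

```python
import math

R = 1.0  # sphere radius (assumed value for the example)

def sphere_metric(x):
    """g_ij for the round sphere in (theta, phi) coordinates."""
    theta = x[0]
    return [[R * R, 0.0], [0.0, (R * math.sin(theta)) ** 2]]

def inverse_2x2(g):
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det],
            [-g[1][0] / det, g[0][0] / det]]

def metric_derivative(metric, x, n, h=1e-5):
    """Central-difference estimate of d g_jk / d x^n, returned as a matrix in j, k."""
    xp, xm = list(x), list(x)
    xp[n] += h
    xm[n] -= h
    gp, gm = metric(xp), metric(xm)
    return [[(gp[j][k] - gm[j][k]) / (2.0 * h) for k in range(2)]
            for j in range(2)]

def christoffel(metric, x):
    """gamma[i][j][k] = 1/2 g^{in} (d_k g_nj + d_j g_kn - d_n g_jk)."""
    dim = 2
    g_inv = inverse_2x2(metric(x))
    dg = [metric_derivative(metric, x, n) for n in range(dim)]  # dg[n][j][k]
    gamma = [[[0.0] * dim for _ in range(dim)] for _ in range(dim)]
    for i in range(dim):
        for j in range(dim):
            for k in range(dim):
                gamma[i][j][k] = 0.5 * sum(
                    g_inv[i][n] * (dg[k][n][j] + dg[j][k][n] - dg[n][j][k])
                    for n in range(dim))
    return gamma

theta = 1.0
gamma = christoffel(sphere_metric, (theta, 0.3))
print(gamma[0][1][1], -math.sin(theta) * math.cos(theta))
print(gamma[1][0][1], math.cos(theta) / math.sin(theta))
```

The result is symmetric in its two lower indices, as the formula requires, and does not depend on the radius R.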
We seek to rewrite the geodesic equation in terms of parallel transport. In other words, what does this equation tell us about the velocity vector $V^i = dx^i/dt$ of the curve?
$\frac{dV^i}{dt} + \Gamma^i{}_{jk} V^j V^k = 0$
Now, taking the derivative with respect to t is the same as taking the directional derivative along V:
$V^j \partial_j V^i + \Gamma^i{}_{jk} V^j V^k = 0$
$V^j\left(\partial_j V^i + \Gamma^i{}_{jk} V^k\right) = 0$
This will become our equation for parallel transport, but first we must generalize to include any direction of transport (not just along the geodesic with velocity $V^i$). To do so, we interpret the $V^j$ outside the parentheses as the direction of differentiation, and just drop it, leaving us with a derivative in the j-th coordinate direction:
$\partial_j V^i + \Gamma^i{}_{jk} V^k = 0$
This equation is supposed to describe a constant velocity. Normally we would associate such a vector with the equation:
$\partial_j V^i = 0$
However, this additional term involving Γ shows up. This term tells us how to properly transport the vector. To take this a step deeper we may note that the equation we wrote down directly above transforms nontrivially under coordinate transformations, meaning we cannot set it impartially to zero. This "equation" is therefore meaningless, since it's not invariant. The equation we wrote down for parallel transport, however, does transform properly; Γ transforms in just such a way to cancel the terms picked up by transforming the first term.
Vector transformation issues aside (for now), we have found the proper equation for parallel transport. This concept is so useful that we define the covariant derivative ∇j as a derivative which acts on vectors in the following way:
$\nabla_j V^i = \partial_j V^i + \Gamma^i{}_{jk} V^k$
Covariant differentiation is therefore our generalization of differentiation for curved space with a metric. We can now define parallel transport with the following concise equation:
$\nabla_j V^i = 0$
A vector field that satisfies this equation is parallel-transported along the jth direction. It is the curved-surface equivalent of being a constant vector field. The equation allows us to write the vector field explicitly in a Taylor expansion:
$V^i(x + \Delta x) = V^i(x) + \Delta x^j\,\partial_j V^i\big|_{x}$ (to first order)
$= V^i(x) - \Delta x^j\,\Gamma^i{}_{jk}\,V^k(x)$
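Now, we can iterate this formula twice if we want to parallel transport it twice, i.e. we want to Taylor expand in $\Delta_1 x + \Delta_2 x$:
$V^i(x + \Delta_1 x + \Delta_2 x) = V^i(x + \Delta_1 x) - \Delta_2 x^j\,\Gamma^i{}_{jk}(x + \Delta_1 x)\,V^k(x + \Delta_1 x)$
Don't forget, when we do the second Taylor expansion, we also have to expand the Levi-Civita connection, $\Gamma(x + \Delta_1 x) = \Gamma(x) + \Delta_1 x^l\,\partial_l \Gamma$:
$= V^i - \Delta_1 x^j\,\Gamma^i{}_{jk} V^k - \Delta_2 x^j\left(\Gamma^i{}_{jk} + \Delta_1 x^l\,\partial_l \Gamma^i{}_{jk}\right)\left(V^k - \Delta_1 x^m\,\Gamma^k{}_{mn} V^n\right)$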
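Now, in doing this calculation we implicitly assumed that we are following a path from $x$ to $x + \Delta_1 x$ to $x + \Delta_1 x + \Delta_2 x$. However, it is possible to perform the two transports in the other order. It will, in general, give us a different result when we do so. Write $V_{21}$ for the result above (the $\Delta_1$ step taken first, then the $\Delta_2$ step) and $V_{12}$ for the result of taking the steps in the opposite order. The terms of first order in the displacements cancel in the difference, which can be found by antisymmetrizing the above equation, i.e. subtracting it from the same equation with 1 ↔ 2:
$V_{12}^i - V_{21}^i = \Delta_2 x^j\left(-\Gamma^i{}_{jk}\,\Delta_1 x^m\,\Gamma^k{}_{mn} V^n + \Delta_1 x^l\,\partial_l \Gamma^i{}_{jk} V^k\right) - \Delta_1 x^j\left(-\Gamma^i{}_{jk}\,\Delta_2 x^m\,\Gamma^k{}_{mn} V^n + \Delta_2 x^l\,\partial_l \Gamma^i{}_{jk} V^k\right)$
$= V^j\,\Delta_1 x^k\,\Delta_2 x^l\left(\partial_k \Gamma^i{}_{jl} - \partial_l \Gamma^i{}_{jk} + \Gamma^i{}_{km}\Gamma^m{}_{jl} - \Gamma^i{}_{lm}\Gamma^m{}_{jk}\right)$
And when the dust settles, we can write the difference in the following form:
$V_{12}^i - V_{21}^i = V^j\,\Delta_1 x^k\,\Delta_2 x^l\,R^i{}_{jkl}$
Where $R^i{}_{jkl} = \partial_k \Gamma^i{}_{jl} - \partial_l \Gamma^i{}_{jk} + \Gamma^i{}_{km}\Gamma^m{}_{jl} - \Gamma^i{}_{lm}\Gamma^m{}_{jk}$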
$R^i{}_{jkl}$ is known as the Riemann curvature tensor. It is a (1,3) tensor, so we might think of it as a linear operator which takes three vectors as input and gives one vector as output. The three vectors $R^i{}_{jkl}$ takes as input are the original vector being transported, and the two directions along which it's transporting. The output vector is the difference in the resulting vector when we reverse the order of the two transports. Exchanging our labels for the two directions will just give us the same result with a minus sign, meaning $R^i{}_{jkl}$ must be antisymmetric in its last two indices, which you can quickly check in its formula above.
We needed to describe curvature in this way; once we'd transported our vector, we had to compare it with another vector to see how this transportation transformed it. The only reasonable thing to compare it to was what we would have found had we taken a different path.
Curvature is the difference in parallel transport along different paths.
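A closed loop makes this statement tangible: transport a vector all the way around and compare it with the vector you started with. The sketch below does this on the unit sphere around a circle of constant polar angle θ0, integrating the transport equation dV^i/dt = -Γ^i_jk (dx^j/dt) V^k with a fourth-order Runge-Kutta step. The loop, the step count, and the function names are assumptions of the example, and the sphere's connection coefficients are the standard closed forms rather than anything derived in this writeup. The expected rotation angle, 2π(1 - cosθ0), is the total curvature ∫ K dA of the polar cap enclosed by the loop.

```python
import math

def sphere_christoffel(theta):
    """gamma[i][j][k] for the round sphere in (theta, phi) coordinates."""
    gamma = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    gamma[0][1][1] = -math.sin(theta) * math.cos(theta)
    gamma[1][0][1] = gamma[1][1][0] = math.cos(theta) / math.sin(theta)
    return gamma

def transport_rhs(theta0, V):
    """dV^i/dt along the loop theta = theta0, phi = t (coordinate velocity (0, 1))."""
    gamma = sphere_christoffel(theta0)
    xdot = (0.0, 1.0)
    return [-sum(gamma[i][j][k] * xdot[j] * V[k]
                 for j in range(2) for k in range(2))
            for i in range(2)]

def rk4_step(f, V, h):
    k1 = f(V)
    k2 = f([v + 0.5 * h * k for v, k in zip(V, k1)])
    k3 = f([v + 0.5 * h * k for v, k in zip(V, k2)])
    k4 = f([v + h * k for v, k in zip(V, k3)])
    return [v + h * (p + 2.0 * q + 2.0 * r + s) / 6.0
            for v, p, q, r, s in zip(V, k1, k2, k3, k4)]

def transport_around_latitude(theta0, V0, steps=2000):
    """Parallel transport V0 once around the circle theta = theta0."""
    h = 2.0 * math.pi / steps
    V = list(V0)
    for _ in range(steps):
        V = rk4_step(lambda W: transport_rhs(theta0, W), V, h)
    return V

theta0 = 1.0
V_final = transport_around_latitude(theta0, [1.0, 0.0])

# Components in the orthonormal frame (unit sphere): u along theta, w along phi
u, w = V_final[0], math.sin(theta0) * V_final[1]
omega = 2.0 * math.pi * (1.0 - math.cos(theta0))  # total curvature of the cap
print(u, math.cos(omega))
print(w, math.sin(omega))
```

The transported vector keeps its length but comes back rotated, and the rotation angle is set entirely by the curvature enclosed by the loop.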
Other Curvature Quantities
Since Riemannian curvature is specified by a rank-4 tensor, we can formulate other quantities by contracting indices. The Ricci curvature tensor is easily found by contracting the first and third indices:
$R_{\mu\nu} = R^\alpha{}_{\mu\alpha\nu}$
The Ricci scalar is found by contracting the two indices of the Ricci tensor:
$R = R^\alpha{}_\alpha = g^{\mu\nu} R_{\mu\nu}$
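As a sanity check on these contractions, the following sketch evaluates the Riemann tensor, the Ricci tensor, and the Ricci scalar for a round sphere of radius 2, using the formula for the Riemann tensor given earlier and finite differences for the derivatives of Γ. The sphere, its closed-form connection coefficients, the sample point, and the function names are all assumptions of the example. For a two-dimensional surface the Ricci scalar should come out to twice the Gaussian curvature, here 2/R².

```python
import math
from itertools import product

def christoffel(x):
    """G[i][j][k] for the round sphere in (theta, phi) coordinates.
    These coefficients do not depend on the sphere's radius."""
    theta = x[0]
    G = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    G[0][1][1] = -math.sin(theta) * math.cos(theta)
    G[1][0][1] = G[1][1][0] = math.cos(theta) / math.sin(theta)
    return G

def d_christoffel(x, n, h=1e-5):
    """Central-difference derivative of every G[i][j][k] along coordinate n."""
    xp, xm = list(x), list(x)
    xp[n] += h
    xm[n] -= h
    Gp, Gm = christoffel(xp), christoffel(xm)
    return [[[(Gp[i][j][k] - Gm[i][j][k]) / (2.0 * h) for k in range(2)]
             for j in range(2)] for i in range(2)]

def riemann(x):
    """Rm[i][j][k][l] = d_k G^i_jl - d_l G^i_jk + G^i_km G^m_jl - G^i_lm G^m_jk."""
    G = christoffel(x)
    dG = [d_christoffel(x, n) for n in range(2)]  # dG[n][i][j][k]
    Rm = [[[[0.0] * 2 for _ in range(2)] for _ in range(2)] for _ in range(2)]
    for i, j, k, l in product(range(2), repeat=4):
        Rm[i][j][k][l] = (dG[k][i][j][l] - dG[l][i][j][k]
                          + sum(G[i][k][m] * G[m][j][l] - G[i][l][m] * G[m][j][k]
                                for m in range(2)))
    return Rm

def ricci(x):
    """Contract the first and third indices of the Riemann tensor."""
    Rm = riemann(x)
    return [[sum(Rm[a][mu][a][nu] for a in range(2)) for nu in range(2)]
            for mu in range(2)]

def ricci_scalar(x, radius):
    """Contract the Ricci tensor with the inverse metric of the sphere."""
    theta = x[0]
    g_inv = [[1.0 / radius**2, 0.0],
             [0.0, 1.0 / (radius * math.sin(theta)) ** 2]]
    Ric = ricci(x)
    return sum(g_inv[mu][nu] * Ric[mu][nu] for mu in range(2) for nu in range(2))

radius = 2.0
point = (1.0, 0.3)
Rm = riemann(point)
scalar = ricci_scalar(point, radius)
print(Rm[0][1][0][1], math.sin(point[0]) ** 2)
print(scalar, 2.0 / radius**2)
```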
The Einstein Curvature tensor is a combination of these two quantities:
$G_{\mu\nu} = R_{\mu\nu} - \tfrac12\,g_{\mu\nu} R$
This tensor has the important property of being divergenceless or conserved:
$\nabla^\mu G_{\mu\nu} = 0$
Einstein used this curvature quantity to write down the Einstein Equation, describing the curvature of spacetime:
$G_{\mu\nu} = \frac{8\pi G}{c^4}\,T_{\mu\nu}$
Where the constant G (not to be confused with Gμν) is the gravitational constant of the universe, and Tμν is the energy-momentum tensor. This equation tells us how the curvature of spacetime is influenced by the matter and energy present. Informally,
Spacetime Curvature = Matter and Energy density.
No Metric
It would seem at first that the notion of distance is a necessary piece of structure for defining curvature. However, all we really need to define curvature is parallel transport. Since parallel transport is given explicitly by a connection $\Gamma^i{}_{jk}$, we can write down a smoothly varying connection, and our definition for curvature immediately follows by the same formula as above. This more general connection is usually called an affine connection (the name "Christoffel symbols" conventionally refers to the components of the Levi-Civita connection specifically). Notice that the geometric interpretation of the connection can be made explicit by operating on a coordinate basis vector, $e_j = \partial/\partial x^j$:
$\nabla_k (e_j)^i = \partial_k (e_j)^i + \Gamma^i{}_{lk}\,(e_j)^l = \Gamma^i{}_{jk}$
(Remember, $(e_j)^i = \delta^i_j$) In other words, $\Gamma^i{}_{jk}$ is the rate of change of the ith component of the jth basis vector, parallel transported in the kth direction. This quantity is, of course, completely dependent on the choice of basis.
If we wanted, we could recover the Levi-Civita connection on a surface with a metric by requiring the conditions of metric compatibility,
$\nabla_i g_{jk} = 0$
and zero torsion,
$\Gamma^i{}_{jk} = \Gamma^i{}_{kj}$
This second requirement that the Levi-Civita connection is torsion-free is given without reason, but currently all measurements of spacetime curvature have given us no reason not to trust this requirement. Some physical theories have been posited in which the connection has a very small torsion, but so far none of these theories have made any exciting predictions. When I originally wrote down the Levi-Civita connection, I did it using geodesics, and swept this condition under the rug, but it still exists as a mildly annoying unexplained physical axiom.
Describing the curvature of a space without a metric requires a relatively small amount of abstraction; we are no longer imposing the structure of a metric, but we still have the structure of a connection. However, this is a useful step in bridging the gap to our next great hurdle:
Generalizing Past the Tangent Bundle
We defined curvature on a manifold by transporting vectors from one tangent space to another. Thus, curvature is best thought of as a quantity defined on the tangent bundle of a manifold. In this light, we can now generalize this quantity to other vector bundles.
In order to make this jump, another conceptual reformulation must occur. When we formulated the concept of a connection $\Gamma^i{}_{jk}$, we did so using the transformation properties of coordinate basis vectors. Before we move into more general territory, we need to reformulate our definition of a connection (which was shaky at best anyway) and do so in the context of general basis vectors, not necessarily tied to any coordinate system. We define a connection to tell us how our basis transforms as we move along the manifold. Specifically, if we have a general vector basis $\{e_\mu(p)\}$ for the tangent space $T_pM$, and a basis $\{e^{*\nu}(p)\}$ for the cotangent space $T^*_pM$ as well, we define the connection ω by how the basis transforms when we move a small distance in a given direction away from p. This derivative should be a vector, so we can express this as a linear combination of basis vectors:
∇ e_μ = ω_μ^ρ ⊗ e_ρ
The notation here is somewhat different, but it is, in fact, consistent with our formulation earlier. The two-indexed object ω_μ^ρ is a one-form for each given value of μ and ρ,
ω_μ^ρ = ω_μ^ρ_ν e*^ν
In other words, ω is an n × n matrix of one-forms. The one-form component of ω tells you in which direction you're moving in the manifold to determine how the vector basis transforms.
It will be useful to us to determine how the components of ω transform when we use a different set of basis vectors, {fμ}. Each basis vector will be expressible as a linear combination of our old basis:
f_β = Λ^α_β e_α
Then the equation for the f's looks like:
∇ f_β = ω'_β^ν f_ν
∇ (Λ^α_β e_α) = ω'_β^ν Λ^ρ_ν e_ρ
Then we carry this out, letting ∇ act on e_α to give us our old connection (relabeling indices to get the same basis vector e_κ in each term):
dΛ^κ_β e_κ + Λ^α_β ω_α^κ e_κ = ω'_β^ν Λ^κ_ν e_κ
Since this holds for each e_κ, we drop this from our sum and act on the right with Λ⁻¹:
dΛ^κ_β (Λ⁻¹)^ν_κ + Λ^α_β ω_α^κ (Λ⁻¹)^ν_κ = ω'_β^ν
Writing this all out in abstract matrix notation (instead of all these summations), we arrive at the resulting change in ω for a change of basis of the tangent space:
ω' = Λ ω Λ⁻¹ + dΛ Λ⁻¹
You can check that when we change our basis from {eμ} to {fμ} that this is the same transformation law for the components of ω that one would acquire for Γ by change of coordinates in our earlier formulation (assuming we also perform this transformation on the basis covector that we've contracted with ω). The important point is that we are no longer changing coordinates on M; we are just changing the basis we are using for the tangent space. This is the subtle change that we are required to make before moving to the more general language of vector bundles.
Connections on Vector Bundles
Consider an arbitrary vector bundle V, with vector space fibers F ≅ R^k.
Let {h_μ(p)} be a basis for F_p,
{e_ν(p)} a basis for T_pM,
{e*^ρ(p)} a basis for T*_pM,
where μ runs from one to k, and ν and ρ from one to n.
Define ∇νhμ = Rate of change of the μth basis vector of the fiber F in the νth direction in the base M. This can be expressed as a linear combination of the hρ's:
∇_ν h_μ = ω_μ^ρ_ν h_ρ
Once again, we can think of the ∇ operator as returning a one-form, by contracting its lower index with the basis covector e*^ν(p):
∇ h_μ = ω_μ^ρ h_ρ
where ω_μ^ρ = ω_μ^ρ_ν e*^ν
Note that the μ and ρ components of ω refer to directions in the vector-space fiber, while the ν component (which we have begun to repress) refers to directions in the manifold.
If we change the fiber basis to g_β = Λ^α_β h_α, we get the same transformation law as before:
ω' = Λ ω Λ -1 + dΛ Λ -1
ω is a matrix-valued one-form, meaning each of its k² components (the fiber indices μ and ρ each run from one to k) sits in the cotangent space of the manifold. The matrix itself also lives in an interesting space, as we will now explore.
Let's say we parallel-transport a basis vector along some coordinate direction. We can find out what new vector results by taking the exponential map of the covariant derivative:
V = [ exp{Δx^α ∇_α} ] h_μ = h_μ + Δx ∇h_μ + ½ (Δx)² ∇²h_μ + ...
We're omitting the notation for the sum Δx^α ∇_α because it's clear that's what we're doing, and adding more indices will only confuse us.
= h_μ + Δx ω_μ^ρ h_ρ + ½ (Δx)² ω_μ^ρ ω_ρ^δ h_δ + ...
= [ exp{Δx ω} ]_μ^ρ h_ρ
Thus, [ exp{Δx ω} ] is a transformation matrix that gives the value of a parallel-transported vector, given an initial vector. We can imagine setting up a coordinate system in which the basis {g_μ} is parallel-transported along this curve. In this case, the transition function from the g-basis to the h-basis will just be the matrix [ exp{Δx ω} ]_μ^ρ.
What we are attempting to demonstrate here is that exp{ω} is a matrix which sits in the structure group of the vector bundle, exp{ω} ∈ G. Thus, ω must live in the lie algebra of the structure group, ω ∈ £[ G ]. To summarize, a connection ω over a vector bundle is specified by a lie-algebra-valued one-form.
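To make this concrete, here is a minimal sketch in plain Python, under some loud assumptions: the structure group is taken to be SO(2), the connection component `omega_x` is a made-up constant antisymmetric matrix (an element of the Lie algebra so(2)), and `dx` is an arbitrary step. None of these values come from the discussion above; they only illustrate that exponentiating a Lie algebra element lands in the group.

```python
import math

def mat_mul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(a):
    return [list(row) for row in zip(*a)]

def mat_exp(m, terms=30):
    """Truncated power series exp(m) = 1 + m + m^2/2! + ..."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]          # current term m^k / k!
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

# Assumed (illustrative) connection component along one coordinate
# direction: an antisymmetric matrix, i.e. an element of so(2).
omega_x = [[0.0, -0.7], [0.7, 0.0]]
dx = 0.5

# Transport matrix exp{dx * omega}, treating omega as constant over the step.
transport = mat_exp([[dx * w for w in row] for row in omega_x])

# Group membership checks: orthogonality and unit determinant.
gram = mat_mul(transpose(transport), transport)
det = transport[0][0] * transport[1][1] - transport[0][1] * transport[1][0]
```

Here `gram` should come out as the identity and `det` as 1 up to rounding, so `transport` is a rotation (by the angle 0.7 × 0.5 in this toy case): the exponential of the Lie algebra element sits in the structure group.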
Curvature of Vector Bundles
We are finally able to define curvature on vector bundles. At this point, we've set things up to be a fairly straightforward generalization, if abstract. First we look back at the more familiar case, that V = TM. After introducing the Christoffel Connection, we parallel-transported a vector along two paths. Comparing Vpath1 - Vpath2 gives us the curvature. In component form,
Where R^i_jkl = ∂_k Γ^i_jl - ∂_l Γ^i_jk + Γ^i_km Γ^m_jl - Γ^i_lm Γ^m_jk
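As a sanity check on the component formula, here is a small Python sketch that evaluates it on the unit 2-sphere. The Christoffel symbols hard-coded in `gamma` are the standard ones for the round sphere in (θ, φ) coordinates; the function names, the finite-difference step, and the sample point `theta0` are assumptions made for illustration only.

```python
import math

# Coordinates on the unit 2-sphere: index 0 = theta, index 1 = phi.
def gamma(i, j, k, theta):
    """Christoffel symbols Γ^i_jk of the round unit sphere."""
    if i == 0 and j == 1 and k == 1:
        return -math.sin(theta) * math.cos(theta)
    if i == 1 and (j, k) in ((0, 1), (1, 0)):
        return math.cos(theta) / math.sin(theta)
    return 0.0

def d_gamma(direction, i, j, k, theta, h=1e-5):
    """Partial derivative of Γ^i_jk by central differences.

    The symbols depend on theta only, so the phi-derivative vanishes.
    """
    if direction != 0:
        return 0.0
    return (gamma(i, j, k, theta + h) - gamma(i, j, k, theta - h)) / (2 * h)

def riemann(i, j, k, l, theta):
    """R^i_jkl = ∂_k Γ^i_jl - ∂_l Γ^i_jk + Γ^i_km Γ^m_jl - Γ^i_lm Γ^m_jk."""
    value = d_gamma(k, i, j, l, theta) - d_gamma(l, i, j, k, theta)
    for m in range(2):
        value += gamma(i, k, m, theta) * gamma(m, j, l, theta)
        value -= gamma(i, l, m, theta) * gamma(m, j, k, theta)
    return value

theta0 = 1.1                      # arbitrary sample latitude
r_0101 = riemann(0, 1, 0, 1, theta0)
r_0110 = riemann(0, 1, 1, 0, theta0)
```

For the unit sphere the component R^θ_φθφ should equal sin²θ, so `r_0101` can be compared against `math.sin(theta0) ** 2`, and `r_0110` should be its negative.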
Now, since R is antisymmetric in its last two indices, we can think of these last two indices as representing a two-form:
½ R^α_βσλ dx^σ ∧ dx^λ = R^α_β = a matrix of 2-forms.
This viewpoint becomes very natural when noting the formula for R is much simpler:
R^α_β = d(Γ^α_βλ dx^λ) + Γ^α_σκ dx^σ ∧ Γ^κ_λβ dx^λ
= dω - ω ∧ ω
(Remember, ω is a matrix of one-forms, so the wedge product ω ∧ ω implies matrix multiplication as well, meaning it is a nontrivial combination of one-forms; if the matrix size were just 1 × 1, ω would just be a single one-form, and ω ∧ ω = 0, by antisymmetry of the wedge product.)
So, the connection ω is a matrix of one-forms, and the curvature R = dω - ω ∧ ω is a matrix of two-forms. We generalize this to the language of vector bundles, basically by replacing the letter R with the symbol Ω.
½ Ω^α_β ij dx^i ∧ dx^j is the curvature 2-form on the vector bundle V.
It is a matrix of 2-forms, and its equation is simply given by Ω = dω - ω ∧ ω.
Clearly, this reduces to the Riemann curvature tensor, for V = TM. Note that α and β run from 1 to k, and i and j run from 1 to n. In other words, these indices refer to vectors in different spaces; α and β represent vectors in the fiber, and i and j represent tangent vectors.
Since Ω is a tensor, it is possible to think of it as a multilinear map
Ω: T_pM ⊗ T_pM ⊗ F*_p ⊗ F_p → R
It is basically the same concept as the Riemann curvature; we transport a vector in the fiber around the manifold on two different paths, specified by two tangent vectors. The difference in the result of the two transports will be found by contracting Ω with the two directions and the original vector.
When we change basis in the fiber, it is fairly easy to check, by substituting ω' = Λ ω Λ⁻¹ + dΛ Λ⁻¹ into Ω' = dω' - ω' ∧ ω', that the inhomogeneous terms cancel and Ω transforms by
Ω' = Λ Ω Λ⁻¹
Now, why all the fuss with the Λ's? Why do we care how things transform under changes of fiber bases? Well, we'd like to be able to construct quantities whose components transform, but which can be identified with invariant objects. Currently, the curvature transforms like a (1,1) tensor on the fiber, and thus we can identify it as such, but ω's transformation law doesn't lend itself to anything of this nature. To put it more plainly, ω's components are one-forms over the base manifold, but they are not one-forms over the total space. When we deal with principal bundles, we will be able to define an invariant connection which will indeed be a one-form on the total space, but for now we have to settle with this half-defined object.
Remembering the language of fiber bundles, the Λ's all sit in the bundle's structure group, G. Now, if Λ can be found by taking the exponential map of a lie algebra element, κ,
Λ^α_β = exp{κ}^α_β
ω' = Λ ω Λ⁻¹ + dΛ Λ⁻¹
= exp{κ} ω exp{-κ} + dκ (the last term is exactly dκ when κ commutes with dκ, and holds to first order in κ otherwise)
Note, for small values of κ, this just looks like:
ω' = ω + dκ + [ κ, ω ]
highlighting the fact that ω and κ are both Lie algebra elements, and can be added and commuted to produce Lie algebra elements (matrix commutation is the multiplication law for the Lie algebra).
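The small-κ form follows from a short expansion, sketched here in LaTeX. The only assumption is that terms of second order in κ are dropped throughout.

```latex
\begin{aligned}
\Lambda &= e^{\kappa} = 1 + \kappa + O(\kappa^2), \qquad
\Lambda^{-1} = 1 - \kappa + O(\kappa^2), \qquad
d\Lambda = d\kappa + O(\kappa^2), \\
\omega' &= \Lambda\,\omega\,\Lambda^{-1} + d\Lambda\,\Lambda^{-1} \\
        &= (1+\kappa)\,\omega\,(1-\kappa) + d\kappa\,(1-\kappa) + O(\kappa^2) \\
        &= \omega + \kappa\,\omega - \omega\,\kappa + d\kappa + O(\kappa^2) \\
        &= \omega + d\kappa + [\kappa, \omega] + O(\kappa^2).
\end{aligned}
```

For a 1 × 1 (abelian) structure group the commutator vanishes and the law reduces to ω' = ω + dκ, which is the familiar form of a gauge transformation of a potential.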
Principal Bundles
In the case of a principal bundle, we are no longer able to abstract the tangent bundle's vector language any further, but we can still perform most of the same calculations, though much less concretely. We start by redefining parallel transport, for the case of a general fiber bundle.
The need for a definition of parallel transport is built into the language of fiber bundles. The projection map π: E → M allows us to definitively say which point p in M we are over, but given a point in E, we cannot say definitively which point in F we are at. F provides a local coordinate system for points in the neighborhood of p, but we can change these coordinates at will, thus they can tell us nothing about parallel transport. We need to provide additional information which tells us whether we are moving up or down the fiber as we move along a path in E. This additional data is our final notion of a "connection".
Start with a fiber bundle π: E → M, with fiber F. Imagine studying the tangent bundle of the total space, TE. This is a fairly complex object; it's the tangent bundle of the total space of a fiber bundle. Just bear with me. Now, imagine we look at a specific tangent space TuE at an arbitrary point u ∈ E. Further imagine that we have a means of decomposing the vector space TuE into a combination of two vector spaces:
TuE = VuE ⊕ HuE, where
VuE ≅ TfF is the "vertical" subspace, along the fiber F, and
HuE ≅ TpM is the "horizontal" subspace, along the base M.
(for some f and p corresponding to the point u ∈ E)
We haven't been very rigorous, but the idea here is that we are providing a definitive split between "horizontal" and "vertical" transport. Given a curve γ through E, we say that γ(t) is parallel-transported if the velocity γ'(t) ∈ H_γ(t)E for all t. Thus, if we can provide data which specifies a way of splitting T_uE = V_uE ⊕ H_uE, we have defined parallel transport in the fiber bundle.
In the special case that π is a principal bundle, P → M, with fiber G given by the structure group, this additional data can, in fact, be expressed as a lie-algebra-valued one-form over the total space, ω ∈ £ [ G ] ⊗ T*P. Note that we previously defined ω to sit in the cotangent space of the base manifold, T*M, but under the condition that its components transformed under a change of fiber basis. This new ω lives in the cotangent space of the total space, ω ∈ T*P. A principal bundle will be the most abstract space on which we can define curvature.
To find the appropriate lie-algebra-valued one-form, we first choose bundle coordinates (x,g) where x ∈ M, g ∈ G. Then, any one-form can be expressed as:
β = Akdxk + Bijdgij
It will become convenient for us to express this differently. We can remove the terms Bij from the expression by reparameterizing our fiber using the transition function g -1. In other words, we locally rotate g to the identity for a given one-form, giving us d1 = 0, leaving us with only the first piece to contend with.
β' = A' + 0
We write this cryptically as A' because we have not yet stated how A transforms. We are going to set things up so that A corresponds to our previous definition for the connection, i.e. that it transforms like
A' = Λ A Λ⁻¹ + dΛ Λ⁻¹
Setting Λ = g -1, we find that we can always get a parameterization such that β has the following form:
β = -g -1ij dgij + g -1 Ak g dxk
We transformed Ak like our old definition of a connection. It transforms annoyingly under fiber reparameterizations, but the expression above has just the right form to cancel terms picked up by A's unusual transformation law. We change our notation again, redefining what we mean by the connection ω:
ω = -g -1ij dgij + g -1 A g, where
A = Aka(x) Ta dxk.
Here, {Ta} is a basis for the lie algebra matrices, and {Aka(x)} is just a collection of coefficients for the {Ta}. We will now show that ω so defined provides a split of TuP into VuP ⊕ HuP.
First, we write down a basis for HuP:
{∂/∂xμ + Cμij ∂/∂gij}
The ∂/∂xμ are, of course, directions in the base manifold, while the ∂/∂gij correspond to directions along the fiber (in a given parameterization). Cμij are coefficients specified as input, selecting a definition of "horizontal".
Given ω, we can determine a set of C's, via the following equation:
Define HuP = {V ∈ TuP | ω(V) = 0}
We write a general vector V ∈ TuP as
V = (αij ∂/∂gij) + βμ (∂/∂xμ + Cμij ∂/∂gij)
And as we wrote before,
ω = -g -1ij dgij + g -1il Aμa(x) Talk gkj dxμ
When αij = 0, V ∈ HuP, meaning that
ω(V) = 0 when αij = 0, for all {βμ}
We can now compute the Cμij's:
ω(V) = -βμ g -1 Cμ + g -1 Ta g Aμa(x) βμ
Since this is true for all βμ, we find:
g -1 C = g -1 Ta g Aμa(x)
Cμij = Aμa(x) Taik gkj
The {Aμa(x)} is data supplied by ω, which tells us the Cμij, which nails down HuP.
HuP = {∂/∂xμ + Aμa(x) Taik gkj ∂/∂gij}
Parallel Transport
It will now be possible to write down an equation for parallel transport in a principal bundle. Let δ(t) be a path in the base M.
δ(t) = (x1(t), x2(t), ..., xn(t))
We want to perform parallel translation in the bundle, lifting δ(t) to Δ(t), with
π • Δ = δ, Δ = (xμ(t), g(t))
And under the requirement that dΔ/dt ∈ HΔ(t)P, for all t.
Explicitly,
d/dt = dxμ/dt ∂/∂xμ + dg/dt ∂/∂g ∈ HΔ(t)P
Thus, dx^μ/dt ∂/∂x^μ + dg/dt ∂/∂g = β^μ (∂/∂x^μ + A_μ^a T_a • g ∂/∂g)
This quickly reduces to the requirement:
dgij/dt - dxμ/dt Aμa(x) Taik gkj = 0
Given a connection ω specified by A, we simply solve this equation for g(t) to determine parallel transport on a principal bundle.
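As an illustration of "simply solve this equation", here is a minimal Python sketch that integrates the transport equation with a classical fourth-order Runge-Kutta step. Everything specific in it is an assumption chosen to keep the example checkable by hand: the group is SO(2) with a single generator `T`, the base is one-dimensional with path x(t) = t, and the potential is the made-up function A_x(x) = x.

```python
import math

T = [[0.0, -1.0], [1.0, 0.0]]     # single generator of so(2)

def a_x(x):
    """Hypothetical gauge-potential component A_x(x), for illustration."""
    return x

def path(t):
    """Base curve x(t) and its velocity dx/dt (one-dimensional base)."""
    return t, 1.0

def rhs(t, g):
    """Right-hand side of dg/dt = (dx/dt) A_x(x) T g."""
    x, dxdt = path(t)
    c = dxdt * a_x(x)
    return [[c * sum(T[i][k] * g[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def shifted(g, k, s):
    """Return g + s * k, entry by entry."""
    return [[g[i][j] + s * k[i][j] for j in range(2)] for i in range(2)]

def transport(t_end, steps=200):
    """Lift the base path, starting from the identity element of G."""
    g = [[1.0, 0.0], [0.0, 1.0]]
    h = t_end / steps
    t = 0.0
    for _ in range(steps):
        k1 = rhs(t, g)
        k2 = rhs(t + h / 2, shifted(g, k1, h / 2))
        k3 = rhs(t + h / 2, shifted(g, k2, h / 2))
        k4 = rhs(t + h, shifted(g, k3, h))
        g = [[g[i][j] + h / 6 * (k1[i][j] + 2 * k2[i][j]
                                 + 2 * k3[i][j] + k4[i][j])
              for j in range(2)] for i in range(2)]
        t += h
    return g

g_final = transport(1.0)
```

Because SO(2) is abelian, this toy case has the closed-form solution g(t) = exp{(t²/2) T}, a rotation by the angle t²/2, so `g_final` should be close to a rotation by 0.5. For a non-abelian group the same integrator applies, but the closed form becomes a path-ordered exponential.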
How now do we determine a rule for assigning curvature to a principal bundle? We cannot proceed with our previous analogy, for there is no obvious way of comparing group elements, as there is for vectors. In order to motivate our search for curvature on principal bundles, we take a second look at vector bundles, now in a more specific context.
Connections on Associated Vector Bundles
Recall that given a principal bundle, we can build associated vector bundles, given a representation ρ of the structure group, G. Mathematically, this is given by the product space P × V subject to the equivalence relation (x, g, v) ~ (x, g h⁻¹, ρ(h) v). Intuitively, we start with a vector space glued to one point of the base, and get the vector space at every other point by multiplying by the representation of the group fiber at that point.
It should come as no surprise that a connection on a principal bundle induces vector-bundle-connections on all of its associated vector bundles. Given an associated vector bundle with representation ρ, we can parallel-transport a vector V at a point p along a curve δ(t) in the base by the following method:
Look at the point (p, e) in the principal bundle over M, where e is the identity element of G. Parallel-transport the identity element along the curve δ(t) ⊂ M, giving the curve Δ(t) ⊂ P. Act with the representation of the parallel-transported group element on the vector Vp to get the vector Vδ(t), the parallel-transported vector in the vector bundle. Explicitly,
Vδ(t) = ρ(Δ(t)) • Vp
Recall that we already have an equation of motion for Δ(t) = g(t), given by
dgij/dt - dxμ/dt Aμa(x) Taik gkj = 0
These two equations induce an equation of motion for V:
dV/dt = ρ(dΔ(t)/dt) • Vp = ρ( dxμ/dt Aμa Ta ) • V
Where we have evaluated the group element at the identity, as this is part of our algorithm. Rewriting dV/dt = dxμ/dt ∂V/∂xμ,
dxμ/dt ∂V/∂xμ - dxμ/dt Aμa ρ(Ta) • V = 0
∇ V = 0
Where ∇ = dx^μ ( ∂/∂x^μ - A_μ^a ρ(T_a))
Remember that a representation of a lie group induces a representation of its lie algebra. This is what is meant by ρ(Ta). So, we find that the connection ω on the principal bundle induces the connection A = -Aa(x) ρ(Ta) on its associated vector bundles. This is the same kind of connection we previously defined for vector bundles; the curvature of an associated vector bundle can be directly given by:
F = dA - A ∧ A
When we constructed ω, we had this piece corresponding to A, which by itself was not a one-form on the total space, but it was a one-form defined on the base with a connection-like transformation law. As it turns out, this piece is the connection over the associated vector bundles of the principal bundle, when evaluated in the given representation.
Curvature on Principal Bundles
Finally, we are in a position to define curvature on principal bundles. Curvature over a principal bundle will be associated with the curvature that it induces over its associated vector bundles. We state the answer and show that it is consistent:
Ω = dω - ω ∧ ω
Where now ω = -g -1 dg + g -1 A g ∈ £(G) ⊗ T*P
In a given fiber parameterization,
Ω = d(-g⁻¹ dg + g⁻¹ A g) - (-g⁻¹ dg + g⁻¹ A g) ∧ (-g⁻¹ dg + g⁻¹ A g)
And, after a series of cancellations (remembering that the exterior derivative gives a minus sign in its Leibniz rule for one-forms)
Ω = g -1 (dA - A ∧ A) g
Remembering that the curvature of a vector bundle transforms by conjugation with the change-of-basis matrix under a change of fiber basis, this is motivation enough for us to define the curvature of a principal bundle in this way.
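For the curious, the cancellations can be spelled out as follows. This is a sketch in LaTeX that starts from the definition Ω = dω - ω ∧ ω with ω = -g⁻¹ dg + g⁻¹ A g, and assumes only the identity d(g⁻¹) = -g⁻¹ dg g⁻¹ and the graded Leibniz rule.

```latex
\begin{aligned}
d(-g^{-1}dg) &= g^{-1}dg \wedge g^{-1}dg, \\
d(g^{-1}Ag)  &= -\,g^{-1}dg \wedge g^{-1}Ag + g^{-1}(dA)\,g - g^{-1}A \wedge dg, \\
\omega \wedge \omega
  &= g^{-1}dg \wedge g^{-1}dg - g^{-1}dg \wedge g^{-1}Ag
     - g^{-1}A \wedge dg + g^{-1}(A \wedge A)\,g, \\
\Omega = d\omega - \omega \wedge \omega
  &= g^{-1}(dA)\,g - g^{-1}(A \wedge A)\,g
   = g^{-1}\,(dA - A \wedge A)\,g .
\end{aligned}
```

The pure-gauge term g⁻¹dg ∧ g⁻¹dg and both mixed terms drop out between dω and ω ∧ ω, leaving only the conjugated base curvature.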
Summary of Principal Bundles
The curvature over a principal bundle is found from the connection form ω = -g⁻¹ dg + g⁻¹ A g, a Lie-algebra-valued one-form over the total space, P. The curvature is Ω = dω - ω ∧ ω, a Lie-algebra-valued two-form. For each representation of the structure group G, there exists an associated vector bundle, with vector bundle connection given by A, which is a Lie-algebra-valued one-form over the base, and curvature F = dA - A ∧ A, a Lie-algebra-valued two-form over the base, and (1,1) tensor over the vector space fiber.
Curvature over a principal bundle gives the basis for a classical theory of everything, in which the curvature F is equated to the force field strength, and A is the gauge potential. Sadly, I don't have enough characters left to sa