Got math?

This is the E2 usergroup e2, which was originally a proper subset of the e2science group (e^2e2science). At first, the group name was e π i + 1 = 0, but some whiners thought that would be too hard to send a message:

/msg e_π_i_+_1_=_0 After typing that, I forgot what I was going to say.

So here we are instead with a simpler (but more boring) name e2theipiplus1equalszero. Update: more complainers. Now we're just e^2. (Now does that mean e² or e XOR 2? That is my secret.) Tough luck for those without a caret key.

e^2 often erupts into long mathematical discussions, giving members more /msgs than they care to digest. So, you have a few other options if the math is going to get seriously hairy:

  • Send to only those members of the group currently online:

    /msg? e^2 Brouwer was a Communist!
     
  • Speak in the chatterbox. But be prepared to give non-math noders headaches.
  • Add the room Mathspeak to the list of rooms you monitor in squawkbox or Gab Central. Mathspeak died of loneliness.

You may want to read some of these while you are calculating ln π.


Venerable members of this group:

Wntrmute, cjeris, s19aw, Brontosaurus, TanisNikana, abiessu, Siobhan, nol, flyingroc, krimson, Iguanaonastick, Eclectic Scion, haggai, redbaker, wazroth, small, Karl von Johnson, Eidolos, Ryouga, SlackinWhileSleepin, ariels, quantumlemur, futilelord, Leucosia, RPGeek, Anark, ceylonbreakfast, fledy, Oolong@+, DutchDemon, jrn, allispaul, greth, chomps, JavaBean, waverider37, IWhoSawTheFace, DTal, not_crazy_yet, Singing Raven, pandorica, Gorgonzola, memplex, tubular, Tom Rook
This group of 45 members is led by Wntrmute

Introduction

Sheaf theory is a subfield of topology that generalizes the notion of gluing continuous functions together and specializes in describing objects by their local data on open sets. Above all, it's a tool, usable to define connectedness, manifolds, the fundamental group, and several other concepts in algebraic topology and differential geometry.

Its history may be more interesting than the subject itself: the French topologist Jean Leray developed it in the oodles of free time he had as a prisoner of the Nazis in Edelbach, Austria. This was part of a general trend that my topology teacher described as "the Nazis' biggest blunder" — while German minds fled as the political climate changed, the imprisoned French did tons of research in prison, so that by the war's end France was the new math hotbed. "The Nazis wanted to preserve German culture, and they ended up destroying it. It's totally ridiculous."

One of the advantages of Sheaf theory is that you really don't need that much topology to study it. We define a topological space as a set X along with a set of subsets of X that we consider "open," such that any union or finite intersection of open sets is also open. (In particular, both X itself — the intersection of no sets — and the empty set — the union of no sets — are open.)
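To make those axioms concrete, here is a minimal sketch in Python; the three-point space, the particular open sets, and the helper name are all invented for illustration:

```python
# Check the open-set axioms on a tiny hypothetical example:
# X = {1, 2, 3} with opens {∅, {1}, {1,2}, X}, modelled as frozensets.
from itertools import combinations

X = frozenset({1, 2, 3})
opens = {frozenset(), frozenset({1}), frozenset({1, 2}), X}

def is_topology(X, opens):
    """Closure under union and (finite) intersection; for a finite
    collection, checking pairs suffices by induction."""
    if frozenset() not in opens or X not in opens:
        return False
    for U, V in combinations(opens, 2):
        if U | V not in opens or U & V not in opens:
            return False
    return True

print(is_topology(X, opens))
```

Dropping the set {1, 2} from `opens` while keeping {1} and adding {2} would break closure under union, and the check would return False.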

The delightful agricultural terminology is there in the French, too. I have no idea why.

Presheaves

A presheaf is a map assigning to each open set U in X a set F(U). (The "F" stands for "faisceau" — that means "sheaf"!) F(U) is just an arbitrary set, not necessarily related to the structure of X. We call its elements "sections" because of one particular example below. We do require it to restrict properly — for every open set U, and every open subset V of U, there must be a map ρV,U that carries each element of F(U) to an element of F(V), such that ρU,U is the identity map for any U, and such that, for any W subset of V subset of U, ρW,V ° ρV,U = ρW,U.

So we have a set for every open set, and we can restrict them properly. Here are a few examples.

  • The constant presheaf at a set S is the presheaf FS such that for any open U, FS(U)=S. The restriction maps are the identity.
  • For any spaces X and Y, the presheaf of continuous functions from X to Y. This is usually written OYX, and it returns for any U in X the set of continuous functions from U to Y. The restriction maps are simply restriction of functions.
  • Similarly, if Y is a metric space (especially the reals), we can form the presheaf of bounded functions from X to Y.
  • Given a projection map p from Y to X, the presheaf of sections Γ(p) or Γ(Y/X) gives, for Γ(Y/X)(U), the set of continuous functions s from U to Y such that p ° s is the identity on U. If Y is R², X is R, and p is the expected projection, this gives graphs of continuous functions y=f(x).
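The two presheaf laws can be sketched in a few lines, using the constant presheaf on a made-up three-point space; every name here is illustrative, not from any library:

```python
# A minimal model of the constant presheaf F_S, with identity restriction
# maps, checking the two presheaf laws from the definition above.
S = {"a", "b"}
opens = [frozenset(), frozenset({1}), frozenset({1, 2})]

def F(U):                 # F_S(U) = S for every open U
    return S

def rho(V, U, s):         # restriction from F(U) to F(V): the identity
    assert V <= U
    return s

# Law 1: rho_{U,U} is the identity.
assert all(rho(U, U, s) == s for U in opens for s in F(U))

# Law 2: rho_{W,V} ∘ rho_{V,U} = rho_{W,U} for W ⊆ V ⊆ U.
W, V, U = opens[0], opens[1], opens[2]
assert all(rho(W, V, rho(V, U, s)) == rho(W, U, s) for s in F(U))
print("presheaf laws hold")
```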

If you know any category theory, you can more concisely define the category of presheaves on X with values in some category S as the category of contravariant functors from Op(X) to S.

Sheaves

Sheaves attach to the properties of presheaves the ability to paste together sections on different domains. Take an open cover of an open set U by {Va} for a in some index set A. Choose a section sa in F(Va), for each a, such that whenever Vb and Vc intersect, ρVb∩Vc,Vb(sb) =ρVb∩Vc,Vc(sc). F is a sheaf if there exists a unique s in F(U) such that ρVa,U(s)=sa for each a.

That seems like a mouthful. In essence, we want our sheaves to have a good local-to-global property. This is a generalization of the statement, sometimes known as the pasting lemma, that given continuous functions that agree on the intersections of their domains, one can define a continuous function on the union of their domains that restricts to each of the original functions. The best way to think of this property is to look at the places where it fails. The presheaf of bounded functions is a good example: one can cover R with open intervals and take a section on each interval equal to, say, the restriction of y=x. Then the sections all agree, and they are each bounded, but the obvious global extension - the function y=x on R - is not a section of the presheaf because it is not bounded (and there is no section that does work). So the presheaf of bounded functions is not a sheaf.
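The pasting lemma itself can be sketched directly; the intervals and functions below are invented for illustration:

```python
# Two functions agreeing on the overlap of their open-interval domains
# glue to a single function on the union of the domains.

def f(x):            # defined on (-1, 0.5)
    return x * x

def g(x):            # defined on (0, 1.5); agrees with f on the overlap (0, 0.5)
    return x * x

def glued(x):
    if -1 < x < 0.5:
        return f(x)
    if 0 < x < 1.5:
        return g(x)
    raise ValueError("outside the union of the domains")

# The glued function restricts back to f and g on their own domains:
assert glued(0.25) == f(0.25) == g(0.25)   # on the overlap
assert glued(-0.5) == f(-0.5)
assert glued(1.0) == g(1.0)
```

For the bounded-function counterexample, no such `glued` exists in the presheaf: each restriction of y = x to a bounded interval is bounded, but y = x on all of R is not.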

Another counterexample in the opposite direction is the constant presheaf at a set S with at least two elements. Cover Ø, the empty set, with the empty collection. Then the "section for each element of the cover" is in fact a tuple containing no sections - it is therefore the empty tuple (). Vacuously, its elements agree on intersections. But there are sections on Ø: every element of S is a section. And if a and b are two such sections, then they both restrict to nothing on no sets, so that they both satisfy the sheaf property. So the constant presheaf is not a sheaf, because no unique section exists in this case.

Here are some sheaves:

  • The presheaf of continuous functions from X to Y: the pasting lemma says precisely that compatible continuous functions glue.
  • The presheaf of sections Γ(p) of any projection p: compatible local sections patch together into a section on the union of their domains.
  • The skyscraper sheaf at a point x with value S, which assigns S to every open set containing x and a one-point set * to every other open set.

Morphisms

As noted above, presheaves (and sheaves) over a space form a category. The corresponding morphism is, well, the sheaf morphism. A morphism from a presheaf F to a presheaf G on X is a collection of maps φU from F(U) to G(U), one for every open set U in X. We require these maps to commute with restrictions. In other words, for every open U and open V included in U, the following diagram must commute:

                    φU
        F(U) ----------------> G(U)
          |                      |
          |                      |
          |                      |
   FρV,U  |                      |  GρV,U
          |                      |
          |                      |
          |                      |
          V                      V
        F(V) ----------------> G(V)
                    φV

Morphisms can be composed by defining the new morphism's member functions F(U) -> G(U) to be the composition of the old ones from F(U) to G(U). We say that a morphism is an isomorphism if it has an inverse, or equivalently, if each of its member functions is a bijection.

Stalks

Stalks reduce sheaves and presheaves to their most local information. Surprisingly, this is enough to capture most of the information about a sheaf or presheaf, though it cannot account for everything.

We start by forming the disjoint union of the sections over all open sets containing some x. This is most easily described as the set of ordered pairs (U,s) where U contains x and s is a section in F(U). The stalk at x, Fx, is this set modulo the following equivalence relation: (U,s) ~ (V,t) if, for some open W ⊆ U∩V containing x, ρW,U(s) = ρW,V(t). What we are left with is each section "around x" in its "simplest form." We call this the germ of a section.

In the case of the sheaf of sections described above, you can visualize this pretty easily: the sections sit above the space, they restrict to become sections on smaller sets, and each stalk is the intersection of the vertical line above a point with the sections above it. If p actually is a projection, then the stalk at x is p⁻¹(x). We can compute other stalks similarly: for the constant presheaf, for instance, the stalk over any point is S, and for the skyscraper sheaf, the stalk over x is S and the stalk over any other point is *. (Hence the name "skyscraper.")

As I said above, you can capture information about a presheaf through its stalks. In particular, you can define an induced map on stalks based on a morphism on presheaves: if φ is a morphism from F to G and φU is its component map on U, let φx carry each (U,s) to (U,φU(s)). Because of the way the stalk is defined and the way morphisms commute with restrictions, this is actually well-defined. Moreover, a few convenient facts make this induced map useful: if the induced maps of two morphisms are equal for every x, then the morphisms are equal. If a morphism's induced map is a bijection for every x, the morphism is an isomorphism, and similarly, if the induced map is injective or surjective, the morphism has the respective property.

One seeming "fact" is the following: if a bijection exists between the stalks of two sheaves, they are isomorphic. But this isn't true, and here is a counterexample. On the unit circle S¹ in the complex plane, let p2 be the map taking z to z², and let D be the fold map taking two copies of the circle to itself. We can form the sheaf of sections of both these maps. Then the stalk at each point of Γ(p2) is two points (one for each square root), and that of Γ(D) is two points (one in each circle). In fact, for any open set not equal to the entire circle, the sections are isomorphic - simply two copies of the open set. But S¹ has no global section under Γ(p2), while it does (it has two) under Γ(D). So there is no morphism from Γ(D) to Γ(p2), even though their stalks are isomorphic.

The Espace Étalé and Sheafification

"Espace étalé" means, in French, a space that is "spread out" or "displayed." It is formed by "bundling" the stalks back into a meaningful global structure. Along the way, though, we manage to acquire the sheaf property in a very unexpected way. We define Ét(F) to be the disjoint union of the stalks of F on X, Ét(F)={(U,s,x)}. Along with this come two maps: for any (U,s) a map ~s : U → Ét(F) taking x to (U,s,x); and a projection map p : Ét(F) → X taking (U,s,x) to x. We topologize Ét(F) with the finest possible topology such that ~s is continuous for each (U,s). Then V is open in Ét(F) iff ~s^{-1}(V) is open in U for every (U,s).

This topology makes p a local homeomorphism - it is continuous and open, and injective on a neighborhood of every x. Surprisingly, if any p : Y → X is a local homeomorphism, then Y is homeomorphic to Ét(Γ(p)). This gives a correspondence — a category equivalence, actually — between sheaves and local homeomorphisms.

Now the set of all maps ~s from an open set U forms Γ(Ét(F)/X)(U), and we can combine all of these together to get the sheaf Γ(Ét(F)/X). But F didn't need to be a sheaf, just a presheaf. And the map s ↦ ~s actually forms a sheaf morphism between these two structures. We call this the unit morphism, and the sheaf Γ(Ét(F)/X) the sheafification of F, or aF for short. It is the closest approximation to F with the sheaf property. We can also prove that for any x, there is a stalk isomorphism between Fx and aFx. Also, for any morphism φ : F → G with G a sheaf, there is a unique morphism ~φ : aF → G such that ~φ ° unit = φ.

Finally, as you may have guessed, if F is a sheaf, the unit morphism is an isomorphism. So every sheaf is really a sheaf of sections. This interpretation might not always be convenient, as the espace étalé is often hard to conceptualize, but it's there if needed, and it explains why we use the word "section" for members of any presheaf or sheaf.

Mapping sheaves

There are two ways we can move sheaves between spaces, both ideas from category theory made more specific.

If f : X → Y is continuous, F is a sheaf on X, and G is a sheaf on Y, the direct image of F under f is the sheaf (f∗F)(V) = F(f⁻¹(V)), with a lowered star. The inverse image of G under f, with a raised star, is the sheaf (f*G)(U) = Γ((Ét(G) ×Y X)/X)(U). Here, A ×C B is the fiber product of A and B over C, defined from maps f : A → C, g : B → C as the set {(a,b) in A × B : f(a) = g(b)}.
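The direct image formula is mechanical enough to sketch on finite spaces; the spaces, the map, and the presheaf below are toy inventions, not standard examples:

```python
# A sketch of the direct image f_*F on finite spaces: a presheaf is modelled
# as a dict from open sets (frozensets) to sets of section labels.
X_opens = [frozenset(), frozenset({1}), frozenset({1, 2})]
Y_opens = [frozenset(), frozenset({"p"}), frozenset({"p", "q"})]

f = {1: "p", 2: "q"}          # a toy continuous map X → Y, as a dict

def preimage(V):
    return frozenset(x for x in f if f[x] in V)

# An arbitrary presheaf on X: one section label per open set.
F = {U: {"s%d" % len(U)} for U in X_opens}

def direct_image(V):
    """(f_*F)(V) = F(f^{-1}(V))"""
    return F[preimage(V)]

print(direct_image(frozenset({"p"})))   # the sections over f^{-1}({p}) = {1}
```

Continuity of f is what guarantees each `preimage(V)` is itself an open set of X, so the lookup into F always succeeds.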

Applications

Though interesting as a category, there isn't too much more to do with sheaves than that described in this article. Here are a few of the ways to extend sheaf theory.

In general topology and covering space theory, the correspondence between sheaves and local homeomorphisms demonstrated above is only a piece of the puzzle. Constant sheaves are the sheaves of sections of fold maps, that is, projections X × Sδ → X. In between these two, we have the locally constant sheaves, which are sheaves that are locally equal to a constant sheaf. These are the sheaves of sections of covering maps, maps p : E → X such that every x in X has a neighborhood U such that p⁻¹(U) is homeomorphic to U × p⁻¹(x).

In algebraic topology, we can define loop homotopy and the fundamental group using the above covering spaces. We can also define connectedness of a space: X is connected if, for every S, the set of global sections FS(X) of the constant sheaf FS is in bijection with S. This is tantamount to saying that the only continuous maps from X to any Sδ are constant. If X is not connected, we can define its π0(X) to be the set, with an associated global section s in Fπ0(X)(X), such that for any global section t of any other constant sheaf FS, there is a unique morphism φ : Fπ0(X) → FS such that φX(s) = t.

Perhaps most far-reachingly, Alexander Grothendieck made large strides in algebraic geometry in the 1950s by abstracting the idea of covering and replacing all the data about the points of the space with data about its open sets. This gave categories very similar to the category of the sheaves on the space. Nowadays, these categories are studied in their own right as topoi (singular topos), with connections not only to algebraic geometry and category theory, but also to mathematical logic.

What is a dispersion relation?

Well, that's a bit of a tricky question from a standing start. Essentially, it's something that tells you about how a specific type of wave (like a sound wave, or an electromagnetic wave, etc.) will act in a certain medium. So, before going any further, what does one mean by a 'wave'? Here's some maths... (don't be alarmed!) A 'wave' is something which looks like this:

exp(i(k x - ω t))

What does that mean? Well, exp is an exponential. k is one over the wavelength (times 2π, strictly), and ω is the frequency (again times 2π, strictly). If this doesn't make sense to you, keep reading; I'll try to provide some intuition through examples. A dispersion relation is an equation that relates ω and k. i, here, is the square root of minus one, and its presence indicates we're looking for waves - things which go up and down.

So, why care about a dispersion relation? Well, in a lot of physical cases, you can't just bung in any values of ω and k - the actual stuff your wave is travelling through has something to say about that. Let's look at a concrete example.

A concrete example

Consider speaking down a long, rectangular air duct. Air ducts don't like to transmit every sound (making your voice sound 'flatter' and more boring, unless your voice makes the metal vibrate and add tinny sounds of its own). If we do some maths, we get the dispersion relation for a rectangular waveguide:

ω² - cₛ²k² = cₛ²a⁻²(n² + m²)

Here, the geometry of the problem has taken the usual dispersion relation for sound in air (the left hand side) and added a constraint involving the numbers n and m on the right - these numbers are positive integers.
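To see the constraint in action, here's a sketch that solves the relation above for k². The sound speed of 343 m/s and duct scale a = 0.3 m are assumptions for illustration, not values from the text:

```python
import math

# Solving ω² − cₛ²k² = cₛ²a⁻²(n² + m²) for k².
c, a = 343.0, 0.3          # illustrative sound speed (m/s) and duct scale (m)

def k_squared(omega, n, m):
    return (omega**2 - (c / a)**2 * (n**2 + m**2)) / c**2

# The lowest (n, m) = (1, 1) mode has a cutoff frequency below which
# k² goes negative: k is imaginary and the wave does not propagate.
cutoff = (c / a) * math.sqrt(1**2 + 1**2)
assert k_squared(2 * cutoff, 1, 1) > 0      # above cutoff: travelling wave
assert k_squared(0.5 * cutoff, 1, 1) < 0    # below cutoff: no propagation
print("cutoff angular frequency:", cutoff, "rad/s")
```

This is the 'flattening' effect: frequencies below the cutoff of the lowest duct mode simply don't make it down the duct as travelling waves.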

That wasn't very interesting.

An interesting example - the Plateau-Rayleigh Instability

This is going to be more involved. See, earlier, when I said the presence of 'i' indicated we were looking for waves? We aren't necessarily going to find waves - at least not ones that travel. Some waves (called instabilities) don't travel at all, but just grow. Finding instabilities and understanding them is a fundamental part of applied mathematics.

Consider a cylinder of fluid, surrounded by nothing (air is close to being nothing). Imagine, for example, you've got some golden syrup, and you're pouring it nice and slow over some delicious waffles. Well, the vibrations from your hand (and the air currents in the room, and even thermal fluctuations) are going to make ti-iny waves on the surface of that cylinder. Let's see what happens.

For simplicity's sake, say that the long cylinder (which, without the waves, would have radius a) isn't moving, and initially has a tiny wave on its surface - to remind us that it's tiny, let's say ε is a really tiny thing, and so we write:

radius = a( 1 + ε exp(i(kx - ω t)))

All we've done is say: we have a cylinder; it has a little wave on its surface; for simplicity we're going to assume it's not initially moving (although it doesn't matter, actually). In principle, if we felt like doing some work, we could apply our knowledge of fluid dynamics to remove everything from the problem except k and ω. Well, (skipping ahead) these waves aren't going to travel. We find that for real values of k (here, you can pleasingly read 'real' to mean 'allowed'), ω is imaginary. For completeness, I include Lord Rayleigh's original result here (although it's nasty):

i ω = actually, this is too nasty to bother. Something 'real'

So what? You say. Well, this is a basic example of a stability argument. If ω is imaginary, i ω is real, and that means the size of our disturbance can grow. If our waves don't propagate, but stay where they are and grow exponentially with time, we know that a system is unstable (and won't occur in nature - or, not for long).
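The stability argument can be sketched with a toy dispersion relation; ω² = k² − 1 is invented purely for illustration and is not Rayleigh's actual result:

```python
import cmath

# Classify each mode of the toy relation ω² = k² − 1 by whether ω comes out
# real (a travelling wave) or imaginary (a growing disturbance: unstable).

def classify(k):
    omega = cmath.sqrt(k**2 - 1)
    return "travelling" if abs(omega.imag) < 1e-12 else "unstable"

assert classify(2.0) == "travelling"   # ω real: the wave propagates
assert classify(0.5) == "unstable"     # ω imaginary: the amplitude grows exponentially
```

The recipe is exactly the one in the text: solve the dispersion relation for ω, and check whether it picks up an imaginary part.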

But my syrup pouring showed no signs of being unstable!

These instabilities need time to grow. Try pouring it from a metre onto a plate on the floor, and watch it separate into droplets before hitting the plate.

Summary

A dispersion relationship is something that tells us how waves behave in a medium or situation; it also tells us if a system is unstable.

At least, unless we have a continuous spectrum. But I don't understand them yet.

An Asymptotic Derivation

The above nodes only give a single term - but we can get an infinite series if we try slightly harder. This series will be asymptotic and divergent - but still very useful if we only take the first few terms. Let us consider what we actually mean by a factorial. Well, it can be shown that the factorial function (defined on the natural numbers) has precisely one analytic continuation, given suitable assumptions about convexity or the equivalent: the Gamma function.

Gamma(z) = ∫ exp(-t + (z-1) ln(t)) dt

Where the integral runs between 0 and infinity along the real line. Take a look at that integrand. It's largest when t = (z-1), and is pretty damned small everywhere else. I'm going to apply Laplace's Method to the region around t = (z-1) and see if I can't get somewhere.

Write h(t) := -t + (z-1) ln(t) and Taylor expand around t = (z-1).

h(t) ~ (z-1)(-1 + ln(z-1)) + 0·(t - (z-1)) - (t - (z-1))²/(2(z-1)) + O((t - (z-1))³)

Why have I stopped there? After all, t is a variable - it runs from 0 to infinity, so I can hardly claim it's small. Well, it's not small. But h(t) is quite negative away from t = (z-1), and so the integrand is exponentially small; so, for my leading order term, I'm going to stop here. If I wanted higher order terms, I would simply keep more terms from h.

Okay, so this gives us

Gamma(z) ~ (z-1)^(z-1) · exp(1-z) · sqrt(2(z-1)) · ∫ exp(-u²) du

Where I have sneakily made the substitution (t-(z-1))²/(2(z-1)) = u², and the limits of the integral are now from -infinity to +infinity. Remember that away from t = (z-1), everything is pretty much zero (those magic words were exponentially small) and it doesn't matter if we integrate over that space or not. It's convenient if we do. We recall our favourite Gaussian integral is equal to sqrt(pi) and that, for this to be a good approximation, z must be large, to find:

z! ~ Gamma(z+1) ~ z^(z + 1/2) · sqrt(2π) · exp(-z).

Higher Order Terms

There are two approximations I made: to neglect the exponentially small region (which is really, really, really tiny, okay? There's no point keeping it here - although that's not always true) and to drop the higher-order terms from h. If I'd kept them in h, then I would have ended up with something that looks like:

exp(A + Bu² + Cu³ + ...)

in my integrand. Well, an easy thing to do here (though by no means the only thing) is to expand out some of that exponential:

exp(A + Bu²)(1 + Cu³ + ...)

which is another integral I can evaluate, term by term. Note that with the substitution I made earlier I'd pick up another power of z^{-1/2}, making this term in some sense smaller than the previous. Then I'd find a term which looked like 1/(12*z) in my expansion; the next lowest term.
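Both the leading-order term and the 1/(12z) correction are easy to check numerically against the exact factorial; z = 20 is an arbitrary test value:

```python
import math

# Stirling at z = 20: leading order, then one correction term.
z = 20
leading = z**(z + 0.5) * math.sqrt(2 * math.pi) * math.exp(-z)
corrected = leading * (1 + 1 / (12 * z))
exact = math.factorial(z)

assert abs(leading / exact - 1) < 0.005    # roughly 0.4% off at leading order
assert abs(corrected / exact - 1) < 1e-4   # much better with the 1/(12z) term
print("leading relative error:", leading / exact - 1)
```

This is the sense in which the series is useful despite diverging: each extra term buys roughly another factor of 1/z in accuracy, as long as you stop early.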

Why do this?

This one? It's a nice example of Laplace's Method, and is one of the more basic asymptotic problems. In general, if one has a nasty integral which one cannot evaluate analytically, it is often convenient to find an asymptotic expansion which is cheap computationally and retains the basic mathematical ingredients. Please believe me when I say this is not some cheap trick one might use once, but is a powerful tool in applied mathematics.

See Also

A symmetric function is a polynomial or rational function (quotient of polynomials) in n variables which remains invariant no matter how you permute the variables (e.g. swap x1 with x2). They feature prominently in Galois theory. The elementary symmetric functions appear (up to sign) as the coefficients of a polynomial in n indeterminates (i.e. the coefficients of f(t) = (t - x1)···(t - xn)), and the fundamental theorem of symmetric functions says that any symmetric function can be expressed as a polynomial or rational function of elementary symmetric functions. When the original function isn't symmetric, we can still say something interesting.

Theorem: Let g(x) be any polynomial where x = (x1, ..., xn) are n variables, and let s1, ..., sn be the elementary symmetric functions in n variables. Then g(x) can be written as a linear combination of monomials

x1^(ν1) x2^(ν2) ··· xn^(νn)

such that νi ≤ i - 1 and the coefficients of the monomials are polynomials in the si.

This theorem, seemingly due to Emil Artin, is a slight generalisation of the fundamental theorem of symmetric functions. It gives the closest possible expression of any polynomial in terms of symmetric functions, whether or not the original polynomial is symmetric. Or if you prefer, the fundamental theorem of symmetric functions comes as an easy corollary to this theorem.

The corollary is obvious. Observe that a monomial satisfying the constraint cannot be symmetrised, because the power of each xi is capped at i - 1 (in particular, x1 may not appear at all). Thus, if the original polynomial g(x) was symmetric, then the only way it can still be symmetric after being written in this form is if the only monomial with nonzero coefficient is the one for which all the νi are zero, i.e. the constant term. But then the constant term is a polynomial of elementary symmetric functions, proving the corollary.

The proof is an algorithm for putting g(x) in the desired form.

Proof: Let fn(t) := (t - x1)(t - x2)···(t - xn) = tⁿ - s1tⁿ⁻¹ + ··· + (-1)ⁿsn and define recursively

fi - 1(t) := fi(t)/(t - xi).

Three things are immediately clear:

  • The polynomial fi(t) has xi as a root, the other roots being the other xj with j < i, because it's just fn(t) with the last n - i linear factors divided away.
  • By synthetic division and by the recursive definition, the coefficients of fi(t) are polynomials in terms of the elementary symmetric functions and the xj with j > i.
  • The degree of fi(t) is i.

Now for the algorithm to put g(x) in the desired form. Since x1 is a root of f1(t), it is possible to express x1 in terms of the symmetric functions si and the rest of the xi with i > 1. Substitute this expression of x1 into g(x), and expand out the result, which does not contain any term with x1 now.

We proceed recursively as follows. Since x2 is a root of f2(t), it is possible to express x2² or any higher power in terms of the symmetric functions si and the rest of the xi with i > 2, with perhaps a few terms of x2 of degree less than 2. Substitute this expression of x2² (or higher) into g(x), and expand out the result, which no longer contains any term with x2² or higher degree.

Continuing this process - eliminating all third or higher powers of x3 with f3(t), all fourth or higher powers of x4 with f4(t), and so on - we obtain the desired form for g(x).

QED.

Let's work out an example. Unfortunately, the only way to make an interesting enough example involves heavy computations. I will work out some steps of the example, but I will leave most of the boring manipulations to Maxima or to a diligent reader.

Let us consider the symmetric polynomial in 3 variables

g(x) = x1²x2 + x1²x3 + x2²x1 + x2²x3 + x3²x1 + x3²x2

Now, in 3 variables, the fi(t) from the proof above are

f3(t) = t³ - s1t² + s2t - s3,
f2(t) = t² + (x3 - s1)t + (s2 - s1x3 + x3²),
f1(t) = t - s1 + x2 + x3.

Recall that f2 and f1 are obtained by symbolic synthetic division of the polynomial above them and that the remainders are zero. Also, recall at this point that the elementary symmetric functions in three variables are

s1 = x1 + x2 + x3,
s2 = x1x2 + x1x3 + x2x3,
s3 = x1x2x3.

Since f1(x1) = 0, f2(x2) = 0 and f3(x3) = 0, we obtain that

x1 = s1 - x2 - x3,
x2² = s1x2 + s1x3 - s2 - x2x3 - x3²,
x3³ = s1x3² - s2x3 + s3.

So, the algorithm now says to replace this expression for x1 into g(x), which after expanding everything out becomes

3x2x3² - s1x3² + 3x2²x3 - 4s1x2x3 + s1²x3 - s1x2² + s1²x2.

Note that we have succeeded in eliminating x1 from this expression. Now we do the same with x2², to obtain

-3x3³ + 3s1x3² - 3s2x3 + s1s2.

Finally we replace x3³ by its own expression to conclude that

g(x) = s1s2 - 3s3,

which is the expression of g(x) in terms of elementary symmetric functions that we sought.
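As a sanity check, the identity can be spot-checked numerically; the sample points are arbitrary:

```python
# Verify g(x) = s1·s2 − 3·s3 at a few arbitrary points.
def g(x1, x2, x3):
    return (x1**2 * x2 + x1**2 * x3 + x2**2 * x1
            + x2**2 * x3 + x3**2 * x1 + x3**2 * x2)

def via_symmetric(x1, x2, x3):
    s1 = x1 + x2 + x3
    s2 = x1 * x2 + x1 * x3 + x2 * x3
    s3 = x1 * x2 * x3
    return s1 * s2 - 3 * s3

for point in [(1, 2, 3), (0, -1, 5), (2, 2, 2)]:
    assert g(*point) == via_symmetric(*point)
print("identity holds at all sample points")
```

Of course, agreement at finitely many points doesn't prove a polynomial identity by itself, but it's a cheap way to catch an algebra slip in the manipulations above.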

No joke — there's a semi-serious proof involved here. We're looking for the smallest number that could easily be mistaken for prime but in fact is not. How do we find it? Well, since it's not prime, let's look for its prime factors.
  • The number can't be a multiple of two, because even numbers are too easy to spot.
  • The number can't be a multiple of three — there's an easy test for that.
  • The number can't be a multiple of five — thanks to our base ten number system, it's too easy to find those.
  • Seven? Sure, why not? Multiples of seven don't look special at all. But we need more than one prime factor -- everybody knows that 49 is seven squared.
  • The other factor can't be eleven — 7 x 11 = 77, obviously not prime. Multiples of 11, especially low ones, are fairly obvious.
  • But what about thirteen? 7 x 13 = 91. That...looks prime.
So there you have it. A rigorous proof that the smallest number that looks prime but isn't is 91. Use this to impress your friends and shame your enemies at cocktail parties.
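For the skeptical, the argument above can be run as a brute-force search; the `looks_prime` criteria below simply encode the bullet points:

```python
import math

# The smallest composite not divisible by 2, 3, 5, or 11 and not a
# perfect square, per the "proof" above.

def is_prime(n):
    return n > 1 and all(n % d for d in range(2, math.isqrt(n) + 1))

def looks_prime(n):
    return (not is_prime(n)                      # actually composite
            and n % 2 and n % 3 and n % 5        # no easy divisibility tests
            and n % 11                           # 77 is too obvious
            and math.isqrt(n)**2 != n)           # 49 is too obvious

smallest = next(n for n in range(2, 200) if looks_prime(n))
assert smallest == 91 and 91 == 7 * 13
```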

But what about the smallest number that can't be described in fewer than 15 words?