Got math?

This is the E2 usergroup e2, which was originally a proper subset of the e2science group (e^2e2science). At first, the group name was e π i + 1 = 0, but some whiners thought that would be too hard to send a message:

/msg e_π_i_+_1_=_0 After typing that, I forgot what I was going to say.

So here we are instead with a simpler (but more boring) name e2theipiplus1equalszero. Update: more complainers. Now we're just e^2. (Now does that mean e² or e XOR 2? That is my secret.) Tough luck for those without a caret key.

e^2 often erupts into long mathematical discussions, giving members more /msgs than they care to digest. So, you have a few other options if the math is going to get seriously hairy:

  • Send to only those members of the group currently online:

    /msg? e^2 Brouwer was a Communist!
     
  • Speak in the chatterbox. But be prepared to give non-math noders headaches.
  • Add the room Mathspeak to the list of rooms you monitor in squawkbox or Gab Central. Mathspeak died of loneliness.

You may want to read some of these while you are calculating ln π.


Venerable members of this group:

Wntrmute, cjeris, s19aw, Brontosaurus, TanisNikana, abiessu, Siobhan, nol, flyingroc, krimson, Iguanaonastick, Eclectic Scion, haggai, redbaker, wazroth, small, Karl von Johnson, Eidolos, Ryouga, SlackinWhileSleepin, ariels, quantumlemur, futilelord, Leucosia, RPGeek, Anark, ceylonbreakfast, fledy, Oolong@+, DutchDemon, jrn, allispaul, greth, chomps, JavaBean, waverider37, IWhoSawTheFace, DTal, not_crazy_yet, Singing Raven, pandorica, Gorgonzola, memplex, tubular, Tom Rook
This group of 45 members is led by Wntrmute

Introduction

Sheaf theory is a subfield of topology that generalizes the notion of gluing continuous functions together and specializes in describing objects by their local data on open sets. Above all, it's a tool, usable to define connectedness, manifolds, the fundamental group, and several other concepts in algebraic topology and differential geometry.

Its history may be more interesting than the subject itself: the French topologist Jean Leray developed it in the oodles of free time he had as a prisoner of the Nazis in Edelbach, Austria. This was part of a general trend that my topology teacher described as "the Nazis' biggest blunder" — while German minds fled as the political climate changed, the imprisoned French did tons of research in prison, so that by the war's end France was the new math hotbed. "The Nazis wanted to preserve German culture, and they ended up destroying it. It's totally ridiculous."

One of the advantages of Sheaf theory is that you really don't need that much topology to study it. We define a topological space as a set X along with a set of subsets of X that we consider "open," such that any union or finite intersection of open sets is also open. (In particular, both X itself — the intersection of no sets — and the empty set — the union of no sets — are open.)
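To make those axioms concrete, here is a minimal sketch in Python; the three-point space, the particular open sets, and the helper name are all invented for illustration:

```python
# Check the open-set axioms on a tiny hypothetical example:
# X = {1, 2, 3} with opens {∅, {1}, {1,2}, X}, modelled as frozensets.
from itertools import combinations

X = frozenset({1, 2, 3})
opens = {frozenset(), frozenset({1}), frozenset({1, 2}), X}

def is_topology(X, opens):
    """Closure under union and (finite) intersection; for a finite
    collection, checking pairs suffices by induction."""
    if frozenset() not in opens or X not in opens:
        return False
    for U, V in combinations(opens, 2):
        if U | V not in opens or U & V not in opens:
            return False
    return True

print(is_topology(X, opens))
```

Dropping the set {1, 2} from `opens` while keeping {1} and adding {2} would break closure under union, and the check would return False.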

The delightful agricultural terminology is there in the French, too. I have no idea why.

Presheaves

A presheaf is a map assigning to each open set U in X a set F(U). (The "F" stands for "faisceau" — that means "sheaf"!) F(U) is just an arbitrary set, not necessarily related to the structure of X. We call its elements "sections" because of one particular example below. We do require it to restrict properly — for every open set U, and every open subset V of U, there must be a map ρV,U that carries each element of F(U) to an element of F(V), such that ρU,U is the identity map for any U, and such that, for any W subset of V subset of U, ρW,V ° ρV,U = ρW,U.

So we have a set for every open set, and we can restrict them properly. Here are a few examples.

  • The constant presheaf at a set S is the presheaf FS such that for any open U, FS(U)=S. The restriction maps are the identity.
  • For any spaces X and Y, the presheaf of continuous functions from X to Y. This is usually written OYX, and it returns for any U in X the set of continuous functions from U to Y. The restriction maps are simply restriction of functions.
  • Similarly, if Y is a metric space (especially the reals), we can form the presheaf of bounded functions from X to Y.
  • Given a projection map p from Y to X, the presheaf of sections Γ(p) or Γ(Y/X) gives, for Γ(Y/X)(U), the set of continuous functions s from U to Y such that p ° s is the identity on U. If Y is R², X is R, and p is the expected projection, this gives graphs of continuous functions y=f(x).
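The two presheaf laws can be sketched in a few lines, using the constant presheaf on a made-up three-point space; every name here is illustrative, not from any library:

```python
# A minimal model of the constant presheaf F_S, with identity restriction
# maps, checking the two presheaf laws from the definition above.
S = {"a", "b"}
opens = [frozenset(), frozenset({1}), frozenset({1, 2})]

def F(U):                 # F_S(U) = S for every open U
    return S

def rho(V, U, s):         # restriction from F(U) to F(V): the identity
    assert V <= U
    return s

# Law 1: rho_{U,U} is the identity.
assert all(rho(U, U, s) == s for U in opens for s in F(U))

# Law 2: rho_{W,V} ∘ rho_{V,U} = rho_{W,U} for W ⊆ V ⊆ U.
W, V, U = opens[0], opens[1], opens[2]
assert all(rho(W, V, rho(V, U, s)) == rho(W, U, s) for s in F(U))
print("presheaf laws hold")
```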

If you know any category theory, you can more concisely define the category of presheaves on X with values in some category S as the category of contravariant functors from Op(X) to S.

Sheaves

Sheaves attach to the properties of presheaves the ability to paste together sections on different domains. Take an open cover of an open set U by {Va} for a in some index set A. Choose a section sa in F(Va), for each a, such that whenever Vb and Vc intersect, ρVb∩Vc,Vb(sb) =ρVb∩Vc,Vc(sc). F is a sheaf if there exists a unique s in F(U) such that ρVa,U(s)=sa for each a.

That seems like a mouthful. In essence, we want our sheaves to have a good local-to-global property. This is a generalization of the statement, sometimes known as the pasting lemma, that given continuous functions that agree on the intersections of their domains, one can define a continuous function on the union of their domains that restricts to each of the original functions. The best way to think of this property is to look at the places where it fails. The presheaf of bounded functions is a good example: one can cover R with open intervals and take a section on each interval equal to, say, the restriction of y=x. Then the sections all agree, and they are each bounded, but the obvious global extension - the function y=x on R - is not a section of the presheaf because it is not bounded (and there is no section that does work). So the presheaf of bounded functions is not a sheaf.
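The pasting lemma itself can be sketched directly; the intervals and functions below are invented for illustration:

```python
# Two functions agreeing on the overlap of their open-interval domains
# glue to a single function on the union of the domains.

def f(x):            # defined on (-1, 0.5)
    return x * x

def g(x):            # defined on (0, 1.5); agrees with f on the overlap (0, 0.5)
    return x * x

def glued(x):
    if -1 < x < 0.5:
        return f(x)
    if 0 < x < 1.5:
        return g(x)
    raise ValueError("outside the union of the domains")

# The glued function restricts back to f and g on their own domains:
assert glued(0.25) == f(0.25) == g(0.25)   # on the overlap
assert glued(-0.5) == f(-0.5)
assert glued(1.0) == g(1.0)
```

For the bounded-function counterexample, no such `glued` exists in the presheaf: each restriction of y = x to a bounded interval is bounded, but y = x on all of R is not.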

Another counterexample in the opposite direction is the constant presheaf at a set S with at least two elements. Cover Ø, the empty set, with the empty collection. Then the "section for each element of the cover" is in fact a tuple containing no sections - it is therefore the empty tuple (). Vacuously, its elements agree on intersections. But there are sections on Ø: every element of S is a section. And if a and b are two such sections, then they both restrict to nothing on no sets, so that they both satisfy the sheaf property. So the constant presheaf is not a sheaf, because no unique section exists in this case.

Here are some sheaves:

  • The presheaf of continuous functions from X to Y: the pasting lemma says precisely that compatible continuous functions glue.
  • The presheaf of sections Γ(p) of any projection p: compatible local sections patch together into a section on the union of their domains.
  • The skyscraper sheaf at a point x with value S, which assigns S to every open set containing x and a one-point set * to every other open set.

Morphisms

As noted above, presheaves (and sheaves) over a space form a category. The corresponding morphism is, well, the sheaf morphism. A morphism from a presheaf F to a presheaf G on X is a collection of maps φU from F(U) to G(U), one for every open set U in X. We require these maps to commute with restrictions. In other words, for every open U and open V included in U, the following diagram must commute:

                    φU
        F(U) ----------------> G(U)
          |                      |
          |                      |
          |                      |
   FρV,U  |                      |  GρV,U
          |                      |
          |                      |
          |                      |
          V                      V
        F(V) ----------------> G(V)
                    φV

Morphisms can be composed by defining the new morphism's member functions F(U) -> G(U) to be the composition of the old ones from F(U) to G(U). We say that a morphism is an isomorphism if it has an inverse, or equivalently, if each of its member functions is a bijection.

Stalks

Stalks reduce sheaves and presheaves to their most local information. Surprisingly, this is enough to capture most of the information about a sheaf or presheaf, though it cannot account for everything.

We start by forming the disjoint union of the sections over all open sets containing some x. This is most easily described as the set of ordered pairs (U,s) where U contains x and s is a section in F(U). The stalk at x, Fx, is this set modulo the following equivalence relation: (U,s) ~ (V,t) if, for some open W ⊆ U∩V containing x, ρW,U(s) = ρW,V(t). What we are left with is each section "around x" in its "simplest form." We call this the germ of a section.

In the case of the sheaf of sections described above, you can visualize this pretty easily: the sections sit above the space, they restrict to become sections on smaller sets, and each stalk is the intersection of the vertical line above a point with the sections above it. If p actually is a projection, then the stalk at x is p⁻¹(x). We can compute other stalks similarly: for the constant presheaf, for instance, the stalk over any point is S, and for the skyscraper sheaf, the stalk over x is S and the stalk over any other point is *. (Hence the name "skyscraper.")

As I said above, you can capture information about a presheaf through its stalks. In particular, you can define an induced map on stalks based on a morphism on presheaves: if φ is a morphism from F to G and φU is its component map on U, let φx carry each (U,s) to (U,φU(s)). Because of the way the stalk is defined and the way morphisms commute with restrictions, this is actually well-defined. Moreover, a few convenient facts make this induced map useful: if the induced maps of two morphisms are equal for every x, then the morphisms are equal. If a morphism's induced map is a bijection for every x, the morphism is an isomorphism, and similarly, if the induced map is injective or surjective, the morphism has the respective property.

One seeming "fact" is the following: if a bijection exists between the stalks of two sheaves, they are isomorphic. But this isn't true, and here is a counterexample. On the unit circle S¹ in the complex plane, let p2 be the map taking z to z², and let D be the fold map taking two copies of the circle to itself. We can form the sheaf of sections of both these maps. Then the stalk at each point of Γ(p2) is two points (one for each square root), and that of Γ(D) is two points (one in each circle). In fact, for any open set not equal to the entire circle, the sections are isomorphic - simply two copies of the open set. But S¹ has no global section under Γ(p2), while it does (it has two) under Γ(D). So there is no morphism from Γ(D) to Γ(p2), even though their stalks are isomorphic.

The Espace Étalé and Sheafification

"Espace étalé" means, in French, a space that is "spread out" or "displayed." It is formed by "bundling" the stalks back into a meaningful global structure. Along the way, though, we manage to acquire the sheaf property in a very unexpected way. We define Ét(F) to be the disjoint union of the stalks of F on X, Ét(F)={(U,s,x)}. Along with this come two maps: for any (U,s) a map ~s : U → Ét(F) taking x to (U,s,x); and a projection map p : Ét(F) → X taking (U,s,x) to x. We topologize Ét(F) with the finest possible topology such that ~s is continuous for each (U,s). Then V is open in Ét(F) iff ~s^{-1}(V) is open in U for every (U,s).

This topology makes p a local homeomorphism - it is continuous and open, and injective on a neighborhood of every x. Surprisingly, if any p : Y → X is a local homeomorphism, then Y is homeomorphic to Ét(Γ(p)). This gives a correspondence — a category equivalence, actually — between sheaves and local homeomorphisms.

Now the set of all maps ~s from an open set U forms Γ(Ét(F)/X)(U), and we can combine all of these together to get the sheaf Γ(Ét(F)/X). But F didn't need to be a sheaf, just a presheaf. And the map s ↦ ~s actually forms a sheaf morphism between these two structures. We call this the unit morphism, and the sheaf Γ(Ét(F)/X) the sheafification of F, or aF for short. It is the closest approximation to F with the sheaf property. We can also prove that for any x, there is a stalk isomorphism between Fx and aFx. Also, for any morphism φ : F → G with G a sheaf, there is a unique morphism ~φ : aF → G such that ~φ ° unit = φ.

Finally, as you may have guessed, if F is a sheaf, the unit morphism is an isomorphism. So every sheaf is really a sheaf of sections. This interpretation might not always be convenient, as the espace étalé is often hard to conceptualize, but it's there if needed, and it explains why we use the word "section" for members of any presheaf or sheaf.

Mapping sheaves

There are two ways we can move sheaves between spaces, both ideas from category theory made more specific.

If f : X → Y is continuous, F is a sheaf on X, and G is a sheaf on Y, the direct image of F under f is the sheaf (f∗F)(V) = F(f⁻¹(V)), with a lowered star. The inverse image of G under f, with a raised star, is the sheaf (f*G)(U) = Γ((Ét(G) ×Y X)/X)(U). Here, A ×C B is the fiber product of A and B over C, defined from maps f : A → C, g : B → C as the set {(a,b) in A × B : f(a) = g(b)}.
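The direct image formula is mechanical enough to sketch on finite spaces; the spaces, the map, and the presheaf below are toy inventions, not standard examples:

```python
# A sketch of the direct image f_*F on finite spaces: a presheaf is modelled
# as a dict from open sets (frozensets) to sets of section labels.
X_opens = [frozenset(), frozenset({1}), frozenset({1, 2})]
Y_opens = [frozenset(), frozenset({"p"}), frozenset({"p", "q"})]

f = {1: "p", 2: "q"}          # a toy continuous map X → Y, as a dict

def preimage(V):
    return frozenset(x for x in f if f[x] in V)

# An arbitrary presheaf on X: one section label per open set.
F = {U: {"s%d" % len(U)} for U in X_opens}

def direct_image(V):
    """(f_*F)(V) = F(f^{-1}(V))"""
    return F[preimage(V)]

print(direct_image(frozenset({"p"})))   # the sections over f^{-1}({p}) = {1}
```

Continuity of f is what guarantees each `preimage(V)` is itself an open set of X, so the lookup into F always succeeds.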

Applications

Though interesting as a category, there isn't too much more to do with sheaves than that described in this article. Here are a few of the ways to extend sheaf theory.

In general topology and covering space theory, the correspondence between sheaves and local homeomorphisms demonstrated above is only a piece of the puzzle. Constant sheaves are the sheaves of sections of fold maps, that is, projections X × Sδ → X. In between these two, we have the locally constant sheaves, which are sheaves that are locally equal to a constant sheaf. These are the sheaves of sections of covering maps, maps p : E → X such that every x in X has a neighborhood U such that p⁻¹(U) is homeomorphic to U × p⁻¹(x).

In algebraic topology, we can define loop homotopy and the fundamental group using the above covering spaces. We can also define connectedness of a space: X is connected if, for every S, the set of global sections FS(X) of the constant sheaf FS is in bijection with S. This is tantamount to saying that the only continuous maps from X to any Sδ are constant. If X is not connected, we can define its π0(X) to be the set, with an associated global section s in Fπ0(X)(X), such that for any global section t of any other constant sheaf FS, there is a unique morphism φ : Fπ0(X) → FS such that φX(s) = t.

Perhaps most far-reachingly, Alexander Grothendieck made large strides in algebraic geometry in the 1950s by abstracting the idea of covering and replacing all the data about the points of the space with data about its open sets. This gave categories very similar to the category of the sheaves on the space. Nowadays, these categories are studied in their own right as topoi (singular topos), with connections not only to algebraic geometry and category theory, but also to mathematical logic.

What is a dispersion relation?

Well, that's a bit of a tricky question from a standing start. Essentially, it's something that tells you about how a specific type of wave (like a sound wave, or an electromagnetic wave, etc.) will act in a certain medium. So, before going any further, what does one mean by a 'wave'? Here's some maths... (don't be alarmed!) A 'wave' is something which looks like this:

exp(i(k x - ω t))

What does that mean? Well, exp is an exponential. k is one over the wavelength (times 2π, strictly), and ω is the frequency (again times 2π, strictly). If this doesn't make sense to you, keep reading; I'll try to provide some intuition through examples. A dispersion relation is an equation that relates ω and k. i, here, is the square root of minus one, and its presence indicates we're looking for waves - things which go up and down.

So, why care about a dispersion relation? Well, in a lot of physical cases, you can't just bung in any values of ω and k - the actual stuff your wave is travelling through has something to say about that. Let's look at a concrete example.

A concrete example

Consider speaking down a long, rectangular air duct. Air ducts don't like to transmit every sound (making your voice sound 'flatter' and more boring, unless your voice makes the metal vibrate and add tinny sounds of its own). If we do some maths, we get the dispersion relation for a rectangular waveguide:

ω² - cₛ²k² = cₛ²a⁻²(n² + m²)

Here, the geometry of the problem has taken the usual dispersion relation for sound in air (the left hand side) and added a constraint involving the numbers n and m on the right - these numbers are positive integers.
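To see the constraint in action, here's a sketch that solves the relation above for k². The sound speed of 343 m/s and duct scale a = 0.3 m are assumptions for illustration, not values from the text:

```python
import math

# Solving ω² − cₛ²k² = cₛ²a⁻²(n² + m²) for k².
c, a = 343.0, 0.3          # illustrative sound speed (m/s) and duct scale (m)

def k_squared(omega, n, m):
    return (omega**2 - (c / a)**2 * (n**2 + m**2)) / c**2

# The lowest (n, m) = (1, 1) mode has a cutoff frequency below which
# k² goes negative: k is imaginary and the wave does not propagate.
cutoff = (c / a) * math.sqrt(1**2 + 1**2)
assert k_squared(2 * cutoff, 1, 1) > 0      # above cutoff: travelling wave
assert k_squared(0.5 * cutoff, 1, 1) < 0    # below cutoff: no propagation
print("cutoff angular frequency:", cutoff, "rad/s")
```

This is the 'flattening' effect: frequencies below the cutoff of the lowest duct mode simply don't make it down the duct as travelling waves.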

That wasn't very interesting.

An interesting example - the Plateau-Rayleigh Instability

This is going to be more involved. See, earlier, when I said the presence of 'i' indicated we were looking for waves? We aren't necessarily going to find waves - at least not ones that travel. Some waves (called instabilities) don't travel at all, but just grow. Finding instabilities and understanding them is a fundamental part of applied mathematics.

Consider a cylinder of fluid, surrounded by nothing (air is close to being nothing). Imagine, for example, you've got some golden syrup, and you're pouring it nice and slow over some delicious waffles. Well, the vibrations from your hand (and the air currents in the room, and even thermal fluctuations) are going to make ti-iny waves on the surface of that cylinder. Let's see what happens.

For simplicity's sake, say that the long cylinder (which, without the waves, would have radius a) isn't moving, and initially has a tiny wave on its surface - to remind us that it's tiny, let's say ε is a really tiny thing, and so we write:

radius = a( 1 + ε exp(i(kx - ω t)))

All we've done is say: we have a cylinder; it has a little wave on its surface; for simplicity we're going to assume it's not initially moving (although it doesn't matter, actually). In principle, if we felt like doing some work, we could apply our knowledge of fluid dynamics to remove everything from the problem except k and ω. Well, (skipping ahead) these waves aren't going to travel. We find that for real values of k (here, you can pleasingly read 'real' to mean 'allowed'), ω is imaginary. For completeness, I include Lord Rayleigh's original result here (although it's nasty):

i ω = actually, this is too nasty to bother. Something 'real'

So what? You say. Well, this is a basic example of a stability argument. If ω is imaginary, i ω is real, and that means the size of our disturbance can grow. If our waves don't propagate, but stay where they are and grow exponentially with time, we know that a system is unstable (and won't occur in nature - or, not for long).
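The stability argument can be sketched with a toy dispersion relation; ω² = k² − 1 is invented purely for illustration and is not Rayleigh's actual result:

```python
import cmath

# Classify each mode of the toy relation ω² = k² − 1 by whether ω comes out
# real (a travelling wave) or imaginary (a growing disturbance: unstable).

def classify(k):
    omega = cmath.sqrt(k**2 - 1)
    return "travelling" if abs(omega.imag) < 1e-12 else "unstable"

assert classify(2.0) == "travelling"   # ω real: the wave propagates
assert classify(0.5) == "unstable"     # ω imaginary: the amplitude grows exponentially
```

The recipe is exactly the one in the text: solve the dispersion relation for ω, and check whether it picks up an imaginary part.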

But my syrup pouring showed no signs of being unstable!

These instabilities need time to grow. Try pouring it from a metre onto a plate on the floor, and watch it separate into droplets before hitting the plate.

Summary

A dispersion relationship is something that tells us how waves behave in a medium or situation; it also tells us if a system is unstable.

At least, unless we have a continuous spectrum. But I don't understand them yet.

An Asymptotic Derivation

The above nodes only give a single term - but we can get an infinite series if we try slightly harder. This series will be asymptotic and divergent - but still very useful if we only take the first few terms. Let us consider what we actually mean by a factorial. Well, it can be shown that the factorial function (defined on the natural numbers) has precisely one analytic continuation, given suitable assumptions about convexity or the equivalent: the Gamma function.

Gamma(z) = ∫ exp(-t + (z-1) ln(t)) dt

Where the integral runs between 0 and infinity along the real line. Take a look at that integrand. It's largest when t = (z-1), and is pretty damned small everywhere else. I'm going to apply Laplace's Method to the region around t = (z-1) and see if I can't get somewhere.

Write h(t) := -t + (z-1) ln(t) and Taylor expand around t = (z-1).

h(t) ~ (z-1)(-1 + ln(z-1)) + 0·(t - (z-1)) - (t - (z-1))²/(2(z-1)) + O((t - (z-1))³)

Why have I stopped there? After all, t is a variable - it runs from 0 to infinity, so I can hardly claim it's small. Well, it's not small. But h(t) is quite negative away from t = (z-1), and so the integrand is exponentially small; so, for my leading order term, I'm going to stop here. If I wanted higher order terms, I would simply keep more terms from h.

Okay, so this gives us

Gamma(z) ~ (z-1)^(z-1) · exp(1-z) · sqrt(2(z-1)) · ∫ exp(-u²) du

Where I have sneakily made the substitution (t-(z-1))²/(2(z-1)) = u², and the limits of the integral are now from -infinity to +infinity. Remember that away from t = (z-1), everything is pretty much zero (those magic words were exponentially small) and it doesn't matter if we integrate over that space or not. It's convenient if we do. We recall our favourite Gaussian integral is equal to sqrt(pi) and that, for this to be a good approximation, z must be large, to find:

z! ~ Gamma(z+1) ~ z^(z + 1/2) · sqrt(2π) · exp(-z).

Higher Order Terms

There are two approximations I made: to neglect the exponentially small region (which is really, really, really tiny, okay? There's no point keeping it here - although that's not always true) and to drop the higher-order terms from h. If I'd kept them in h, then I would have ended up with something that looks like:

exp(A + Bu² + Cu³ + ...)

in my integrand. Well, an easy thing to do here (though by no means the only thing) is to expand out some of that exponential:

exp(A + Bu²)(1 + Cu³ + ...)

which is another integral I can evaluate, term by term. Note that with the substitution I made earlier I'd pick up another power of z^{-1/2}, making this term in some sense smaller than the previous. Then I'd find a term which looked like 1/(12*z) in my expansion; the next lowest term.
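Both the leading-order term and the 1/(12z) correction are easy to check numerically against the exact factorial; z = 20 is an arbitrary test value:

```python
import math

# Stirling at z = 20: leading order, then one correction term.
z = 20
leading = z**(z + 0.5) * math.sqrt(2 * math.pi) * math.exp(-z)
corrected = leading * (1 + 1 / (12 * z))
exact = math.factorial(z)

assert abs(leading / exact - 1) < 0.005    # roughly 0.4% off at leading order
assert abs(corrected / exact - 1) < 1e-4   # much better with the 1/(12z) term
print("leading relative error:", leading / exact - 1)
```

This is the sense in which the series is useful despite diverging: each extra term buys roughly another factor of 1/z in accuracy, as long as you stop early.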

Why do this?

This one? It's a nice example of Laplace's Method, and is one of the more basic asymptotic problems. In general, if one has a nasty integral which one cannot evaluate analytically, it is often convenient to find an asymptotic expansion which is cheap computationally and retains the basic mathematical ingredients. Please believe me when I say this is not some cheap trick one might use once, but is a powerful tool in applied mathematics.

See Also

A symmetric function is a polynomial or rational function (quotient of polynomials) in n variables which remains invariant no matter how you permute the variables (e.g. swap x1 with x2). They feature prominently in Galois theory. The elementary symmetric functions appear (up to sign) as the coefficients of a polynomial in n indeterminates (i.e. the coefficients of f(t) = (t - x1)···(t - xn)), and the fundamental theorem of symmetric functions says that any symmetric function can be expressed as a polynomial or rational function of elementary symmetric functions. When the original function isn't symmetric, we can still say something interesting.

Theorem: Let g(x) be any polynomial where x = (x1, ..., xn) are n variables, and let s1, ..., sn be the elementary symmetric functions in n variables. Then g(x) can be written as a linear combination of monomials

x1^(ν1) x2^(ν2) ··· xn^(νn)

such that νi ≤ i - 1 and the coefficients of the monomials are polynomials in the si.

This theorem, seemingly due to Emil Artin, is a slight generalisation of the fundamental theorem of symmetric functions. It gives the closest possible expression of any polynomial in terms of symmetric functions, whether or not the original polynomial is symmetric. Or if you prefer, the fundamental theorem of symmetric functions comes as an easy corollary to this theorem.

The corollary is obvious. Observe that a monomial satisfying the constraint cannot be symmetrised, because the power of each xi is capped at i - 1 (in particular, x1 may not appear at all). Thus, if the original polynomial g(x) was symmetric, then the only way it can still be symmetric after being written in this form is if the only monomial with nonzero coefficient is the one for which all the νi are zero, i.e. the constant term. But then the constant term is a polynomial of elementary symmetric functions, proving the corollary.

The proof is an algorithm for putting g(x) in the desired form.

Proof: Let fn(t) := (t - x1)(t - x2)···(t - xn) = tⁿ - s1tⁿ⁻¹ + ··· + (-1)ⁿsn and define recursively

fi - 1(t) := fi(t)/(t - xi).

Three things are immediately clear:

  • The polynomial fi(t) has xi as a root, the other roots being the other xj with j < i, because it's just fn(t) with the last n - i linear factors divided away.
  • By synthetic division and by the recursive definition, the coefficients of fi(t) are polynomials in terms of the elementary symmetric functions and the xj with j > i.
  • The degree of fi(t) is i.

Now for the algorithm to put g(x) in the desired form. Since x1 is a root of f1(t), it is possible to express x1 in terms of the symmetric functions si and the rest of the xi with i > 1. Substitute this expression of x1 into g(x), and expand out the result, which does not contain any term with x1 now.

We proceed recursively as follows. Since x2 is a root of f2(t), it is possible to express x2² or any higher power in terms of the symmetric functions si and the rest of the xi with i > 2, with perhaps a few terms of x2 of degree less than 2. Substitute this expression of x2² (or higher) into g(x), and expand out the result, which no longer contains any term with x2² or higher degree.

Continuing this process - eliminating all third or higher powers of x3 with f3(t), all fourth or higher powers of x4 with f4(t), and so on - we obtain the desired form for g(x).

QED.

Let's work out an example. Unfortunately, the only way to make an interesting enough example involves heavy computations. I will work out some steps of the example, but I will leave most of the boring manipulations to Maxima or to a diligent reader.

Let us consider the symmetric polynomial in 3 variables

g(x) = x1²x2 + x1²x3 + x2²x1 + x2²x3 + x3²x1 + x3²x2

Now, in 3 variables, the fi(t) from the proof above are

f3(t) = t³ - s1t² + s2t - s3,
f2(t) = t² + (x3 - s1)t + (s2 - s1x3 + x3²),
f1(t) = t - s1 + x2 + x3.

Recall that f2 and f1 are obtained by symbolic synthetic division of the polynomial above them and that the remainders are zero. Also, recall at this point that the elementary symmetric functions in three variables are

s1 = x1 + x2 + x3,
s2 = x1x2 + x1x3 + x2x3,
s3 = x1x2x3.

Since f1(x1) = 0, f2(x2) = 0 and f3(x3) = 0, we obtain that

x1 = s1 - x2 - x3,
x2² = s1x2 + s1x3 - s2 - x2x3 - x3²,
x3³ = s1x3² - s2x3 + s3.

So, the algorithm now says to replace this expression for x1 into g(x), which after expanding everything out becomes

3x2x3² - s1x3² + 3x2²x3 - 4s1x2x3 + s1²x3 - s1x2² + s1²x2.

Note that we have succeeded in eliminating x1 from this expression. Now we do the same with x2², to obtain

-3x3³ + 3s1x3² - 3s2x3 + s1s2.

Finally we replace x3³ by its own expression to conclude that

g(x) = s1s2 - 3s3,

which is the expression of g(x) in terms of elementary symmetric functions that we sought.
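As a sanity check, the identity can be spot-checked numerically; the sample points are arbitrary:

```python
# Verify g(x) = s1·s2 − 3·s3 at a few arbitrary points.
def g(x1, x2, x3):
    return (x1**2 * x2 + x1**2 * x3 + x2**2 * x1
            + x2**2 * x3 + x3**2 * x1 + x3**2 * x2)

def via_symmetric(x1, x2, x3):
    s1 = x1 + x2 + x3
    s2 = x1 * x2 + x1 * x3 + x2 * x3
    s3 = x1 * x2 * x3
    return s1 * s2 - 3 * s3

for point in [(1, 2, 3), (0, -1, 5), (2, 2, 2)]:
    assert g(*point) == via_symmetric(*point)
print("identity holds at all sample points")
```

Of course, agreement at finitely many points doesn't prove a polynomial identity by itself, but it's a cheap way to catch an algebra slip in the manipulations above.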

No joke — there's a semi-serious proof involved here. We're looking for the smallest number that could easily be mistaken for prime but in fact is not. How do we find it? Well, since it's not prime, let's look for its prime factors.
  • The number can't be a multiple of two, because even numbers are too easy to spot.
  • The number can't be a multiple of three — there's an easy test for that.
  • The number can't be a multiple of five — thanks to our base ten number system, it's too easy to find those.
  • Seven? Sure, why not? Multiples of seven don't look special at all. But we need more than one prime factor -- everybody knows that 49 is seven squared.
  • The other factor can't be eleven — 7 x 11 = 77, obviously not prime. Multiples of 11, especially low ones, are fairly obvious.
  • But what about thirteen? 7 x 13 = 91. That...looks prime.
So there you have it. A rigorous proof that the smallest number that looks prime but isn't is 91. Use this to impress your friends and shame your enemies at cocktail parties.
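For the skeptical, the argument above can be run as a brute-force search; the `looks_prime` criteria below simply encode the bullet points:

```python
import math

# The smallest composite not divisible by 2, 3, 5, or 11 and not a
# perfect square, per the "proof" above.

def is_prime(n):
    return n > 1 and all(n % d for d in range(2, math.isqrt(n) + 1))

def looks_prime(n):
    return (not is_prime(n)                      # actually composite
            and n % 2 and n % 3 and n % 5        # no easy divisibility tests
            and n % 11                           # 77 is too obvious
            and math.isqrt(n)**2 != n)           # 49 is too obvious

smallest = next(n for n in range(2, 200) if looks_prime(n))
assert smallest == 91 and 91 == 7 * 13
```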

But what about the smallest number that can't be described in fewer than 15 words?