
Here’s the premise. It’s important that you keep this in mind, or the model won’t make sense.

Andy encounters a poster announcing an upcoming Beer Festival, with little information other than “Soon!”

Andy doesn’t know if he wants to go.

That’s the premise: Andy does not yet know whether he’s going to the Festival, and he wants a simple yes/no answer.

We will start simple and get progressively more complex, so please bear with me.

Step 0: Pro/con list

Why not use a list of pros and cons? It’s a supremely easy task.

So, Andy writes down a two-column table like this:

           Yes  No
Money?      ?    ?
Date?       ?    ?
Weather?    ?    ?
Distance?   ?    ?

For now, let’s imagine only these four questions: Do I have enough money? Does that cute girl want to go with me? Will it be a nice day out? Is it close?

Easy peasy! All that’s needed is to answer these four objective questions, and if there are more “yes”es than “no”es, Andy will go. Why would we need anything else?
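Here is a minimal sketch of Step 0 in Python, assuming Andy has already answered each question. The answers are made up for illustration, and `should_go` is a hypothetical helper name.

```python
# Hypothetical answers to Andy's four questions (True = "yes", False = "no").
answers = {
    "Money?": True,
    "Date?": False,
    "Weather?": True,
    "Distance?": True,
}

def should_go(answers):
    """Go only if there are strictly more "yes" answers than "no" answers."""
    yes_count = sum(1 for answer in answers.values() if answer)
    no_count = len(answers) - yes_count
    return yes_count > no_count

print(should_go(answers))  # True: three "yes" against one "no"
```

Note that a tie counts as “don’t go” in this sketch; the list itself doesn’t say what to do in that case.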

Step 1: Assigning importance

There’s something wrong with that. For instance, maybe she says yes, it’s a nice day out and the festival is close, but I have no money. In that case the single “no” should carry more weight in the overall decision than the three “yes”es.

No problem! Maybe we can assign some kind of score to each vote. Money, as seen, is important, but maybe having a date is less so. We can set these scores so that they reflect their relative importance. Our table might now look like this:

           Importance  Yes  No
Money?          3       ?    ?
Date?           1       ?    ?
Weather?        1       ?    ?
Distance?       2       ?    ?
Total           7       -    -

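Great! Now, for every “yes” we will add the corresponding score. We don’t really need to change our decision rule: it is still a majority, only now a majority of points. With 7 points available, that means Andy will go to the Festival if the total score at the end is 4 or more. Easy, right?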
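The weighted version can be sketched the same way. The importance scores and the threshold of 4 come from the table; the function names and the example answers are my own assumptions.

```python
# Importance scores taken from the table.
importance = {"Money?": 3, "Date?": 1, "Weather?": 1, "Distance?": 2}

def total_score(answers):
    """Add up the importance of every question answered "yes"."""
    return sum(importance[q] for q, yes in answers.items() if yes)

def should_go(answers, threshold=4):
    """Go when the weighted score reaches the threshold (4 out of 7)."""
    return total_score(answers) >= threshold

# Two "yes" and two "no": a tie by head count, but the "yes" answers
# belong to the two most important questions.
mixed = {"Money?": True, "Date?": False, "Weather?": False, "Distance?": True}
print(total_score(mixed), should_go(mixed))  # 5 True
```

A plain head count cannot break this tie, while the weighted score comes out clearly in favor of going.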

Step 2: A layer of abstraction

The table is not very complicated for a reason: we have lousy questions. Look at the first one. “Money?” What does that mean?

No, really, what does that mean? The first thing that pops to mind is “Do I have enough money?” which is fine and well, but it has two problems.

The first is that “enough” is a very loose term. How much is enough? It’s obvious that “enough money” in this context must lie somewhere between zero and, say, my monthly paycheck. But where exactly? Given that this is non-critical spending, there should be a budget or limit of sorts. It would be insane to spend every available cent on something trivial.

Which points us to the second problem: when we say “enough” we mean “enough available money.” While there’s a fixed amount at any one time, Andy must constantly move money around for various purposes: there are “ins” and “outs.” So we can refine the vague term “enough” into something like “enough money after critical expenditure.”

It’s easy to see how the other questions have the same problems: they don’t have a good yes/no “rule” and they all depend on other, more basic questions. Let’s tackle those.

Our model right now looks like this:

+-----------+  +/-  +---+
| Money?    +------>+ 3 +--------+
+-----------+       +---+        |
+-----------+  +/-  +---+        v
| Date?     +------>+ 1 +----->+-+-----+
+-----------+       +---+      | Grand |
+-----------+  +/-  +---+      | Total |
| Weather?  +------>+ 1 +----->+-+-----+
+-----------+       +---+        ^
+-----------+  +/-  +---+        |
| Distance? +------>+ 2 +--------+
+-----------+       +---+

So far, no major changes. Instead of a “Yes/No” answer, each question now gives a “+/-,” whose meaning I believe to be obvious.

We will start with the first problem: how do we decide a “yes/no” for each of these questions? We need to include that rule somewhere:

+-----------------------------------------+  +/-  +---+
| Money?    || More than $500?            +------>+ 3 +--------+
+-----------------------------------------+       +---+        |
+-----------------------------------------+  +/-  +---+        v
| Date?     || Is someone else coming?    +------>+ 1 +----->+-+-----+
+-----------------------------------------+       +---+      | Grand |
+-----------------------------------------+  +/-  +---+      | Total |
| Weather?  || Will it be not-rainy?      +------>+ 1 +----->+-+-----+
+-----------------------------------------+       +---+        ^
+-----------------------------------------+  +/-  +---+        |
| Distance? || Can I get there with <$10? +------>+ 2 +--------+
+-----------------------------------------+       +---+
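As a rough sketch, each box can be read as a small rule that turns a raw fact into a yes/no answer. The thresholds ($500, $10) come from the diagram; the field names and all the numbers in `facts` are made up for illustration.

```python
# Hypothetical facts about Andy's situation; all values are made up.
facts = {
    "spare_money": 620.0,    # dollars left after critical expenses
    "companions": 1,         # people who agreed to come along
    "rain_expected": False,
    "travel_cost": 4.5,      # dollars for the cheapest way there
}

# Each rule turns a raw fact into a yes/no answer, as in the diagram.
rules = {
    "Money?": lambda f: f["spare_money"] > 500,
    "Date?": lambda f: f["companions"] > 0,
    "Weather?": lambda f: not f["rain_expected"],
    "Distance?": lambda f: f["travel_cost"] < 10,
}
importance = {"Money?": 3, "Date?": 1, "Weather?": 1, "Distance?": 2}

def grand_total(facts):
    """Sum the importance of every question whose rule says "yes"."""
    return sum(importance[q] for q, rule in rules.items() if rule(facts))

print(grand_total(facts))  # 7: every rule says "yes"
```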

And we can also add the necessary preconditions. For the sake of this writeup I will simplify these as well:

+-----------+
| Money in  +-----------------------+
+-----------+                       |
+-----------+                       v
| Money out +-----------------------+
+-----------+                       |
+--------------+                    |      +-----------------------------------------+  +/-  +---+
| Cute barista +-------------+      +----->+ Money?    || More than $500?            +------>+ 3 +--------+
+--------------+             |             +-----------------------------------------+       +---+        |
+--------------+             v             +-----------------------------------------+  +/-  +---+        v
| Tim          +-------------+------------>+ Date?     || Is someone else coming?    +------>+ 1 +----->+-+-----+
+--------------+                           +-----------------------------------------+       +---+      | Grand |
+-----------------------+                  +-----------------------------------------+  +/-  +---+      | Total |
| Sunny?                +----+------------>+ Weather?  || Will it be not-rainy?      +------>+ 1 +----->+-+-----+
+-----------------------+    ^             +-----------------------------------------+       +---+        ^
+-----------------------+    |             +-----------------------------------------+  +/-  +---+        |
| Cloudy but not rainy? +----+      +----->+ Distance? || Can I get there with <$10? +------>+ 2 +--------+
+-----------------------+           |      +-----------------------------------------+       +---+
+----------------------------+      |
| Can I get there on subway? +------+
+----------------------------+      ^
+----------------------------+      |
| Can I get there on bike?   +------+
+----------------------------+

This is definitely more complex! Remind me again why we’re doing this.
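For the curious, the full diagram can be sketched as two stages: the preconditions on the left feed the four questions, and the questions feed the Grand Total. The diagram does not say how two preconditions combine, so I am assuming that “money in” minus “money out” gives the spare money and that, for the other questions, either precondition alone is enough. All names and numbers here are hypothetical.

```python
IMPORTANCE = {"Money?": 3, "Date?": 1, "Weather?": 1, "Distance?": 2}

def first_layer(inputs):
    """Turn the raw preconditions into the four yes/no questions."""
    return {
        # Assumption: spare money is what comes in minus what goes out.
        "Money?": inputs["money_in"] - inputs["money_out"] > 500,
        # Assumption: either companion alone is enough for a "yes".
        "Date?": inputs["cute_barista"] or inputs["tim"],
        "Weather?": inputs["sunny"] or inputs["cloudy_not_rainy"],
        "Distance?": inputs["subway"] or inputs["bike"],
    }

def grand_total(inputs):
    """Second stage: add the importance of every "yes"."""
    answers = first_layer(inputs)
    return sum(IMPORTANCE[q] for q, yes in answers.items() if yes)

inputs = {
    "money_in": 2000, "money_out": 1200,        # $800 to spare
    "cute_barista": False, "tim": True,         # Tim is coming
    "sunny": False, "cloudy_not_rainy": False,  # it will rain
    "subway": True, "bike": False,              # reachable by subway
}
print(grand_total(inputs))  # 6: everything but the weather says "yes"
```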

Step 3: Putting it all together! But before that…

Look no further, for we have constructed a Neural Network!…

…kind of. This is a real, useful Neural Network in the same way a Hot Wheels is a car. What you see here is a toy model, but one that has all the main parts of the real thing. I’ll proceed to describe them here, but if you want to skip the math, just go to the next header.

Neural networks (NNs), in general, have “neurons”[1] that we approximate with every box in our model. In reality they hold a mathematical value, usually between -1 and 1, which we approximate with a “yes/no” answer.

A real NN also has Importance factors, but they are called weights. They more or less do the same thing: they scale how much each neuron contributes to the next one. In more complicated setups, the weights reflect how much a variable x affects the next neuron y.

A real NN has “specific questions” like the ones we have here: some kind of rule that determines the “answer” each neuron outputs. In our model this question has the form “Do I have more than $500 to spare?” In reality it’s an activation function that specifies what will come out of each neuron.

A real NN has layers like our model, and generally for the same reason: layers allow the network to capture or model higher levels of abstraction.[2] In our model, all questions regarding cash in the end lead to the abstract concept of “Money.” In a real NN… well, sorta.[3]
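To make weights and activation functions a bit more concrete, here is a sketch of a single neuron: a weighted sum of its inputs, passed through an activation function. The `bias` term is the usual name for the threshold offset; it is not part of our toy model, so take it as my assumption, along with every number below.

```python
import math

def step(z):
    """Toy-model activation: a hard yes (+1) or no (-1)."""
    return 1.0 if z > 0 else -1.0

def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs, passed through an activation function."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

# "Do I have more than $500 to spare?" written as a neuron:
# one input (spare money), weight 1, bias -500.
print(neuron([620.0], [1.0], -500.0, step))  # 1.0, that is, "yes"

# A smooth activation such as tanh gives a graded answer between -1 and 1
# instead of a hard yes/no (money rescaled to thousands of dollars here).
print(neuron([0.62], [1.0], -0.5, math.tanh))
```

The step function reproduces our yes/no boxes; smooth functions like tanh are closer to what real networks use.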

In a real NN, these neurons have multiple connections between layers. Our toy example only has a few connections for clarity, but you could imagine how, for instance, my balance affects not only my general spending, but the “Distance” neuron as well: if I had $1,000,000 in my bank, I could surely pay for an Uber there and back.

And before you go, in general you can extend this model a lot more. A real NN will most likely have more layers, and its neurons will have proper weights and be a lot more connected to each other. This, again, is a toy model.[4]
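A sketch of what “more connected” could look like: every neuron in a layer receives every input, and a table of weights says how strongly. The inputs, weights and the choice of tanh are all hypothetical; only the idea that the balance feeds both “Money?” and “Distance?” comes from the example above.

```python
import math

def dense_layer(inputs, weights, biases):
    """Every neuron sees every input; weights[j][i] links input i to neuron j."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

# Hypothetical inputs, scaled to roughly [-1, 1]: balance, company, sunshine.
inputs = [0.8, -1.0, 0.5]

# Two neurons, "Money?" and "Distance?". Both are connected to the balance.
weights = [
    [1.5, 0.0, 0.0],  # Money? depends on the balance only
    [0.7, 0.0, 0.2],  # Distance? depends on the balance too (the Uber case)
]
biases = [0.0, 0.0]

outputs = dense_layer(inputs, weights, biases)
print(outputs)  # two values, each between -1 and 1
```

A weight of zero is the same as having no connection at all, which is how our sparse toy diagram fits into this picture.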

Step 4: Cool! But why?

Why this rigmarole for a simple thing?

Granted, if the problem is as small and trivial as “Do I want to attend a beer festival?” building a NN is overkill. You’ll spend more time building one than actually using it. But, as you may already know, NNs are not built for this.

What are they good for, then?

Remember the premise that kicked off all this? There are simple “inputs” that we do know (balance, companionship, etc.) and there’s an output that we don’t. NNs are used when you expect to see problems with similar circumstances in the future: with known inputs and an unknown desired output. A real NN would, to take an example from my line of work, take an image as an input and give some kind of output, like “what digit this squiggle is meant to be” or “which parts of this satellite photo are streets and which are buildings.”[5]

So even if we really built that NN, so far it’s useful only to us. The true usefulness of a NN would be to build it so that it can “give” a “correct” answer to the same problem in different scenarios.[6] In other words, our NN would really be useful only when it could take your details (money in the bank, house location, etc.) and reliably predict whether you want to attend the festival.

And that points to the next discussion: how do you “train” a NN? How does it “learn”? That will come later, for I need to try and find an analogy that is as beginner-friendly as possible (partial derivatives can be scary).

  1. They are obviously not real, biological neurons. They are usually called this because the simplest Neural Networks were designed as artificial analogues to how the brain is thought to work down to the neuronal level. Brains, however, don’t really work this way.

    Some authors prefer the term “perceptron,” which is more correct from a historical point of view, but is not as intuitive for an essay for beginners.

  2. But only up to a point. More layers are not always desirable for several reasons (like overfitting), but those fall outside the scope of this essay.

  3. Generally speaking, more layers in a NN do help capture more abstract behaviors and patterns, but that doesn’t mean those are “real” abstract categories in the way humans think. Neural networks need human analogies to be explained in simple terms, but they are just that: analogies.

  4. It’s a toy model of one architecture of Neural network, arranged in one particular way, with one particular flow of information. For more serious discussions, expect variations on pretty much every part of this network.

  5. Lots of NNs do one of two things: take a thing and partition it, or take a thing and establish which category it belongs to. There are of course a million variations.

  6. Apologies for the excess quotation marks, but I would rather make the analogies and linguistic approximations explicit than leave them unmarked and further the narrative that NNs are essentially magic. That discussion will have to come later.
