Join is an SQL ("Structured Query Language") keyword. SQL is a language used to talk to databases.

A join operation creates a temporary database table or "record set", composed of elements of two existing tables. There are a few different ways you might go about joining. In all cases, each record in the new table is composed of at most one record from each of the two tables being joined. Your join statement will say how the database should decide which records go together. You can think of a "record" as a "row" much like in a spreadsheet such as Excel, and a "field" as a "column" (again, as in a spreadsheet).

You can also take the result of one join operation, and join that with another table, or with the result of another join operation.

We'll define two very small tables and use those for each example. Even though SQL is case-blind, we'll use all caps for keywords so they stand out clearly. Each of these examples begins with "SELECT * FROM". That's SQL for "give me all of the fields in the following records". We could select only some of the fields, but for the sake of simplicity we won't do that.

    Table A: People
    person_id   lastname    firstname       state_id
    -------------------------------------------------
    1           Bloggs      Joe             PA
    2           Sped        Fred            VT
    3           Martian     Marvin          MA
    4           Necropolis  Ashram          PA

    Table B: States
    state_id    statename
    -------------------------------------------------
    PA          Pennsylvania
    VT          Vermont
    OH          Ohio

Inner join

SELECT * FROM People INNER JOIN States ON People.state_id=States.state_id;

An "inner join" creates new records only where the condition (People.state_id=States.state_id, in this case) holds true. Think of it like this: The database looks at each record in People in turn. If the state_id for that record matches the state_id of a record in the States table, those two records are merged together into one record, which is added to the result table.

Here's the result table for this one:

person_id   lastname    firstname   People.state_id States.state_id statename
--------------------------------------------------------------------------------
1           Bloggs      Joe         PA              PA              Pennsylvania
4           Necropolis  Ashram      PA              PA              Pennsylvania
2           Sped        Fred        VT              VT              Vermont

You'll notice that Marvin Martian is not included: His state_id is "MA", and there is no record in the States table where state_id="MA".

Outer Joins

Outer joins come in two flavors: Left outer join, and right outer join. The only kind of left join is a left outer join, and the only kind of right join is a right outer join, so you're not required to specify "outer" for either one. We'll specify it for the sake of clarity.

Left join (left outer join)

SELECT * FROM People LEFT OUTER JOIN States ON People.state_id=States.state_id;

In a left join, the result record set will include a record for each record in the "left" table (People, in this case), regardless of what is or is not found in the other table. Records will merged where our "on" criterion is met. Therefore, our result table looks like this:

person_id   lastname    firstname   People.state_id States.state_id statename
--------------------------------------------------------------------------------
3           Martian     Marvin      MA      
1           Bloggs      Joe         PA              PA              Pennsylvania
4           Necropolis  Ashram      PA              PA              Pennsylvania
2           Sped        Fred        VT              VT              Vermont

Note that Marvin Martian is included this time. His record includes nothing from the States table: That's why it got left out last time. Note also that we haven't seen hide nor hair of Ohio in any of these operations. That's because we've included information from the States table only when the state_id for a States record matches a state_id in a People record. Since "OH" never appears in the People table, it has never appeared in our results.

Would you like to see Ohio? It's lovely this time of year. Let's move ahead and hope that Ohio joins the party in our last example.

Right join (right outer join)

SELECT * FROM People RIGHT OUTER JOIN States ON People.state_id=States.state_id;

A right join is like a left join except that we include all records from the right table rather than the left table. As in the left join above, records will be merged where the "on" criterion is met. Let's see our results for this query:

person_id   lastname    firstname   People.state_id States.state_id statename
--------------------------------------------------------------------------------
                                                    OH              Ohio
1           Bloggs      Joe         PA              PA              Pennsylvania
4           Necropolis  Ashram      PA              PA              Pennsylvania
2           Sped        Fred        VT              VT              Vermont

Finally, we can see Ohio. That's because we're including everything from States. We've misplaced Marvin Martian again, because no record in States has a state_id matching his. We're also seeing another interesting point: If the "on" criterion for the same record is met twice ("PA"/"Pennsylvania" in this case), we'll get a new merged record each time that happens.

How do you choose which join to use? That depends entirely on what information you're looking for. There are also questions of efficiency.

What we've learned from all this

The terms "inner" and "outer" turn out to be descriptive, in a metaphorical way: An outer join is much more free and easy than the straightlaced, inward-looking inner join.

We've only scratched the surface of SQL, but we've already seen that SQL is a very high level language: We're asking the database to perform complicated operations, with much searching and comparing and heaving of data to and fro. What's the binary representation of a database table? What's the algorithm for choosing records for each of these joins? Those questions are not addressed. We don't know even how to begin expressing the answers in this language. We do not, in fact, have any concept of process here at all: We're not describing a sequence of actions. Instead, we're defining the expected result and leaving everything else to the implementation. This is just a very terse, stilted equivalent of somebody in a tie yelling "Bill? Get me a list of all our customers!" If poor Bill was very bad in a former life, he might just write an SQL query to generate that list, or some more elaborate list that a lunatic in a tie would find more appealing.

Yeah, but...

Why do we have two tables at all? Why not just put the full text for the appropriate state name in each People record? Because states have more than names. They also have state songs, and state flowers, and average yearly rainfall. There could be half a dozen different pieces of information associated with each of the fifty states (and don't forget Guam). Are you going to reproduce all of that in each People record? That would be painful. What if somebody moves? Then you've got to change half a dozen fields in his record instead of just one simple two-letter code. The reality of the data entry process is that mistakes happen: At some point, you'll have one state flower for Kansas in Jane Doe's record, and a different one for Mary Roe. Furthermore, Kansas might change their state flower next week. Then you'll have to update hundreds, thousands, or even millions of records with the new information. If all of those records just have a state_id field equal to "KS", like the arrangement we have above, then you'll only have to change the Kansan state flower in one place. This also takes up less room on your disk, and with enough records that becomes a real issue. You also save yourself trouble if you decide to start tracking the official state lizard along with the rest: With a separate States table, you'll only have to add a new field to fifty or sixty different records. With one monolithic table you'd have to add (and then populate) a new field in, potentially, millions of records. Breaking tables up logically makes life easier at every point until you actually need the information back. When that happens, you just write a join query as we've outlined above.