Dijkstra's algorithm is the faster of two common algorithms to find all single-source shortest paths in a graph. That is, given a graph G and a root vertex r, Dijkstra will provide you with the shortest path from r to u where u is any vertex in G.

Notation

This writeup attempts to convey the essence of Dijkstra's algorithm with a minimum of formal math, nevertheless the following definitions are needed for clarity:

G = (V,E) - G is a graph which is defined by a set of vectors V and a set of edges E.
(u,v) - An arbitrary edge of G. Since shortest paths are most applicable to directed graphs, assume u is the source vertex and v is the target vertex.
w(u,v) - The weight of edge (u,v). This is technically a part of E.
d(u) - The total weight of the shortest path to u. When we start, the entire array is initialized to infinity (except the root). This is updated as we run through the algorithm.
p(u) - The predecessor of u on the shortest path. Since the shortest paths form a tree structure (ie. no cycles), every vertex needs only point to a predecessor to point the way back to the root. This is also updated as we go.
r - The root vertex. d(r) and p(r) can be initialized to special values indicating that it is the root, but it is a minor detail since the caller of the Dijkstra algorithm knows what root it requested.
Q - A priority queue used in the algorithm to keep track of the running lengths of unfound shortest paths.

Dijkstra's Insight

The Bellman-Ford algorithm is a more robust algorithm that solves the problem for all graphs with no negative-weight cycles. If a graph has a negative-weight cycle then obviously no shortest path exists because you can run through the cycle infinitely decreasing any path length. Essentially Bellman-Ford looks at every vertex |V| (number of vertices) times. At every loop, Bellman-Ford finds the shortest paths of the same length as the number of loops so far. After |V| loops we must have found all the shortest paths, because any longer path would have a cycle (by the pigeon hole principle), and in the absence of negative-weight cycles, the removal of such a cycle would produce a shorter path.

Dijkstra's is a greedy algorithm that adds an additional constraint in order to greatly decrease the number of edges considered at each step. Dijkstra's algorithm is only guaranteed to work on graphs that have no negative-weight edges at all. This constraint does little to diminish the usefulness of the algorithm since negative-weight edges are not normally found in real-world graphs (eg. networks).

Dijkstra's insight was quite simple, and very similar to Prim's algorithm. The idea is that if you split G into two sets of vertices A and B, the first being the set of vertices for which a shortest path is already known, and the second being all other vertices, you only need to consider edges crossing A and B to discover the shortest path to another vertex. If w(u,v) + d(u) is the shortest path, then you have found the shortest path to v. Why is this correct? Because any path to any other vertex will be longer, and because we have no negative weight edges, any longer paths to v will necessarily be longer still.

Bellman-Ford and Dijkstra's both run an outer loop |V| times, but Bellman-Ford considered every single edge, while Dijkstra limits itself to a select group of edges.

The only real tricky part of implementation is finding the next shortest path in each loop. Naively checking each vertex in B results in an algorithm that ends up being not much faster than Bellman-Ford. The trick is to keep a priority queue that contains all vertices in B and using d(u) as the key for every u in B.

Implementation of Dijkstra's Algorithm

The following is a pseudo-code rendition of Dijkstra's algorithm using mostly C-like syntax. Suggestions on improving clarity welcome.

Dijkstra(G,w,r) {
    initialize d(u) for all u to infinity
    initialize d(r) to 0
    initialize Q with all u using d(u) as keys
    while (Q is not empty) {
        u = Q.extractMin()
        for each vertex v that is adjacent to u (ie. u -> v) {
            if d(v) > d(u) + w(u,v) {
                d(v) = d(u) + w(u,v)
                p(v) = u
            }
        }
    }
    
    return p //A Predecessor vector is the most efficient representation of the final tree.
}